Homelab and Break Things

This is a blog post about things that went really poorly. But, that’s okay, because they were supposed to go poorly. Let me explain.

I joined Censys as a Site Reliability Engineer in December of 2023. My career in technology has been heavily shaped by cloud computing, which changed the specific expertise that an engineer has to be concerned with. Over the last fifteen years, many of us spent much less time running cabling in a data center and a lot more time fighting with a cloud console of one variety or another. Yet, like fashion, everything in this industry is cyclical, and more folks are talking every day about the cost savings and freedom benefits of running your services on physical hardware that you own. Censys is certainly no exception: we have been running a portion of our systems on-premises for some time now.

In past roles where I’ve had to work with physical hardware, I’ve been supported by entire teams that manage that directly. Censys is a much smaller organization and my new role required me to understand parts of the stack that I’ve largely been able to ignore before now. So how does someone who has spent much of their career with their head in the “cloud,” so to speak, get up to speed with the concerns of real, physical hardware?

Enter the Censys Professional Development Budget. This is an annual stipend of money for us to use how we see fit (within reason) to further our professional skills. I decided to use part of mine to build out a home lab that I could use to experiment with physical hardware firsthand. Today I’m going to talk about what I built and some of the more frustrating stumbles along the way.

What’s in a Homelab Anyway?

The practice of “homelabbing” has been around in the technology community for a while. You’ll find numerous Reddit threads of folks talking about their home labs, sharing pictures of them, and more. The term itself refers to building out a sandbox environment of sorts in your home that allows you to work with problems that are analogous to (if not the same as) the problems you’d face in your professional life.

The limits of what you can learn with this approach are bounded only by budget and your curiosity. What, exactly, each individual chooses to put in their homelab depends on what they’re interested in learning. For myself, I wanted to get hands-on experience with some more advanced networking techniques and running Kubernetes on bare metal hardware. What I built out reflects that.

Today in my homelab I have:

  • My home networking and security gear, which at this point is largely isolated from the lab gear both for security and practical reasons.
  • An Intel i3 NUC running pfSense that is currently the firewall for the Lab.
  • Five Raspberry Pi 4B’s assembled into a Kubernetes cluster. Four of these have hard drives attached to them via USB to provide some permanent storage space that’s resilient to any single node failing.
  • An inexpensive TP-Link Power over Ethernet switch that will deliver power and networking connectivity to my Raspberry Pi’s over the same cable.

Photo of Censys employee Matt Farmer's homelab

On the scale of homelabs, this one is pretty small. It certainly won’t win me any awards for elegant wiring (though I’m working on it). Even still, it has afforded me plenty of opportunities to learn new things and that is the entire point.

I mentioned at the start that this is a blog post about things that have gone poorly, so if you’re looking for advice on how to set up a configuration like this to be useful, you’re probably in the wrong spot. Instead, what follows are a few of the more interesting tangents that occurred and what I learned from them. If that sounds interesting to you, let’s go.

External Hard Drive Woes

USB external drives are fairly common these days. One of the reasons these drives work as well as they do is a protocol called “UAS” (USB Attached SCSI), which allows you to read and write data on a USB drive much faster than you could otherwise. One of the first things I wanted to do with my new homelab was to have some storage, so I purchased some drives and started hooking them up.

I purchased some test drives and a test cable for one Raspberry Pi. I moved everything over to the new drive, including the operating system. Everything looked good, so I purchased a few more. Everything checked out after connecting the drives and running some read/write speed tests, so I deployed Longhorn to give my Kubernetes cluster some redundant, persistent storage. I was super stoked about everything until I copied the first large file into a Longhorn volume and suddenly everything stopped working. I couldn’t connect to the machine remotely, my files weren’t showing up anymore, and it looked as though everything had disappeared into thin air. Rebooting the server brought everything back, until it broke again.

I had made a critical error, and there’s no way I could have anticipated it ahead of time. Regardless of the brand name that’s on the outside of a computer cable when you purchase it, other manufacturers produce the chips that go inside the cable that allow it to function. If these chips don’t play nicely with your hardware, you’re toast.

The cable I was using to connect my disk to USB didn’t play nicely with the Raspberry Pi. As a result, the drive would periodically die and stay dead until the power was reset. This is bad enough on its own, but I learned the hard way that if the core operating system is on the external drive when this happens, you lose all access to the server and can’t investigate until you reboot it. By then, most of the evidence of the earlier problem is gone.

So, how did I fix this and what have we learned?

Effectively the solution to this problem was twofold.

First, I had to move the operating system back to the SD Card that Raspberry Pi typically expects you to run your operating system on. This means that in the event something goes wrong with external storage, the core server is still functioning and I can at least get in to gather information about the problem.

Second, it turns out external drives with Pi’s are a very involved topic. Ultimately, though, the solution to stop my drive from failing was to enable “quirks mode” for my drive. This tells the Pi to stop using the faster UAS data transfer mode in favor of the slower, more stable approach, and voilà, my external drives started working again and stayed working.
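The post doesn’t show the exact setting, so treat the following as a sketch of one common way to do this on Linux rather than the precise change I made. The kernel accepts a `usb-storage.quirks=VID:PID:u` parameter, where the `u` flag tells it not to use UAS for the adapter with that vendor and product ID. On a Pi this parameter usually goes on the single line in the kernel command line file (commonly `/boot/cmdline.txt`; the path is an assumption and varies by OS release). The helper names and the `0123:abcd` ID below are hypothetical; the real ID comes from running `lsusb` with the adapter plugged in.

```python
import re
from typing import Sequence

# Matches the "ID vvvv:pppp" field in one line of `lsusb` output.
LSUSB_ID = re.compile(r"\bID\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def quirk_entry(lsusb_line: str) -> str:
    """Return 'vid:pid:u' for the device on one lsusb line ('u' = ignore UAS)."""
    match = LSUSB_ID.search(lsusb_line)
    if match is None:
        raise ValueError(f"no USB ID found in: {lsusb_line!r}")
    vid, pid = match.groups()
    return f"{vid.lower()}:{pid.lower()}:u"


def add_quirks(cmdline: str, entries: Sequence[str]) -> str:
    """Return the kernel command line with the quirk entries merged in."""
    existing = []
    kept = []
    for param in cmdline.split():
        if param.startswith("usb-storage.quirks="):
            # Preserve any quirks that are already configured.
            existing.extend(param.split("=", 1)[1].split(","))
        else:
            kept.append(param)
    merged = existing + [e for e in entries if e not in existing]
    return " ".join(["usb-storage.quirks=" + ",".join(merged)] + kept)


# Hypothetical lsusb line for a USB-to-SATA adapter (placeholder ID).
entry = quirk_entry("Bus 002 Device 002: ID 0123:ABCD Example USB-SATA bridge")
print(add_quirks("console=tty1 root=/dev/mmcblk0p2 rootwait", [entry]))
```

The sketch only builds the new command line as a string. Writing it back to the file and rebooting is left as a manual step, since a mistake there can leave the Pi unbootable.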

I Need More Power

I decided early on that I wanted my home lab powered using Power-over-Ethernet, mostly to avoid having to buy a larger power strip with enough plugs to connect five Raspberry Pi’s to wall power. My existing equipment was capable of delivering power to devices, so why not take advantage of that?

One of the first things that I ran into with my new setup was power issues. As you’ll notice from the picture above, my setup is quite dense. Aside from the homelab gear, I’ve got to deliver power and connectivity to the devices that run my home network. The switches I had initially planned on using to deliver power couldn’t handle five Raspberry Pi’s at once, which meant the Pi’s had to be plugged into some of the precious few ports available on my main switch. This led me down the path of trying to figure out how to get more powered Ethernet ports for my setup.
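I haven’t said whether the limit I hit was port count or total wattage. If it is wattage, the check is simple arithmetic: a PoE switch has a total power budget, and each powered port reserves part of it. Here is a rough sketch of that check. The per-port ceilings are the standard 802.3af (15.4 W) and 802.3at (30 W) figures, but the switch budgets and the per-device reservation are hypothetical numbers chosen for illustration, not measurements from my gear.

```python
# Maximum power a switch port may source under each IEEE PoE standard, in watts.
PORT_LIMIT_W = {"802.3af": 15.4, "802.3at": 30.0}


def poe_headroom_w(switch_budget_w, per_device_w, device_count, standard="802.3af"):
    """Return the watts left in the switch's budget (negative means over budget)."""
    if per_device_w > PORT_LIMIT_W[standard]:
        raise ValueError(
            f"{per_device_w} W exceeds the {standard} per-port limit "
            f"of {PORT_LIMIT_W[standard]} W"
        )
    return switch_budget_w - per_device_w * device_count


# Hypothetical budgets: a small 60 W switch versus a 120 W one,
# each reserving the full 802.3af allowance for five devices.
small_switch = poe_headroom_w(switch_budget_w=60.0, per_device_w=15.4, device_count=5)
bigger_switch = poe_headroom_w(switch_budget_w=120.0, per_device_w=15.4, device_count=5)

print("small switch over budget:", small_switch < 0)
print("bigger switch over budget:", bigger_switch < 0)
```

Real devices often draw less than the full per-port allowance, so a spec-sheet check like this is conservative. It is still a quick way to rule a switch in or out before buying it.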

At first, the solution seemed simple: get a new switch from my preferred manufacturer. Unfortunately, some quick investigation revealed I’d need to spend a lot more money than I wanted to in order to make that work (roughly $350-700). Then I hit paydirt: a switch from a different vendor that checked most of my boxes for $85 on Amazon. Score!

Okay, I know what you’re thinking, and truthfully I was prepared for a lot of differences between an $85 switch and a $350 one. I wasn’t, however, prepared for the biggest problem I’ve run into. One afternoon recently, I was running some cable in my closet to add a new component and suddenly heard a very loud “CLICK.” Everything had lost power and was in the process of rebooting. Strange... I didn’t think I had bumped the switch on the power strip, but maybe I did. After verifying the main power switch was in the correct position, I moved things around so I couldn’t hit the switch by accident, and continued running my cable. “CLICK.” Okay, so it’s not the power strip.

So, what happened?

When you’re looking at saving money in the physical world, there are lots of places a manufacturer can make mistakes in their process or cut corners to get their price down. In this case, it looks like the folks who built this switch didn’t get the sizing quite correct on the power cable that connects it to the wall. So, anytime a light wind (or an Ethernet cable) brushes past the power connection it causes everything to reset.

After a bit of thinking on what to do here, I’ve decided that this is one of those problems that you don’t necessarily have to solve, but you can mitigate. Oftentimes things in the real world with physical hardware just don’t line up how they do in a specification. Sometimes you return the broken equipment and get something new, but other times you decide it’s a better use of your time to mitigate the problem and move on. That’s what I’ve done here. The cable has been rerouted to minimize the number of things it can come in contact with, which should in turn make unexpected restarts from it getting shaken less common.

More Breakage, More Learning

These are just a select few of the shenanigans that I’ve gotten into working on my homelab setup. If you see me at a conference and want to hear more, I’d be thrilled to share the time that I caused every piece of networking equipment in my house to spin its fans up to maximum and then reboot in sequence. Or the time I had to figure out the exact format of a floppy disk for a legacy remote management system to work. The list goes on.

If you’ve got access to a professional development budget, I cannot recommend enough getting curious, buying some hardware, and trying to build something. And if you don’t, now is probably a good time to mention that we’re hiring.

Censys Careers

About the Author

Matt Farmer
Censys Principal Site Reliability Engineer
Matt is a software engineer focused on high-performance backend applications. With experience across consulting, startups, and larger companies, he’s known for executing technical roadmaps and mentoring engineers. A dedicated family guy, he loves spending time with his kids when he’s not exploring new technologies.
