A Pyrrhic Victory

January 30, 2025

“a victory that is not worth winning because the winner has lost so much in winning it”

It must be said that emotions on a self-hosting journey follow a sine wave: “it’s so over” for the dips, “we are so back” for the highs. I’ve had several problems that I was able to solve (the high), and then, just when everything felt done, encountered new ones that derailed things entirely (the low). The title of this blog post is, I think, the perfect way to describe how my weekend went trying to fix my local cluster once and for all.

So, on the 20th, all the kit that I needed for my Raspberry Pis arrived. I now had my fifth Pi, along with five SSDs and USB-to-SATA adapters. I spent that evening re-flashing the Pis to boot from USB, then imaging them with Ubuntu 24.04 LTS. That process was overall quite lovely; the Raspberry Pi Imager is nice software and lets me lay down defaults like my username and SSH key up front. I got the OS installed and static IPs assigned locally, so all the Pis were ready for eventual onboarding into my kube cluster.
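For anyone curious, static addressing on Ubuntu 24.04 goes through netplan. Here's a minimal sketch of the kind of file I mean; the interface name, addresses, gateway and DNS are all placeholders, not my actual network layout:

```yaml
# /etc/netplan/50-static.yaml -- illustrative only; interface name,
# address, gateway, and DNS are placeholder values
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false
      addresses:
        - 192.168.1.21/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [192.168.1.1]
```

A `sudo netplan apply` then picks the change up without a reboot.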

That was about all I managed during the week. With a giant storm on the way, which did knock out our power and internet for half of Friday, I was not inclined to try to make changes. But I had grand ambitions for my weekend. The aim was to get the cluster fully swapped over to the Pis for all non-worker duties. I effectively wanted my cluster to be done and dusted; I wanted this chapter finally closed. I dug in early on Saturday and began the work. Imposter syndrome is something many suffer from, myself included at times. Especially on the self-hosting front, it’s like: how can I struggle so much with something like this, considering what I do in my day job?

On Saturday though, I had that moment of transcending that feeling, where there were no real gotchas, things just worked and I felt like a god. I slowly onboarded each Pi into the cluster as a control plane node, with each one just coming online and working. Getting everything into the cluster was as smooth as I could have hoped. I even revisited Traefik and got it working as an L4 load balancer for my cluster extremely quickly, with minimal troubleshooting. It felt amazing. I then began the process of offboarding the NUCs from being “do absolutely everything” machines to just being workers. This was where I hit what I thought was a minor snag, but by then it was about lunchtime and I had been working since effectively 7 or 8 in the morning. So I elected to stop for the day, do some other things, and enjoy the wins I had without tainting the day with troubleshooting. It was a good call. Unfortunately, it was not clear at the time that these issues would be the start of a downfall.
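For context, joining an extra control plane node to an existing RKE2 cluster mostly comes down to one config file before starting the service. A rough sketch of what that looks like; the registration address and token here are placeholders, not my real values:

```yaml
# /etc/rancher/rke2/config.yaml on an additional server (control plane) node.
# Server URL and token are placeholders for illustration.
server: https://cluster.home.example:9345   # fixed registration address, e.g. the L4 LB
token: <shared-cluster-token>               # from /var/lib/rancher/rke2/server/node-token
tls-san:
  - cluster.home.example                    # so the LB name is in the serving certificate
```

With that in place, installing RKE2 and enabling `rke2-server.service` is roughly all it takes for the node to register and come online.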

A Pyrrhic victory is essentially a victory where the victor has won, but at such great cost it may as well be a defeat. I think it is the perfect way to describe how Sunday went. Revisiting the issue I had encountered the day prior, I realised that, quite simply, I was still using the RKE2 server service where I should have been using the agent service. So I set about taking one NUC at a time out of the cluster, wiping it, and then rejoining it as an agent. This process worked just fine until I had all three machines as workers. I felt quite good; this was it, the finishing line. Rather critical issues, however, were lurking right around the corner.
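The conversion itself is straightforward, at least on paper. Roughly what I mean by "wipe and rejoin", again with a placeholder server URL and token:

```sh
# On each NUC: tear down the old server install, then rejoin as an agent.
# Server URL and token are placeholders; the uninstall script path may vary.
/usr/local/bin/rke2-uninstall.sh                  # removes RKE2 and its local state

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -

mkdir -p /etc/rancher/rke2
cat <<'EOF' > /etc/rancher/rke2/config.yaml
server: https://cluster.home.example:9345
token: <shared-cluster-token>
EOF

systemctl enable --now rke2-agent.service
```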

Pods were failing to start on one of the Nodes due to no CIDR being available for the Pods, which is a rather unexpected error. At first I thought it was a red herring, so I started killing the Pods so that they would reschedule to a different, seemingly working Node. Pods that were deleted simply did not come back. Not only that, they were not even showing up in a Pending state; there was just nothing being reported on the cluster whatsoever. No events, no reaction to workloads changing. I even drained and removed a Node, and DaemonSets still desired three Pods for a cluster with only two worker Nodes. What I eventually found was that, for some reason, the worker Nodes were failing to initialise their CNI plugin, with no real idea as to why a component that just uses the defaults provided by RKE2 was not showing up.
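For the curious, these are the kinds of checks that led me there. The node name is a placeholder, and Canal is RKE2's default CNI:

```sh
# Is the node Ready, and does it even have a Pod CIDR allocated?
kubectl get nodes -o wide
kubectl get node nuc-01 -o jsonpath='{.spec.podCIDR}{"\n"}'

# The NotReady condition usually names the CNI problem directly
kubectl describe node nuc-01 | grep -i -A3 'ready'

# Are the CNI pods themselves healthy, and is the cluster reacting at all?
kubectl get pods -n kube-system | grep -i canal
kubectl get events -A --sort-by=.lastTimestamp | tail -n 20
```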

Research online did not yield many answers. I tried reverting from the agent binary to the server binary with all the roles disabled, to see if that was the issue, since that was really the only change. No dice. I wondered whether some traffic was perhaps being dropped by the load balancer due to a port not being open. But machines running RKE2 only use the load balancer endpoint to register initially; after that, they discover the other machines on the network and don’t route through the load balancer again. API server requests were also working just fine. I read the release notes for my version of RKE2, since I knew I was a bit behind, to see if there was perhaps some one-time bug in my version that was this exact issue. That was also a dead end. The closest thing I could find was that the CNI sometimes fails due to server nodes not having enough resources. The recommended system specs ask for 8GB of RAM for a server node, and my Pis only have 4GB, which is in fact the stated minimum for RKE2. But I did not see Pods stuck Pending, and some services seemingly worked fine for a time. Given the state of things, the only path forward seems to be another cluster wipe and rebuild.
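For reference, the “server with all the roles disabled” experiment looks roughly like this. RKE2 does expose per-component disable flags for splitting up control plane roles, though whether this is a sane way to run a plain worker was exactly what I was testing; server URL and token remain placeholders:

```yaml
# /etc/rancher/rke2/config.yaml -- a server node with every control plane
# role switched off, leaving it as (effectively) a worker. Placeholder values.
server: https://cluster.home.example:9345
token: <shared-cluster-token>
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
disable-etcd: true
```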

The feelings at this point, suffice to say, were not great. I truly felt that I was due a break, that I would finally just have a working cluster, exhale, and leave this journey concluded. That is not the case, and I am happy to admit that I have felt moments of wanting to give up on all of it. There are far simpler ways of doing things locally; one could even go further and simply give up on self hosting entirely. I note that when we experienced yet another power cut at the start of the week, I did not resurrect several services for a few days, whereas I’m normally quite prompt at doing so. But I think there are some critical things that keep me going with this.

  1. I believe I truly enjoy puzzle solving, and this kind of thing is the ultimate endeavour of puzzle solving. The things that I learn during this, both technically and about myself, are always worth it. Even if it can be incredibly frustrating.
  2. There are so many possibilities around self hosting, and I truly believe that being in control of my own services, which run locally, are private by default, and are additive to my life, is a goal worth pursuing.
  3. A healthy dose of stubbornness and possibly also masochism.

So, I may be down, but I am certainly not out! I will take the wins I experienced, cherish them, and I shall go back to the drawing board and figure out a path forward that will lead me to the Kubernetes based infrastructure that I desire. But certainly a small break first to ensure I can have a bit of a reset before going back into the thick of it!

Thank you!

You could have consumed content on any website, but you went ahead and consumed my content, so I'm very grateful! If you liked this, then you might like this other piece of content I worked on.

The previous post in this mini series

Photographer

I've no real claim to fame when it comes to good photos, which is why the header photo for this post was shot by Luis Villasmil. You can find more photos from them on Unsplash. Unsplash is a great place to source photos for your website, presentations and more! But it wouldn't be anything without the photographers who put in the work.

Find Them On Unsplash

Support what I do

I write for the love and passion I have for technology. Just reading and sharing my articles is more than enough. But if you want to offer more direct support, then you can support the running costs of my website by donating via Stripe. Only do so if you feel I have truly delivered value, but as I said, your readership is more than enough already. Thank you :)

Support My Work
