Defaulting on my technical debt
March 12, 2023
Careful, this is a long one!
A week or two ago, I had to take some mandatory downtime for this website and a few of my clients' websites. I think the reasons behind it could have been avoided, but to be honest, I did not want to invest a huge amount of effort.
If you have seen my release notes and the code I maintain, you will notice a lot of Terraform modules. I moved towards Terraform in my more cloud-heavy personal world to get a bit of consistency. I was annoyed that sets of resources trying to do the same thing (think web hosting resources) were all set up differently. That meant I always had to stop before making a change and ask whether I was missing something. I started creating modules that I could consume to provision a full stack of the resources needed for a specific purpose. Once I had those, I could have a standard footprint everywhere.
The problems begin
My issues started well before the downtime, with how I wrote the modules. I effectively created a dependency hell with my networking module. To build a CloudFront distribution, you typically need an S3 bucket as its origin. If you're being more granular, you also need identity resources so that CloudFront can access the S3 bucket. Nowadays that's Origin Access Control, but at the time Origin Access Identity was the only option. So I thought it would be great if all the dependencies were included in one high-level module.
Everything fell apart quite quickly as work continued on the child modules or as bugs were discovered. If something went wrong in the storage module, I had to fix it and cut a new release tag. Then I had to open the networking module, bump the internal dependency and release the networking module as well. It got quite tiresome, but I did not know any better. That is, until I read about dependency injection. I came across it in the context of Java Spring, but I figured I could do something similar, where one module simply accepts an object produced by another as an input. Done in such a way, too, that I did not have to use my own module if I really did not want to.
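To make the difference concrete, here is a rough analogy in Python rather than Terraform. It is a minimal sketch, not my actual modules: `Bucket`, `storage_module`, `networking_module_coupled` and `networking_module_injected` are hypothetical stand-ins, and the domain name format is a placeholder. The coupled version builds its own bucket, so every storage fix forces a networking release; the injected version just accepts whatever bucket it is handed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical stand-in for the S3 bucket a storage module produces."""
    name: str
    domain_name: str


def storage_module(name: str) -> Bucket:
    """Hypothetical storage module: creates a bucket and returns it."""
    # Placeholder domain format, for illustration only.
    return Bucket(name=name, domain_name=f"{name}.example-storage.test")


def networking_module_coupled(site: str) -> dict:
    """Coupled style: the networking module creates its own bucket.

    Any fix to storage_module means this module must be re-released too,
    because the dependency is baked in.
    """
    bucket = storage_module(f"{site}-origin")
    return {"site": site, "origin": bucket.domain_name}


def networking_module_injected(site: str, origin_bucket: Bucket) -> dict:
    """Injected style: the caller hands in the bucket.

    The networking module only needs something shaped like a Bucket, so the
    storage module can change (or be swapped out) independently.
    """
    return {"site": site, "origin": origin_bucket.domain_name}


if __name__ == "__main__":
    # Using my own (hypothetical) storage module...
    own = networking_module_injected("blog", storage_module("blog-origin"))
    # ...or a bucket created some other way entirely.
    other = networking_module_injected(
        "legacy", Bucket(name="legacy", domain_name="legacy.example-storage.test")
    )
    print(own["origin"])
    print(other["origin"])
```

In Terraform terms, the injected version roughly corresponds to a module taking the bucket's attributes as input variables instead of declaring its own nested module block for storage.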
This was a breath of fresh air to me. A change was on the horizon when AWS released Origin Access Control (OAC) as the successor to Origin Access Identity (OAI). While there is no end-of-life date for OAIs yet, I wanted to use the opportunity to make the switch. This is where my issues began, as the reaper that is technical debt came knocking on my door, demanding my loan be paid in full, with interest attached.
Analysis
I attribute a lot of my issues to three factors: Terraform state, juggling too many things, and AWS constraints. Let's cover Terraform state and the juggling first since, ironically, they are tightly coupled. When I originally started this work, I was using GitLab for my source control, CI/CD and Terraform management. GitLab lets you publish modules and can manage your state for you. At this point, I was approaching the storage ceiling of GitLab's free tier. I also found that certain Terraform state files appeared to have grown large enough to exceed the memory of my humble 2GB self-hosted runners. Faced with having to increase the instance sizes and potentially needing a premium GitLab license, I decided to swallow my pride once more and migrate things back to GitHub.
Getting over the line
Exporting state from GitLab proved troublesome, primarily due to my own lack of knowledge, but also because I had to jump through some hoops. I could not create a Group Personal Access Token (a Group is GitLab's term for what GitHub calls an organisation) to more easily pull the state down locally. Transitioning things to GitHub Actions also took some time, but deciding not to self-host the runners again was the wise choice here. I was expecting to spend almost $1000 annually if I stuck with my original GitLab config. Whatever one's opinions of Microsoft, $40 a year for GitHub Pro is far more appealing.
Migrating over 120 repos and building new systems, when I had originally set out to improve my Terraform setup, was, shall we say, taxing. I have other projects too, and spreading myself too thin was my downfall; I think it is also one of the reasons why I was on this little break from self hosting. Trying to do too much on top of everything else in my personal and work life was very dumb. In the end, I did develop GitHub Actions pipelines and, combined with HashiCorp Cloud, I at the very least had things moved away from GitLab. Now it was on to solving the issues I had with existing state and some AWS constraints.
Handling Terraform state can be a pain. Once I had exported my state from GitLab and imported it into HashiCorp Cloud, I encountered a new issue. Since I was in the middle of a transition from internal dependencies to dependency injection, all existing Terraform-managed resources were still defined by code that used those internal dependencies. So once the Terraform state was migrated, I was stuck with errors saying Terraform could not access the module versions on my GitLab account, even though the top-level module was now on HCP. Creating a fresh set of resources with the new code was not possible due to AWS constraints on CloudFront distributions and origins. Temporarily moving to a totally different provider was also impossible, because I could not CNAME to it: Route 53 obeys the DNS-related RFCs hard and fast. In the end, the fastest and most straightforward solution was to delete everything outright.
Conclusion
So in the end, I informed my clients of the impending downtime, destroyed the Terraform resources, rebuilt them all with the new dependency injection approach and moved on with my life. I don't think it was a huge amount of downtime, maybe 15 to 20 minutes. Since these websites have run flawlessly for years, I would like to think my clients have not lost any confidence in me. It was just frustrating that it had to happen at all. Still, it was an amazing learning experience, and for that I am grateful. My cost-saving measures have extended beyond just GitLab, and I am hoping to have an extremely small cloud bill going forward. I look forward to sharing more about what has changed on that front in the near future.
Thank you!
You could have consumed content on any website, but you went ahead and consumed mine, so I'm very grateful! If you liked this, then you might like this other piece of content I worked on.
Another time I broke something
Photographer
I've no real claim to fame when it comes to good photos, which is why the header photo for this post was shot by Marc-Olivier Jodoin. You can find more of their photos on Unsplash. Unsplash is a great place to source photos for your website, presentation and more! But it wouldn't be anything without the photographers who put in the work.
Support what I do
I write for the love and passion I have for technology. Just reading and sharing my articles is more than enough. But if you want to offer more direct support, then you can support the running costs of my website by donating via Stripe. Only do so if you feel I have truly delivered value, but as I said, your readership is more than enough already. Thank you :)