Thoughts on (not) automating the setup of our first cloud server

May 16, 2024

I recently set up our first cloud server, in a flailing way that's probably familiar to anyone who still remembers their first cloud VM (complete with a later discovery of cloud provider 'upsell'). The background for this cloud server is that we want to check external reachability of some of our systems, in addition to the internal reachability already checked by our metrics and monitoring system. The actual implementation of this is quite simple; the cloud server runs an instance of the Prometheus Blackbox agent for service checks, and our Prometheus server performs a subset of our Blackbox service checks through it (in addition to the full set of service checks that are done through our local Blackbox instance).

(Access to the cloud server's Blackbox instance is guarded with firewall rules, because giving access to Blackbox is somewhat risky.)
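To make the shape of such a check concrete, here is a minimal sketch of what asking a Blackbox instance to run one probe looks like from a client's point of view. It is not our actual setup, where Prometheus does the scraping. It assumes Blackbox's standard `/probe` HTTP endpoint with `module` and `target` query parameters, and the `probe_success` metric that Blackbox returns. The function names are made up for illustration, and the port (9115) and module name (`http_2xx`) mentioned afterward are only the usual defaults, not necessarily what we use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def probe_url(blackbox_base: str, module: str, target: str) -> str:
    """Build the /probe URL that asks a Blackbox instance to check one target."""
    query = urlencode({"module": module, "target": target})
    return f"{blackbox_base.rstrip('/')}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Return True if the metrics text contains 'probe_success 1'."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        if name == "probe_success":
            return float(value) == 1.0
    # No probe_success metric at all is treated as a failure.
    return False


def check(blackbox_base: str, module: str, target: str, timeout: float = 15.0) -> bool:
    """Run one check through the given Blackbox instance (needs network access)."""
    with urlopen(probe_url(blackbox_base, module, target), timeout=timeout) as resp:
        return probe_succeeded(resp.read().decode("utf-8"))
```

Calling `check()` with the same module and target against both a local base URL (say `http://localhost:9115`) and the cloud server's Blackbox address, and then comparing the two results, is the manual version of the internal versus external comparison.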

The proper modern way to set up cloud servers is with some automated provisioning system, so that you wind up with 'cattle' instead of 'pets' (partly because every so often the cloud provider is going to abruptly terminate your server and maybe lose its data). We don't use such an automation system for our existing physical servers, so I opted not to try to learn both a cloud provider's way of doing things and a cloud server automation system at the same time, and set up this cloud server by hand. The good news for us is that the actual setup process for this server is quite simple, since it does so little and reuses our existing Blackbox setup from our main Prometheus server (all of which is stored in our central collection of configuration files and other stuff).

(As a result, the install process for this cloud server is fairly similar to our build instructions for other machines. However, since it lives in the cloud and is completely detached from our infrastructure, it doesn't have our standard local setup and customizations.)

In a way this is also the bad news. If this server and its operating environment were more complicated to set up, we would have more motivation to pick one of the cloud server automation systems, learn it, and build our cloud server's configuration in it so we could have, for example, a command line 'rebuild this machine and tell me its new IP' script that we could run as needed. Since rebuilding the machine as needed is so simple and fast, it's probably never going to motivate us into learning a cloud server automation system (at least not by itself; if we had a whole collection of simple cloud VMs we might feel differently, but that's unlikely for various reasons).

Although setting up a new instance of this cloud server is simple enough, it's also not trivial. Doing it by hand means dealing with the cloud vendor's website and going through a bunch of clicking on things to set various settings and options we need. If we had a cloud automation system we knew and already had all set up, it would be better to use it. If we're going to do much more with cloud stuff, I suspect we'll soon want to automate things, both to make us less annoyed at working through websites and to keep everything consistent and visible.

(Also, cloud automation feels like something that I should be learning sooner or later, and now I have a cloud environment I can experiment with. Possibly my very first step should be exploring whatever basic command line tools exist for the particular cloud vendor we're using, since that would save dealing with the web interface in all its annoyance.)

Comments on this page:

By Daz at 2024-05-16 23:33:55:

Enjoy reading your blog and learning from you, thanks!

I strongly encourage you to install your cloud provider's CLI and start with that route for (re)creating your VM(s). The command(s) give you something to commit to git and are often functionally richer than the web-based tools.

I've generally stuck with Bash scripting CLI tools (several clouds) and avoided using Terraform and Pulumi and their ilk.

I think cloud deployment tools are great but you immediately have 2 problems: learning the cloud; and learning the cloud deployment tool (and hoping it implements the functionality you need) ;-)

It's also something of a red-herring that deployment tools enable cloud portability because the clouds are so different beneath the top-level concepts of VM, blob storage, database etc. that there's little reuse.

I ended up not using our cloud provider's overarching configuration tool, because it wants to set up siloed "applications" and we have a database shared between a few facets (front end, back office, CRM). It wasn't clear how to co-host those in a "one instance, one purpose" setup.

I first duplicated my manual setup in a shell script. These days, a stub runner goes in the launch data, which pulls stage 2 out of blob storage, and runs it. No more reliable network required on the client. The cloud provider has also added an "automation scripts" feature that runs from the cloud side, so we use that, but still not the full "as code" system.

We've also stuck to individual self-managed instances, so we can fully control our language/library versions, instead of "whatever shows up when the cloud provider has some spare cycles for packaging it."

By Ian Z aka nobrowser at 2024-05-17 18:26:29:

You should check about authentication requirements early on when you start exploring automation.

I only have experience with this on AWS, but I think this is typical: 2FA is required, and they won't be satisfied with storing secrets in the filesystem; you need either a hardware token (e.g. a YubiKey) or a biometric setup.

-- Ian

By cks at 2024-05-18 19:13:05:

I experimented with this cloud vendor's primary CLI tool, and for me it seemed to go through a basically standard OAuth 2 process that resulted in it saving (current) credentials in a magic file under my home directory (which I carefully found, because I wanted to know just where it was keeping this stuff, how well it was protected, and so on). Since it got its token, the CLI tool hasn't re-prompted me for authentication. However, I'm operating in a potentially unusual environment, because my cloud account as a whole is being authenticated through the university's central authentication system, not through the cloud vendor as an individual account. My impression is that very few places combine delegated primary authentication with additional separate MFA; if you want MFA, it's up to you to put it in the primary authentication.

Written on 16 May 2024.


Last modified: Thu May 16 22:52:33 2024