What containers do and don't help you with

November 23, 2020

In a comment on my entry on when to use upstream versions of software, Albert suggested that containers can be used to solve the problems of using upstream versions and when you have to do this anyway:

A lot of those issues become non-issues if you run the apps in containers (for example Grafana).

Unfortunately this is not the case, because of what containers do and don't help you with.

What containers do is isolate the host and the container from each other and make the connection between them simple, legible, and generic. The practical Unix API is very big and allows software to become quite entangled in the operating system and therefore dependent on specific things in unclear ways. Containers turn this into a narrow interface between the software and the host OS and make it explicit (a container has to say clearly at least part of what it wants from the host, such as what ports it wants connected). Containers have also created a social agreement that if you violate the container API, what happens next is your own fault. For example, there is usually nothing stopping you from trying to store persistent data within your theoretically ephemeral container, but if you do it and your container is restarted and you lose all the data, you get blamed, not the host operators.
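As a rough illustration of how narrow and explicit that interface is, here is a minimal sketch using the Docker SDK for Python (docker-py); the image tag, host port, and host path are made-up examples, not recommendations. Everything the host needs to know about the container fits in a few declarations, and only the declared volume survives a restart.

    import docker

    client = docker.from_env()

    # The container's entire contract with the host: which image to run,
    # which port to connect, and which path should persist.
    container = client.containers.run(
        "grafana/grafana:7.3.4",        # illustrative pinned tag
        name="grafana",
        detach=True,
        ports={"3000/tcp": 3000},       # the only port the host exposes
        volumes={
            # Only this path is persistent; anything written elsewhere in
            # the container is lost when it is recreated. That is the
            # social agreement mentioned above.
            "/srv/grafana": {"bind": "/var/lib/grafana", "mode": "rw"},
        },
        restart_policy={"Name": "unless-stopped"},
    )
    print(container.short_id)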

However, containers do not isolate software from itself and from its own flaws and issues. When you put software in a container, you still have to worry about choosing and building the right version of the software, keeping it secure and bug-free, and whether or not to update it (and when). Putting Exim 4.93 in a container doesn't make it any better to use than if you didn't have it in a container. Putting Grafana or Prometheus Pushgateway in a container doesn't make it any easier to manage their upgrades, at least by itself. It can be that the difficulties of doing some things in a container setup drive you to solve problems in a different way, but putting software in a container doesn't generally give it any new features, so you could always have solved your problems in those different ways. Containers just gave you a push to change your other practices (or forced you to).
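To put the version question concretely, here is another hedged sketch with the Docker SDK for Python (the image name and tags are again just examples): whether you pin an exact tag or follow a moving one, the choice of what to run, and when to move off it, is still yours.

    import docker

    client = docker.from_env()

    # Option 1: pin an exact version. Reproducible, but now you are the
    # one tracking upstream releases and deciding when to move.
    pinned = client.images.pull("grafana/grafana", tag="7.3.4")

    # Option 2: follow a moving tag. You have delegated the decision to
    # whoever curates the image, and you inherit whatever they ship next.
    latest = client.images.pull("grafana/grafana", tag="latest")

    print(pinned.tags, latest.tags)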

Containers do make it easier to deal with software in one respect, which is that they make it easier to select and change where you get software from. If someone, somewhere, is doing a good job of curating the software, you can probably take advantage of their work. Of course this is just adding a level of indirection; instead of figuring out what version of the software you want to use (and then keeping track of it), you have to figure out which curator you want to follow and keep up with whether they're doing a good job. The more curators and sources you use, the more work this will be.

(Containers also make it easier and less obvious to neglect or outright abandon software while still leaving it running. Partly this is because containers are deliberately opaque to limit the API and to create isolation. This does not magically cure the problems of doing so, it just sweeps them under the rug until things really explode.)


Comments on this page:

By rwoodsmall at 2020-11-24 01:39:08:

Putting ... a container doesn't make it any easier to manage their upgrades, at least by itself

Yes, it does. Once the work of containerizing the application is done, maintenance becomes trivial. Even if building from upstream sources, the process can be encoded, pipelined, and maintained as code in version control. Things that seem to matter to the OS you're using - kernel, libc, system compiler, package manager, etc. - become non-issues, as long as the substrate to run containers has a new enough kernel and container management system. Containerized OS management itself becomes a matter of updating the upstream image and fixing deps. Since you're already working as code, this can be automated in a sane way to ensure your deps are up-to-date.

Furthermore, separating build and locked-down, minimal-requirement deployment container images limits blast radius on compromised deployments; it's hard to break out of a container that's not running as root if there's no real shell or userspace. Testing becomes easier - things like port selection and user management are abstractable inside containers in a way that dumping an install on an operating system isn't. Things like load balancing multiple instances of applications become easier, or even possible without a whole fleet of servers, and the load balancer itself can be containerized.

I think you have a dim view of containers. They are a very different way of thinking about developing, building and deploying software. We - developers, operations, devops, sysadmins, etc. - must own the stacks we deploy and/or provide for others, the whole way down. Containers make it easier to own artifacts and processes holistically. Not something I'd shrug off at this point.

By Ruben Greg at 2020-11-24 06:11:02:

I am not an expert in Exim or typical Docker-type containers. But perhaps you need to look at LXD system containers - with persistent data, and snap-upstream (OK - it has other issues). Not specifically your issue, but in the academic world, I found that people like to run whatever version on whatever distro the scientist-turned-developer uses. For these, system containers are amazing.

By Albert at 2020-11-24 10:09:34:

You specifically mentioned the case where there's no package for your OS provided by the distro. Docker solves that, even more so if there are official Docker images for the software (most often there are, at least for Grafana, Prometheus and rspamd, which you mentioned).

Outdated packages? You can put them in a container and keep using them until the end of time (within reason), without having to worry about whether updating the host OS or libraries will break stuff.

Packages that exist for your OS but the distro-provided version is too old compared to upstream? Same thing, use the latest and greatest in a container, if you so wish, and forget about the host OS.

Plus everything rwoodsmall already said.

By cks at 2020-11-24 10:41:25:

As people have said here, containers do have real advantages that are enabled because of how they isolate software from the host OS and make what it provides and requires legible (eg, what ports it maps). But those advantages are not in the realm of curating software and they don't solve the curation problems for you.

My view is that the mechanical process of 'building' and 'deploying' software is the easiest part. The hard part is curating what you are building and deploying, tracking when you need to update, and handling the changes required by the new version of the software. Containers do not help you update from Exim 4.92 to a later Exim version; either way, you need significant configuration file changes. Containers don't help you deal with changes in Grafana configuration and operation from Grafana version to Grafana version (or any database format problems you may have if you jump over a significant version range), nor do they migrate your Prometheus Pushgateway persistent storage if you jump from v0.9.0 to v1.0.0 (you have to know to run v0.10.0 for a bit first). And containers definitely don't help you if Exim 4.92 has a security issue and you have to upgrade, one way or another.
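(To make the Pushgateway example concrete, the stepped upgrade might look something like this sketch with the Docker SDK for Python; the host path, container name, and image tags are illustrative. The point is that the container machinery runs whatever you tell it to, and knowing that v0.10.0 has to run first is still on you.)

    import docker

    client = docker.from_env()
    volumes = {"/srv/pushgateway": {"bind": "/data", "mode": "rw"}}
    cmd = ["--persistence.file=/data/pushgateway.data"]

    # Step 1: run v0.10.0 long enough for it to pick up the old v0.9.0
    # state and write it back in the new format.
    step = client.containers.run("prom/pushgateway:v0.10.0", cmd,
                                 name="pushgw", detach=True, volumes=volumes)
    # ... wait and verify, then retire the intermediate version ...
    step.stop()
    step.remove()

    # Step 2: only now is it safe to jump to v1.0.0 on the same data.
    client.containers.run("prom/pushgateway:v1.0.0", cmd,
                          name="pushgw", detach=True, volumes=volumes)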

(Containers may limit the damage of having a security issue, but if Exim itself is compromised the blast radius is 'all of the email it processes and TLS keys it has access to', which is bad all on its own.)

Containers by themselves don't enable CI and testing on software packages not already built for it, although I expect they make it easier because the installed application is more contained and can be provided as a black box. You could build automated tests for your containerized Exim configuration, but you could also build them for an un-containerized one.

Containers do make it easier to keep running ancient versions of applications but my view is that this is merely postponing the pain and making it worse. Sooner or later you are going to have to upgrade (or abandon the software entirely and rebuild from scratch). Zombie applications aren't doing you any more favours than zombie OS versions are.

(Containers also make what you're running and what security issues it may have less legible, since they're black boxes. See all of the various startups and tools that are out there for scanning containers to find security vulnerabilities, track deployed software versions, and so on.)

Containers would mean though that the decision about when to move to that troublesome Exim version can be made entirely independently of when your OS's support windows start and stop.

By cks at 2020-11-24 16:44:00:

This is true in theory but may not be true in practice, because often the OS's support window is (much) longer than the upstream project's. Ubuntu will be supporting or at least patching security issues in Exim 4.92 for much longer than the Exim project (which has already stopped doing so). Containers do change which OS you care about, though; you now care about the base OS source for the container instead of the host OS.

Containers do have the advantage of letting you move between OSes in two ways. First, you can make an up front choice of what OS base to use for each container given your knowledge of their support period and quality of implementation, without being locked to your host OS. Second, you can in theory move between container OSes but keep the same software version by rebuilding your container on a different base OS if that base OS does better at supporting your version (or will support it for longer).

By rwoodsmall at 2020-11-25 02:37:27:

None of the anecdotes provided negate anything I - or others - have said.

Your config(s), both build and runtime, need to be in version control too, with proper tagging to track versions/dates/etc. All the better if using a configuration management system. None of this is theoretical; I run software like this - production, qa, dev, test - a lot of folks do. The concept of curation is one of the reasons application containers exist: decoupling the application/service from the machine by abstracting the hard parts (patching, runtime data, etc.) is extremely powerful. If you're doing this by hand, that's valuable time that can be pulled back by automating and changing the unit of compute.

From the operating system container side, someone else mentioned lxd. I have a handful of CentOS 6 instances that I can't decom - a couple of which have already been converted to lxc instances running on newer CentOS machines, no fuss, and will be sequestered from the internet at large. There is software that may be zombied by the vendor, but someone somewhere is still paying for it and can't afford to lose it or change their process.

I'm not arguing for you to change everything about your workflow. But there are quality of life improvements that are hard to come by without containerizing. From an academic view, this is cool and will help with bedrock understanding of the way the operating system works. Running an entirely separate, constrained distribution as part of a shell pipeline or using a CI for building/testing software in multiple versions of multiple distributions on multiple architectures (via qemu-system) is mind-expanding. At the same time, there's a fence around complexity that VMs and bare metal intrinsically bring to an environment. Again, not something I'd shrug off at this point.
