Configurations can quietly drift away from working over time, illustrated
At this point, we've been running various versions of Ubuntu LTS
for over ten years. While we reinstall individual systems when we
move from LTS version to LTS version, we almost never rebuild our
local customizations from scratch unless we're forced to; instead
we carry forward the customizations from the last LTS version, only
changing what seems to need it. This is true both for the configuration
of our systems and also for the configuration of things we build
on top of Ubuntu, such as our user-run web servers. However, one of the hazards of carrying
forward configurations for long enough is that they can silently
drift away from actually working or making sense. For example, you
can set (or try to set) Linux
sysctls that don't exist any more
and often nothing will complain loudly enough for you to notice.
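One way to catch this sort of drift is to cross-check a sysctl.conf against what actually exists under /proc/sys. This is a hypothetical sketch, not something we actually run; the 'exists' predicate is injectable so the logic can be exercised anywhere:

```python
# Sketch: report sysctl.conf settings that no longer exist on this kernel.
# A sysctl name like 'vm.swappiness' maps to the path /proc/sys/vm/swappiness.
import os

def missing_sysctls(conf_text, exists=None):
    """Return sysctl names from conf_text that have no /proc/sys entry."""
    if exists is None:
        exists = lambda name: os.path.exists(
            "/proc/sys/" + name.replace(".", "/"))
    missing = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that isn't 'name = value'.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if not exists(name):
            missing.append(name)
    return missing
```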
Today, I had an interesting illustration of how far this can go
without anything obvious breaking or anyone saying anything.
For our user-run web servers, we supply a set of configurations for Apache, PHP, and MySQL that works out of the box, so users with ordinary needs don't have to muck around with that stuff themselves. Although some people customize their setups (or run web servers other than Apache), most people just use the defaults. In order to make Ubuntu version to Ubuntu version upgrades relatively transparent, most of this configuration is central and maintained by us, instead of being copied to each user's Apache configuration area and so on. This has basically worked over all of the years and all of the Ubuntu LTS versions; generally the only version to version change people have had to do in their user-run web server is to run a magic MySQL database update process. Everything else is handled by us changing our central configurations.
(I'm quite thankful that both Apache and MySQL have 'include' directives in their configuration file formats. You may also detect that we know very little about operating MySQL.)
One of the things that we customize for user-run web servers is the
MySQL settings in PHP, because the stock settings are set up to try
to talk to the system MySQL and we don't run a system MySQL (especially
not one that people can interact with). We do this with a custom
php.ini, and that
php.ini is configured in the Apache configuration
in a little
.conf snippet. Here is the current one, faithfully
carried forward from no more recently than 2009 and currently running
on our Ubuntu 16.04 web server since the fall of 2016 or so:
<IfModule mod_php5.c>
    PHPIniDir conf/php.ini
</IfModule>
Perhaps you can see the problem.
Ubuntu 16.04 doesn't ship with PHP 5 any more; it only ships with
PHP 7. That makes the
IfModule directive here false, which means
that PHP is using its standard system Apache
php.ini. For that
matter, I'm not certain this directive was actually working for
Ubuntu 14.04's PHP 5 either.
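One way to avoid this particular trap would be to cover both module names in the snippet. This is an untested sketch; 'mod_php7.c' is my assumption for how the PHP 7 module identifies itself, and the real name may vary by PHP version and Ubuntu packaging:

```apache
# Sketch: cover both the PHP 5 and PHP 7 Apache modules.
# 'mod_php7.c' is an assumption; check the actual module source file
# name for your PHP version before relying on this.
<IfModule mod_php5.c>
    PHPIniDir conf/php.ini
</IfModule>
<IfModule mod_php7.c>
    PHPIniDir conf/php.ini
</IfModule>
```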
This means that for at least the past two years or so, people have been operating their user-run web servers without our PHP customizations that are supposed to let their PHP code automatically talk to their MySQL instances. I'm not sure that no one noticed anything but at the very least no one said anything to us about the situation, and I know that plenty of people have user-run web servers with database-driven stuff installed, such as WordPress. Apparently everyone who needed to was able to set various parameters so that they could talk to their MySQL anyway.
(This is probably not surprising, since 'configure your database settings' is likely a standard part of the install process for a lot of software. It does seem to be part of WordPress's setup, for example.)
On the one hand, that this slipped past us is a bit awkward (although understandable; it's not as if this makes PHP not load at all). On the other hand, it doesn't seem to have done any real harm and it means that we can apparently discard our entire php.ini customization scheme and make our lives simpler, since clearly it's not actually necessary in practice.
(I stumbled over this in the process of preparing our user-run webserver system for an upgrade to 18.04. How I noticed it actually involves another bit of quiet configuration drift, although that's a story for another entry.)
Our problem with (Amanda) backups of many files, especially incrementals
Our fileserver-based filesystems have a varying number of inodes in use on them, ranging from not very many (often on filesystems with not a lot of space used) to over 5.7 million. Generally our Amanda backups have no problems handling the filesystems with not too many inodes used, even when they're quite full, but the filesystems with a lot of inodes used seem to periodically give our backups a certain amount of heartburn. This seems to be especially likely if we're doing incremental backups instead of full ones.
(We have some filesystems with 450 GB of space used in only a few hundred inodes. The filesystems with millions of inodes used tend to have a space used to inodes used ratio from around 60 KB per inode up to 200 KB or so, so they're also generally quite full, but clearly being full by itself doesn't hurt us.)
Our Amanda backups use GNU Tar to actually read the filesystem and generate the backup stream. GNU Tar works through the filesystem and thus the general Unix POSIX filesystem interface, like most backup systems, and thus necessarily has some general challenges when dealing with a lot of files, especially during incremental backups.
When you work through the filesystem, you can only back up files
by opening them and you can only check if a file needs to be included
in an incremental backup by
stat()ing it to get its modification
time and change time. Both of these
activities require the Unix kernel and filesystem to have access
to the file's inode; if you have a filesystem with a lot of inodes,
this will generally mean reading it off the disk. On HDs, this is
reasonably likely to be a seek-limited activity, although fortunately
it clearly requires less than one seek per inode.
Reading files is broadly synchronous,
but in practice the kernel will start doing readahead for you almost
immediately. A stream of stat()s is equally synchronous, and then things
get a bit complicated. Stat() probably doesn't have any real readahead
most of the time (for ZFS there's some hand waving here because
in ZFS inodes are more or less stored in files), but you also get 'over-reading'
where more data than you immediately need is read into the kernel's
cache, so some number of inodes around the one you wanted will be
available in RAM without needing further disk fetches. Still, during
incremental backups of a filesystem with a lot of files where only
a few of them have changed, you're likely to spend a lot of time
stat()ing files that are unchanged, one after another, with only
a few switches to
read()ing files. On full backups, GNU Tar is
probably switching back and forth between stat()ing files and
read()ing their contents as it backs up each file in turn.
(On a pragmatic level it's clear that we have more problems with incrementals than with full backups.)
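The stat()-then-maybe-read pattern that an incremental backup has to follow can be sketched like this (a hypothetical illustration of the access pattern, not how GNU Tar is actually structured):

```python
# Sketch of the stat()-driven incremental scan pattern discussed above.
# Every file costs at least one synchronous stat(); only files changed
# since the cutoff time would also be opened and read.
import os
import time

def incremental_walk(top, since):
    """Yield paths under 'top' whose mtime or ctime is newer than 'since'."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)      # one synchronous stat() per file
            except OSError:
                continue                # file vanished mid-walk
            if max(st.st_mtime, st.st_ctime) > since:
                yield path              # this one would also be read
```

On a filesystem with millions of inodes and few changes, almost all of the time goes into the stat() calls in this loop, which matches the incremental-backup heartburn described above.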
I suspect that you could speed up this process somewhat by doing
stat()s in parallel (using multiple threads), but I doubt
that GNU Tar is ever going to do that. Traditionally you could also
often get a speedup by sorting things into order by inode number,
but this may or may not work on ZFS (and GNU Tar may already be
doing it). You might also get a benefit by reading in several tiny
files at once in parallel, but for big files you probably might as
well read them one at a time and enjoy the readahead.
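A sketch of the parallel-stat() idea (again hypothetical; as far as I know GNU Tar does nothing like this):

```python
# Sketch: issue stat()s from a thread pool so several inode fetches can
# be in flight at once on a seek-limited disk, instead of one at a time.
import os
from concurrent.futures import ThreadPoolExecutor

def stat_many(paths, workers=8):
    """Return {path: os.stat() result, or None on error} using threads."""
    def safe_stat(path):
        try:
            return os.stat(path)
        except OSError:
            return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(paths, pool.map(safe_stat, paths)))
```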
I'm hoping that all of this will be much less of a concern and a problem when we move from our current fileservers to our new ones, which have local SSDs and so are going to be much less affected by a seek-heavy workload (among other performance shifts). However, this is an assumption; we might find that there are bottlenecks in surprising places in the whole chain of software and hardware involved here.
(I have been tempted to take a ZFS copy of one of our problem filesystems, put it on a test new fileserver, and see how backing it up goes. But for various reasons I haven't gone through with that yet.)
PS: Now you know why I've recently been so interested in knowing where in a directory hierarchy there were a ton of files (cf).
It's worth testing that obvious things actually do work
We've reached the point in putting together our future ZFS on Linux NFS fileservers where we believe we have everything built and now we're testing it to make sure that it works and to do our best to verify that there are no hidden surprises. In addition to the expected barrage of NFS client load tests and so on, my co-worker decided to verify that NFS locks worked. I would not have bothered, because of course NFS locks work, they are a well solved problem, and it has been many years since NFS locks (on Linux or elsewhere) had any chance of not working. This goes to show that my co-worker is smarter than I am, because when he actually tried it (using a little lock testing program that I wrote years ago), well:
$ ./locktest plopfile
Press <RETURN> to try to get a flock shared lock on plopfile:
Trying to get lock...
flock lock failure: No locks available
With some digging we were able to determine that this was caused
by rpc.statd not being started on our (Linux) fileserver. We're using NFS v3,
which requires some extra daemons to handle aspects of the (separate)
locking protocol, and presumably NFSv3 is unfashionable enough these
days that systems no longer bother to start them by default.
(Perhaps I'm making excuses for Ubuntu 18.04 here.)
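A minimal version of such a lock tester might look like this (a sketch along the same lines, not my actual locktest program):

```python
# Sketch of a minimal flock tester in the spirit of 'locktest'.
# On an NFS v3 mount where the lock daemons aren't running, flock()
# fails with ENOLCK ("No locks available") instead of succeeding.
import fcntl
import sys

def try_flock(path):
    """Try a non-blocking shared flock on path; return error text or None."""
    f = open(path, "r")
    try:
        fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as e:
        return "flock lock failure: %s" % e.strerror
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()
    return None

if __name__ == "__main__":
    err = try_flock(sys.argv[1])
    print(err if err else "flock shared lock obtained")
```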
Had we taken the fileserver into production without discovering this, the good news is that important things like our mail system would probably have failed safe by refusing to proceed without locks. But we would certainly have had a fun debugging experience, and under more stress than we did in testing. So I'm very glad that my co-worker was carefully thorough here.
The obvious moral I take from this is that it's worth testing that the obvious things do work. The obvious things are probably not broken in general (otherwise you would hopefully have heard about it during system research and design), but there's always the possibility of setup or configuration mistakes, or that you have a sufficiently odd system that you're falling into a corner case. You may not want to test truly everything, but it's certainly worth testing important but obvious things, such as NFS locking.
(There's also the unpleasant possibility that you've wound up with some fundamental misunderstanding about how the system is designed to work. This is going to force some big changes, but it's better to find this out before you try to take your mistake into production, rather than afterward as things are exploding.)
How much and how thoroughly you test in general depends on your resources and the importance of what you're doing. Some places might find and run a test suite that verified that their new NFS fileservers were delivering full POSIX compatibility (or as much as you can on NFS in general), for example. Making a point of testing the obvious is only an issue if you're only going to do partial tests, and so you might otherwise be tempted to skip the 'it's so obvious it must work' bits in the interests of time.
You may also want to skip explicitly testing the obvious in favour of doing end to end tests that will depend on the obvious working. For example, we might set up an end to end test of mail delivery and (IMAP) mail reading, and if we had, that would almost certainly have discovered the locking issue. There are trade-offs involved in each level of testing, of course.
(The short version is that end to end testing can tell you that it works but it can't tell you why, and it can be dangerous to infer that why yourself. If you actually want a low level functionality test, do the test directly.)
Sidebar: The smoking gun symptom
The fileserver's kernel logs had a bunch of messages reporting:
lockd: cannot monitor <host>
This comes from kernel code that attempts to make an upcall to
rpc.statd, which led us to look at
ps to confirm that rpc.statd really
wasn't there before we went digging further.
It matters where (or when) your programs ask questions
The other day, I wrote about how we belatedly evolved our account creation script so that it could now just assume we wanted the defaults for most everything, and how this simple change had been a real quality of life improvement for us. This improvement isn't just because we interact with the script a lot less; it's also because we changed where we interact with it. Specifically, we now only interact with the script right at the start and all the way at the end; before, we had to periodically interact with the script all the way through its run.
The problem with periodic interactions is that they have the end result of slowing down the whole process a bunch, and often they make it feel draining and demanding. What happens in practice is that you start the process, have it run, get bored with waiting for it to ask you a question, look away to do something else, don't notice immediately that the process has paused with a question, go back to it, answer the question, get bored again, and repeat until the whole thing is over. If and when you wind up constantly looking over to check on the process or focusing on it while you wait for it to ask you something, it feels draining and demanding. You're not doing anything, but you have to pay attention and wait.
When you shift all of the questions and interaction to the start and the end, you wipe most of this away. You start the process, it immediately asks you a question or two, and then you can go away. When it finishes, you may have a final question or two to answer, but at that point it's actually done. You don't have to constantly pay it some amount of attention in order to keep it moving along; it becomes a fire and mostly forget thing. Maybe you look over every so often to see if it's finished yet, but you know that you're not really delaying it by not paying enough attention.
As a result of our experiences with this script (and similar ones that need to ask us questions or have us do things by hand), I've come to be strongly biased about where I want to put any interactions in my scripts. If I have to ask questions, I'm going to do my best to put them as early as possible. If I can't ask them right at the start, I'm at least going to ask them all at once, so there's only one pause and interruption, and once that's over I know I can basically ignore the script for a while.
(Our local scripts are not perfect here, and perhaps we should change one that asks its questions early but not right away. But that script does at least ask all its questions all at once.)
PS: You might wonder how you wind up with a bunch of questions scattered through your script. Through good intentions, basically. If you have a bunch of different operations to do and you have a tacit custom that you want to manually confirm operations, you can easily wind up with a pattern where people adding operations add a 'okay, should I do this/tell me what option to take' question right before they start the operation itself. Then you wind up with a stop-start script that keeps pausing to ask you questions.
The evolution of our account creation script
One of the things about system administration automation is that its evolution often follows the path of least resistance. This can leave you with interesting and peculiar historical remnants, and it can also create situations where it takes a relatively long time before a system does the obvious thing. As it happens, I have a story about this.
To go with our account request system, which handles people requesting new accounts and authorizing requested accounts, we have an actual script that we run to actually create Unix accounts. Until relatively recently that script asked you a bunch of questions, although they all had default answers that we'd accept essentially all of the time. The presence of these questions was both a historical remnant of the path that the script took and an illustration of how unquestioningly acclimatized we can all become to what we think of as 'normal'.
We have been running Unix systems and creating accounts on them for a very long time, and in particular we've been doing this since before the World Wide Web existed and was readily accessible. Back in the beginning of things, accounts were requested on printed forms; graduate students and suchlike filled out the form with the information, got their account sponsors to sign it, handed it to the system staff, and the system staff typed all of the information into a script that asked us questions like 'login?', 'name?', 'Unix group?', 'research group affiliation?', and so on.
At a certain point, the web became enough of a thing that having a CGI version of our paper account request form was an obvious thing to do. Not everyone was going to use the CGI form (or be able to), and anyway we already had the account creation script that knew all of the magic required to properly create an account around here, so we adapted the existing script to also work with the CGI. The CGI wrote out the submitted information into a file (basically as setting shell environment variables) and this file was then loaded into the account creation script as the default answers to many of the questions that had originally been fields on the printed form. If the submitted information was good, you could just hit Return through many of the questions. After you created the account, you then had to email some important information about it (especially the temporary password) off to the person it was for; you did this by hand, because you generated the random password by hand outside of the script.
(For reasons lost to history, the data file that the CGI wrote and the script loaded was a m4 file that was then processed through m4 to create shell variable assignments.)
When we wrote our account request system to replace the basic CGI (and the workflow around it, which involved manually emailing account sponsors to ask them about approving accounts), the simple and easy way for it to actually get accounts created was to carefully write the same data file that the CGI had used (m4isms and all). The account request script remained basically unchanged, and in particular it kept asking us to confirm all of the 'default' answers, ie all of the information that the account request system had already validated and generated. More than that, we added a few more bits of special handling for some accounts, with their own questions.
(Although the account request system was created in 2011, it took
until a 2016 major revision for a new version of Django for us to
switch from generating m4 data files to just directly generating
shell variable assignments that the script directly sourced.)
That we had to actually answer these questions and then write the 'you have a new account' email made the whole process of creating an account a tedious thing. You couldn't just start the script and go away for a while; you had to periodically interact with it, hitting Return, generating a password in another window and pasting it in to the password prompt, and composing email yourself. None of these things were actually necessary for the backend of the account request system, but they stayed for historical reasons (and because we needed them occasionally, because some accounts are created outside of the account request system). And we, the people who used the script, were so acclimatized to this situation that we didn't really think about it; in fact I built my own automation around writing the 'you have a new account' form email.
At this point I've forgotten what the exact trigger event was, but last year around this time, in the middle of creating a bunch of new graduate student accounts (where the existing script's behavior was at its most tedious), we realized that this could be fixed. I'll quote my commit messages:
New 'fast create' mode for account creation that takes all the defaults and doesn't bother asking if we're really sure.
For fast mode, add the ability to randomly generate or set the initial password at the start of the process.
offer to send new-account greeting email.
make sending greeting email be the default (if you just hit return).
(In theory we could make sending the greeting email happen automatically. In practice, asking a final question gives us an opportunity to look back at all the messages printed out just in case there's some problem that the script didn't catch and we want to pause to fix things up.)
This simple change has been a real quality of life improvement for us, turning a tedious slog into a mostly fire and forget exercise that we can casually run through. That it took so long to make our account creation script behave this way is an illustration not just of the power of historical paths but also of the power of habituation. We were so used to how the existing system worked that we never really questioned if it had to be that way; we just grumbled and accepted it.
(This is, in a sense, part of the power of historical paths. The path that something took to get where it is shapes what we see as 'normal' and 'just how things are', because it's what we get used to.)
Sidebar: There were some additional steps in there
There are a few questions in the account creation script where in theory we have a genuine choice to make; for example, some accounts have several options for what filesystem they get created in. Part of what made the no-questions version of the script possible was that we realized that in practice we always made a particular choice (for filesystems, we always picked the one with the most free space), so we revised the script to make this choice the default answer.
Had we not worked out default answers for all of these questions, we couldn't have made the creation script not even ask the questions. We might have done both at the same time if it was necessary, but in practice it certainly helped that everything already had default answers so the 'fast create' mode could just be 'take all of the default answers without requiring confirmation'.
The benefits of driving automation through cron
In light of our problem with timesyncd, we needed a different (and working)
solution for time synchronization on our Ubuntu 18.04 machines. The
obvious solution would have been to switch over to chrony; Ubuntu even has chrony set up so that if you run
it, timesyncd is automatically blocked. I like chrony, so I was
tempted by this idea briefly, but then I realized that using chrony
would mean having yet another daemon that we have to care about.
Instead, our replacement for timesyncd is running ntpdate from cron.
There are a number of quiet virtues of driving automation out of
cron entries. The whole approach is simple and brute force, but
this creates a great deal of reliability. Cron basically never dies
and if it were ever to die it's so central to how our systems operate
that we'd probably notice fairly fast. If we're ever in any doubt,
cron logs when it runs things to syslog (and thus to our central
syslog server), and if jobs fail or produce output, cron has a very
reliable and well tested system for reporting that to us. A simple
cron entry that runs
ntpdate has no ongoing state that can get
messed up, so if cron is running at all, the
ntpdate is running
at its scheduled interval and so our clocks will stay synchronized.
If something goes wrong on one run, it doesn't really matter because
cron will run it again later. Network down temporarily? DNS resolution
broken? NTP servers unhappy? Cure the issue and we'll automatically
get time synchronization back.
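The whole arrangement can be a single /etc/cron.d entry along these lines (a hypothetical sketch; the interval, the -s flag usage, and the NTP server name are placeholders, not our actual configuration):

```crontab
# Hypothetical /etc/cron.d/ntpdate entry: resync the clock periodically.
# 'ntp1.example.com' is a placeholder for a real NTP server; -s sends
# ntpdate's output to syslog instead of standard output.
*/15 * * * *  root  /usr/sbin/ntpdate -s ntp1.example.com
```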
A cron job is simple blunt force; it repeats its activities over and over and over again, throwing itself at the system until it batters its way through and things work. Unless you program it otherwise, it's stateless and so indifferent to what happened the last time around. There's a lot to be said for this in many system tasks, including synchronizing the clock.
(Of course this can be a drawback if you have a cron job that's failing and generating email every failure, when you'd like just one email on the first failure. Life is not perfect.)
There's always a temptation in system administration to make things complicated, to run daemons and build services and so on. But sometimes the straightforward brute force way is the best answer. We could run a NTP daemon on our Ubuntu machines, and on a few of them we probably will (such as our new fileservers), but for everything else, a cron job is the right approach. Probably it's the right approach for some of our other problems, too.
(If timesyncd worked completely reliably on Ubuntu 18.04, we would likely stick with it simply because it's less work to use the system's default setup. But since it doesn't, we need to do something.)
PS: Although we don't actively monitor cron right now, there are ways to notice if it dies. Possibly we should add some explicit monitoring for cron on all of our machines, given how central it is to things like our password propagation system. Sure, we'd notice sooner or later anyway, but noticing sooner is good.
One simple general pattern for making sure things are alive
One perpetual problem in system monitoring is detecting when something goes away. Detecting the presence of something is often easy because it reports itself, but detecting absence is usually harder. For example, it generally doesn't work well to have some software system email you when it completes its once a day task, because the odds are only so-so that you'll actually notice on the day when the expected email isn't there in your mailbox.
One general pattern for dealing with this is what I'll call a staleness timer. In a staleness timer you have a timer that effectively slowly counts down; when the timer reaches 0, you get an alert. When systems report in that they're alive, this report resets their timer to its full value. You can implement this as a direct timer, or you can write a check that is 'if system last reported in more than X time ago, raise an alert' (and have this check run every so often).
(More generally, if you have an overall metrics system you can presumably write an alert for 'last metric from source <X> is more than <Y> old'.)
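The 'last reported in more than X time ago' form of the check is simple to sketch (hypothetical names throughout; how systems actually report in is up to you):

```python
# Sketch: a staleness-timer check. Systems 'report in' by updating a
# timestamp somewhere; this check alerts on anything whose most recent
# report is older than the allowed age.
import time

def stale_systems(last_report, max_age, now=None):
    """Return names whose last report is more than max_age seconds old.

    last_report maps system name -> Unix timestamp of its last report."""
    if now is None:
        now = time.time()
    return sorted(name for name, ts in last_report.items()
                  if now - ts > max_age)
```

Run from cron every so often, anything this returns gets an alert; a system suppresses its alert only by continuing to report in, which is the flipped-around default the entry describes.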
In a way this general pattern works because you've flipped the problem around. Instead of the default state being silence and exceptional things having to happen to generate an alert, the default state is an alert and exceptional things have to happen to temporarily suppress the alert.
There are all sorts of ways of making programs and systems report in, depending on what you have available and what you want to check. Traditional low rent approaches are touching files and sending email to special dedicated email aliases (which may write incoming email to a file, or simply run a program on incoming email that touches a relevant file). These can have the drawback that they depend on multiple different systems all working, but they often have the advantage that you have them working already (and sometimes it's a feature to verify all of the systems at once).
(If you have a real monitoring system, it hopefully already provides a full selection of ways to submit 'I am still alive' notifications to it. There probably is a very simple system that just does this based on netcat-level TCP messages or the like, too; it seems like the kind of thing sysadmins write every so often. Or perhaps we are just unusual in never having put together a modern, flexible, and readily customizable monitoring system.)
All of this is a reasonably obvious and well known thing around the general community, but for my own reasons I want to write it down explicitly.