Wandering Thoughts archives

2019-11-11

An apparent hazard of removing Linux software RAID mirror devices

One of the data disks in my home machine has been increasingly problematic for, well, a while. I eventually bought a replacement HD, then even more eventually put it in the machine alongside the current two data disks, partitioned it, and added it as a third mirror to my software RAID partitions. After running everything as a three-way mirror for a while, I decided that problems on the failing disk were affecting system performance enough that I'd take the main software RAID partition on the disk out of service.

I did this as, roughly:

mdadm --manage /dev/md53 --fail /dev/sdd4
mdadm --manage /dev/md53 --remove /dev/sdd4
mdadm --grow /dev/md53 --raid-devices 2

(I didn't save the exact commands, so this is an approximation. The failing drive is sdd.)

The main software RAID device immediately stopped using /dev/sdd4 and everything was happy (and my Prometheus monitoring of disk latency no longer showed drastic latency spikes for sdd). The information in /proc/mdstat said that md53 was fine, with two out of two mirrors.

Then, today, my home machine locked up and rebooted (because it's the first significantly cold day in Toronto and I have a little issue with that). When it came back, I took a precautionary look at /proc/mdstat to see if any of my RAID arrays had decided to resync themselves. To my very large surprise, mdstat reported that md53 had two out of three failed devices and the only intact device was the outdated /dev/sdd4.

(The system then started the outdated copy of the LVM volume group that sdd4 held, mounted outdated copies of the filesystems in it, and let things start writing to them as if they were the right copy of those filesystems. Fortunately I caught this very soon after boot and could immediately shut the system down to avoid further damage.)

This was not a disk failure; all of my other software RAID arrays on those disks showed three out of three devices, spanning the old sdc and sdd drives and the new sde drive. But rather than assemble the two-device new version of md53 with both mirrors fully available on sdc4 and sde4, the Fedora udev boot and software RAID assembly process had decided to assemble the old three-device version visible only on sdd4 with one out of three mirrors. Nor is this my old case of not updating my initramfs to have the correct number of RAID devices, because I never updated either the real /etc/mdadm.conf or the version in the initramfs to claim that any of my RAID arrays had three devices instead of two.

As I said on Twitter, I'm sufficiently used to ZFS's reliable behavior on device removal that I never even imagined that this could happen with software RAID. I can sort of see how it did (for a start, I expect that marking a device as failed leaves its RAID superblock untouched), but I don't know why, and the logs I have available contain no clues about the decision process udev and mdadm used to pick which array component to assemble from.

The next time I do this sort of device removal, I guess I will have to explicitly erase the software RAID superblock on the removed device with 'mdadm --zero-superblock'. I don't like doing this because if I make a mistake in the device name (and it is only a letter or a number away from something live), I've probably just blown things up.
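
Concretely, the extra step would be something like this (a sketch only, reusing the device names from this incident; the dangerous part is being completely sure of the device name before running the second command):

mdadm --examine /dev/sdd4          # confirm this really is the stale RAID member
mdadm --zero-superblock /dev/sdd4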

The obvious conclusion is that mdadm should have an explicit way to say 'take this device out of service in this disk array', one that makes sure to update everything so that this can't happen even if the device remains physically present in the system. I don't care whether that involves adding a special mark to the device's RAID superblock or erasing it; I just want it to work. Perhaps what I did should already work in theory; if so, I regret to say that it didn't in practice.

(My short term solution is to physically disconnect sdd, the failing disk drive. This reduces the other three-way mirrors to two-way ones and I don't know what I'll do with the pulled sdd; it's probably not safe to let my home machine see it in any state at any time in the future. But at least this way I have working software RAID arrays.)

Sidebar: Why mdadm's --replace is not a solution for me

I explicitly wanted to run my new drive alongside the existing two drives for a while, in case of infant mortality. Thus I wanted to run with three-way mirrors, instead of replacing one disk in a two-way mirror with another one.

linux/SoftwareRaidRemovingDiskGotcha written at 22:28:46

2019-11-10

Putting a footer on automated email that says what generated it

We have various cron jobs and other systems that occasionally send us email, such as notifications about Certbot renewals failing or alerts that Prometheus is down. One of our local system administration habits when we set up such automated email is that we always add a footer to the email message that includes as much information as possible about what generated the message and where.

For example, the email we send on systemd unit failures (currently used for our Certbot renewal failures) has a footer that looks like this:

(This email comes from /opt/cslab/sbin/systemd-email on $HOST. It was probably run from /etc/systemd/system/cslab-status-email@.service, triggered by an 'OnFailure=' property on the systemd unit $SERVICE.)
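
Most of this information can be filled in mechanically by the script itself. As an illustrative sketch (this is not our actual systemd-email script; the $msgfile variable and the use of $1 for the unit name are assumptions), the shell fragment that appends such a footer can be as simple as:

# append an origin footer to the message body built up so far
cat >>"$msgfile" <<EOF

(This email comes from $0 on $(hostname). It was probably run from
/etc/systemd/system/cslab-status-email@.service, triggered by an
'OnFailure=' property on the systemd unit $1.)
EOF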

I acquired this habit from my co-workers (who were doing it before I got here), but in an environment with a lot of things that may send email once in a while from a lot of machines, it's a good habit to get into. In particular, if you receive a mysterious email about something, being able to go find and look at whatever is generating it is very useful for sorting out what it's really telling you, what you should do about it, and whether or not you actually care.

(There is apparently local history behind our habit of doing this, as there often is with these things. Many potentially odd habits of any particular system administration environment are often born from past issues, much as is true for monitoring system alerts. In both cases, sometimes this is bolting the stable door after the horse is long gone and not returning and sometimes it's quite useful.)

In general, this is another way of adding more context to things you may only see once in a while, much like making infrequent error messages verbose and clear. We're probably going to have forgotten about the existence of many of our automated scripts by the time they send us email, so they need to remind us about a lot more context than I might think when writing them.

PS: If you're putting a footer on the automatic email, you don't necessarily have to identify the machine by the GECOS name of its root account, but every little bit helps. And identifying the machine in the GECOS name still helps for things that don't naturally generate a footer, like email from cron about jobs that failed or produced unexpected output.

sysadmin/AutomatedEmailSourceFooter written at 23:30:20

The problems with piping curl to a shell are system management ones

I was recently reading Martin Tournoij's Curl to shell isn't so bad (via), which argues that the commonly suggested approach of using 'curl example.com/install.sh | sh' is not the security hazard that it's often made out to be. Although it may surprise people to hear this, I actually agree with the article's core argument. If you're going to download and use source code (with its configure script and 'make install' and so on) or even pre-built binaries, you're already extending quite a lot of trust to the software's authors. However, I still don't think you should install things with curl to shell. There are two reasons not to: a general system management one, and a pragmatic one about what people actually do in these scripts.

The general system management one is that to manage and maintain your system over time, you need to control what changes are made to it and ensure that everything is handled consistently. You don't want someone's install script making arbitrary and unknown changes to your system, and it gets worse when that install script can change over time. The ideal thing to install is an artifact that you can save locally and that makes limited and inspectable changes to your system (if any). Good install options are, for example, a self-contained tarball that you can extract into a directory hierarchy of your choice (and that doesn't even have to be owned by or extracted by root), or a package for the standard package manager on your system that doesn't contain peculiar custom scripts with undesired side effects. An un-versioned shell script fetched from a remote end that you don't save or inspect, and that will make who knows what changes on your system, is a terrible idea for being able to manage, maintain, and understand the resulting system state.
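
For contrast, the 'save a local artifact' approach looks something like the following sketch (the URL, version, and install location are made up for illustration):

curl -fsSLO https://example.com/tool-1.2.3.tar.gz
sha256sum tool-1.2.3.tar.gz            # record exactly what you fetched
tar tzf tool-1.2.3.tar.gz | less       # inspect the contents before extracting
mkdir -p /opt/tool-1.2.3
tar xzf tool-1.2.3.tar.gz -C /opt/tool-1.2.3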

The pragmatic reason is that for some reason, the people writing these install shell scripts feel free to have them make all sorts of nominally convenient changes to your system on behalf of their software. These shell scripts could be carefully contained, minimal, and unchanging (for a particular release), doing very little more than what would happen if you installed a good package through your package manager, but very often they aren't and you'll wind up with all sorts of random changes all over your system. This is bad for the obvious reason, and it's also bad because there's no guarantee that your system is set up in the way that the install script expects it to be. Of course generally 'make install' has the same problem, which is why experienced sysadmins also mostly avoid running that as root.

(More generally, you really want to manage the system through only one thing, often the system's package manager. This is the problem with CPAN and other independent package systems (although there are good reasons why people keep creating them). Piping curl to a shell and 'make install' are just magnified versions of it. See also why package systems are important.)

sysadmin/CurlToShellManagementProblem written at 00:20:22

2019-11-08

I have to assume that people here can be successfully phished

Over on Mastodon, I said some things in a conversation:

@cks: Given what mobile browsers are doing to the visibility of web page URLs plus how many 'you must authenticate' web services we have, I basically assume that a lot of our users can be phished by anyone who tries hard enough.

(Some spammers are starting to work that hard, but they're not doing phish spam, they're doing the 'please can you do me a favour' manual spam.)

@cks: Our latest finphishing cloned someone's signature block, too, which shows some reasonably decent advance scouting. I was a bit alarmed by that; with some more work they could have made it very hard to tell in typical mail clients.

(I'm only including my remarks for obscure reasons; see Mastodon for the full conversation.)

An increasing number of people read email and use the web from smartphones and tablets, where mail clients and browsers are making the details of where URLs go and what website you're on harder and harder to see casually. This combines badly with the sort of environment we have, where there is a broad assortment of web services that require you to authenticate with your password (because we can't ask people to remember multiple passwords). The odds that people on a smartphone could tell a well done fake phish website from one of our real websites are relatively low. It wouldn't even necessarily have to try to duplicate one of our existing sites; an attacker could put together something that looks like a convincing internal administrative service, then send targeted email to staff, professors, or grad students. People are already used to new services they have to use being introduced periodically.

There are various organizational things that could be done to try to reduce this, but it's a hard problem in general as long as we use passwords alone. And introducing any sort of two factor authentication would have its own significant challenges that are well beyond the scope of this entry.

But even that's not the whole story, as my second toot is sort of about. The modern sort of finphishing attacks aren't after your password, they're directly after your money by persuading you that your boss or other important person urgently needs you to buy some gift cards (and then pass them on to the spammer, whether or not you realize that that's what you're doing). These attacks are also made easier by how modern smartphone and tablet mail clients typically present email; if you can make the message body, signature, and subject line look authentic enough, that's almost all of what people will see and what they'll make judgments based on. And people are trusting.

I don't really have any answers here. Mostly I'm glad that we seem to be targeted relatively infrequently and don't have more problems than we do.

(Alternately, perhaps plenty of our users have been compromised but the spammers are just keeping things quiet enough that we don't notice.)

spam/WeCanBePhished written at 23:48:16

2019-11-07

Some notes on getting email when your systemd timer services fail

Suppose, not hypothetically, that you have some things that are implemented through systemd timers instead of traditional cron.d jobs, and you would like to get email if and when they fail. The lack of this email by default is one of the known issues with turning cron.d entries into systemd timers and people have already come up with ways to do this with systemd tricks, so for full details I will refer you to the Arch Wiki section on this (brought to my attention by keur's comment on my initial entry) and this serverfault question and its answers (via @tvannahl on Twitter). This entry is my additional notes from having set this up for our Certbot systemd timers.

Systemd timers come in two parts: a .timer unit that controls timing and a .service unit that does the work. What we generally really care about is the .service unit failing. To detect this and get email about it, we add an OnFailure= to the timer's .service unit that triggers a specific instance of a template .service that sends email. So if we have certbot.timer and certbot.service, we add a .conf file in /etc/systemd/system/certbot.service.d that contains, say:

[Unit]
OnFailure=cslab-status-email@%n.service

Due to the use of '%n', this is generic; the stanza will be the same for anything we want to trigger email from on failure. The '%n' will expand to the full name of the service, eg 'certbot.service' and be available in the cslab-status-email@.service template unit. My view is that you should always use %n here even if you're only doing this for one service, because it automatically gets the unit name right for you (and why risk errors when you don't have to). In the cslab-status-email@.service unit, the full name of the unit triggering it will be available as '%i', as shown in the Arch Wiki's example. Here that will be 'certbot.service'.

(With probably excessive cleverness you could encode the local address to email to into what the template service will get as %i by triggering, eg, cslab-status-email@root-%n.service. We just hard code 'root' all through.)
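
For reference, the template unit itself can follow the Arch Wiki's example quite closely. A sketch of roughly what ours amounts to (the Description and the exact ExecStart arguments are approximations, not a copy of the real unit):

[Unit]
Description=Send status email for %i

[Service]
Type=oneshot
# %i is the full name of the failed unit, eg 'certbot.service'
ExecStart=/opt/cslab/sbin/systemd-email %i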

The Arch Wiki's example script uses 'systemctl status --full <unit>'. Unfortunately this falls into the trap that by default systemd truncates the log output at the most recent ten lines. We found that we definitely wanted more; our script currently uses 'systemctl status --full -n 50 <unit>' (and also contains a warning postscript that it may be incomplete and to see journalctl on the system for full details). Having a large value here is harmless as far as I can tell, because systemd seems to only show the log output from the most recent activation attempt even if there's (much) less than your 50 lines or whatever.

(Unfortunately as far as I can see there is no easy way to get just the log output without the framing 'systemctl status' information about the unit, much of which is not particularly useful. We live with this.)

As with the Arch Wiki's example script, you definitely want to put the hostname into the email message if you have a fleet. We also embed more information into the Subject and From, and add a MIME-Version:

From: $HOSTNAME root <root@...>
Subject: $1 systemd unit failed on $HOSTNAME
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

You definitely want to label the email as UTF-8, as 'systemctl status' puts a UTF-8 '●' in its output. The subject could be incorrect (we can't be sure the template unit was triggered through an 'OnFailure=', even though that's how it's supposed to be used), but it's much more useful in the case where everything is working as intended. My bias is towards putting as much context as possible into emails like this, because by the time we get one we'll have forgotten all about the issue and we don't want to be wondering why we got this weird email.

The Arch Wiki contains a nice little warning about how systemd may wind up killing child processes that the mail submission program creates (as noticed by @lathiat on Twitter). I decided that the easiest way for our script to ward this off was to just sleep for 10 or 15 seconds at the end. Having it exit immediately is not exactly critical and this is the easy (if brute force) way to hopefully work around any problems.
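
Putting the pieces together, the script that the template unit runs ends up looking roughly like the following (a sketch, not our actual script; the sendmail invocation, the From address, and the exact sleep time are illustrative):

#!/bin/sh
# Sketch only. $1 is assumed to be the full unit name, passed in as %i.
host=$(hostname)
{
  echo "From: $host root <root@example.org>"
  echo "To: root"
  echo "Subject: $1 systemd unit failed on $host"
  echo "MIME-Version: 1.0"
  echo "Content-Transfer-Encoding: 8bit"
  echo "Content-Type: text/plain; charset=UTF-8"
  echo ""
  systemctl status --full -n 50 "$1"
  echo ""
  echo "(The log output above may be incomplete; see journalctl on $host"
  echo "for full details.)"
} | /usr/sbin/sendmail -t
# linger so systemd doesn't kill any child processes sendmail creates
sleep 15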

Finally, as the Arch Wiki kind of notes, this is not quite the same thing as what cron does. Cron will send you email if your job produces any output, whether or not it fails; this will send you the logged output (if any) if the job fails. If the job succeeds but produces output, that output will go only to the systemd journal and you will get no notification. As far as I know there's no good way to completely duplicate cron's behavior here.

(Also, on failure the journal messages you get will include both actual stuff printed by the service and also, I believe, anything it logged to places like syslog; with cron you only get the former. This is probably a useful feature.)

linux/SystemdTimersMailNotes written at 23:30:42

Realizing that Go constants are always materialized into values

I recently read Global Constant Maps and Slices in Go (via), which starts by noting that Go doesn't let you create const maps or slices and then works around that by having an access function that returns a constant slice (or map):

const rateLimit = 10
func getSupportedNetworks() []string {
    return []string{"facebook", "twitter", "instagram"}
}

When I read the article, my instinctive reaction was that that's not actually a constant because the caller can always change it (although if you call getSupportedNetworks() again you get a new clean original slice). Then I thought more about real Go constants like rateLimit and realized that they have this behavior too, because any time you use a Go constant, it's materialized into a mutable value.

Obviously if you assign the rateLimit constant to a variable, you can then change the variable later; the same is true of assigning it to a struct field. If you call a function and pass rateLimit as one argument, the function receives it as an argument value and can change it. If a function returns rateLimit, the caller gets back a value and can again change it. This is no different with the slice that getSupportedNetworks() returns.

The difference between using rateLimit and using the return value from getSupportedNetworks is that the latter can be mutated through a second reference without the explicit use of Go pointers:

func main() {
   a := rateLimit
   b := &a
   *b += 10          // changes a through an explicit pointer

   c := getSupportedNetworks()
   d := c
   d[1] = "mastodon" // changes the backing array that c also refers to

   fmt.Println(a, c) // prints "20 [facebook mastodon instagram]"
}

But this is not a difference between true Go constants and our emulated constant slice, it's a difference in the handling of the types involved. Maps and slices are special this way, but other Go values are not.

(Slices are also mutable at a distance in other ways.)

PS: Go constants can't have their address taken with '&', but they aren't the only sorts of unaddressable values in Go. In theory we could make getSupportedNetworks() return an unaddressable value by making its return value be '[3]string', as we've seen before; in practice you almost certainly don't want to do that for various reasons.

(This seems like an obvious observation now that I've thought about it, but I hadn't really thought about it before reading the article and having my reflexive first reaction.)

programming/GoConstantsAsValues written at 00:13:57

2019-11-06

Systemd needs official documentation on best practices

Systemd is reasonably well documented on the whole, although there are areas that are less well covered than others (some of them probably deliberately). For example, as far as I know everything you can put in a unit file is covered somewhere in the manpages. However, as was noted in the comments on my entry on how timer units can hide errors, much of this information is split across multiple places (eg, systemd.unit, systemd.service, systemd.exec, systemd.resource-control, and systemd.kill). This split is okay at one level, because the systemd manpages are explicitly reference documentation and the split makes perfect sense there; things that are common to all units are in systemd.unit, things that are common to running programs (wherever from) are in systemd.exec, and so on and so forth. Systemd even gives us an index, in systemd.directives, which is more than some documentation does.

But having reference documentation alone is not enough. Reference documentation tells you what you can do, but it doesn't tell you what you should do (and how you should do it). Systemd is a complex system with many interactions between its various options, and there are many ways to write systemd units that are bad ideas or that hide subtle (or not so subtle) catches and gotchas. We saw one of them yesterday, with using timer units to replace /etc/cron.d jobs. There is nothing in the current systemd documentation that will point out the potential drawbacks of doing this (although there is third party documentation if you stumble over it, cf).

This is why I say that systemd needs official documentation on best practices and how to do things. This would (or should) cover what you should do and not do when creating units, what the subtle issues you might not think about are, common mistakes people make in systemd units, and what sort of things you should think about when considering replacing traditional things like cron.d jobs with systemd specific things like timer units. Not having anything on best practices invites people to do things like the Certbot packagers have done, where on systemd systems errors from automatic Certbot renewal attempts mostly vanish instead of actually being clearly communicated to the administrator.

(You cannot expect people to carefully read all of the way through all of the systemd reference documentation and assemble a perfect picture of how their units will operate and what the implications of that are. That is simply too complex for people to keep full track of, and anyway people don't work that way outside of very rare circumstances.)

linux/SystemdNeedsBestPractices written at 01:04:34

2019-11-04

Systemd timer units have the unfortunate practical effect of hiding errors

We've switched over to using Certbot as our Let's Encrypt client. As packaged for Ubuntu in their PPA, this is set up as a modern systemd-based package. In particular, it uses a systemd timer unit to trigger its periodic certificate renewal checks, instead of a cron job (which would be installed as a file in /etc/cron.d). This weekend, the TLS certificates on one of our machines silently failed to renew on schedule (at 30 days before they would expire, so this was not anywhere close to a crisis).

Upon investigation, we discovered a setup issue that had caused Certbot to error out (and then fixed it). However, this is not a new issue; in fact, Certbot has been reporting errors since October 22nd (every time certbot.service was triggered from certbot.timer, which is twice a day). That we hadn't heard about them points out a potentially significant difference between cron jobs and systemd timers, which is that cron jobs email you their errors and output, but systemd timers quietly swallow all errors and output into the systemd journal. This is a significant operational difference in practice, as we just found out.

(Technically it is the systemd service unit associated with the timer unit.)

Had Certbot been using a cron job, we would have gotten email on the morning of October 22nd when Certbot first found problems. But since it was using a systemd timer unit, that error output went to the journal and was effectively invisible to us, lost within a flood of messages that we don't normally look at and cannot possibly routinely monitor. We only found out about the problem when the symptoms of Certbot not running became apparent, ie when a certificate failed to be renewed as expected.

Unfortunately there's no good way to fix this, at least within systemd. The systemd.exec StandardOutput= setting has many options but none of them is 'send email to', and I don't think there's any good way to add mailing the output with a simple drop-in (eg, there is no option for 'send standard output and standard error through a pipe to this other command'). Making certbot.service send us email would require a wholesale replacement of the command it runs, and at that point we might as well disable the entire Certbot systemd timer setup and supply our own cron job.

(We do monitor the status of some systemd units through Prometheus's host agent, so perhaps we should be setting an alert for certbot.service being in a failed state. Possibly among other .service units for important timer units, but then we'd have to hand-curate that list as it evolves in Ubuntu.)
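
If we went that way, the alert rule involved might look something like this sketch (it assumes the host agent runs with its systemd collector enabled so that the node_systemd_unit_state metric exists; the alert name, 'for' duration, and labels are illustrative):

- alert: CertbotUnitFailed
  expr: node_systemd_unit_state{name="certbot.service",state="failed"} == 1
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "certbot.service is in a failed state on {{ $labels.instance }}"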

PS: I think that you can arrange to get emailed if certbot.service fails, by using a drop-in to add an 'OnFailure=' that starts a unit that sends email when triggered. But I don't think there's a good way to dig the actual error messages from the most recent attempt to start the service out of the journal, so the email would just be 'certbot.service failed on this host, please come look at the logs to see why'. This is an improvement, but it isn't the same as getting emailed the actual output and error messages. And I'm not sure if OnFailure= has side effects that would be undesirable.

linux/SystemdTimersAndErrors written at 23:02:04

Many of our 'worklog' messages currently assume a lot of context

The primary way we keep track of things around here is our worklog system, which is a specific email log both of changes that we make and of how to do things (such as how to rebuild systems). Also, a while back I wrote about how keeping your past checklists doesn't help unless you can find them. In the process of recovering the checklist from 2015 that I was looking for, I wound up re-reading a bunch of our worklog messages from around that time, which taught me a lesson.

What I learned from re-reading our old messages is that most of our worklog messages assume a lot of implicit context. This makes perfect sense, since we write and send in our messages when the context is fresh in everyone's minds; it is the water that we're currently swimming in, and we're generally as oblivious to it as fish are. But looking back a few years later, that context is all gone. When I re-read our worklog messages I had to carefully rebuild the context, which I fortunately mostly could by reading enough messages from both worklog and our internal sysadmin mailing list (which is also archived, and where we try to discuss most everything, or at least mail in summaries of in-person discussions).

I don't think that we want to try to put down that implicit context in every worklog email; that would be tedious both to write and to read. But for worklog messages which we expect to refer to much later, for example as 'how we did this last time' directions, putting in much more explicit context and explanations seems like a good idea. Of course this is a bit tricky to actually do in practice for two reasons. The first is that what context needs to be explained for the future isn't necessarily clear to us, since we're immersed in it. The second is that it's not always clear what worklog messages we're going to want to refer back to in general. Some things we can predict, but others may look like one-off things until they come up again. Still, I can try, and I should, especially for big things that took a lot of planning.

(This is similar to the lessons I learned from some planned power shutdowns many years ago, in part 1 and then part 2. We've had some other power shutdowns since then, and the lessons have been useful and I've tried to carry them out.)

PS: Just in general it might not hurt to put a bit more context into my worklogs, if only in the form of a link to the first iteration of any particular thing, where I'm likely to have written out a bunch of the background. Especially if I put it in as a postscript, it's easy to skip most of the time.

sysadmin/WorklogsAssumeContext written at 00:17:19

2019-11-02

Using personal ruleset recipes in uMatrix in Firefox

I generally don't run Javascript on websites, and these days I do this with uMatrix. uMatrix requires more fiddling than controlling Javascript with uBlock Origin, but I like its fine-grained control of various things (including cookies) and how it can improve my web experience. One of the ways I've started doing that is by exploiting uMatrix's ability to let you define personal ruleset recipes.

Suppose, not hypothetically, that you periodically read technical articles on Medium. These articles frequently use images and often inline snippets of code from Github gists and the like, and unfortunately both of these only render if you turn on enough Javascript (and not just Medium's Javascript). Also unfortunately, Medium's own Javascript does enough annoying things that I don't want to leave it on all of the time; I only want to turn it on when I really need it for an article. I can certainly do this by hand, but it involves an annoying amount of clicking on things and refreshing the page.

But it turns out that we can do better. uMatrix has a thing called ruleset recipes, which are, to quote it:

Ruleset recipes ("recipes" thereafter) are community-contributed rulesets suggested to you when you visit a given web site. Recipes are specific, they contain only the necessary rules to un-break a specific web site or behavior.

There is no community contributed recipe for Medium that I know of, but we can write our own and hook it into uMatrix, provided that we have a website somewhere. Once added to uMatrix, we can enable it temporarily with a couple of clicks and then dump all of its temporary additions later.

First we need to create a text file with the rulesets we want and the necessary rules in them. For my Medium rules, what we need looks like this:

$ cat recipes_cks_en.txt
! uMatrix: Ruleset recipes 1.0
! Title: Chris's additional rulesets for English websites
! Maintainer: Chris Siebenmann
!

Medium no account
   medium.com *
      _ 1st-party script
      _ gist.github.com script

Next we need to put this on a website somewhere. Generally this should be an HTTPS website that you trust, for safety. Having done this, we next need to add our recipes URL to uMatrix. This is done by going to the uMatrix dashboard, going to the Assets tab, and then down at the bottom of the 'Ruleset recipes' section you will see an 'Import...' option. Enable it, enter the URL of your recipes, and click 'Apply changes'. There, you're done; your new recipes are now available through uMatrix's regular interface for them, described in the ruleset recipes wiki page.

(You can also see the built in recipes in the Assets tab, or look at them on Github. This will give you an idea of what you can do in your own recipes.)

PS: I haven't tried to contribute my Medium recipe because I have no idea if it's complete or truly good enough. It works for me for the things that I care about, more or less, but I don't care very much about having all of Medium's various peculiarities working correctly (or correctly being blocked).

web/UMatrixPersonalRulesets written at 23:11:11

