An apparent hazard of removing Linux software RAID mirror devices
One of the data disks in my home machine has been increasingly problematic for, well, a while. I eventually bought a replacement HD, then even more eventually put it in the machine alongside the current two data disks, partitioned it, and added it as a third mirror to my software RAID partitions. After running everything as a three-way mirror for a while, I decided that problems on the failing disk were affecting system performance enough that I'd take the main software RAID partition on the disk out of service.
I did this as, roughly:
mdadm --manage /dev/md53 --fail /dev/sdd4
mdadm --manage /dev/md53 --remove /dev/sdd4
mdadm --grow /dev/md53 --raid-devices=2
(I didn't save the exact commands, so this is an approximation. The failing drive is sdd.)
The main software RAID device immediately stopped using /dev/sdd4 and everything was happy (and my Prometheus monitoring of disk latency no longer showed drastic latency spikes for sdd). The information in /proc/mdstat said that md53 was fine, with two out of two mirrors.
Then, today, my home machine locked up and rebooted (because it's the first significantly cold day in Toronto and I have a little issue with that). When it came back, I took a precautionary look at /proc/mdstat to see if any of my RAID arrays had decided to resync themselves. To my very large surprise, mdstat reported that md53 had two out of three failed devices and the only intact device was the outdated /dev/sdd4.
(The system then started the outdated copy of the LVM volume group that sdd4 held, mounted outdated copies of the filesystems in it, and let things start writing to them as if they were the right copy of those filesystems. Fortunately I caught this very soon after boot and could immediately shut the system down to avoid further damage.)
This was not a disk failure; all of my other software RAID arrays
on those disks showed three out of three devices, spanning the old
sdc and sdd drives and the new sde drive. But rather than assemble
the two-device new version of md53 with both mirrors fully available
on sdc4 and sde4, the Fedora udev boot and software RAID assembly
process had decided to assemble the old three-device version visible
only on sdd4 with one out of three mirrors. Nor is this my old
case of not updating my initramfs to have the correct number of
RAID devices, because I never updated either the
/etc/mdadm.conf or the version in the initramfs to claim
that any of my RAID arrays had three devices instead of two.
As I said on Twitter, I'm sufficiently used to ZFS's reliable behavior on device removal that I never even imagined that this could happen with software RAID. I can sort of see how it did (for a start, I expect that marking a device as failed leaves its RAID superblock untouched), but I don't know why, and the logs I have available contain no clues from udev and mdadm about the decision process for which array component to pick.
The next time I do this sort of device removal, I guess I will have
to explicitly erase the software RAID superblock on the removed
device with '
mdadm --zero-superblock'. I don't like doing this
because if I make a mistake in the device name (and it is only a
letter or a number away from something live), I've probably just
blown things up.
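If I do end up zeroing superblocks, I may wrap the operation in a small guard that checks /proc/mdstat first. This is a sketch of my own devising, not anything mdadm provides; the helper name and the check are my additions:

```shell
# Hypothetical guard around 'mdadm --zero-superblock': refuse to zero
# the superblock if the partition still shows up in /proc/mdstat.
safe_zero_superblock() {
    dev="$1"
    mdstat="${2:-/proc/mdstat}"
    if grep -q "$(basename "$dev")" "$mdstat" 2>/dev/null; then
        echo "$dev still appears in $mdstat; refusing to zero it" >&2
        return 1
    fi
    mdadm --zero-superblock "$dev"
}
```

This doesn't protect against mistyping the name of a device that is absent from all arrays but still live in some other way, so it's a partial safety net at best.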
The obvious conclusion is that mdadm should have an explicit way to say 'take this device out of service in this disk array', one that makes sure to update everything so that this can't happen even if the device remains physically present in the system. I don't care whether that involves adding a special mark to the device's RAID superblock or erasing it; I just want it to work. Perhaps what I did should already work in theory; if so, I regret to say that it didn't in practice.
(My short term solution is to physically disconnect sdd, the failing disk drive. This reduces the other three-way mirrors to two-way ones and I don't know what I'll do with the pulled sdd; it's probably not safe to let my home machine see it in any state at any time in the future. But at least this way I have working software RAID arrays.)
Sidebar: Why mdadm's --replace is not a solution for me
I explicitly wanted to run my new drive alongside the existing two drives for a while, in case of infant mortality. Thus I wanted to run with three-way mirrors, instead of replacing one disk in a two-way mirror with another one.
Some notes on getting email when your systemd timer services fail
Suppose, not hypothetically, that you
have some things that are implemented through systemd timers instead
of cron.d jobs, and you would like to get email if
and when they fail. The lack of this email by default is one of the
known issues with turning
cron.d entries into systemd timers and
people have already come up with ways to do this with systemd tricks,
so for full details I will refer you to the relevant Arch Wiki section
(brought to my attention by keur's comment on my initial entry) and this serverfault question and its answers
(via @tvannahl on Twitter). This
entry is my additional notes from having set this up for our Certbot renewals.
Systemd timers come in two parts; a
.timer unit that controls
timing and a
.service unit that does the work. What we generally
really care about is the
.service unit failing. To detect this
and get email about it, we add an
OnFailure= to the timer's
.service unit that triggers a specific instance of a template
.service that sends email. So if we have
certbot.service, we add a .conf file in /etc/systemd/system/certbot.service.d
that contains, say:
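```ini
[Unit]
OnFailure=cslab-status-email@%n.service
```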
Due to the use of '%n', this is generic; the stanza will be the
same for anything we want to trigger email from on failure. The
'%n' will expand to the full name of the service, eg 'certbot.service',
and be available in the
cslab-status-email@.service template unit.
My view is that you should always use %n here even if you're only
doing this for one service, because it automatically gets the unit
name right for you (and why risk errors when you don't have to).
In the cslab-status-email@.service unit, the full name of the
unit triggering it will be available as '%i', as shown in the
Arch Wiki's example. Here that will be 'certbot.service'.
(With probably excessive cleverness you could encode the local
address to email to into what the template service will get as '%i'
by triggering, eg, cslab-status-email@root-%n.service. We just hardcode
'root' all through.)
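For reference, the template unit itself can be quite small; something like this (modeled on the Arch Wiki's version, and the script path /usr/local/sbin/cslab-status-email is my stand-in for wherever the mail script lives):

```ini
[Unit]
Description=Send status email for %i to root

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/cslab-status-email root %i
```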
The Arch Wiki's example script uses 'systemctl status --full
<unit>'. Unfortunately this falls into the trap that by default
systemd truncates the log output at the most recent ten lines. We
found that we definitely wanted more; our script currently uses
'systemctl status --full -n 50 <unit>' (and also contains a warning
postscript that it may be incomplete and to see
journalctl on the
system for full details). Having a large value here is harmless as
far as I can tell, because systemd seems to only show the log output
from the most recent activation attempt even if there's (much) less
than your 50 lines or whatever.
(Unfortunately as far as I can see there is no easy way to get just the log output without the framing 'systemctl status' information about the unit, much of which is not particularly useful. We live with this.)
As with the Arch Wiki's example script, you definitely want to
put the hostname into the email message if you have a fleet. We
also embed more information into the Subject and From, and add
MIME headers so the character set is declared; our message headers
look like:

From: $HOSTNAME root <root@...>
Subject: $1 systemd unit failed on $HOSTNAME
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
You definitely want to label the email as UTF-8, as 'systemctl
status' puts a UTF-8 '●' in its output. The subject could be
incorrect (we can't be sure the template unit was triggered through
'OnFailure=', even though that's how it's supposed to be used), but
it's much more useful in the case where everything is working as
intended. My bias is towards putting as much context as possible
into emails like this, because by the time we get one we'll have
forgotten all about the issue and we don't want to be wondering why
we got this weird email.
The Arch Wiki contains a nice little warning about how systemd may wind up killing child processes that the mail submission program creates (as noticed by @lathiat on Twitter). I decided that the easiest way for our script to ward this off was to just sleep for 10 or 15 seconds at the end. Having it exit immediately is not exactly critical and this is the easy (if brute force) way to hopefully work around any problems.
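Putting all of the pieces together, the whole script can be fairly small. This is a sketch rather than our production version; the sendmail path, the From address, and the function name are stand-ins:

```shell
#!/bin/sh
# Sketch of the mail script the template unit runs; $1 is the failed
# unit's full name (the template's %i).

build_message() {
    unit="$1"
    host="$(hostname)"
    printf 'From: %s root <root@example.org>\n' "$host"
    printf 'Subject: %s systemd unit failed on %s\n' "$unit" "$host"
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Transfer-Encoding: 8bit\n'
    printf 'Content-Type: text/plain; charset=UTF-8\n'
    printf '\n'
    # -n 50 instead of systemd's default ten lines of log output.
    systemctl status --full -n 50 "$unit" 2>&1
    printf '\n(This may be incomplete; see journalctl for full details.)\n'
}

if [ -n "$1" ]; then
    build_message "$1" | /usr/sbin/sendmail root
    # Linger so systemd doesn't kill sendmail's child processes the
    # moment our main process exits.
    sleep 15
fi
```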
Finally, as the Arch Wiki kind of notes, this is not quite the same thing as what cron does. Cron will send you email if your job produces any output, whether or not it fails; this will send you the logged output (if any) if the job fails. If the job succeeds but produces output, that output will go only to the systemd journal and you will get no notification. As far as I know there's no good way to completely duplicate cron's behavior here.
(Also, on failure the journal messages you get will include both actual stuff printed by the service and also, I believe, anything it logged to places like syslog; with cron you only get the former. This is probably a useful feature.)
Systemd needs official documentation on best practices
Systemd is reasonably well documented on the whole, although there are areas that are less well covered than others (some of them probably deliberately). For example, as far as I know everything you can put in a unit file is covered somewhere in the manpages. However, as was noted in the comments on my entry on how timer units can hide errors, much of this information is split across multiple places (eg, systemd.unit, systemd.service, systemd.exec, systemd.resource-control, and systemd.kill). This split is okay at one level, because the systemd manpages are explicitly reference documentation and the split makes perfect sense there; things that are common to all units are in systemd.unit, things that are common to running programs (wherever from) are in systemd.exec, and so on and so forth. Systemd even gives us an index, in systemd.directives, which is more than some documentation does.
But having reference documentation alone is not enough. Reference
documentation tells you what you can do, but it doesn't tell you
what you should do (and how you should do it). Systemd is a complex
system with many interactions between its various options, and there
are many ways to write systemd units that are bad ideas or that
hide subtle (or not so subtle) catches and gotchas. We saw one of
them yesterday, with using timer units instead of
/etc/cron.d jobs. There is nothing in the current
systemd documentation that will point out the potential drawbacks
of doing this (although there is third party documentation if you
stumble over it, cf).
This is why I say that systemd needs official documentation on best
practices and how to do things. This would (or should) cover what
you should do and not do when creating units, what the subtle issues
you might not think about are, common mistakes people make in systemd
units, and what sort of things you should think about when considering
replacing traditional things like
cron.d jobs with systemd specific
things like timer units. Not having anything on best practices
invites people to do things like the Certbot packagers have done,
where on systemd systems errors from automatic Certbot renewal
attempts mostly vanish instead of actually being clearly communicated
to the administrator.
(You cannot expect people to carefully read all of the way through all of the systemd reference documentation and assemble a perfect picture of how their units will operate and what the implications of that are. That is simply too complex for people to keep full track of, and anyway people don't work that way outside of very rare circumstances.)
Systemd timer units have the unfortunate practical effect of hiding errors
We've switched over to using Certbot
as our Let's Encrypt client. As packaged for
Ubuntu in their PPA, this is
set up as a modern systemd-based package. In particular, it uses
a systemd timer unit to
trigger its periodic certificate renewal checks, instead of a cron
job (which would be installed as a file in /etc/cron.d). This past
weekend, the TLS certificates on one of our machines silently failed
to renew on schedule (at 30 days before they would expire, so this
was not anywhere close to a crisis).
Upon investigation, we discovered a setup issue that had caused
Certbot to error out (and then fixed it). However, this is not a
new issue; in fact, Certbot has been reporting errors since October
22nd (every time certbot.service was triggered from certbot.timer,
which is twice a day). That we hadn't heard about them points out
a potentially significant difference between cron jobs and systemd
timers, which is that cron jobs email you their errors and output,
but systemd timers quietly swallow all errors and output into the
systemd journal. This is a significant operational difference in
practice, as we just found out.
(Technically it is the systemd service unit associated with the timer unit.)
Had Certbot been using a cron job, we would have gotten email on the morning of October 22nd when Certbot first found problems. But since it was using a systemd timer unit, that error output went to the journal and was effectively invisible to us, lost within a flood of messages that we don't normally look at and cannot possibly routinely monitor. We only found out about the problem when the symptoms of Certbot not running became apparent, ie when a certificate failed to be renewed as expected.
Unfortunately there's no good way to fix this, at least within
systemd. The StandardOutput= setting (documented in systemd.exec)
has many options but none of them is 'send email to', and
I don't think there's any good way to add mailing the output with
a simple drop-in (eg, there is no option for 'send standard output
and standard error through a pipe to this other command'). Making
certbot.service send us email would require a wholesale replacement
of the command it runs, and at that point we might as well disable
the entire Certbot systemd timer setup and supply our own cron job.
(We do monitor the status of some systemd units through Prometheus's
host agent, so perhaps
we should be setting an alert for
certbot.service being in a
failed state. Possibly among other
.service units for important
timer units, but then we'd have to hand-curate that list as it
evolves in Ubuntu.)
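(If we went that way, the alert rule itself would be simple. A sketch, assuming the host agent's systemd collector and its node_systemd_unit_state metric; the alert name and 'for' duration are made up:)

```yaml
groups:
  - name: systemd-units
    rules:
      - alert: CertbotUnitFailed
        # node_systemd_unit_state has one time series per unit/state pair
        expr: node_systemd_unit_state{name="certbot.service",state="failed"} == 1
        for: 15m
```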
PS: I think that you can arrange to get emailed if certbot.service
fails, by using a drop-in to add an 'OnFailure='
that starts a unit that sends email when triggered. But I don't
think there's a good way to dig the actual error messages from the
most recent attempt to start the service out of the journal, so the
email would just be '
certbot.service failed on this host, please
come look at the logs to see why'. This is an improvement, but it
isn't the same as getting emailed the actual output and error
messages. And I'm not sure if
OnFailure= has side effects that
would be undesirable.
Chrony has been working well for us (on Linux, where we use it)
We have a variety of machines around here that run NTP servers, for various reasons. In the beginning they all ran some version of the classic NTP daemon, NTPD, basically because that was your only option and was what everyone provided. Later, OpenBSD changed over to OpenNTPD and so our OpenBSD machines followed along as they were upgraded. Then various Linuxes started switching their default NTP daemon to chrony, and eventually that spread to our usage (first for me personally and then for our servers). These days, when we need to set up a NTP daemon on one of our Ubuntu machines, we reach for chrony. It's what we use on our Ubuntu fileservers and also on an additional machine that we use to provide time to firewalls that are on one of our isolated management subnets.
At the moment this means we have three different NTP daemon implementations running in our environment. An assortment of OpenBSD machines of various versions run various versions of OpenNTPD, a small number of CentOS 7 machines run NTPD version '4.2.6p5' (plus whatever modifications Red Hat has done), and a number of Ubuntu machines run chrony. This has given us some interesting cross comparisons of how all of these work for us in practice, and the quick summary is that chrony is the least troublesome of the three implementations.
Our experience with the CentOS 7 NTPD is that it takes a surprisingly
long time after the daemon is started or restarted (including from
a system reboot) for the daemon to declare that it has good time.
Chrony seems to synchronize faster, or at least be more willing to
declare that it has good time (since what we get to see is mostly
what chrony reports through SNTP).
Chrony also appears to update the system clock the most frequently
out of these three NTP implementations, which turns out to sometimes
matter.
(I don't want to draw any conclusions from our OpenNTPD experience, since our primary experience is with versions that are many years out of date by now.)
I do mildly wish that Linux distributions could agree on where to
put chrony's configuration file; Ubuntu puts it in /etc/chrony
while Fedora just puts it in
/etc. But this only affects me, since
all of our servers with chrony are Ubuntu (although we may someday
get some CentOS 8 servers, which will presumably follow Fedora here).
(Chrony also has the reassuring property that it will retry failed DNS lookups. Normally this is not an issue for us, but we've had two power failures this year where our internal DNS infrastructure wasn't working afterward until various things got fixed. Hopefully this isn't a concern for most people.)
Netplan's interface naming and issues with it
Back in March, I wrote an entry about our problem with Netplan and routes on Ubuntu 18.04. In a comment on the entry, Trent Lloyd wrote a long and quite detailed reply that covered how netplan actually works here. If you use Netplan to any deep level, it is well worth reading in whole. My short and perhaps inaccurate summary is that Netplan is mostly a configuration translation layer on top of networkd, and its translation is relatively blind and brute force. This straight translation then puts limits on what alterations and matchings you can do, because of how Netplan will translate these to networkd directives and how they will work (or not work).
One of the things that this creates is a confusing interface naming problem. Suppose that you have a standardly created Netplan YAML file that looks like this:
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      addresses: [...]
eno1 looks like it is an interface name, but it is actually
two things at once; it is both a Netplan section name (this is my
name for it; Netplan generally calls it a 'device name') and a
network interface name. This section will cause Netplan to create
a file /run/systemd/network/10-netplan-eno1.network (where eno1 is
being used as a section name) that will start out with:
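(Going by netplan's generated files, that start will be a [Match] section keyed on the interface name:)

```ini
[Match]
Name=eno1
```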
My original problem with routes doesn't actually require us to attach routes to an interface by name, as I thought when I wrote the entry. Instead it requires us to attach routes to a Netplan section by name, and it is just that Ubuntu creates a Netplan configuration where the two are silently the same.
(This split is also part of my confusion over what was possible with netplan wildcards. Netplan wildcards are for matching interface names, not section names. Because of how Netplan creates networkd configuration files and how networkd works, all things that are going to apply to a given interface must have the same section name, as I understand the situation.)
Trent Lloyd ends his comment (except for a parenthetical) by asking:
[...] Perhaps we should look at changing the default configurations to show a functional 'name' so that this kind of task is more obvious to the average user?
I endorse this. I think that it would make things clearer and simpler if there was a visible split in the default configuration between the section name and the interface name, so that my previous example would be:
network:
  version: 2
  renderer: networkd
  ethernets:
    mainif:
      match:
        name: eno1
      addresses: [...]
This is more verbose for a simple case, but that is the YAML bed that Netplan has decided to lie in.
This would make it possible to write generic Netplan rules that applied to your main interface regardless of what it was called, and provide silent guidance for what I now feel are the best practices for any additional interfaces you might later set up.
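For example, a separate file could then add routes to whatever 'mainif' is without ever naming the real interface (the file name and route values here are made up for illustration):

```yaml
# Hypothetical /etc/netplan/60-extra-routes.yaml; because it reuses
# the section name 'mainif', Netplan merges it with the main
# definition when generating the networkd configuration.
network:
  version: 2
  ethernets:
    mainif:
      routes:
        - to: 172.16.0.0/16
          via: 10.0.0.1
```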
(Then it would be good to document the merging rules for sections,
such as that you absolutely have to use '
mainif:' (or whatever) for
everything that you are going to merge together and there is no wildcard
matching on that level. In general the Netplan documentation suffers
badly from not actually describing what is actually going on; since what
is actually going on strongly affects what you can do and what will and
won't work, this is a serious issue.)
Another approach would be to allow defining a Netplan level 'section alias', so your section would still be called 'eno1' but it could have the alias of 'mainif', and then any other Netplan configuration for 'mainif' would be folded in to it when Netplan wrote out the networkd configuration files that actually do things.
PS: Since Netplan has two backends, networkd and NetworkManager, your guess is as good as mine for how this would get translated in a NetworkManager based setup. This uncertainty is one of the problems of making Netplan so tightly coupled to its backend in what I will politely call an underdocumented way.
PPS: None of this changes my general opinion of Netplan, which is that I hope it goes away.
The DBus daemon and out of memory conditions (and systemd)
We have a number of systems where for reasons beyond the scope of
this entry, we enable strict overcommit. In
this mode, when you reach the system's memory limits the Linux
kernel will deny memory allocations but usually not trigger the
OOM killer to terminate processes. It's up
to programs to deal with failed memory allocations as best they
can, which doesn't always go very well. In our current setup on the
most common machines we operate this way, we've set the
vm.admin_reserve_kbytes sysctl to reserve enough space for root
so that most or all of our system management scripts keep working
and we at least don't get deluged in email from cron about jobs
failing. This mostly works.
(The sysctl is documented in vm.txt.)
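(Concretely, setting this is one line of sysctl configuration; the value here is an illustration, not our real number:)

```ini
# e.g. in /etc/sysctl.d/10-reserve.conf
vm.admin_reserve_kbytes = 524288
```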
Recently several of these machines hit an interesting failure mode
that required rebooting them, even after the high memory usage had
subsided. The problem is DBus, or more specifically the DBus daemon.
The direct manifestation of the problem is that the DBus daemon logs
an error message:
dbus-daemon: [system] dbus-daemon transaction failed (OOM), sending error to sender inactive
After this error message is logged, attempts to do certain sorts of systemd-related DBus operations hang until they time out (if the software doing them has a timeout). Logins over SSH take quite a while to give you a shell, for example, as they fail to create sessions:
pam_systemd(sshd:session): Failed to create session: Connection timed out
The most relevant problem for us on these machines is that attempts to query metrics from the Prometheus host agent start hanging, likely because we have it set to pull information from systemd and this is done over DBus. Eventually there are enough hung metric probes so that the host agent starts refusing our attempts immediately.
The DBus daemon is not easy to restart (systemd will normally refuse
to let you do it directly, for example), so I haven't found any
good way of clearing this state. So far my method of recovering a
system in this state is to reboot it, which I generally have to do
with 'reboot -f' because a plain 'reboot' hangs (it's probably
trying to talk to systemd over DBus).
I believe that part of what creates this issue is that the DBus
daemon is not protected by vm.admin_reserve_kbytes. The
sysctl specifically reserves space for UID 0 processes, but
dbus-daemon doesn't run as UID 0; it runs as its own UID (often
messagebus), for good security related reasons. As far as I know,
there's no way to protect an arbitrary UID through
vm.admin_reserve_kbytes; it specifically applies only to processes
that hold a relatively powerful Linux security capability,
cap_sys_admin. And unified cgroups (cgroup v2) don't have
a true guaranteed memory reservation, just a best effort one (and
we're using cgroup v1 anyway, which doesn't have anything here).
We're probably making this DBus issue much more likely to happen by having the Prometheus host agent talk to systemd, since this generates DBus traffic every time our Prometheus setup pulls host metrics from the agent (currently, every 15 seconds). At the same time, the systemd information is useful to find services that are dead when they shouldn't be and other problems.
(It would be an improvement if the Prometheus host agent would handle this sort of DBus timeout during queries, but that would only mean we got host metrics back, not that DBus was healthy again.)
PS: For us, all of this is happening on Ubuntu 18.04 with their version of systemd 237 and dbus 1.12.2. However I suspect that this isn't Ubuntu specific. I also doubt that this is systemd specific; I rather suspect that any DBus service using the system bus is potentially affected, and it's just that the most commonly used ones are from systemd and its related services.
(In fact on our Ubuntu 18.04 servers there doesn't seem to be much on the system bus apart from systemd related things, so if there are DBus problems at all, it's going to be experienced with them.)
Ubuntu LTS is (probably) still the best Linux for us and many people
I write a certain amount of unhappy things about Ubuntu here. This is not because I hate Ubuntu, contrary to what it may appear like; I don't write about things that I hate, because I try to think about them as little as possible (cf spam). Ubuntu is a tool for us, and I actually think it is a good tool, which is part of why we use it and keep using it. So today I'm going to write about the attractions of Ubuntu, specifically Ubuntu LTS, for people who want to get stuff done with their servers and for their users without too much fuss and bother (that would be us).
In no particular order:
- It has a long support period, which reduces churn and the make work
of rebuilding and testing a service that is exactly the same
except on top of a new OS and a new version of packages. We
routinely upgrade many of our machines every other LTS version
(which means reinstalling them),
which means we get around four years of life out of a given
install (I wrote about this years ago here).
(We have a whole raft of machines that were installed in the summer and fall of 2016, when 16.04 was fresh, and which will be rebuilt in the summer and fall of 2020 on 20.04.)
- It has a regular and predictable release schedule, which is good for
our planning in various ways. This includes figuring out if we
want to hold off on building a new service up right now so that
we can wait to base it on the next LTS release.
(This regularity and predictability is one reason our Linux ZFS fileservers are based on Ubuntu instead of CentOS. 18.04 was there at the time, and CentOS 8 was unknown and uncertain.)
- It has a large collection of packages (which mostly work, despite my
grumbling). Building local copies of
software is a pain in the rear and we want to do it as little as
possible, ideally not at all.
- It has relatively current software and refreshes its software on a
regular basis (every two years, due to the LTS release cadence),
which lets us avoid the problems caused by using zombie Linux
distributions. This regular refresh is part
of the appeal of the regular and predictable release schedule.
- Since it's popular, it's well supported by software (often along
with Debian). For two examples that are relevant to us, Grafana
provides .debs and Certbot is available through a PPA.
- Debian has made a number of good, sysadmin friendly decisions about how to organize configuration files for applications and Ubuntu has inherited them. For example, they have the right approach to Apache configuration.
I don't know of another Linux distribution that has all of these good things, and that includes both Debian and CentOS (despite what I said about Debian only a year ago). CentOS has very long support but not predictable releases and current software, and even with EPEL's improved state it may not have the package selection. Debian has unpredictable releases and a shorter support period.
(As a purely pragmatic matter we're unlikely to switch to something that is simply about as good as Ubuntu, even if it existed. Since switching or using two Linuxes has real costs, the new thing would have to be clearly better. We do use CentOS for some rare machines because the extremely long support period is useful enough for them.)
The Ubuntu package roulette
Today I got to re-learn a valuable lesson, which is that just because something is packaged in Ubuntu doesn't mean that it actually works. Oh, it's probably not totally broken, but there's absolutely no guarantee that the package will be fully functional or won't contain problems that cause cron to email you errors at least once a day because of an issue that's been known since 2015.
I know the technical reasons for this, which is that Ubuntu pretty much blindly imports packages from Debian and Debian is an anarchy where partially broken packages can rot quietly. Possibly completely non-functional packages can rot too, I don't actually know how Debian handles that sort of situation. Ubuntu's import is mostly blind because Ubuntu doesn't have the people to do any better. This is also where people point out that the package in question is clearly in Ubuntu's universe repository, which the fine documentation euphemistically describes as 'community maintained'.
(I have my opinions on Ubuntu's community nature or lack thereof, but this is not the right entry for that.)
All of this doesn't matter; it is robot logic. What matters is the
experience for people who attempt to use Ubuntu packages. Once you
enable universe (and you probably will),
Ubuntu's command line package management tools don't particularly
make it clear where your packages live (not in the way that Fedora's
dnf clearly names the repository that every package you install
will come from, for example). It's relatively difficult to even see
this after the fact for installed packages. The practical result
is that an Ubuntu package is an Ubuntu package, and so most random
packages are a spin on the roulette wheel with an uncertain bet.
Probably it will pay off, but sometimes you lose.
(And then if you gripe about it, some people may show up to tell you that it's your fault for using something from universe. This is not a great experience for people using Ubuntu, either.)
I'm not particularly angry about this particular case; this is why I set up test machines. I'm used to this sort of thing from Ubuntu. I'm just disappointed, and I'm sad that Ubuntu has created a structure that gives people bad experiences every so often.
(And yes, I blame Ubuntu here, not Debian, for reasons beyond the scope of this entry.)
Understanding when to use and not use the -F option for flock(1)
A while back I wrote some notes on understanding how to use
flock(1), but those notes omitted a potentially
important option, partly because that option was added somewhere
between util-linux version 2.27.1 (which is what Ubuntu
16.04 has) and version 2.31.1 (Ubuntu 18.04). That is the -F
option, which is described in the manpage as:
Do not fork before executing command. Upon execution the flock process is replaced by command which continues to hold the lock. [...]
This option is incompatible with
-o, as mentioned in the manpage.
The straightforward situation where you very much want to use -F
is if you're trying to run a program that reacts specially to
Control-C. If you run 'flock program', there will still be a
flock process, it will get
Control-C and exit, and undesirable things will probably happen.
If you use 'flock -F program', there is only the program and it
can react properly to Control-C without any side effects on other
processes.
(I'm assuming here that if you ran
flock and the program from
inside a shell script, you ran it with '
exec flock ...'. If you're
in a situation where you have to do things in your shell script
after the program finishes, you can't solve the Control-C problem
just with this.)
However, there is also a situation where you don't want to use -F,
and to see it we need to understand how the flock lock continues
to be held by the command. As covered in the first note,
flock(1) works through
flock(2), which means
that the lock is 'held' by having the
flock()'d file descriptor
still be open. Most programs are indifferent to inheriting extra
file descriptors, so this additional descriptor from flock
hangs around, keeping the lock held. However, some programs actively
seek out and close file descriptors they may have inherited, often
to avoid leaking them into child processes. If you use 'flock -F'
with such a program, your lock will be released prematurely (before
the program exits) when the program does this.
(The existence of such programs is probably part of why -F
is not the default behavior.)
Sidebar: Faking '
flock -F' if you don't have it
If you have a shell script that has to run on Ubuntu 16.04 and you
need this behavior, you can fake it yourself. It goes like this:

exec 9>>/some/lockfile
flock -x -n 9 || exit 0
exec program ...
Since 'flock -F' locks some file descriptor and then exec's the
program, we can imitate it by doing the same manually; we pick a
random file descriptor number, get the shell to open a file on that
file descriptor and leave it open,
flock that file descriptor,
and then have the shell exec our program. Our program will inherit
the locked fd 9 and the lock remains for as long as fd 9 is open.
When the program exits, all of its file descriptors will be closed,
including fd 9, and the lock will be released.
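A quick way to convince yourself that the lock really does ride along on the inherited file descriptor is to try to take it again from outside while the exec'd stand-in program is still running (this assumes util-linux's flock is installed):

```shell
lock=/tmp/flock-demo.$$
# Imitate the script: open fd 9 on the lock file, lock it, then exec
# a stand-in 'program' (sleep) that inherits the locked fd.
sh -c "exec 9>>$lock; flock -x -n 9 || exit 1; exec sleep 2" &
sleep 1
# A second, non-blocking attempt on the same file should now fail.
if flock -x -n "$lock" true; then
    msg="lock was not held"
else
    msg="lock still held by the exec'd program"
fi
echo "$msg"
wait
rm -f "$lock"
```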