If you can, you should use flock(1) for shell script locking

August 2, 2019

I have in the past worked out and used complicated but portable approaches for doing locking in shell scripts. These approaches get much more complicated if processes can die abruptly without undoing their lock. You can generally arrange things so that your locks are cleared if the entire machine reboots, but that's about it as far as simple approaches go. Sometimes this is what you want, but often it isn't.

As a result of a series of issues with our traditional shell script locking, I have increasingly been moving to Linux's flock(1) when I can, which is to say for scripts that only have to run on our Linux machines (which is almost all of our machines today). flock is sufficiently useful and compelling here that I might actually port it over to other Unixes if we had to integrate such systems into our current Linux environment.

(Anything we want to use should have flock(2), and hopefully that's the only thing the flock program really depends on.)

There are two strongly appealing sides to flock. The first is that it provides basically the usage that we want; in normal operation, it runs something with the lock held and releases the lock when the thing exits. The second is that it automatically releases the lock if something goes wrong, because a flock(2) lock evaporates once every file descriptor for the locked open file has been closed, and the kernel closes a process's file descriptors when the process exits, no matter how it exits.
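As a minimal sketch of the normal usage described above, here is flock wrapping a single command. The lock file path is made up for illustration (a temporary file, so the example is self-contained); a real script would use a fixed, well-known path that every instance agrees on.

```shell
# Hypothetical lock file; a temp file keeps the demo self-contained.
# A real script would use one fixed path shared by all instances.
lockfile=$(mktemp)

# flock opens the lock file, takes an exclusive lock, runs the command,
# and exits with the command's exit status. The lock goes away when the
# processes holding the file descriptor exit.
flock "$lockfile" sh -c 'echo "working with the lock held"'
status=$?

echo "command exited with status $status"
rm -f "$lockfile"
```

There is no explicit unlock step and no cleanup handler for the lock itself, which is most of the appeal.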

(The manpage's description of '-o' may confuse you about this; what it means is that with '-o', the open file descriptor of the lock is not inherited by the command flock runs. Normally you want the command to inherit the open file descriptor, because it means that so long as any process involved is still running, the lock is held, even if flock itself gets killed for some reason.)
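One way to see the difference that '-o' makes is to have the locked command leave a background process behind. This is only a sketch under assumptions: the lock files are temporary files invented for the demo, and the two-second sleep stands in for any lingering child process.

```shell
# Two hypothetical lock files, one per experiment.
lock_a=$(mktemp)
lock_b=$(mktemp)

# Without -o: the background sleep inherits the lock's file descriptor,
# so the lock stays held after sh (and flock itself) have exited.
flock "$lock_a" sh -c 'sleep 2 >/dev/null 2>&1 &'
if flock -n "$lock_a" true; then without_o=free; else without_o=held; fi

# With -o: flock closes the descriptor before running the command, so
# nothing is left holding the lock once flock itself exits.
flock -o "$lock_b" sh -c 'sleep 2 >/dev/null 2>&1 &'
if flock -n "$lock_b" true; then with_o=free; else with_o=held; fi

echo "without -o the lock is $without_o; with -o the lock is $with_o"
rm -f "$lock_a" "$lock_b"
```

For 'only one of these at a time' locking, the first behavior is usually the one you want, since a lingering child still counts as the job being active.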

Generally I want to use 'flock -n', because we mostly use locking for 'only one of these should ever be running at once'; if the lock is held, a previous cron job or whatever is still active, so the current one should just give up.
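A rough sketch of the 'flock -n' pattern follows. To make it deterministic, the script first takes the lock on its own file descriptor (9, an arbitrary choice) to play the part of a still-running previous job; the lock file path and the echo commands are placeholders, not anything from a real cron job.

```shell
# Hypothetical lock file; real cron jobs would share one fixed path.
lockfile=$(mktemp)

# Simulate a previous job that is still running: hold the lock on fd 9.
exec 9>"$lockfile"
flock -n 9

# A new run with -n does not wait. It fails immediately and does not
# run the command (flock's default exit status here is 1).
flock -n "$lockfile" echo "job ran while lock was busy"
busy_status=$?

# Closing fd 9 releases the lock, as if the previous job had finished.
exec 9>&-

flock -n "$lockfile" echo "job ran once the lock was free"
free_status=$?

echo "busy attempt: $busy_status, free attempt: $free_status"
rm -f "$lockfile"
```

In a crontab this typically collapses to a single 'flock -n lockfile command' line, with a nonzero exit status meaning either that the command failed or that the previous run is still going.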

We have one script using a traditional shell script approach to locking that I very carefully and painfully revised to be more or less safe in the face of getting killed abruptly. Since it logs diagnostics if it detects a stale lock, there's a certain amount of use in having it around, but I definitely don't want to ever have to do another script like it, and it's a special case in some other ways that might make it awkward to use with flock. The experience of revising that script is part of what pushed me very strongly to using flock for others.


Last modified: Fri Aug 2 22:06:18 2019