Wandering Thoughts archives

2023-12-21

Systemd will block a service's start if you manually restart it too fast

Over on the Fediverse, I said something:

Recently I learned that if you manually restart a systemd service too often (with 'systemctl restart ...'), systemd will by default stop starting it:

<x>.service: Start request repeated too quickly.
<x>.service: Failed with result 'start-limit-hit'.
Failed to start <x>.service - Whatever it is.

Why would you do that, you ask? Well, consider scripts that update some data file and do a 'systemctl restart ...' to make the daemon notice it. Now try to do a bunch of updates all at once.

The traditional way to have systemd stop starting a service is for it to have a 'Restart=' setting with no restart delay, and then to fail on startup. Sometimes it's failing on start because your machine is out of memory; sometimes it's because you've made an error in its configuration files. However, if you read the actual documentation for StartLimitIntervalSec and StartLimitBurst, they don't say they're limited to the 'Restart=' case. Here's what they say; the part that matters is the final clause of the second paragraph:

Configure unit start rate limiting. Units which are started more than burst times within an interval time span are not permitted to start any more. [...]

These configuration options are particularly useful in conjunction with the service setting Restart= (see systemd.service(5)); however, they apply to all kinds of starts (including manual), not just those triggered by the Restart= logic.

The way you clear this condition is also sort of mentioned in that section of the manual page; 'systemctl reset-failed' will reset this counter and allow you to immediately (re)start the unit again. If you want, you can restrict the resetting to just your particular unit.
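As a minimal sketch of the per-unit form, the shell function below clears the condition for a single unit instead of for everything. The function name 'clear_start_limit' is made up for illustration; the only real command involved is 'systemctl reset-failed' with a unit argument.

```shell
# clear_start_limit is a hypothetical helper name, not part of systemd.
# It resets the failed state (and with it the start rate limit counter)
# of just the named unit, leaving any other failed units alone.
clear_start_limit() {
    unit="$1"
    if [ -z "$unit" ]; then
        echo "usage: clear_start_limit <unit>" >&2
        return 2
    fi
    systemctl reset-failed "$unit"
}
```

You'd call it as 'clear_start_limit <x>.service' (probably as root), after which a 'systemctl restart' of that unit should be allowed again.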

The default limits for this rate limiting are likely visible in the commented out default values in /etc/systemd/system.conf. The standard values are five starts in ten seconds, and it appears that neither Fedora nor Ubuntu change these defaults, so that's probably what you'll see.
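If you'd rather ask systemd than read commented out defaults, something like the following sketch should report the limits actually in effect. The function name 'show_start_limits' is hypothetical, and I'm assuming the property names that 'systemctl show' exposes for these settings (StartLimitBurst and StartLimitIntervalUSec on units, with Default... versions on the manager) are the same on your systemd version.

```shell
# show_start_limits is a hypothetical helper name.
# With a unit argument it prints that unit's effective start limits;
# with no argument it prints the manager-wide defaults.
show_start_limits() {
    if [ "$#" -gt 0 ]; then
        systemctl show -p StartLimitBurst -p StartLimitIntervalUSec "$1"
    else
        systemctl show -p DefaultStartLimitBurst -p DefaultStartLimitIntervalUSec
    fi
}
```

On a machine with unchanged defaults I'd expect both forms to report a burst of 5 and an interval of ten seconds, but check your own system rather than trusting that.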

You might wonder how you get yourself into this situation in the first place. Suppose that you have a script to add an entry to a DHCP configuration file, which as part of activating the entry has to restart the DHCP server (because it doesn't support on-the-fly configuration reloading). Now suppose you have a bunch of entries to add; you might write a script (or a for loop) to effectively bulk add them as fast as the commands can run. When you run that script, you'll be restarting the DHCP server repeatedly, as fast as possible, and it won't take too long before you trigger systemd's default limit. With the default of five starts in ten seconds, all you need is for each invocation to take less than two seconds on average.

If you're doing this in a script, the two solutions I see are to always make the script sleep for three seconds or so after a restart, or to run 'systemctl reset-failed <service>' either at the end of the script or before you start doing any 'systemctl restart's.

(I'm not sure which of these we'll adopt.)
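Both options can be sketched as small shell functions that a bulk update script would call in place of a bare 'systemctl restart'. Everything named here (UNIT, PAUSE, and the two function names) is hypothetical, and the second function resets the counter before every restart, which is a slight variation on doing it once at the start or end of the script.

```shell
# Hypothetical names throughout: UNIT, PAUSE, and both functions.
UNIT="${UNIT:-example.service}"   # placeholder unit name
PAUSE="${PAUSE:-3}"               # seconds to wait after each restart

# Option 1: restart, then wait long enough that back-to-back calls
# stay under the default limit of five starts in ten seconds.
restart_paced() {
    systemctl restart "$UNIT" || return
    sleep "$PAUSE"
}

# Option 2: flush the unit's start rate limit counter, then restart.
restart_after_reset() {
    systemctl reset-failed "$UNIT" || return
    systemctl restart "$UNIT"
}
```

The sleeping version makes a bulk run of N updates take at least three times N seconds; the resetting version stays fast but gives up the protection that the rate limit was providing.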

linux/SystemdStallAfterTooFastRestarts written at 22:44:58

