Systemd will block a service's start if you manually restart it too fast
Over on the Fediverse, I said something:
Recently I learned that if you manually restart a systemd service too often (with 'systemctl restart ...'), systemd will by default stop starting it:
    <x>.service: Start request repeated too quickly.
    <x>.service: Failed with result 'start-limit-hit'.
    Failed to start <x>.service - Whatever it is.

Why would you do that, you ask? Well, consider scripts that update some data file and do a 'systemctl restart ...' to make the daemon notice it. Now try to do a bunch of updates all at once.
The traditional way to have systemd stop starting a service is for the service to have a 'Restart=' setting with no restart delay and then fail on startup. Sometimes it fails on start because your machine is out of memory; sometimes it's because you've made an error in its configuration files. However, if you read the actual documentation for StartLimitIntervalSec and StartLimitBurst, they don't say they're limited to the 'Restart=' case. Here's what they say, emphasis mine:
Configure unit start rate limiting. Units which are started more than burst times within an interval time span are not permitted to start any more. [...]
These configuration options are particularly useful in conjunction with the service setting Restart= (see systemd.service(5)); however, they apply to all kinds of starts (including manual), not just those triggered by the Restart= logic.
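To make the quoted rule concrete, here is a small Python model of a start rate limiter. It's a sketch under assumptions, not systemd's actual code: I'm assuming a simple fixed-window counter, where the window opens at the first start and the count starts over once more than the interval has passed since then. 'burst' and 'interval' stand in for StartLimitBurst= and StartLimitIntervalSec=; the class and method names are made up for illustration. Note that every start attempt counts, whoever makes it.

```python
import time


class StartRateLimiter:
    """Toy fixed-window model of StartLimitBurst= / StartLimitIntervalSec=.

    Not systemd's code; a simplified illustration of the documented rule.
    """

    def __init__(self, burst=5, interval=10.0, clock=time.monotonic):
        self.burst = burst        # stands in for StartLimitBurst=
        self.interval = interval  # stands in for StartLimitIntervalSec=, in seconds
        self.clock = clock        # injectable so the demo can use fake time
        self.begin = None         # when the current window opened
        self.count = 0            # starts seen in the current window

    def allow_start(self):
        """Record a start attempt (manual or automatic) and say if it may run."""
        now = self.clock()
        if self.begin is None or now - self.begin > self.interval:
            # First start, or the old window has expired: open a new window.
            self.begin = now
            self.count = 1
            return True
        self.count += 1
        return self.count <= self.burst

    def reset(self):
        """Forget the window, roughly what 'systemctl reset-failed' does."""
        self.begin = None
        self.count = 0


# Demo: one manual restart per second, using a fake clock.
fake_now = [0.0]
limiter = StartRateLimiter(clock=lambda: fake_now[0])
results = []
for second in range(7):
    fake_now[0] = float(second)
    results.append(limiter.allow_start())
print(results)  # [True, True, True, True, True, False, False]
```

With five starts allowed per ten seconds, the sixth restart a second apart is refused, and in this model reset() is what lets the next start through immediately.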
The way you clear this condition is also sort of mentioned in that section of the manual page; 'systemctl reset-failed' will reset this counter and allow you to immediately (re)start the unit again. If you want, you can restrict the resetting to just your particular unit by naming it, as in 'systemctl reset-failed <x>.service'.
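If a script wants to clear the condition only when it has actually happened, one approach is to check the unit's Result property, which for a service should read 'start-limit-hit' after this failure. This is a sketch that assumes 'systemctl show --property=Result --value' is available (it is on reasonably current systemd); the function names are hypothetical, and the 'run' parameter exists only so the systemctl calls can be swapped out for testing.

```python
import subprocess


def unit_result(unit, run=subprocess.run):
    """Return the unit's Result property, e.g. 'success' or 'start-limit-hit'."""
    proc = run(
        ["systemctl", "show", "--property=Result", "--value", unit],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def clear_start_limit(unit, run=subprocess.run):
    """Run 'systemctl reset-failed UNIT' only if UNIT hit its start limit.

    Returns True if a reset was done, False otherwise.
    """
    if unit_result(unit, run) != "start-limit-hit":
        return False
    run(["systemctl", "reset-failed", unit], check=True)
    return True
```

For example, an update script could call clear_start_limit("dhcpd.service") (a hypothetical unit name) before its own restart.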
The default limits for this rate limiting are likely visible in the commented-out default values (DefaultStartLimitIntervalSec= and DefaultStartLimitBurst=) in /etc/systemd/system.conf. The normal standard values are five starts in ten seconds (cf), and it appears that neither Fedora nor Ubuntu changes these defaults, so that's probably what you'll see.
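If you'd rather check than assume, the commented-out defaults in system.conf are lines like '#DefaultStartLimitIntervalSec=10s' and '#DefaultStartLimitBurst=5'. Here's a rough Python sketch that pulls them out and computes interval divided by burst, which under the fixed-window model is the average spacing at or below which a steady stream of restarts will trip the limit. It assumes the interval is a plain number of seconds (optionally with an 's' suffix) and ignores any drop-ins under /etc/systemd/system.conf.d/; the function names are made up.

```python
import re

# Matches 'DefaultStartLimitIntervalSec=10s' or 'DefaultStartLimitBurst=5',
# whether or not the line is commented out with a leading '#'.
SETTING_RE = re.compile(
    r"^\s*#?\s*(DefaultStartLimitIntervalSec|DefaultStartLimitBurst)\s*=\s*(\S+)\s*$"
)


def start_limit_settings(conf_text):
    """Return {setting: value} for the two start-limit settings found.

    Later lines win, so an uncommented override that follows the
    commented-out default replaces it.
    """
    found = {}
    for line in conf_text.splitlines():
        match = SETTING_RE.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def to_seconds(span):
    """Parse '10' or '10s'; other systemd time units ('min', 'ms', ...)
    are not handled in this sketch and raise ValueError."""
    return float(span[:-1] if span.endswith("s") else span)


def spacing_threshold(conf_text, default_interval="10s", default_burst="5"):
    """Average seconds per restart at or below which a steady stream of
    restarts hits the limit (interval / burst)."""
    settings = start_limit_settings(conf_text)
    interval = to_seconds(settings.get("DefaultStartLimitIntervalSec", default_interval))
    burst = int(settings.get("DefaultStartLimitBurst", default_burst))
    return interval / burst


sample = """[Manager]
#DefaultStartLimitIntervalSec=10s
#DefaultStartLimitBurst=5
"""
print(spacing_threshold(sample))  # 2.0
```

On a real machine you'd pass it the contents of /etc/systemd/system.conf; with stock defaults it should come out to two seconds per restart.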
You might wonder how you get yourself into this situation in the first place. Suppose that you have a script to add an entry to a DHCP configuration file, which as part of activating the entry has to restart the DHCP server (because it doesn't support on the fly configuration reloading). Now suppose you have a bunch of entries to add; you might write a script (or a for loop) to effectively bulk add them as fast as the commands can run. When you run that script, you'll be restarting the DHCP server repeatedly, as fast as possible, and it won't take long before you trigger systemd's default limit (with the default limits, all it takes is for each invocation to average less than two seconds).
If you're doing this in a script, the two solutions I see are to always make the script sleep for three seconds or so after a restart, or to run 'systemctl reset-failed <service>' either at the end of the script or before you start doing any 'systemctl restart's.
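Here is a minimal Python sketch of the bulk-add loop with either fix applied. The add_entry callable, the unit name, and the three-second pause are placeholders (the pause mirrors the 'three seconds or so' above), and 'run' and 'sleep' are parameters only so the sketch can be exercised without touching a real system. In 'reset' mode it runs 'systemctl reset-failed' before every restart, which is the 'before any restarts' option applied to each invocation.

```python
import subprocess
import time


def restart_unit(unit, mode="sleep", pause=3.0, run=subprocess.run, sleep=time.sleep):
    """Restart UNIT in a way that shouldn't trip the start rate limit.

    mode="sleep": restart, then pause so restarts stay spread out.
    mode="reset": clear the unit's start-limit counter with
                  'systemctl reset-failed UNIT' before restarting.
    """
    if mode not in ("sleep", "reset"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "reset":
        run(["systemctl", "reset-failed", unit], check=True)
    run(["systemctl", "restart", unit], check=True)
    if mode == "sleep":
        sleep(pause)


def bulk_add(entries, add_entry, unit, **restart_options):
    """Add each entry (via the hypothetical add_entry callable), restarting
    UNIT after every one, as a naive per-entry script would."""
    for entry in entries:
        add_entry(entry)
        restart_unit(unit, **restart_options)
```

The sleep option is the more conservative one, since it stays under the limit instead of repeatedly clearing it, but it makes a bulk run of N entries take at least 3N seconds.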
(I'm not sure which of these we'll adopt.)