How to get per-user fair share scheduling on Ubuntu 16.04 (with systemd)
When I wrote up imposing temporary CPU and memory limits on a user on Ubuntu 16.04, I sort of discovered that I had turned on per-user fair share CPU scheduling as a side effect, although I didn't understand exactly how to do this deliberately. Armed with a deeper understanding of how to tell if fair share scheduling was on, I've now done a number of further experiments and I believe I have definitive answers. This applies only to Ubuntu 16.04 and its version of systemd as configured by Ubuntu; it doesn't seem to apply to, for example, a stock Fedora 26 system.
To enable per user fair share CPU scheduling, it appears that you must do two things:
- First, set CPUAccounting=true on user.slice. You can do this temporarily with 'systemctl --runtime set-property' or permanently enable it.
- Second, arrange to have CPUAccounting=true set on an active user slice. If you do this temporarily with 'systemctl --runtime', the user must be logged in with some sort of session at the time. If you do this permanently, nothing happens until that user logs in and systemd creates their user-${UID}.slice slice.
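Concretely, the runtime version of these two steps looks something like the following (user-1000.slice here is a placeholder for whichever logged-in user's slice you want to set; substitute the real UID):

```shell
# Step 1: turn on CPU accounting for the top-level user slice.
systemctl --runtime set-property user.slice CPUAccounting=true

# Step 2: turn it on for a currently active per-user slice; 1000 is a
# placeholder UID for a user who is logged in right now.
systemctl --runtime set-property user-1000.slice CPUAccounting=true

# You can verify that the property took effect:
systemctl show user.slice -p CPUAccounting
```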
Once you've done both of these, all future (user) sessions from any user will have their processes included in per-user fair share scheduling. If you used 'systemctl --runtime' on a user-${UID}.slice, it doesn't matter if that user logs completely out and their slice goes away; the fair share scheduling sticks despite this. However, fair-share scheduling goes away if all users log out and user.slice is removed by systemd. You need at least one remaining user session at all times to keep user.slice in use (a detached screen session will do).
If you want to force existing processes to be subject to per-user
fair share scheduling, you must arrange to set CPUAccounting=true
on all current user scopes:
for i in $(systemctl -t scope list-units | awk '{print $1}' | grep '^session-.*\.scope$'); do systemctl --runtime set-property $i CPUAccounting=true; done
This creates a slightly different cgroup hierarchy than you'll get from completely proper fair share scheduling, but the differences are probably unimportant in practice. In regular fair share scheduling, all processes from the same user are grouped together under user.slice/user-${UID}.slice, so they contend evenly with each other. When you force scopes this way, processes get grouped into their scopes, so they go in user.slice/user-${UID}.slice/session-<blah>.scope; as a result, a user's scopes are also fair-share scheduled against each other. This only applies to current processes and scopes; as users log out and then back in again, their new processes will all be grouped together.
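As an aside, the scope-matching pipeline in that loop can be exercised offline. This sketch runs the same awk-and-grep filter over some hypothetical 'systemctl -t scope list-units' output (the session names and users here are made up for illustration):

```shell
# Hypothetical 'systemctl -t scope list-units' output; the made-up
# sample lets us check the filtering without a running systemd.
sample='session-c1.scope   loaded active running Session c1 of user fred
session-42.scope   loaded active running Session 42 of user barney
init.scope         loaded active running System and Service Manager'

# The same pipeline the loop uses: take the first field and keep only
# the session scopes, dropping things like init.scope.
printf '%s\n' "$sample" | awk '{print $1}' | grep '^session-.*\.scope$'
```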
If you have a sufficiently small number of users who will log in to your machines and run CPU-consuming things, it's feasible to create permanent settings for each of them with 'systemctl set-property user-${UID}.slice CPUAccounting=true'. If you have lots of users, as we do, this is infeasible; if nothing else, your /etc/systemd/system directory would wind up utterly cluttered. This means that you have to do it on the fly (and then do it again if all user sessions end and systemd deletes user.slice).
This is where we run into an important limitation of per-user fair share scheduling on a normally configured Ubuntu 16.04. As we've set fair-share scheduling up, it only applies to processes that are under user.slice; system processes are not fair-share scheduled. It turns out that user cron jobs don't run under user.slice and so are not fair-share scheduled. All processes created by user cron entries wind up grouped together under cron.service; there is no per-user separation and nothing is put under user slices. (It's possible that you can change this with PAM magic, but this is how a normal Ubuntu 16.04 machine behaves.)
I discovered this because I had the clever idea that I could use a root @reboot /etc/cron.d entry to set things on user.slice and user-0.slice shortly after the system booted. Attempting to do this led to the discovery that neither slice actually existed when my @reboot job ran, and that my process was under cron.service instead. As far as I can see there's no way around this; there just doesn't seem to be a systemd command that will run a command for you under a user slice.
(If there were, you could make a root @reboot crontab that ran the necessary systemctl commands and then didn't exit, so there would always be an active user slice and user.slice wouldn't get removed by systemd.)
PS: My solution was to wrap up all of these steps into a shell script that we can run if we need to turn on fair-share scheduling on some machine because a bunch of users are contending over it. Such an on-demand, on-the-fly solution is good enough for our case (even if it doesn't include crontab jobs, which is a real pity for some machines).
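For what it's worth, the heart of such a script is just the pieces above strung together. This sketch (the ordering and the exact unit filtering are my own choices, not a canonical recipe) is roughly what it has to do:

```shell
#!/bin/sh
# Enable per-user fair share CPU scheduling on the fly. Must run as
# root; all settings are runtime-only and evaporate if user.slice
# itself goes away.

# 1: CPU accounting on the top-level user slice.
systemctl --runtime set-property user.slice CPUAccounting=true

# 2: CPU accounting on every currently active per-user slice, so that
# future sessions of those users are covered.
for u in $(systemctl -t slice list-units | awk '{print $1}' |
           grep '^user-[0-9]*\.slice$'); do
    systemctl --runtime set-property "$u" CPUAccounting=true
done

# 3: sweep up existing session scopes so current processes are
# covered too (see the caveat about the slightly different hierarchy).
for s in $(systemctl -t scope list-units | awk '{print $1}' |
           grep '^session-.*\.scope$'); do
    systemctl --runtime set-property "$s" CPUAccounting=true
done
```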