== Setting up Linux fair share CPU scheduling with systemd and cgroup v2

These days, modern versions of systemd on modern Linuxes, including the recently released Ubuntu 22.04, use [[unified cgroups (cgroup v2) https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html]]. Enabling fair share CPU scheduling in this environment works differently from [[how it used to work with systemd using cgroup v1 SystemdFairshareScheduling]]. How this currently works on Ubuntu 22.04 with systemd 249 is sufficiently 'clever' that it may well change in the future.

[[In cgroup v2, fair share CPU scheduling for a cgroup is enabled by enabling the 'cpu' controller in that cgroup CgroupV2FairShareScheduling]]. However, systemd doesn't provide any good direct way to enable specific cgroup controllers; instead it seems to enable them when it thinks it needs them because of some property that you've set. In the case of the cpu controller, you get it enabled in a specific cgroup by setting [[_CPUWeight_ https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#CPUWeight=weight]] to some value on a child unit. Normally you'll want to set _CPUWeight_ to the default value of '100', so that the child unit and all of its peers get the same, predictable value for ((cpu.weight)).

If you want to enable fair share scheduling across users, you need to set _CPUWeight_ on some user-<uid>.slice so that the user.slice cgroup gets the cpu controller enabled. Of course, this requires such a user-<uid>.slice to exist in the first place, which generally means that you're going to need to hook into session setup, for example through [[``pam_exec'' https://man7.org/linux/man-pages/man8/pam_exec.8.html]]. As before, I believe that if everyone logs off, user.slice itself will disappear and so you'll have to re-establish this setting the next time around. Otherwise, as long as user.slice persists, the cpu controller stays enabled (as far as I know).

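As a rough illustration of the session-setup hook described above, here is a minimal sketch of a script that ``pam_exec'' could run. It is not a tested production hook. It assumes that it runs as root at session open, after the user's slice already exists, and it sets the weight only at runtime rather than permanently. The function names (_slice_for_uid_, _build_command_) are made up for this example; the only outside interfaces it relies on are the PAM_TYPE and PAM_USER environment variables that pam_exec exports and '_systemctl --runtime set-property_'.

```python
#!/usr/bin/env python3
"""Hypothetical pam_exec session hook: set CPUWeight on the user's slice."""
import os
import pwd
import subprocess
import sys


def slice_for_uid(uid):
    # systemd names per-user slices user-<uid>.slice
    return f"user-{uid}.slice"


def build_command(uid, weight=100):
    # --runtime makes the setting non-permanent (it does not survive a reboot)
    return ["systemctl", "--runtime", "set-property",
            slice_for_uid(uid), f"CPUWeight={weight}"]


def main():
    # pam_exec exports PAM_TYPE and PAM_USER to the command it runs
    if os.environ.get("PAM_TYPE") != "open_session":
        return 0
    user = os.environ.get("PAM_USER")
    if not user:
        return 0
    try:
        uid = pwd.getpwnam(user).pw_uid
    except KeyError:
        return 0
    try:
        # Assumption: a failure here should never block the login,
        # so the exit status of systemctl is deliberately ignored.
        subprocess.run(build_command(uid), check=False)
    except OSError:
        pass
    return 0


# Only act when actually invoked by pam_exec.
if __name__ == "__main__" and "PAM_TYPE" in os.environ:
    sys.exit(main())
```

A script like this would be referenced from a 'session optional pam_exec.so ...' line in the relevant PAM service file. Where exactly that line has to go relative to pam_systemd so that the slice already exists is something you would need to check on your own system.
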
Locally, we are using '_systemctl --runtime set-property ..._' so that _CPUWeight_ is set only until the next reboot instead of permanently. Otherwise I fear we would wind up with a thicket of permanent settings on various users' slices, one for whoever happened to be the first to log in each time around.

(In earlier versions of systemd on cgroup v1, it was sufficient to turn on CPU accounting for some user slice, or sometimes a few of them. These days CPU accounting seems to default to on without enabling the 'cpu' controller, and turning it on again doesn't do anything. Possibly this is because cgroup v2 seems to track the CPU usage of cgroups even if the 'cpu' controller isn't enabled, so systemd decides to say that CPU accounting is always on.)

If for some reason you want to enable fair share scheduling across system services, you can pick a service that's always going to be there and set _CPUWeight=100_ permanently on it.

I don't know how you'd arrange to set up fair share scheduling for virtual machines and containers (under machine.slice). Possibly you could set permanent _CPUWeight_ properties on all of your long-term VMs and containers, so that at least one of them would be active and trigger machine.slice having the cpu controller enabled on it.

(It would be cleaner if systemd provided a direct way to enable a particular resource controller in a unit like system.slice, user.slice, or machine.slice, but so it goes.)

If you're enabling fair share scheduling for users (ie, for children of user.slice), then by default everything under system.slice is fair share scheduled against everything under user.slice (ie, each collectively gets half of the available CPU). If you want system services to get CPU priority instead, you'll need to set an explicit _CPUWeight_ for either system.slice or user.slice. It's probably easier to do this for system.slice, since it's always going to be there. I'm not sure what value I'd set.

(I suppose you can think of it in terms of how many of the machine's CPUs you want all users together to be able to use under high CPU contention. For example, if your machine has four CPUs and you'd like system services to collectively get three of them under load, you can set system.slice's _CPUWeight_ to 300, leaving user.slice at the default of 100. This assumes you don't have VMs that are also contending for CPU under machine.slice.)
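
The arithmetic behind that example can be sketched in a few lines of Python. This assumes that when every sibling cgroup is fully busy, CPU time is divided among them in proportion to their ((cpu.weight)) values, and that user.slice is left at the default weight of 100. The helper name _cpus_under_contention_ is invented for this illustration.

```python
def cpus_under_contention(weights, ncpus):
    """Approximate CPUs each sibling cgroup gets when all of them are busy.

    weights maps a cgroup name to its CPUWeight; ncpus is the CPU count.
    """
    total = sum(weights.values())
    return {name: ncpus * w / total for name, w in weights.items()}


# The example from the text: four CPUs, system.slice at 300, user.slice at 100.
shares = cpus_under_contention({"system.slice": 300, "user.slice": 100}, ncpus=4)
print(shares)  # {'system.slice': 3.0, 'user.slice': 1.0}
```

If a busy machine.slice at the default weight of 100 joins in, the same call gives system.slice 2.4 CPUs and the other two 0.8 CPUs each, which is why the caveat about VMs matters.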