How I think you set up fair share scheduling under systemd
When I started writing this entry, I was going to say that systemd automatically does fair share scheduling between users and describe the mechanisms that make that work. However, this turns out to be false as far as I can see; systemd can easily do fair share scheduling, but it doesn't do this by default.
The basic mechanics of fair share scheduling are straightforward.
If you put all of each user's processes into a separate cgroup it
happens automatically. Well. Sort of. You see,
it's not good enough to put each user into a separate cgroup; you
have to make it a CPU accounting cgroup, and a memory accounting
cgroup, and so on. Systemd normally puts all processes for a single
user under a single cgroup, which you can see by looking at the
cgroup hierarchy, but
by default it doesn't enable any CPU or memory or IO accounting for
them. Without those enabled, the traditional Linux (and Unix)
behavior of 'every process for itself' still applies.
(You can still use
systemd-run to add your own limits here, but I'm not quite sure how this works.)
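One way to check whether a process is only in systemd's own per-user grouping or is also grouped for CPU purposes is to read /proc/self/cgroup. The following is a minimal sketch, assuming the cgroup v1 layout where each line is `hierarchy-id:controllers:path`; the sample text is made up for illustration and is not output from a real machine.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Map each controller (or named hierarchy) to the cgroup path it reports.

    Each line looks like 'ID:controller[,controller...]:path'.
    """
    controllers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hier_id, names, path = line.split(":", 2)
        for name in names.split(","):
            controllers[name] = path
    return controllers


# Hypothetical sample: a per-user path in systemd's own hierarchy, but the
# CPU controllers still at the root, i.e. no per-user CPU grouping.
SAMPLE = (
    "4:cpu,cpuacct:/\n"
    "1:name=systemd:/user.slice/user-1000.slice/session-2.scope\n"
)

if __name__ == "__main__":
    proc_file = Path("/proc/self/cgroup")
    text = proc_file.read_text() if proc_file.exists() else SAMPLE
    for name, path in sorted(parse_proc_cgroup(text).items()):
        print(f"{name or '(unified)'}: {path}")
```

If the cpu entry reports `/` while the systemd entry reports a per-user path, the processes are grouped by user for bookkeeping but still compete for CPU individually.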
Now, I haven't tested the following, but from reading the documentation
it seems that what you need to do to get fair share scheduling for
users is to enable the relevant accounting
for all user units by creating an appropriate file in
/etc/systemd/user.conf.d, as covered in the systemd-user.conf
and systemd.resource-control manpages.
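As an untested sketch of what such a drop-in might contain, the snippet below renders a `[Manager]` section. The option names DefaultCPUAccounting and DefaultBlockIOAccounting are my assumption about which settings are meant (newer systemd versions use DefaultIOAccounting instead of the BlockIO name), and the file name in the comment is hypothetical.

```python
import configparser
import io


def render_user_dropin(settings):
    """Render a systemd-style [Manager] drop-in from a dict of options."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep systemd's CamelCase option names
    parser["Manager"] = settings
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Assumed option names; check systemd-user.conf(5) for your systemd version.
DROPIN = render_user_dropin({
    "DefaultCPUAccounting": "yes",
    "DefaultBlockIOAccounting": "yes",
})

if __name__ == "__main__":
    # Would be saved as e.g. /etc/systemd/user.conf.d/50-accounting.conf
    # (hypothetical file name).
    print(DROPIN, end="")
```

The sketch only builds the text and leaves writing it into /etc/systemd/user.conf.d as a deliberate manual step.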
You probably don't want to turn this on for system units. I also
don't think there's any point in turning on memory accounting.
As far as I can see there is no kernel control that limits a cgroup's
share of RAM, just the total amount of RAM it can use, so cgroups
just can't enforce a fair share scheduling of RAM the way you can
for CPU time (unless I've overlooked something here). Unfortunately,
missing fair share memory allocation definitely hurts the overall
usefulness of fair share scheduling; if you want to ensure that no
user can take an 'unfair' share of the machine, it's often just as
important to limit RAM as CPU usage.
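The difference between a share and a cap can be shown with a toy model. This is an idealized sketch, not how the kernel actually computes anything: CPU weights divide the machine among whichever cgroups are currently busy, while a memory limit is a fixed ceiling that doesn't move no matter who else is active. The user names and numbers are made up.

```python
def cpu_fractions(weights, active):
    """Idealized CPU split: weight / (sum of weights of active cgroups)."""
    total = sum(weights[name] for name in active)
    return {
        name: (weights[name] / total if name in active else 0.0)
        for name in weights
    }


def memory_ceilings(limits, active):
    """A memory limit is an absolute cap; 'active' deliberately has no effect."""
    return dict(limits)


WEIGHTS = {"alice": 1024, "bob": 1024, "carol": 1024}
LIMITS_GB = {"alice": 8, "bob": 8, "carol": 8}

if __name__ == "__main__":
    for active in ({"alice"}, {"alice", "bob"}, {"alice", "bob", "carol"}):
        print(sorted(active), cpu_fractions(WEIGHTS, active),
              memory_ceilings(LIMITS_GB, active))
```

With only alice busy she gets all of the CPU but still only her fixed memory ceiling; that asymmetry is the missing 'fair share of RAM'.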
(Having discovered this memory limitation, I suspect that we won't bother trying to enable fair share scheduling in our Ubuntu 16.04 installs.)
The state of supporting many groups over NFS v3 in various Unixes
One of the long standing limits with NFSv3 is that the protocol only uses 16 groups; although you can be in lots of groups on both the client and the server, the protocol itself only allows the client to tell the server about 16 of them. This is a real problem for places (like us) who have users who want or need to be in lots of groups for access restriction reasons.
For a long time the only thing you could do was shrug and work around this by adding and removing users from groups as their needs changed. Fortunately this has been slowly changing, partly because people have long seen this as an issue. Because the NFS v3 protocol is fixed, everyone's workaround is fundamentally the same: rather than taking the list of groups from the NFS request itself, the NFS server looks up what groups the user is in on the server.
(In theory you could merge the local group list with the request's group list, but I don't think anyone does that; they just entirely overwrite the request.)
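The workaround can be sketched in a few lines. This is an illustration of the idea only, not any real NFS server's code: the function names are made up, and falling back to the request's groups when the server doesn't know the UID is my assumption, not documented behavior of any implementation.

```python
import os
import pwd

AUTH_SYS_MAX_GIDS = 16  # supplementary groups the request can carry


def truncate_for_wire(gids):
    """What the client can actually send: at most 16 supplementary GIDs."""
    return list(gids)[:AUTH_SYS_MAX_GIDS]


def system_lookup(uid):
    """Look up a UID's full group list in the server's own user database."""
    entry = pwd.getpwuid(uid)  # raises KeyError for an unknown UID
    return os.getgrouplist(entry.pw_name, entry.pw_gid)


def server_side_groups(uid, request_gids, lookup=system_lookup):
    """Ignore the request's group list and use the server's view instead."""
    try:
        return list(lookup(uid))
    except KeyError:
        # Assumption for this sketch: unknown UIDs keep the request's groups.
        return list(request_gids)
```

For example, a user in 40 groups sends only 16 of them, but `server_side_groups(uid, wire_gids)` returns all 40 as long as the server's lookup knows about them. This is also why the client and server have to agree on group membership for this to work.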
As far as I know, the current state of affairs for various Unixes that we care about runs like this:
- Linux has long supported an option to
rpc.mountd to do this, the
--manage-gids option. See eg Kyle Anderson's Solving the NFS 16-Group Limit Problem for more details. I have to optimistically assume that it's problem free by now, but I've never tried it out.
- Illumos and thus OmniOS gained support for this relatively recently.
There is some minor system configuration required, which I've
covered in Allowing people to be in more than 16 groups with
an OmniOS NFS server. We have
tested this but not yet run it in production.
- FreeBSD has apparently supported this only since 10.3. To enable
it, you run
nfsuserd with the new
-manage-gids flag, per the 10.3
nfsuserd manpage. I suspect that you need to be using what FreeBSD calls the new NFS server with v4 support, not the old one.
- Oracle Solaris 11.1 apparently supports this, as reported by David Magda in a comment here. See Joerg Moellenkamp's blog entry on this.
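Before turning any of this on, it may be useful to know who is already over the limit (or would be). The following is a rough sketch that counts supplementary group memberships from group database entries; the helper name is made up, and it only sees memberships listed in each group's member list, so it doesn't count a user's primary group.

```python
import grp
from collections import Counter

AUTH_SYS_MAX_GIDS = 16


def users_over_limit(groups, limit=AUTH_SYS_MAX_GIDS):
    """Given (group_name, members) pairs, return users in more than `limit` groups."""
    counts = Counter()
    for _name, members in groups:
        for member in set(members):  # ignore duplicate listings in one group
            counts[member] += 1
    return {user: n for user, n in counts.items() if n > limit}


if __name__ == "__main__":
    entries = ((g.gr_name, g.gr_mem) for g in grp.getgrall())
    for user, n in sorted(users_over_limit(entries).items()):
        print(f"{user}: {n} groups")
```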
I care about how widespread the support for this is because we've finally reached a point where our fileservers all support this and so we could start putting people in more than 16 groups, something that various parties are very much looking forward to. So I wanted to know whether officially adding support for this would still leave us with plenty of options for what OS to run on future fileservers, or whether this would instead be a situation more like ACLs over NFS. Clearly the answer is good news; basically anything we'd want to use as a fileserver OS supports this, even the unlikely candidate of Oracle Solaris.
(I haven't bothered checking out the state of support for this on the other *BSDs because we're not likely to use any of them for an NFS fileserver. Nor have I looked at the state of support for this on dedicated NFS fileserver appliances, because I don't think we'll ever have the kind of budget or need that would make any of them attractive. Sorry, NetApp, you were cool once upon a time.)