Wandering Thoughts archives

2016-05-02

How I think you set up fair share scheduling under systemd

When I started writing this entry, I was going to say that systemd automatically does fair share scheduling between users and describe the mechanisms that make that work. However, this turns out to be false as far as I can see; systemd can easily do fair share scheduling, but it doesn't do this by default.

The basic mechanics of fair share scheduling are straightforward. If you put all of each user's processes into a separate cgroup it happens automatically. Well. Sort of. You see, it's not good enough to put each user into a separate cgroup; you have to make it a CPU accounting cgroup, and a memory accounting cgroup, and so on. Systemd normally puts all processes for a single user under a single cgroup, which you can see in eg systemd-cgls output and by looking at /sys/fs/cgroup/systemd/user.slice, but by default it doesn't enable any CPU or memory or IO accounting for them. Without those enabled, the traditional Linux (and Unix) behavior of 'every process for itself' still applies.

(You can still use systemd-run to add your own limits here, but I'm not quite sure how this works out.)
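One way to see which controllers a process actually has its own cgroup for is to read /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal Python sketch of that check. The sample text is made up for illustration (it was not captured from a real machine), and the helper names are my own.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path the process is in.

    Each line of /proc/<pid>/cgroup looks like
    hierarchy-ID:controller-list:cgroup-path
    """
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path itself may contain ':', so split at most twice.
        _hier_id, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            paths[name] = path
    return paths


def has_own_cgroup(paths, controller):
    """True if the process sits below the root cgroup for this controller."""
    return paths.get(controller, "/") != "/"


# Hypothetical sample: the process is in a per-user cgroup in systemd's own
# hierarchy, but still in the root cgroup for cpu, memory and blkio.
SAMPLE = """\
4:cpu,cpuacct:/
3:memory:/
2:blkio:/
1:name=systemd:/user.slice/user-1000.slice/session-1.scope
"""

paths = parse_proc_cgroup(SAMPLE)
print(has_own_cgroup(paths, "name=systemd"))
print(has_own_cgroup(paths, "cpu"))
```

On a live system you would pass in the contents of /proc/self/cgroup instead of SAMPLE. If the cpu entry is still "/", the user's processes are not being grouped for CPU scheduling purposes.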

Now, I haven't tested the following, but from reading the documentation it seems that what you need to do to get fair share scheduling for users is to enable DefaultCPUAccounting and DefaultBlockIOAccounting for all user units by creating an appropriate file in /etc/systemd/user.conf.d, as covered in the systemd-user.conf manpage and the systemd.resource-control manpage. You probably don't want to turn this on for system units, or at least I wouldn't.
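As a sketch of what such a file might look like (equally untested), both settings go in the [Manager] section. The file name here is a hypothetical choice; any name ending in .conf in that directory should be read.

```ini
# /etc/systemd/user.conf.d/50-accounting.conf
# (hypothetical file name; untested, based only on the manpages)
[Manager]
DefaultCPUAccounting=yes
DefaultBlockIOAccounting=yes
```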

I don't think there's any point in turning on DefaultMemoryAccounting. As far as I can see there is no kernel control that limits a cgroup's share of RAM, just the total amount of RAM it can use, so cgroups just can't enforce a fair share scheduling of RAM the way you can for CPU time (unless I've overlooked something here). Unfortunately, missing fair share memory allocation definitely hurts the overall usefulness of fair share scheduling; if you want to ensure that no user can take an 'unfair' share of the machine, it's often just as important to limit RAM as CPU usage.

(Having discovered this memory limitation, I suspect that we won't bother trying to enable fair share scheduling in our Ubuntu 16.04 installs.)

linux/SystemdFairshareScheduling written at 23:11:23

The state of supporting many groups over NFS v3 in various Unixes

One of the long standing limits with NFS v3 is that the protocol only uses 16 groups; although you can be in lots of groups on both the client and the server, the protocol itself only allows the client to tell the server about 16 of them. This is a real problem for places (like us) that have users who want or need to be in lots of groups for access restriction reasons.

For a long time the only thing you could do was shrug and work around this by adding and removing users from groups as their needs changed. Fortunately this has been slowly changing, partly because people have long seen this as an issue. Because the NFS v3 protocol is fixed, everyone's workaround is fundamentally the same: rather than taking the list of groups from the NFS request itself, the NFS server looks up what groups the user is in on the server.

(In theory you could merge the local group list with the request's group list, but I don't think anyone does that; they just entirely overwrite the request.)
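To make the workaround concrete, here is a minimal Python sketch of the logic, not any real NFS server's code. The dict standing in for the server's group lookup, the function names, the choice of which 16 groups the client sends, and the fallback for users the server doesn't know about are all assumptions for illustration.

```python
MAX_REQUEST_GROUPS = 16  # the protocol limit described above


def client_request_gids(all_gids):
    """What the client can put in the request: at most 16 groups.

    Sending the first 16 is an assumption; real clients may pick differently.
    """
    return list(all_gids)[:MAX_REQUEST_GROUPS]


def server_effective_gids(uid, request_gids, server_group_db):
    """Group list the server uses for permission checks.

    server_group_db is a hypothetical dict of uid -> list of gids that stands
    in for the server's own group lookup. The server's list entirely replaces
    the request's list; it is not merged with it.
    """
    known = server_group_db.get(uid)
    if known is None:
        # Assumption: fall back to the request for users unknown to the server.
        return list(request_gids)
    return list(known)


# A made-up user who is in 20 groups on both client and server.
user_gids = list(range(100, 120))
server_db = {1000: user_gids}

sent = client_request_gids(user_gids)
effective = server_effective_gids(1000, sent, server_db)
print(len(sent), len(effective))
```

The point is that the server never needs the client to send the full list. It only needs the UID from the request and an accurate group database of its own, which also means the server's view of group membership has to be kept in sync with the clients'.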

As far as I know, the current state of affairs for various Unixes that we care about runs like this:

I care about how widespread the support for this is because we've finally reached a point where our fileservers all support this and so we could start putting people in more than 16 groups, something that various parties are very much looking forward to. So I wanted to know whether officially adding support for this would still leave us with plenty of options for what OS to run on future fileservers, or whether this would instead be a situation more like ACLs over NFS. Clearly the answer is good news; basically anything we'd want to use as a fileserver OS supports this, even the unlikely candidate of Oracle Solaris.

(I haven't bothered checking out the state of support for this on the other *BSDs because we're not likely to use any of them for an NFS fileserver. Nor have I looked at the state of support for this on dedicated NFS fileserver appliances, because I don't think we'll ever have the kind of budget or need that would make any of them attractive. Sorry, NetApp, you were cool once upon a time.)

unix/NFSManyGroupsState written at 00:46:00



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.