How I think you set up fair share scheduling under systemd
When I started writing this entry, I was going to say that systemd automatically does fair share scheduling between users and describe the mechanisms that make that work. However, this turns out to be false as far as I can see; systemd can easily do fair share scheduling, but it doesn't do this by default.
The basic mechanics of fair share scheduling are straightforward.
If you put all of each user's processes into a separate cgroup it
happens automatically. Well. Sort of. You see,
it's not good enough to put each user into a separate cgroup; you
have to make it a CPU accounting cgroup, and a memory accounting
cgroup, and so on. Systemd normally puts all processes for a single
user under a single cgroup, which you can see in the cgroup
hierarchy, but by default it doesn't enable any CPU, memory, or IO
accounting for
them. Without those enabled, the traditional Linux (and Unix)
behavior of 'every process for itself' still applies.
(You can still use systemd-run to add your own limits here, but
I'm not quite sure how well this works in practice.)
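As a hedged illustration of the idea (the flags are per the systemd-run manpage, but the specific quota value is arbitrary and I haven't verified this exact invocation), a per-command limit might look like:

```shell
# Run a command in its own transient scope with a CPU limit
# (an illustrative sketch, not a tested recipe):
systemd-run --user --scope -p CPUQuota=50% -- firefox
```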
Now, I haven't tested the following, but from reading the documentation
it seems that what you need to do to get fair share scheduling for
users is to enable CPU (and memory and IO) accounting
for all user units by creating an appropriate file in
/etc/systemd/user.conf.d, as covered in the systemd-user.conf
and systemd.resource-control manpages.
You probably don't want to turn this on for system units, or at least
I don't think there's any point in doing so.
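Concretely, the sort of drop-in file this suggests might look like the following sketch (based on my reading of the systemd-user.conf manpage; the file name and the exact set of Default*Accounting settings you want are assumptions and untested here):

```ini
# /etc/systemd/user.conf.d/accounting.conf (hypothetical file name)
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
```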
As far as I can see there is no kernel control that limits a cgroup's
share of RAM, just the total amount of RAM it can use, so cgroups
just can't enforce a fair share scheduling of RAM the way you can
for CPU time (unless I've overlooked something here). Unfortunately,
missing fair share memory allocation definitely hurts the overall
usefulness of fair share scheduling; if you want to ensure that no
user can take an 'unfair' share of the machine, it's often just as
important to limit RAM as CPU usage.
(Having discovered this memory limitation, I suspect that we won't bother trying to enable fair share scheduling in our Ubuntu 16.04 installs.)
The state of supporting many groups over NFS v3 in various Unixes
One of the long standing limits with NFSv3 is that the protocol only uses 16 groups; although you can be in lots of groups on both the client and the server, the protocol itself only allows the client to tell the server about 16 of them. This is a real problem for places (like us) who have users who want or need to be in lots of groups for access restriction reasons.
For a long time the only thing you could do was shrug and work around this by adding and removing users from groups as their needs changed. Fortunately this has been slowly changing, partly because people have long seen this as an issue. Because the NFS v3 protocol is fixed, everyone's workaround is fundamentally the same: rather than taking the list of groups from the NFS request itself, the NFS server looks up what groups the user is in on the server.
(In theory you could merge the local group list with the request's group list, but I don't think anyone does that; they just entirely overwrite the request.)
As far as I know, the current state of affairs for various Unixes that we care about runs like this:
- Linux has long supported an option to rpc.mountd to do this, the
--manage-gids option. See eg Kyle Anderson's Solving the NFS 16-Group Limit Problem for more details. I have to optimistically assume that it's problem free by now, but I've never tried it out.
- Illumos and thus OmniOS gained support for this relatively recently.
There is some minor system configuration required, which I've
covered in Allowing people to be in more than 16 groups with
an OmniOS NFS server. We have
tested this but not yet run it in production.
- FreeBSD has apparently supported this only since 10.3. To enable
it, you run nfsuserd with the new -manage-gids flag, per the 10.3
nfsuserd manpage. I suspect that you need to be using what FreeBSD calls the new NFS server with v4 support, not the old one.
- Oracle Solaris 11.1 apparently supports this, as reported by David Magda in a comment here. See Joerg Moellenkamp's blog entry on this.
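For the Linux case above, enabling this might look something like the following sketch (the file location is Debian/Ubuntu style and an assumption on my part; other distributions configure rpc.mountd's options elsewhere):

```shell
# Sketch: have rpc.mountd look up groups on the server itself
# (option per rpc.mountd(8); the file path is an assumption):
echo 'RPCMOUNTDOPTS="--manage-gids"' >> /etc/default/nfs-kernel-server
service nfs-kernel-server restart
```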
I care about how widespread the support for this is because we've finally reached a point where our fileservers all support this and so we could start putting people in more than 16 groups, something that various parties are very much looking forward to. So I wanted to know whether officially adding support for this would still leave us with plenty of options for what OS to run on future fileservers, or whether this would instead be a situation more like ACLs over NFS. Clearly the answer is good news; basically anything we'd want to use as a fileserver OS supports this, even the unlikely candidate of Oracle Solaris.
(I haven't bothered checking out the state of support for this on the other *BSDs because we're not likely to use any of them for an NFS fileserver. Nor have I looked at the state of support for this on dedicated NFS fileserver appliances, because I don't think we'll ever have the kind of budget or need that would make any of them attractive. Sorry, NetApp, you were cool once upon a time.)
I should keep and check notes even on my own little problems
I mentioned yesterday that I had a serious issue when I installed a VMWare Workstation update, going from 12.1.0 to 12.1.1. I wound up being very grumpy about it, disrupted the remaining part of my work day, filed a support request with VMWare, and so on, and eventually VMWare support came through with the cause and a workaround.
It turns out that I could have avoided all of that, because I ran into this same problem back when I upgraded to Fedora 23. At the time I did my Internet research, found the workaround, and applied it to my machine. This means that I could have proceeded straight to re-doing the workaround if I'd remembered this. Or, more likely, if I'd kept good notes on the problem then and remembered to read them this time.
We try to make and keep good notes for problems on our production systems, or even things that we run into in testing things for our work environment; we have an entire system for it. But I generally don't bother doing the same sort of thing for my own office workstation; when I find and fix problems and issues I may take some notes, but they're generally sketchy, off the cuff, and not centrally organized. And partly because of this, I often don't think to check them; I think I just assume I'm going to remember things about my own workstation (clearly this is wrong).
So, stating the obvious: I would be better off if I kept organized notes about what I had to do to fix problems and get various things going on my workstation, and put the notes into one place in some format (perhaps a directory with text files). Then I could make it a habit to look there before I do some things, or at least when I run into a problem after I do something.
Also, when I make these notes I should make them detailed, including dates and versions of what they're about. It turns out that I actually had some very sketchy notes about this problem from when I upgraded to Fedora 23 (they were some URLs that turned out to be discussions about the issue), but they didn't have a date or say 'this applied when I upgraded to Fedora 23 with VMWare 12' or anything like that. So when I stumbled over the file and skimmed it, I didn't realize that the URLs were still relevant; I skipped that because I assumed that of course it had to be outdated.
(I'm sure that when I wrote the note file in the first place I assumed that I'd always remember the context. Ha ha, silly me, I really should know better by now. Especially since I've written more than one entry here about making just that assumption and being wrong about it.)
A story of the gradual evolution of network speeds without me noticing
A long time ago I had a 28.8Kbps dialup
connection running PPP (it lasted a surprisingly long time). A couple of times I really needed to run a graphical
X program from work while I was at home, so I did '
ssh -X work'
and then started whatever program it was. And waited. And waited.
Starting and using an X program that is moving X protocol traffic
over a 28.8K link gives you a lot of time to watch the details of how
X applications paint their windows, and it teaches you patience.
It's possible, but it's something you only really do in desperation.
(I believe one of the times I did this was when I really needed to dig some detail out of SGI's graphical bug reporting and support tool while I was at home. This was back in the days before all of this was done through the web.)
Eventually I stepped up to DSL (around this time), although not particularly fast
DSL; I generally got 5000 Kbps down and 800 Kbps up. I experimented
with doing X over my DSL link a few times and it certainly worked,
but it still wasn't really great. Simple text programs (with old
school server side XLFD fonts) did okay, but trying to run something
graphical like Firefox was
still painful and basically pointless. At the time I first got my
DSL service I think that 5/.8 rate was pretty close to the best you
could get around here, but of course that changed and better and
better speeds became possible. Much like I stuck with my dialup, I
didn't bother trying to look into upgrading for a very long time.
More speed never felt like it would make much of a difference to
my Internet experience, so I took the lazy approach.
Recently various things pushed me over the edge and I upgraded my DSL service to what is about 15/7.5 Mbps. I certainly noticed that this made a difference for things like pushing pictures up to my Flickr, but sure, that was kind of expected with about ten times as much upstream bandwidth. Otherwise I didn't feel like it was any particular sea change in my home Internet experience.
Today I updated my VMWare Workstation install and things went badly.
Since I'd cleverly started doing all of this relatively late in the day,
I wound up going home before VMWare had a chance to reply to the
bug report I filed about this. When I got home, I found a reply
from VMWare support that, among other things, pointed me to the workaround.
I applied the workaround, but how to test it? Well, the obvious
answer was to try firing up VMWare Workstation over my DSL link.
I didn't expect this to go very well for the obvious reasons;
VMWare Workstation definitely is a fairly graphical program, not
something simple (in X terms) like a plain terminal window.
Much to my surprise, VMWare Workstation started quite snappily. In fact, it started so fast and seemed so responsive that I decided to try a crazy experiment: I actually booted up one of my virtual machines. Since this requires rendering the machine's console (more or less embedded video) I expected it to be really slow, but even this went pretty well.
Bit by bit and without me noticing, my home Internet connection had become capable enough to run even reasonably graphically demanding X programs. The possibility of this had never even crossed my mind when I considered a speed upgrade or got my 15/7.5 DSL speed upgrade; I just 'knew' that my DSL link would be too slow to be really viable for X applications. I didn't retest my assumptions when my line speed went up, and if it hadn't been for this incident going exactly like it did I might not have discovered this sea change for years (if ever, since when you know things are slow you generally don't even bother trying them).
There's an obvious general moral here, of course. There are probably other things I'm just assuming are too slow or too infeasible or whatever that are no longer this way. Assumptions may deserve to be questioned and re-tested periodically, especially if they're assumptions that are blocking you from nice things. But I'm not going to be hard on myself here, because assumptions are hard to see. When you just know something, you are naturally a fish in water. And if you question too many assumptions, you can spend all of your time verifying that various sorts of water are still various sorts of wet and never get anything useful done.
(You'll also be frustrating yourself. Spending more than a small bit of your time verifying that water is still wet is not usually all that fun.)
You should plan for your anti-spam scanner malfunctioning someday
Yesterday I mentioned that the commercial anti-spam and anti-virus system we use ran into a bug where it hung up on some incoming emails. One reaction to this is to point and laugh; silly us for using a commercial anti-spam system, we probably got what we deserved here. I think that this attitude is a mistake.
The reality is that all modern anti-spam and anti-virus systems are going to have bugs. It's basically inherent in the nature of the beast. These systems are trying to do a bunch of relatively sophisticated analysis on relatively complicated binary formats, like ZIP files, PDFs, and various sorts of executables; it would be truly surprising if all of the code involved in doing this was completely bug free, and every so often the bugs are going to have sufficiently bad consequences to cause explosions.
(It doesn't even need to be a bug as such. For example, many regular expression engines have pathological behavior when exposed to a combination of certain inputs and certain regular expressions. This is not a code bug since the RE engine is working as designed, but the consequences are similar.)
What this means is that you probably want to think ahead about what you'll do if your scanner system starts malfunctioning at the level of either hanging or crashing when it processes a particular email message. The first step is to think about what might happen with your overall system and what it would look like to your monitoring. What are danger signs that mean something isn't going right in your mail scanning?
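As one hedged illustration of turning a danger sign into a concrete check (the function name, log shape, and thresholds here are all invented for the sketch, not taken from any particular mail system), you could watch for the inbound queue growing without draining:

```python
def scanner_looks_stuck(queue_samples, growth_limit=50, window=3):
    """Return True if the inbound mail queue grew by more than
    growth_limit messages over the last `window` sampling intervals,
    which is one possible 'the scanner may be hung' danger sign."""
    if len(queue_samples) < window + 1:
        return False  # not enough history to judge yet
    recent = queue_samples[-(window + 1):]
    return recent[-1] - recent[0] > growth_limit
```

In practice you'd feed this from whatever your mailer reports for queue depth and alert on it, alongside checks for scanner processes pinned at 100% CPU.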
Once you've considered the symptoms, you can think about pre-building some mail system features to let you deal with the problem. Two obvious things to consider are documented ways of completely disabling your mail scanner and forcing specific problem messages to bypass the mail scanner. Having somewhat gone through this exercise myself (more than once by now), I can assure you that developing mailer configuration changes on the fly as your mail system is locking up is what they call 'not entirely fun'. It's much better to have this sort of stuff ready to go in advance even if you never turn out to need it.
(Building stuff on the fly to solve your urgent problem can be exciting even as it's nerve-wracking, but heroism is not the right answer.)
At this point you may also want to think about policy issues. If the mail scanner is breaking, do you have permission to get much more aggressive with things like IP blocks in order to prevent dangerous messages from getting in, or is broadly accepting email important enough to your organization to live with the added risks of less or no mail scanning? There's no single right answer here and maybe the final decisions will only be made on the spot, but you and your organization can at least start to consider this now.
You should probably track what types of files your users get in email
Most of the time our commercial anti-spam system works fine and we don't have to think about it or maintain it (which is one of the great attractions of using a commercial system for this). Today was not one of those times. This morning, we discovered that some incoming email messages we were receiving made its filtering processes hang using 100% CPU; after a while, this caused all inbound email to stop. More specifically, the dangerous incoming messages appeared to be a burst of viruses or malware in zipped .EXEs.
This is clearly a bug and hopefully it will get fixed, but in the meantime we needed to do something about it. Things like, say, blocking all ZIP files, or all ZIP files with .EXEs in them. As we were talking about this, we realized something important: we had no idea how many ZIP files our users normally get, especially how many (probably) legitimate ones. If we temporarily stopped accepting all ZIP file attachments, how many people would we be affecting? No one, or a lot? Nor did we know what sort of file types are common or uncommon in the ZIP files that our users get (legitimate or otherwise), or what sort of file types users get other than ZIP files. Are people getting mailed .EXEs or the like directly? Are they getting mailed anything other than ZIP files as attachments?
(Well, the answer to that one will be 'yes', as a certain amount of HTML email comes with attached images. But you get the idea.)
Knowing this sort of information is important for the same reason as knowing what TLS ciphers your users are using. Someday you may be in our situation and really want to know if it's safe to temporarily (or permanently) block something, or whether it'll badly affect users. And if something potentially dangerous has low levels of legitimate usage, well, you have a stronger case for preemptively doing something about it. All of this requires knowing what your existing traffic is, rather than having to guess or assume, and for that you need to gather the data.
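A minimal sketch of the data gathering side (the log line format here is invented for illustration; real mail logs differ, and you would parse whatever your mailer or milter actually emits):

```python
from collections import Counter

def attachment_type_counts(log_lines):
    """Tally attachment file extensions from hypothetical log lines
    of the form '<msg-id> attachment <filename>'."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "attachment":
            # Normalize to a lowercase extension so .EXE and .exe merge.
            ext = parts[2].rsplit(".", 1)[-1].lower()
            counts[ext] += 1
    return counts
```

Even a crude tally like this answers the 'no one, or a lot?' question when you're deciding whether a temporary block is safe.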
Getting this sort of data for email does have complications, of course. One of them is that you'd really like to be able to distinguish between legitimate email and known spam in tracking this sort of stuff, because blocking known spam is a lot different than blocking legitimate email. This may require logging things in a way that either directly ties them to spam level information and so on or at least lets you cross-correlate later between different logs. This can affect where you want to do the logging; for example, you might want to do logging downstream of your spam detection system instead of upstream of it.
(This is particularly relevant for us because obviously we now need to do our file type blocking and interception upstream of said anti-spam system. I had been dreaming of ways to make it log information about what it saw going by even if it didn't block things, but now maybe not; it'd be relatively hard to correlate its logs against our anti-spam logs.)
How 'there are no technical solutions to social problems' is wrong
One of the things that you will hear echoing around the Internet is the saying that there are no technical solutions to social problems. This is sometimes called 'Ranum's Law', where it's generally phrased as 'you can't fix people problems with software' (cf). Years ago you probably could have found me nodding along sagely to this and full-heartedly agreeing with it. However, I've changed; these days, I disagree with the spirit of the saying.
It is certainly true you cannot outright solve social problems with technology (well, almost all of the time). Technology is not that magical, and the social is more powerful than the technical barring very unusual situations. And in general social problems are wicked problems, and those are extremely difficult to tackle in general. This is an important thing to realize, because social problems matter and computing has a great tendency to either ignore them outright or assume that our technology will magically solve them for us.
However, the way that this saying is often used is for technologists to wash their hands of the social problems entirely, and this is a complete and utter mistake. It is not true that technical measures are either useless or socially neutral, because the technical is part of the world and so it basically always affects the social. In practice, in reality, technical features often strongly influence social outcomes, and it follows that they can make social problems more or less likely. That social problems matter means that we need to explicitly consider them when building technical things.
(The glaring example of this is all the various forms of spam. Spam is a social problem, but it can be drastically enabled or drastically hindered by all sorts of technical measures and so sensible modern designers aggressively try to design spam out of their technical systems.)
If we ignore the social effects of our technical decisions, we are doing it wrong (and bad things usually ensue). If we try to pretend that our technical decisions do not have social ramifications, we are either in denial or fools. It doesn't matter whether we intended the social ramifications or didn't think about them; in either case, we may rightfully be at least partially blamed for the consequences of our decisions. The world does not care why we did something, all it cares about is what consequences our decisions have. And our decisions very definitely have (social) consequences, even for small and simple decisions like refusing to let people change their login names.
Ranum's Law is not an excuse to live in a rarefied world where all is technical and only technical, because such a rarefied world does not exist. To the extent that we pretend it exists, it is a carefully cultivated illusion. We are certainly not fooling other people with the illusion; we may or may not be fooling ourselves.
(I feel I have some claim to know what the original spirit of the saying was because I happened to be around in the right places at the right time to hear early versions of it. At the time it was fairly strongly a 'there is no point in even trying' remark.)
Bad slide navigation on the web and understanding why it's bad
As usual, I'll start with my tweet:
If the online copy of your slide presentation is structured in 2D, not just 'go forward', please know that I just closed my browser window.
This is sort of opaque because of the 140 character limitation, so let me unpack it.
People put slide decks for their presentations online using various bits of technology. Most of the time how you navigate through those decks is strictly linear; you have 'next slide' and 'previous slide' in some form. But there's another somewhat popular form I run across every so often, where the navigation down in the bottom right corner offers you a left / right / up / down compass rose. Normally you go through the slide deck by moving right (forward), but some slides have more slides below them so you have to switch to going down to the end, then going right again.
These days, I close the browser window on those slide presentations. They're simply not worth the hassle of dealing with the navigation.
There are a number of reasons why this navigation is bad on the web (and probably in general) beyond the obvious. To start with, there's generally no warning cue on a slide itself that it's the top of an up/down stack of slides (and not all slides at the top level are). Instead I have to pay attention to the presence or absence of a little down arrow all the way over on the side of the display, well away from what I'm paying attention to. It is extremely easy to miss this cue and thus skip a whole series of slides. At best this gives me an extremely abbreviated version of the slide deck until I realize, back up, and try to find the stacks I missed.
This lack of cues combines terribly with the other attribute of slides, which is that good slides are very low density and thus will be read fast. When a slide has at most a sentence or two, I'm going to be spending only seconds a slide (I read fast) and the whole slide deck is often a stream of information. Except that it's a stream that I can't just go 'next next next' through, because I have to stop to figure out what I do next and keep track of whether I'm going right or down and so on. I'm pretty sure that on some 'sparse slides' presentations this would literally double the amount of time I spend per slide, and worse, it interrupts the context of my reading; one moment I'm absorbing this slide, the next I'm switching contexts to figure out where to navigate to, then I'm back to absorbing the next slide, and so on. I get whiplash. It's not a very pleasant way to read something.
Multi-option HTML navigation works best when it is infrequent and clear. We all hate those articles that have been sliced up into multiple pages with only a couple of paragraphs per page, and it's for good reason; we want to read the information, not navigate from page to page to page. The more complicated and obscure you make the navigation, the worse it is. This sort of slide presentation is an extreme version of multi-page articles with less clear navigation than normal HTML links (which are themselves often obscured these days).
I don't think any of this is particularly novel or even non-obvious, and I sure hope that people doing web design are thinking about these information architecture issues. But people still keep designing what I think of as terribly broken web navigation experiences anyways, these slide decks being one of them. I could speculate about why, but all of the reasons are depressing.
(Yes, including that my tastes here are unusual, because if my tastes are unusual it means that I'm basically doomed to a lot of bad web experiences. Oh well, generally I don't really need to read those slide decks et al, so in a sense people are helpfully saving my time. There's an increasing number of links on Twitter that I don't even bother following because I know I won't be able to read them due to the site they're on.)
Sidebar: Where I suspect this design idea got started
Imagine a slide deck where you've added some optional extra material at various spots. Depending on timing and audience interest, you could include some or all of this material or you could skip over it. This material logically 'hangs off' certain slides (in that between slide A and F there are optional slides C, D, and E 'hanging off' A).
This slide structure makes sense to represent in 2D for presentation purposes. Your main line of presentation (all the stuff that really has to be there) is along the top, then the optional pieces go below the various spots they hang off of. At any time you can move forward to the next main line slide, or start moving through the bit of extra material that's appropriate to the current context (ie, you go down a stack).
Then two (or three) things went wrong. First, the presentation focused structure was copied literally to the web for general viewing, when probably it should be shifted into linear form. Second, there were no prominent markers added for 'there is extra material below' (the presenter knows this already, but general readers don't). Finally, people took this 2D structure and put important material 'down' instead of restricting down to purely additional material. Now a reader has to navigate in 2D instead of 1D, and is doing so without cues that should really be there.
Why you mostly don't want to do in-place Linux version upgrades
I mentioned yesterday that we don't do in-place distribution upgrades, eg to go from Ubuntu 12.04 to 14.04; instead we rebuild starting from scratch. It's my view that in-place upgrades of at least common Linux distributions are often a bad idea for a server fleet even when they're supported. I have three reasons for this, in order of increasing importance.
First, an in-place upgrade generally involves more service downtime or at least instability than a server swap. In-place upgrades generally take some time (possibly in the hours range), during which things may be at least a little bit unstable as core portions of the system are swapped around (such as core shared libraries, Apache and MySQL/PostgreSQL installs, the mailer, your IMAP server, and so on). A server swap is a few minutes of downtime and you're done.
Second, it's undeniable that an in-place upgrade is a bit more risky than a server replacement. With a server replacement you can build and test the replacement in advance, and you also can revert back to the old version of the server if there are problems with the new one (which we've had to do a few times). For most Linux servers, an in-place OS upgrade is a one way thing that's hard to test.
(In theory you can test it by rebuilding an exact duplicate of your current server and then running it through an in-place upgrade, but if you're going to go to that much more work why not just build a new server to start with?)
But those are relatively small reasons. The big reason to rebuild from scratch is that an OS version change means that it's time to re-evaluate whether what you were customizing on the old OS still needs to be done, if you're doing it the right way, and if you now need additional customizations because of new things on the OS. Or, for that matter, because your own environment has changed and some thing you were reflexively doing is now pointless or wrong. Sometimes this is an obvious need, such as Ubuntu's shift from Upstart in 14.04 LTS to systemd in 16.04, but often it can be more subtle than that. Do you still need that sysctl setting, that kernel module blacklist, or that bug workaround, or has the new release made it obsolete?
Again, in theory you can look into this (and prepare new configuration files for new versions of software) by building out a test server before you do in-place upgrades of your existing fleet. In practice I think it's much easier to do this well and to have everything properly prepared if you start from scratch with the new version. Starting from scratch gives you a totally clean slate where you can carefully track and verify every change you do to a stock install.
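One hedged way to get that careful tracking of every change (this is a sketch of the idea behind tools like etckeeper; the function name and the inline git identity are invented for illustration):

```shell
# Put a configuration tree under git so every later customization
# shows up in 'git diff' and 'git log' against the stock baseline.
snapshot_config() {
    dir="$1"
    ( cd "$dir" || return 1
      git init -q &&
      git add -A &&
      git -c user.name=ops -c user.email=ops@example.com \
          commit -qm "baseline: stock install" )
}
```

After the baseline commit, every customization you make gets its own commit, which is exactly the record you want when the next OS version rolls around.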
Of course all of this assumes that you have spare servers that you can use for this. You may not for various reasons, and in that case an in-place upgrade can be the best option in practice despite everything I've written. And when it is your best option, it's great if your Linux (or other OS) actively supports it (Debian and I believe Ubuntu), as opposed to grudging support (Fedora) or no support at all (RHEL/CentOS).
Why we have CentOS machines as well as Ubuntu ones
I'll start with the tweets that I ran across semi-recently (via @bridgetkromhout):
@alicegoldfuss: "If you're running Ubuntu and some guy comes in and says 'we should use Redhat'... fuck that guy." - @mipsytipsy #SREcon16
mipsytipsy: alright, ppl keep turning this into an OS war; it is not. supporting multiple things is costly so try to avoid it.
This is absolutely true. But, well, sometimes you wind up with exceptions despite how you may feel.
We're an Ubuntu shop; it's the Linux we run and almost all of our machines are Linux machines. Despite this we still have a few CentOS machines lurking around, so today I thought I'd explain why they persist despite their extra support burden.
The easiest machine to explain is the one machine running CentOS 6. It's running CentOS 6 for the simple reason that that's basically the last remaining supported Linux distribution that Sophos PureMessage officially runs on. If we want to keep running PureMessage in our anti-spam setup (and we do), CentOS 6 is it. We'd rather run this machine on Ubuntu and we used to before Sophos's last supported Ubuntu version aged out of support.
Our current generation iSCSI backends run CentOS 7 because of the long support period it gives us. We treat these machines as appliances and freeze them once installed, but we still want at least the possibility of applying security updates if there's a sufficiently big issue (an OpenSSH exposure, for example). Because these machines are so crucial to our environment we want to qualify them once and then never touch them again, and CentOS has a long enough support period to more than cover their expected five year lifespan.
Finally, we have a couple of syslog servers and a console server that run CentOS 7. This is somewhat due to historical reasons, but in general we're happy with this choice; these are machines that are deliberately entirely isolated from our regular management infrastructure and that we want to just sit in a corner and keep working smoothly for as long as possible. Basing them on CentOS 7 gives us a very long support period and means we probably won't touch them again until the hardware is old enough to start worrying us (which will probably take a while).
The common feature here is the really long support period that RHEL and CentOS gives us. If all we want is basic garden variety server functionality (possibly because we're running our own code on top, as with the iSCSI backends), we don't really care about using the latest and greatest software versions and it's an advantage to not have to worry about big things like OS upgrades (which for us is actually 'build completely new instance of the server from scratch'; we don't attempt in-place upgrades of that degree and they probably wouldn't really work anyways for reasons out of the scope of this entry).