2015-10-20
Why I never tell people how I voted
Canada just had a federal election, and I got asked in passing who I voted for. As always, I gave my standard answer for this question, namely that I don't ever talk about how I voted. This is the answer I give to everyone who asks, even my kith and kin; I'll never admit to voting for anyone, no matter what.
It doesn't matter that my likely votes are probably often easy to guess (especially for my kith and kin), and it's not quite as simple as me feeling that on principle you should not interrogate people directly about this question (although being asked bluntly does make me twitch). Instead it is in large part about keeping and concealing secrets by denying people information.
Imagine that I normally told people who or what I voted for in votes, and that someday I had a vote that I wanted to conceal or not admit to for some reason. I'd have a problem; if I said 'I'm not going to tell you' that time, it'd be immediately obvious that I was hiding something (and that I probably hadn't voted for the predictable choice). I'd have to try to lie (and lie convincingly), perhaps to people who knew me.
By issuing a blanket 'no comment' about my votes (with no coy hints or anything) even when I don't deeply care, I buy myself cover for a day when I really do want to conceal my vote. That day I can give exactly the same answer I have all the times before, in exactly the same way, and there (hopefully) won't be any signs that anything is going on.
None of this is new or novel. It's always been the case that if you think you might someday have something to conceal, you should shut up about all aspects of it now; answering questions or giving people information today and then stopping is much more damaging than never giving information at all. As we all know, a sudden switch to 'no comment' might as well be an admission to the wise.
This is of course applicable to sysadmins, because we are partly keepers of secrets, and what we deal with is only sometimes secret. Or maybe it's not secret now and we don't realize that someday in the future it might be, so we share it openly now and then, oops. When we later clam up we may not have leaked the secret itself, but we've certainly admitted that there's a secret.
I don't think that this means sysadmins should never share or expose information. Instead, we should consider whether the information we're sharing and exposing and casually admitting to people now when they ask might, sometime in the future, become sensitive (and how possible or likely that is). If we can easily imagine scenarios where it becomes sensitive, well, maybe we should clam up up front and get used to saying 'I'm sorry, I can't tell you that, it's potentially sensitive information' even if it's perfectly harmless right now.
(I'm honestly not sure how likely this is to come up, though. I have the feeling that a lot of what we deal with is obviously potentially sensitive, as opposed to currently harmless but later dangerous. On the other hand, I may just be mistaking the latter sort of stuff for things that are entirely harmless.)
2015-10-14
OS installers should be easy to open up and modify
A while back I tweeted:
Every so often I dream of genuinely sysadmin-friendly OS installers. I doubt I'll ever get them, though.
As it happens, I have a specific vision of what being sysadmin-friendly here means, and that is easy modification.
The reality of life is that OS installers are necessarily very general but they are most frequently used in much more specific situations, ones where you already know the answers to a lot of the questions they'll ask. At the same time there's often a few local questions of your own that are important and need to get answered in order to set up a machine properly, and not infrequently the set of packages that is available on the install media is not ideal.
A genuinely sysadmin friendly installer would be one that fully understood this reality. It would make it not merely possible but easy to open up the installer image, make these sorts of straightforward changes to it, put the whole thing back together, and have a custom setup that fit your environment. Want to run a post-install script? It should let you do that. Want to drop some additional scripts and files into the installed system? Ditto. And so on.
(A nice implementation would allow you to ask your own questions within the installer's Q/A framework and then export those answers to your post-install scripts or write them into the system or the like. A really nice implementation would allow you to use these answers to drive some of the regular installer actions somehow, although that gets more complicated.)
Of course I know that this is a quixotic quest. The modern world has decided that the answer to this problem is doing as little as possible in the actual installer and customizing the system after install using an automation framework (Puppet, Chef, Ansible, etc). I can't even argue with the pragmatics of this answer; getting an automation framework going is a lot easier than trying to persuade Debian, Red Hat, FreeBSD, OmniOS, and so on and so forth to change their installers to be nicer for sysadmins (and a full framework gives you a much more uniform cross system experience).
(Debian and Ubuntu have a 'preseed' feature but apparently rebuilding installer images is somewhat of a pain and preseed only goes so far, eg I don't think you can add your own questions or the like.)
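To make this concrete, here is a hypothetical sketch of the kind of post-install hook I mean, something that would run at the end of the install against the newly installed system (mounted on /target, as Debian's installer does it). All of the local paths and names here are made up:

    #!/bin/sh
    # Hypothetical post-install hook run by the installer at the end of
    # the install; with Debian preseeding you can approximate this by
    # pointing 'd-i preseed/late_command' at something similar.
    set -e
    # Drop local setup material from the install media into the new system.
    mkdir -p /target/opt/local-setup
    cp /cdrom/local/firstboot.sh /target/opt/local-setup/firstboot.sh
    # Record the answer to a local question asked earlier in the install,
    # so post-install scripts in the installed system can use it.
    echo "MAILHUB=$LOCAL_MAILHUB" >/target/opt/local-setup/answers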
(This whole chain of thought was sparked by working with our standard install system and thinking about how much of it could and really should be in the actual installer.)
2015-10-11
Bad news about how we detect and recover from NFS server problems
In a comment on this entry, Sandip Bhattacharya asked me:
Also, sometimes transient NFS server issues can cause the NFS mount to be wedged, where any access to the NFS mount hangs the process. How do you escape or detect such conditions?
This is a good question in general and I am afraid the bad news is that there don't seem to be any good answers. Our usual method of 'detecting' such problems is that a succession of machines start falling over with absurd load averages; generally this is our central mailer, our primary IMAP server, our web server, and our most heavily used login server. This is of course not entirely satisfactory, but doing better is hard. Client kernels will generally start spitting out 'NFS server <X> not responding, still trying' messages somewhat before they keel over from excess load and delays, but you can have temporary blips of these messages even without server problems and on top of that you'd need very fast response before active machines start getting into bad situations.
(A web server is an especially bad case, since it keeps getting new requests all the time. If processes are stalling on IO, it doesn't take very much time before your server is totally overwhelmed. Many other servers at least don't spawn new activity quite so fast.)
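One thing you can do is actively probe your NFS mounts from a monitoring script, on the theory that a wedged mount will hang even a trivial operation. Here is a minimal sketch; the timeout value and the reporting are placeholders, and a truly hard-wedged mount may still leave the probe process stuck around afterwards:

    #!/bin/sh
    # Probe every NFS mount with a time-limited stat; if the stat doesn't
    # come back in time, report the mount as possibly wedged.
    for mnt in $(awk '$3 == "nfs" || $3 == "nfs4" {print $2}' /proc/mounts); do
        if ! timeout -s KILL 10 stat -f "$mnt" >/dev/null 2>&1; then
            echo "possibly wedged NFS mount: $mnt" 1>&2
        fi
    done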
As far as escaping the situation, well, again we haven't found any good solutions. If we're really lucky, we can catch a situation early enough that we can unmount currently unused and thus not-yet-wedged NFS filesystems from our clients. Unfortunately this is rare and doesn't help the really active machines. In theory clients offer ways to force NFS unmounts; in practice this has often not worked for us (on Linux) for actively used NFS filesystems. Generally we have to either get the NFS server to start working again (perhaps by rebooting the server) or force client reboots, after which they won't NFS mount stuff from the bad server.
(If a NFS server is experiencing ongoing or repeated problems, sometimes we can reboot it and have it return to good service long enough to unmount all of its filesystems on clients.)
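For what it's worth, the escape attempts on a Linux client amount to variations of umount (the mount points here are made up); as noted, the forced versions often don't actually work for us on busy filesystems:

    # Plain unmounts work for filesystems nothing has touched yet.
    umount /nfs/idle-filesystem
    # Forced and lazy unmounts are the theoretical escape hatches for busy
    # filesystems; in practice they frequently fail, or merely detach the
    # name while leaving already-stuck processes stuck.
    umount -f /nfs/busy-filesystem
    umount -l /nfs/busy-filesystem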
In theory, you can fake a totally lost NFS server by having another NFS server take over the IP address so that at least clients will get 'permission denied, filesystem not exported' errors instead of no replies at all. In practice, this can run into serious client issues with the handling of stale NFS mounts so you probably don't want to do this unless you've already tested the result and know it isn't going to blow up in your face.
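The takeover mechanics themselves are simple; on a Linux server it's essentially just claiming the dead fileserver's IP address (the address and interface name below are made up):

    # Answer for the dead fileserver's IP address. Since this server does
    # not export the dead server's filesystems, clients get immediate
    # 'not exported' errors instead of waiting forever.
    ip addr add 203.0.113.45/32 dev eth0
    # Nudge clients and switches into updating their ARP entries.
    arping -U -I eth0 -c 3 203.0.113.45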
The whole situation with unresponsive NFS servers has been a real problem for as long as NFS has existed, but so far no one seems to have come up with good client-side solutions to make detecting and managing problems easier. I suspect one reason for this is that NFS servers are generally very reliable, which doesn't give people much motive to create complicated solutions for when they aren't.
(For reasons covered here, I feel that an automounter is not the answer to this problem in most cases. Anyways, we have our own NFS mount management solution.)
2015-10-09
Our low-rent approach to verifying that NFS mounts are there
Our mail system has everyone's inboxes in an old-fashioned /var/mail style single directory; in fact it literally is /var/mail. This directory is NFS mounted from one of our fileservers, which raises a little question: how can we be sure that it's actually there?
Well, there's always going to be a /var/mail directory. But what we care about is that this directory is the actual NFS mounted filesystem instead of the directory on the local root filesystem that is the mount point, because we very much do not want to ever deliver email to the latter.
(Some people may say that limited directory permissions on the mount point should make delivery attempts fail. 'Should' is not a word that I like in this situation, either in 'should fail' or 'that failure should be retried'.)
There are probably lots of clever solutions to this problem involving advanced tricks like embedded Perl bits in the mailer that look at NFS mount state and so on. We opted for a simple and low tech approach: we have a magic flag file in the NFS version of /var/mail, imaginatively called .NFS-MOUNTED. If the flag file is not present, we assume that the filesystem is not mounted and stall all email delivery to /var/mail.
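The check itself is about as simple as it sounds. Expressed as a shell sketch (the real check is done in our Exim configuration rather than in a shell script), it amounts to:

    #!/bin/sh
    # Only treat /var/mail as usable if the NFS filesystem's marker file
    # is visible; otherwise defer (75 is EX_TEMPFAIL, so a sensible
    # caller will retry later instead of bouncing the mail).
    if [ ! -f /var/mail/.NFS-MOUNTED ]; then
        echo "/var/mail does not look NFS mounted; deferring" 1>&2
        exit 75
    fi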
This scheme is subject to various potential issues (like accidentally deleting .NFS-MOUNTED some day), but it has the great virtue that it is simple and relatively bulletproof. It helps that Exim has robust support for checking whether or not a file exists (although we use a hack for various reasons). The whole thing has worked well and basically transparently, and we haven't removed one of those .NFS-MOUNTED files by accident yet.
(We actually use this trick for several NFS-mounted mail related directories that we need to verify are present before we start trying to do things involving them, not just /var/mail.)
(I mentioned this trick in passing here, but today I feel like writing it up explicitly.)
Sidebar: our alternate approach with user home directories
Since user home directories are NFS mounted, you might be wondering if we also use flag files there to verify that the NFS mounts are present before checking things like .forward files. Because of how our NFS mounts are organized, we use an alternate approach instead. In short, our NFS mounts aren't directly for user home directories; instead they're for filesystems with user home directories in them.
(A user has a home directory like /h/281/cks, where /h/281 is the actual NFS mounted filesystem.)
In this situation it suffices to just check that the user's home directory exists. If it does, the NFS filesystem it is in must be mounted (well, unless someone has done something very perverse). As a useful side bonus, this guards against various other errors (eg, 'user home directory was listed wrong in /etc/passwd').
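As a sketch, the whole check is just an existence test on the home directory listed in /etc/passwd (with $user standing in for however the account name arrives):

    # Look up the user's home directory and make sure it exists; if it
    # does, the /h/NNN filesystem holding it must be NFS mounted.
    home=$(getent passwd "$user" | cut -d: -f6)
    [ -n "$home" ] && [ -d "$home" ]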
2015-10-03
There are two approaches to disaster recovery plans
One of the things I've come to believe in is that there are two ways to approach thinking about your disaster recovery planning (even if you're being quite abstract). The first approach is to think about how you'd restore your services, while the second approach is to think about how you'd get your users working again.
Thinking about how you'd restore your services is very sexy. It leads to planning out things like alternate server rooms, temporary cloud hosting, essential initial infrastructure, what you'd buy if you needed replacement hardware in a hurry, and so on. If you carry it all the way through you wind up with binders of multi-step plans, indexes of what backups are where, and so on.
Thinking about how you'd get your users working again can wind up in a much less comfortable and sexy place, because the first question you have to ask is what your users really need in order to get their work done (or at least to get their important work done). Asking this question can confront you with the reality that a lot of your services and facilities are not really essential for your users, and that what they really care about may not be things you consider important. The first stage disaster recovery plans that can result from this may wind up being much more modest and less sexy than the 'let's rebuild the machine room' sort of plans.
(For example, in places like us the first stage disaster recovery plan might be 'buy everyone important a laptop if they don't already have one, maybe restore some people's address books, and have them all set up GMail accounts and get back in touch with the people they email'.)
Focusing on what your users need to get working again doesn't mean not also having the first sort of disaster recovery plans, since presumably you are going to want to get all your services back eventually. But I think it puts them in the right perspective. The important thing is that necessary work gets done; your services are just a means to that end.
(This is kind of the flipside of what I wrote a while back about the purpose of computer disaster recovery.)
(Of course, if you have an actual disaster without preallocated resources you may find out that some of your services are not important enough any more to come back, or to come back in anywhere near their original form. There's nothing like starting from scratch to cause drastic reassessments of situations.)
2015-09-28
On not having people split their time between different bosses
In some places, it is popular (or occasionally done) to say something like 'well, this area only has the money for 1/3rd of a sysadmin, and this area has the money for 2/3rds of a sysadmin, so I know: we'll hire one sysadmin and split her up'. It is my personal view that this is likely to be a mistake, especially as often implemented. There are at least two pathologies you can run into here.
The basic pathology is that humans are frequently terrible at tracking their own time, so it is quite likely that you are not going to wind up with the time split that you intended. Without strong work against it, it's easy to get pulled towards one side because it's more interesting, clearly needs you more, or the like, and then have that side take over a disproportionate amount of your time. Perhaps time splitting might go well if your one sysadmin is a senior sysadmin with a lot of practical experience at doing this and a collection of tools and tricks for making it work. If your one sysadmin is a junior sysadmin thrown into the lion cage with no support, guidance, tools, and monitoring, well, you're probably going to get about the results that you should expect.
The more advanced pathology is that you are putting the sysadmin in the unhappy position of having to tell people no for purely bureaucratic reasons (or else to go over and above their theoretical work hours), because sooner or later one of the areas is going to want more work than fits in the X amount of the sysadmin that they are entitled to. At that point the sysadmin is supposed to say 'sorry, I switch over to area Q now; I know that you feel that your work is quite important, maybe more important than area Q's work, but I am not supposed to spend any more time on you until next week'. This is going to make people unhappy with the sysadmin, which is a stressful and unpleasant experience for the sysadmin. People don't like inflicting those experiences on themselves.
(The actual practical result is likely to be either overwork or that once again the actual time split is not the time split you intended.)
I feel strongly that the consequence of both pathologies is that management or at least team leadership should be deeply involved in any such split-sysadmin situation. Management should be the ones saying 'no' to areas (and taking the heat for it), not sysadmins, and management should be monitoring the situation (and providing support and mentoring) to make sure the time is actually winding up being split the way it's intended.
(There are structural methods of achieving this, such as having areas 'purchase' X hours of work through budget/chargeback mechanisms, but they have their own overheads such as time tracking software.)
If you like, of course, you can instead blame the sysadmin for doing things wrong or not properly dividing her time or the like. This is the 'human error' explanation of problems and as always it is not likely to give you a solution to the problem. It will give you a great excuse to fire people, though. Maybe that's what you actually want.
2015-09-25
Do we want to continue using a SAN in our future fileserver design?
Yesterday I wrote that we're probably going to need a new Linux iSCSI target for our next generation of fileservers, which I optimistically expect us to begin working on in 2018 (when the current ones will be starting to turn four years old). But as I mentioned in an aside, there's a number of things up in the air here and one of them is the big question of whether we want to keep on using any sort of SAN at all or move to entirely local storage.
We had a number of reasons originally for using an iSCSI SAN, but in practice many of them never got used much. We've made minimal use of failover, we've never expanded a fileserver's storage use beyond the pair of backends that 'belong' to it, and while surviving single backend failures was great (cf), a large part of those backend failures was because we bought inexpensive hardware. If our current, significantly better generation of hardware survives to 2018 without similar large scale failures, I think there could be a real question about carrying on the model.
I've written about our general tradeoffs of a SAN versus disk servers and they remain applicable. However, two things have changed since writing that last year. The first is that we now have operational experience with a single fileserver that has a pile of disk space and a pile of load on it, and our experience overall is that we wish it was actually two fileservers instead. Even when we aren't running into OmniOS issues, it is the fileserver that is most prone to have problematic load surges and so on, simply because it has so much disk space and activity on it. One thing this has done is change my opinion about how big a disk server we'd want to have; instead of servers as big as our current fileservers with their paired backends, I now think that servers roughly half the size would be a good choice (ie, with 8 pairs of data disks).
The second is that I now believe we're going to have a real choice of viable OSes to run ZFS on in 2018, and specifically I expect that to include Linux. If we don't need iSCSI initiator support, need only a medium number of disks, and are willing to pay somewhat extra for good hardware (as we did this generation by avoiding ESATA), then I think hardware support in our OS of choice is likely to be much less of an issue. Put bluntly, both Linux and FreeBSD should support whatever disk controller hardware we use and it's reasonably likely that OmniOS will as well.
There are unquestionably downsides to moving away from a SAN (as I've covered before). But there are also attractive simplifications, cost savings, and quite possibly performance increases (at least in an all-SSD environment). Moving away from a SAN is in no way a done deal (especially since we like the current environment and it's been quite good for us) and a lot can (and will) change between now and 2018, but the thought is now actively in my mind in a way that it wasn't before.
(Of course, part of this is that occasionally I play around with crazy and heretical what-if thoughts about our architecture and systems. 'What if we didn't use a SAN' is just one iteration of this.)
2015-09-08
How we disable accounts here (and why)
In my opinion, the hardest part of disabling accounts is deciding what 'disabled' means, which can be surprisingly complex. These days, most of the time we're disabling an account as a prelude to removing it entirely, which means that the real purpose of disabling the account is to smoke out anything that would cause people to not want the account to be deleted after all. Thus our goal is to make it look as much as possible as if the account has been deleted without actually deleting anything.
These days, what this means is:
- scrambling their password, so they cannot log in to our Unix systems, access their files through our Samba servers, read their email via IMAP, and so on. If necessary, this gets 'reverted' through our usual password reset process for people who have eg forgotten their password.
(Given that Samba has its own password store, it's important for us to actively use passwd to scramble the password instead of just editing /etc/shadow to lock or disable it (cf).)
- making the user's home directory and web pages completely inaccessible (we 'chmod 000' both directories). This blocks other people's access to files that would be (or will be) deleted when the account gets deleted. Explicitly removing access to the account's web pages has been important in practice because people sometimes forget or miss that deleting an account deletes its web pages too.
(I believe this will stop passwordless SSH access through things like authorized keys, but I should actually test that.)
- making the user not exist as far as the mail system is concerned, which stops both email to them and email through any local mailing lists they may have.
(This automatically happens when someone's home directory is mode 000, and automatically gets reverted if their home directory becomes accessible again.)
- entirely removing their VPN credentials and DHCP registrations. Both of these can be restored through our self-service systems, so there's no reason to somehow lock them instead.
- finding and commenting out any crontab entries, and stopping any user-run web servers they have. All of this should normally stop anyways because of mode 000 home directories, but better safe than sorry.
- setting their Unix shell to a special shell that theoretically prints a message about the situation. We use this more as a convenient marker of disabled accounts than anything else; the scrambled password means that the user can't see the message even if they actually tried to log in to our Unix systems (which they may not really do these days).
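Pulled together as shell commands, the mechanical side of all this looks roughly like the following sketch. It is not our actual tooling; the web directory path and the disabled shell path are made up, and in real life this is driven by our account management system:

    #!/bin/sh
    # Disable (not delete) an account: scramble the password, hide the
    # files, silence crontabs and daemons, and mark the login.
    user="$1"
    home=$(getent passwd "$user" | cut -d: -f6)

    # Scramble the password to a long random string (chpasswd stands in
    # here for however your password-change path works); the point, as
    # noted above, is to change the password rather than merely locking
    # the /etc/shadow entry.
    echo "$user:$(head -c 30 /dev/urandom | base64)" | chpasswd

    # Make the home directory and the web pages inaccessible.
    chmod 000 "$home" "/www/homes/$user"

    # Comment out any crontab entries and stop user-run processes.
    if crontab -u "$user" -l >/tmp/ct.$$ 2>/dev/null; then
        sed 's/^/#/' /tmp/ct.$$ | crontab -u "$user" -
        rm -f /tmp/ct.$$
    fi
    pkill -u "$user"

    # Mark the account with our special 'this account is disabled' shell.
    usermod -s /usr/local/sbin/disabled-shell "$user"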
We don't try to find and block access to any files owned by the user outside of their home directory, because we don't normally remove such files when we do delete the account (which is one reason we need filesystem scans to find unowned files).
If we're disabling an account for some other reason, such as a security compromise, we generally skip making the user's files inaccessible. This also keeps email flowing to them and their mailing lists. In this case we generally specifically disable any SSH authorized keys and so on.
Sidebar: Keeping web pages without the rest of the account
This is actually something that people ask for. Our current approach is to leave the Unix login there, scramble the password and so on, and empty out the user's regular home directory and set it to mode 000 (to block email). This leaves the web pages behind and keeps the actual ownership and login for them (which is important because we still use Apache's UserDir stuff for people's web pages).
We haven't yet had a request to keep web pages for someone with CGIs or a user-run web server, so I don't know how we'd deal with that.
2015-09-07
Why we wind up deleting user accounts
In a comment on my entry on optimizing finding unowned files, Paul Tötterman asked a good question:
I'm surprised that you actually remove users instead of just disabling them. Care to expand on that?
At one level, the answer is that we remove users when account sponsors tell us to. How our account management works is that (almost) every user account is sponsored by some professor; if the account's sponsor removes that sponsorship, we delete the account (unless the person can find another sponsor). Professors sponsor their graduate students, of course, but they also sponsor all sorts of other people: postdocs, undergraduate students who are working on projects, visitors, and so on. There's no requirement to withdraw sponsorship of old accounts and it's customary to not do so, but people can do so and sometimes do.
(For instance, we have no policy that graduated grad students lose their accounts or have them disabled. Generally they don't and many of them live on for substantial amounts of time.)
But that's not the real answer, because I've glossed over what prompts sponsors to take action. Very few professors bother to regularly go over the accounts they're sponsoring and decide to drop some. Instead there tend to be two underlying causes. The first cause is that the professor wants to reclaim the disk space used by the account because the other option is buying more disk space and they'd rather not. The second cause is that we've noticed some problem with the account (for example, email to it bounces) and the account sponsor decides that removing it is the simplest way for them to resolve the situation. This usually doesn't happen for recent accounts; instead it tends to happen to the accounts of people who left years ago.
(Account sponsors are 'responsible' for accounts that they sponsor and get asked questions about the account if there are problems with it.)
Our current approach to account removal is a multi-stage process, but it does eventually result in the login getting deleted (and sometimes that happens sooner rather than later if the sponsor in question says 'no, really, remove it now').
2015-09-05
Why we aren't tempted to use ACLs on our Unix machines
One of the things our users would really like to have is some easy way to do ad-hoc sharing of files with random collections of people. In theory this is a great fit for ACLs, since ACLs allow users themselves to extend various sorts of permissions to random people. Despite this appeal of ACLs, we have no interest in supporting them on our machines; in fact, we go somewhat out of our way to specifically block any chance that they might be available.
The core problem is that in practice today, ACL support is far from universal, and not all versions of it behave the same way or are equally capable. What support you get (if any) depends on the OS, the filesystem, and if you're using NFS (as we are), what the NFS fileserver and its filesystem(s) support. As a practical matter, if we start offering ACLs we're pretty much committed to supporting them going forward, and to supporting a version of them that's fully backwards compatible with our initial version; otherwise users will likely get very angry with us for taking away or changing what will have become an important part of how they work.
(The best case on an ACL system change is that people would lose access to things that they should have access to, partly because people notice that right away. The worst case is that some additional people get access to things that they should not.)
Given that neither ACL support nor ACL behavior is anywhere near universal, a need for backwards compatibility is almost certain to limit our choice of future OSes, filesystems, and so on. What if we want to switch the fileservers to FreeBSD, for example, but ZFS on FreeBSD can't provide the ACL semantics we need over NFS? We'd be out of luck and stuck. If we want the most future freedom we have to stick to the lowest common denominator, and today that is Unix UIDs, GIDs, and basic file permissions.
(This sort of future compatibility is not a universal need, of course. There are any number of environments out there where you build systems for specific needs and when those needs go away you're going to toss the systems. In that setup, ACLs today for one system don't necessarily imply ACLs tomorrow (or the same ACLs tomorrow) for another one.)