How I tend to label bad hardware
Every so often I wind up dealing with some piece of hardware that's bad, questionable, or apparently flaky. Hard disks are certainly the most common thing, but the most recent case was a 10G-T network card that didn't like coming up at 10G. For a long time I was sort of casual about how I handled these; generally I'd set them aside with at most a postit note or the like. As you might suspect, this didn't always work out so great.
These days I have mostly switched over to doing this better. We have a labelmaker (as everyone should), so any time I wind up with some piece of hardware I don't trust any more, I stick a label on it to mark it and say something about the issue. Labels that have to go on hardware can only be so big (unless I want to wrap the label all over whatever it is), so I don't try to put a full explanation; instead, my goal is to put enough information on the label so I can go find more information.
My current style of label looks broadly like this (and there's a flaw in this label):
volary 2018-02-12 no 10g problem
The three important elements are the name of the server the hardware came from (or was in when we ran into problems), the date, and some brief note about what the problem was. Given the date (and the machine) I can probably find more details in our email archives, and the remaining text hopefully jogs my memory and helps confirm that we've found the right thing in the archives.
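A minimal sketch of this label format as code (the helper name and layout here are my own illustration, not anything we actually use):

```python
import datetime

def hardware_label(server, problem, when=None):
    """Format a short hardware label: server name, ISO date, brief problem note."""
    when = when or datetime.date.today()
    return f"{server} {when.isoformat()} {problem}"

# The corrected version of the label from above:
print(hardware_label("volary", "no 10g link", datetime.date(2018, 2, 12)))
```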
As my co-workers gently pointed out, the specific extra text on this label is less than ideal. I knew what it meant, but my co-workers could reasonably read it as 'no problem with 10G' instead of the intended meaning of 'no 10g link', ie the card wouldn't run a port at 10G when connected to our 10G switches. My takeaway is that it's always worth re-reading a planned label and asking myself if it could be misread.
A corollary to labeling bad hardware is that I should also label good hardware that I just happen to have sitting around. That way I can know right away that it's good (and perhaps why it's sitting around). The actual work of making a label and putting it on might also cause me to recycle the hardware into our pool of stuff, instead of leaving it sitting somewhere on my desk.
(This assumes that we're not deliberately holding the disks or whatever back in case we turn out to need them in their current state. For example, sometimes we pull servers out of service but don't immediately erase their disks, since we might need to bring them back.)
Many years ago I wrote about labeling bad disks that you pull out of servers. As demonstrated here, this seems to be a lesson that I keep learning over and over again, and then backsliding on for various reasons (mostly that it's a bit of extra work to make labels and stick them on, and sometimes it irrationally feels wasteful).
PS: I did eventually re-learn the lesson to label the disks in your machines. All of the disks in my current office workstation are visibly labeled so I can tell which is which without having to pull them out to check the model and serial number.
Sending emails to your inbox is a dangerous default
One of the things I have to keep learning over and over again about email is that I should not let so many things bother me by showing up in my inbox. Even relatively low-volume things.
(I can filter or I can eliminate the email, depending on the situation.)
It starts innocently enough. You start getting some new sort of email (perhaps you sign up for it, maybe it's an existing service sending new email, or perhaps it's a new type of notification that you've been auto-included in). It's low volume and reasonably important or useful or at least interesting. But it's a drip. Often it ramps up over time, and in any case there are a lot of sources of such drips so collectively they add up.
In the process of planning an entry about dealing with this, I've come to the obvious realization that one important part here is that new email almost always defaults to going to your inbox. When it goes to your inbox two things happen. First, it gets mixed up with everything else and you have to disentangle it any time you look at your inbox. Second, by default it interrupts you when it comes in. Sure, I may have some tricks to avoiding significant interruptions from new email, but it still partly interrupts me (I have to look at the subject at least), and unless I'm very busy there's always the temptation to read it right now just so that I can throw it away (or file it away).
(Avoiding that interruption in the first place is not an option for two reasons. First, part of my job as a sysadmin is to be interrupted by sufficiently important issues. Second, I genuinely want to read some email right away; it's important or I'm expecting it or I'm looking forward to it.)
It's certainly possible to move email so it doesn't wind up in my inbox, but as long as the default is for email to go to my inbox, stuff is going to keep creeping in. It's inevitable because people follow the path of least resistance; when it takes more work to filter things out (and requires a sample email and some guesses as to what to match on and so on), we don't always do that extra work.
(And that's the right tradeoff, too, at least some of the time. One email a year or even a month probably is not worth the time to set up a filter for. Maybe not even one email a week, depending.)
If email defaulted to not coming to my inbox and had to be filtered in, my email life would be a very different place. There are drawbacks to this, so in practice probably the easiest way to arrange it is to have different email accounts with different inboxes that have different degrees of priority (and that you check at different times and so on).
(Of course this is where my email mistake bites me in the rear. I don't have the separate email accounts that other people often do; I would have to set up new ones and shift things over. This is something I'll have to do someday, but I keep deferring it because of the various pains involved.)
PS: There are also practical drawbacks to shifting (some) email out of your inbox, in that unless you're very diligent it increases the odds that the email won't get dealt with because you just don't get around to looking at it. This is certainly happening with some of the email that I've moved out of my inbox; I'll get to it someday, probably, but not right now.
I'm one of those people who never log out from their desktop
Only crazy people log out from their desktop every time they step away from it for moderate amounts of time. Whether you're leaving to get lunch or to go to a long meeting, sensible people just lock the screen (something that I've deliberately made very easy in my X setup). But my impression is that a fair number of people log out at the end of the day, or at least the end of the week.
I'm not one of those people; not at home and especially not at work. With rare exceptions, I log in when I boot up my machine and then I stay logged in until I'm going to reboot it (and then I log right back in again). When I leave, whether for an evening, the weekend, or the university's multi-week winter break, I just lock my X session (which at least purges my SSH keys). As a sysadmin who cares about security to some degree, this can feel a bit embarrassing; it would probably be moderately more secure to actually log off my office machine every night and log in again every morning.
(At home there's less reason to worry about the security issues and I use my desktop every day.)
A large part of why I do this is simply that I'm lazy. Both locking and unlocking my screen are a lot faster than logging out (in an orderly way) and then starting up my X session all over again. While I've automated a fair amount of starting my X session, there are still a number of manual steps involved (for example, I start some programs by hand and manually place their windows). The whole thing is enough of a hassle that I don't feel inclined to do it more often than I really have to. It also takes a bit of time, for various reasons; even if everything magically started automatically, it would probably take sixty seconds or so until my desktop was all up and running.
(Logging out requires a bit of work because things like Firefox are much happier if I shut them down in an orderly way instead of just yanking the X session out from underneath them.)
A certain amount of this manual startup work is because I've added a few more 'always present' windows but haven't gotten around to adding them to my startup script. Some of them are a bit awkward to automate (because they are really 'start an xterm with a shell, then run a command in the shell'), but I could probably glue something together. Other programs have to be started by hand because they provide no way to specify things like where to place their windows or that they should start iconified (with their icon in a specific spot). Possibly I could arrange a sufficiently complicated set of supporting scripts to automate this (using things like wmctrl), but just not logging out is a lot easier.
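For what it's worth, a wmctrl-based placement script could look something like this sketch (the window titles and geometries are invented examples, and the dry-run default just builds the commands instead of running them):

```python
import subprocess

def wmctrl_place(title, x, y, w, h):
    # 'wmctrl -r <title> -e g,x,y,w,h' moves and resizes the matching window;
    # the leading 0 is the gravity field.
    return ["wmctrl", "-r", title, "-e", f"0,{x},{y},{w},{h}"]

def place_all(layout, dry_run=True):
    """Build (and optionally run) wmctrl commands for a {title: (x, y, w, h)} layout."""
    cmds = [wmctrl_place(title, *geom) for title, geom in layout.items()]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds

# A made-up example window, shown in dry-run mode:
for cmd in place_all({"syslog xterm": (0, 0, 600, 400)}):
    print(" ".join(cmd))
```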
Staying logged in all of the time has some interesting consequences. The obvious one is that I normally keep all of my regular X programs running continuously for days on end (and sometimes weeks). Unsurprisingly, programs do not always expect this or handle it perfectly. Even when a program doesn't have issues with running for a long time, it may do somewhat inconvenient things like only loading certain information at startup.
(Awkwardly, one of the programs I use with this 'only on startup' flaw is one that I wrote myself. My excuse is that it was by far the easiest way to code that particular feature, the data involved doesn't change often, and I can always restart the program if I need to. Still, I should probably fix this someday.)
Our small tools for running commands on multiple machines
A while back I wrote about the personal shell scripts I had for running commands on multiple machines. At the time, they were only personal scripts that I used myself; however, over time they kept informally creeping into worklog entries that documented what we actually did and even some shell scripts we have to pre-write the commands we need for convoluted operations like migrating ZFS filesystems from server to server. Eventually we decided to adopt them as actual official scripts, put in our central location for such scripts.
My own versions were sort of slapped together, especially the machines script to print out the names of machines that fall into various categories, so making them into production-worthy tools meant cleaning that up. The oneach script needed only moderate reforms, and as a result the new version is only slightly improved over my old personal version; in day to day usage, I probably couldn't notice any difference if I switched back to using my old one.
(The big difference is that the production version has more options for things like extra verbosity and a dry-run mode that just reports the ssh commands that would be run.)
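The core idea of an oneach-style runner with a dry-run mode can be sketched in a few lines (this is my illustration of the concept, not our actual script, which has more options and error handling):

```python
import subprocess

def oneach(machines, command, dry_run=False):
    """Run a command on each machine over ssh; in dry-run mode, just return the commands."""
    cmds = [["ssh", m, command] for m in machines]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd)
    return cmds

# Dry-run mode reports what would be run (hypothetical machine names):
for cmd in oneach(["apps0", "apps1"], "uptime", dry_run=True):
    print(" ".join(cmd))
```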
The machines command got completely redone from scratch, because I realized that my hack approach just wouldn't work. For a start, I couldn't ask my co-workers to edit a script every time we added a machine; there would have been a revolt. So I wrote a new version in Python that parses a configuration file. This new production version is a drastic improvement over my shell script hack; because I wrote it in Python, I was able to include significantly more features, in addition to making it more convenient and regular (since it's parsing a configuration file). The most important one is support for 'AND' and 'EXCEPT' operations, so you can express machine categories like 'all machines with some feature that are also Ubuntu 16.04 machines' or 'all Ubuntu 14.04 machines except ...'. This is supported both in the configuration file, where it sees a little bit of use, and on the command line, where I take advantage of it periodically.
(The configuration file format is nothing special and basically duplicates what I've seen other similar programs use. Although I didn't consciously set out to duplicate their approach, it feels like we wound up in the same spot because there's only so many good solutions for the problem.)
Using a configuration file doesn't just make things more convenient and maintainable; it also makes them more consistent, in several senses. It's now much harder for me to accidentally forget to add machines to categories they should be in (or not remove them from categories that no longer apply). A good part of the reason is that the configuration file is mostly inverted from how my script used to do it. Rather than list machines that are in categories, it mostly lists the categories that a machine is in:
apps0 apps ubuntu1604 allnfs users
There are a few categories that are explicitly specified, but even then they tend to be defined in terms of other categories.
This approach wouldn't have been feasible in my original simple shell script, but it's a natural one once you have a configuration file (especially if you want to make adding new machines easy and obvious; for the most part you can copy an existing line and change the initial host name).
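The essential shape of this can be sketched as follows. The configuration lines and the AND/EXCEPT handling here are my guesses at the semantics described above, with invented host names; the real machines command does considerably more:

```python
# Each line names a host followed by the categories it belongs to
# (hypothetical hosts and categories for illustration).
CONFIG = """\
apps0 apps ubuntu1604 allnfs users
apps1 apps ubuntu1404 allnfs
comps0 comps ubuntu1604 allnfs
"""

def parse(config):
    """Invert host lines into a {category: set-of-hosts} mapping."""
    cats = {}
    for line in config.splitlines():
        host, *categories = line.split()
        for c in categories:
            cats.setdefault(c, set()).add(host)
    return cats

def machines(cats, *terms):
    """Evaluate a query like ('apps', 'AND', 'ubuntu1604') or (..., 'EXCEPT', ...)."""
    result = set(cats.get(terms[0], set()))
    i = 1
    while i < len(terms):
        op, cat = terms[i], cats.get(terms[i + 1], set())
        result = result & cat if op == "AND" else result - cat
        i += 2
    return sorted(result)

cats = parse(CONFIG)
print(machines(cats, "apps", "AND", "ubuntu1604"))   # ['apps0']
print(machines(cats, "allnfs", "EXCEPT", "users"))   # ['apps1', 'comps0']
```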
In theory I could have done all of these improvements in my own personal versions, and writing the Python version of machines didn't take too long (even writing a Go version for my own use only added a modest amount of time). In practice it took the push of knowing that these now had to be generally usable and maintainable by my co-workers to get me to spend the time. Would it have been wrong to spend the time on this when they were just personal scripts? Probably, and even if not I doubt I could have persuaded myself of that. After all, they worked well enough as they were originally.
The goals and problems of our Dovecot IMAP configuration migration
We have a long standing backwards compatibility issue with our IMAP server, which is that we have it configured so that the root of the IMAP mail folder storage is $HOME. Originally this led to Dovecot session hangs, but now it's led to running out of inodes on the Dovecot server machine and general NFS load as people's Dovecot sessions rummage all through their home directories on our fileservers. Today I'm going to talk about our ideal IMAP configuration, the problems of trying to migrate to it, and then some thoughts on what we might settle for.
(In other words, our Dovecot configuration currently sets mail_location to a value rooted in $HOME.)
If I could wave a magic wand, the Dovecot IMAP configuration we want is simply one where all mail folders are stored under some directory in people's home directories, say $HOME/mail, and IMAP clients wouldn't need or want an IMAP path prefix. In this world, if you had a mail folder for private stuff your client would know it as the PrivateStuff folder and it would be stored in $HOME/mail/PrivateStuff. When your IMAP client does a 'LIST "" "*"' operation, Dovecot would only look through $HOME/mail and everything under it, not all of $HOME.
There are three problems with migrating from our current configuration to this setup. First, there's all of people's current mail folders that are in places outside of $HOME/mail, which must be moved into $HOME/mail in order to stay accessible via IMAP. Second, even for people who already have their mail folders in $HOME/mail, their clients know them under a different IMAP path; right now an existing $HOME/mail/PrivateStuff would be known to the IMAP client as mail/PrivateStuff (even if the client hides this from you by having an IMAP path prefix), where in the new world it would be known as just PrivateStuff. Finally, some people have their clients set up with an IMAP path prefix, which would have to be removed to get our ideal setup (even if their IMAP path prefix currently points at $HOME/mail).
There are some ways to improve these. First, if people are willing to accept extra directories under $HOME/mail, they can avoid needing any client changes even if they were already putting all of their mail folders in a subdirectory; all you do is preserve the subdirectory structure when you move mail folders around. If you currently have $HOME/Mail/PrivateStuff, move it to $HOME/mail/Mail/PrivateStuff instead of $HOME/mail/PrivateStuff. You wind up with a theoretically surplus 'Mail' directory (and a 'Mail/' IMAP path component), but all your clients can continue on as they are.
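This structure-preserving move is simple enough to sketch as a path mapping (a minimal illustration, assuming $HOME-relative Unix paths and the 'mail' subdirectory name from above):

```python
import posixpath

def new_location(old_rel_path, mail_root="mail"):
    """Map a $HOME-relative mail folder path to its location under $HOME/mail,
    preserving the old subdirectory structure so client paths keep working."""
    if old_rel_path == mail_root or old_rel_path.startswith(mail_root + "/"):
        return old_rel_path  # already under $HOME/mail, nothing to do
    return posixpath.join(mail_root, old_rel_path)

print(new_location("Mail/PrivateStuff"))   # mail/Mail/PrivateStuff
print(new_location("mail/PrivateStuff"))   # unchanged
```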
Second, if a person uses IMAP subscriptions so that the server-stored subscription information knows all of their mail folders, we can reliably move all of them into the same hierarchy under $HOME/mail using only server-side information. Server side mail processing in .procmailrc may need to be updated, however, since the actual Unix paths will obviously change. Also, use of IMAP subscriptions (and the IMAP LSUB command) is far from universal among IMAP clients (as I've discovered).
As far as I know, Dovecot doesn't provide a way to log information about what mail folders are SELECT'd, so we can't determine what actual mail folders exist through tracking client activity; it's server side IMAP subscriptions or nothing.
So far I've described the configuration that we want, not necessarily the configuration we're willing to settle for. So what are our actual minimum goals? While we haven't actively discussed this, I think what we'd settle for is an end state configuration where IMAP clients can't search through all of $HOME or store mail folders anywhere outside a small set of subdirectories under it. We could live with a configuration where mail folders could be in any of $HOME/mail, $HOME/Mail, and a few others. We can also live with these being visible in the IMAP mail folder names that clients use, so that instead of seeing a folder called PrivateStuff in your client, you either see mail/PrivateStuff (or Mail/PrivateStuff or so on), or you set an IMAP path prefix in your client to hide it.
Sam Hathaway's comment on my entry on IMAP paths in clients and servers brought Dovecot's namespaces to my attention, especially the backwards compatibility examples. I don't think these can be used to migrate towards our ideal configuration, but it's possible they could be used to create something like what we're willing to settle for.
(They could also be used to strip out prefixes from the IMAP paths that clients send us, but in our specific situation I don't think there's much point in doing that. The hard part is getting people's mail folders under $HOME/mail, and we don't really care what their path there winds up being.)
Sidebar: A brief note on the mechanics of migration
For various reasons, we have no intention of operating a single Dovecot server with different configurations for different users (ie, with some migrated to the new, confined configuration and others using the old one). Instead we'd do the migration by building an entire new IMAP server under a new name with the new configuration, and then telling people what they had to do to switch over to using it. New people would be pointed to the new server (and blocked from using the old one), while existing people would be encouraged and perhaps helped to migrate. Eventually we'd be down to a few stubborn holdouts and then we'd give them no choice by turning the old server off.
Conveniently the current IMAP server is running Ubuntu 14.04, which means that it has a natural remaining lifetime of about a year and a quarter. This is perhaps enough time to actually get everyone migrated without too much pain.
(Then in a year or two more we'd quietly switch back to using the old IMAP server name, because it really is the best name for an IMAP server.)
Understanding IMAP path prefixes in clients and servers
Suppose you have some IMAP clients and they talk to an IMAP server which stores mailboxes somewhere in the filesystem under people's home directories (let's call this the IMAP root for a user). One of the complications of talking about where people's mailboxes and folders actually wind up in this environment is that both the clients and the server get to contribute their two cents, but how they manifest is different.
(As a disclaimer, I'm probably abusing IMAP related terminology here in ways that aren't proper and that I'd fix if I actually ever read up on the details of the IMAP protocol and what it calls things.)
To start with, the IMAP protocol has the concept of a hierarchy of folders and mailboxes, rooted at /. This hierarchy is an abstract thing; it's how clients name things to the server (and how they traverse the namespace with operations like LIST). The IMAP server may implement this hierarchical namespace however it wants, using whatever internal names for things that it wants to (provided that it can map back and forth between internal names and the protocol level ones known by clients and named in IMAP subscriptions and so on). Even when an IMAP server stores this IMAP protocol namespace in the filesystem, it may or may not use the client names for things. For now, let's assume that our IMAP server maps the protocol namespace directly on to directories and files in the filesystem.
Many IMAP clients have in their advanced configuration options an option for something like an 'IMAP Path Prefix' or an 'IMAP server directory', to use the names that iOS and Thunderbird respectively use for this. This is what it sort of sounds like; it basically causes the IMAP client to use this folder (or series of folders) as a prefix on all of the mailbox and folder names it uses, making it into the root of the IMAP namespace instead of /. If you set this in the client to IMail and have a mailbox that you call 'Private' in the client, the actual name of the mailbox in the IMAP protocol is IMail/Private. Your client simply puts the IMail prefix on the front when it's talking to the server and takes it back off when it gets stuff back and presents this to you.
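The client's side of this is just a pair of string transformations, which can be sketched as (a toy model of the behavior described above, using the same IMail example):

```python
def to_server(prefix, name):
    """What the client sends: prepend the configured IMAP path prefix, if any."""
    return f"{prefix}/{name}" if prefix else name

def to_user(prefix, protocol_name):
    """What the client shows you: strip the prefix back off server responses."""
    if prefix and protocol_name.startswith(prefix + "/"):
        return protocol_name[len(prefix) + 1:]
    return protocol_name

print(to_server("IMail", "Private"))      # IMail/Private
print(to_user("IMail", "IMail/Private"))  # Private
```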
A client that has an IMAP path prefix and uses LIST will normally only ask for listings of things under its path prefix, because that's what you told it to do. What's visible under the true IMAP root is irrelevant to such a client; it will always confine itself to the path prefix. In our filesystem-backed IMAP server, this means that the client is voluntarily confining itself to a subdirectory of wherever the IMAP server stores things in the filesystem, and it doesn't care (and won't notice) what's outside of that subdirectory.
On the server side, the IMAP server might be configured (as ours sadly is) to store folders and mailboxes directly in $HOME, or it might be configured to store them starting in a subdirectory, say $HOME/IMAP. This mapping from the IMAP protocol directory hierarchy used by clients to a directory tree somewhere in the filesystem is very much like how a HTTP server maps from URLs to filesystem locations under its document root (although in the case of the IMAP server, there is a different 'IMAP root' for every user). A properly implemented IMAP server doesn't allow clients to escape outside of this IMAP root through clever tricks like asking for '..', although it may be willing to follow symlinks in the filesystem that lead outside of it.
(As far as I know, such symlinks can't be created through the IMAP protocol, so they must be set up by outside means such as the user sshing in to the IMAP server machine and making a symlink by hand. Of course, with fileservers and shared home directories, that can be any of our Linux servers.)
Using an IMAP path prefix in your client is a good thing if the server's IMAP root is, say, $HOME, since there are probably a great many things there that aren't actually mailboxes and mail folders and that will only confuse your client (and complicate its listing of actual interesting mailboxes) if it looks at them by asking for a listing of /, the root of the IMAP namespace. With an IMAP path prefix configured, your client will always look at a subdirectory of $HOME where you'll presumably only have mailboxes and so on.
The IMAP server is basically oblivious to the use of a client side IMAP path prefix and can't exert any control over it. The client never explicitly tells the server 'I'm using this path prefix'; all the server sees is that the client only ever does operations on things with some prefix.
The net result of this is that you can't transparently replace the use of a client side IMAP path prefix with the equivalent server side change in where the IMAP root is. If you start out with a client IMAP path prefix of IMail and a server IMAP root of $HOME, and then change to a server IMAP root of $HOME/IMail, the client will still try to access IMail/Private, the server will translate that to $HOME/IMail/IMail/Private, and things will probably be sad. To make this work, either you need to move things at the Unix filesystem level or people have to change their IMAP clients to take out the IMAP path prefix.
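The double-prefix failure is easy to demonstrate with a toy model of the server's name mapping (a sketch; the real server-side translation is of course more involved):

```python
def unix_path(imap_root, protocol_name):
    """Toy model: the server joins its per-user IMAP root to the protocol name."""
    return f"{imap_root}/{protocol_name}"

# Before: client prefix 'IMail', server root $HOME.
print(unix_path("$HOME", "IMail/Private"))        # $HOME/IMail/Private

# After moving the root server-side, an unchanged client still sends
# 'IMail/Private', so the prefix is effectively applied twice:
print(unix_path("$HOME/IMail", "IMail/Private"))  # $HOME/IMail/IMail/Private
```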
To make this perhaps a little bit clearer, here is a table of the various pieces and the resulting Unix path that gets formed once all the bits have been put together.
|Server IMAP root|Client IMAP prefix|Client folder|Unix path|
|$HOME|(none)|Private|$HOME/Private|
|$HOME|IMail|Private|$HOME/IMail/Private|
|$HOME|(none)|IMail/Private|$HOME/IMail/Private|
|$HOME/IMail|(none)|Private|$HOME/IMail/Private|
For a given server IMAP root, it doesn't matter whether the client forms the (sub)folder name explicitly or through use of a client IMAP path prefix. If you use multiple clients and only some of them are set up with your IMAP path prefix, clients configured with the prefix will see folder names with the prefix stripped off and other clients will see the full (IMAP protocol) folder path; this is the second and third lines of the table.
(If all of your clients respect IMAP subscriptions, the server may not be able to tell whether or not any particular one of them has an IMAP path prefix configured, or if it's just dutifully following the subscriptions (which are of course all inside the IMAP path prefix you have configured on some clients).)
(This is one of the entries I write partly to get all of this straight in my head.)
How our IMAP server wound up running out of inodes
On Twitter, I mentioned that we'd run out of inodes on a server, and then a few weeks later I made a comment about an IMAP feature:
I'm coming to really dislike IMAP clients that don't use subscriptions, even though the consequences for our server are sort of our own fault.
These two tweets are very closely related, and there is a sad story here (since it's sort of our own fault).
In the IMAP protocol, there are two ways to get a list of mailboxes and folders that you have: the LIST command and the LSUB command. The difference between the two is that LSUB restricts itself to things that you have SUBSCRIBE'd to (another IMAP command), while the LIST command just lists, well, everything that the IMAP server can discover. When the IMAP server is backed by some sort of database, that 'what it can discover' comes from the database engine; when the IMAP server is storing things in the filesystem as a directory hierarchy, that just translates to a directory listing.
Many IMAP clients use IMAP subscriptions both to track what folders they know about and to synchronize the list of known folders between clients, since your IMAP subscriptions are remembered by the server and stored there. However, some clients can't be bothered with this; they simply use LIST to ask the IMAP server for absolutely everything (and presumably then show some or all of it to you).
Even when your IMAP server is storing mailboxes and folders in the filesystem, the difference between LIST and LSUB is normally not particularly important, because the IMAP server is normally using an area that's only for mailboxes, and the only thing normally found there is mailboxes. Then, unfortunately, there's us. Due to the ongoing requirements of backwards compatibility, the root of our IMAP server's mailbox storage is people's $HOME.
It is quite possible for people's $HOME to contain a lot of things that aren't mailboxes and mail folders, at which point the difference between LIST and LSUB becomes very important to us. If a client uses IMAP subscriptions, what else is in $HOME doesn't matter; the client will only try to look through things you've subscribed to, which are presumably actually mailboxes (and limited in number). But if the client ignores IMAP subscriptions and just uses LIST, it winds up trying to look through everything, and then when it finds directories, it recurses down through them in turn.
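A toy model makes the difference concrete. Here the directory tree is an invented example of a $HOME full of non-mail junk; LIST walks every directory the server can see, while LSUB only touches the subscription list:

```python
def list_all(tree, path=""):
    """Recursively collect every directory name, the way an unrestricted LIST does."""
    names = []
    for name, children in tree.items():
        full = f"{path}/{name}" if path else name
        names.append(full)
        names.extend(list_all(children, full))
    return names

# A hypothetical $HOME: one real mail folder plus non-mail directories.
home = {
    "mail": {"PrivateStuff": {}},
    "src": {"project": {"subdir": {}}},   # not mail at all
    ".git": {"objects": {}},
}
subscriptions = ["mail/PrivateStuff"]

print(len(list_all(home)))   # LIST visits every directory: 7
print(len(subscriptions))    # LSUB touches only what's subscribed: 1
```

With a real source tree or dataset in $HOME, the LIST count balloons into the thousands while the LSUB count stays at the handful of folders actually subscribed to.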
A year and a half ago, our problem was LIST searches that either ran into symlink cycles or escaped into the wider filesystem, hanging Dovecot and hammering our fileservers. That's basically stopped being a problem. Today's problem is that some people who use these clients have fairly large $HOMEs, with things like significant version-controlled source trees and datasets with lots of files and subdirectories. Dovecot maintains index files in a directory hierarchy for every mailbox and mail folder that it knows about; when a client does a LIST recursively, this translates to 'at least every directory that Dovecot runs across'. We have Dovecot store its indexes on the IMAP server's local mirrored system disks, because that's a lot faster than getting them over NFS.
This is how we wound up running out of inodes on our IMAP server. Dovecot was just trying to store too many index files and directories. Discarding people's index data didn't help for long, because of course their clients did it again and recreated it all after a few days.
(Our short term brute force solution was to put in a larger set of SSDs and create a partition just for Dovecot's index data, with the number of inodes set to the maximum value. This has managed to keep us out of danger so far.)
I suspect that clients doing this unrestricted LIST usage can't be giving the people using them a really good experience, but apparently it's not so terrible that people stop using them. Unfortunately we don't really have any idea of what specific clients are involved, partly because more and more people are using multiple clients across many different devices.
(Our long term fix is going to have to be migrating away from our backwards compatibility settings, but that's going to be a very slow process and probably a lot of work. Helpfully it can be done fairly easily for people who actually use IMAP subscriptions, but discussing the issues involved is for another entry.)
Sidebar: How many inodes we're talking about
At the moment, our most prolific user has over 1.3 million Dovecot index files and directories, with the next two most prolific users having over 730k and 600k respectively (fortunately it falls off fairly rapidly from there). The overall result of this is that our filesystem for storing this Dovecot index data has over 4.6 million inodes used.
When you have fileservers, they naturally become the center of the world
Every so often I spend a little bit of time thinking about how we might make some use of cloud computing, generally without coming up with anything meaningful, and then inevitably I wind up thinking about what makes it hard for us. So today I want to mention a little downside of having fileservers, which is that once you have fileservers they can easily become the center of your computing universe and then everything becomes tied to the fileservers.
To make this concrete, let's look at IMAP. When you build an IMAP server, you have to decide where people's IMAP folders will be stored. One option is a storage system that is dedicated to the IMAP server (or servers) through various options, including locally attached disks or a dedicated little SAN. With a fileserver environment, another natural choice is on the fileservers along with all your other data; this is especially attractive if you're already managing space there on a per-user or per-group basis, so you don't have to allocate IMAP folder space to people or groups and you can have it just come out of their existing space.
Now suppose you want to move your IMAP service into a cloud. If you opted to store the IMAP folders 'locally' to the IMAP servers, you can move the whole assemblage into the cloud in a fairly straightforward way. But if you chose to store IMAP folders on your existing fileservers, the actual data the IMAP server uses is entangled with the rest of the data on the fileservers (perhaps hopelessly so). You can't really move the service as a whole to the cloud, and moving the servers alone is probably a bad idea for all sorts of reasons.
(It's not just IMAP for us, of course; there are all sorts of services that are entangled with our fileservers because the data they use lives on the fileservers. Our web server is another obvious example.)
At the same time, putting data on fileservers is not a bad thing; instead it's the completely natural thing. Holding and serving data is what they're there for and if we've done a competent job, they're quite good at that. Building, operating, backing up, monitoring, and managing space on a whole collection of little storage nodes is not the greatest idea in the world; it's redundant work and it adds all sorts of complications to everyone's life. And it's much easier for people if they can just get generic space that they can use for whatever they want, whether that be email messages, web data, home directories, data files for computations, or so on.
(In a sense, the entire reason you build general use fileservers is to make them the center of the computing universe. Well, at least in our somewhat unusual setup.)
Our next generation of fileservers will not use any sort of SAN
We've been using SAN-based fileservers here for a long time, partly for reasons that I once wrote about in Painless long term storage management without disturbing users. Our current and past generations of ZFS fileservers have been based around an iSCSI SAN, and before that we had at least one generation of Fibre Channel based fileservers using Solaris (with DiskSuite and relatively inexpensive hardware RAID-5 boxes). Some of the things we've wanted from a SAN haven't worked out lately but others have, and I wouldn't say we're unhappy with our current SAN setup.
We're in the process of putting together our next generation of fileservers and despite everything I just wrote, we've decided that they won't use a SAN. The core reason is that a SAN isn't necessary for us any more and moving away from having one both simplifies our life and means we need less hardware (which means everything costs less, which is an important consideration for us). It does matter that we want smaller fileservers and this affects the economics, but our decision goes beyond that; we have no regrets about the shift and don't feel we're being forced into it.
One not insignificant reason for this is that our ideas about long term storage management simply haven't worked out in practice (as I once theorized might happen). Even if we used iSCSI in our next generation, it was clear to us that the migration would once again involve copying all of the data with user-visible impact, just as it did the last time around. But beyond that, while I won't say that the iSCSI network has been useless, we haven't actually needed any of the advantages a SAN gives us in this generation. With solid hardware this time around, we haven't had a backend or a fileserver fail, or at least we've never had them fail for hardware reasons. Nor have we needed two iSCSI networks, as we've never had a switch or network failure.
Using iSCSI has unfortunately complicated our lives. It requires two extra networks and two extra sets of cabling, switches, and so on. It has to be monitored and software configurations have to be fiddled with, and we've actually had software issues because we have two iSCSI networks (every so often an OmniOS fileserver will refuse to use both iSCSI networks, especially after a reboot). And of course the split between fileservers and backends means more machines to look after.
(It also reduces the IO bandwidth we can get, which is an issue for various things including ZFS scrubs and resilvers, and means there's extra spots to monitor for performance impacts.)
A non-SAN fileserver environment is just going to be simpler, with fewer moving parts (in the sysadmin sense), and these days we can build it without needing to use anything that we consider chancy or unproven. Our existing iSCSI backends have provided us with the basic template; a server case with somewhere in the range of 16 to 24 disks and dual power supplies, a suitable motherboard, and connecting to all of the disks using some combination of SAS controllers and motherboard SAS and SATA ports (these days we no longer need to resort to chancy stuff like eSATA, the way we had to in our first generation). Using moderately sized servers with moderate amounts of disks goes well with our overall goals of smaller individual fileservers, and all of the pieces are well understood and generally work well (and are widely used, unlike eg iSCSI).
Will I miss having a SAN? My honest answer is that I won't. Like my co-workers, I'm looking forward to a simpler and more straightforward overall fileserver environment, with more isolation between fileservers and less to worry about and look at.
How we automate acmetool
Acmetool is my preferred client for Let's Encrypt and the one we've adopted for our switch to Let's Encrypt at work. If you know acmetool, talking about automating it sounds like a contradiction in terms, because the entire design of acmetool is about automating everything already; you put it in cron (or more exactly you let it put itself in cron as part of setup with 'acmetool quickstart'), and then you forget about it. Perhaps you have to write a hook script or two or adjust file permissions because a daemon runs as a different user, but that should be it.
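Such a hook script can be quite small. The following is a hypothetical sketch, not our actual hook: acmetool invokes each hook script with the event name as its first argument, and 'live-updated' is the event that fires when certificates have changed. The RELOAD_CMD variable, the hook path in the comment, and the apache2 default are all illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical acmetool hook script (e.g. dropped into the hooks
# directory, such as /usr/lib/acme/hooks/reload-webserver).
# acmetool calls each hook with the event name as $1; 'live-updated'
# means one or more live certificates have been updated.
handle_event() {
    case "$1" in
        live-updated)
            # reload whatever daemon is using the certificates so it
            # picks up the new ones (illustrative default shown)
            ${RELOAD_CMD:-systemctl reload apache2}
            ;;
    esac
}

handle_event "$1"
```

Events the hook doesn't care about (for example the challenge start/stop events) simply fall through the case statement and are ignored.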
However, there are a few questions that acmetool will ask you initially and there's one situation where it has to ask you a new question during certificate renewal, as was pointed out by a commentator on my earlier entry:
Recently Let's Encrypt switched over to a new version of their user agreement (v1.2). As a result, all certificate renewals for old accounts started failing (because they had only agreed to v1.1), and I had to ssh to all our servers, interactively run acmetool, and re-confirm the signup process (agreement & email) myself.
Fortunately you can automate this too, and you should. Acmetool supports a response file, which contains answers to questions that acmetool may try to ask you during either installation or certificate renewal. We automate these questions by preinstalling a responses file, which makes 'acmetool quickstart' work without having to ask us anything. When Let's Encrypt updated their user agreement, we pushed out a new version of the responses file that auto-accepted it and so returned certificate renewals to working without any manual intervention.
(The first renewal attempt after Let's Encrypt's update reported errors, then we worked out what the problem was, updated the file, pushed out a new version, and everything was happy. My personal websites avoided the problem entirely because of the timing; I had a chance to update my own responses file before any of their renewals came up, and when renewal time hit acmetool was fine.)
The responses settings we use are:

    "acme-enter-email": "<redacted>@<redacted>"
    "acmetool-quickstart-choose-server": https://acme-v01.api.letsencrypt.org/directory
    "acmetool-quickstart-choose-method": webroot
    "acmetool-quickstart-webroot-path": "/var/www/.well-known/acme-challenge"
    "acmetool-quickstart-install-cronjob": true
    # add an additional line to accept any new user agreement
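Pushing the responses file out before running quickstart is straightforward. This is a hypothetical sketch of the mechanics, not our actual deployment tooling; it assumes acmetool's default state directory of /var/lib/acme (acmetool looks for the responses file under the state directory's conf/ subdirectory), and the email address and settings shown are illustrative.

```shell
#!/bin/sh
# Sketch: install a responses file into an acmetool state directory
# so that a later 'acmetool quickstart' runs without asking questions.
deploy_responses() {
    statedir="$1"
    mkdir -p "$statedir/conf"
    cat > "$statedir/conf/responses" <<'EOF'
"acme-enter-email": "certs@example.org"
"acmetool-quickstart-choose-method": webroot
"acmetool-quickstart-webroot-path": "/var/www/.well-known/acme-challenge"
"acmetool-quickstart-install-cronjob": true
EOF
}

# On a real host this would be something like:
#   deploy_responses /var/lib/acme
#   acmetool --batch quickstart
```

With the responses in place first, quickstart (and later renewals) never need to stop and ask anything interactively.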
As shown in the example responses file, you can set additional parameters like the normal key type, RSA key size, and so on. We haven't bothered doing this so far, but we may in the future.
You could vary the email address if you wanted to (for example for different classes of machines). We don't bother, because it's mostly unimportant; in practice, all it gets is the occasional email about one of our generic test machine hostnames that hasn't renewed its certificate because we haven't been using that hostname for anything that needed one.