The hidden danger of using rsync to copy files instead of cp
I have a long standing reflexive habit that most of the time when I want to copy files around, I reach for 'rsync -a' by default. I do it on the command line, and I do it in things like our local postinstall system setup scripts. It's not really necessary these days ('cp -a' now works fine on everything I commonly use), but I started doing this in an era when rsync was the clearly safest choice for a 'copy file, preserving all attributes, no matter what system I'm on' command. Today I made a mistake and was reminded that this is not necessarily the best idea, because there is a small difference in behavior between rsync and cp.
What happened today is that in a system setup script, I wrote:
set -e
[...]
rsync -a /master/loc/etc/cron.d/cron-thing /etc/crond.d/
I ran the script, it went fine, and then afterward the system didn't actually seem to be doing what the cron.d entry was supposed to have it do. I spent some time wondering if I'd gotten some other bit of the system setup wrong, so that the script I was invoking from cron couldn't do anything, and then finally I looked in /etc for some reason and the penny dropped.
You see, the important difference between rsync and cp here is that rsync will create a destination directory if necessary and cp won't. The drawback of this sometimes-handy behavior is that rsync's behavior hides typos. Had I written 'cp -a ...', cp would have errored out (and then the entire script would have aborted). With rsync, it quietly created /etc/crond.d and put my cron-thing in it, just as I'd typed but not as I'd wanted.
After this happened, I went back through this script and turned all of my reflexive 'rsync -a' usage into 'cp -a'. I've been burned once, I don't need to stub my toe a second time.
I don't currently plan to revise our existing (working) scripts just for this, but certainly I'm now going to try to shift my reflexes and use 'cp -a' in the future.
(In this sort of context, even if I want the directory created too, I think it's better to use 'mkdir -p' in order to be explicit about it. On the command line I might exploit rsync's side effect, but in a script there's no reason to be that compact and obscure.)
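Sketched in shell, the explicit pattern looks something like this (all the paths here are invented for illustration, not from our real setup scripts):

```shell
#!/bin/sh
# Sketch of the explicit mkdir+cp pattern under 'set -e'; the paths are
# made-up illustrations.
set -e
src=$(mktemp -d)
dest=$(mktemp -d)
echo 'demo entry' > "$src/cron-thing"

# Be explicit about creating the destination directory, then copy.
mkdir -p "$dest/cron.d"
cp -a "$src/cron-thing" "$dest/cron.d/"

# Had the last argument been a typo like "$dest/crond.d/", cp would have
# errored out and 'set -e' would have aborted the whole script here.
ls "$dest/cron.d"
```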
Being reminded that an obvious problem isn't necessarily obvious
The other day we had a problem with one of our NFS fileservers, where a ZFS filesystem filled up to its quota limit, people kept writing to the filesystem at high volume, and the fileserver got unhappy. This nice neat description hides the fact that it took me some time to notice that the one filesystem that our DTrace scripts were pointing to as having all of the slow NFS IO was a full filesystem. Then and only then did the penny finally start dropping (which led me to a temporary fix).
(I should note that we had Amanda backups and a ZFS pool scrub happening on the fileserver at the time, so there were a number of ways it could have been overwhelmed.)
In the immediate aftermath, I felt a bit silly for missing such an obvious issue. I'm pretty sure we've seen the 'full filesystem plus ongoing writes leads to problems' issue before, and we've certainly seen similar problems with full pools. In fact, four years ago I wrote an entry about remembering to check for this sort of stuff in a crisis. Then I thought about it more and decided that I was just engaging in hindsight bias.
The reality of sysadmin life is that in many situations, there are too many obvious problem causes to keep track of them all. We will remember common 'obvious' things, by which I mean things that keep happening to us. But fallible humans with limited memories simply can't keep track of infrequent things that are merely easy to spot if you remember where to look. These things are 'obvious' in a technical sense, but they are not in a practical sense.
This is one reason why having a pre-written list of things to check is so potentially useful; it effectively remembers all of these obvious problem causes for you. You could just write them all down by themselves, but generally you might as well start by describing what to check and only then say 'if this check is positive ...'. You can also turn these checks (or some of them) into a script that you run and that reports anything it finds, or create a dashboard in your monitoring and alert system. There are lots of options.
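As an illustration, a minimal version of such a check script could look like the following; the 95% threshold and the use of plain df are my assumptions here, and a real version for our fileservers would probably look at ZFS quotas instead:

```shell
#!/bin/sh
# Hypothetical sketch of an 'obvious problems' check script: report any
# filesystem at or above a usage threshold. The 95% threshold and plain
# df are assumptions, not our actual tooling.
threshold=95
df -P | awk -v t="$threshold" 'NR > 1 {
    sub(/%/, "", $5)                  # strip the % from the Capacity column
    if ($5 + 0 >= t) print $6 " is " $5 "% full"
}'
```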
(Will we try to create such a checklist or diagnosis script? Probably not for our current fileservers, since they're getting replaced with a completely different OS in hopefully not too much time. Instead we'll just hope that we don't have more problems over their remaining lifetime, and probably I'll remember to check for full filesystems if this happens again in the near future.)
Sidebar: Why our (limited) alerting system didn't tell us anything
The simple version is that our system can't alert us only on the combination of a full filesystem, NFS problems with that fileserver, and perhaps an observed high write volume to it. Instead the best it can do is alert us on full filesystems alone, and that happens too often to be useful (especially since it's not something we can do anything about).
Word-boundary regexp searches are what I usually want
I'm a person of relatively fixed and slow to change habits as far as Unix commands go. Once I've gotten used to doing something in one way, that's generally it, and worse, many of my habits fossilized many years ago. All of this is a long-winded lead in to explaining why I have only recently gotten around to really using the '\b' regular expression escape character. This is a real pity, because now that I have, my big reaction is 'what took me so long?'
Perhaps unsurprisingly, it turns out that I almost always want to search for full words, not parts of words. This is true whether I'm looking for words in text, words in my email, or for functions, variables, and the like in code. In the past I adopted various hacks to deal with this, or just dealt with the irritation of excessive matches, but now I've converted over to using word-boundary searches and the improvement in getting what I really want is really great. It removes another little invisible point of friction and, like things before it, has had an outsized impact on how I feel about things.
(In retrospect, this is part of what how we write logins in documentation was doing. Searching for '<LOGIN>' instead of 'LOGIN' vastly reduced the chance that you'd run into the login embedded in another word.)
There are a couple of ways of doing word-boundary searches (somewhat depending on the program). The advantage of '\b' is that it works pretty universally; it's supported by at least (GNU) grep, ripgrep, and less, and it's at least worth trying in almost anything that supports modern (or 'PCRE') regular expressions, which is a lot of things. Grep and ripgrep also support the -w option for doing this, which is especially useful because it works with fgrep too.
(I reflexively default to fgrep, partly so I don't have to think about special characters in my search string.)
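To illustrate the difference (with made-up sample words):

```shell
# Word-boundary searching in practice; the sample words are invented.
printf 'cat\nconcatenate\nscatter\n' > /tmp/words.txt

grep 'cat' /tmp/words.txt        # matches all three lines
grep '\bcat\b' /tmp/words.txt    # matches only the full word 'cat'
grep -w cat /tmp/words.txt       # same result, and -w works with fgrep too
```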
Per this SO question and its answers, in vim I'd need to use '\<' and '\>' for the beginning and end of words. I'm sure vim has a reason for having two of them. Emacs supports '\b', although I don't actually do regular expression searches in Emacs regularly enough to remember how to invoke them (since I just looked it up, the documentation tells me it's C-M-s and C-M-r, which ought to be reasonably memorable given plain searches).
PS: Before I started writing this entry, I didn't know about -w in grep and ripgrep, or how to do this in vim (and I would have only been guessing about Emacs). Once again, doing some research has proven beneficial.
PPS: I care about less because less is often my default way of scanning through pretty much anything, whether it's a big text file or code. Grep and company may tell me what files have some content and a bit of its context, but less is what lets me poke around, jump back and forth, and so on. Perhaps someday I will get a better program for this purpose, but probably not soon.
We've decided to write our future Python tools in Python 3
About a year ago I wrote about our decision to restrict what languages we use to develop internal tools and mentioned that one of the languages we picked was Python. At the time, I mostly meant Python 2 (although we already had one Python 3 program even then, which I sort of had qualms about). Since I now believe in using Python 3 for new code, I decided that the right thing for us to do was explicitly consider the issue and reach a decision, rather than just tacitly winding up in some state.
Our decision is perhaps unsurprising; my co-workers are entirely willing to go along with a slow migration to Python 3. We've now actively decided that new or significantly revised tools written in Python will be written in Python 3 or ported to it (barring some important reason not to do so, for example if the new code needs to still run on our important OmniOS machines). Python 3 is the more future proof choice, and all of the machines where we're going to run Python in the future have a recent enough version of Python 3.
That this came up now is not happenstance or coincidence. We have a suite of local ZFS cover programs and our own ZFS spares handling system, which are all primarily written in Python 2. With a significantly different fileserver setup on the horizon, I've recently started work on 'porting' these programs over to our new fileserver environment (where, for example, we won't have iSCSI backends). This work involves significant revisions and an entirely new set of code to do things like derive disk mapping information under Linux on our new hardware. When I started writing this new code, I asked myself whether this new code in this new environment should still be Python 2 code or whether we should take the opportunity to move it to Python 3 while I was doing major work anyway. I now have an answer; this code is going to be Python 3 code.
(We have Python 3 code already in production, but that code is not critical in the way that our ZFS status monitoring and spares system will be.)
Existing Python 2 code that's working perfectly fine will mostly or entirely remain that way, because we have much more important things to do right now (and generally, all the time). We'll have to deal with it someday (some of it is already ten years old and will probably run for at least another ten), but it can wait.
(A chunk of this code is our password propagation system, but there's an outside chance that we'll wind up using LDAP in the future and so won't need anything like the current programs.)
As a side note, moving our spares system over to a new environment has been an interesting experience, partly because getting it running initially was a pretty easy thing. But that's another entry.
Having your SSH server on an alternate port provides no extra security today
Every so often I either hear someone say that having your SSH server on a non-standard TCP port provides extra security or get asked whether it does. On today's Internet, the simple answer is no, it doesn't provide any extra security, or at least that it shouldn't. To explain that and convince you, let's talk about the two risks that your SSH server opens you up to. Let us call these the scattershot risk and the targeted risk.
The scattershot risk is mass SSH password guessing. Pretty much any system on the Internet with an open SSH port will see a legion of compromised zombie machines show up to repeatedly try to guess username/password combinations, because why not; if you have a compromised machine and nothing better to use it for, you might as well turn it loose to see if you can get lucky. Of course SSH is not the only service that people will try mass password guessing against; attackers are also constantly probing against at least authenticated SMTP servers and IMAP servers. Probably they're trying anything that exposes password authentication to the Internet.
However, you should never be vulnerable to the scattershot risk, because you shouldn't have accounts with weak passwords that can be guessed. You especially shouldn't have such accounts exposed to SSH, and there are a number of ways to ensure this. First, obviously, you want to enforce password quality rules (whether just on yourself, for a personal machine, or on everyone). If you're worried about random accounts getting created by software that may mis-manage them and their passwords, you can lock down the SSH configuration so that only a few accounts can log in via SSH (you probably don't need the postgres system account to be able to SSH in, for example). Finally, you can always go all the way to turning off password authentication entirely and only accepting public keys; this will shut down all password guessing attacks completely, even if attackers know (or guess) a username that's allowed to log in.
(SSH is actually a quite easy daemon to lock down this way, because it has good authentication options and it's relatively easy to configure restrictions on who can use it. Things like IMAP or authenticated SMTP are generally rather more troublesome because they have much weaker support for easily deployed access restrictions and they don't have public key authentication built in in the same way.)
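As a sketch, the sshd_config directives involved look something like this ('alice' and 'bob' are invented example accounts, not a recommendation for any real setup):

```
# Only these accounts may log in via SSH at all.
AllowUsers alice bob
# Public keys only; this shuts down password guessing entirely.
PasswordAuthentication no
PermitRootLogin no
```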
The targeted risk is that someone will find a vulnerability in the (Open)SSH server software that you're running and then use it to attack you (and lots of other people who are also running that version). This could be the disaster scenario of full remote code execution, or an authentication bypass, or merely that they can work out your server's private key just by connecting to it (see also). This is an actual risk (even if it's probably relatively unlikely in practice), but on today's Internet, moving SSH to an alternate port doesn't mitigate it due to modern mass scanning and things like shodan and zmap. If your SSH is answering on some port, modern mass scanning will find it and in fact it's probably already in someone's database, complete with its connection banner and any other interesting details such as its host keys. In other words, all of the things that someone unleashing an exploit needs to know in order to target you.
You could get lucky, of course; the people developing a mass exploit could be lazy and just check against and target port 22, on the grounds that this will sweep up more than enough machines and they don't need to get absolutely everyone. But don't count on it, especially if the attackers are sophisticated and intend to be stealthy. And it's also possible that the Shodan-like search engine or the Internet scanning software that the attackers are using makes it easier for them to just ask for 'all SSH servers with this banner (regardless of port)' than to specifically limit things to port 22.
PS: The one thing that moving your SSH server off port 22 is good for is reducing log noise from all of the zombie machines trying scattershot password guessing attacks against you. My personal view is that there are better alternatives, including not paying attention to your logs about that kind of thing, but opinions differ here.
How and why we sell storage to people here
As a university department with a centralized fileserver environment plus a wide variety of professors and research groups, we have a space allocation problem. Namely, we need some way to answer the question of who gets how much space, especially in the face of uneven grant funding levels. Our historical and current answer is that we allocate space by selling it to people for a fixed one-time cost (for various reasons we can't give people free space). People can have as much space as they're willing to pay for; if they want so much space that we run out of currently available but not allocated space, we'll buy more hardware to meet the demand (and we'll be very happy about it, because we've historically had plenty of unsold space).
In the very old days that were mostly before my time here, our fileserver environment used Solaris DiskSuite on fixed-size partitions carved out from 'hardware RAID' FibreChannel controllers in a SAN setup. In this environment, one partition was one filesystem, and that was the unit we sold; if you wanted more storage space, my memory is that you had to put it in another filesystem whether you liked that or not, and obviously this meant that you had to effectively pre-allocate your space among your filesystems.
Our first generation ZFS fileserver environment followed this basic pattern but with some ZFS flexibility added on top. Our iSCSI backends exported standard-sized partitions as individual LUNs, which we called chunks, and some number of mirrored pairs of chunks were put together as a ZFS pool that belonged to one professor or group (which led to us needing many ZFS pools). We had to split up disks into multiple chunks partly because not doing so would have been far too wasteful; we started out with 750 GB Seagate disks and many professors or groups had bought less total space than that. We also wanted people to be able to buy more space without facing a very large bill, which meant that the chunk size had to be relatively modest (since we only sold whole chunks). We carried this basic chunk based space allocation model forward into our second generation of ZFS fileservers, which was part of why we had to do a major storage migration for this shift.
Then, well, we changed our minds, by which I actually mean that our director worked out how to do things better. Rather than forcing people to buy an entire chunk's worth of space at once, we've moved to simply selling them space in 1 GB units; professors can buy 10 GB, 100 GB, 300 GB, 1000 GB, or whatever they need or want. ZFS pools are still put together from standard-sized chunks of storage, but that's now an implementation detail that only we care about; when you buy some amount of space, we make sure your pool has enough chunks to cover that space. We use ZFS quotas (on the root of each pool) to limit how much space in the pool can actually be used, which was actually something we'd done from the very beginning (our ZFS pool chunk size was much larger than our old FC SAN standard partition size, so some people got limited in the conversion).
This shift to selling in 1 GB units is now a few years old and has proven reasonably popular; we've had a decent number of people buy both small and large amounts of space, certainly more than were buying chunks before (possibly because the decisions are easier). I suspect that it's also easier to explain to people, and certainly it's clear what a professor gets for their money. My guess is that being able to buy very small amounts of space (eg 50 GB) to meet some immediate and clear need also helps.
(Professors and research groups that have special needs and their own grant funding can buy their own hardware and have their Point of Contact run it for them in their sandbox. There have been a fairly wide variety of such fileservers over the years.)
PS: There are some obvious issues with our general approach, but there are also equal or worse issues with other alternate approaches in our environment.
The hardware and basic setup for our third generation of ZFS fileservers
As I mentioned back in December, we are slowly working on the design and build out of our next (third) generation of ZFS NFS fileservers, to replace the current generation, which dates from 2014. Things have happened a little sooner than I was expecting us to manage, but the basic reason for that is we temporarily had some money. At this point we have actually bought all the hardware and more or less planned out the design of the new environment (assuming that nothing goes wrong on the software side), so today I'm going to run down the hardware and the basic setup.
After our quite positive experience with the hardware of our second generation fileservers, we have opted to go with more SuperMicro servers. Specifically we're using SuperMicro X11SPH-nCTF motherboards with Xeon Silver 4108 CPUs and 192 GB of RAM (our first test server has 128 GB for obscure reasons). This time around we're not using any addon cards, as the motherboard has just enough disk ports and some 10G-T Ethernet ports, which is all that we need.
(The X11SPH-nCTF has an odd mix of disk ports; 8x SAS on one PCI controller, 8x SATA on another PCIE controller, and an additional 2x SATA on a third. The two 8x setups use high-density connectors; the third 2x SATA has two individual ports.)
All of this goes in a 2U SuperMicro SC 213AC-R920LPB case, which gives us 16 hot swappable 2.5" front disk bays. This isn't quite enough disk bays for us, so we've augmented the case with what SuperMicro calls the CSE-M14TQC mobile rack; this goes in an otherwise empty space on the front and gives us an additional four 2.5" disk bays (only two of which we can wire up). We're using the 'mobile rack' disk bays for the system disks and the proper 16-bay 2.5" disk bays for data disks.
(Our old 3U SC 836BA-R920B cases have two extra 2.5" system disk bays on the back, so they didn't need the mobile rack hack.)
For disks, this time around we're going all SSD for the ZFS data disks, using 2 TB Crucial SSDs in a mix of MX300s and MX500s. We don't have any strong reason to go with Crucial SSDs other than that they're about the least expensive option that we trusted; we have a mix because we didn't buy all our SSDs at once and then Crucial replaced the MX300s with the MX500s. Each fileserver will be fully loaded with 16 data SSDs (and two system SSDs).
(We're not going to be using any sort of SAN, so moving up to 16 disks in a single fileserver is still giving us the smaller fileservers we want. Our current fileservers have access to 12 mirrored pairs of 2 TB disks; these third generation fileservers will have access to only 8 mirrored pairs.)
This time around I've lost track of how many of these servers we've bought. It's not as many as we have in our current generation of fileservers, though, because this time around we don't need three machines to provide those 12 mirrored pairs of disks (a fileserver and two iSCSI backends); instead we can provide them with one and a half machines.
Sidebar: On the KVM over IP on the X11SPH-nCTF
The current IPMI firmware that we have still has a Java based KVM over IP, but at least this generation works with the open source IcedTea Java I have on Fedora 27 (the past generation didn't). I've heard rumours that SuperMicro may have a HTML5 KVM over IP either in an updated firmware for these motherboards or in more recent motherboards, but so far I haven't found any solid evidence of that. It sure would be nice, though. Java is kind of long in the tooth here.
(Maybe there is a magic setting somewhere, or maybe the IPMI's little web server doesn't think my browser is HTML5 capable enough.)
The history of our custom NFS mount authorization system (or some of it)
I recently wrote an entry about the shifting goals of our custom NFS mount authorization system, where we moved away from authenticating our own machines and now only need to authenticate machines run by other people on other internal networks. You might wonder why we used to authenticate our own machines on our own network, or alternately why we were able to make the shift away from doing so (and if you're the right sort of person, you already have a guess).
The origins of our custom NFS mount authorization system predate me here, so I only know parts of the story, but as I understand it our old approach existed for a very simple reason. You see, back in the old days, what we now consider our own networks weren't just our own networks. Instead, the department's public networks used to be a mix of our machines and other people's machines, and it wasn't a split that was as neat as 'we get this /24, other people are on other /24s'. Instead, due to historical evolution, various people had hosts and small networks distributed all over this IP address space.
With networks that were effectively insecure against IP spoofing, we had to have some more effective way of verifying our own servers than merely their IP address. Hence our custom NFS mount authentication system. In fact, it may have first been used (only) for our own servers, and then only later extended to let other people's machines selectively do NFS mounts from our fileservers. But clearly this wasn't an ideal situation, and once we (the core services group) created the internal sandbox networks, we started getting people to move their machines from public networks into appropriate sandboxes. All of this was before my time; by the time I arrived, I believe the move was almost completely finished, with only a few real machines lingering around on the public networks that we (the core services team) used, although none of them was on the most important (and most central) /24.
(There are other groups who had and still have their own /24, but that's different; they're segregated from the subnets we have our servers on.)
Today, every non-us machine is long gone from our core networks, so we no longer have to worry very much about IP spoofing of our own servers. As a result, we could consciously shift the goals of the NFS mount authorization system in a way that let us set up a different implementation of it, with different priorities.