2009-07-31
How fast various ssh ciphers are
Periodically it surprises people to learn this, but ssh is not
necessarily very fast (in the bandwidth sense). It's plenty fast for
normal interactive use, but this speed issue can matter if you are
making large transfers with scp, rsync, or the like; depending on
your environment, ssh can go significantly slower than wire speed.
Ssh is slow because it has to encrypt and decrypt everything that goes over the wire, and this is a CPU-bound operation. How much time this takes depends on how fast the machines at each end are (the faster the better) and on which cipher ssh picks, because they vary significantly in speed.
Citing numbers is dangerous since yours are going to vary a lot, but here are some representative ones from Dell 2950s running 32-bit Ubuntu 8.04 with gigabit Ethernet:
- the fastest cipher is arcfour, at a transfer rate of about 90 Mbytes/sec; arcfour128 and arcfour256 are about as fast, within the probable margins for error of my testing. (This is still less than 80% of the full TCP/IP wire speed, and you can get gigabit wire speed on machines with much less CPU power than 2950s.)
- the slowest cipher is 3des-cbc, at 19 Mbytes/sec.
- aes128-cbc, the normal OpenSSH default cipher, is reasonably fast at 75 Mbytes/sec; this is the fastest non-arcfour speed.
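If you want rough numbers for your own machines, one quick and dirty way to measure this (a sketch of mine, not how these figures were produced; 'somehost' is a placeholder) is to time pushing a fixed amount of data through ssh with a particular cipher:

    # push 1 GB of zeros through ssh with the arcfour cipher and throw it
    # away on the far end; GNU dd reports the achieved rate when it finishes
    dd if=/dev/zero bs=1M count=1000 | ssh -c arcfour somehost 'cat >/dev/null'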
That ssh's default cipher is among the fastest ones means that you probably don't need to worry about this unless you are transferring a lot of data and need it to go as fast as possible (in which case you should explicitly use arcfour).
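For example (with placeholder host and file names), you can pick the cipher explicitly on the command line; scp and ssh both take -c for this, and rsync passes it through via -e:

    scp -c arcfour bigfile somehost:/scratch/
    rsync -a -e 'ssh -c arcfour' /some/dir/ somehost:/some/dir/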
(And of course all of this is relevant only if the rest of the system can read and write the data fast enough.)
All of this is with no compression. Since compression trades CPU usage for lower bandwidth, you should only turn it on if you're bandwidth-constrained to start with. (And on a multi-core machine you should consider doing the compression yourself, so that one core can be compressing while ssh is using the other core to do the ciphering.)
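One way to do that (a sketch with placeholder host and directory names) is to run gzip as its own process in the pipeline instead of using ssh's -C option:

    # compress in a separate gzip process, decompress and unpack on the far end
    tar cf - /some/dir | gzip | ssh -c arcfour somehost 'gzip -d | tar xf - -C /dest'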
2009-07-27
Why you should do code reviews for sysadmin scripts
Through my experiences over the past while, I've come around to the view that you should try to have code reviews for sysadmin shell scripts. There are two reasons for this, and they both have to do with the fact that the Bourne shell is not really a programming language.
First, you want code reviews so that other people can convince you that your clever Bourne shell idioms are a little too clever. People's tastes and standards for this vary widely, and you're writing scripts not just for yourself but for your co-workers as well; black-box scripts (ones that no one but you can touch) ultimately don't help anyone.
(And sometimes you have strong disagreements over the best way to do something and need to come to an agreement on what the local style will be.)
Second and more importantly, you want them because there is a lot of Bourne shell arcana (and there are a lot of beartraps) that people don't know, especially junior people. There are all sorts of clever but not obvious ways of doing things, and conditions that it's not obvious you need to handle. If you don't know the particular trick you need, you wind up writing shell scripts that do things inefficiently, miss cases, or have subtle bugs.
(It's not just 'junior people' who miss shell idioms; for example, I've learned a number of new-to-me ones through WanderingThoughts, both in writing entries and in the comments people have written.)
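As one illustration of the kind of beartrap I mean (my example, not anything from a particular script): unquoted variable expansions quietly fall apart on filenames with spaces, and a glob that matches nothing needs handling too:

    # quote expansions and let the shell itself do the globbing
    for f in "$dir"/*; do
        [ -e "$f" ] || continue    # glob matched nothing; skip the literal '*'
        cp -p "$f" /some/backup/dir/
    done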
Code review of people's scripts gives you an opportunity to pass on these tricks and arcana (and in a way that people may find easier to remember than some great big list of 'nifty shell tricks'), and in the process you improve the quality of your local scripts. As a bonus, you may even learn new ones yourself.
(This is especially good if it means that you fix an overlooked condition in a script before it goes off in someone's face. No one likes to make a mistake in a script that causes problems in production, and it's probably especially demoralizing for junior people and, to put it one way, not a great way to convince them that they're competent to write scripts and should keep on doing so.)
2009-07-16
Another reason to safely update files that are looked at over NFS
Suppose that you are writing a script on one system but testing it on another (perhaps the first system is the one that has your full editing environment set up). You go along in your cycle of edit, save, run, edit, save, run, and then suddenly:
./testscript: Stale NFS file handle
What just happened?
You've run into the issue of safely updating files that are read over NFS, even though you weren't reading the file at the time you saved it.
In theory, every time an NFS client needs to turn a name into an NFS filehandle it should go off and ask the server. In practice, for efficiency NFS clients generally cache this name to filehandle mapping information for some amount of time (how long varies a lot). Usually no one notices, but you got unlucky; when you tried to run the script, the second machine had cached the filehandle for the old version of the file, which no longer exists, and when it tried to read the file the NFS server told it 'go away, that's a stale NFS filehandle'.
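One way to see what changed underneath the second machine (assuming, as is common for editors that update files safely, that saving wrote the new contents out as a new file): the file's identity on the server is now different, so the cached filehandle points at nothing.

    ls -i testscript    # note the inode number
    # ... save the script again from the editing machine ...
    ls -i testscript    # a different inode; the old NFS filehandle is now stale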
Running scripts isn't the only thing that can get stale filehandle errors because of cached mappings; it's just one of the more obvious ones, because you actually get error messages. I believe that test is another case (although I haven't yet demonstrated this in a controlled test):
if [ -f /some/nfs/file ]; then ... fi
I believe that this will silently fail if the client's cache is out of
date, as the client kernel winds up doing a GETATTR on a now-invalid
NFS filehandle (because test will stat() the file to see if it's a
regular file or not).
2009-07-11
What can go wrong in making NFS mounts
Now that we know what goes on in NFS mounts, we can see that there are any number of moving parts that can go wrong:
- the RPC portmapper refuses to talk to you (possibly because a firewall gets in the way, possibly because it has been set up with tcpwrappers based restrictions).
- the NFS mount daemon refuses to talk to you, possibly because it insists that clients use a reserved port (as it usually does), or because there is another firewall problem, since it uses a different port than the portmapper.
- the NFS mount daemon thinks that you don't have sufficient permissions,
so it refuses to give you an NFS filehandle.
- the kernel NFS server refuses to talk to you, possibly because of
yet another firewall issue.
- the NFS filehandle you get back is broken.
- the kernel NFS server refuses to accept the filehandle that the NFS mount daemon gave you. This is especially fun, because sometimes mount will claim that the filesystem was successfully mounted but any attempt to do anything to it will fail or hang.
Most versions of mount will at least give you different error messages
in the various sorts of cases, generally of increasing peculiarity and
opaqueness as you move down this list.
(Thus, I have seen mount report 'invalid superblock' on NFS mount
attempts.)
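A rough first pass at telling these cases apart (a sketch; 'nfssrv' and the export path are placeholders) is to poke each moving part in turn:

    rpcinfo -p nfssrv                       # can we talk to the RPC portmapper at all?
    showmount -e nfssrv                     # will mountd talk to us, and does it list the export?
    mount -t nfs nfssrv:/export/home /mnt   # does the kernel NFS server accept the filehandle?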
2009-07-06
How you could do a shared root directory with NFS
In a previous entry I made an offhand
comment that diskless clients still needed a separate / filesystem
for each client. This is true for how diskless clients were generally
implemented, but technically not true in general; it's possible to build
a diskless client environment with even a shared root directory.
The truth is that most of the contents of / are common between all
machines; there is just not that much system-specific information in
the root filesystem, especially if your diskless machines were generic
(which they usually were). So all you need for a shared root is to
put all of that system-specific information in a separate filesystem
(well, a separate directory hierarchy) and then arrange to mount that
filesystem in a fixed place very early on in the boot process.
(Then you make all of the system-specific files in / be symlinks
that point into the fixed mountpoint for the system-specific directory.)
How do the generic boot scripts in the generic / know which system's
directory to mount? Clearly you need a piece of system-specific
information to know what system you are, but fortunately diskless
machines already have one, namely their IP address, which they know by
the time they can NFS-mount the root filesystem.
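To make this concrete, here is a very rough sketch of the arrangement; all of the names are invented for illustration, not taken from any real vendor's implementation:

    # in the shared root, the system-specific files are symlinks into a fixed mountpoint:
    #   /etc/hostname -> /private/etc/hostname
    #   /etc/fstab    -> /private/etc/fstab
    # early in the generic boot scripts, each client mounts its own directory,
    # keyed on the IP address it already knows:
    myip=$(hostname -i)    # or however your boot environment exposes the address
    mount -o ro nfsserver:/exports/clients/$myip /private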
I doubt that this is a novel idea, so why didn't any Unix vendor do
this back in the days when diskless systems were big? I don't know for
sure, but I suspect that it was a combination of there being a number
of painful practical issues that would have to be solved, plus there's
probably not all that much disk space to be saved. Using separate /
filesystems for each diskless client was enough simpler to win.
(You could also get most of the savings with hardlinks and cleverness, although I don't know if any Unix vendor officially supported that.)