2007-12-30
The importance of killing processes in the right order
In the old days, we had a mail system with two problems: receiving SMTP mail took a couple of processes per connection, and the SMTP server had no timeout. The result was that our central mail server would slowly accumulate more and more idle SMTP session processes waiting on zombie PCs that had just yanked the connection away, all of them chewing up memory, swap space, and so on; at the height of things, we might have more than a thousand idle SMTP connections.
(Specifically, there was a process per connection that did the SMTP conversation, and then a separate, fairly heavyweight program to verify addresses; the SMTP server process started the address verification router process when necessary and communicated with it through pipes. Sometimes the router process spawned its own children, for extra fun.)
Once upon a time, I took it upon myself to clean up this situation. This being a Solaris machine, I did:
# kill -9 `pgrep smtpserver`
The machine promptly exploded; we had to force a reboot from the serial console to recover it. What had happened was this:
In the idle state, the SMTP server processes were waiting on network input and the router processes were waiting on input from the pipe connected to the SMTP server processes, and everyone was swapped out. When I killed all of the SMTP server processes, all of those pipes suddenly saw end of file, so the kernel woke up all of the router processes and immediately started trying to swap them all back into memory in order to run them. Since a thousand odd router processes did not even remotely fit into memory, the machine immediately started thrashing itself to death.
This makes a great illustration of the need to kill processes in the right order when recovering an overloaded system. You need to kill processes in the order that will produce as little system activity as possible; as this example shows, the last thing you want to do is kill one bunch of processes only to have that wake up another bunch of previously idle processes.
(Since kill does not kill all the processes on the command line at once, 'kill -9 `pgrep "smtpserver|router"`' is not an entirely safe approach; you are betting that kill will get everything before the kernel interrupts it to start paging router processes back into memory.)
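A safer order, sketched here with the same (by now hypothetical) process names, is to freeze everything with SIGSTOP before delivering any SIGKILLs; stopped processes can't be woken up to run by pipe activity, and SIGKILL works fine on them:

# pkill -STOP router; pkill -STOP smtpserver
# pkill -KILL router
# pkill -KILL smtpserver

Since nothing is runnable while you do this, the kernel never gets a reason to start swapping the router processes back into memory.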
2007-12-23
Multihomed hosts and /etc/hosts
As a side note to looking up hostnames from IP addresses for people who use /etc/hosts: /etc/hosts lookups work badly in the presence of hosts with multiple IP addresses, since most gethostbyname() implementations will only return the first IP address that they find in /etc/hosts. These days you really want a minimal /etc/hosts and a reliable DNS server, unless you have special concerns.
(The gethostbyname() behavior is sensible, since otherwise it would
always have to scan the entire /etc/hosts file just to make sure that
it had found all IP addresses for a host, even when most hosts only have
one IP address.)
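To make this concrete with a made-up example, suppose a two-interface host 'gw' has two lines in /etc/hosts:

192.168.1.1   gw
10.0.0.1      gw

On a typical system, 'getent hosts gw' will report only 192.168.1.1; the scan stops at the first matching line, and the second address is invisible to anything resolving 'gw' through /etc/hosts.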
While there are workarounds for this issue, I think that the best
way out is just to not have any entries for your multihomed hosts in
/etc/hosts, even on the hosts themselves. This appears to work fine on
at least modern Linuxes, and I can't imagine that the *BSDs do any worse
here.
(You can get similar behavior with gethostbyaddr(), depending on how you give an IP address multiple names in /etc/hosts. Putting all the names on one line works out, but having multiple lines for one IP address has the same problem: only the first matching line is found.)
Shortening hostnames for fun and profit
Once upon a time I needed to NFS export filesystems to a lot of
workstations, in a situation where I was worried about size limits
in /etc/exports (and we didn't use YP/NIS, so we couldn't just put
everything in netgroups). In situations like this, one thing to do
is to shrink hostnames down as much as possible, and that's what we
did.
(This was back in an era when the existence of such limits was at least plausible.)
First, we named the workstations after elements.
This let us make their canonical names in DNS be the short abbreviations
for each element (although the local hostname was still the friendlier
element name), meaning that workstations had a canonical hostname
that was only one or two characters long. Then we put them all in
/etc/hosts, using shortened names: their canonical hostname, plus only
the subdomain of their lab.
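As an illustration with entirely made-up names, if a lab's subdomain was 'cs' and three of its workstations were iron, copper, and zinc, the relevant /etc/exports line could look something like:

/local/homes  fe.cs(rw) cu.cs(rw) zn.cs(rw)

(That's modern Linux exports syntax; the exact format differed between the NFS servers of the era, but the savings from the short names were the same.)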
All of this gave us hostnames for /etc/exports that were only four
to six characters long, far shorter than they normally would have been,
and I stopped worrying about the potential problem.
(In the end I don't know if the exports file actually had any size limits; possibly I did all of this work merely out of paranoia.)
Perhaps we could have done all of this without making the abbreviation
be the canonical name in the DNS, but I didn't feel like finding out the
hard way that mountd did DNS lookups under some circumstances. And we
were clearly going to have the abbreviations in DNS, since having names
in /etc/hosts that aren't in DNS is a recipe for future confusion and
explosions.
2007-12-22
Getting a useful persistent VNC session
By a persistent VNC session, I mean a session that you can connect your VNC viewer to, do stuff, disconnect from, and then later come back to connect to it again. This makes it the graphical equivalent of screen, at least for me.
(Much like screen, it also gives you a certain amount of immunity
against network and workstation stability problems for critical tasks
that you don't want interrupted. More and more, systems have to be
managed graphically instead of through text interfaces, which means that
screen isn't good enough.)
VNC generally comes with a program, vncserver, that starts and manages
VNC sessions, and it even puts them in the background for you. The
magic secret to useful persistent VNC sessions is simple: ignore this
backgrounding and always start vncserver with nohup, because
vncserver's backgrounding is only doing half the necessary job.
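In other words, do something like this (the ':1' display number is just an example; vncserver will normally pick a free display itself if you leave it out):

$ nohup vncserver :1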
If you don't do this, what happens can be confusing: once you close
the terminal window or log out or whatever from the shell that you
ran vncserver, all of the client programs get disconnected from
the server, so when you (re)connect to the VNC session all you see
is the plain black and white X background.
(Specifically, vncserver just uses '&' in the shell to start things
in the background, which doesn't protect them from getting a SIGHUP
when the session exits, and the VNC X server itself reacts to SIGHUP
by terminating all of the current client connections.)
Update: Pete Zaitcev pointed out
in email that people who use normal shells won't see this problem and
should just use plain vncserver, without nohup et al. The problem
happens for me because I'm using a shell
that doesn't do job control.
2007-12-05
Safely updating files that are read over NFS
To elaborate on an old entry a bit, let's suppose that you have important system data files that you expose to all of your machines via NFS, and that you need to re-generate and update them every so often.
When you regenerate files locally, you need to make sure that there's always a version of the file present, and that the file's always complete. When you add NFS there's a third, subtle requirement: you cannot immediately remove the old version because a process on another machine might be reading it, and NFS will happily yank the file out from under the process on the other machine.
(Normally, Unix doesn't completely remove a file until all processes drop their references to it, so a local process that's reading the old version of the file will keep being able to do so. NFS breaks this, because the NFS server has no knowledge of what files are open on the clients.)
The NFS issue means that things like plain rsync are not safe ways
to update files, since they do the equivalent of writing a temporary
file and then mv'ing it into place, which removes the old version
of the file on the spot. You need to preserve the old version of the
file under some other name, as in the recipe from the old entry.
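A minimal sketch of that recipe, assuming the data file is called 'datafile' and that 'generate' is whatever produces the new version:

generate >datafile.new        # build the new version completely first
ln -f datafile datafile.old   # keep the old version linked under another name
mv datafile.new datafile      # atomic rename; 'datafile' is always present

The mv is an atomic rename, so there is never a moment without a complete datafile, and because the old version stays linked as datafile.old, the NFS server keeps it alive for clients that are still reading it.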
(Note that rsync -b is not safe; rsync does not take the steps
necessary to make sure that a version of your file is always present, so
there is a brief moment when programs could see no file at all.)
(As a corollary, the obvious way of directly rewriting the file in place with shell redirection is very dangerous. Any problems will probably leave you with a truncated file, and even without a problem a process that tries to read the file while you're rewriting it will see an incomplete or empty version.)
2007-12-04
Dumb switches are now too smart for my good
Here's an ironic thing: basic switches these days are now too smart for their own good, or at least for my good. The issue is that even theoretically dumb switches are VLAN-aware, which means that they will only pass VLAN-tagged packets if they've been told about the particular VLAN a packet is tagged for.
The problem this creates is that it is now impossible to have a generic traffic monitoring switch. What you'd like to have is a small switch that's pre-configured with a mirror port that you could just transparently splice in at any spot in your network to debug traffic. Ideally this switch would be generic, so that it would work anywhere and everywhere in your network fabric, no matter what else was going on.
(You won't be able to monitor the full 2 Gb of potential traffic, since both directions of a full-duplex gigabit link have to squeeze through a single gigabit mirror port, but in many situations you don't need to because the link isn't that saturated.)
But this doesn't work if you want to splice the switch into a spot that's carrying tagged VLANs, because if the switch doesn't know about one of the VLANs it will just drop that VLAN's packets. Which is far from transparent.
You can't configure the switch to pass all possible VLANs, because no switch can be configured with anywhere near 4094 VLANs. You can try to configure the switch to pass all VLANs that are in use at your site, but this is both prone to errors as configurations change over time and limits where you can plug in the switch (you have to put it in front of something of yours that would drop any stray VLANs anyways).
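If you go that route, it at least helps to know which VLAN tags are actually in flight on a link; one way, sketched here with tcpdump on a machine that can see the traffic, is to capture only tagged frames and print their link-level headers:

# tcpdump -e -n -i eth0 vlan

The -e option makes tcpdump print the 802.1Q tag of each frame, and 'eth0' stands in for whatever interface sees the traffic.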
(We actually ran into this with the switch that our traffic tracking system uses to take a tap of the traffic going through our NAT gateway. We added a new subnet on our core switch infrastructure and to the NAT gateway but forgot to update the traffic tap switch, and were very puzzled for a while as machines on the subnet couldn't talk to the NAT gateway to get out to the world.)