2008-11-21
Limiting how much load Exim puts on your system
One of the things that you usually want to do with MTAs is have some
limit on how many things they'll try to do at once. This is
especially important if, like us, you allow users to run programs from
their .forwards; people do every so often have runaway programs, or
just programs that sit there endlessly (trying to get a lockfile, for
example).
Unfortunately, Exim only has limited support for load limiting. What you really want to do is limit the number of particular sorts of simultaneous deliveries allowed, so that you can have limits like 'only twenty pipes at once, and only four at once per user'. Exim can't do that directly; instead, all you can do is try to limit the number of simultaneous deliveries in total, and since there is no direct limit on that either, you have to construct one indirectly.
Exim can start delivery processes either immediately during an SMTP
conversation or later, during a queue run. Each queue run starts one
process, and for local transports each delivery process only does one
delivery at a time. So if all you are dealing with is queued mail, you
can be doing up to queue_run_max local deliveries at once.
(We mostly care about local deliveries, because they are the ones that
can explode the most and are the most likely to use up a lot of memory
and CPU. For remote SMTP, each top-level Exim delivery process can do up
to remote_max_parallel deliveries at once.)
Once you have at least smtp_accept_queue SMTP connections (more
or less; concurrency issues can create a bit of slop), the new
connections queue all of their messages and do not create more
delivery processes. Before then, each SMTP connection can create at
most smtp_accept_queue_per_connection delivery processes; after a
connection has processed that many messages, it starts queuing them
instead of immediately delivering them.
So the maximum number of simultaneous local delivery processes is the number of queue runners you allow, plus the maximum number of non-queueing SMTP connections times the number of non-queued messages per connection. This is the worst case situation, but unfortunately reducing any of these settings to limit the worst case will slow down ordinary processing in some situations, either by forcing things to be queued unnecessarily or by slowing down how soon queued messages get processed.
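To make the arithmetic concrete, here is a little worked example of that worst case (the Python and the specific numbers are purely illustrative; they are not Exim defaults):

    # Worst-case number of simultaneous local delivery processes, following
    # the reasoning above. These settings are made-up illustrative values.
    queue_run_max = 5                       # simultaneous queue runners
    smtp_accept_queue = 20                  # connections past this only queue
    smtp_accept_queue_per_connection = 10   # immediate deliveries per connection

    # Each queue runner does one local delivery at a time, and in the worst
    # case every non-queueing SMTP connection has the maximum number of
    # delivery processes still running at once.
    worst_case = queue_run_max + smtp_accept_queue * smtp_accept_queue_per_connection
    print(worst_case)                       # 5 + 20 * 10 = 205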
(Exim really wants to do immediate deliveries from SMTP sessions in order to process email promptly. The problem with relying on queue runners for delivery is that each time a message is retried, it has to be re-routed from scratch. This means that even a modest number of messages to places with DNS problems will probably clog your queue up significantly.)
The other possible approach is to use limiting based on the load
average, through queue_only_load and deliver_queue_load_max. My
concern with these is that the load average is a lagging indicator (it
is a one minute moving average, after all). Under a significant load
burst you can get into trouble well before the load average rises high
enough for these limits to kick in.
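To illustrate how much it lags, here is a rough sketch of how an exponentially damped one-minute average (which is essentially how Linux computes its load averages, sampling roughly every five seconds) responds to a sudden sustained burst of work; the size of the burst is a made-up number:

    import math

    decay = math.exp(-5.0 / 60.0)   # per-sample decay for the 1-minute average
    runnable = 20.0                 # a sudden, sustained burst of runnable processes
    loadavg = 0.0

    for tick in range(1, 25):       # two minutes of 5-second samples
        loadavg = loadavg * decay + runnable * (1.0 - decay)
        if tick % 6 == 0:
            print("after %3d seconds: load average %.1f" % (tick * 5, loadavg))

    # After 30 seconds the reported load is still only about 7.9, and it takes
    # a full minute to reach about 12.6 (63% of the real load of 20).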
2008-11-19
A growing realization about tcpdump and reading IP traffic
Here is a gotcha about reading tcpdump output that recent events have
been tattooing on my forehead:
The only sure way to tell whether a packet is going to your gateway or to something on the local network is to look at the destination Ethernet address.
To put it another way: a packet being sent to your network's gateway
does not have the gateway's IP address in it. Thus, reading tcpdump
output without Ethernet addresses does not really tell you whether a
packet was sent to your gateway or was just floating by on the network.
The same goes if you are reading tcpdump output on the sending machine;
until you look at the destination MAC, you don't actually know where the
machine is sending the packets, you just think you know.
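(The easy way to see this with tcpdump itself is its -e option, which prints the link-level header.) As a rough illustration of the underlying idea, here is a sketch that does the same thing by hand with a Linux raw socket, printing each IPv4 packet's destination MAC next to its destination IP; it needs root and is purely illustrative:

    import socket
    import struct

    ETH_P_ALL = 0x0003
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))

    def mac(raw):
        return ":".join("%02x" % b for b in raw)

    while True:
        frame = s.recv(65535)
        dst_mac = frame[0:6]                      # Ethernet destination address
        ethertype = struct.unpack("!H", frame[12:14])[0]
        if ethertype != 0x0800:                   # only look at plain IPv4 here
            continue
        dst_ip = socket.inet_ntoa(frame[30:34])   # IPv4 destination address
        print("dst IP %-15s  dst MAC %s" % (dst_ip, mac(dst_mac)))

Packets headed off your local network will show your gateway's MAC as the Ethernet destination even though the IP destination is somewhere else entirely.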
This is obvious once you think about it (assuming that you know enough
about how IP works), as is its interaction with tcpdump being
promiscuous and how switches can
flood traffic through your network. But you do have
to think about it, and not doing so has tripped me up at least twice
now. It's certainly not intuitive that more or less the only thing
your machine's IP stack does with your gateway's IP address is to ARP for its
Ethernet address.
(I think one reason that this is so easy to overlook is that it feels like a layering violation. It's rational to think that the use of an IP gateway should be visible in the IP headers of a packet, instead of only showing up one level lower.)
2008-11-08
Thinking about how your security domains relate to each other
It may be that you have (or can have) more than one security domain among your machines. If you want to really have multiple security domains, it's important to think about how they relate to each other and what this implies for how to manage the machines.
First, a note: a machine can never be in multiple security domains. By definition, you work with security domains instead of single machines because compromising one machine in a security domain eventually leads to compromising all of them (assuming that your attacker knows what they are doing). So if you have multiple security domains and you put a machine into two of them at once, what you have just done is merge the security domains through this machine.
This leads to the most important and basic principle, which is that security can only flow one way. If there is any relationship between security domains at all, they can never be equal; instead, one is the higher-security domain and one is the lower-security domain, and the higher security domain has access rights to the lower security domain but not vice versa. (I generally say that security flows out or down from the high security domain.)
Now, when I say 'access rights' I do not mean anything as easily controlled as passwordless access and NFS mounts. If you are taking this seriously, merely logging in to system B from system A means either that both systems are effectively in the same security domain (regardless of what the nominal situation is) or that system A is in a superior security domain. This is because of the consequences of a compromise of system A; an attacker that trojans your ssh program (or your kernel or whatever) on system A has just gotten access to system B through your actions. (Been there. Had it happen to me. Not fun. A lot of passwords got changed in a hurry.)
(It follows that sysadmin workstations are basically at the top of the security domain pyramid, especially if you theoretically have several completely independent security domains. If you are paranoid, this has significant implications for things like how they can be set up and how you back them up.)
If you need to cross between security domains without having security flow between the two, you need to firewall what the 'superior' domain can do to the 'inferior' one. Naturally this limiting has to be imposed by the would-be inferior domain, since the whole point of the firewall is that what would be the inferior domain doesn't want to trust the superior one.
2008-11-05
How many root passwords should you have?
There's a simple answer to the question of how many root passwords you should have; clearly, you should have a separate root password for each system. This answer is, shall we say, naive in most situations.
We can see why by asking the traditional security question: what are the actual risks of using the same root password on different systems? The answer is that an attacker who gets your root password from one system can then immediately compromise another. So the first situation where it is mostly or entirely pointless to have separate root passwords is where an attacker could compromise the other machine even without the root password.
The next situation is where blocking the attacker getting root on other machines isn't actually protecting anything meaningful, for example if you use ordinary NFS and the attacker gets root on a machine with enough NFS mount permissions. The attacker hardly needs to get root on any other machine, because they already have full access to user files that are visible from their machine, which in many cases is 'all of them'.
(Sure, NFS doesn't give them access as root, but this is hardly an obstacle; they can use root powers to become the user's UID and then go to town.)
I could go on, but there's a more general principle here: you don't want to think about machines, you want to think about security domains. There is very little point in using different root passwords on machines in the same security domain, and even if you have multiple security domains you may still want to use the same root password across them, because there are some risks to having lots of passwords.
(And you want to think realistically about what is and isn't in each of your security domains. You may conclude that things are intertwined enough that you only really have one security domain, although you could technically argue that you have several.)
2008-11-04
Mistakes editors can make that disqualify them as sysadmin editors
A sysadmin editor needs to be slightly more than a fast-starting thing that works in a minimal environment. It turns out that sysadmins are picky people (well, at least I am), and there are some small mistakes that otherwise competent editors can make that, unfortunately, make them non-starters for the job.
The first and the most classic mistake is automatically changing tabs to spaces or spaces to tabs. Just like programmers working on makefiles, sysadmins edit files where tabs are not interchangeable with spaces; an editor that thinks they are is a great way to blow your foot off some day.
(This is why to this day I wince any time someone says that they edit
with pico, which was infamous for this behavior at one time. I believe
that the pico people fixed it pretty promptly, but these things stick
in people's memories.)
The other mistake is not overwriting files in place, or at a minimum not doing so when they have hardlinks. Because it is the best way to save files, any number of editors always save to a temporary file and then rename it over the top of the real one; this is beautifully safe but explodes if the real file had additional links, because you've just broken those links. Some editors will notice this and revert to overwriting the file directly if it has hardlinks, but unfortunately hard links aren't the only case these days when preserving the exact file is important; consider security contexts.
(In theory editors can look for every last possible special attribute a file could have that means it should be updated in place. In practice there are too many of them and they keep growing.)
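Here is a minimal sketch of the hardlink problem (the file names are made up); it shows that 'write a new file and rename it over the old one' leaves any other hard links pointing at the old data:

    import os

    # Create a file and a second hard link to it.
    with open("real.conf", "w") as f:
        f.write("original contents\n")
    os.link("real.conf", "alias.conf")
    print(os.stat("real.conf").st_nlink)                # 2

    # The 'safe save' style: write a temporary file, rename it over the original.
    with open("real.conf.tmp", "w") as f:
        f.write("edited contents\n")
    os.rename("real.conf.tmp", "real.conf")

    # real.conf is now a different inode; the other name still has the old data.
    print(os.path.samefile("real.conf", "alias.conf"))  # False
    with open("alias.conf") as f:
        print(f.read())                                 # still 'original contents'

An editor that instead truncated and rewrote real.conf in place would have updated both names, because they would still share one inode.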
I maintain that there is also a third mistake, that of being too UTF-8 aware (or arguably, being UTF-8 aware at all). One of the important general virtues of a sysadmin editor is leaving strictly alone any bytes that you didn't actually edit. Unfortunately, most UTF-8 aware editors will rewrite any invalid UTF-8 sequences that they encounter, which can mangle your files in various ways (some of which are sometimes important). This generally doesn't bother me because I am so old fashioned that I still use the C locale, but it's something to watch out for.
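As a small illustration of the mangling, here is a sketch of the 'decode with substitutions, then re-encode' cycle that such editors effectively put your bytes through:

    # A line containing a byte that is not valid UTF-8 (0xe9 is 'é' in Latin-1).
    original = b"option = caf\xe9 au lait\n"

    # What a UTF-8 aware editor effectively does on load and save: decode,
    # substituting something for invalid byte sequences, then re-encode.
    round_tripped = original.decode("utf-8", errors="replace").encode("utf-8")

    print(original == round_tripped)   # False: the 0xe9 byte has been rewritten
    print(round_tripped)               # ...caf\xef\xbf\xbd au lait...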
(I do use one editor that always works in UTF-8 regardless of your locale, and every so often it does mangle text that I didn't want mangled. Fortunately it is not a sysadmin editor, since it requires X.)
2008-11-03
Why vi has become my sysadmin's editor
Here is one of my little peculiarities: for all that I use it a lot, I
don't really like vi. I use two or three editors on a routine basis
(and at least one more every so often), and out of all of those, vi
is my least favorite. But it is also the editor that I use the most,
because over time it has become what I'll call my sysadmin editor.
Why I have drifted into using vi all the time is a roll call of its
virtues as a sysadmin editor. First, it is ubiquitous; if I am on some
random, un-customized system, I know that I can type vi and get
something usable. Second, vi works in minimal environments, in that it
doesn't need X, and will thus work fine over basic ssh connections and
on the console (either normal or serial). Finally, vi starts fast.
You are probably laughing at the last advantage, but it matters a lot in a sysadmin's editor because of two factors: sysadmins make lots of little edits to separate files, and we do this on a bunch of different machines and accounts. An editing environment that takes ten seconds to initialize is appreciably less useful than one that starts in under a second, because those ten seconds can be an appreciable portion of the time I'm going to spend in this editing session.
This usage profile is basically the complete opposite of that of most people, who edit the same things for a long time in a single, well developed and customized environment. Which is why, for me, vi is a great sysadmin editor but not my favorite editor; in any environment besides sysadmin editing, its relative weaknesses start showing up more and more.
(Despite this, because I've used vi so much for quick sysadmin
editing jobs I've wound up drifting into it more and more for casual
but more extended editing. It's usable enough (and I know it well
enough) for such editing, and it's often just enough of a pain to fire
up a better editing environment that I don't bother. Thus I wind up
doing things like dashing off quick email messages or even writing
WanderingThoughts entries with it.)
Sidebar: Why sysadmins don't just leave a big editor running
The usual big editor retort to the slow start problem is 'just start the editor once', but that doesn't work for sysadmins because we edit files in lots of different contexts. An editor for every context would mean an infeasibly large number of editors, and that assumes I am willing to leave editors running as root just in case I need them again, which I'm not (like most sysadmins, I get out of privileged contexts as fast as possible). Some editors try to let you access files on other systems with other logins, but this is infeasible for sysadmins for various reasons (including that I am not going to trust any editor with our root password).