2012-06-26
A little gotcha with SSH connection sharing
I've written before about SSH connection sharing (and how I use it), but I should mention that there is one little obscure gotcha with it. The one gotcha is that all sessions inherit at least one thing from the initial session, instead of setting it up from scratch.
Specifically, X forwarding is only set from the initial session. All ssh sessions over the shared connection channel will use the X forwarding set up for the initial session and thus talk to whatever X display it was connected to, regardless of what their own $DISPLAY is set to (and even if it's not set to anything). If the initial session had no X forwarding (perhaps because it was started outside of X), then no subsequent session will have it.
This is not the case for port forwarding, where a subsequent session can
set up a new port forwarding. It's also not the case for the various
environment variables that SSH normally propagates to the server; you
can have different settings for things like $TERM or $LANG in
different sessions over the same shared connection.
(I don't know what happens with ssh agent forwarding; I don't have a test environment because I don't use an ssh agent or agent forwarding myself.)
Now, I will admit that this is an obscure gotcha; most people will never run into it because they won't be using the client machine with different $DISPLAY settings at the same time. I'm peculiar this way because I sometimes wind up logged on to my office workstation from home at the same time as my regular office environment is running (complete with its ssh connection sharing).
Sidebar: workarounds and solutions
The real solution is that the ssh ControlPath setting should have a
substitution for the local $DISPLAY value; you could then make and find
control sockets that were specific to the $DISPLAY (or lack thereof)
that the initial master session supported.
My original hack solution was to use an uncommon variant of the hostname of the target servers in the script where I was setting up and using shared connections. Then when I ssh'd to the host by hand I'd naturally use the common hostname and thus not find the shared connection. This had the drawback that I wasn't using the shared connection if I ssh'd to the host by hand from inside my regular office environment, even though I could have and it would have been more efficient.
My current solution is a cover script for ssh that checks to see
if $DISPLAY is set and if it isn't, forces 'ControlPath none' to
effectively turn off shared connections. This is not completely ideal or
correct, but solves all of the immediate problems.
(In retrospect, a better but more complex solution would be to use a
cover script to set ControlPath to something where I manually included
the value of $DISPLAY.)
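As a rough illustration of the two cover script approaches (this is a sketch, not the actual script), the option handling might look something like the following. The function names, the `cm-` socket naming scheme, and the `~/.ssh` socket directory are all made up for the example, and it assumes that connection sharing itself (ControlMaster and so on) is already configured elsewhere.

```python
import re


def current_cover_args(args, environ):
    """The 'current solution': with no $DISPLAY, turn connection sharing off.

    Returns the argument list to hand to the real ssh.
    """
    if not environ.get("DISPLAY"):
        # No X display, so don't reuse (or create) a shared connection.
        return ["-o", "ControlPath=none", *args]
    # $DISPLAY is set: leave the configured ControlPath alone.
    return list(args)


def per_display_args(args, environ, socket_dir="~/.ssh"):
    """The 'in retrospect' variant: one control socket per $DISPLAY value.

    socket_dir and the 'cm-' naming scheme are assumptions for this sketch.
    """
    display = environ.get("DISPLAY") or "nodisplay"
    # Replace characters like ':' and '/' so the value is safe in a file name.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "_", display)
    # %r, %h and %p are expanded by ssh itself (remote user, host, port).
    control_path = f"{socket_dir}/cm-{tag}-%r@%h:%p"
    return ["-o", f"ControlPath={control_path}", *args]
```

A real cover script would finish by exec'ing the real ssh binary with these arguments, taking care not to re-run itself if it is also called 'ssh'.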
2012-06-11
Choosing how slowly your mailer should time out email
Here's a little provocative question: why should mailers have timeouts on email delivery at all? Wouldn't it be more helpful to users to never give up on a message?
In practice, there are at least four reasons to configure your mailer to expire sufficiently old messages:
- some hosts or domains will never accept your email; they are permanently
unresponsive (sometimes in general, sometimes just to you).
- there's so much unexpired old email in the mailer queue that it starts
causing problems for your mailer or the system in general.
- sysadmins get irritated when they see huge queues of (so far) undeliverable
email; among other things, it's clutter.
- expiring and bouncing the message is how the sender gets a copy of
their email back so they can do something else with it.
(This is a pretty weak reason these days, since in today's world most people's mail clients already keep a copy of all of the email they send.)
The last two reasons are relatively uncompelling, which leaves the first and the second as the primary drivers of email expiry times. And it's really just the first reason, because if you need to set email expiry time based on the second reason you probably already know it because your system has likely exploded under the load of too many old messages.
(I suspect that not many sites send enough email that the second reason is a serious concern. And if it is, there are a number of technical measures you can take to reduce the problem.)
The challenge with the first issue is telling the difference between a host that's temporarily down (or unreachable or experiencing problems) and a host that's permanently unresponsive. If we could do this reliably, the right thing to do would be to immediately bounce email for permanently unresponsive hosts and never expire email for hosts that are temporarily down. Since we can't, we have to go with a heuristic: we assume that hosts that have not accepted the email for N days are most likely hosts that are permanently unresponsive.
What's the right value of N? On the one hand, that's a good question. On the other hand, it's the wrong question because this is a heuristic. Heuristics generally do not have a single 'right' answer; instead they have a whole spectrum of answers depending on your specific circumstances (and in this case also depending on what the destination is; for instance, you might know a bunch about this for email inside your organization).
However, we can turn this around by asking what's a reasonable amount of time to expect a temporarily down mail machine to stay down before it gets repaired. My view is that there are reasonable circumstances where a mailer can be down for four or five days at a minimum. If this strikes you as extreme, consider a small organization where the mail machine dies on the afternoon or evening of a long weekend; take three days for the long weekend itself, and then they could easily lose a day or two to getting a new machine and configuring it.
(The extreme case around here is a small department that has their mail machine die just as Christmas vacations start. If they have no spare machines, getting a new server delivered might not happen until two or more weeks later, after Christmas vacations are over and the buildings are open again.)
So if you need a simple number that is not too large, my answer would be 'at least six days'. As it happens, our current mailer configuration times out email to outside domains after six days (more or less), although I don't think we did this sort of thinking about the issue before going with that number.
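The arithmetic behind that number can be written out as a small sketch. The three-day long weekend and the up to two days of repair time come from the scenario above; the one extra day of margin is an assumption made here to get from 'four or five days' to 'at least six', and the names are invented for illustration.

```python
from datetime import datetime, timedelta

LONG_WEEKEND = timedelta(days=3)   # machine dies just as a long weekend starts
REPAIR_TIME = timedelta(days=2)    # getting and configuring a replacement
MARGIN = timedelta(days=1)         # assumed safety margin, not from the text

EXPIRE_AFTER = LONG_WEEKEND + REPAIR_TIME + MARGIN  # six days in total


def should_bounce(first_queued, now, expire_after=EXPIRE_AFTER):
    """Heuristic: treat the destination as permanently unresponsive once
    the message has been in the queue for at least expire_after."""
    return now - first_queued >= expire_after
```

With these values, a message queued on the Friday evening of a long weekend is still being retried the following Wednesday, and is only bounced on the Thursday.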
2012-06-10
Modern email is actually multiple things in one system (mailer timeouts edition)
A commentator on my entry on mailer delay warnings suggested, in response to my view that repeated delay notices are bad partly because after a day the sender isn't expecting the mail to get through soon anyways:
I think the default for many mailers is to warn after one day, and give up after five. Given the above sentence, in addition to a warning after an hour (as you state), it seems that you opinion is that the mail system should give up after 1-2 days.
I actually don't think that this is a good idea (or the right thing).
One of the complications of handling modern email for both senders and receivers is that it has effectively become a whole bunch of applications and communication systems in one protocol, and what's appropriate for one 'flavour' can be wildly wrong for another. While I'm not going to try to do a complete taxonomy, one big split in sorts of email is between email used as a means of near-realtime communication (both for conversations and for notifications) and email that isn't.
(I've alluded to a split in the sorts of email before, but not spelled it out explicitly.)
Most near-realtime email loses its usefulness very fast as it gets delayed. If you're conducting a near-realtime conversation over email and email stalls, either you're going to abandon the conversation entirely or change to another medium (a telephone call, for example). It makes decent sense for the mailer to give up entirely on delivering this email in only a day or two, because after even a day the conversation is almost certainly either dead or over. But not all email is near-realtime email; there's plenty that isn't. That 'slow' email generally remains useful to deliver even after several days of delay, or even many days of delay, so you don't want to give up on it until you need to.
Unfortunately one of the problems in modern email is that there's no good way to tell all of the different sorts of email apart; the best we have on both the sending and the receiving side is heuristics, and they misfire periodically. Since we can't expire all delayed email after only a day or two and we can't really tell apart fast expire email from slow expire email, the only safe thing to do is expire all email slowly.
(How slowly to expire email is a somewhat complex subject that calls for another entry.)
2012-06-09
Rethinking when your mailer sends 'not-yet-delivered' warning messages
Most Unix mailers have a feature where they periodically send
people warning notes about email messages that haven't been
delivered yet; Exim defaults to doing so roughly every 24 hours, for
example (this is the delay_warning configuration setting).
I've come to believe that typical default values for when this happens
are both too slow and too frequent, and should be rethought in today's
Internet mail environment.
(Actually it looks like Postfix defaults to not sending delay notifications, so this may now be less common than I think.)
The reality of modern mail delivery, at least for us, is that mail delivery times are very skewed and likely quite bimodal. In particular, almost all of the mail sent through our outbound user email gateway is delivered to the remote SMTP server within a minute (and much of it is delivered within seconds) and most of the remainder is delivered within ten minutes. I have the strong feeling that people have come to expect this rapid email delivery, simply because it's what almost always happens.
In this environment, delaying initial notification of a delay for a day is not really what you want since a day is well after people expect their email to have been delivered. How soon the first notification should be sent depends on how mail delivery delays break down in your environment, so I recommend getting statistics. My gut feeling is that an hour is a good starting point under normal circumstances; when people hit an address that genuinely isn't accepting email, they'll probably get notified about it soon enough to know that they should take other steps.
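One simple way to get those statistics is to work out what fraction of messages were delivered within various time limits. The sketch below assumes you have already extracted per-message delivery delays (in seconds) from your mailer's logs somehow; the function name and the sample numbers are invented for illustration and are not real measurements.

```python
def delay_breakdown(delays, thresholds):
    """For each threshold (in seconds), return the fraction of messages
    whose delivery delay was at most that long."""
    total = len(delays)
    if total == 0:
        return {}
    return {
        limit: sum(1 for d in delays if d <= limit) / total
        for limit in thresholds
    }


# Made-up sample delays in seconds, purely for illustration.
sample = [2, 5, 30, 45, 300, 550, 4000, 90000]

# One minute, ten minutes, one hour.
breakdown = delay_breakdown(sample, [60, 600, 3600])
for limit, fraction in breakdown.items():
    print(f"delivered within {limit:>5}s: {fraction:.0%}")
```

If almost everything that is going to be delivered has been delivered well before some threshold, that threshold is a plausible candidate for when to send the first delay notification.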
(A related issue is giving people relatively prompt notification when they've misspelled a domain name into a variant that doesn't accept email. The hope is that they'll notice the mistake when they get the notification email and be able to resend their message soon enough to be useful. My unchecked intuition is that such misspellings are one of the significant sources of long term delayed email in our environment.)
However once we've sent one or two delay notifications I don't think there's much point in repeating them. My view is that by the time a day has gone by without a successful delivery, the person who sent the email is no longer really expecting it to get through any time soon. If getting the information through or getting a reply was important they'll probably already have taken alternate steps so further delay notifications are pointless, and if the email's not important the delay notifications are just noise in general. If you have a long message delivery timeout you might want to send another delay notification partway through, but certainly not one a day.
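Expressed as a policy, this is a short fixed schedule of warnings rather than a repeating one. The following is only a sketch of that idea, not how any particular mailer implements it; the one hour and one day warning points are assumptions picked to match the discussion above, and the names are hypothetical.

```python
from datetime import timedelta

# Assumed schedule: a first warning after an hour, one more after a day,
# and then nothing further no matter how long the message stays queued.
WARN_AT = [timedelta(hours=1), timedelta(days=1)]


def should_warn(age, warnings_sent, warn_at=WARN_AT):
    """Return True if a delay warning is due for a message that has been
    queued for 'age' and has already had 'warnings_sent' warnings."""
    # Count how many scheduled warning points the message has passed.
    points_passed = sum(1 for point in warn_at if age >= point)
    return warnings_sent < points_passed
```

Because the schedule is finite, a message that sits in the queue for many days generates at most two notifications instead of one a day.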
Of course all of this generously assumes that people actually see and read the delay notifications that your system sends them. If people filter them out or just reflexively delete them, you might as well not send them at all (although there may be political reasons to send them even though they'll get ignored). I don't have any information on this, although I have my suspicions.
PS: hopefully it goes without saying that you should only be sending delay notifications to your own local users.