2010-08-22
Another reason to hate $LANG and locales on Unix
Sometimes I'm slow; only recently did it occur to me how the
$LANG sort misfeature and GNU comm's misfeature combine in an orgy of annoyance in a heterogeneous
environment.
Suppose that you have systems that changed their default locale
between operating system versions. As part of routine processing, you use comm to get the difference between
something on the local system and a global list. Well, oops. Even if you
carefully use sort on both versions, you are going to have problems.
As we saw earlier, the choice of locale may change the sort
order. While GNU comm is locale aware in just the same way as sort, it
is not aware of multiple locales; it assumes that all files are sorted
in the current locale (and these days it actively requires it). So your
global file, although sorted, may not be sorted in the current system's
locale, which will cause comm both to complain and to fail.
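To make the failure concrete, here is a rough Python model of the merge that comm performs. It is a sketch, not GNU comm's real code. The C locale is modelled as plain code-point order, and a 'real' locale is stood in for by a hypothetical case-folding key (actual locale collation rules are a good deal more involved). The order check loosely imitates GNU comm's complaint about unsorted input.

```python
def comm(left, right, key):
    """Merge two sorted lists the way comm(1) does.

    Returns (only_left, only_right, both). Raises ValueError if either input
    is not sorted according to `key`, loosely mimicking GNU comm's order check.
    """
    for name, seq in (("left", left), ("right", right)):
        for a, b in zip(seq, seq[1:]):
            if key(a) > key(b):
                raise ValueError(f"{name} input is not in sorted order: {a!r} > {b!r}")
    only_left, only_right, both = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        ka, kb = key(left[i]), key(right[j])
        if ka < kb:
            only_left.append(left[i])
            i += 1
        elif ka > kb:
            only_right.append(right[j])
            j += 1
        else:
            both.append(left[i])
            i += 1
            j += 1
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return only_left, only_right, both


def c_order(s):
    return s  # C locale: compare by code point, so "Z" sorts before "a"


def fake_locale(s):
    # Hypothetical stand-in for a language locale: ignore case first,
    # then break ties by code point so distinct lines never compare equal.
    return (s.casefold(), s)


# The global file was sorted on a machine using the C locale...
global_list = sorted(["apple", "Banana", "cherry"], key=c_order)
# ...while this machine sorts (and runs comm) in a different locale.
local_list = sorted(["apple", "cherry", "Date"], key=fake_locale)

try:
    comm(global_list, local_list, fake_locale)
except ValueError as err:
    print("comm-style merge refused:", err)

# Re-sorting the global copy locally, with the same key comm uses, fixes it.
print(comm(sorted(global_list, key=fake_locale), local_list, fake_locale))
# (['Banana'], ['Date'], ['apple', 'cherry'])
```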
(You get the same effect if you generate different global files on different machines and then try to process them together.)
Effectively this means that there is no such thing as a globally visible
file that is properly sorted, because what counts as 'properly sorted'
differs from machine to machine. Instead you probably want to sort all
files on the local machine, which means making copies of the global
ones. Ideally you want to do this right before using them, because the
locale may differ between various environments even on a single machine;
it's simply safer to sort files in the script immediately before feeding
them to comm, so you know that sort and comm were both running in
the same locale.
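In script form, that advice amounts to something like the following Python sketch, which shells out to sort and comm and hands both of them the same environment. It assumes the coreutils sort and comm are on $PATH; the function name diff_sorted is made up for illustration. Pinning LC_ALL=C, as the usage line does, is an optional extra that gives plain byte order wherever the script runs; it is not something the advice above requires.

```python
import os
import subprocess
import tempfile


def diff_sorted(local_lines, global_lines, env=None):
    """Return the lines in local_lines that are not in global_lines.

    Both inputs are re-sorted with sort(1) right before comm(1) runs, and both
    commands get the same environment, so they agree on the collation order.
    """
    env = dict(os.environ if env is None else env)
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, lines in (("local", local_lines), ("global", global_lines)):
            path = os.path.join(tmp, name)
            # sort reads stdin and writes the sorted copy to `path` via -o.
            subprocess.run(["sort", "-o", path],
                           input="".join(line + "\n" for line in lines),
                           text=True, env=env, check=True)
            paths.append(path)
        # comm -23 suppresses lines unique to file 2 and lines common to both.
        result = subprocess.run(["comm", "-23", *paths], capture_output=True,
                                text=True, env=env, check=True)
    return result.stdout.splitlines()


print(diff_sorted(["b", "A", "c"], ["a", "b"], env={**os.environ, "LC_ALL": "C"}))
# ['A', 'c']
```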
(Offhand, there are at least four plausible environments where system
scripts might run with a different locale: from init.d scripts at boot
time, from crontab entries, from an interactive login, and from an
automated ssh command invocation that passes along the other machine's
locale.)
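Which locale sort and comm actually end up using in each of those environments follows the POSIX precedence rules: LC_ALL beats LC_COLLATE, which beats LANG, and with none of them set (or all set to empty) you get the default C/POSIX locale. A small Python sketch of that lookup follows; the sample environments are invented for illustration, not taken from any real system.

```python
def effective_collation(env):
    """Return the locale name that governs collation under POSIX rules."""
    # An unset or empty variable is skipped, per POSIX.
    for var in ("LC_ALL", "LC_COLLATE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # implementation default; "C" (a.k.a. "POSIX") on glibc systems


# Hypothetical environments, one per situation listed above.
environments = {
    "init.d script at boot": {},
    "crontab entry": {"LANG": "C"},
    "interactive login": {"LANG": "en_US.UTF-8"},
    # e.g. forwarded from the client machine by OpenSSH's SendEnv/AcceptEnv
    "automated ssh command": {"LANG": "en_CA.UTF-8", "LC_COLLATE": "en_GB.UTF-8"},
}
for where, env in environments.items():
    print(f"{where}: {effective_collation(env)}")
```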
2010-08-14
What I want in a caching nameserver
What the world needs is a good caching nameserver. What brought this on is that I am currently flirting with yet another caching nameserver, which is something that I do from time to time because every caching nameserver I've ever found sucks in its own way. This is actually somewhat surprising to me, because at one level the job is not all that difficult, so you'd think that someone would have written a sane implementation by now.
(Possibly the DNS system actually is sufficiently difficult that it drives every implementer insane. Sadly I can believe it; DNS is both baroque and peculiar, and I'm sure there are lots of dark corners.)
What I want in a caching nameserver, beyond 'works', is:
- it can forward queries for some zone(s) off to other recursive (caching) nameservers, as recursive queries.
- it can send queries for some zone(s) directly to primary nameservers,
as non-recursive queries.
- it has a sane and small configuration system. I am not interested in anything that requires a SQL server, for example.
- it has a small memory footprint.
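The first two requirements boil down to a per-zone routing table: for each query, find the most specific configured zone, then either forward to a caching server with recursion requested or ask a primary directly without it. Here is a Python sketch of that lookup, assuming a hypothetical Upstream record and made-up zones and documentation-range addresses; it is not how any particular nameserver implements this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    server: str       # address of the nameserver to ask
    recursive: bool   # True: caching server, send RD=1; False: primary, send RD=0


def route(qname, zones):
    """Return the Upstream for the most specific zone containing qname, or None."""
    labels = qname.lower().rstrip(".").split(".")
    # Try whole-label suffixes from longest to shortest:
    # www.lab.corp.example, lab.corp.example, corp.example, example
    for i in range(len(labels)):
        upstream = zones.get(".".join(labels[i:]))
        if upstream is not None:
            return upstream
    return None  # not a local zone: resolve normally from the root


# Hypothetical local zones and addresses (192.0.2.0/24 is a documentation range).
zones = {
    "corp.example": Upstream("192.0.2.10", recursive=True),
    "lab.corp.example": Upstream("192.0.2.53", recursive=False),
}
print(route("www.lab.corp.example.", zones))
print(route("mail.corp.example", zones))
print(route("www.example.org", zones))
```

Matching on whole labels means a name like notcorp.example does not accidentally fall into corp.example.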
The first and second give you different ways of splicing in local zones so that you can resolve private internal names and you can still resolve things in your own organization even when your Internet link is down. I need both; sometimes I want to do a recursive query to another caching nameserver that handles all the details, and sometimes I want to talk directly to a primary nameserver that will laugh at me if I send it DNS queries that are marked as 'recursion desired'.
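On the wire, the difference between the two kinds of query is a single bit in the DNS header: RD, 'recursion desired' (RFC 1035, section 4.1.1). The Python sketch below builds a minimal query with that bit set or clear; build_query and wants_recursion are names made up here, and actually sending the packet over UDP is left out.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the header flags word (RFC 1035, 4.1.1)


def build_query(qname, qtype=1, query_id=0x1234, recursion_desired=True):
    """Build a minimal DNS query: one question, class IN (qtype 1 = A)."""
    # QR=0 (query) and opcode 0 (standard query); only RD may be set.
    flags = RD_FLAG if recursion_desired else 0
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0, all big-endian.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending in a zero byte.
    qname_wire = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname_wire + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def wants_recursion(packet):
    """Report whether a DNS message has the RD bit set."""
    (flags,) = struct.unpack("!H", packet[2:4])
    return bool(flags & RD_FLAG)


to_cache = build_query("www.corp.example")                             # for a caching server
to_primary = build_query("www.corp.example", recursion_desired=False)  # for a primary
print(wants_recursion(to_cache), wants_recursion(to_primary))  # True False
```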
DJ Bernstein's dnscache is the usual recommendation but it falls down on the first issue (and arguably on the second one as well, depending on how you interpret what it should do if it gets NSes back); it's what I normally use (because years ago I got horribly offended at Bind's memory usage). My current flirtation is with unbound, which has both recursive and non-recursive forwarding, mostly has a sane configuration system, and unfortunately falls down on memory usage even more spectacularly than Bind did.
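For reference, unbound expresses the two kinds of forwarding as separate configuration stanzas: a forward-zone sends recursive queries to other caching servers, while a stub-zone sends non-recursive queries to authoritative servers. The Python helper below just renders such stanzas as text; the zone names and addresses are hypothetical.

```python
def unbound_zones(entries):
    """Render (zone, address, recursive) tuples as unbound.conf zone stanzas.

    recursive=True  -> forward-zone: recursive queries to another caching server
    recursive=False -> stub-zone: non-recursive queries to an authoritative server
    """
    stanzas = []
    for name, addr, recursive in entries:
        kind, key = ("forward-zone", "forward-addr") if recursive else ("stub-zone", "stub-addr")
        stanzas.append(f'{kind}:\n    name: "{name}"\n    {key}: {addr}\n')
    return "".join(stanzas)


# Hypothetical zones and documentation-range addresses.
print(unbound_zones([
    ("corp.example", "192.0.2.10", True),
    ("lab.corp.example", "192.0.2.53", False),
]))
```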
(Looking at the package list in Fedora 13 suggests that there are a lot more potential nameservers than I thought. This list covers a lot, but the only likely candidates are MaraDNS and PowerDNS's caching server.)
2010-08-13
PPP over ssh: solving problems with indirection
There is an old aphorism in Computer Science that any problem can be solved by another level of indirection (Wikipedia credits it to David Wheeler). Today I have an illustration of this.
I mentioned that I was having problems
with USB serial ports on my machine, which I had just upgraded to Fedora 13.
Specifically, starting up PPP on such a port would hang the pppd
process (and then any other process that touched the serial port).
Since my DSL link is still down, this is a problem.
It turns out that the specific issue is that on my machine,
trying to switch to the PPP line discipline on a USB serial port
hangs the process; I suspect locking issues between the kernel's TTY
layer and USB layer. Reasonably, the PPP daemon switches its tty to the
PPP line discipline pretty much the moment it starts, and there goes my
connection attempt (and my dialup connection).
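Switching a tty's line discipline is a single ioctl, TIOCSETD, which is roughly what pppd does with N_PPP when it starts. The Python sketch below reads and sets the discipline on a file descriptor. The fallback numbers are the Linux x86 values and are only used if this Python's termios module doesn't export the constants. The demonstration only reads the discipline of a fresh pty, since actually switching to N_PPP may need privileges and the kernel's PPP support.

```python
import fcntl
import os
import struct
import termios

# Fallbacks are the Linux x86 values, used only if this Python's termios
# module does not export the constants itself.
TIOCGETD = getattr(termios, "TIOCGETD", 0x5424)
TIOCSETD = getattr(termios, "TIOCSETD", 0x5423)
N_TTY = getattr(termios, "N_TTY", 0)
N_PPP = getattr(termios, "N_PPP", 3)


def get_ldisc(fd):
    """Return the number of the line discipline attached to tty fd."""
    buf = fcntl.ioctl(fd, TIOCGETD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def set_ldisc(fd, ldisc):
    """Attach a different line discipline (e.g. N_PPP) to tty fd.

    This is the operation that hangs on the USB serial port in question;
    on a pty it works. It may need privileges and kernel PPP support.
    """
    fcntl.ioctl(fd, TIOCSETD, struct.pack("i", ldisc))


master, slave = os.openpty()
try:
    print("fresh pty line discipline:", get_ldisc(slave))  # 0, i.e. N_TTY
finally:
    os.close(master)
    os.close(slave)
```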
(I suspect that this happens on all SMP x86_64 Linux machines with a recent enough kernel, and possibly all SMP machines. It doesn't happen on a uniprocessor x86 machine. Interested parties can see the Fedora bug report.)
This is not a general bug with the TTY layer's handling of the PPP line discipline, or it would have been noticed well before now. In particular, you can switch to the PPP line discipline on a pseudo-tty without problems.
(I wound up testing this sort of by accident. My PPP account has a
somewhat weird setup, and the most convenient way to test that it had
survived the Fedora 13 upgrade was to just ssh in to it in a terminal
window and see if it spewed a PPP connection initiation at me or printed
errors. After the upgrade I tried this and had it work, and I thought
nothing of it until later.)
So I solved my problem with indirection; I arranged to run pppd on
a pty instead of on the serial port itself, transparently passing all
IO back and forth between the serial port and the pty. This needed
something to do this work, and the simple program for this is ssh in
its transparent mode. So now my connect script doesn't directly log in to
my PPP account; instead it logs in to a regular account and immediately
does an 'ssh -e none pppme@localhost'.
(I could have written a program to do this without the SSH overhead, but
ssh has the great virtue of already existing and working and this is
an expedient hack that I sincerely hope is not living on for too long.)
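The program mentioned above would be little more than a byte pump. Here is a minimal Python sketch, assuming both ends are already-open file descriptors (in this setup, the serial port and the master side of a pty); opening the serial port, putting it in raw mode with termios, and starting pppd on the pty's slave side are all left out. The name relay is made up for illustration.

```python
import errno
import os
import select


def relay(fd_a, fd_b, bufsize=4096):
    """Copy bytes both ways between fd_a and fd_b until either side hits EOF.

    Roughly the job 'ssh -e none pppme@localhost' is doing here: shuttling
    raw bytes between the serial line and the pty that pppd runs on.
    """
    peer = {fd_a: fd_b, fd_b: fd_a}
    while True:
        readable, _, _ = select.select(list(peer), [], [])
        for src in readable:
            try:
                data = os.read(src, bufsize)
            except OSError as err:
                # A pty master reports EIO once its slave side is closed; treat it as EOF.
                if err.errno == errno.EIO:
                    return
                raise
            if not data:
                return  # ordinary EOF
            dst = peer[src]
            while data:  # os.write may write only part of the buffer
                written = os.write(dst, data)
                data = data[written:]
```

Unlike ssh, this does no escape-character processing at all, so it is transparent by construction; the price is that it only works on a single machine, which is exactly what the sidebar below gets around.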
Sidebar: the logical extension of this hack
Suppose that you have two machines; one machine with a dialin modem and
a regular account but no PPP setup, and another machine where you can
actually run PPP but you don't have a (working) dialin modem. We can
solve the problem of getting a PPP link up in this situation in exactly
the same way; since we're using ssh, we can perfectly well ssh off
to another machine entirely instead of localhost. It may not have great
latency and performance, but as the wise man once observed, working at
all is better performance than not working.
(Extensions to the situation where the first machine can't directly talk to the second machine are left as an exercise for the reader.)