2014-09-10
The cause of our slow Amanda backups and our workaround
A while back I wrote about the challenges in diagnosing slow (Amanda) backups. It's time for a followup entry on that, because we found what I can call 'the problem' and along with it a workaround. To start with, I need to talk about how we had configured our Amanda clients.
In order to back up our fileservers in a sensible amount of time, we run multiple backups on each of them at once. We don't really try to do anything sophisticated to balance the load across multiple disks, both because this is hard in our environment (especially given Amanda's limited features for it) and because we've never seen much evidence that reducing overlaps sped things up; instead we just have Amanda run three backups at once on each fileserver ('maxdumps 3' in the Amanda configuration).
For historical reasons we were also using Amanda's 'auth bsd' style
of authentication and communication.
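(To make this concrete, the relevant client dumptype settings looked roughly like the sketch below; the dumptype name and everything apart from 'maxdumps' and 'auth' are made-up placeholders for illustration, not our actual configuration.)

    define dumptype example-comp-tar {
        program "GNUTAR"       # dumps are done with GNU tar
        maxdumps 3             # run up to three dumps at once on each client
        auth "bsd"             # the old UDP-based auth; this turns out to matter
        # ... compression, indexing, and other options elided ...
    }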
As I kind of mentioned in passing in my entry on Amanda data flows, 'auth bsd' communication causes all concurrent backup activity to flow through a single master amandad process. It turned out that this was our bottleneck. When a single amandad process was handling the sending of all backups back to the Amanda server and it was running more than one filesystem backup at a time, things slowed down drastically and we experienced our problem. When an amandad process was only handling a single backup, things went fine.
We tested and demonstrated this in two ways. The first was to drop one fileserver down to one dump at a time, after which it ran fine. The more convincing test was to use SIGSTOP and SIGCONT to pause and then resume backups on the fly on a server running multiple backups at once. This demonstrated that network bandwidth usage jumped drastically when we paused two out of the three backups and tanked almost immediately when we allowed more than one to run at once. It was very dramatic.
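(For illustration, a pause-and-resume test like this can be scripted. The following is a minimal Python sketch that assumes the dumps show up as 'gtar' processes on the client; the process name and the exact timing are assumptions for illustration, not necessarily what we actually did.)

    #!/usr/bin/env python3
    # Minimal sketch: stop all but one running tar on a backup client, wait
    # while you watch network bandwidth, then resume the stopped ones.
    # The process name matched here ("gtar") is an assumption for illustration.
    import os, signal, subprocess, time

    def tar_pids(name="gtar"):
        # pgrep -x matches the exact process name
        out = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
        return [int(p) for p in out.stdout.split()]

    pids = tar_pids()
    keep, paused = pids[:1], pids[1:]
    print("keeping", keep, "pausing", paused)
    for pid in paused:
        os.kill(pid, signal.SIGSTOP)

    time.sleep(60)        # watch the bandwidth graphs during this window

    for pid in paused:
        os.kill(pid, signal.SIGCONT)
    print("resumed", paused)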
Further work with a DTrace script provided convincing evidence that the amandad process itself was the locus of the problem, and not that, for example, tar's reads slowed down drastically when more than one tar was running at once.
Our workaround was to switch to Amanda's 'auth bsdtcp' style of
communication. Although I initially misunderstood what it does, it
turns out that this causes each concurrent backup to use a separate
amandad process and this made everything work fine for us;
performance is now up to the level where we're saturating the
backup server disks instead of the network.
Well, mostly. It turns out that our first-generation ZFS fileservers probably also have the slow backup problem. Unfortunately they're running a much older Amanda version, and I'm not sure we'll try to switch them to 'auth bsdtcp' since they're on the way out anyway.
I call this a workaround instead of a solution because in theory a single central amandad process handling all backup streams shouldn't be a problem. It clearly is in our environment for some reason, so it would be better to understand why and whether it can be fixed.
(As it happens I have a theory for why this is happening, but it's
long enough and technical enough that it needs another entry. The short version is that
I think the amandad code is doing something wrong with its socket
handling.)
2014-09-03
Why we don't want to do any NAT with IPv6
In a comment on yesterday's entry on our IPv6 DNS dilemma, Pete suggested that we duplicate our IPv4 'private address space with NAT' solution in IPv6, using RFC 4193 addresses and IPv6 NAT. While this is attractive in that it preserves our existing and well-proven architecture intact, there are two reasons (possibly three) why I think we want to avoid this.
The first reason is simply that NAT is a pain from a technical and administrative perspective once you're working with a heterogeneous environment (one where multiple people have machines on your networks). A firewall configuration without NAT is simpler than one with it (especially once you wind up wanting multiple gateway IPs and so on), and on top of that, once you have NAT you start needing some sort of traffic tracking system so you can trace externally visible traffic back to its ultimate internal source.
(There are other fun consequences in our particular environment that we would like to get away from. For example, people with externally visible machines can't use the externally visible IP address to talk to those machines once they're inside our network, because the NAT translation is done only at the border.)
The other reason is political. To wit, the university's central networking people aren't very fond of NAT. Among other things, they want to be able to directly attribute network behavior to specific end devices and possibly to block those end devices on the campus backbone. They will be much happier with us if we directly expose end devices via distinct IPv6 addresses than if we aggregate them behind IPv6 NAT gateways, and the vastly larger IPv6 address space means that we have basically no good reason to NAT things.
(The potential third reason is how well OpenBSD IPv6 NAT works. I suspect that IPv6 NAT has not exactly been a priority for the OpenBSD developers.)
Note that in general the source hiding behavior of NAT has drawbacks as well as advantages; to put it crudely, if outsiders can't tell you apart from a bad actor you'll get lumped in with them. In our environment, avoiding this (with no NAT) would be a feature.
2014-09-01
An IPv6 dilemma for us: 'sandbox' machine DNS
In our current IPv4 network layout, we have a number of internal 'sandbox' networks for various purposes. These networks all use RFC 1918 private address space and with our split horizon DNS they have entirely internal names (and we have PTR resolution for them and so on). In a so far hypothetical IPv6 future, we would presumably give all of those sandbox machines public IPv6 addresses, because why not (they'd stay behind a firewall, of course). Except that this exposes a little question: what public DNS names do we give them? Especially, what's the result of doing a reverse lookup on one of their IPv6 addresses?
(Despite our split horizon DNS, we do have one RFC 1918 IP address that we've been forced to leak out.)
We can't expose our internal names for these machines because they're not valid global DNS names; they live in an entirely synthetic and private top-level zone. We probably don't want to leave their IPv6 addresses without any reverse mapping, because that's unfriendly (on various levels) and is likely to trigger various anti-abuse precautions on remote machines that they try to talk to. I think the only plausible answer is that we must expose reverse and forward mappings under our organizational zone (probably under a subzone, to avoid dealing with name collision issues). One variant of this would be to expose only completely generic and autogenerated name mappings, e.g. 'ipv6-NNN.GROUP.etc' or the like; this would satisfy things that need reverse mappings with minimal work and no leakage of internal names.
If we expose the real names of machines through IPv6 DNS, people will start using these names, for example for granting access to things. This is fine, except that of course these names only work for IPv6. This too is probably okay, because most of these machines don't actually have externally visible IPv4 addresses anyway (they get NAT'd to a firewall IP when they talk to the outside world, and of course that NAT IP address is shared between many internal machines).
(There are some machines that are publicly accessible through bidirectional NAT. These machines already have a public name to attach an IPv6 address to, and we could make the reverse lookup work as well.)
Overall, I think the simplest solution is to have completely generic autogenerated IPv6 reverse and forward zones that are only visible in our external DNS view and then add IPv6 forward and reverse DNS for appropriate sandboxes to our existing internal zones. This does the minimal amount of work to pacify external things that want reverse DNS while preserving the existing internal names for machines even when you're using IPv6 with them.
The fly in this ointment is that I have no idea whether the OpenBSD BIND can easily and efficiently autogenerate IPv6 reverse and forward names, given that there are vastly more of them than in a typical autogenerated IPv4 setup. If it's a problem, I suppose we can have a script that autogenerates the public IPv6 names for any IPv6 address we add to internal DNS.
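(As a sketch of what such a script might look like, the following Python generates a generic forward name plus the matching ip6.arpa PTR record for an address. The 'ipv6-...' naming pattern and the example zone are placeholders for illustration, not our actual scheme.)

    #!/usr/bin/env python3
    # Rough sketch: emit a generic AAAA record and the matching ip6.arpa PTR
    # record for an IPv6 address.  The naming pattern and the zone name are
    # placeholders, not our real scheme.
    import ipaddress

    def generic_records(addr, fwd_zone="sandbox.example.org"):
        ip = ipaddress.IPv6Address(addr)
        # e.g. 2001:db8::42 -> ipv6-2001-0db8-...-0042.sandbox.example.org
        label = "ipv6-" + ip.exploded.replace(":", "-")
        fwd = "%s.%s" % (label, fwd_zone)
        ptr = ip.reverse_pointer      # the nibble-reversed ip6.arpa name
        return ["%s. IN AAAA %s" % (fwd, ip.compressed),
                "%s. IN PTR %s." % (ptr, fwd)]

    for rr in generic_records("2001:db8:1234::42"):
        print(rr)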