2007-05-30
A gotcha with the automounter and loopback mounts
On Solaris, there is a combination gotcha with the automounter and mounts on the same host. It goes like this:
- your fileserver normally has /dev/whatever mounted on /export/foo.
- your generic automounter configuration mounts fileserver:/export/foo as /foo.
- you need to do some maintenance to the filesystem, so you unshare and unmount /export/foo.
- after you're done you try to remount it, but you get a message that
the mount point is busy. The only mention of /export/foo that
mount shows is something that looks like: /foo on /export/foo ...
What has happened is that something on your fileserver tried to touch
/foo during your maintenance, so the automounter went ahead and
mounted it from where you told it to. Because the 'fileserver' was the
local machine, it did this with a loopback mount instead of NFS.
Loopback mounts don't check NFS share permissions, so the mount wasn't
denied, and loopback mounts (like NFS mounts) just put directory A on
directory B, so the lack of a mounted filesystem didn't stop the
automounter; there was an /export/foo directory and that was good enough.
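Spotting this by eye in mount output is easy to miss, so here is a rough sketch in Python of checking for it. It assumes each line of mount output has the shape 'mount point on source options...', as in the /foo on /export/foo line above. The function name and the sample text are made up for illustration and were not captured from a real system.

```python
import re

# Assumed shape of one line of `mount` output:
#   <mount point> on <source> <options...>
MOUNT_LINE = re.compile(r"^(\S+) on (\S+)")


def mounts_sourced_from(mount_output, directory):
    """Return the mount points whose source is exactly `directory`."""
    found = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m and m.group(2) == directory:
            found.append(m.group(1))
    return found


# Hypothetical sample output, for illustration only.
SAMPLE = """\
/ on /dev/dsk/c0t0d0s0 read/write
/foo on /export/foo read/write
"""

print(mounts_sourced_from(SAMPLE, "/export/foo"))
```

Anything this reports for /export/foo is a mount that is using the bare directory as its source, and so is what is keeping the mount point busy.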
The direct way out is umount /foo. Unfortunately this may not be
good enough if something is actively banging on that name, because
the automounter will just remount it again; you may need to find the
something and shoot it.
(In our case it was mail delivery. Why we are doing mail delivery directly on the fileservers is a long story.)
2007-05-28
Why ZFS's data integrity is less important than Solaris's usability
The bottom line is that Solaris is hard to administer (yeah, it's a fair cop), so server data is just going to have to suffer. Hopefully some day Solaris will be as easy as redhat, or debian, or ubuntu, or <insert name of distro here>. Some day. Meanwhile, I'll choose data integrity over ease of administration.
The problem with this is that quiet disk corruption is not currently a big issue for most people; it just doesn't happen all that often, at least that people notice, or people would be howling in pain right now. You can argue that people just haven't noticed the corruption that they're experiencing now, but the counter-argument is that if people haven't noticed it, it's clearly not that important to them (yet, more or less).
Or to put it another way: the problem for Sun is that they are trying to sell a better mousetrap when people don't feel that they have a mouse problem (or at least not a mouse problem that their existing mousetraps can't deal with).
(Perhaps Sun has done studies that show that disk systems and so on are going wrong much more often than people expect, or that future disk systems will inevitably have higher error rates, or the like. That would be newsworthy and I would expect to find that sort of stuff mentioned on the ZFS pages.)
Even without a mouse problem, people would still go for the better mousetrap if it was otherwise a more or less neutral choice, but it is not. To extend the metaphor, the better Sun mousetrap is uncomfortable and has sharp bits that poke you reasonably frequently. That it is cool and nifty starts to fade after the first few times you have to apply bandaids.
And that is why ZFS's data integrity features are less important than Solaris's ease of administration. In practice, ease of administration matters more to more people, because right now relatively few people are seriously worried about silent data corruption whereas everyone has to administer their machines.
(In other words, people will indeed often choose practical ease of administration over (theoretical) data integrity, whether or not they are willing to admit it out loud.)