programming/NondeterministicGCProblem written at 01:52:22
The problem with nondeterministic garbage collection
Yesterday I mentioned in passing that I think that nondeterministic garbage collection is a significant mistake. Today it's time to expand on that, and the first step is defining my terms so that people can understand me. By nondeterministic garbage collection I mean GC that only collects garbage objects at unpredictable amounts of time after they become unused. This is in contrast to deterministic prompt garbage collectors that collect straightforward garbage objects immediately or almost immediately after they become unused.
(I believe that prompt GC is almost always based on reference counting.)
The problem with nondeterministic GC can be illustrated in two Python examples. First, the version with prompt GC:
data = open("/some/file", "r").read()
Then a correctly written version in the face of nondeterministic GC:
fp = open("/some/file", "r")
data = fp.read()
fp.close()
del fp   # to clean up buffers
In short, the problem with nondeterministic garbage collection is that it forces you to do manual storage management. You can't rely on garbage collection if you care about memory usage or if object lifetime has side effects (such as keeping files open), because GC may be arbitrarily delayed; instead you must explicitly do cleanup and try to destroy objects yourself. Instead of being a great simplification, GC turns into something that handles only trivial objects (or what you hope are trivial objects) and objects with complex lifetimes.
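As a minimal sketch of what that manual lifetime management looks like (this is plain standard Python, nothing implementation-specific), here is the version you wind up writing once you also care about cleanup happening when read() fails, along with its 'with' statement sugar:

# fully manual: the file is closed no matter what read() does
fp = open("/some/file", "r")
try:
    data = fp.read()
finally:
    fp.close()

# 'with' is syntactic sugar for the same explicit cleanup; either way
# you, not the garbage collector, are managing the object's lifetime
with open("/some/file", "r") as fp:
    data = fp.read()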
Actually it's even worse than I've shown here. In a nondeterministic GC environment there is absolutely no guarantee that my 'del fp' actually destroys the file object and frees its buffers; all it does is remove one reference to the object. The object itself, buffers and all, can linger on until the garbage collector finally gets around to it.

(The implementation of file objects can't help me out here, because it too doesn't have any magic way of destroying any internal buffers before the object itself is actually collected.)
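As a small illustrative sketch of how little 'del' promises, even CPython's prompt refcounting can be defeated by a reference cycle, at which point an object survives 'del' until the (nondeterministic) cycle collector runs:

import gc
import weakref

class Holder(object):
    pass

h = Holder()
h.me = h                 # a reference cycle: the object refers to itself
watch = weakref.ref(h)   # observe the object without keeping it alive

del h                    # removes the name; the cycle keeps the object alive
print(watch() is None)   # False: 'del' did not destroy anything

gc.collect()             # only the cycle collector pass actually frees it
print(watch() is None)   # True: now it's gone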
Manual storage management and object lifetime management sucks. It's what garbage collection is supposed to get us away from. Moving back to it is not progress for any language that is supposed to be biased towards convenience.
(I believe that people like nondeterministic GC because reference counting GC has performance issues with updating reference counts all the time, especially in threaded environments.)
I'm sure this observation is not original to me, and in fact I may have read a version of it in my random walk through the multi-faceted Internet.
python/PyPyView written at 00:49:07
My current view of PyPy
In a comment on my entry about Go as a 'Python with performance' for me, I was asked about my views on using PyPy for this. I flailed around a bit in my reply there and since then I've been thinking about it more, so it's time to go on at more length.
The simple version is that today I think of PyPy as perhaps a way to make some Python programs go faster, but not as a way to write fast Python programs. If I have an existing Python program that fits what I think of as the PyPy profile (long-running, generally does basic operations, and I'm indifferent to memory usage) and I absolutely needed it to go faster, I'd consider feeding it to PyPy to see what happens. If it speeds up without exploding the memory usage, I've won and I can stop. If that doesn't work, well, time for other measures. However, PyPy is too unpredictable for me to be able to write Python code that I can count on it speeding up dramatically, especially if I also want to control memory usage and so on.
There are other pragmatic issues with using it. For a start, the version of PyPy available to me through distribution packages varies widely from system to system here, and with that variance I can expect an equally large performance variance. The current version of PyPy is 2.2.1, while Fedora 19 has 2.1.0 and Ubuntu 12.04 LTS is back at 1.8. Beyond that, a certain number of interesting Python environments just don't work with PyPy; for example, I can't use PyPy to speed up parts of a Django app deployed through mod_wsgi (not that the app is likely to have a performance bottleneck anyways; that's just an illustration).
There are also two serious problems with PyPy today that make it far less interesting to me (at least as of the Fedora 19 version, 2.1.0). The first is what I alluded to above: PyPy has a significant startup delay before it starts speeding up your program, and thus doesn't really speed up short-running things. I'm pretty sure that if I had a Python program that ran in a second, PyPy wouldn't speed it up very much. The second is that PyPy quietly explodes on common Python idioms under some circumstances.
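To put a rough shape on the first problem, here's a toy sketch (mine, and deliberately simplistic): a long-running loop of basic operations like this is exactly where a JIT can eventually win big, but if the whole program finishes in about a second, a good chunk of that second goes to warming the JIT up rather than running optimized code.

import time

def churn(n):
    # simple integer arithmetic in a hot loop: the JIT-friendly case
    total = 0
    i = 0
    while i < n:
        total += (i * i) % 7
        i += 1
    return total

start = time.time()
churn(10 ** 7)
print("elapsed: %.2f seconds" % (time.time() - start))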
For an example of that second problem, one that I have personally run into, consider:
data = open("/some/file", "r").read()
This is a not uncommon Python idiom for casually reading in a file. If you try this in a PyPy-run program in any sort of situation where you do this repeatedly, you'll probably wind up with a 'too many open files' error before too long. In straight (C)Python the open file is immediately garbage collected at the end of the statement, as soon as the last reference to it is dropped, so its file descriptor is promptly closed; under PyPy the file object sits around holding its descriptor open until the garbage collector eventually gets to it.
Yes, yes, you say, this is bad style. The reality is that this 'bad style' is common in Python, as are other examples where code assumes that dropped or out-of-scope objects will be immediately garbage collected. I don't want to spend my time troubleshooting mysterious problems in otherwise reliable long-running Python programs that only appear when I run them under PyPy. Not running them under PyPy is by far the easier solution, even if it costs me performance.
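For illustration, here is a sketch of both the failure mode and the deterministic fix (the paths are hypothetical, and how many iterations you survive depends on your file descriptor limit):

import glob

# under a deferred-GC runtime such as PyPy, this style can run out of
# file descriptors: every file object stays open until GC finalizes it
def slurp_all_risky(pattern):
    return [open(name, "r").read() for name in glob.glob(pattern)]

# closing deterministically removes any dependence on GC timing
def slurp_all_safe(pattern):
    results = []
    for name in glob.glob(pattern):
        with open(name, "r") as fp:
            results.append(fp.read())
    return results

data = slurp_all_safe("/some/dir/*.log")   # hypothetical path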
(In my opinion non-deterministic garbage collection is actually a serious problem, but that's another entry.)
tech/HardwareIsWeird written at 01:20:53
Hardware is weird (disk enclosure edition)
I've written before about our disappearing ESATA disk problem, but since I wrote that the situation has become weirder. In fact I think it makes a good illustration about just how odd hardware can be (and why I prefer working on software to banging my head against hardware).
Here is what happens. We have a 15-bay ESATA-based external disk enclosure, with the 15 bays sensibly divided into three port-multiplier-based ESATA channels of five bays each. If the enclosure and the server connected to it are powered off, the enclosure is powered up (and left to sit), and then the server is powered up, one or more of the ten 4TB ESATA disks in the system fail to be recognized. As initially set up, we had the ten disks on two channels and the third channel empty. Then we did some shuffling and got to the serious weirdness.
The failed recognition pattern was as follows: the five disks on the first channel probed by Linux were recognized correctly, regardless of which physical channel on the enclosure that was. On the second (and possibly third) channel probed by Linux, the second disk present was not recognized, regardless of which physical slot it was in; it would be probed briefly, but then Linux would be unable to get it to go and it disappeared (until the server was rebooted).
We initially saw the problem with some 4TB Hitachis. We tried sticking a 4TB WD SE drive into the deadly spot on the second (probed) channel and it too showed the same problem. However an ancient WD 80GB in the same spot worked perfectly; unlike the two sorts of 4TB drives, it was recognized fine.
(At this point we gave up because we had a stable system with the ten 4TB Hitachi drives we were committed to use. We don't care that we've sacrificed an 80 GB drive as basically a spacer or that we can't really use the remaining drive bays.)
It bugs me that I can't come up with any relatively rational explanation for what's going on here. It's possible that something is going on with spinup power draws, but if so it's very unusual. It also bugs me that I have basically no diagnostic tools to see what's going on; a real investigation would probably require a bunch of equipment to, eg, monitor power draws during disk probing.
(It's clear that there is something different about probing the disks when the enclosure has just been powered on as opposed to later on if the server just reboots. Even on the 'good' first probed channel, it takes significantly longer to probe all the disks on a cold power up. My vague theory is that this is because the disks aren't fully spinning up until the first time the host talks to them, but I have no idea if this is true and if modern disks behave this way.)
PS: I have no idea if the different channels are fed power by any means that separates power for one channel from power for the others. For all I know right now, all disks are powered off a single run from the power supply. I assume (but have not verified) that there are three port multiplier daughter cards in the case, one for each ESATA channel and set of drives, and that they are wired separately. The external ESATA cables are certainly physically separated enough to make that plausible.
sysadmin/SometimesYouStop written at 02:23:58
Sometimes the right thing to do is to stop (and even to give up)
I'm generally someone who is happy to keep chasing an oddity or a mystery, to keep plugging away at the problem to at least chart it out and perhaps figure out what is going on. I suspect that this is something that a lot of sysadmins feel; if there is something wrong, we itch to figure it out and put it to rights. And the satisfaction of finally succeeding is an excellent feeling. But sometimes this is absolutely the wrong thing to do. Sometimes the right thing to do is to stop with the mystery not understood, or even to give up entirely.
I've been continuing to work away on our disappearing ESATA disk problem since I wrote about it; I've tried more things, gotten more specific information, and the whole thing has gotten weirder. But at the end of this past week we decided to stop all of that. I managed to get the system to a precariously balanced point where it's stable and that's that. In fact we're going further than just stopping with a stabilized system; in the longer run we're giving up on it entirely and will be migrating the whole thing to different hardware. We'll write off the disk enclosure as a loss (the server is a generic one and can be reused for other things).
The direct reason that this makes sense is that we have gone far enough to establish that something very odd is going on. Even if we continue investigating and discover exactly what the problem is, we have no confidence that we'll be able to fix it, and in the meantime we have managed to stabilize the system as-is. Until we can at least identify the problem, we can't trust the enclosure in general. We could do a bunch of experiments to chart out which disks we can add to the enclosure where and still have an apparently stable system, but that wouldn't make us trust it, and if we can't trust it we don't want to use it.
But the bigger reason to stop is the cost/benefit ratio of continuing to investigate the problem. I could easily spend a bunch of time and effort conducting experiments to map out the precise contours of the problem (and maybe find some clues to its cause). But by far the most likely result of these experiments is a pile of data on a disk enclosure that we no longer trust. In the best case we have minimal expansion in this enclosure and we're certainly not going to buy any more of them, so the smart choice is to say 'this is good enough, we've spent enough time on it'.
Or in short: sometimes you lose. When you are losing, the smart thing to do is to recognize that and lose fast. This is painful, since we don't like to lose, but it's also best. Try not to let it get to you.
(This would be more obvious if staff time was considered a cost on par with hardware, but universities almost never think about staff time that way.)
PS: yes, this entry is being written in part to make me feel better about throwing in the towel on this issue. We're all squishy humans with those awkward emotions.
linux/NFSReadonlyAtime written at 03:18:07
Things get weird with read-only NFS mounts and atime on Linux
Actually things here are much more interesting and odd than you might think. In light of the fact that mounting the filesystem read-only on the client is supposed to prevent any changes to it, you might expect file atimes to stay frozen; instead, read a file through such a mount and you will generally see its atime update. It certainly looks as if your read-only mount is changing the filesystem after all.

(The server mounting the filesystem with noatime is a different situation, of course, since then there are no atime updates to see in the first place.)
It turns out that this is an illusion. As a stateless protocol, NFS servers do not send any sort of notification to clients when a file's attributes change; instead NFS clients have to check for this by issuing GETATTR requests to the server, caching the results in between so that they don't have to ask constantly.

(Attribute caching is covered in the nfs(5) manpage, along with the mount options that control it.)
So what's really happening here is that with a read-only client mount, your reads still cause the server itself to update the file's atime; your client then sees the new atime when it next refreshes the file's attributes from the server. The client isn't changing anything; it's merely (eventually) observing a change that the server made.
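If you want to watch this happen, here's a minimal sketch of an experiment (the path is hypothetical, and the 60-second figure is just the default acregmax from nfs(5); your mount options may differ):

import os
import time

PATH = "/nfs/somewhere/afile"   # hypothetical file on a read-only NFS mount

def seen_atime(path):
    # the atime as this client currently believes it to be
    return os.stat(path).st_atime

before = seen_atime(PATH)
with open(PATH, "rb") as f:
    f.read()
print("atime changed right away:", seen_atime(PATH) != before)

# wait out the client's attribute cache so the next stat() forces a
# fresh GETATTR, picking up the server's atime update
time.sleep(61)
print("atime changed after cache expiry:", seen_atime(PATH) != before)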
The corollary of this is that mounting your NFS filesystems with noatime (or read-only) on clients does not actually stop atime updates. Atime updates happen on the server, so if you genuinely want them off, it's the server's mount of the filesystem that needs to be noatime.
unix/NFSReadonlyLevels written at 03:02:09
The three levels of read-only NFS mounts
It's sometimes useful to understand that there are three ways that an NFS mounted filesystem can be 'read-only'. Let's call them three levels:

1. The filesystem itself is mounted read-only on the NFS server.
2. The NFS server exports the filesystem to clients as read-only.
3. A client mounts the NFS filesystem read-only.
These can certainly be stacked on top of each other (a read-only server filesystem, NFS exported as read-only and mounted as read-only on clients) but they don't have to be. For instance you can NFS export filesystems as read-only but mount them read-write on clients (we do this here for complex reasons).
Now let's talk about atime and atime updates. In NFS, atime updates are the responsibility of the server, not the clients. More specifically, they are generally the responsibility of the underlying server filesystem code or the VFS, not specifically the NFS server code, and as such they can happen when you read data through a read-only NFS mount or even a read-only NFS export. The NFS client asks to read data, the NFS server code makes a general VFS 'get me data' call, and as a side effect of this the VFS or the filesystem updates the atime (if atime updates are enabled at all).
(This implies that not all client reads necessarily update the server atime, because a client may satisfy a read from its own file cache instead of going to the server.)
If you think about it this is actually a feature. If you have atime enabled on a read-write filesystem mount, you have told the (server) kernel that you want to know when people read data from the filesystem and lo, this is exactly what you are getting. The read-only NFS export is just to tell the NFS server that it should not allow people to do 'write' VFS operations.
(Since you can export the same filesystem read-write to some clients and read-only to others, suppressing atime updates on read-only NFS exports could also produce odd effects. Read a file from client A and the atime updates, read the file from client B and it doesn't. And all because you didn't trust client B enough to let it actually make (filesystem level) changes to your valuable filesystem.)
Sidebar: NFS exporting of read-only filesystems
You might think that the NFS export process should notice when it's exporting a read-only filesystem as theoretically read-write and silently change the export to read-only for you. One of the problems with this is that on many systems it's possible to switch filesystems back and forth between read-only and read-write status through various mechanisms while they stay mounted (and thus exported). Since the filesystem's status can change under it at any time, the NFS server can't simply freeze the export as read-only based on how the filesystem looked at the moment it was exported.
sysadmin/BodyOfKnowledgeThoughts written at 01:02:39
Some thoughts on a body of knowledge for system administration
Earlier this week I read an entry from this year's SysAdvent, Introducing the Guide to Sysadmin Body of Knowledge. If we're going to talk about a sysadmin body of knowledge, the first thing we need to talk about is whether this BoK is intended to be descriptive or prescriptive.
A descriptive BoK essentially restricts itself to an inventory of practices with descriptions about what good or bad things can happen when you use the particular practice. That's why it's a descriptive BoK; it simply describes things. A descriptive BoK generally should make an attempt to be even-handed or at least honest because otherwise it's not really honestly descriptive.
A prescriptive BoK says 'these are best practices and you should do them'. This almost necessarily comes with a side order of 'these practices are bad and should be avoided by all right-thinking people'. There are two problems with this. The first problem is that this is intrinsically a strong editorial stance on how people solve problems in system administration. This is going to be controversial.
The larger problem, a problem which also afflicts a descriptive BoK, is that system administration is nowhere near being a settled and fully developed field. Best practices in system administration are evolving on an ongoing basis as people come up with new solutions, try them out, refine them, work out how to make them simpler and easier to use, and so on (sometimes also with 'and discover that they don't work'). A prescriptive BoK that says 'do things this way' is freezing the state of the field as it is right now (at best). Unless the BoK is constantly updated, tomorrow and next month and two years from now it will still tell you to do things in what was the best practices for today, not where the field has moved to by then.
To get a more solid idea of what this might mean for the field, imagine that you had a BoK for the field that was compiled five years ago. How many of what are now considered best practices would be mentioned? How many now-deprecated things would be recommended (or at least not disavowed strongly)? One can get an idea of this by looking at old books on system administration (which is basically all of them today) and asking what they're missing.
A prescriptive BoK is usually considered more desirable by people because it tells you what to do, but this very feature makes it more harmful when it's out of date. An out-of-date BoK is not only silent on new things, it implicitly tells people not to do them. To do things in what are now the new best practices, you must actively go against what the BoK tells you to do. The result is that a respected prescriptive BoK would effectively freeze the parts of system administration that it described; new (best) practices would move through our field only very slowly until the BoK was revised and people learned about it.
A related issue is that people are probably only going to be willing to reread the BoK so many times before they throw up their hands and abandon it entirely as too much work to keep up with. Again this has implications for a constantly evolving field where you should really be revising the BoK every few years.
Now let me be optimistic (or at least temper my pessimism here). I do think it's possible to create a body of knowledge with general knowledge and principles that have proven timeless or are likely to be, and that this would be a quite useful thing to do (especially if someone can talk coherently about various underlying principles). But such a body of knowledge is not going to deliver specific actionable 'do this thing' advice, it's just going to be a high level guide.
(This elaborates on some things I brushed lightly over back in my earlier entry on professional knowledge, certification, and regulation.)