Sometimes having a system programmer around is the right answer

June 4, 2011

In SystemProgrammerDanger, I wrote about the danger of having a system programmer around (to summarize, we automatically reach for things like the source code). But there's a flip side to that, handily illustrated by my recent entry on getting stale file errors for local files: sometimes the system programmer approach is the right one.

When my colleague brought up this odd issue he was having with local files giving 'stale filehandle' errors, my first reaction was to grep through the kernel source for places that returned ESTALE errors; I figured that there couldn't be too many of them, since ESTALE is a very specific error (I was mostly correct about this). Reading through the code the grep found soon pointed me to the likely issue, especially once my colleague also reported that the system had disk errors and he'd been seeing odd stat() results for other files. All of this took me only a few minutes (partly because I already had kernel source available).
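
For illustration, here is a rough Python sketch of that search step. It is not the command used at the time (that was plain grep over the kernel source); the function name find_errno_uses, the .c/.h suffix filter, and the whole-word match are assumptions made for the example.

```python
import os
import re


def find_errno_uses(root, name="ESTALE", suffixes=(".c", ".h")):
    """Return (path, line number, text) for each source line mentioning `name`.

    Hypothetical helper: roughly what `grep -rnw ESTALE root` would report,
    restricted to files whose names end in one of `suffixes`.
    """
    # Match the name as a whole word, so ESTALE does not match ESTALE_FOO.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            # Source trees are not always clean UTF-8; replace bad bytes
            # instead of failing partway through the walk.
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling find_errno_uses() on the fs subdirectory of wherever your kernel source lives gives a list short enough to read through by hand, which is the whole point: a specific error name makes for a small haystack.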

I'm pretty sure that this was the fastest way I could have found the answer. And I found it by taking a system programmer's path.

(Okay, web searches do suggest that other people have run into this before and have identified it as being caused by disk corruption and sometimes fixed by fsck. This is decent operational advice but doesn't tell you what's really going on. My personal view is that knowing what's really going on is important because it gives you confidence that you've dealt with the real problem instead of just papering over a symptom.)

