Wandering Thoughts archives

2010-05-16

You should also document why you didn't do attractive things

I recently needed to do something to our MoinMoin-based wiki. As it was currently configured, doing that thing was a parade of annoyance, so I wound up rummaging through the configuration options and found one that significantly simplified my task. Now, suddenly, I had a dilemma.

Our existing MoinMoin configuration didn't have this option turned on, and the person who configured our MoinMoin instance isn't around any more to be asked questions. So, had they overlooked this option when they set up the wiki, or had they tried it and discovered that, sadly, it didn't actually work or, worse, had some undesirable side effect elsewhere?

(We have documentation on the configuration settings that they use, but as is common it covers only what got changed from the defaults. And this option defaults to off.)

So, I have a suggestion: when you are documenting how you configured something, you should take a moment to write down all of the attractive-looking options and approaches that you tried out but that turned out to not work, caused problems, or whatever. Otherwise you risk someone coming along later, seeing that you have not done something that will make their life easier, turning it on themselves on the assumption that you overlooked it, and having things explode.

Conversely, documenting the things that you tried but did not use gives those people some confidence that if a convenient option isn't in your notes, you actually did overlook it, and they can thus turn it on with only ordinary precautions and concerns.
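As a minimal sketch of what such notes could look like if you keep them next to the configuration itself, here is one way to do it in Python. Every option name and reason here is invented for illustration; none of them are real MoinMoin settings, and `why_unset` is a hypothetical helper, not part of any tool.

```python
# Hypothetical settings; these names are made up and are NOT real MoinMoin options.
SETTINGS = {
    "example_page_cache": True,  # changed from the default (off)
}

# Attractive options that were tried and deliberately left at their defaults,
# each with the reason, so a later reader does not have to guess.
TRIED_AND_REJECTED = {
    "example_bulk_rename": "renames worked, but links to the old names broke",
}


def why_unset(option):
    """Say what the notes know about an option that someone wants to turn on."""
    if option in SETTINGS:
        return "set explicitly"
    if option in TRIED_AND_REJECTED:
        return "rejected: " + TRIED_AND_REJECTED[option]
    return "no record; probably never tried"
```

The useful case is the last one. Because the rejected options are written down, an option that appears in neither place can reasonably be assumed to have been overlooked rather than quietly tried and abandoned.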

By the way, your future self is likely to be one of those people (unless you have a better memory than I do for things that I tried and rejected).

PS: this generalizes to more than software configuration files.

(To add a conclusion to my MoinMoin story, so far it seems that the configuration option I found works fine and has no bad side effects. I'm happy.)

sysadmin/DocumentUnusedSettings written at 22:10:18

A theory about our jumbo frame switch firmware bug

In my last entry I mentioned that I now had a theory about our odd switch failure with jumbo frames, where after a power cycle the switch would start doing jumbo frames remarkably slowly until you went into the configuration system and re-selected the 'do jumbo frames' option. This is that theory.

As I've mentioned before, modern switches have two parts: a high-speed switching core and a slower management processor that handles everything else. If the jumbo frames weren't being handled by the switching core but were instead being passed up to the management processor, you could expect things to work but be very slow, which is just what I saw.

So how could things get that way? My theory is that the code that configured the switching core on boot was doing an incomplete job of enabling jumbo frames; it told the switching core to accept them, but didn't turn on everything that was needed to have the switching core actually switch them. The code that got run when you turned on jumbo frames in the configuration system did do the full setup, which is why explicitly 'enabling' jumbo frames in the configuration interface suddenly made them work at full speed.

(This theory also leads to a decent story about how the switch passed the vendor's testing, since most testing starts from factory default settings.)
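To make the theory concrete, here is a toy model of it in Python. This is only a sketch of my guess, not real firmware; the class, the method names and the two rates are all invented, and the rates are arbitrary numbers chosen just to make 'fast' and 'slow' visibly different.

```python
class ToySwitch:
    """Toy model of the theory; all names and numbers are invented."""

    CORE_RATE = 1_000_000  # assumed fast-path rate, arbitrary units
    CPU_RATE = 1_000       # assumed management-processor rate, arbitrary units

    def __init__(self):
        self.saved_jumbo = False          # the persistent configuration
        self.core_accepts_jumbo = False   # core lets jumbo frames in
        self.core_switches_jumbo = False  # core forwards them itself

    def boot(self):
        """Power-on path: restores the saved setting, but incompletely."""
        self.core_accepts_jumbo = self.saved_jumbo
        self.core_switches_jumbo = False  # the hypothesized missing step

    def ui_enable_jumbo(self):
        """Configuration-interface path: does the full setup."""
        self.saved_jumbo = True
        self.core_accepts_jumbo = True
        self.core_switches_jumbo = True

    def jumbo_rate(self):
        if not self.core_accepts_jumbo:
            return 0                      # jumbo frames are dropped
        if self.core_switches_jumbo:
            return self.CORE_RATE         # handled by the switching core
        return self.CPU_RATE              # passed up to the management processor


sw = ToySwitch()
sw.boot()               # factory defaults: no jumbo frames at all
sw.ui_enable_jumbo()    # what a tester would do; full speed
sw.boot()               # power cycle: jumbo frames work, but slowly
sw.ui_enable_jumbo()    # re-selecting the option restores full speed
```

In this model a switch that starts from factory defaults and has jumbo frames turned on through the configuration interface never shows the problem. You only see it after a power cycle with jumbo frames already saved as enabled.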

One of the things that this reinforces for me is that modern hardware is not just hardware; it has a lot of non-trivial software embedded into it. This matters because software generally has much more complicated failure modes than physical hardware, which means that even what we think of as simple hardware can behave very oddly in narrow circumstances.

(The poster child for this is hard drives, which now run a scarily large amount of onboard code to do increasingly sophisticated processing, more or less behind your back. All things considered, I am sometimes impressed that modern HDs work anywhere near as well as they do.)

tech/SwitchJumboTheory written at 00:55:22


