Wandering Thoughts: Recent Entries

Categories: links, linux, programming, python, snark, solaris, spam, sysadmin, tech, unix, web.

2010-02-09

Why your program should have an actual configuration file

Every so often, someone says something like 'you know, our program has a configuration file but also supports runtime reconfiguration via some magic. Clearly this is wrong, so what we should do is get rid of our configuration file and just make sure the running state is persistent'. If they're feeling nice, they add that the running state will be saved as an XML file.

Every time people say this, sysadmins cry. Here is a very important thing for real deployments of your program in real environments: configuration files are a good thing because they are really easy to manage. Running state that is updated by applying changes (often non-idempotent changes) is much harder.
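To make the difference concrete, here is a minimal Python sketch. The setting name `workers` and both functions are hypothetical and not taken from any real program. A declarative setting gives the same result however many times it is applied. An incremental change does not, so the final state depends on exactly which operations have already run.

```python
def apply_setting(state: dict, key: str, value: int) -> dict:
    """Declarative change ('workers = 5'): applying it twice equals applying it once."""
    new_state = dict(state)
    new_state[key] = value
    return new_state


def apply_increment(state: dict, key: str, delta: int) -> dict:
    """Incremental change ('add 1 worker'): every application shifts the result."""
    new_state = dict(state)
    new_state[key] = new_state.get(key, 0) + delta
    return new_state


initial = {"workers": 4}

# Re-running a declarative change (say, after a partial failure) is harmless.
once = apply_setting(initial, "workers", 5)
twice = apply_setting(once, "workers", 5)
print(once == twice)  # True

# Re-running an incremental change silently drifts the configuration.
inc_once = apply_increment(initial, "workers", 1)
inc_twice = apply_increment(inc_once, "workers", 1)
print(inc_once["workers"], inc_twice["workers"])  # 5 6
```

A configuration file behaves like the first function. A pile of runtime reconfiguration commands behaves like the second.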

First, let's get something out of the way: machine generated, automatically updated XML files are not configuration files in any conventional sense that is useful to sysadmins. They are an internal persistence mechanism that may, perhaps, have vaguely useful and inspectable contents (but generally not). So regardless of XML or not, if you go down this route you do not have a configuration file but instead a program with configuration state that persists over reboots and restarts.

Let's inventory some of the things that you lose when you merely have persistent configuration state without actual configuration files:

  • you cannot configure the program without it actually running. Programs often have undesirable behavior when started in an unconfigured, misconfigured, or inaccurately configured state.

    Among other things, this means that you can't prepare alternate configurations in advance; you must build them on the fly.

    (Or you must build them on another machine or in another instance of the program, shut both down, and port the magic persistence database over in whatever form it is in, assuming that it does not have host or instance specific data buried in it that you must scrub out.)

  • you cannot atomically make a bunch of changes, having them all take effect at once by putting a new configuration file into place and restarting the program (well, unless there's an explicit 'batch changes together' mechanism). Instead you must make the changes reconfiguration operation by reconfiguration operation. Much like before, this can result in the program temporarily operating in a highly undesirable state. At a minimum, it's going to complicate planning changes.
  • corollary: you can't easily switch configurations or choose different configurations based on outside conditions.

  • automatically updating configuration files clash, potentially badly, with attempts to maintain configuration files through version control systems, automated deployment mechanisms, and so on.

  • it is (or should be) easier to understand a configuration that is written out in a configuration file than one that is the implicit result of applying a bunch of configuration change operations.

    (If it is not, let's be honest here: you need a better configuration file format.)

  • it is much easier to update configurations by providing new files than it is to update configurations by applying configuration changes. There are lots of mechanisms to put new files into place; there are very few to carefully run sequences of commands, keeping track of which ones have already been executed successfully.
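As an illustration of how cheap 'put a complete new file into place' is, here is a minimal Python sketch of installing a configuration file atomically. The function name `install_config` is hypothetical. It writes to a temporary file in the same directory and then renames it over the old one with `os.replace()`, which on POSIX systems is atomic when both paths are on the same filesystem. Restarting the program afterwards is program-specific and not shown.

```python
import os
import tempfile


def install_config(path: str, contents: str) -> None:
    """Atomically replace the configuration file at `path` with `contents`.

    Readers see either the complete old configuration or the complete new
    one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (and so the same
    # filesystem) as the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)  # the single step where the new config takes effect
    except BaseException:
        # Don't leave a stray temporary file behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the whole configuration can be prepared, reviewed, and version-controlled in advance, every change in it takes effect together at the single rename.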

I could go on, but I think I'm going to stop now; I hope that you get the point. Configuration files don't exist merely because those other programmers are lazy people, they exist because they're actually a pretty good solution to a whole bunch of problems at once. Getting rid of them is almost never forward progress.

programming/UseConfigurationFiles written at 00:34:14

2010-02-08

A thought on deliberately slow disaster recovery

Given my earlier entry, here is a thesis: some disasters are big enough that you should stop trying to recover rapidly.

The problem with attempting rapid disaster recovery is that significant disasters are high stress, high pressure situations. Unless you have very good checklists, this is exactly the sort of situation where it's easy to have something go catastrophically wrong in any number of ways: missed steps, miscommunication between people about who was doing what, failing to notice problem indicators under the pressure of driving full speed ahead, interruptions and distractions making people lose their place, and so on.

So in this sort of situation, maybe what you should do is slow down. Back off, reduce the stress level, be methodical. Take the time to be organized. Stop sometimes to take a breather. Yes, this requires accepting that the systems will come back up slower than you might have been able to achieve if you went all out and everything went well. But in return, you are much more likely to avoid making the situation (much) worse.

This is a new way of thinking about crisis handling for me, because I am very much a 'go, now now now!' type of person when trying to fix problems. (And yes, some of the time I have probably made the situation worse by rushing to slap apparent bandaids on things; my instinct is to get the system up now and sort out the situation later and, well, this is not always the right answer.)

There are two things that strike me about this. First, the most dangerous crises and disasters from this perspective are not necessarily the huge ones, but the ones that have the highest potential for further damage, the ones that involve your critical infrastructure but have not already done much damage to it.

(To put it one way, if your machine room has burned down you have very little left to lose, no matter what you do.)

Second, this is not necessarily going to be easy. There are going to be a lot of people yelling at you to get things going faster, and a lot of pressure on you in general. I suspect that you're going to want management agreement on this, in advance (because you're unlikely to get it at the time, not with people yelling at your management too).

sysadmin/SlowDisasterRecovery written at 01:16:07

2010-02-07

The problem with blog footnotes

Here is something that has just occurred to me (courtesy of seeing an example of it): footnotes are hard to do well in blogs, and may need actual software support if you want them to be completely correct.

The conventional way of doing footnotes in HTML is to use fragment URLs and anchors, with the footnote text at the bottom of the entry and your choice of footnote markers in the main text. But, like anything involving anchors, this means that you need to come up with unique anchor names.

On one level this is no problem; you can just use 'fn:1', 'fn:2', and so on. But on another level this is a problem for blogs, because blog entries are repeatedly aggregated together with each other on web pages. When you put multiple footnote-using entries on the same HTML page, you need all of their anchors to be unique; you are not likely to get this if you use 'fn:1' style anchors. (This is especially pernicious once you start considering syndication feeds and 'planets', which put content from multiple blogs on the same HTML page.)

You can just punt on the issue and say 'well, it's up to the author to come up with unique anchor text (ideally globally unique text)', but in practice people won't always do this and this is equivalent to having non-functional footnote links under some circumstances.
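One way software could take this burden off the author is to derive footnote anchors from something that is already unique, such as the entry's path. The sketch below is hypothetical (it is not how DWiki or any particular blog engine does it) and assumes entry paths like `web/BlogFootnoteProblem`. It turns the path into an id-safe prefix so that two entries' first footnotes get different anchors even when both entries end up on the same page.

```python
import re


def footnote_anchor(entry_path: str, number: int) -> str:
    """Build a page-unique anchor name for footnote `number` of one entry.

    Characters that are awkward in HTML ids and fragment URLs are replaced
    with '-', so 'web/BlogFootnoteProblem' becomes 'web-BlogFootnoteProblem'.
    """
    prefix = re.sub(r"[^A-Za-z0-9_-]", "-", entry_path)
    return f"fn-{prefix}-{number}"


def footnote_ref_html(entry_path: str, number: int) -> str:
    """HTML for the in-text footnote marker that links to the footnote."""
    anchor = footnote_anchor(entry_path, number)
    return f'<sup><a href="#{anchor}">{number}</a></sup>'


print(footnote_anchor("web/BlogFootnoteProblem", 1))  # fn-web-BlogFootnoteProblem-1
print(footnote_anchor("tech/WhyNoLaptop", 1))         # fn-tech-WhyNoLaptop-1
```

For 'planets' that mix several blogs, the prefix would also need something site-specific, such as the blog's hostname, since two blogs can easily have entries with the same path.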

Admittedly, I suspect that most people won't really care about all of this, and will be perfectly happy using 'fn:1' style links and having them not work. Regardless of whether the actual links work, your intent is likely to be pretty easy for users to follow.

(And who knows, maybe the proper implementation of footnotes in blog entries is pop-up alt text, as xkcd famously does on its comic images. Alternately, footnotes are a print thing that is not appropriate in HTML.)

web/BlogFootnoteProblem written at 03:06:17

2010-02-06

Why a laptop is not likely to be my primary machine any time soon

I know and read a number of people who use laptops as their primary machines, but I'm one of the people who's not interested in the idea (even ignoring any issues of relative prices). I wound up actually thinking about the question recently, and as it turns out I think I have a fairly odd set of reasons for it.

So, here they are so far:

  • I have very particular tastes in keyboards (I have used a BTC-5100C keyboard for more than a decade) and for the space immediately in front of the keyboard. Laptops may have decent keyboards, but they don't have my keyboard.

  • I want a fairly physically large display with good resolution, especially good vertical resolution; when there's room for it, I want two of them.

  • I use two drives in my systems in order to have mirrored (system) disks. (Of course this can have drawbacks.)

In the past, my desire for Unix (ideally Linux) would also have been a significant obstacle, but my impression is that it's now relatively easy to find a nice modern laptop that has good Linux support. (Hopefully I'm not wrong.)

Another way of thinking about this is that I have two roles for computers: the computer I sit in front of all the time, and the computer that I take places for relatively moderate use. For the heavily used computer, I have strong and very particular opinions about the pieces of the computer that I interact with a lot (the keyboard, the displays), but I'm indifferent to the rest of it (provided that it's quiet). I don't care as much about the casual computer, but I want it to be small, light, and still nice for productive work.

(The late Dell Mini 12 is about my platonic ideal of the casual laptop in form factor, screen resolution, and keyboard.)

It's pretty clear to me that some of these desires clash even in the best of circumstances, particularly the displays; a laptop screen big enough to be one of my regular displays makes the laptop too big to be conveniently portable. Thus, if I tried to use a laptop for both roles the only use I'd get for it in the full time usage role would be as the system unit of a desktop system, as I wouldn't use either its display or its keyboard (and I'd still only have one system disk). If I absolutely had to have only one computer this could be workable, but if not, there's little advantage to it.

I suspect that other people are generally much less particular and picky about their keyboards, displays, software, and so on. (Or, alternately, they have found a laptop maker whose keyboards and screens they are as fond of as I am fond of my favorites.)

(This entry was sparked by the discussion here. Plus, I feel like not writing about documentation for days on end.)

tech/WhyNoLaptop written at 00:44:34

2010-02-05

Emergency procedures checklists need check steps

Given my previous entry, here is a thesis about emergency procedure documentation: you shouldn't just have a checklist for what to do, your checklist should include actual check steps, points where you stop to explicitly confirm that you've done something and it actually works.

Checklists are a good idea, but the common form of a checklist is just a list of steps to be carried out. Under the stress of an emergency situation, I don't think that this is good enough. First, your checklist implicitly assumes that everything works right, and second, it's too easy to be rushed, distracted by some interruption, sleep-deprived, or whatever while you're going through the checklist and lose track of where exactly you are, do something wrong, or miss the potentially subtle signs that something is not working the way that your checklist assumes.

Thus, you need spots in your checklist where you don't do things but check things; you take positive steps to make sure that everything is as it should be and that the system is in the state that you and your checklist assume that it is. These checks ensure that if something goes wrong, either in the environment or in you carrying out the checklist, it gets noticed before things go horribly off the rails and explode.

In short: it's not good enough to have a checklist item that says 'throw switch 12'; you need something to confirm that you have in fact thrown switch 12 (and ideally just switch 12) and that the results of throwing switch 12 are what you expect.

You need these checks to be explicit steps in your checklist for the same reason that you have a checklist in the first place; your memory is fallible, especially under stress, and having them written down explicitly maximizes the chances that you will always do this.
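Most emergency procedures won't be automated, but the structure is easy to show in code. In this minimal Python sketch all of the names are hypothetical, and the `switches` dictionary stands in for real hardware. Every step pairs an action with an explicit check, and the runner stops at the first check that fails instead of carrying on to the next step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    action: Callable[[], None]
    # The explicit check step: returns True only if the system is now in
    # the state that the rest of the checklist assumes it is in.
    check: Callable[[], bool]


def run_checklist(steps: List[Step]) -> int:
    """Run steps in order, stopping at the first failed check.

    Returns how many steps were completed and verified, so whoever is
    carrying out the procedure knows exactly where they stopped.
    """
    for index, step in enumerate(steps):
        print(f"step {index + 1}: {step.description}")
        step.action()
        if not step.check():
            print(f"  CHECK FAILED after step {index + 1}; stopping here")
            return index
        print("  check passed")
    return len(steps)


# Hypothetical example: throw switch 12, then confirm that switch 12
# (and only switch 12) is now on.
switches = {12: False, 13: False}


def throw_switch_12() -> None:
    switches[12] = True


steps = [Step("throw switch 12", throw_switch_12,
              lambda: switches[12] and not switches[13])]
completed = run_checklist(steps)  # completed == 1
```

The check is written as its own function on purpose, so it can't be silently skipped as part of 'doing' the step.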

(I suspect that one of the lessons that the airline industry can teach system administration is that in this sort of situation it is best to have two people involved, one reading off the checklist and the other one performing the actions and verbally confirming that they've been done. This makes it harder to fool yourself that something has been done or that of course something looks right.)

The corollary to this corollary is that checks should especially be inserted before you are about to do damaging operations such as formatting a disk, putting a replacement system online under its production IP address, or force-importing a SAN filesystem on a non-default fileserver.

(Sadly, testing checks is probably even harder than testing documentation normally is; how do you manufacture failures in checklist steps to make sure that your check steps actually do anything useful?)

sysadmin/ChecklistChecks written at 01:15:16

2010-02-03

Outdated documentation is especially risky for sysadmins

The obvious traditional risk of outdated documentation in all its forms is that you rely on it and go wrong somehow; you trust the comments in the source code and write your new code accordingly, and your changes don't work. I think that this risk is especially acute for sysadmins, for two strongly related reasons.

First, much of our documentation tends to be about procedures, not simple information. Following what is actually a wrong or incomplete procedure is a great way to create spectacular failures on the spot. Worse, sysadmins inevitably wind up dealing directly with live systems and live data.

(Yes, you can test procedures just as you test the code that you write, but at some point you have to use them on your live system and this is always somewhat different from the test environment, unless you have a spectacularly complete test environment.)

Second, some of the least used documentation (and thus the riskiest) is our emergency procedures. When we need to use them, we're in one of the most tense situations possible, under a great deal of pressure to get things fixed now and thus least able to go slowly and carefully and stop if something, anything, seems off. This is the exact sort of situation where incorrect procedure documentation can do the most damage, because people don't stop before they compound a small problem into a huge one.

(Imagine, for example, an off by one error in documentation about how to map disk bay slots to device names. Now add a 'get things back up right away' crisis where you need to replace a disk.)
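Here is a hypothetical illustration of how easily such an error creeps in. It assumes a chassis whose bays are labelled starting at 1, with device names starting at `sda` and bays mapped to devices in order. Real hardware often does not map this neatly, which is exactly why the mapping gets documented in the first place.

```python
import string


def device_for_bay(bay: int) -> str:
    """Map a 1-based bay label to a device name, assuming bay 1 is sda.

    The in-order mapping is an assumption made for illustration only.
    """
    if not 1 <= bay <= 26:
        raise ValueError(f"bay {bay} is out of range for this sketch")
    return "/dev/sd" + string.ascii_lowercase[bay - 1]


def buggy_device_for_bay(bay: int) -> str:
    """The same mapping with the classic mistake: treating bays as 0-based."""
    return "/dev/sd" + string.ascii_lowercase[bay]


# The alert says 'the disk in bay 3 has failed'.
print(device_for_bay(3))        # /dev/sdc, the disk that actually failed
print(buggy_device_for_bay(3))  # /dev/sdd, a healthy disk
```

Both versions look plausible when read in a hurry, and nothing about the wrong answer looks wrong. That is the whole problem.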

sysadmin/OutdatedDocumentationRiskII written at 23:20:41

Link: Pollution in 1.0.0.0/8

IANA has recently allocated 1.0.0.0/8 to APNIC, which has caused a certain amount of concern that it is 'polluted' by people already using it for various reasons. Pollution in 1/8 is a report from RIPE Labs on what happened when they announced routing for some bits of it as part of their debogonising work.
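'Debogonising' deals with a related problem: bogon filters, which are lists of address space that should never appear on the public Internet, written when they were correct and then never updated. A minimal sketch using Python's standard `ipaddress` module shows how such a stale filter silently treats addresses in a newly allocated block as bogus. The filter list here is hypothetical and deliberately out of date.

```python
import ipaddress

# A hypothetical, stale bogon list written before 1.0.0.0/8 was allocated.
STALE_BOGONS = [
    ipaddress.ip_network("0.0.0.0/8"),
    ipaddress.ip_network("1.0.0.0/8"),    # no longer unallocated
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_bogon(address: str, bogons=STALE_BOGONS) -> bool:
    """Return True if `address` falls inside any network on the bogon list."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in bogons)


print(is_bogon("1.1.1.1"))  # True: traffic from newly allocated space gets dropped
print(is_bogon("8.8.8.8"))  # False
```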

This is clearly going to be what they call 'interesting'.

(via Hacker News.)

links/Net1Bogons written at 12:02:33

How to destroy people's interest in updating documentation

Here is one of the less obvious perils of outdated documentation:

Suppose that you have some documentation that is out of date, but not in an obvious way; for example, you have an out of date network layout diagram. Since it's not obvious you don't realize this right away, so you keep on updating the network layout diagram when you make changes to your actual network.

Except that faithfully updating an inaccurate network layout diagram is relatively pointless. When you realize that it is incorrect, you are going to have to re-check most of it anyway, or at least spend a bunch of effort to reconstruct which sections are trustworthy.

This peril of outdated documentation is that updating bad documentation is wasted effort. (Fixing bad documentation is not, but that's a different thing.)

Since updating documentation takes time that you could be using for other things, and it's generally not fun, it does not take too much time to be wasted this way before people stop updating documentation entirely. Why do annoying wasted effort, when you could be doing something that's actually productive and useful? (Especially if you did the work thinking that it wasn't wasted effort, only to find out later that what you thought was productive work, well, wasn't. People really don't like that.)

At first, this effect will probably be limited to documentation that is highly suspect. But I don't think it takes much bad documentation before people more or less give up totally, because it is too heartbreaking to waste time this way and they can't stand the idea of it any more; you will lose the culture of documentation. At that point, you can stop talking about updating documentation and start talking about reconstructing it from scratch.

(This is where local wikis are perhaps less than ideal, because at this stage what you really need to do is pave everything so that there is a clear line between 'done recently, can be trusted' and 'is old, do not trust until it has been redone'.)
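One way to draw that line mechanically is to record when each page was last verified against reality (not merely edited) and treat anything older than a cutoff as untrusted. This is a minimal sketch. The page names, the dates, and the very idea of a per-page 'last verified' date are assumptions, since most wikis only track edit times.

```python
from datetime import date
from typing import Dict, List, Tuple


def partition_docs(last_verified: Dict[str, date],
                   cutoff: date) -> Tuple[List[str], List[str]]:
    """Split pages into (trusted, untrusted) by when they were last verified.

    Pages verified on or after `cutoff` count as 'done recently, can be
    trusted'; everything else is 'old, do not trust until it has been redone'.
    """
    trusted = sorted(p for p, d in last_verified.items() if d >= cutoff)
    untrusted = sorted(p for p, d in last_verified.items() if d < cutoff)
    return trusted, untrusted


pages = {
    "NetworkLayout": date(2008, 5, 1),
    "DiskBayMapping": date(2010, 1, 20),
    "EmergencyPowerOff": date(2009, 11, 3),
}
trusted, untrusted = partition_docs(pages, cutoff=date(2010, 1, 1))
print(trusted)    # ['DiskBayMapping']
print(untrusted)  # ['EmergencyPowerOff', 'NetworkLayout']
```

The important design choice is that editing a page doesn't move it into the trusted pile; only re-verifying it does.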

sysadmin/OutdatedDocumentationRisk written at 01:58:54

2010-02-02

What charging credit cards doesn't prove

Every so often, commonly in the context of SSL certificates, someone puts forward the theory that charging money for things makes the customers somehow more identifiable and reliable than giving it to people for free (with the same other authentication of customers). After all, so the theory goes, when you give people something just because they have a particular email address, that's not much, but when you've charged their credit card, you have a lot more confidence in their real identity.

This is wrong. To explain why it is wrong, let's talk specifically about SSL certificates.

The basic model of 'verifying' SSL certificates is that in order to get a certificate for a domain, you have to prove that you (theoretically) have power over that domain; you have one of a certain number of email addresses at that domain, you can put things on its web server, or something of the like. Most SSL certificate authorities also charge money on top of this; you submit credit card information along with your Certificate Signing Request, they charge your card, and if the charge goes through you get your signed certificate in email. By collecting money from you, they've gotten a stronger verification than before.

Except that they haven't, because I pulled a fast one in this description: charging a credit card is not the same as actually collecting money from it. No SSL CA waits to give you your certificate until they have actually received your money from the credit card company; the delays involved in that would drive most customers away. Instead they issue SSL certificates very close to on the spot, which means that SSL CAs are not verifying that you can pay them money, they are verifying that they can charge a credit card. And there are a lot of ways to get a credit card number that can have some amount of money charged to it and not have that reversed, rejected, or detected as fraudulent for (say) six hours, if not days.

(Oh, sure, once the charge blows up the SSL CA will try to revoke the SSL certificate. Good luck with that.)

(This is kind of a reaction to this, because I think this misapprehension is a general one.)

tech/ChargingNoProof written at 01:39:20

2010-01-31

More vim options it turns out that I want

Much to my displeasure, Ubuntu seems to have been steadily making the version of vim that they ship more and more superintelligent. I do not want a superintelligent vi; in fact, superintelligence is a net negative in vim, because unlike with GNU Emacs it is almost always wrong. So, unlike the first set of vim options, these are negative options that I need, things that turn off settings.

So far, I have wound up with:

set formatoptions=l
Turns off automatic line wrapping. Since vi is my sysadmin's editor and sysadmins edit configuration files a lot, automatic line wrapping is an anti-feature.

(I hate it in Emacs too, when it happens.)

let loaded_matchparen = 1
Turns off highlighting of matching delimiters, like () and [] and so on. I find this irritating and distracting.

filetype plugin off
This turns off all sorts of superintelligent automatic formatting that I aggressively don't want.

(At some point I may look into the best way to fix the line ending issue, but I haven't been annoyed enough yet.)

Some reading in the vim help files suggests that 'set paste' will also do a lot to turn off all of the superintelligence that I so dislike. Using Ubuntu's 'tiny' version of vim also goes a long way to disabling various things I don't like, but it has the side effect of making vim not like the latter two .vimrc settings here (and it's not something that I can turn on globally on our systems and so have all the time, no matter what environment I'm in or what UID I'm running as at the moment).

All in all, I really wish vim had a mode where it just settled for being a better vi instead of trying to be a bad imitation of GNU Emacs. As before, if I want GNU Emacs, I know where to find it.

linux/VimOptionsII written at 23:02:28

Thinking about syndication feeds and spoilers

DWiki has always had the ability to do the common blog thing of 'click here to see the rest of the entry'; when I put it in, I expected to use it for things like the detailed stats at the end of this entry. Because I am crazy that way, I built the feature so that it could apply on the main page (pages, really), in syndication feed entries, or both, depending on what options I turned on in any particular entry.
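The idea of a cut that applies on the main pages, in syndication feeds, or in both can be sketched as a per-entry setting that is consulted at render time. This is hypothetical Python, not DWiki's actual code; the `CUT_MARKER` string and the `CutScope` names are made up for illustration.

```python
from enum import Flag


class CutScope(Flag):
    NONE = 0
    PAGES = 1   # truncate on the blog's own HTML pages
    FEEDS = 2   # truncate in syndication feed entries
    BOTH = 3    # PAGES | FEEDS


CUT_MARKER = "<!-- cut -->"


def render_entry(text: str, scope: CutScope, for_feed: bool, url: str) -> str:
    """Return the entry text as it should appear in one output context."""
    context = CutScope.FEEDS if for_feed else CutScope.PAGES
    if CUT_MARKER not in text or not (scope & context):
        # No cut applies in this context: show the whole entry, minus the marker.
        return text.replace(CUT_MARKER, "")
    before_cut = text.split(CUT_MARKER, 1)[0]
    return before_cut + f'<p><a href="{url}">Read the rest of this entry</a></p>'
```

Under this model, an entry with spoilers would simply be marked `CutScope.FEEDS` (or `CutScope.BOTH`), while ordinary entries keep `CutScope.NONE`.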

In practice, it turned out that I really don't like using cuts in syndication feed entries, for at least two reasons. First, syndication feed readers already have good ways to skip parts of entries and even whole entries (or at least they should), which makes cutting for volume mostly unnecessary. Second, partial entries are annoying in general because they effectively force you out of your syndication feed reader and into your browser in order to read the full entry.

(In fact it turns out that I don't like cuts very much in general, so I barely use them even on the main pages.)

However, this does leave one case unhandled: spoilers. Places like the anime blogging community have come up with decent Javascript-based solutions for people who are reading your main site, but this is a complete non-starter in syndication feeds. In fact you can't even count on the old 'set the colour of the text to the background colour' trick, as modern syndication feed readers can strip styling as well.

My reluctant conclusion is that handling spoilers may well call for using a cut even in syndication feeds, with the annoyance of having to click off to read the entry being the lesser of two evils. The other approach is just to note that there will be spoilers at the start of an entry and count on people to use their feed reader's 'skip to next entry' feature.

(Spoilers are not generally relevant to WanderingThoughts, but they sometimes come up for me elsewhere.)

tech/CutsInSyndicationFeeds written at 01:39:52

This is part of CSpace, and is written by ChrisSiebenmann.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.