Wandering Thoughts archives

2013-01-18

SLAs, downtime, and planning

Disagreeing with Tom Limoncelli is sort of taking my life into my hands, but sometimes I can't help it. I have large and complex reactions to his 'All outages are due to a failure to plan', but to start with I want to jump on one bit:

What about the kind of outages that are completely unavoidable? That is a failure to plan to have an SLA that permits a reasonable amount of downtime each year. If your plan includes up to 4 hours of downtime each year, those first 239 minutes are not an outage. [...]

I feel that this is either flat out wrong or misleadingly written and vacuous. At best it is true only for large organizations (like Google) that have decided that they cannot be down ever for more than a certain amount of time no matter what it takes.

Let me use an example that is not as hypothetical as I would like it to be: suppose that our machine room suffered major damage and was a total loss, perhaps from the building burning down, perhaps from a local flood. Depending on exactly what the disaster was, recovery would almost certainly take more than a week. What SLA can we write to cover this?
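To put rough numbers on this, here is a small sketch that turns a yearly downtime allowance into the usual availability percentage. The 365-day year and the 'three weeks' figure (standing in for 'more than a week') are my assumptions for illustration, not anything from Tom's article.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def availability_percent(downtime_hours: float) -> float:
    """Percentage of the year a service is up, given total downtime in hours."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


# The 4-hour figure comes from the quoted article; the other two are
# hypothetical recovery times for a lost machine room.
for label, hours in [("4 hours", 4), ("1 week", 7 * 24), ("3 weeks", 21 * 24)]:
    print(f"{label:>8}: {availability_percent(hours):.3f}% available")
```

This prints roughly 99.954%, 98.082%, and 94.247%. The gap between the first figure and the other two is the gap between a conventional SLA and what a machine room loss would actually cost us.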

If Tom Limoncelli means us to take 'SLA' and reasonable allowed downtime hours literally, there is no non-laughable SLA that we can write to cover this. Our SLA would have to say 'allowable downtime: a few weeks (continuous)', at which point it is not an SLA but a joke. But (based on a comment reply he wrote), it seems that Tom doesn't mean this quite literally; instead he means that your 'allowed downtime' should be documented (including circumstances). If this is the case, his article is misleadingly written (since it talks about SLAs only in the usual 'hours a year' terms), unclear, and ultimately essentially vacuous. What he really seems to mean is 'document all of the situations where you will be down for an indeterminate amount of time' (and then get people to agree to them). I don't think that this is useful advice for several reasons.

First, there's very little point to it except as an excuse. It is an exercise in preparing a document that you will hand to management in order to be able to later say that you warned everyone that something could happen. If you have decent management, everyone will look back after the building has burned down and not blame you for the resulting downtime. If you have management that would blame you for not warning them that there would be a major downtime if the building burned down, you need a new job (and it's likely that preparing a document will not stop said management from blaming you anyways).

(If you get explicitly asked 'what could really ruin our week and what can we do about it', then sure, prepare a document that's as comprehensive as you can make it.)

Second, it's very hard to foresee, in any detail, all of the disaster scenarios that could happen to you. The universe is a very perverse place, often far more perverse than we can imagine to any degree of specificity. If you are specific, you are not likely to be comprehensive, and then you expose yourself to Tom's accusation of 'failure to plan' (because in hindsight it is both easy and tempting to say 'you should have seen that obvious possibility'). If you are general, you are in practice uselessly vacuous; it boils down to 'if we suffer a major catastrophe (whatever that is) we will be down for some unknowable amount of time'. There, I just wrote your SLA. Again, if your management demands something like this, find a new job.

Personally and specifically, I'm confident that I can't possibly inventory all of the terrible things that could happen to knock us out of action for at least a week. For example, before last year I doubt I would have thought to include 'AC seizes up, machine room overheats, sprinkler heads pop open in the high temperatures, all machines flooded from above for some time before power is cut off' as a disaster scenario. Or even just the general 'sprinkler heads activated without a fire'.

(If Tom Limoncelli would have me write that as the general 'machine room is lost', well, we once again circle back to vacuous 'plans'. You might as well document the situations you think you can recover from and then write 'for any other disaster, we don't know but we'll probably be down for a while'.)

sysadmin/SLAsAndDowntime written at 22:48:47

More on my favorite way of marking continued lines

A commenter on my first entry on this both correctly noted that I had mis-attributed the RFC that originated this (I learned it from RFC 822, but it was originally invented in RFC 724) and had some reactions to my idea, which means that I need to clarify it and add some additional comments. They wrote:

Comment-folding whitespace? Please, no. No. :( Comment-folding whitespace is the bane of people handling email.

In a sense, I entirely agree with this comment. Implementing full RFC 724/RFC 822 style parsing in your language is not what you want to do because it's too complex and perverse (mail headers have some crazy rules). But I was unclear in my original entry, especially about comments.

In a 'leading whitespace continues the logical line' environment, my usual approach to comments is that they occupy whole physical lines (ie you cannot have a line that is part-content and part-comment) and are silently removed in low-level parsing. As an example:

# this is a comment
abc # this is not
this is
   # a comment
   some text

this is some text

The last two examples (the three physical lines starting with 'this is', and the single line after the blank line) result in the same logical line ('this is some text') because the (indented) comment line is removed as part of assembling logical lines. There are many equally good variants on comment handling (eg disallow them in continued lines); I just find it convenient to be able to write comments for parts of anything that gets long enough to be split over multiple physical lines.

(As implied by the reassembled line, my approach is to replace all of the trailing whitespace, the newline, and the leading whitespace with a single space.)
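The rules above are small enough to sketch in a few lines of Python. This is a minimal illustration rather than any particular implementation; the function name `logical_lines` is made up for the example, and two details are my assumptions: a blank line ends the current logical line, and an indented line with nothing before it simply starts a new logical line.

```python
from typing import Iterable, Iterator, List


def logical_lines(physical_lines: Iterable[str]) -> Iterator[str]:
    """Assemble logical lines from physical ones.

    Whole-line comments (first non-blank character is '#') are dropped,
    and a line that starts with whitespace continues the previous line.
    Assumption: a blank line ends the current logical line.
    """
    parts: List[str] = []
    for raw in physical_lines:
        line = raw.rstrip()            # drops trailing whitespace and the newline
        stripped = line.lstrip()
        if stripped.startswith("#"):
            continue                   # whole-line comment, indented or not
        if not stripped:
            if parts:                  # blank line: flush what we have
                yield " ".join(parts)
                parts = []
            continue
        if line[0].isspace() and parts:
            parts.append(stripped)     # continuation, joined with one space
        else:
            if parts:
                yield " ".join(parts)
            parts = [stripped]
    if parts:
        yield " ".join(parts)


example = """\
# this is a comment
abc # this is not
this is
   # a comment
   some text

this is some text
"""

for line in logical_lines(example.splitlines(keepends=True)):
    print(repr(line))
# 'abc # this is not'
# 'this is some text'
# 'this is some text'
```

Because comments are dropped before continuation is considered, an unindented comment between a line and its continuation is also ignored here; a stricter variant could reject that instead.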

As implied by how I prefer to handle comments, this is all designed for simple situations, for configuration files and small DSLs with grammars that are as simple as possible (often simply 'space separated words' with some meaning layered on top). It's my strong belief that all of these languages already want to avoid language features that might make this sort of line continuation a problem (although I'm not sure what they would be). Yes, people can break logical lines up in perverse ways with this, but they can do that with any line continuation scheme (and you still want a line continuation scheme).

(As I have found out the hard way repeatedly, line continuations are something you almost always want to have, much like comments.)

If you're doing this as part of a real lexer and tokenizer, you will have to decide what happens with a single token that gets split over multiple physical lines, such as:

a = "some
     text"

Because I do this before any tokenization gets its hands on the result, my answer is 'what you see is what you get', ie the language tokenizer and parser gets handed 'a = "some text"' and may do with it whatever it wishes. This is not necessarily suitable for sophisticated languages which may sometimes want to retain newlines and leading whitespace as actual elements of eg strings, but as I said this is a design for simple languages.
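As a rough sketch of this ordering, the following joins continued lines first and only then hands the text to a tokenizer. Python's `shlex.split` is used purely as a stand-in for 'the language tokenizer'; it and the helper name `join_continuations` are my choices for illustration, and the regular expression ignores comments and blank lines to keep the example short.

```python
import re
import shlex


def join_continuations(text: str) -> str:
    """Collapse trailing whitespace, the newline, and the next line's
    leading whitespace into one space (comments are not handled here)."""
    return re.sub(r"[ \t]*\n[ \t]+", " ", text)


source = 'a = "some\n     text"\n'

joined = join_continuations(source)
print(repr(joined))          # 'a = "some text"\n'

# The tokenizer never sees the line break inside the quoted string.
print(shlex.split(joined))   # ['a', '=', 'some text']
```

The tokenizer gets a single-line string with one space where the break was, which is exactly the 'what you see is what you get' behaviour, and also why this is a poor fit for languages that need to preserve newlines inside strings.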

programming/FavoriteLineContinuationII written at 02:13:22


