2013-08-31
Simple availability doesn't capture timing and the amount of warning
Here is a mistake that I have actually kind of made: a simple availability or 'amount of downtime' number does not fully capture your availability situation. In real life it matters a lot both when you go down and whether or not you have advance warning. To put it simply, an hour of planned downtime at 6pm is qualitatively different from an hour of unplanned downtime at 6pm (or at 11am on your busiest morning) even if they have exactly the same effect on your overall availability numbers.
(I've sometimes seen availability numbers cited as excluding planned downtimes. That strikes me as disingenuous unless it comes with very careful disclaimers and a bunch of additional information.)
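As a rough illustration of why the bare number hides this, here is a small Python sketch. The `availability` function and the outage tuples are made up for the example; the point is only that the planned flag and the time of day never enter the calculation.

```python
HOURS_PER_YEAR = 365 * 24

def availability(outages, period_hours=HOURS_PER_YEAR):
    """outages is a list of (hours, planned, when) tuples.

    Only the hours are used; whether the outage was planned and when
    it happened are ignored, which is exactly the problem.
    """
    down = sum(hours for hours, _planned, _when in outages)
    return 1 - down / period_hours

planned_evening = availability([(1, True, "6pm")])
unplanned_peak = availability([(1, False, "11am on the busiest morning")])

# Both outages produce exactly the same availability figure.
assert planned_evening == unplanned_peak
print(f"{planned_evening:.4%}")
```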
Of course it's better to not have the downtime at all, but if you're going to have it, it's generally quite worthwhile to transform an unplanned downtime into a planned one (often even if the planned downtime is longer). There is a surprising amount of technology that effectively exists to do this conversion; for example, any non-hotswappable form of redundancy.
(If you have some form of redundancy that you can't hotswap and one half of it breaks (so now you have no redundancy), you're going to have to eventually take things down to restore the redundancy. This shifts the unplanned downtime of losing your only whatever-it-is to the planned downtime of replacing one.)
Sidebar: UPSes in this view
If you have a perfect UPS and no source of alternate or additional power (a redundant power supply, a transfer switch, etc), you're likely converting unplanned power failures into planned UPS battery replacements. In real life UPSes have been known to cause problems and it's usually not that difficult to have power redundancy. Overall a good setup probably simply decreases the chances of unplanned downtimes.
(Our UPSes exist not to prevent unplanned downtimes from power loss but to hopefully prevent unplanned downtimes from ZFS pool corruption due to power loss. This gives me an odd perspective on UPS issues.)
HTML quoting as I currently understand it
Since I was just doing some work with DWiki where I needed to refresh my memory of this, I want to write down what I know, remember, and have worked out before I forget it again. First off there are effectively three areas where you (or at least I) want to quote and escape text in HTML:
- When outputting things that are not supposed to be interpreted as HTML, such as in a form <textarea> or just in any situation where they are supposed to be plain text even if a user gave you funny characters.
- When embedding things into attribute values, such as the initial or current value of form <input> elements.
- In the special sub-case of putting a URL into a link (where you embed it as the href attribute value).
The first two cases must use HTML character entities to escape a number of dangerous characters. In theory which characters you need to escape varies by context; in practice you might as well have a single function that escapes the union of what you need because over-escaping things doesn't hurt (the browser will happily convert everything back). My current belief is that the maximal escaping is to encode &, <, >, ', and " as character entities.
(DWiki effectively has two HTML escaping functions. One is a minimal one for large scale use in rendered DWikiText (where excessive escaping bulks up the HTML and makes it look bad) and the other one is a maximal one for small-scale use in other contexts.)
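A minimal sketch of the maximal escaping in Python, assuming the standard library's `html.escape` is acceptable (with `quote=True` it covers the same five characters). The name `escape_max` is invented for illustration and is not DWiki's actual function.

```python
import html

def escape_max(text):
    """Escape &, <, >, " and ' as HTML character entities."""
    # quote=True adds " and ' to the default set of &, < and >.
    return html.escape(text, quote=True)

sample = """<a href="x">Tom & Jerry's</a>"""
escaped = escape_max(sample)
print(escaped)

# Over-escaping is harmless because entities decode back to the
# original text; html.unescape stands in for the browser here.
assert html.unescape(escaped) == sample
```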
Escaping URLs is complicated because it depends on how much escaping you can assume has been applied to the URL before it was handed to you, and that is effectively a social question. In general use I assume that the URL I've been handed is in a shape where it could be pasted into a browser's location bar and work, which means that it has been %-encoded to some degree and any remaining characters with special meaning in URLs (like ?, &, =, +, and #) are supposed to be there. At that point I want to entity-encode & and %-encode ", ', and > (the latter to be friendly).
(The full list of things you must or should %-escape in URLs is much longer. If you are neurotic it includes things like ~. & must be entity-encoded instead of %-encoded because %-encoding it would remove its special meaning in URLs.)
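The href escaping described above could look something like this in Python. This is a sketch under the same assumption as the text, namely that the URL already works when pasted into a location bar; `quote_href` is a hypothetical name. Existing % characters are deliberately left alone so that nothing gets double-encoded.

```python
def quote_href(url):
    """Prepare an already mostly %-encoded URL for use as an href value."""
    # %-encode the characters that could end the attribute value or the tag.
    for ch, enc in (('"', '%22'), ("'", '%27'), ('>', '%3E')):
        url = url.replace(ch, enc)
    # & keeps its meaning inside the URL, so it is entity-encoded instead.
    return url.replace('&', '&amp;')

url = 'http://example.com/search?q=a+b&lang=en#top'
print('<a href="%s">link</a>' % quote_href(url))
```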
A URL should not be subject to this encoding until you are actually embedding it in a link. If you have a form field where people enter and re-enter a URL (for example a 'what is your website?' field in a comment form) you want to do HTML entity (form) encoding on it. The reason is that HTML entity encoding is reversible in forms; if you entity-encode something, put it in a form, and then the form is resubmitted you will get back exactly what you originally encoded. If you %-encode something this does not happen.
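The difference in round-tripping can be simulated in Python. Here `round_trip` is a made-up stand-in for what a browser does with a form field's value (it decodes character entities and then submits the resulting text unchanged), and the example URL is hypothetical.

```python
import html
from urllib.parse import quote

def round_trip(value_attribute):
    # Stand-in for the browser: entities in the value are decoded,
    # but %-escapes are ordinary text and come back as they are.
    return html.unescape(value_attribute)

entered = 'http://example.com/a b?q="x"'

# Entity-encode only: the resubmitted form has the original text.
via_entities = round_trip(html.escape(entered, quote=True))
assert via_entities == entered

# %-encode first: the user's text has been permanently changed.
percent_encoded = quote(entered, safe=":/?=")
via_percent = round_trip(html.escape(percent_encoded, quote=True))
assert via_percent != entered
print(via_percent)
```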
(If you are showing a URL as plain text I think it depends on where the URL comes from and what use you expect people to make of it. If you are just showing a user-entered URL to them I would entity-encode it so that the browser shows it to them exactly as they entered it. If you expect them to copy it and actually use it, %-encode things.)
Sidebar: But what if people give you URL paths with funny characters?
If you have to worry about things like a % or a ? appearing in the URL path (where it should be %-escaped so that it isn't interpreted as separating the query parameters from the path) my opinion is that you need an API that clearly separates the components of a URL and leaves it to you to glue them back together. At this point you can %-encode away to make sure that the browser interprets everything exactly right.
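As one possible shape for such an API, here is a sketch using Python's `urllib.parse`. The `build_url` function and its parameters are assumptions made for this example; the idea is only that each component arrives separately and is %-encoded before being glued together.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_url(scheme, host, path_segments, query=None, fragment=""):
    """Glue separately supplied URL components into one URL."""
    # safe="" means /, ? and % inside a path segment all get %-encoded,
    # so none of them can be mistaken for URL structure.
    path = "/" + "/".join(quote(seg, safe="") for seg in path_segments)
    return urlunsplit((scheme, host, path, urlencode(query or {}), fragment))

print(build_url("http", "example.com", ["docs", "50% off?.txt"], {"v": "1"}))
```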
If you get the URL as a single blob, the only sane way to go is to assume that it is basically correctly formatted apart from some stray characters that you may need to quote mostly for convenience. Doing anything else requires heuristics and guesswork.