2011-03-27
Some thinking on proliferating web standards
In a comment on an earlier entry, nothings noted:
One concern I have with web standards, which I first posted about in 2004, is the fact that we keep getting new ones.
(See his comment there for the elaboration, and his other comment here has a pointer to his longer writing on this topic, with some context.)
In thinking about where web standards come from, I see four broad explanations for them. So here is my outsider's two cents:
First, some standards get created because they cover genuinely new things on the web. My example of this (because I am relatively familiar with the area) is syndication feeds; these are a real new capability, something that you couldn't do on the web before them. (People tried and it didn't work too well.)
Next, some standards get created because previous standards were incomplete (or at least are perceived as being incomplete), lacking necessary or at least useful features. The other way to put this is that new versions of standards get made because now people are willing to standardize more. My impression is that this is what has been going on with CSS; every generation of CSS adds more layout options, many of which have already been tried out in various browsers.
Then we have the standards that get created because people feel that a previous standard took the wrong approach and needs to be 'corrected'. A lot of people will point to XHTML as exhibit one for this, given that its goal could be summarized as 'HTML 4.01 but with XML correctness'.
Finally, there is the case where a previous standard was ambiguous (or in need of clarification because it proved open to interpretation in practice) and a new version is needed to fix this. My impression is that this was especially common for the early web standards (certainly this was the fate of many of the early syndication standards); later ones were usually written more carefully and with more precision.
(Of course, actual standards can be a blend of these reasons.)
Out of these different causes of standards, the only one that really makes me grumpy is the third cause. The fourth cause is useful, the first cause generally only matters if people actually turn out to care about the new thing, and the worth of the second cause is rather personal; it depends on whether you like the new features being added or think that the original standard is good enough as it is and doesn't need new bling.
(On a side note, I don't think that the new standards slow down content creation in general by making people who write HTML spend all of their time learning new things. It is still perfectly possible to write 1998 era HTML and have it work just as well as it did in 1998, and people certainly do. To the extent that professionals have to spend more time to learn more stuff to be employable, my cynical view is that they consider it desirable because it's a higher and higher barrier to entry.)
2011-03-24
XHTML and web authoring folklore
From a comment on an earlier entry:
What's surprising is not the failure of XHTML. Rather it's the enthusiasm for cargo-cult XHTML that persists to this day: HTML files with the XHTML 1.0 DTD decl and superficial XHTML features like <br /> tags, but served without the XHTML mime type that would actually make them XHTML.
I'm not surprised in the least. In fact, this is a great illustration of the web authoring folklore that I talked about in that entry.
The thing to understand about the web is that people do not author content to standards. I don't mean that in the sense that they reject standards; I mean that in the sense that almost no web authors know what the standards actually require, because they didn't learn how to write HTML by reading the standard or authoritative descriptions of it. Instead people learn through what could be called a folk process, where information passes from person to person in various ways.
(There are many reasons why this happened, but I suspect that one is that there have been so many web standards that people have needed some sort of trusted guide to sort out which one to pay attention to. Clearly the standards organizations themselves cannot play this role.)
In this environment, very few people really know about XHTML and how to use it. Instead most people know a cloud of folklore surrounding XHTML; they know that it is the right thing to author content in and they know some of the markers of 'writing XHTML', like using <br />, so they use these on their pages. These people are not cynical or idiots; instead they are well intentioned but undereducated. In some sense they are the inevitable result of repeated proselytization that writing to web standards is important.
In fact they may not even know that <br /> is XHTML; they may just have vaguely heard or vaguely remember that it is the right way to write singleton tags these days. You might laugh, but this used to describe me.
The uncomfortable truth is that writing to standards requires an expert, and not all that many people are interested in becoming experts on HTML, especially when the best practices for HTML keep changing.
(Someone is about to say 'just author with standard conforming software'. Great. How do you know that the software is really standards conforming? How do you know enough to even ask that question? This is how we get people confidently using software <X> and putting 'valid XHTML' badges on their web pages when their pages are neither XHTML nor valid. I'm sure you can fill in some values for <X> here.)
2011-03-21
Why XHTML was doomed from the beginning
Yesterday I said that XHTML was an obviously flawed standard right from the start. I should actually clarify that, and then explain it. First, XHTML is probably not flawed from a strictly technical perspective (I don't have the expertise to be sure). Instead, it has always been flawed on the larger, social level of how nominal standards turn into real standards that people actually use.
(Ie, XHTML doesn't solve the real problem.)
The short version is that XHTML was doomed from the start because it ignored the reality of how actual people create actual web pages on the Internet. We can see this because of three mistakes it made: XHTML requires draconian error handling (in a decision inherited from XML), but you have to serve XHTML with a specific MIME type in order to trigger this (cf), and XHTML is renderable as HTML (and it looks basically right). The first decision is the fatal blow; the other two just add a lot of salt to the self-inflicted wound.
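To make the MIME type point concrete, here is a hypothetical sketch (the page and the function names are mine, made up for illustration): the same bytes are 'real' XHTML or ordinary forgiving-parsed HTML depending only on the Content-Type header they are served with, not on anything in the document itself.

```python
# Hypothetical sketch: the same bytes count as XHTML or as plain HTML
# purely based on the Content-Type header they are served with.

# A minimal page with all the surface markers of XHTML: the XML
# declaration, the XHTML 1.0 DOCTYPE, the xmlns, and <br />.
XHTML_PAGE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>demo</title></head>
<body><p>hello<br />world</p></body>
</html>"""

def app(environ, start_response, as_xhtml=False):
    """A WSGI-style application serving the page above.

    Served as text/html (the default here), browsers parse the page as
    ordinary forgiving HTML and all of the XHTML markers are cosmetic;
    this is the 'cargo-cult XHTML' case. Only application/xhtml+xml
    makes a browser use a real XML parser with draconian error handling.
    """
    ctype = "application/xhtml+xml" if as_xhtml else "text/html"
    start_response("200 OK", [("Content-Type", ctype)])
    return [XHTML_PAGE]
```

Since nothing in the page itself controls which parser the browser uses, an author can decorate their markup with every XHTML trapping and still never once have it treated as XHTML.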
The core issue is that actual people create web pages not by reading standards but instead by a process that most closely resembles folklore. They vaguely know how things are supposed to work, they may peek with web searches or with 'view source', and they pretty much stop when the result renders the way they want or expect. People who even know what a validator is are a very small portion of the web authoring world, and always have been.
Draconian error handling is always going to be a massive failure in this environment, because people following this approach will always make lots of 'errors'. This leads to a terrible authoring experience where the typical person spends much of their time trying to get their browser to stop showing them a big yellow 'invalid XML' box, instead of doing interesting things like improving their content or how things look. People aren't exactly enthused about jumping through hoops to appease the computer instead of doing productive things.
(The other two decisions add salt because they cause lots of people to create invalid XHTML when they think they are actually authoring valid XHTML. This is an even worse failure mode than before in the long term.)
Any format with draconian error handling pretty much can't be written by hand, since people make mistakes; it has to be written by program, and ideally only a few programs that are carefully developed. Web page authoring has never worked this way, so XHTML's insistence on draconian error handling required a fantasy world where web authors would drastically change how they created web pages when they moved from HTML to XHTML.
(This omits all of the other pragmatic problems with draconian error handling, like programs with bugs.)
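The gap between draconian and forgiving handling can be seen without a browser. As a small illustration using only the Python standard library, a strict XML parser rejects an entire document over one unclosed tag, while an HTML parser shrugs and recovers the content:

```python
# Contrast draconian XML parsing with forgiving HTML parsing, using
# only the Python standard library.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An unclosed <br> is routine in hand-written HTML but is a fatal
# well-formedness error in XML (and thus in real XHTML).
page = "<p>hello<br>world</p>"

# The XML parser rejects the entire document at the first error,
# which is what a browser must do for application/xhtml+xml pages.
try:
    ET.fromstring(page)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# The HTML parser keeps going and recovers all of the text.
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

collector = TextCollector()
collector.feed(page)

print(xml_ok)                   # False: the XML parser gave up entirely
print("".join(collector.text))  # "helloworld": HTML parsing recovered it
```

One stray tag costs the XML side the whole page; this is the 'big yellow box' experience in miniature, and it is why hand-authoring real XHTML is so unforgiving.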
2011-03-20
The devil's advocate argument against paying attention to web standards
The IE team famously doesn't pay too much attention to web standards like, oh, XHTML (to give one prominent example). As it happens, I think that there's a decent devil's advocate argument for this position (as I've mentioned in passing).
(At this point, I would like to make it utterly plain that I don't agree with what IE has done in terms of standards. That's why this is a devil's advocate position.)
The simple way to put it is that many web standards are not great standards. Exhibit one for this is XHTML itself, which was an unrealistic boondoggle even before Microsoft refused to implement it, but XHTML at least has the advantage of being an obviously flawed standard. Other traditional problems with W3C standards are less obvious but no less problematic; especially with complex or abstruse standards, they may well not be clear, complete, or free of contradictions, and very few standards have a test suite.
Implementing a complex, potentially flawed standard with no test suite is a recipe for problems. The first rule is that anyone can make mistakes, which means that your programmers are perfectly capable of misinterpreting the standard (or coming to a different interpretation than what other people think is the obvious one). With no test suite, you have no good way of discovering this divergence from proper behavior until people start yelling at you for being 'broken'. The larger or more complex the standard, the higher the chances of this happening to you at some point. Potential flaws in the standard compound the issue because now you get to play the 'is this our bug or the standard's bug' game (and winning this game requires you to become a deep expert on the standard).
(Effectively you have no good process for finding bugs in your implementation of the standard, and this guarantees that there will be such bugs.)
If a web standard is an invented standard, dreamed up by a bunch of people in a back room somewhere, it's at least potentially flawed in all of these ways. Not implementing the 'standard' means that you don't have to deal with any of this, and that you're not wasting your limited time developing something that people won't care about in practice.
Hence, if you think that web standards are generally flawed on a mechanical level and not especially necessary or useful on a general level, more or less ignoring them until the level of outrage rises high enough is a perfectly sane reaction. The sensible thing to do then is to cherry-pick the clear, easy to implement, and useful pieces of each standard without necessarily trying to implement the whole thing. This gets you the useful bits at relatively low effort, or at least makes sure that you don't waste too much time coding things that people aren't going to use.