HTMLAndSGML written at 15:33:28
HTML is not an SGML dialect and never really has been
There is a persistent story that makes the rounds in the web specification world (for example, in this otherwise realistic article on XHTML): that HTML is an SGML dialect, but web browsers persistently mishandle and mis-parse certain SGML features such as minimization. Although I have pandered to this belief before, it is false in practice and in reality.
HTML is really a documentation standard; the standard followed existing practice rather than preceding it. In the very beginning, people just created browsers and a vague format that the browsers understood. This format was inspired by SGML, but it was never an SGML dialect, and as such it never had various obscure SGML features. At some point, when people in the W3C were writing down the HTML standard of the time (or perhaps evolving it), they decided to 'fix' this obvious omission by writing into the new version of the HTML specification that it was an SGML dialect.
You can guess what happened next. All of the browsers of the time promptly ignored this new bit of the standard, and pretty much every browser written since then has as well; none of them has ever parsed HTML as SGML, with support for all of the odd little SGML features that this implies. HTML may be an SGML dialect as far as the W3C standards and their validator are concerned, but it is not one in real life, and anyone who writes HTML believing otherwise is going to have problems.
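To make the gap concrete, here is a minimal sketch using Python's standard html.parser module. It is not a browser, only a stand-in for a parser written to match how HTML is handled in practice. As I understand the SGML rules, with SHORTTAG minimization enabled the markup '<br />' would be read as a 'br' start tag followed by a literal '>' character. The EventLog class and parse_events function are names I made up for this illustration.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record parse events so the treatment of '<br />' is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup):
    """Parse markup and return the list of (kind, value) events."""
    parser = EventLog()
    parser.feed(markup)
    parser.close()
    return parser.events


events = parse_events("a<br />b")
# Collect only the character data that the parser reported.
text = "".join(value for kind, value in events if kind == "data")
print(events)
print(repr(text))
```

If the parser followed the SGML reading, a stray '>' would show up in the collected text. Instead, html.parser treats '<br />' as one self-contained tag and reports no extra character data.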
As you might expect, HTML5 firmly drives a stake through this particular issue; the current spec draft says explicitly (emphasis mine):
Perhaps someday all of the common HTML validators will be updated to understand HTML as it really is.
* * *