== HTML is not an SGML dialect and never really has been
There is a persistent story that makes the rounds in the web
specification world (for example, [[in this otherwise realistic article
on XHTML http://www.webdevout.net/articles/beware-of-xhtml]]) that HTML
is an SGML dialect but that web browsers persistently mishandle and
mis-parse certain SGML features such as [[minimization ShortTagsMeanings]].
Although I have pandered to this belief [[before ShortTagsMeanings]],
~~it is false in practice and in reality~~.
HTML is really a [[documentation standard ../tech/WaysToStandards]];
the standard followed existing practice rather than preceding it. In
the very beginning, people just created browsers and a vague format
that the browsers understood. This format was inspired by SGML, but it
was never an SGML dialect, and so it never had various obscure SGML
features. At some point, when people at the IETF and later the W3C were
writing down the HTML standard of the time (or perhaps evolving it), they
decided to 'fix' this obvious omission by writing into the new version of
the HTML specification that it was an SGML dialect.
(Looking at [[the historical specifications via Wikipedia
http://en.wikipedia.org/wiki/HTML]], this appears to go as far
back as [[HTML 2.0 http://tools.ietf.org/html/rfc1866]].)
You can guess what happened next. All of the browsers of the time
promptly ignored this new bit of the standard, and pretty much every
browser written since then has done the same; none of them has ever
parsed HTML as SGML, supporting all of the odd little SGML features that
this implies. HTML may be an SGML dialect as far as the W3C standards and
their validator are concerned, but it is not one in real life, and anyone
who writes HTML believing otherwise is going to have problems.
As you might expect, HTML5 firmly drives a stake through this particular
idea; [[the current spec draft
http://dev.w3.org/html5/spec/infrastructure.html]] says explicitly
(emphasis mine):
> For compatibility with existing content and prior specifications,
> this specification describes two authoring formats: one based on XML
> (referred to as the XHTML syntax), and one using ~~a custom format
> inspired by SGML~~ (referred to as the HTML syntax).
Perhaps someday all of the common HTML validators will be updated to
understand HTML as it really is.