Wandering Thoughts archives

2009-01-23

The HTML tax (in Python, and in general)

The HTML tax is my name for all of those bits of verbosity that you have to include when you write straight HTML, as opposed to something more compact. What do I mean by that? Well, consider all of the things that you need for a well-formed, standards-compliant basic HTML web page these days.

A relatively minimal page needs a doctype, a <head> section with a <title> and ideally a <meta> declaration for the charset, and then the boilerplate of the <body> tag. If sensibly formatted (by my standards) that is at least six lines before I get to do anything more interesting than pass in the title text. After that comes a steady drizzle of closing tags in the actual content, most of which are just a distraction from what actually matters.
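To make the tax concrete, here is a sketch of about the smallest page that fits the description above, embedded in Python the obvious way. The hello_page() name, the short HTML5-style doctype, and the UTF-8 charset are assumptions for illustration; an older, longer doctype only makes things worse.

```python
def hello_page():
    """Return one complete, minimal HTML page as a string."""
    # Everything down to and including <body> is boilerplate; the only
    # page-specific piece in those seven lines is the title text.
    return """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Hello</title>
</head>
<body>
<p>Hello, world.</p>
</body>
</html>
"""

print(hello_page())
```

The one line of actual content is outnumbered nine to one, and the flush-left HTML sits awkwardly inside the indented function body.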

Does the formatting of the HTML matter? Yes, absolutely; because this is directly embedded in your source code, it needs to be readable just like the rest of your source code. One of my issues with the HTML tax in Python specifically is that I think that sensible HTML formatting does not look very much like natural Python code formatting, so you have an appearance clash in your code; things change abruptly from style to style, interrupting the visual flow of your source. (Well, of my source.)

This isn't the end of it, because six lines is too much verbosity to inline every time you want to produce a different HTML page. So you rapidly start coming up with some way to pass around strings to be put together with these six or so lines to make up your full error pages or results pages. I think that this inevitably winds up being templating via functions: you call error("Thing A went wrong"); error() wraps its string argument in your standard error blurb and calls stdpage(bodyconts); stdpage() wraps its string argument in your standard page boilerplate; and a large blob of HTML eventually comes out the end.
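A minimal sketch of that chain might look like the following. The error() and stdpage() names come from the description above; the extra title argument to stdpage(), the exact wording of the error blurb, and the use of html.escape() (Python 3.2 or later) for quoting are my assumptions.

```python
from html import escape

def stdpage(title, bodyconts):
    """Wrap already-formed body HTML in the standard page boilerplate."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "<title>%s</title>\n"
        "</head>\n<body>\n%s</body>\n</html>\n"
    ) % (escape(title), bodyconts)

def error(msg):
    """Wrap a plain-text message in the standard error blurb."""
    # The message is quoted here; stdpage() trusts its body argument.
    blurb = "<h1>Error</h1>\n<p>%s</p>\n" % escape(msg)
    return stdpage("Error", blurb)

print(error("Thing A went wrong"))
```

Notice that anything more specific than a single sentence has no natural place to go; the caller gets exactly one string-shaped hole to fill.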

Templating via functions is not bad, exactly (as long as you keep it simple enough), but the problem is that the HTML tax (and all of the structure that you've built to get around it) serves as a strong disincentive to deviate from the canned structure, even when it would make for better messages to the user. In other words, once you have a canned error routine, everything becomes a canned error. By contrast, by reducing the tax overhead, simple HTML generation systems encourage creating specialized HTML pages when you could use them, instead of relying on generic ones with some blanks filled in.

(To rephrase a previous entry, templating systems don't solve this problem because they (generally) don't make it much easier to create a new specialized 'template' than writing straight HTML.)

TheHTMLTax written at 01:51:51

2009-01-15

Why templating systems are the wrong answer for simple HTML generation

One common proposed solution to the problem of simple HTML generation is a templating system. Apart from any practical issues (all of which can be worked around), I've come to realize that I believe that they're the wrong answer in general, because templating systems are solving a different problem.

At least for me, the problem simple HTML generation solves is that HTML is annoyingly verbose and picky to write directly by hand (and partly as a result is difficult to cleanly embed into Python code). This is the same problem that is solved by using simple markup, and it's such a popular thing to do for the same reason that high level languages are popular: you want to have a compact and clear way of expressing your intentions, one that's quick to write and is not cluttered up with unimportant details.

The problem that templating systems solve is that if you try to generate complex pages in string-bashing code, you wind up with a horrible entangled mishmash of code and HTML that is effectively unreadable; you can't easily see either the shape of the HTML that will come out the end or what the code is doing. This is an important problem, but it is a different problem than not having to write HTML by hand; a solution to one is not necessarily a solution to the other.
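As a small illustration of the string-bashing style, consider this sketch. The results_page() function and its (title, url) pairs are made up for the example, and it assumes html.escape() from Python 3.2 or later.

```python
from html import escape

def results_page(query, hits):
    """Build the body of a search results page by string concatenation."""
    out = "<h1>Results for " + escape(query) + "</h1>\n"
    if hits:
        out += "<ul>\n"
        for title, url in hits:
            # Code and markup are interleaved line by line.
            out += '<li><a href="' + escape(url) + '">'
            out += escape(title) + "</a></li>\n"
        out += "</ul>\n"
    else:
        out += "<p>Nothing matched.</p>\n"
    return out

print(results_page("qwerty", [("A & B", "/a?b=1&c=2")]))
```

Even at this size you have to read carefully to see that the output is a heading followed by either a list or a paragraph, and every additional conditional makes that harder.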

(Some templating systems also try to solve the 'HTML is annoying to write' problem, but often they do not. In fact, one popular model for templating systems is HTML plus additional markup, which is exactly the reverse of what you want for simple HTML generation.)

TemplatingVsSimpleHTML written at 02:40:35

2009-01-12

A surprising lack in Python's standard library

Here's something that I am surprised is not already in the Python standard library: a simple module to generate and assemble HTML fragments (or if you prefer, general XML fragments), up to and including full HTML pages.

I find it especially surprising because not only is this something that a lot of people wind up needing to do sooner or later (consider generating error pages from inside a simple CGI program), but there's even a very common simple model for it that everyone seems to write their own version of:

# 'html' here is a hypothetical builder module, not a standard library one.
from html import *
page = HTML(HEAD(TITLE("Qwerty")),
            BODY(H1("Qwerty's page"),
                 P("Some text goes here.")))
print(page)

(Details often vary considerably beyond the basic idea of nested objects with optional keyword arguments as the properties of the element.)

Now, you can argue that this is so simple to implement from scratch that it doesn't need to be in the standard library, but I have two replies. First, doing a good implementation is more work than it looks (consider quoting of bare strings and the Unicode issues, for example), and second, if lots of people are going to do it themselves, it makes sense to save everyone the effort and put a single quality implementation in the standard library.
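To show both how little code the basic idea takes and where the extra work hides, here is a minimal sketch of such a builder. The Element class and the tag() factory are hypothetical names of my own, not anything in the standard library, and the sketch assumes html.escape() from Python 3.2 or later. It quotes bare strings and attribute values but deliberately ignores void elements like <br>, attributes such as class that collide with Python keywords, and any control over output encoding.

```python
from html import escape

class Element:
    """One HTML element; children are Elements or bare strings."""

    def __init__(self, tagname, *children, **attrs):
        self.tagname = tagname
        self.children = children
        self.attrs = attrs

    def __str__(self):
        # Attribute values are quoted, including double quotes.
        attrs = "".join(
            ' %s="%s"' % (key, escape(str(val)))
            for key, val in self.attrs.items()
        )
        # Nested Elements render themselves; bare strings are quoted.
        inner = "".join(
            str(c) if isinstance(c, Element) else escape(str(c), quote=False)
            for c in self.children
        )
        return "<%s%s>%s</%s>" % (self.tagname, attrs, inner, self.tagname)

def tag(tagname):
    """Return a function that builds Elements with the given tag name."""
    def build(*children, **attrs):
        return Element(tagname, *children, **attrs)
    return build

HTML, HEAD, TITLE, BODY, H1, P, A = (
    tag(n) for n in ("html", "head", "title", "body", "h1", "p", "a")
)

page = HTML(HEAD(TITLE("Qwerty")),
            BODY(H1("Qwerty's page"),
                 P("Some text goes here.")))
print(page)
```

Because each call just returns an object, fragments such as P("...") can be built in any order and assembled into a page later.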

Possibly the Python people consider this a terrible pattern to follow once you start getting beyond the very basics and think that there is a much superior interface. If so, it would be nice if an implementation of the better version was in the standard library; as it is, as far as I can tell there's absolutely nothing that will do this at all, although there are a number of things that will parse HTML and XML.

(It is possible that a simple HTML builder is hiding somewhere in the depths of the standard XML modules. In my defense, I will note that it's not obvious from skimming the library documentation and XML is not where I naturally look for 'simple'.)

Sidebar: pointers to some relevant resources

  • HTML::EasyTags, one of the Perl versions of the same basic idea. Perl's implementation probably came first. (HTML::LoL is another interesting take on the overall idea.)

  • XIST, which among many other features can do this. But it looks like a very big package, which brings up certain issues if I install it myself.

  • markup.py is a single Python file and so easy to drop into a project, but it seems to not be intended to let you generate HTML fragments; instead you have to generate all of the HTML page in the proper sequence, which I find confining.

And hopefully I have missed something obvious in the standard library; if so, and if I find out about it, I'll add a note here.

SimpleHTMLCreationLack written at 22:23:39

2009-01-03

How to help programmers (parts 2 and 3): os.environ and sys.argv

As it happens, the os.listdir() problem is just the tip of the iceberg of Python 3's Unix problems. Here are two more ways that it helps Unix programmers, from the release notes:

Some system APIs like os.environ and sys.argv can also present problems when the bytes made available by the system is not interpretable using the default encoding. Setting the LANG variable and rerunning the program is probably the best approach.

Since the release notes are not explicit, let me fill them in with what happens in each case.

If you have environment variables with un-decodable contents, Python 3 will pretend that they don't exist (and in fact they don't as far as it is concerned; they never made it into the os.environ data structure). This is worse than the os.listdir() case, because there is no way to work around it in your Python program; the behavior is hard-coded into the C source of the posix module. The only good news is that Python 3 doesn't remove these environment variables from the environment it passes to programs it executes via things like os.system() and os.popen().

For sys.argv, any un-decodable command line arguments (such as oddly encoded filenames) cause your Python program to abort with a message like 'Could not convert argument 2 to string'. This happens whether or not you ever import the sys module, as it is hard-coded very early on in CPython's startup. For bonus points, the error message makes no attempt to identify what is producing it (it doesn't even mention that it is being produced by Python 3).

(System administrators and anyone else who deals with complex, multi-layered systems have a special sort of affection for unidentified error messages.)

As Ian Bicking noted in the comments on the os.listdir() problem, the real solution here is alternate bytes-based interfaces to both os.environ and sys.argv that (at least on Unix) would be the 'real' versions. But that would require Python 3 admitting that Unix is not all Unicode, which seems unlikely right now.
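As an illustration of what bytes-based access of this sort can look like, here is a minimal sketch. It assumes Python 3.2 or later, where os.environb, os.supports_bytes_environ and os.fsencode() exist and undecodable bytes are carried through str values with the surrogateescape error handler; none of that applies to the Python 3 release discussed above. The raw_argv() and raw_environ() helper names are hypothetical.

```python
import os
import sys

def raw_argv():
    """Return the command line arguments as bytes."""
    # os.fsencode() reverses the decoding that was applied to sys.argv,
    # turning any surrogate-escaped characters back into raw bytes.
    return [os.fsencode(arg) for arg in sys.argv]

def raw_environ():
    """Return a snapshot of the environment as a bytes-to-bytes dict."""
    if os.supports_bytes_environ:
        # On Unix, os.environb is the bytes view of the environment.
        return dict(os.environb)
    # Elsewhere, fall back to re-encoding the str version.
    return {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}

print(raw_argv())
print(len(raw_environ()), "environment variables")
```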

ArgvEnvironProblem written at 00:24:02



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.