2009-01-23
The HTML tax (in Python, and in general)
The HTML tax is my name for all of those bits of verbosity that you have to include when you write straight HTML, as opposed to something more compact. What do I mean by that? Well, consider all of the things that you need for a well formed, standards compliant basic HTML web page these days.
A relatively minimal page needs a doctype, a <head> section with
a <title> and ideally a <meta> declaration for the charset, and
then the boilerplate of the <body> tag. If sensibly formatted (by
my standards) that is at least six lines before I get to do anything
more interesting than pass in the title text. After that comes a steady
drizzle of closing tags in the actual content, most of which are just
a distraction from what actually matters.
Does the formatting of the HTML matter? Yes, absolutely; because this is directly embedded in your source code, it needs to be readable just like the rest of your source code. One of my issues with the HTML tax in Python specifically is that I think that sensible HTML formatting does not look very much like natural Python code formatting, so you have an appearance clash in your code; things change abruptly from style to style, interrupting the visual flow of your source. (Well, of my source.)
This isn't the end of it, because six lines is too much verbosity to
inline every time you want to produce a different HTML page. So you
rapidly start coming up with some way to pass around strings to be
put together with these six or so lines to make up your full error
pages or results pages. I think that this inevitably winds up being
templating via functions, where you call error("Thing A went wrong"),
error() wraps its string argument in your standard error blurb and
calls stdpage(bodyconts), stdpage() wraps its string argument in
your standard page boilerplate, and a large blob of HTML eventually
comes out the end.
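As a minimal sketch of what this templating via functions tends to look like (the names error() and stdpage() are from the description above; the title parameter, the exact boilerplate, and the HTML5-style doctype and charset declaration are assumptions made for illustration):

```python
from html import escape

def stdpage(bodyconts, title="Untitled"):
    """Wrap an already-formed HTML body fragment in the standard page boilerplate."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        "<title>%s</title>\n"
        "</head>\n"
        "<body>\n"
        "%s\n"
        "</body>\n"
        "</html>\n"
    ) % (escape(title), bodyconts)

def error(msg):
    """Wrap a plain-text message in the standard error blurb, then in a page."""
    blurb = "<h1>Error</h1>\n<p>%s</p>" % escape(msg)
    return stdpage(blurb, title="Error")

print(error("Thing A went wrong"))
```

Even in this small form the boilerplate takes up more lines than the actual message does, and every new kind of page means another wrapper function.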
Templating via functions is not bad, exactly (as long as you keep it simple enough), but the problem is that the HTML tax (and all of the structure that you've built to get around it) serves as a strong disincentive to deviate from the canned structure, even when it would make for better messages to the user. In other words, once you have a canned error routine, everything becomes a canned error. By contrast, by reducing the tax overhead, simple HTML generation systems encourage creating specialized HTML pages when you could use them, instead of relying on generic ones with some blanks filled in.
(To rephrase a previous entry, templating systems don't solve this problem because they (generally) don't make it much easier to create a new specialized 'template' than writing straight HTML.)
2009-01-15
Why templating systems are the wrong answer for simple HTML generation
One common proposed solution to the problem of simple HTML generation is a templating system. Apart from any practical issues (all of which can be worked around), I've come to realize that I believe that they're the wrong answer in general, because templating systems are solving a different problem.
At least for me, the problem simple HTML generation solves is that HTML is annoyingly verbose and picky to write directly by hand (and partly as a result is difficult to cleanly embed into Python code). This is the same problem that is solved by using simple markup, and it's such a popular thing to do for the same reason that high level languages are popular: you want to have a compact and clear way of expressing your intentions, one that's quick to write and is not cluttered up with unimportant details.
The problem that templating systems solve is that if you try to generate complex pages in string-bashing code, you wind up with a horrible entangled mishmash of code and HTML that is effectively unreadable; you can't easily see either the shape of the HTML that will come out the end or what the code is doing. This is an important problem, but it is a different problem than not having to write HTML by hand; a solution to one is not necessarily a solution to the other.
(Some templating systems also try to solve the 'HTML is annoying to write' problem, but often they do not. In fact, one popular model for templating systems is HTML plus additional markup, which is exactly the reverse of what you want for simple HTML generation.)
2009-01-12
A surprising lack in Python's standard library
Here's something that I am surprised is not already in the Python standard library: a simple module to generate and assemble HTML fragments (or if you prefer, general XML fragments), up to and including full HTML pages.
I find it especially surprising because not only is this something that a lot of people wind up needing to do sooner or later (consider generating error pages from inside a simple CGI program), but there's even a very common simple model for it that everyone seems to write their own version of:
from html import *
page = HTML(HEAD(TITLE("Qwerty")),
            BODY(H1("Qwerty's page"),
                 P("Some text goes here.")))
print page
(Details often vary considerably beyond the basic idea of nested objects with optional keyword arguments as the properties of the element.)
Now, you can argue that this is so simple to implement from scratch that it doesn't need to be in the standard library, but I have two replies. First, doing a good implementation is more work than it looks (consider quoting of bare strings and the Unicode issues, for example), and second, if lots of people are going to do it themselves, it makes sense to save everyone the effort and put a single quality implementation in the standard library.
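To make the 'more work than it looks' point concrete, here is a rough sketch of the nested-objects model with quoting of bare strings. Everything in it is an assumption for illustration rather than any existing module: the Element class, the make_tag() helper, and the convention that a trailing underscore is stripped from keyword names (so that class_ can stand in for class). It uses Python 3's html.escape() and does not deal with void elements such as <br>, doctypes, or encodings at all.

```python
from html import escape

class Element:
    """One HTML element: a tag name, keyword attributes, and child content."""

    def __init__(self, tag, *children, **attrs):
        self.tag = tag
        self.children = children
        self.attrs = attrs

    def __str__(self):
        # Attribute values are always quoted, including any '"' inside them.
        attrs = "".join(
            ' %s="%s"' % (name.rstrip("_"), escape(str(value), quote=True))
            for name, value in self.attrs.items()
        )
        # Nested elements render themselves; bare strings get escaped.
        inner = "".join(
            str(c) if isinstance(c, Element) else escape(str(c), quote=False)
            for c in self.children
        )
        return "<%s%s>%s</%s>" % (self.tag, attrs, inner, self.tag)

def make_tag(tag):
    """Return a function that builds Element objects for one tag name."""
    def build(*children, **attrs):
        return Element(tag, *children, **attrs)
    return build

HTML, HEAD, TITLE, BODY, H1, P, A = (
    make_tag(t) for t in ("html", "head", "title", "body", "h1", "p", "a")
)

page = HTML(HEAD(TITLE("Qwerty")),
            BODY(H1("Qwerty's page"),
                 P("Some text & <more> goes here.", class_="intro")))
print(page)
```

Because every element is an object until it is turned into a string, fragments can be built separately and assembled in any order, which is the main attraction of this model.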
Possibly the Python people consider this a terrible pattern to follow once you start getting beyond the very basics and think that there is a much superior interface. If so, it would be nice if an implementation of the better version was in the standard library; as it is, as far as I can tell there's absolutely nothing that will do this at all, although there are a number of things that will parse HTML and XML.
(It is possible that a simple HTML builder is hiding somewhere in the depths of the standard XML modules. In my defense, I will note that it's not obvious from skimming the library documentation and XML is not where I naturally look for 'simple'.)
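One candidate from the XML side is xml.etree.ElementTree, which can build a tree of elements and serialize it. The following is only a sketch of that approach, assuming Python 3 (where tostring() accepts encoding="unicode" and method="html"); it is noticeably more verbose than the nested-call style shown earlier, and it is a general XML tree API rather than a purpose-built HTML builder.

```python
import xml.etree.ElementTree as ET

# Build the tree one element at a time.
html = ET.Element("html")
head = ET.SubElement(html, "head")
ET.SubElement(head, "title").text = "Qwerty"
body = ET.SubElement(html, "body")
ET.SubElement(body, "h1").text = "Qwerty's page"
ET.SubElement(body, "p").text = "Some text goes here. 1 < 2 & so on."

# Serialize as HTML; ordinary text content is escaped on output.
page = ET.tostring(html, encoding="unicode", method="html")
print(page)
```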
Sidebar: pointers to some relevant resources
- HTML::EasyTags, one of the Perl versions of the same basic idea. Perl's implementation probably came first. (HTML::LoL is another interesting take on the overall idea.)
- XIST, which among many other features can do this. But it looks like a very big package, which brings up certain issues if I install it myself.
- markup.py is a single Python file and so easy to drop into a project, but it seems to not be intended to let you generate HTML fragments; instead you have to generate all of the HTML page in the proper sequence, which I find confining.
And hopefully I have missed something obvious in the standard library; if so, and if I find out about it, I'll add a note here.
2009-01-03
How to help programmers (parts 2 and 3): os.environ and sys.argv
As it happens, the os.listdir() problem
is just the tip of the iceberg of Python 3's Unix problems. Here are two more ways that it helps Unix
programmers, from the release notes:
Some system APIs like os.environ and sys.argv can also present problems when the bytes made available by the system is not interpretable using the default encoding. Setting the LANG variable and rerunning the program is probably the best approach.
Since the release notes are not explicit, let me fill them in with what happens in each case.
If you have environment variables with un-decodable contents, Python 3
will pretend that they don't exist (and in fact they don't as far as it
is concerned; they never made it into the os.environ data structure).
This is worse than the os.listdir() case, because there is no way to
work around it in your Python program; the behavior is hard-coded into
the C source of the posix module. The only good news is that Python 3
doesn't remove these environment variables from the environment
it passes to programs it executes via things like os.system() and
os.popen().
For sys.argv, any un-decodable command line arguments (such as oddly
encoded filenames) cause your Python program to abort with a message
like 'Could not convert argument 2 to string'. This happens whether
or not you ever import the sys module, as it is hard coded very early
on in CPython's startup. For bonus points, the error message makes no
attempt to identify what is producing it (it doesn't even mention that
it is being produced by Python 3).
(System administrators and anyone else who deals with complex, multi-layered systems have a special sort of affection for unidentified error messages.)
As Ian Bicking noted in the comments on the os.listdir() problem,
the real solution here is alternate bytes-based interfaces to both
os.environ and sys.argv that (at least on Unix) would be the 'real'
versions. But that would require Python 3 admitting that Unix is not all
Unicode, which seems unlikely right now.
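As a sketch of what bytes-based access of this sort could look like: the code below assumes Python 3.2 or later, where os.environb, os.supports_bytes_environ, and os.fsencode() exist (none of them are in the Python 3 release discussed above). The helper names raw_environ() and raw_argv() are hypothetical. It also relies on those later versions decoding command line arguments in a way that os.fsencode() can reverse on Unix.

```python
import os
import sys

def raw_environ():
    """Return the environment as a dict with bytes keys and bytes values."""
    if os.supports_bytes_environ:
        # On Unix, os.environb is a bytes view of the environment.
        return dict(os.environb)
    # Elsewhere, fall back to re-encoding the str view.
    return {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}

def raw_argv():
    """Return the command line arguments re-encoded as bytes."""
    return [os.fsencode(arg) for arg in sys.argv]

if __name__ == "__main__":
    for arg in raw_argv():
        print(repr(arg))
    print(len(raw_environ()), "environment variables")
```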