A surprising lack in Python's standard library

January 12, 2009

Here's something that I am surprised is not already in the Python standard library: a simple module to generate and assemble HTML fragments (or if you prefer, general XML fragments), up to and including full HTML pages.

I find it especially surprising because not only is this something that a lot of people wind up needing to do sooner or later (consider generating error pages from inside a simple CGI program), but there's even a very common simple model for it that everyone seems to write their own version of:

from html import *
page = HTML(HEAD(TITLE("Qwerty")),
            BODY(H1("Qwerty's page"),
                 P("Some text goes here.")))
print page

(Details often vary considerably beyond the basic idea of nested objects with optional keyword arguments as the properties of the element.)

Now, you can argue that this is so simple to implement from scratch that it doesn't need to be in the standard library, but I have two replies. First, doing a good implementation is more work than it looks (consider quoting of bare strings and the Unicode issues, for example), and second, if lots of people are going to do it themselves, it makes sense to save everyone the effort and put a single quality implementation in the standard library.
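
To make the quoting point concrete, here is a minimal sketch of the pattern, assuming a modern Python 3 where html.escape is available. The names Element and make_tag are hypothetical, invented for this illustration; nothing like them exists in the standard library. Even this toy version has to decide how to escape bare strings and attribute values:

```python
# Minimal sketch of the nested-builder idea (Python 3). All names here are
# illustrative assumptions, not an existing standard library API.
from html import escape


class Element:
    """One HTML element: a tag name, attributes, and child nodes."""

    def __init__(self, tag, *children, **attrs):
        self.tag = tag
        self.children = list(children)
        self.attrs = attrs

    def __str__(self):
        # Attribute values are always escaped, including double quotes.
        # A trailing underscore lets callers write class_="x" for "class".
        attrs = "".join(
            ' %s="%s"' % (name.rstrip("_"), escape(str(value), quote=True))
            for name, value in self.attrs.items()
        )
        # Bare strings are escaped; nested Elements render themselves.
        body = "".join(
            str(child) if isinstance(child, Element)
            else escape(str(child), quote=False)
            for child in self.children
        )
        return "<%s%s>%s</%s>" % (self.tag, attrs, body, self.tag)


def make_tag(tag):
    """Return a constructor function for one tag name."""
    def build(*children, **attrs):
        return Element(tag, *children, **attrs)
    return build


HTML, HEAD, TITLE, BODY, H1, P = (
    make_tag(name) for name in ("html", "head", "title", "body", "h1", "p")
)

page = HTML(HEAD(TITLE("Qwerty")),
            BODY(H1("Qwerty's page"),
                 P("Some text goes here: 1 < 2 & 3.", class_="note")))
print(page)
```

This sketch deliberately ignores void elements such as br, character encodings, and any way to include pre-escaped markup, which is part of why a good implementation is more work than it looks.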

Possibly the Python people consider this a terrible pattern to follow once you start getting beyond the very basics and think that there is a much superior interface. If so, it would be nice if an implementation of the better version was in the standard library; as it is, as far as I can tell there's absolutely nothing that will do this at all, although there are a number of things that will parse HTML and XML.

(It is possible that a simple HTML builder is hiding somewhere in the depths of the standard XML modules. In my defense, I will note that it's not obvious from skimming the library documentation and XML is not where I naturally look for 'simple'.)

Sidebar: pointers to some relevant resources

  • HTML::EasyTags, one of the Perl versions of the same basic idea. Perl's implementation probably came first. (HTML::LoL is another interesting take on the overall idea.)

  • XIST, which among many other features can do this. But it looks like a very big package, which brings up certain issues if I install it myself.

  • markup.py is a single Python file and so easy to drop into a project, but it seems to not be intended to let you generate HTML fragments; instead you have to generate all of the HTML page in the proper sequence, which I find confining.

And hopefully I have missed something obvious in the standard library; if so, and if I find out about it, I'll add a note here.


Comments on this page:

From 76.100.174.166 at 2009-01-12 23:50:50:

ElementTree does this: http://effbot.org/zone/element-index.htm#usage

>>> import xml.etree.ElementTree as ET
>>> root = ET.Element("html")
>>> head = ET.SubElement(root, "head")
>>> from sys import stdout
>>> ET.ElementTree(root).write(stdout)
<html><head /></html>

From 76.100.174.166 at 2009-01-12 23:51:35:

(I should add that I pointed to effbot's documentation only because python.org appears to be down right now; elementtree is in the standard library as of python 2.5)

By cks at 2009-01-13 00:44:05:

This example unfortunately shows one problem with using ElementTree for this: it is not HTML-aware. The output it gives for this case is incorrect, since no tag can be self-closed like this in ordinary HTML (see ShortTagsMeanings).

Another problem is that this is too verbose. One of the goals of this sort of simple creation is to write as little as possible, and the whole dance with having to specify the tag name as an argument is too much.

(Since I don't know ElementTree, I don't know if you can put a tree inside another tree, but if you can't that would be another drawback; it would be the same problem that markup.py has.)
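
For what it's worth, both questions can be checked with a short sketch. This assumes a current Python 3 rather than the 2.5 version under discussion; there, Element.append accepts another Element, and tostring takes a method argument that selects HTML-style output:

```python
import xml.etree.ElementTree as ET

# Build a fragment on its own, with no surrounding document.
fragment = ET.Element("p")
fragment.text = "Some text goes here."

# Build the page skeleton separately.
root = ET.Element("html")
ET.SubElement(root, "head")
body = ET.SubElement(root, "body")

# An Element can be appended into another Element after the fact.
body.append(fragment)

# method="html" asks the serializer for HTML-style output instead of XML.
html_text = ET.tostring(root, encoding="unicode", method="html")
xml_text = ET.tostring(root, encoding="unicode")
print(html_text)
print(xml_text)
```

The HTML output writes the empty head as an explicit open and close tag, while the default XML output still self-closes it. The verbosity complaint is unchanged.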

From 82.152.15.113 at 2009-01-13 05:54:22:

There's also HTMLgen, which is pretty old (see this 1998 article: http://www.linuxjournal.com/article/2986).

HTML generation is generally less useful than templating systems such as Mako (http://www.makotemplates.org/) and Jinja (http://jinja.pocoo.org/2/).

By cks at 2009-01-13 08:50:59:

It's hard to evaluate HTMLgen, since the source seems to have disappeared. Personally I find this kind of a warning sign.

Templating systems may be better in theory (although it seems common that they are much more complex in practice), but none of them are in the standard library either (not even a basic one). I appreciate that the Python people don't want to force a choice on people, but in the meantime there's still nothing simple for generating HTML in the standard library and people keep reinventing the wheel on their own.

(The other potential problem with templating systems for me is that I want something that can be embedded directly in Python code, so that my programs can be completely self-contained and do not have to try to find their template files.)

From 90.129.217.3 at 2009-01-13 08:54:37:

Your example looks a bit like stan, the templating system for Nevow (http://divmod.org/trac/wiki/DivmodNevow), which works pretty well IMHO.

It's not in the stdlib, unfortunately.

From 213.253.62.66 at 2009-01-13 09:20:48:

I find it especially surprising because not only is this something that a lot of people wind up needing to do sooner or later (consider generating error pages from inside a simple CGI program)

There are patterns you can use other than the one you suggest, where the code is structured after the HTML.

For example:

HTMLBASICPAGE = """<html>
 <head>
   <title>%s</title>
 </head>
 <body bgcolor="white">
   %s
 </body>
</html>"""

def get():
    return HTMLBASICPAGE % ('My Title', 'some content')

If what you're doing gets more complicated than that, you really need an HTML-aware templating engine.
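
One caveat with the string-formatting pattern is that the substituted values go in unquoted. A hedged variation, assuming Python 3's html.escape and a get() that takes its values as parameters (both changes are mine, not part of the comment above), might look like this:

```python
from html import escape

HTMLBASICPAGE = """<html>
 <head>
   <title>%s</title>
 </head>
 <body bgcolor="white">
   <p>%s</p>
 </body>
</html>"""


def get(title, content):
    # Escape both values so that <, > and & in them cannot break the page.
    return HTMLBASICPAGE % (escape(title), escape(content))


page = get("Q & A", "Is 1 < 2?")
print(page)
```

This only works while every substituted value is plain text; as soon as one of them is itself a piece of HTML, escaping everything is wrong and escaping nothing is unsafe.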

From 98.194.24.2 at 2009-01-13 12:43:15:

>>> from lxml.html.builder import E
>>> tree = E.html(
...     E.head(
...         E.title('title')
...     ),
...     E.body()
... )
>>> from lxml import etree
>>> etree.tostring(tree, method='html')
'<html><head><title>title</title></head><body></body></html>'

How's that?

From 64.81.145.45 at 2009-01-13 13:46:33:

formencode.htmlgen is a builder based on ElementTree: http://svn.formencode.org/FormEncode/trunk/formencode/htmlgen.py

WebHelpers has something better, IMHO, that also handles quoting in a better way: https://www.knowledgetap.com/hg/webhelpers/file/419211699d60/webhelpers/html/builder.py

From 80.101.167.168 at 2009-01-23 20:29:03:

My latest version goes something like:

doc = ['html', ['head', ['title', 'foo']], ['body', ['p', 'hello world']]]
serialize(doc)
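
The serialize function is not shown in this comment. A guess at what a minimal version could look like, assuming each list is [tag, child, child, ...], that bare strings are text, and that attributes are not supported (all of these are assumptions, not the commenter's code):

```python
from html import escape


def serialize(node):
    """Render a nested list of the form [tag, child, child, ...] as markup."""
    if isinstance(node, str):
        # A bare string is text content, so escape it.
        return escape(node, quote=False)
    tag, children = node[0], node[1:]
    inner = "".join(serialize(child) for child in children)
    return "<%s>%s</%s>" % (tag, inner, tag)


doc = ['html', ['head', ['title', 'foo']], ['body', ['p', 'hello world']]]
print(serialize(doc))
```

Because the document is ordinary nested lists, fragments can be built separately and spliced together with normal list operations before serializing.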

From 146.115.40.118 at 2009-02-04 16:22:44:

This comment looks like the tool that best fits the post:

formencode.htmlgen is a builder based on ElementTree: http://svn.formencode.org/FormEncode/trunk/formencode/htmlgen.py WebHelpers has something better, IMHO, that also handles quoting in a better way: https://www.knowledgetap.com/hg/webhelpers/file/419211699d60/webhelpers/html/builder.py

Has anyone else used it before? With just a light reading, it looks like it might not easily support appending on more and more content over time.

For example, I want to do something like:

body = html.body("some stuff")
body.c += html.br
body.c += "some more stuff"
print body
"<body>some stuff<br />some more stuff</body>"

I would rather do something like body.add(html.br), but I don't see it. Am I missing anything?
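
As an illustration of the interface being asked for, here is a small standalone sketch. It is not WebHelpers code; the Tag class, its add method, and the VOID_TAGS set are all hypothetical names made up for this example, written for Python 3:

```python
from html import escape

# Tags rendered in self-closed form when they have no content (partial list).
VOID_TAGS = {"br", "hr", "img"}


class Tag:
    """Hypothetical element that can keep accepting content after creation."""

    def __init__(self, name, *content):
        self.name = name
        self.content = list(content)

    def add(self, *items):
        # Append more children; return self so calls can be chained.
        self.content.extend(items)
        return self

    def __iadd__(self, item):
        # Support "body += item" as an alternative spelling of add().
        return self.add(item)

    def __str__(self):
        if self.name in VOID_TAGS and not self.content:
            return "<%s />" % self.name
        inner = "".join(
            str(item) if isinstance(item, Tag)
            else escape(str(item), quote=False)
            for item in self.content
        )
        return "<%s>%s</%s>" % (self.name, inner, self.name)


body = Tag("body", "some stuff")
body.add(Tag("br"))
body += "some more stuff"
print(body)
```

Keeping children in a plain list until rendering time is what makes appending content over time cheap; the string is only assembled when the element is printed.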

Written on 12 January 2009.

Last modified: Mon Jan 12 22:23:39 2009