Django, the timesince template filter, and non-breaking spaces

February 3, 2016

Our Django application uses Django's templating system for more than just generating HTML pages. One of the extra things is generating the text of some plaintext email messages. This trundled along for years, and then a Django version or two ago I noticed that some of those plaintext emails had started showing up not as plain ASCII but as quoted-printable with some embedded characters that did not cut and paste well.

(One reason I noticed is that I sometimes scan through my incoming email with plain less.)

Here's an abstracted version of such an email message, with the odd bits (the durations) italicized in the original:

The following pending account request has not been handled for at least 1 week.

  • <LOGIN> for Some Person <user@somewhere>
    Sponsor: A professor
    Unhandled for 1 week, 2 days (since <date>)

In quoted-printable form the spaces in the italicized bits were =C2=A0 (well, most of them).

I will skip to the punchline: these durations were produced by the timesince template filter, and the =C2=A0 is the UTF-8 representation of a non-breaking space, U+00A0. Since either 1.5 or 1.6, the timesince filter and a couple of others have used non-breaking spaces after numbers. This change was introduced in Django issue #20246, almost certainly by a developer who was only thinking about the affected template filters being used in HTML.
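To make the =C2=A0 concrete, here is a small sketch using only the Python standard library. It is not what Django or our mail setup does internally; it just shows how a U+00A0 between a number and its unit turns into those bytes once the text is UTF-8 encoded and then quoted-printable encoded. The sample string is made up for illustration.

```python
import quopri

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# A made-up duration of the sort timesince produces: number, NBSP, unit.
duration = "1" + NBSP + "week"

# In UTF-8 the non-breaking space becomes the two bytes 0xC2 0xA0.
utf8_bytes = duration.encode("utf-8")

# Quoted-printable writes each non-ASCII byte as '=' plus two hex digits.
qp_bytes = quopri.encodestring(utf8_bytes)

# Decoding reverses both steps and gives back the original text.
round_trip = quopri.decodestring(qp_bytes).decode("utf-8")

print(utf8_bytes)
print(qp_bytes)
```

A plain ASCII space would pass through both encodings untouched, which is why these emails used to be plain ASCII.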

In HTML, this change is unobjectionable. In plain text, it does any number of problematic things. Of course there is no option to change this or to control this behavior. As the issue itself cheerfully notes, if you don't like this change or it causes problems, you get to write your own filter to reverse it. Nor is this documented (and the actual examples of timesince output in the documentation use real spaces).

Perhaps you might say that documenting this is unimportant. Wrong. In order to find out why this was happening to my email, I had to read the Django source. Why did I have to do that? Because in a complex system there are any number of places where this might have been happening and any number of potential causes. Django has both localization and automatic safe string quotation for things you insert in templates, so maybe this could have been one or both in action, not a deliberate but undocumented feature in timesince. In the absence of actual documentation to read, the code is the documentation and you get to read it.

(I admit that I started with the timesince filter code, since it did seem like the best bet.)

Is the new template filter I've now written sufficient to fix this? Right now, yes, but of course not necessarily in general in the future. Since all of this is undocumented, Django is not committed to anything here. It could decide to change how it generates non-breaking spaces, switch to some other Unicode character for this purpose, or whatever. Since this would be a change to undocumented behavior, Django wouldn't even have to say anything in the release notes.
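For illustration, a filter of this sort can be very small. The following is a minimal sketch and not the actual filter from our application; the name plain_spaces and the template variable in the usage comment are hypothetical. The core is an ordinary Python function, and the Django registration is shown only in comments so that the sketch runs without Django installed.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, what timesince currently emits


def plain_spaces(value):
    """Return value as a string with non-breaking spaces made ordinary."""
    # This only handles U+00A0. If Django switched to some other
    # character, this function would silently stop doing anything useful.
    return str(value).replace(NBSP, " ")


# In a Django app this would live in a templatetags module and be
# registered in the usual way (hypothetical filter name):
#
#   from django import template
#   register = template.Library()
#   register.filter("plain_spaces", plain_spaces)
#
# and then be chained after timesince in a plain text template:
#
#   Unhandled for {{ req.created|timesince|plain_spaces }}

print(plain_spaces("1" + NBSP + "week, 2" + NBSP + "days"))
```

Because it keys on one specific character, a sketch like this has exactly the fragility described above.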

(Perhaps I should file a Django bug over at least the lack of documentation, but it strikes me as the kind of bug report that is more likely to produce arguments than fixes. And I would have to go register for the Django issue reporting system. Also, clearly this is not a particularly important issue for anyone else, since no one has reported it despite it being a three-year-old change.)

Comments on this page:

By Dan at 2016-02-04 00:01:41:

You say that in plain text e-mails the non-breaking spaces do "any number of problematic things." What are they? Do the same problematic things happen when a person has a non-ASCII character in their name? (You're not still expecting human names to adhere to ASCII, are you?)

Complaining about some Django template filter not documenting its use of non-ASCII characters does not seem reasonable to me.

"It strikes me as the kind of bug report that is more likely to produce arguments than fixes"

Given the comment above – you didn’t even have to register for the bug tracker in order to be argued with! :-)

By cks at 2016-02-04 14:35:10:

I was clearly unclear in the entry in two ways. First, by 'plain text' I mean (much) more than email; for example, we have Django templates that write information to plain text files for further processing. Second, what I object to is the unforced error of converting spaces into non-ASCII. Things that are inherently non-ASCII (such as names) need to be properly encoded or represented, but spaces do not. Spaces are an especially bad thing to encode, because plenty of computer languages have 'split on spaces' operations that may be used on, say, information written out from templates to plain text files.
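As a small illustration of the 'split on spaces' problem, here is a sketch in Python with a made-up line of the sort a template might write out. Splitting on a literal space, or splitting the UTF-8 bytes on ASCII whitespace, leaves each number glued to its unit.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# A made-up line as a template using timesince might write it to a file.
line = "Unhandled for 1" + NBSP + "week, 2" + NBSP + "days"

# Splitting on a literal ASCII space does not split at the NBSPs.
fields = line.split(" ")

# Neither does splitting the raw UTF-8 bytes on ASCII whitespace.
byte_fields = line.encode("utf-8").split()

# With ordinary spaces the same line splits into the expected six fields.
plain_fields = line.replace(NBSP, " ").split(" ")

print(len(fields), len(byte_fields), len(plain_fields))
```

Code that expects the number and the unit as separate fields gets four fields here instead of six.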

(Indeed, we have such code that works on the files that this web app writes out.)

It would be one thing if Django templates knew their context and only generated non-breaking spaces in HTML context; I might not entirely like it, but they would at least have a case for doing so. Doing so generally in all text-generating contexts is in my opinion a bad decision, especially when not documented; sticking to plain ASCII without a good reason otherwise is in my opinion the right way.

(Of course, perhaps Django templates are explicitly only supposed to be used for HTML. If so, I think it should say so clearly in the documentation, so that people know what they are getting into.)

By James A (trs80) at 2016-02-08 11:42:34:

"(Of course, perhaps Django templates are explicitly only supposed to be used for HTML. If so, I think it should say so clearly in the documentation, so that people know what they are getting into.)"

The Django book explicitly says:

"Syntax should be decoupled from HTML/XML. Although Django’s template system is used primarily to produce HTML, it’s intended to be just as usable for non-HTML formats, such as plain text."

So template filters only allowing for HTML output is a bug.

Written on 03 February 2016.

Last modified: Wed Feb 3 23:42:32 2016