Wandering Thoughts archives

2016-02-03

Django, the timesince template filter, and non-breaking spaces

Our Django application uses Django's templating system for more than just generating HTML pages. One of the extra things is generating the text of some plaintext email messages. This trundled along for years, and then a Django version or two ago I noticed that some of those plaintext emails had started showing up not as plain ASCII but as quoted-printable with some embedded characters that did not cut and paste well.

(One reason I noticed is that I sometimes scan through my incoming email with plain less.)

Here's an abstracted version of such an email message, with the odd bits italicized:

The following pending account request has not been handled for at least 1 week.

  • <LOGIN> for Some Person <user@somewhere>
    Sponsor: A professor
    Unhandled for 1 week, 2 days (since <date>)

In quoted-printable form the spaces in the italicized bits were =C2=A0 (well, most of them).

I will skip to the punchline: these durations were produced by the timesince template filter, and the =C2=A0 is the UTF-8 representation of a non-breaking space, U+00A0. Since either Django 1.5 or 1.6, the timesince filter and a couple of others have put non-breaking spaces after numbers. This change was introduced in Django issue #20246, almost certainly by a developer who was only thinking about the affected template filters being used in HTML.
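To make the encoding concrete, here is a small sketch using only Python's standard library. The sample string is made up to resemble the duration above (it is not captured from a real message). It shows how a U+00A0 between a number and its unit turns into =C2=A0 once the text is UTF-8 encoded and then quoted-printable encoded, while the ordinary space after the comma is left alone.

```python
import quopri
import unicodedata

# A made-up duration with U+00A0 between each number and its unit,
# and a plain space after the comma.
duration = "1\u00a0week, 2\u00a0days"

# quopri works on bytes, so encode to UTF-8 first. U+00A0 becomes the
# two bytes C2 A0, which quoted-printable writes as =C2=A0.
encoded = quopri.encodestring(duration.encode("utf-8"))
print(encoded)

# Going the other way recovers the original text, non-breaking spaces and all.
decoded = quopri.decodestring(encoded).decode("utf-8")
for ch in decoded:
    if ord(ch) > 127:
        print("U+%04X" % ord(ch), unicodedata.name(ch))
```

This also matches the 'well, most of them' above: only the spaces directly after the numbers are affected.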

In HTML, this change is unobjectionable. In plain text, it does any number of problematic things. Of course there is no option to change this or to control this behavior. As the issue itself cheerfully notes, if you don't like this change or it causes problems, you get to write your own filter to reverse it. Nor is this documented (and the actual examples of timesince output in the documentation use real spaces).

Perhaps you might say that documenting this is unimportant. Wrong. In order to find out why this was happening to my email, I had to read the Django source. Why did I have to do that? Because in a complex system there are any number of places where this might have been happening and any number of potential causes. Django has both localization and automatic safe string quotation for things you insert in templates, so maybe this could have been one or both in action, not a deliberate but undocumented feature in timesince. In the absence of actual documentation to read, the code is the documentation and you get to read it.

(I admit that I started with the timesince filter code, since it did seem like the best bet.)

Is the new template filter I've now written sufficient to fix this? Right now, yes, but not necessarily in the future. Since all of this is undocumented, Django is not committed to anything here. It could decide to change how it generates non-breaking spaces, switch to some other Unicode character for this purpose, or whatever. Since this would only be changing undocumented behavior, Django wouldn't even have to say anything in the release notes.
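For illustration, a filter to undo this can be very small. The following is a minimal sketch, not necessarily the filter our application uses. The name nbsp_to_space is made up, and it assumes that U+00A0 is the only character that needs replacing, which is exactly the sort of undocumented detail that could change. The Django import is guarded only so that the function can be tried without Django installed; in a real project this would simply live in an app's templatetags module.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def nbsp_to_space(value):
    """Return value as text with non-breaking spaces turned into plain spaces."""
    return str(value).replace(NBSP, " ")


# Register the function as a template filter when Django is available.
# template.Library() and register.filter(name, func) are Django's standard
# way of adding custom filters; the filter name itself is hypothetical.
try:
    from django import template
except ImportError:
    register = None
else:
    register = template.Library()
    register.filter("nbsp_to_space", nbsp_to_space)
```

In a plaintext email template this would be chained after timesince, as in {{ when|timesince|nbsp_to_space }}, once the module has been pulled in with {% load %} (here 'when' is a hypothetical template variable).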

(Perhaps I should file a Django bug over at least the lack of documentation, but it strikes me as the kind of bug report that is more likely to produce arguments than fixes. And I would have to go register for the Django issue reporting system. Also, clearly this is not a particularly important issue for anyone else, since no one has reported it despite it being a three-year-old change.)

python/DjangoTimesinceNBSpaces written at 23:42:32

You aren't entitled to good errors from someone else's web app

This particular small rant starts with some tweets:

@liamosaur: Developers who respond to bad URLs with 302 redirects to a 200 page with error info instead of a proper 404 page should be shot into the sun

@_wirepair: as someone who does research for web app scanners, a million times this.

@thatcks: It sounds like web apps are exercising good security against your scanners & denying them information.

If you are scanning someone else's web application, you have absolutely no grounds to complain when it does things that you don't like. Sure, it would be convenient for you if the web app gave you all the clear, semantically transparent HTTP errors you could wish for that make your life easy, but whatever error messages it emits are almost by definition not for you. The developers of those web apps owe you exactly nothing; if anything, they owe you less than nothing. You get whatever answers they feel like giving you, because you are not their audience. If they go so far as to give you deliberately misleading and malicious HTTP replies, well, that's what you get for poking where you weren't invited.

(Google and Bing and so on may or may not be part of their audience, and if so they may give Google good errors but not you. Or they may confine their good errors to the URLs that Google is supposed to crawl.)

Good HTTP error responses (at least to the level of 404s instead of 302s to 200 pages) may serve the goals of the web app developers and their audience. Or they may not. For a user-facing web app that is not intended to be crawled by automation, 302s to selected 200 pages may be more user friendly (or simply easier) than straight up 404s. As a distant outside observer, you don't know and you have no grounds for claiming otherwise.
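To be concrete about the two behaviors being argued over, here is a toy WSGI sketch in plain Python. Everything in it is made up for illustration (the paths, the make_app and status_for helpers, the soft_errors flag); it is not how any particular web app does things, just the smallest thing that shows a real 404 next to a 302 that lands on a 200 page.

```python
def make_app(soft_errors):
    """Build a tiny WSGI app; soft_errors picks how unknown URLs are handled."""
    pages = {
        "/": b"front page\n",
        "/error": b"Sorry, we couldn't find that.\n",
    }

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path in pages:
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [pages[path]]
        if soft_errors:
            # Redirect to an ordinary page that is itself served with a 200.
            start_response("302 Found", [("Location", "/error")])
            return [b""]
        # The 'proper' answer: a real 404 for the bad URL itself.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    return app


def status_for(app, path):
    """Call the app directly (no network) and return the status line it chose."""
    seen = []

    def start_response(status, headers):
        seen.append(status)

    b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response))
    return seen[0]


print(status_for(make_app(soft_errors=False), "/no-such-page"))
print(status_for(make_app(soft_errors=True), "/no-such-page"))
print(status_for(make_app(soft_errors=True), "/error"))
```

A person following the redirect in a browser just sees a friendly page either way; it is only automation that cares which of these it got.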

(There are all sorts of pragmatic and entirely rational reasons that developers might do things that you disagree with.)

It's probably the case that web app developers are better served over the long term by doing relatively proper HTTP error handling, with real 404s and so on (although I might not worry too much about the exact error codes). However, this is merely a default recommendation that's intended to make the life of developers easier. It is not any sort of requirement, and developers who deviate from it are not necessarily doing it wrong. They may well be making the correct decisions for their environment (including decisions to deliberately make your life harder).

(See also Who or what your website is for and more on HTTP errors, which comes at the general issue from another angle.)

PS: If you are scanning your own organization's web apps, with authorization, it may be worth a conversation with the developers about making the life of security people a little easier. But that's a different issue entirely; then 'our security people' are within the scope of who the web app is for.

web/NotEntitledToGoodErrors written at 00:50:21


