Wandering Thoughts archives

2014-02-24

The origins of DWiki and its drifting purpose

One of the interesting things about writing Wandering Thoughts has been getting a vivid and personal experience with what happens when some code you've written gets repurposed for something rather different than what it was originally designed for. Because, you see, DWiki (the wiki engine behind the blog) was not originally intended to be a blog engine and what it was originally designed for shaped it in a number of ways that still show today.

(I alluded to this when I talked about why comments aren't immediately visible on entries.)

Put simply, I originally designed DWiki as yet another attempt to build a local sysadmin documentation wiki that my co-workers would use. We hadn't shown much enthusiasm for writing HTML pages and I didn't think I could get my co-workers to edit things through the web, but I figured I at least had a shot if I gave them simple and minimal markup that they could edit by going 'cd /some/directory; vi file'. This idea never went anywhere, but once I had the core wiki engine I added enough extra features to make it able to create a blog, and then I decided I might as well use the features and write one.

(From the right perspective a blog is just a paged time-based view over a directory hierarchy. So are Atom syndication feeds.)
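
As a rough illustration of that perspective (this is not DWiki's actual code), here is a minimal Python sketch that treats every file under a directory tree as an entry, orders entries by modification time, and slices the result into pages. The function names, the use of file mtime as the entry date, and the page size are all assumptions made for the example.

```python
import os

def entries_newest_first(root):
    """Return every file path under root, most recently modified first."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Assumption: the file's mtime stands in for the entry's date.
            found.append((os.path.getmtime(path), path))
    found.sort(reverse=True)
    return [path for _mtime, path in found]

def page(entries, number, per_page=10):
    """Return page `number` (counting from 0) of an ordered entry list."""
    start = number * per_page
    return entries[start:start + per_page]
```

A syndication feed would then be roughly the first page of the same list, rendered in a different format.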

One feature that this original purpose strongly affected is how comments are displayed. To put it one way, if you're creating a sysadmin documentation wiki, input from outsiders is not a primary source of content. It's a potential source of feedback to us, but it's definitely not on par with the (theoretical) stuff we were going to be writing. So I decided that (by default) comments would get a secondary position; if you were just browsing the wiki, you'd have to go out of your way to see the comments. As a wiki, if people left comments with seriously worthwhile feedback we'd fold that feedback into the main page.

(Adding comments was also a sop to the view that all true wikis are web-editable by outsiders. I wasn't going to make the wiki itself web-editable, but this way I could say that we were wiki-like in that we were still allowing outsiders to have a voice.)

Another thing that this original purpose strongly affected was DWiki's choice of text formatting characters, especially its choice of _ as the 'typewriter text' formatting character. If you're writing about sysadmin things it's quite common to want to set text in typewriter text to denote (Unix) commands, so you want a nice convenient character sequence for it; _ looks like a great choice because almost nothing you write about is going to have actual underscores (they're very uncommon in Unix command lines). When I instead started using DWiki to write more and more about code, this turned into a terrible decision, since _ is an extremely common character in identifiers.
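
To make the clash concrete, here is a small Python sketch of a naive markup rule in this style. It is not DWiki's real parser; the regular expression, the `render` function, and the `<tt>` output are assumptions invented for the illustration, and a real parser has more rules than this.

```python
import re

# Hypothetical rule: anything between two underscores is typewriter text.
TYPEWRITER = re.compile(r"_(.+?)_")

def render(text):
    """Turn _..._ spans into <tt>...</tt> spans."""
    return TYPEWRITER.sub(r"<tt>\1</tt>", text)

# A Unix command line comes out as intended:
#   render("run _ls -l_ to list files") -> "run <tt>ls -l</tt> to list files"
# An identifier with underscores gets mangled:
#   render("call my_func_name here")    -> "call my<tt>func</tt>name here"
```

With a rule like this, every identifier that contains two or more underscores needs some form of escaping, which gets old fast when most of what you write about is code.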

(Another choice that looked sensible for writing about Unix commands but turned out to be bad for writing about code is using ((...)) for a block of typewriter text with no further formatting. The problem is that when you're writing about code you often wind up wanting to write about things with (...) on the end and that confuses the text parser.)

PS: In hindsight I can see all sorts of problems with my idea of a sysadmin documentation wiki. Even if I'd tried to market it better to my co-workers I suspect that it wouldn't have worked, especially as something that was publicly visible.

DWikiOrigins written at 01:31:00

2014-02-05

An interesting internal Django error we just got

As a result of someone trying to either exploit or damage it, our account request system just notified us that it had hit an internal exception in the depths of Django. While that's not too great, what was really interesting was the specific exception and where it happened; it boiled down to:

[...]
File ".../django/db/backends/sqlite3/base.py", line XXX, in execute
  return Database.Cursor.execute(self, query, params)
OverflowError: long too big to convert

Wait, what?

It turns out that the cause of this is (to me) very interesting and also completely explicable once we trace down the layers. We need to start at the actual form. Among other things, this presents a <select> element with the possible values drawn from the database. How Django implements this in our case is that the text of each option is under our control but the HTML form 'value' for each option is an integer (which happens to be the database row's internal primary key). That is, it looks like this:

[...]
<option value="5">Blah blah</option>
<option value="6">More blah</option>
[...]

Our attacker edited the HTML (I believe using Firefox's developer tools) to provide a really absurdly large value for the option that they picked; for example, one attempt had '518446744073709551616' for this form element. Because Django is a modern web framework it of course does not simply trust this submitted value; instead it validates it and in the process turns it into a proper reference to an ORM object representing the particular database row. If I'm reading the code right, this validation is done ultimately by making a SQL query to look up the row given its primary key (in the process this validates that it's among the set you provided).

This is where we descend through the layers to the SQLite driver. Because the SQLite driver is a good modern database driver, it uses SQL placeholders. Since the primary key field is an integer, this means that the SQLite driver must convert the value passed down by Django to an actual integer in order to pass it as a placeholder, and not just a Python integer but an actual C-level long. As it happens, Django has not passed down the raw string value but has already called int() on the hacked-up string, which has given us a Python long integer. This long integer is of course far too big to fit into the C-level long that the SQLite driver requires and the driver notices, giving us this OverflowError (well, it turns out that it is the core Python code that notices, but close enough).
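
The same failure can be reproduced without Django, using only Python's standard sqlite3 module. The sketch below is written for Python 3, where the exception message is worded differently from the one in our traceback; the table, its contents, and the `lookup` function are hypothetical stand-ins for the real application.

```python
import sqlite3

# Hypothetical stand-in for the table behind the <select> element.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choices (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO choices (id, label) VALUES (5, 'Blah blah')")

def lookup(raw_value):
    """Look up a submitted option value the way the layers above do."""
    pk = int(raw_value)  # succeeds for any digit string, however large
    try:
        # The driver has to fit pk into a 64-bit integer to bind it to '?'.
        row = conn.execute(
            "SELECT label FROM choices WHERE id = ?", (pk,)
        ).fetchone()
    except OverflowError:
        return ("overflow", None)
    if row is None:
        return ("missing", None)
    return ("ok", row[0])
```

Here `lookup("5")` finds the row and `lookup("7")` finds nothing, while `lookup("518446744073709551616")` never reaches SQLite at all; the OverflowError is raised while the parameter is being bound.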

(If you modify the form to something that is not an integer at all, Django detects it at a much higher level and rejects the form cleanly.)
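
One way to get the same clean rejection for oversized values is to range-check the integer before it goes anywhere near the database. This is only a sketch of the idea, not what Django does or how it might fix the issue; `clean_pk` and the two constants are names invented here, and the bounds assume SQLite's signed 64-bit INTEGER storage.

```python
# Assumption: SQLite stores INTEGER values in at most a signed 64 bits.
SQLITE_INT_MIN = -(2 ** 63)
SQLITE_INT_MAX = 2 ** 63 - 1

def clean_pk(raw_value):
    """Return raw_value as an int that SQLite can hold, or None if invalid."""
    try:
        pk = int(raw_value)
    except (TypeError, ValueError):
        return None  # not an integer at all
    if not (SQLITE_INT_MIN <= pk <= SQLITE_INT_MAX):
        return None  # an integer, but too big for the driver to bind
    return pk
```

A form validator could then treat a None result as an ordinary invalid choice instead of letting the exception escape.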

I find this an interesting error partly because of how the low level issues involved show through. A whole cascade of things had to combine to create this error, including Python's unification of ints and longs, and it is the sort of really obscure corner case that can easily slip through and be overlooked.

(Since it can be triggered from the outside it's probably worth reporting it as a Django bug, but I need to verify that it's still there in the current version. We're a bit behind by now for various reasons.)

PS: We found out about this problem because one of Django's cool features is that it can be set to email you reports about uncaught exceptions such as this. The reports include not just the backtrace but also things like the form POST parameters, which was vital in this case. Without the POST parameters I would have been totally lost; with them, once I started looking, the absurd values of this particular form field jumped right out at me.

DjangoOverflowError written at 00:58:55


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.