2014-02-24
The origins of DWiki and its drifting purpose
One of the interesting things about writing Wandering Thoughts has been getting a vivid and personal experience with what happens when some code you've written gets repurposed for something rather different than what it was originally designed for. Because, you see, DWiki (the wiki engine behind the blog) was not originally intended to be a blog engine and what it was originally designed for shaped it in a number of ways that still show today.
(I alluded to this when I talked about why comments aren't immediately visible on entries.)
Put simply, I originally designed DWiki as yet another attempt to build
a local sysadmin documentation wiki that my co-workers would use. We
hadn't shown much enthusiasm for writing HTML pages and I didn't think
I could get my co-workers to edit things through the web, but I figured
I at least had a shot if I gave them simple and minimal markup that
they could edit by going 'cd /some/directory; vi file'. This idea
never went anywhere but once I had the core wiki engine I added enough extra
features to make it able to create a
blog, and then I decided I might as well use the features and write one.
(From the right perspective a blog is just a paged time-based view over a directory hierarchy. So are Atom syndication feeds.)
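To make the 'paged time-based view' idea concrete, here is a minimal Python sketch. It is not DWiki's actual code; the function name blog_page, the use of file modification times, and the page size are all assumptions made up for illustration.

```python
import os

def blog_page(root, page=0, per_page=10):
    """Return one page of file paths under root, newest first.

    Hypothetical helper: ordering by mtime is an assumption about
    how a file-based blog might date its entries.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append((os.stat(path).st_mtime, path))
    # Newest entries first, then slice out the requested page.
    entries.sort(reverse=True)
    start = page * per_page
    return [path for _mtime, path in entries[start:start + per_page]]
```

A syndication feed would be the same walk with a different renderer applied to the first page.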
One feature that this original purpose strongly affected is how comments are displayed. To put it one way, if you're creating a sysadmin documentation wiki, input from outsiders is not a primary source of content. It's a potential source of feedback to us, but it's definitely not on par with the (theoretical) stuff we were going to be writing. So I decided that (by default) comments would get a secondary position; if you were just browsing the wiki, you'd have to go out of your way to see the comments. Since this was a wiki, if people left comments with seriously worthwhile feedback we'd fold that feedback into the main page.
(Adding comments was also a sop to the view that all true wikis are web-editable by outsiders. I wasn't going to make the wiki itself web-editable, but this way I could say that we were wiki-like in that we were still allowing outsiders to have a voice.)
Another thing that this original purpose strongly affected was DWiki's
choice of text formatting characters, especially its choice of _
as the 'typewriter text' formatting character. If you're writing about
sysadmin things it's quite common to want to set text in typewriter
text to denote (Unix) commands so you want a nice convenient character
sequence for it; _ looks like a great choice because almost nothing
you write about is going to have actual underscores (they're very
uncommon in Unix command lines). When I instead started using DWiki to
write more and more about code, this turned into a terrible decision
since _ is an extremely common character in identifiers.
(Another choice that looked sensible for writing about Unix commands
but turned out to be bad for writing about code is using ((...)) for a
block of typewriter text with no further formatting. The problem is that
when you're writing about code you often wind up wanting to write about
things with (...) on the end and that confuses the text parser.)
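Both formatting problems are easy to reproduce with a toy renderer. The sketch below is not DWiki's parser; the two regular expressions and the <tt> output are assumptions chosen only to mimic the _..._ and ((...)) rules described above.

```python
import re

# Hypothetical rules, loosely modeled on the two described above.
UNDERSCORE_TT = re.compile(r"_(.+?)_")    # _text_   -> typewriter text
PAREN_TT = re.compile(r"\(\((.*?)\)\)")   # ((text)) -> typewriter text

def render(text):
    """Apply both toy markup rules to a line of text."""
    text = PAREN_TT.sub(r"<tt>\1</tt>", text)
    return UNDERSCORE_TT.sub(r"<tt>\1</tt>", text)

# Unix commands work fine:
print(render("run _ls -l_ first"))           # run <tt>ls -l</tt> first
print(render("use ((ls -l)) here"))          # use <tt>ls -l</tt> here

# Code does not: identifiers and trailing (...) get mangled.
print(render("check st_mtime and st_size"))  # check st<tt>mtime and st</tt>size
print(render("call ((exit(0))) now"))        # call <tt>exit(0</tt>) now
```

In the last two cases the renderer pairs up delimiters that were never meant as markup, which is the sort of confusion that shows up once the text is about code.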
PS: In hindsight I can see all sorts of problems with my idea of a sysadmin documentation wiki. Even if I'd tried to market it better to my co-workers I suspect that it wouldn't have worked, especially as something that was publicly visible.
2014-02-05
An interesting internal Django error we just got
As a result of someone trying to either exploit or damage it, our account request system just notified us that it had hit an internal exception in the depths of Django. While that's not too great, what was really interesting was the specific exception and where it happened; it boiled down to:
[...]
  File ".../django/db/backends/sqlite3/base.py", line XXX, in execute
    return Database.Cursor.execute(self, query, params)
OverflowError: long too big to convert
Wait, what?
It turns out that the cause of this is (to me) very interesting and also
completely explicable once we trace down the layers. We need to start at
the actual form. Among other things, this presents a <select> element
with the possible values drawn from the database. How Django implements
this in our case is that the text of each option is under our control
but the HTML form 'value' for each option is an integer (which happens
to be the database row's internal primary key). Ie, it looks like this:
[...]
<option value="5">Blah blah</option>
<option value="6">More blah</option>
[...]
Our attacker edited the HTML (I believe using Firefox's developer
tools) to provide a really absurdly large value for the option
that they picked; for example, one attempt had '518446744073709551616'
for this form element. Because Django is a modern web framework it of course does not simply trust this submitted
value; instead it validates it and in the process turns it into a
proper reference to an ORM object representing the particular
database row. If I'm reading the code right, this validation is
done ultimately by making a SQL query to look up the row given its
primary key (in the process this validates that it's among the set
you provided).
This is where we descend through the layers to the SQLite driver.
Because the SQLite driver is a good modern database driver, it uses
SQL placeholders. Since the
primary key field is an integer, this means that the SQLite driver
must convert the value passed down by Django to an actual integer
in order to pass it as a placeholder, and not just a Python integer
but an actual C-level long. As it happens, Django has not passed
down the raw string value but has already called int() on the
hacked up string, which has given us a Python long integer. This
long integer is of course far too big to fit into the C-level long
that the SQLite driver requires and the driver notices, giving us
this OverflowError (well, it turns out that it is the core Python
code that notices, but close enough).
(If you modify the form to something that is not an integer at all, Django detects it at a much higher level and rejects the form cleanly.)
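The same split between 'not an integer' and 'an integer that is too big' can be reproduced with nothing but Python's standard sqlite3 module. This is a sketch under assumptions, not Django's code: the choices table, the lookup() helper, and fits_sqlite_integer() are invented for illustration, and the exact exception message in current Python differs from the one in our traceback.

```python
import sqlite3

# SQLite INTEGER values are signed 64-bit.
SQLITE_MIN, SQLITE_MAX = -2**63, 2**63 - 1

def fits_sqlite_integer(n):
    """Hypothetical guard: can n be bound as a SQLite INTEGER?"""
    return SQLITE_MIN <= n <= SQLITE_MAX

def lookup(conn, raw_value):
    """Mimic the layers: parse the form value, then bind it as a placeholder."""
    pk = int(raw_value)  # ValueError for non-integers; any size is accepted
    cur = conn.execute("SELECT name FROM choices WHERE id = ?", (pk,))
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choices (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO choices VALUES (5, 'Blah blah')")

print(lookup(conn, "5"))
for bad in ("not-a-number", "518446744073709551616"):
    try:
        lookup(conn, bad)
    except (ValueError, OverflowError) as exc:
        # The first value fails in int(); the second fails in the driver.
        print(type(exc).__name__, exc)
```

A range check like fits_sqlite_integer() before the query is one way the oversized value could be rejected as cleanly as the non-integer one.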
I find this an interesting error partly because of how the low level issues involved show through. A whole cascade of things had to combine together to create this error, including Python's unification of ints and longs, and it is the sort of really obscure corner case that can easily slip through and be overlooked.
(Since it can be triggered from the outside it's probably worth reporting it as a Django bug, but I need to verify that it's still there in the current version. We're a bit behind by now for various reasons.)
PS: We found out about this problem because one of Django's cool
features is that it can be set to email you reports about uncaught
exceptions such as this. The reports include not just the backtrace but
also things like the form POST parameters, which was vital in this
case. Without the POST parameters I would have been totally lost; with
them, once I started looking, the absurd values of this particular form
field jumped right out at me.
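For reference, the email reports are driven by a few Django settings. The fragment below is a sketch with placeholder names and addresses, not our actual configuration; the (name, address) tuple form of ADMINS is what Django of this era documents, and newer releases may expect a different format, so check the documentation for your version.

```python
# settings.py fragment (placeholder values, for illustration only)

# Error emails are only sent when DEBUG is off.
DEBUG = False

# People who receive reports about uncaught exceptions.
ADMINS = [
    ("Sysadmin Team", "sysadmins@example.org"),
]

# The From: address used on the error reports.
SERVER_EMAIL = "django@example.org"
```

Django also needs working outgoing mail settings (such as EMAIL_HOST) before any of these reports will actually arrive.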