Wandering Thoughts archives

2011-02-02

Django and primary keys versus surrogate keys

What Django does to my primary key problem is magnify the effects of changes and errors in two ways. First, the admin interface allows convenient direct insertion of data into the database, including immediately creating foreign key dependencies. People sometimes make mistakes when typing things in; the fewer steps between typing in something and having it wind up in the database, the fewer chances you have of catching your error. In a slower environment I might have noticed some of my typos after I had written the file of loader commands or SQL but before I had run it to insert things into the database, or perhaps at least before I started pointing foreign key relationships at those mistakes.

(Since straightforward SQL makes you re-type the foreign key value when using it, I'd also have had another chance to notice that I was making a stupid typo.)

Second, Django's admin interface makes it equally direct and simple to delete data and to cascade this deletion through to dependent records (and yes, it shows you everything that will be deleted and asks if you're sure). However, it has no equivalent way of doing a mass update to fix a primary key mistake (where you change all records pointing to the old record to point to the new one); you have to drop into Python (or direct SQL) to do that.

(In the SQL database I'm used to, attempting to do something that broke a foreign key relationship would normally error out, forcing you to stop and fix the whole situation. Possibly the Django admin interface can be tweaked to do that too; I didn't look closely.)

I don't think that Django is wrong for doing either of these things. Both are simply the consequence of making a better, more convenient interface for database administration operations.

python/DjangoPrimarySurrogate written at 02:33:07; Add Comment

My issues with primary keys are not Django specific

A commentator on my recent entry about learning from my Django schema design experience wrote, about me abandoning explicit primary keys:

I'm not sure your data will be best served by giving up the database schema you want in deference to the conventions and limitations of the (Django ORM) framework.

I want to clarify something: the issues in my schema that are driving me away from using explicit primary keys and to surrogate keys are not Django specific.

I've now designed two database schemas, one purely in SQL and one in Django schema. One of the major differences between them is where most of the data came from and how immutable it was once entered. In my pure SQL design most of the data (and almost all of the primary keys) came from automated systems elsewhere. One primary key was entered by users, but it had to be immutable once entered. It's easy to use explicit primary keys in a database schema like this, because errors and changes are rare (you hope) and you can't deal with them anyways; they have to be dealt with by the people and the upstream systems that build data for you.

(Attempts to fix errors and change things locally will just result in you getting out of sync with your data sources. This rarely ends well.)

My current schema is not like this. Essentially all of the data is entered by hand directly into my database and so is subject to errors, which we are on the hook to correct. Many of the natural primary keys actually are subject to change, even if that change is infrequent (people's names do change, and so sometimes do their logins, for good reason). One otherwise natural primary key can sometimes be null, and this is not supported for UNIQUE fields by all database engines (even apart from other issues). This database design is simply not a good fit for explicit primary keys, no matter what ORM or SQL database I use for it. If I did use explicit primary keys, I would expect to be doing a reasonable amount of re-stitching foreign key relationships every so often.

(This assumes that I do not make entry lifetime mistakes like linking my audit table entries directly to account requests. That too would be a problem in any database.)

So, in summary: not all database situations can sensibly use explicit primary keys. Sometimes surrogate keys really are the right choice, not something 'forced' on you by using inferior or awkward technology.

programming/PrimaryVsSurrogateKeys written at 02:29:50; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.