Three levels of database usage

October 7, 2005

One of the perennial programmer debates can be caricatured as a three way fight between adherents of MySQL, PostgreSQL, and Oracle (or the big iron database of your choice). As an example, you can see a recent round starting itself up here and here (both arguing what could be called the MySQL side).

Some time back it struck me that one way to think about how programs relate to SQL databases is that they can be doing it at three different conceptual levels. (And now that I have a blog, I can throw the thought out to the world.)

At level 0, programs use a SQL database as a simple data store. They could have used even simpler database packages, like Berkeley DB or gdbm, but using a SQL database means they don't have to write a lot of boring low-level code for marshalling stuff in and out of the database and scanning tables and so on. The programs mostly want this data storage and retrieval to go real fast, and maybe for multiple instances writing to the same database to not corrupt it.

At level 1, programs use the SQL database to store high-level data structures. Because SQL is pretty simple, these data structures need to be represented using multiple relational objects (multiple rows and multiple tables). Naturally, programs want the database to be able to guarantee that only 'whole' high-level objects are stored in it, because this vastly simplifies their lives. This means that the database needs at least things like foreign key constraints and row constraints.

At level 2, people use the database to hold abstract data types, which the programs interact with by using defined operations implemented in the database. They don't want programs using this ADT store to care about the underlying representation of the abstract data objects; that's for the DBAs to worry about. Thus they want powerful stored procedures, because those are the interfaces to the ADTs.

Regardless of the level of database interaction that your program uses, everyone is actually doing the same high-level thing in their programs. People using databases as data stores still have high-level data structures; they just prefer to store and manipulate them in their high-level code, instead of pushing this down into the database.

I don't think that there's a clear best conceptual level right now and I'm not sure there ever will be. (We can hope that database performance and features converge, so that everyone is fast as MySQL and as powerful as PostgreSQL or Oracle.)

There are various reasons for choosing different levels, some of them highly pragmatic. When I wrote my only SQL-using application (a student account management system), I used a 'level 1' approach because of laziness and paranoia. Laziness because I could easily write a few lines of SQL to do quite complex operations that would otherwise have taken me quite a bit of application level code. Paranoia because using SQL constraints in addition to program level checks meant that I had an extra line of defense against bad data (or programming mistakes) doing horrible things to student accounts.

Written on 07 October 2005.
« First irritations with Solaris 9
Solaris 9 'Power management' »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Oct 7 01:49:50 2005
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.