Why I don't use relational databases in practice

May 11, 2014

I started out planning to write an entry I was going to title 'when I reach for a relational database'. The problem with this implicit question is that in practice the answer is 'when the system I'm building on basically insists on it'. Otherwise I basically don't use one, even when the problem more or less naturally fits into a relational world. On a purely logical basis, this is crazy; using a relational database for relational things is much easier than the alternatives and SQLite makes this trivial.

In thinking about it, I've come around to the idea that my problem with relational databases is that using a database makes my data relatively opaque and as a result complicates my life. The easy option for a relational database is an SQL based database, probably SQLite, and all of those store my data in unreadable and uneditable binary files. In order to get at my data I need to use the database server's tools (or write code). The direct consequence of this is that I can't inspect and go through my data by firing up less or grep or any number of other things and I can't (if I want to) put things under version control in any easy way.

The slightly more indirect consequence is that I need to write more tools. When my data is in plain text files I can modify my data with an editor or other simple Unix programs; add something, delete something, modify something, shuffle things around, it's all something that gets done with existing Unix programs and maybe some thin scripts. When my data is in a SQLite database I have to manipulate the data through SQLite in some way, either in more complicated shell scripts that drive the SQLite shell or through actual little utility programs that I write. In practice the latter is much safer and so what I'm likely to wind up with. Either way it's more work and more dedicated tools that I have to keep track of in various ways.

For me all of this adds friction to using a real relational database, enough friction that I'll endure some level of pain from hacking things together with plain text files instead. This is not completely rational, but evidently I put a high premium on being able to see things with less and edit them with vi.

(Note that some people have violent objections to relational databases, either in general or for uses where they aren't 'needed' in some sense. I'm not one of them; I don't have any strong feelings that they're inelegant or consume too many resources or whatever.)

Comments on this page:

Maybe there needs to be something like ldapvi or shelldap for sqlite or relational databases in general. I think I remember reading about some sort of fuse sql hack.

By Ewen McNeill at 2014-05-11 05:28:23:

It sounds like we have fairly similar gut instincts.

I'll gladly use a relational database if I have either (a) a large volume of small data items or (b) a non-trivial amount of data which is clearly relational. But for anything that is easily expressed in a tabular form I'll almost always use some form of CSV (because grep/less/etc work), and for anything that is trivially expressed as nested data structures with no other relationship I'll use JSON or perhaps YAML. Especially if any given program only reads or writes the whole thing, but even sometimes for modifications if those modifications are most usefully done in a pipe fashion (process A reads data and transforms it, process B reads data and transforms it, etc, process Z visualises the result).

I think at least some programs end up using relational databases because that's all the programmer knows, even if it's a poor fit for the problem -- "all you have is a hammer...". (Relational databases seem to best fit transactional systems IME.)

ObThread: the FUSE-style API is a partial solution to this problem (and I've written "sql2csv" style tools for various databases in the past to do that). But it doesn't offer, eg, the same flexibility for working with transformed versions of data in a pipeline (because there's still only One True Version of the data; I suppose you could work around that by coercing temporary tables into existence via FUSE, but... that feels fragile).


Hmmm... well I disagree that it's opaque. Using the mysql command-line client you can bring SQL queries into your scripts. Or you can use a scripting language which supports MySQL directly. Or SQLite for that matter. And some of these SQL client languages (PHP and Perl for example) come with DB abstraction libraries which support CSV as a storage table format. So... no. It's only opaque if you choose an opaque storage engine or if you consider that a pipe-able shell command which takes and produces plain text, is "opaque".

Written on 11 May 2014.
« Some uses for Python's 'named' form of string formatting
The advantages of editors over database programs for modifying your data »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun May 11 00:48:56 2014
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.