The good and bad of SQL

December 17, 2009

The good of SQL is that it is an excellent ad-hoc query and reporting tool. If you have an SQL database with a decently formed schema (and in many ways a denormalized schema is the easiest), you can bang together lots of things very fast and easily. Raw SQL is an excellent way to explore your way around a bunch of data, for example to see if there are signs that something odd is going on.

(It also makes it possible to answer silly amusing questions that would not be worth writing actual serious reports for.)

Essentially, raw SQL is a REPL interface for your data, with most of the REPL advantages that you have in any good language. Of course, a good SQL client (one with command line editing, history recall, and so on) helps a great deal; things slow down to some extent if you're editing files and re-running commands to get your latest dose of results, instead of hitting up-arrow and doing some editing.

(I say 'most of' the advantages because there are some interesting issues in trying to validate and debug complex queries.)

The bad of SQL is that it is a Turing tar pit. You can spend acres of time fiddling with your SQL queries, either to get them giving you output that is set up just right or to figure out how to do some tricky query or aggregation entirely in SQL, when perhaps you should have given up long ago and done at least part of the work in a programming language. Modern SQL database servers have just enough functions and additional features to make it look likely that you can do anything if you try hard enough, and there's that old programmer pleasure in discovering the latest neat trick for bending SQL to your will, especially since it's usually accomplished by stringing things together in ever more baroque sequences.

(Guess what I spent all today doing.)

Sidebar: today's PostgreSQL aggregation trick

Without going into why, suppose that you are interested in counting how many distinct /24 subnets are involved in something, except that your raw data contains bare IP addresses. PostgreSQL has no direct way to convert an IP address to its containing /24, but you can do it indirectly:

network(set_masklen(table.ip, 24))

This also neatly illustrates the Turing tar pit nature of the whole process.

Written on 17 December 2009.
« How Linux software RAID is making me grumpy right now
Secure or useful: pick one »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Dec 17 01:42:41 2009
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.