Proper SQL database design versus the real world

May 24, 2011

If you read up on SQL databases, there are all sorts of rules that you should follow in order to design your SQL database schema and set up your database the right way, so that everything is efficient and so on. This is the domain of normalization, carefully considered indexes, query plans, and so on. Sometimes it is also the domain of carefully minimized field sizes.

I do not want to knock these rules and the people who advocate following them, at least not too hard. There are sensible reasons for things like normalization and indexes et al are important for good performance, even if all of this makes your life harder when you are designing the database and writing queries against it.

Well. Sort of. The thing is, all of this SQL hairshirtism only really matters if you have a significantly sized database or are operating in a demanding environment, or both. Especially these days, most SQL databases are small and are not operating in demanding environments. Often the entire database will fit in RAM.

(Indexes still matter a bit for an in-RAM database, but nowhere near as much as for larger databases which have to go to disk a lot.)

This means that a lot of SQL optimizations and database design rules simply do not matter for a lot of databases and a lot of programmers. Going out of your way to do them is yet another case of misplaced optimization; you might as well simply design whatever SQL schema is simplest for your application and not worry about it. So my view today is that before you worry about any of these issues, you should ask yourself how many records your database is going to hold, how big its total size will be, and other sizing questions. Almost all of the time I think the answer will be 'small enough that you don't have to worry about it'.

(There once was a day, back in the time of small machines, when even a relatively small SQL database needed to worry about these things. That day is long gone.)

As a side note, another form of SQL hairshirtism is trying very hard to do as much work as possible in SQL instead of in your program because it is 'more efficient' to let the database server do all the work. In real life, it is often more efficient to do it in whatever way is fastest to program. Sometimes doing queries or updates in SQL can save you a bunch of annoying coding, but other times you are better off doing the work in code instead of trying to play in the SQL Turing tar pit (also).

Comments on this page:

From at 2011-05-24 09:10:32:

Normalization is no longer the norm :)

What you say is true, normalization doesn't really help the small installations.

However, normalization actually severely hinders large databases. How many times have you seen large installations take on de-normalization projects to squeeze more performance out of their DB?

Taking that a step farther, one can argue that the NoSQL DB's that are all the rage are simply SQL databases with all normalization features removed.

From at 2011-05-25 11:41:50:

While a sophisticated query planning engine will perform better with a fully normalized database, perhaps only the PostgreSQL and Oracle planning engines are that sophisticated. Past failures of planning engines have justified de-normalization.

HOWEVER, normalization remains an essential design step. Being able to fully normalize your schema means you understand what that schema represents. Failure to do so will lead to accretions of inconsistency in the data representation.

The same is true for stored procedures. Although they can in some circumstances improve performance, that's the wrong reason to use them: the real reason is that they constitute shared code and an opportunity to enforce constraints and data consistency that might not be available through normalization. They become all the more important precisely when you denormalize.

From at 2011-05-25 11:50:24:

I forgot to make explicit the point to my previous comment, which is that normalization was essential to design - but once the design is understood, constraint- and consistency- maintaining de-normalization is very often an appropriate implementation tactic - and stored procedures (regardless of language - I don't much care for SQL myself) are an important tool.

From at 2011-05-26 08:42:59:

One other thing you have to watch out for is that while your database might fit into RAM by itself when it goes into production it might be put onto a system that has 150 other small databases on it which also individually might fit in RAM, but altogether they don't. I have seen that happen several times where a developer tested an app on their workstation and it worked fine, but when it got deployed into production suddenly those full table scans weren't so fast anymore.

By cks at 2011-05-26 11:08:59:

You definitely have to take a whole system view of the resource usage for your database, especially today; not only might you be running on a busy system with other things, but you might also be running on a small virtual server with a tiny amount of memory (instead of a real server with a decent amount).

By cks at 2011-05-26 11:14:01:

Whoops, I posted my comment too fast before completing my thought: as a sysadmin, I have the optimistic view that people actually will take a whole system view when designing their database schema and writing their code and so on. I also have the impression that outside of small virtual servers, most databases run in environments with a fair amount of resources. This is certainly the case for most of our local applications, for example.

(I'm sure the situation gets complicated when you're writing code that will be distributed to people and run in lots of different environments. I don't envy people worrying about this for general blogging packages, for example, because some of their code will definitely wind up running against slow database backends in over-committed shared hosting or tiny virtual servers or etc.)

Written on 24 May 2011.
« An aside on RAID-5/RAID-6 and disk failures
How Django's form field ordering works »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Tue May 24 00:48:20 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.