2011-05-24
Proper SQL database design versus the real world
If you read up on SQL databases, there are all sorts of rules that you should follow in order to design your SQL database schema and set up your database the right way, so that everything is efficient and so on. This is the domain of normalization, carefully considered indexes, query plans, and so on. Sometimes it is also the domain of carefully minimized field sizes.
I do not want to knock these rules and the people who advocate following them, at least not too hard. There are sensible reasons for things like normalization and indexes et al are important for good performance, even if all of this makes your life harder when you are designing the database and writing queries against it.
Well. Sort of. The thing is, all of this SQL hairshirtism only really matters if you have a significantly sized database or are operating in a demanding environment, or both. Especially these days, most SQL databases are small and are not operating in demanding environments. Often the entire database will fit in RAM.
(Indexes still matter a bit for an in-RAM database, but nowhere near as much as for larger databases which have to go to disk a lot.)
This means that a lot of SQL optimizations and database design rules simply do not matter for a lot of databases and a lot of programmers. Going out of your way to do them is yet another case of misplaced optimization; you might as well simply design whatever SQL schema is simplest for your application and not worry about it. So my view today is that before you worry about any of these issues, you should ask yourself how many records your database is going to hold, how big its total size will be, and other sizing questions. Almost all of the time I think the answer will be 'small enough that you don't have to worry about it'.
(There once was a day, back in the time of small machines, when even a relatively small SQL database needed to worry about these things. That day is long gone.)
As a side note, another form of SQL hairshirtism is trying very hard to do as much work as possible in SQL instead of in your program because it is 'more efficient' to let the database server do all the work. In real life, it is often more efficient to do it in whatever way is fastest to program. Sometimes doing queries or updates in SQL can save you a bunch of annoying coding, but other times you are better off doing the work in code instead of trying to play in the SQL Turing tar pit (also).