Thinking realistically about SQL database field sizes

February 8, 2011

There is a certain sort of SQL database designer who cares a lot about the micro-efficiency of their schema fields, and as part of this has a deep concern with minimum field sizes. These people want to know the exact size requirements of everything and specify the field sizes as the minimum possible. If you have a table with graduate student numbers and graduate student numbers happen to be ten digits or less, they will specify that field as ten characters. Period.

(They will probably make it a TEXT CHAR instead of VARCHAR field, too, because fixed size fields are or at least were marginally more efficient. Sometimes.)

I consider this an unhealthy obsession and also dangerous to the long term health of your database. Rather than worrying about field size, I feel that database design should err on the side of having lots of room. Throw in extra space, make fields wider than you think you need, and if you have a field with no particular natural length, make it big just in case. Maybe you will never need to put a 512 character or 1024 character comment in your database, maybe they'll all be less than 80 characters, but you don't know.

(In fact I should go back through the schema for my current project and widen a bunch of fields.)

In some sense this is extravagantly wasteful. But going the other way can be equally wasteful in the long run, because it amounts to premature optimization in most cases (and dangerous optimization at that). Disk space is almost invariably cheap unless you have a large database (or a very constrained environment), while a database schema change that is forced on you because your carefully minimally-sized fields turned out to be too small can be very expensive.

Do my graduate student number fields need to be 30 characters long? Probably not, but 20 characters is extremely cheap insurance against change at the small scale I'm working at. And change happens.

(Also, many fields are nowhere near as clear cut as something that has an explicit specification. How big should you make your 'name' field, for example? My answer is 'quite big', at least twice as big as the largest name I think we're ever going to encounter.)

PS: these days a database has to be very large indeed before it counts as large enough that this makes much of a difference. If you think that your database is big enough that field size is going to matter, you really want to run the numbers on how many records you will have and how much space you'll be saving. My gut reaction is that in most environments, a data size that's under plural gigabytes is too small to matter.

(And then you'll want to investigate your specific database to assess your real space savings. You may find that various internal issues mean that you can enlarge some fields from their utter minimum without changing how much disk space you use.)

PPS: okay, there are also potential performance-related reasons for keeping rows small (for both reads and writes), but again most people aren't going to be operating anywhere near these levels. And if you are operating at this level, you should already know it and be looking into SSDs.

Written on 08 February 2011.
« My brute force email archive
On flow (a digression) »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Tue Feb 8 01:28:30 2011
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.