2011-02-11
The letdown
The other side of my absorption in programming is what I call 'the letdown'. The letdown is what happens when I reach the end of the project, when I emerge from my trance and realize that the program is basically finished. Oh, there's polishing and refinement and cleanup and documentation, but I'm done with the building. I've successfully put together what I imagined at the start, turned my dreams and ideas and fancies into something concrete.
I've succeeded, and now I don't really have anything to do. (Certainly nothing that is absorbing in anywhere near the same way.)
When the letdown hits I always spend a while feeling blah and out of sorts. Nothing seems anywhere near as interesting as the programming was. Usually I spend a certain amount of time polishing and re-polishing the program, half trying to recapture the earlier magic (even though I can't), and then force myself to move on to something else.
The letdown in its full form is something more or less peculiar to my being a sysadmin instead of a programmer. A programmer is always programming, so they always have a next programming project on the horizon after they finish their current one. But sysadmin programming projects are relatively few and far between; once I've finished one, there is usually no more coding for a good while.
(And certainly not in the short term, because in the short term there's all of the things that I've put off in order to have the time to spend a solid week or more writing code.)
I'm pretty much in the letdown on my current project now. It's essentially feature-complete (at least for the features that it needs to have, as opposed to the ones that are kind of nifty and nice to have but that require me to learn design or JavaScript), it's been deployed for near-production testing, and everything that's left is just cleaning up the corners, which is more annoying than absorbing. It kind of makes me wistfully sad.
(There is still the last thrill of deploying it to production and hoping that people like it for real, but I know that that will be transient.)
2011-02-09
On flow (a digression)
Recently I saw a blog entry (via Hacker News) from a developer who had turned off the digital clock in his machine's menu bar; he advocated doing this to reduce the potential for distraction from glancing at it. What I took away from his entry, though, is the sure knowledge that people are very different, because what he described doesn't apply to me at all.
You see, when I'm focused on something, when I am in a state of flow, the digital clock on my screen is not going to knock me out of it. In fact, I'm demonstrably capable of ignoring far more intrusive (and occasionally important) things like my mail notifier. If I'm head down and programming away, I can dismiss a mail popup in seconds with essentially no attention (it's basically a fidget behavior, like saving editor buffers); a digital clock that's been there for years and I'm completely habituated to has no chance at all. When I'm working away like this, you practically have to use an axe to get my attention.
(This has its downsides in my job, since sysadmins are kind of supposed to be at least a bit attentive to the environment around them. Instead, I'm disgruntled that the outside environment is intruding into my valuable programming time.)
I was reminded of this rather vividly today when I sat down at 9am, started programming, and surfaced sometime after 2pm. Maybe 3pm, really. I paid no attention to the time (among other things) during all of this; in fact, I don't think I got up from my chair. After all, I was busy.
(I'm aware that this probably wasn't the healthiest thing in the world. But I was too busy to take breaks, and by 'busy' I mean 'absorbed'. (Even now it's hard not to be programming.))
Now, if I'm not in this absorbed state it's another matter. But then the real problem is that the work is not absorbing me, not that the clock is distracting me; if I wasn't distracted by the time, I would be distracted by a nearly infinite number of other things. If you want to be distracted, there are always distractions available.
(This is especially so for sysadmins, because we tend to have a basically endless collection of little things we could be doing that are nominally productive. At the extreme, we could be cleaning up our inboxes or our home directories.)
2011-02-08
Thinking realistically about SQL database field sizes
There is a certain sort of SQL database designer who cares a lot about the micro-efficiency of their schema fields, and as part of this has a deep concern with minimum field sizes. These people want to know the exact size requirements of everything and specify the field sizes as the minimum possible. If you have a table with graduate student numbers and graduate student numbers happen to be ten digits or less, they will specify that field as ten characters. Period.
(They will probably make it a CHAR instead of a VARCHAR field, too, because fixed-size fields are, or at least were, marginally more efficient. Sometimes.)
I consider this an unhealthy obsession and also dangerous to the long-term health of your database. Rather than worrying about field size, I feel that database design should err on the side of having lots of room. Throw in extra space, make fields wider than you think you need, and if you have a field with no particular natural length, make it big just in case. Maybe you will never need to put a 512-character or 1024-character comment in your database, maybe they'll all be less than 80 characters, but you don't know.
(In fact I should go back through the schema for my current project and widen a bunch of fields.)
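As a concrete sketch of what I mean (the model and the specific sizes here are invented for illustration, not taken from my actual schema), erring on the side of room in a Django model looks something like this:

    from django.db import models

    # An illustrative model with deliberately generous field sizes.
    class GradStudent(models.Model):
        # Student numbers are at most 10 digits today; 30 characters is
        # cheap insurance against the specification changing someday.
        student_number = models.CharField(max_length=30)
        # Names have no natural maximum length, so go well past the
        # largest name you ever expect to actually see.
        name = models.CharField(max_length=200)
        # Free-form comments should not be capped at 80 characters just
        # because that's all anyone has typed so far.
        comment = models.TextField(blank=True)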
In some sense this is extravagantly wasteful. But going the other way can be equally wasteful in the long run, because it amounts to premature optimization in most cases (and dangerous optimization at that). Disk space is almost invariably cheap unless you have a large database (or a very constrained environment), while a database schema change that is forced on you because your carefully minimally-sized fields turned out to be too small can be very expensive.
Do my graduate student number fields need to be 30 characters long? Probably not, but the extra 20 characters are extremely cheap insurance against change at the small scale I'm working at. And change happens.
(Also, many fields are nowhere near as clear cut as something that has an explicit specification. How big should you make your 'name' field, for example? My answer is 'quite big', at least twice as big as the largest name I think we're ever going to encounter.)
PS: these days a database has to be very large indeed before it counts as large enough that this makes much of a difference. If you think that your database is big enough that field size is going to matter, you really want to run the numbers on how many records you will have and how much space you'll be saving. My gut reaction is that in most environments, a data size that's under plural gigabytes is too small to matter.
(And then you'll want to investigate your specific database to assess your real space savings. You may find that various internal issues mean that you can enlarge some fields from their utter minimum without changing how much disk space you use.)
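To make 'running the numbers' concrete, here is the sort of back-of-the-envelope arithmetic I mean (all of the figures are made up for illustration):

    # How much disk does widening a field by 20 characters actually cost?
    rows = 1_000_000          # a generously large table by our standards
    extra_bytes_per_row = 20  # assumes the extra space is actually used
    extra_mib = rows * extra_bytes_per_row / (1024 ** 2)
    print(f"{extra_mib:.0f} MiB of extra space")  # about 19 MiB

    # With VARCHAR-style fields that only store what you put in them,
    # the real cost is usually zero until the data itself grows.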
PPS: okay, there are also potential performance-related reasons for keeping rows small (for both reads and writes), but again most people aren't going to be operating anywhere near these levels. And if you are operating at this level, you should already know it and be looking into SSDs.
2011-02-02
My issues with primary keys are not Django specific
A commentator on my recent entry about learning from my Django schema design experience wrote, about my abandoning explicit primary keys:
I'm not sure your data will be best served by giving up the database schema you want in deference to the conventions and limitations of the (Django ORM) framework.
I want to clarify something: the issues in my schema that are driving me away from using explicit primary keys and to surrogate keys are not Django specific.
I've now designed two database schemas, one purely in SQL and one as a Django schema. One of the major differences between them is where most of the data came from and how immutable it was once entered. In my pure SQL design most of the data (and almost all of the primary keys) came from automated systems elsewhere. One primary key was entered by users, but it had to be immutable once entered. It's easy to use explicit primary keys in a database schema like this, because errors and changes are rare (you hope) and you can't deal with them anyways; they have to be dealt with by the people and the upstream systems that build data for you.
(Attempts to fix errors and change things locally will just result in you getting out of sync with your data sources. This rarely ends well.)
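To sketch the shape of that first design (with invented table and column names; the real schema was larger), the natural keys looked something like this:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    # The primary key comes from outside and never changes once entered,
    # so other tables can safely hang foreign keys directly off it.
    conn.execute("""
        CREATE TABLE accounts (
            login   TEXT PRIMARY KEY,  -- assigned upstream, immutable here
            created TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE audit (
            login  TEXT NOT NULL REFERENCES accounts(login),
            action TEXT NOT NULL
        )""")

If a login could change, every row in audit would have to be re-stitched to follow it; the design only works because the key is immutable.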
My current schema is not like this. Essentially all of the data is entered by hand directly into my database and so is subject to errors, which we are on the hook to correct. Many of the natural primary keys actually are subject to change, even if that change is infrequent (people's names do change, and so sometimes do their logins, for good reason). One otherwise natural primary key can sometimes be null, and this is not supported for UNIQUE fields by all database engines (even apart from other issues). This database design is simply not a good fit for explicit primary keys, no matter what ORM or SQL database I use for it. If I did use explicit primary keys, I would expect to be doing a reasonable amount of re-stitching foreign key relationships every so often.
(This assumes that I do not make entry lifetime mistakes like linking my audit table entries directly to account requests. That too would be a problem in any database.)
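By contrast, the surrogate key version of my current situation looks roughly like this (again a hedged sketch with invented field names, not my actual models):

    from django.db import models

    class Person(models.Model):
        # Django adds an implicit surrogate primary key:
        #   id = models.AutoField(primary_key=True)
        # so the mutable natural attributes are just ordinary fields.
        name = models.CharField(max_length=200)   # names do change
        login = models.CharField(max_length=64, unique=True)  # so do logins
        # A sometimes-null 'natural key' can't be a primary key at all,
        # and UNIQUE plus NULL is handled differently across engines, so
        # here it is simply a nullable field.
        student_number = models.CharField(max_length=30, null=True, blank=True)

    class AuditEntry(models.Model):
        # Foreign keys point at the stable surrogate id, so correcting a
        # name or changing a login never requires re-stitching these rows.
        person = models.ForeignKey(Person, on_delete=models.PROTECT)
        note = models.TextField()

Fixing a typo in a name or login is then a one-field update; nothing that references the row has to know or care.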
So, in summary: not all database situations can sensibly use explicit primary keys. Sometimes surrogate keys really are the right choice, not something 'forced' on you by using inferior or awkward technology.