You should delete obsolete data files

October 10, 2009

Around here, our systems have layers of accreted history; scripts that are generating files that are used by, well, we're sometimes not entirely sure any more. Every so often we reach the point where we turn another one of the file-generating scripts off (we take them out of crontabs, we remove invocations of them from other scripts and so on).

Having done this for a while now, I have a suggestion: when a data file becomes obsolete and is no longer updated, you should immediately delete it. Don't keep it around just in case anything still refers to it, because if there are any remaining users, you want things to break right away, while you still remember what you just changed.

(If you are lucky, things will break with error messages about 'cannot read file X' and it will be obvious why. But sometimes things will just malfunction, and then it really helps to have a recent change to blame.)

The problem with leaving such files around just in case is that you still get breakage, but it is much more subtle breakage. What happens is that the file slowly slips further and further out of correspondence with reality (as reality keeps changing but it doesn't), and sooner or later this divergence starts producing odd results. Things work for old accounts (or old bits of data in general) but not for new enough ones; things go to the wrong place; deleted things mysteriously resurface or still part-work. Straightforward, immediate breakage is more painful (and perhaps more embarrassing if you overlooked something important) but is much better in the long run.

I admit that this is hard for me to do; I'm a packrat by nature, and even in an environment with version control systems and backups my instinct is to keep old data files around just in case. But just in case almost never shows up, so I need to wield the rm's more often.

Comments on this page:

From at 2009-10-10 11:48:25:

I use a similar approach, though it's a bit more forgiving than deleting the file:

mv scriptnouses scriptnouses.${REMOVAL_DATE}.REMOVEME

Periodically I will scan the pertinent directories for REMOVEME files, and rm those files if they are older than REMOVAL_DATE (I usually set this to something like today's date + 90 days). If someone screams because something broke, I can then quickly revert back to the old version via cp / mv. If no one yells, I can go ahead and remove it.

- Ryan
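
The periodic scan described in the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's actual script: the function name purge_removeme is hypothetical, and it assumes REMOVAL_DATE is written as YYYYMMDD, which the comment does not specify. It also assumes file names do not contain newlines.

```sh
#!/bin/sh
# Sketch: delete NAME.YYYYMMDD.REMOVEME files whose embedded date has passed.
# The YYYYMMDD format and the function name are assumptions for illustration.

# usage: purge_removeme DIR [TODAY]
# TODAY defaults to the current date; passing it explicitly helps dry testing.
purge_removeme() {
    dir=$1
    today=${2:-$(date +%Y%m%d)}
    find "$dir" -type f -name '*.REMOVEME' | while IFS= read -r f; do
        stamp=${f%.REMOVEME}    # strip the trailing marker
        stamp=${stamp##*.}      # keep only what follows the last dot
        case $stamp in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;      # no 8-digit date in the name: leave it alone
        esac
        # Remove only once the removal date is strictly in the past.
        if [ "$stamp" -lt "$today" ]; then
            rm -- "$f"
        fi
    done
}
```

Something like purge_removeme /some/dir, run from cron, would then do the periodic cleanup. With GNU date, a matching REMOVAL_DATE for the mv step could be computed as date -d '+90 days' +%Y%m%d; other date implementations spell this differently.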

By cks at 2009-10-10 15:00:08:

I think there are two things here: preserving old scripts and preserving the files they generate. This is a sensible approach for preserving old scripts, but I'd argue that it's not a sensible way of preserving the old files that they generate; if you turn out to really need the file, you want to start running the script again, not restore an old version of the file.

Written on 10 October 2009.

Last modified: Sat Oct 10 01:19:57 2009