Why you should always allow version 1 to be specified

December 11, 2008

This is one of those things that is easier to discuss in specifics, so I want to say that I'm not picking on ZFS here. Well, not too much.

Like a number of other filesystems, ZFS has several different versions, with later ones adding new features not supported by earlier code; this change is marked with an on-disk version number. Sensibly, ZFS allows you to explicitly set the version of a new filesystem when you create it, so that you can use a new system but create a filesystem that an old system can read (necessary in, say, SAN environments, where a filesystem may have to be brought up on a machine still running an older OS release).

(Similar versioning is common in many contexts where you have persistent objects and multiple generations of them floating around.)

However, ZFS made an interface mistake. On systems that only have ZFS version 1, you cannot explicitly set the version to version 1 when you create a filesystem; you can't explicitly set the version at all, presumably because there is only one version to start with.

This is a problem because of how it affects tools that sit above ZFS. In order to always create version 1 filesystems, they have to detect whether they are running on an original ZFS system (that doesn't allow the version to be specified but where an unversioned create gives you a version 1 filesystem) or on a newer one (where versioned create is allowed but an unversioned create gives you a more recent filesystem, so they have to use versioned create).
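To make the extra work concrete, here is a rough sketch of the dance such a tool ends up doing. Everything in it is a hypothetical stand-in (the `OriginalSystem` and `NewerSystem` classes, the `supports_versioned_create` flag, and `create_version_1` are invented for illustration and are not real ZFS interfaces). A real tool probably has no such convenient flag and must probe for the feature some other way.

```python
# Hypothetical stand-ins for two generations of a storage system.
# None of these names are real ZFS interfaces.
class OriginalSystem:
    """Only knows version 1 and offers no way to ask for a version."""
    supports_versioned_create = False

    def create(self, name):
        return {"name": name, "version": 1}


class NewerSystem:
    """Knows versions 1 and 2; an unversioned create now yields version 2."""
    supports_versioned_create = True
    current_version = 2

    def create(self, name, version=None):
        if version is None:
            version = self.current_version
        if not 1 <= version <= self.current_version:
            raise ValueError(f"unsupported version {version}")
        return {"name": name, "version": version}


def create_version_1(system, name):
    """What a tool above the storage layer is forced to do to get version 1."""
    if system.supports_versioned_create:
        return system.create(name, version=1)
    # On the original system, asking for a version is an error, so the tool
    # must use the unversioned call and trust that it means version 1.
    return system.create(name)
```

The tool needs two code paths to get one result, and the unversioned path is only correct as long as it never runs on a newer system by mistake.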

And thus: if you are going to version things and to allow older versions of things to be created (and you should), you should build in the ability to ask for a specific version right from the start, even when you only have version 1. 'Create with version X' should be one of your initial APIs. Do not wait until you have version 2 to add it, because it will not help people as much as you think.
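As a minimal sketch of what this could look like in a first release (the names `SUPPORTED_VERSIONS`, `CURRENT_VERSION`, and `create` are made up for illustration), the versioned create accepts exactly one value but is there from day one:

```python
# Hypothetical first release: only version 1 exists, but callers can
# already ask for it by number.
SUPPORTED_VERSIONS = (1,)
CURRENT_VERSION = SUPPORTED_VERSIONS[-1]


def create(name, version=CURRENT_VERSION):
    """Create an object, optionally pinned to an explicit version."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version {version}")
    return {"name": name, "version": version}


# A tool that always wants version 1 can say so from the very first release,
# and the same call keeps working once later versions are added.
obj = create("example", version=1)
```

When version 2 arrives, only `SUPPORTED_VERSIONS` has to grow; code that already passes `version=1` needs no changes and no detection logic.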

(Remember: anyone who is specifically creating older versions of objects quite likely has systems running the older version of your code (otherwise they don't have much use for those old versions). Which means that their code probably has to run on top of that older version of your code.)

The other way to put it is that if you start out with an unversioned 'create' operation and no versioned 'create', its API is not really 'create with current version' but 'create with version 1', because this is its actual behavior. You should not then later change its API behavior to be 'create with current version'; if you do, you are causing problems precisely because you have changed the API.


Comments on this page:

By Dan.Astoorian at 2008-12-11 12:08:39:

The first version of a product is very frequently a retronym. Unless a product is exceptionally well-roadmapped, it's often unclear when the first version is released whether or how the next version will even be able to accommodate such backward compatibility, so it may be difficult to guess whether the first version should be called "1," or "1.0," or "0.9", or "--disable-feature-x" (where Feature X is the only non-forward-compatible feature introduced by the subsequent version and it's undesirable to bump the major version just for that change).

A sensible strategy, then, would probably be this: when version 2 is released (supporting the "create with version 1" operation), release a feature enhancement for the older version's tools which implements the same "create with version 1" option as a no-op.

(The end-user may even implement this by installing a wrapper around the original tool.)
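A rough sketch of the argument-handling part of such a wrapper follows. The `--version=N` option syntax and the `strip_version_flag` name are assumptions made up for this example, not the options of any real tool. The wrapper would drop the option when it asks for the only version the old tool can create, refuse anything else, and hand the remaining arguments to the original tool.

```python
def strip_version_flag(argv, only_version="1"):
    """Return argv without a hypothetical '--version=N' option.

    Exits with an error if a version other than the one the old tool
    creates is requested; otherwise the option is treated as a no-op.
    """
    passthrough = []
    for arg in argv:
        if arg.startswith("--version="):
            requested = arg.split("=", 1)[1]
            if requested != only_version:
                raise SystemExit(
                    f"this system can only create version {only_version}"
                )
            continue  # requested version matches: drop the option
        passthrough.append(arg)
    return passthrough
```

The returned list is what the wrapper would then pass on unchanged to the original tool.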

--Dan

By cks at 2008-12-12 16:52:19:

In the sort of situation that I am talking about, people have already thought ahead enough to put a version number field into the objects that they are creating. (For example, I am pretty confident that ZFS filesystems have had version numbers since the start.)

When you put an explicit version number into your objects, you should also put an explicit version number into your API, even if it only accepts one value in your first release.

From 209.131.62.113 at 2008-12-18 14:33:20:

A related idea is that format specifications should always include a built-in offset indicator at the beginning. I'm thinking for example of the mess of MP3 ID3 tags. The v1 ID3 tags go at the end of the file and have a small fixed size (128 bytes, thus possibly padded), so you have to seek all the way to the end of the file to find them. The v2 tags go at the beginning of the file so they can be as large as needed, but then players have to skip over them to get to the music. I am probably getting a few details wrong but I think essentially this is how it works and I am too lazy to go read Wikipedia right now.

Instead the solution to both of these problems is an offset indicator (which points to the start of the file data) at a fixed location close to the beginning of the file. Then any program reading the file can find that locator easily and jump straight to the data section of the file. Also then you don't have to worry about running off the end and hitting metadata when you just want the data.

I realize there are issues here like how many bytes do you reserve for the locator and what happens if it is too small (say it's 32 bits and your file goes over 4GB in size), etc. However I think the basic idea makes data files easier to deal with and more flexible.
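To illustrate the idea, here is a minimal sketch of a made-up container format. The `DEMO` signature, the 64-bit offset field, and the choice to put metadata between the header and the data are all assumptions for this example and do not describe any real file format.

```python
import struct

MAGIC = b"DEMO"                  # hypothetical 4-byte file signature
HEADER = struct.Struct(">4sQ")   # signature + 64-bit big-endian data offset


def build_file(metadata: bytes, data: bytes) -> bytes:
    """Lay out header, then metadata of any size, then the data."""
    data_offset = HEADER.size + len(metadata)
    return HEADER.pack(MAGIC, data_offset) + metadata + data


def read_data(blob: bytes) -> bytes:
    """Jump straight to the data using the offset stored in the header."""
    magic, data_offset = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a DEMO file")
    return blob[data_offset:]
```

A reader never has to parse or even understand the metadata; it reads the fixed-size header and seeks. Using a 64-bit field sidesteps the 4GB limit of a 32-bit locator at the cost of four more bytes.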

I think the PNG file specification is a good place to look for ideas on how to do file metadata correctly.

Phil Hollenback
www.hollenback.net

Written on 11 December 2008.

Last modified: Thu Dec 11 01:49:03 2008