Wandering Thoughts archives

2010-10-03

An API mistake Unix has made several times

Unix has generally had decent APIs, but every so often Unix people have been a bit too concerned with minimalism and with storing data as efficiently as possible. Several times this has produced string-related APIs that practically beg for code to make mistakes, because of how they specify string termination.

Here is an abstracted example. Consider an API where you use the following structure:

#include <stdint.h>

struct dirent {
    uint16_t  ino;
    char      name[14];   /* NOT null-terminated when all 14 bytes are used */
};

You will notice that there is nothing explicit to tell you how long the name is. Instead, the API has a rule: the string in name is null-terminated unless it is exactly 14 characters long. This maximizes the length of the name that you can store in the structure, at the cost of complicating code that has to get the name out.

You can guess what happens next. Many people who write code that has to deal with this structure simply use strcpy() to copy the name to their own string, instead of the more complicated version that also deals with the case of a 14-character name that is not null-terminated. The resulting programs work most of the time, because most of the time the name is shorter than 14 characters, but they blow up oddly every so often in what appear (to their users) to be unpredictable patterns. Over time, a semi-superstition evolves to the effect that '14-character names are bad, avoid them'.
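To make the 'more complicated version' concrete, here is a minimal sketch of a copy that handles both cases. The names dirent14, NAME_FIELD_LEN and copy_name are made up for illustration (the structure is renamed so that it does not clash with the real <dirent.h>); this is one reasonable way to do it, not how any particular Unix program did.

```c
#include <stdint.h>
#include <string.h>

#define NAME_FIELD_LEN 14   /* size of the name field in the example structure */

/* Hypothetical stand-in for the example structure. */
struct dirent14 {
    uint16_t ino;
    char     name[NAME_FIELD_LEN];
};

/* Copy the name into out as a real C string and return its length.
 * out must have room for at least NAME_FIELD_LEN + 1 bytes. */
size_t copy_name(const struct dirent14 *d, char *out)
{
    /* Look for a terminator, but never beyond the end of the field. */
    const char *nul = memchr(d->name, '\0', NAME_FIELD_LEN);
    size_t len = nul ? (size_t)(nul - d->name) : NAME_FIELD_LEN;

    memcpy(out, d->name, len);
    out[len] = '\0';        /* always terminate the copy ourselves */
    return len;
}
```

A plain strcpy(out, d->name) gives the same result for short names, which is exactly why the bug survives testing; with a full 14-character name it keeps reading past the end of the field until it happens to hit a zero byte.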

(This is of course yet another example of having to be sure that something actually is a C string, as well as the fact that exceptions are hard for people to remember.)

I blame this partly on minimalism because one of the ways to deal with this would have been to make some accessor functions and tell people to always use them. Instead, the structures were simply exported to people directly and every programmer using them had to get the whole access dance correct. This has the minimalism of avoiding an 'unnecessary' and obvious function in the standard library, at the cost of having people get it wrong with reasonable frequency.
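As a sketch of what such an accessor might have looked like, here is a hypothetical dirent14_name() function. The function name, the renamed structure and the return convention are all assumptions made for this example; nothing like it existed in the historical APIs. It leans on the standard behaviour of a precision on %s, which limits how many bytes are read from the argument, so the source does not have to be null-terminated.

```c
#include <stdint.h>
#include <stdio.h>

#define NAME_FIELD_LEN 14   /* size of the name field in the example structure */

/* Hypothetical stand-in for the example structure. */
struct dirent14 {
    uint16_t ino;
    char     name[NAME_FIELD_LEN];
};

/* Hypothetical accessor: write the name into buf as a proper C string.
 * Returns the length of the name, or -1 if buf is too small to hold it
 * (in which case buf holds a truncated, still terminated, prefix
 * whenever buflen is greater than zero). */
int dirent14_name(const struct dirent14 *d, char *buf, size_t buflen)
{
    /* "%.*s" reads at most NAME_FIELD_LEN bytes from d->name and stops
     * early if it meets a null byte, which covers both cases of the rule. */
    int n = snprintf(buf, buflen, "%.*s", NAME_FIELD_LEN, d->name);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

Callers would then never touch the name field directly, and the one place that has to get the termination rule right is inside the library.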

(Off the top of my head, I believe this mistake was made in at least the original V7 directory format and in some versions of utmp records.)

My meta-moral for this is: make things in your API that look like C strings actually be C strings. If people can treat them as C strings and have this work most of the time, a significant number of people will treat them as C strings regardless of what you say in your documentation. The corollary is that if you have things that are not C strings, you should consider actively frustrating attempts to use them as such, by means like never null-terminating them. If you don't want to do this, make accessor functions that do it right and don't expose the raw structures.

programming/UnixAPIMistake written at 01:52:58
