The advantage of garbage collection for APIs

April 21, 2010

Here's a question that interests me: why don't I have a standard C version of my Python warn() function? The Python version shows up in most programs I write, but in my C programs I generally just fprintf() errors directly to stderr instead of having a utility function for it.

Part of the reason is that a C version of warn() really calls for a different and more complicated set of arguments. Instead of a single string, the natural C-ish approach is printf()-like, creating a prototype of 'warn(const char *fmt, ...)'. In turn this makes warn() a non-trivial function, because varargs functions in C are vaguely annoying.

(Things get even more interesting when one implements die(), since you can't have it call warn(); either you duplicate warn()'s code or you wind up needing a vwarn() with the prototype 'vwarn(const char *fmt, va_list ap)' that both warn() and die() call.)
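To make the shape of that concrete, here is a minimal sketch of the three functions. The names carry an x prefix so they cannot collide with the err.h functions of the same name on Linux and the BSDs; the progname variable and the "prog: message" output format are assumptions made for illustration, not necessarily what the Python version prints.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical program name used as the message prefix. */
static const char *progname = "myprog";

/* Shared core: takes a va_list so that both wrappers can forward to it. */
void xvwarn(const char *fmt, va_list ap)
{
    fprintf(stderr, "%s: ", progname);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
}

/* Print a warning to stderr and carry on. */
void xwarn(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    xvwarn(fmt, ap);
    va_end(ap);
}

/* Print a message to stderr, then exit with a failure status. */
void xdie(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    xvwarn(fmt, ap);
    va_end(ap);
    exit(EXIT_FAILURE);
}
```

A call then looks like xwarn("cannot open %s", path). Note that xdie() cannot simply call xwarn(fmt, ...), because C has no way to forward a "..." argument list; it can only hand on a va_list.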

But why does the C version of warn() need a different API than the Python one? One answer is that Python has a first-class operator to format printf-like strings (the string '%' operator, in a marvelous abuse of operator overloading) and C doesn't, so in Python it's idiomatic to format strings directly in your code, instead of handing the format and the arguments to another function. This isn't the whole story, though, since C has sprintf() and could do many of the same tricks if people wanted to.

The problem with sprintf() (and why in practice it is not used for this) is that you have to give it a buffer. If you use sprintf() you have to think about how big a buffer you need and what you do if it isn't big enough. If you use fprintf(), you don't have to think about all of that, so people use fprintf(); in C, people will go to a lot of effort to avoid having to think about buffer issues.
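As a rough illustration of the thinking involved, here is a sketch that uses C99's snprintf() (rather than plain sprintf()) so that a too-small buffer is at least detectable. The format_path() helper is hypothetical; the caller still has to pick the buffer size and decide what to do when the answer is "it didn't fit".

```c
#include <stdio.h>

/* Hypothetical helper: formats "dir/file" into a caller-supplied buffer.
   Returns 1 if the whole string fit, 0 if it was truncated or on error. */
int format_path(char *buf, size_t size, const char *dir, const char *file)
{
    /* snprintf() returns the length the full string would have had,
       not counting the terminating NUL. */
    int n = snprintf(buf, size, "%s/%s", dir, file);

    return n >= 0 && (size_t)n < size;
}
```

With fprintf() none of this exists: there is no buffer to size and no truncation case to handle.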

And the reason that sprintf() needs you to supply the buffer is so that no one has to worry about allocating and especially deallocating the buffer, because memory management is a pain in the rear in C. This is not just because you have to do it, but also because you have to decide where it's done and which function has to look after it, and this generally complicates the code. (For example, warn() can't just free the string it's passed, which means that you can't write 'warn(strformat(fmt, ...))' because this leaks the string.)
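For the sake of argument, here is one way such a strformat() could be written in C99, measuring with vsnprintf() first and then allocating. It is only a sketch under the assumption that the caller owns the result, which is exactly what makes the nested call leak.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocating formatter: returns a malloc()ed string that the
   caller must free(), or NULL on a formatting or allocation failure. */
char *strformat(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* the arguments are walked twice */
    len = vsnprintf(NULL, 0, fmt, ap);     /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }

    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass: format */
    va_end(ap2);
    return buf;
}
```

Someone has to hold the pointer long enough to free() it, so the string needs a named variable and an explicit free() after warn() returns; nesting the call inside warn()'s argument list throws the pointer away.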

Thus I come to the conclusion:

A subtle advantage of garbage collection is that it enables APIs that would otherwise be impossible, or at least dangerous.

Garbage collection is the essential glue that enables my Python versions of warn() and die() to have their simple APIs, because it is what makes Python's '%' operator convenient (among many other Python APIs). You could implement all of the operations of the Python version in C, but you shouldn't; in a non-GC'd language, object allocation is to be avoided as much as possible, and certainly turned into someone else's problem.

(Hence sprintf() does not allocate anything; finding a buffer for it is your problem.)


Comments on this page:

From 66.243.153.70 at 2010-04-21 13:33:39:

GNU provides asprintf and vasprintf to allocate a buffer of suitable size and print into it. If you know you are calling them then you can know that you will always need to free it.

Yes it does still require you to free it but you don't need to worry (or at least not too much) about the buffer not being big enough.

Icarus
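For reference, a small sketch of what calling asprintf() looks like in practice. It hands the buffer back through a pointer argument and returns a length (or -1 on failure) instead of returning the string. The join_path() helper is hypothetical, and _GNU_SOURCE is the feature macro glibc wants before it declares the function.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* asprintf() is a GNU/BSD extension, not standard C */
#endif
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds "dir/file" in freshly allocated memory.
   Returns NULL on failure; the caller owns the result and must free() it. */
char *join_path(const char *dir, const char *file)
{
    char *out;

    if (asprintf(&out, "%s/%s", dir, file) < 0)
        return NULL;   /* on failure the contents of out are undefined */
    return out;
}
```

The buffer is always big enough, but the free() obligation has only moved to whoever calls join_path().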

From 65.172.155.228 at 2010-04-21 23:37:21:

It's hard to say for sure, but it looks like you just have less experience with C than with Python. Vararg functions just aren't that hard (and I see the exact same problems wrapping Python's "logging" module functions).

In the specific case of warn/die they are called warn/err and are in err.h

Not that I don't appreciate the expressive power of GC, but I don't think your example is a good one. A much better example, IMO, would be something like the Schwartzian Transform:

http://www.stonehenge.com/merlyn/UnixReview/col64.html

From 65.172.155.228 at 2010-04-22 00:09:01:

Of course I now go to check that warn/err do the flushes (I was pretty sure they did), and I see they do neither of them ... sigh.

But still, you can do both xwarn/xdie with a call to vwarn(), the flushes and the 3 line stdarg scaffold.

By cks at 2010-04-22 01:58:59:

Varargs functions aren't hard per se in C, but they are annoying; there is an extra dance that you have to do, and you need special versions of every other function that you're going to pass your varargs to, and so on.

(And as is noted in passing in the Linux stdarg(3) manpage, you can't do certain useful things with varargs functions.)

Note that warn() and err() and company are not portable; I believe they are Linux and *BSD only at this point. (Certainly they are not there on Solaris, and for my sins I still care about it.)

By nothings at 2010-04-22 02:37:17:

Once you've written enough varargs functions, they are pretty rote/trivial. The idea of making one that takes a va_list, and the other a wrapper around it, is in fact understood to be the canonically correct way of doing varargs, although I can't remember the places I've seen that recommend it offhand.

I think your analysis of the sprintf vs. allocation situation in C is actually entirely misrepresentative.

In standard C, there are no functions that visibly allocate memory other than the memory allocation functions themselves. (Some functions like fopen allocate memory under the hood; some functions like strdup() visibly allocate memory, but they are POSIX, not standard C.)

I don't think this reflects some difficulty with memory management -- people write lots of code all the time that allocates memory and has to keep track of who "owns" it and thus is responsible for freeing it, and strdup() shows it's not exactly a complicated problem. Rather, I think this simply reflects the standards of the time when C and its standard library were designed. People may not have been comfortable with the idea of it, but moreover I imagine it was just an efficiency concern... better to let the client manage their memory themselves.

And since then it's been a failure of the people with control of C to significantly expand its standard library (thus leaving it totally overrun not just by C++ with the STL, but even more so by Perl or Python with their massive libraries, or, in the Windows world, rather brutally overrun by C#--indeed I think people overly value C#'s garbage collection and fail to realize it's all the standard libraries that make it significantly better).

It is true that a case like warn() is harder to handle this way (which is why I would just make a varargs warn(), and in fact have done similar things). You can always engineer around that, but I hope we'd agree that the grotesqueness of

  free(warn(asprintf("whatever %d %d",x,y)));

while implementable, is not actually a good idea.

One could theoretically make an asprintf() that alloca()d in its parent's stack (which is totally possible to implement but not actually available in any version of C as far as I know--much like the compiler has to know about alloca, the compiler would have to know about 'asprintfa').

You could always write it out:

  char *s = asprintf("whatever %d %d",x,y); warn(s); free(s);

isn't so bad, and if you want the warn in front:

  s = warn(asprintf("whatever %d %d",x,y)); free(s);

isn't the end of the world.

There are other possible ways, too.

Anyway, there is a proposal on the table for C1X to change the nature of the standard library and allocation--that is, to embrace allocation. Unfortunately it was motivated entirely by security: if the called function is responsible for allocating the memory to store something, you can't overflow a buffer. See TR 24731-2, Extensions to the C Library, Part II: Dynamic Allocation Functions. http://www.open-std.org/jtc1/sc22/wg14/

Of course, what I actually do most of the time in my code is use an sprintf() from my personal standard library that formats into a 4K static memory buffer, so it's not threadsafe and truncates at 4K, but I can just do things like:

   f = fopen(stb_sprintf("%s/%s",path,filename), "rb");

Of course it will fail if you ever hit 4K. I never do, with the kinds of things I'm doing, but I can imagine this isn't good enough for your needs.
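A minimal sketch of what such a static-buffer formatter might look like. This is an assumption-laden stand-in (the name static_sprintf() and everything inside it are hypothetical), not the actual stb_sprintf() mentioned above.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical static-buffer formatter. Not threadsafe, and each call
   overwrites the result of the previous one. */
const char *static_sprintf(const char *fmt, ...)
{
    static char buf[4096];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);   /* silently truncates at 4K */
    va_end(ap);
    return buf;
}
```

Because nothing is allocated, nothing needs freeing, which is why it can be nested straight into another call. The price is that two uses in the same expression share one buffer, so only the most recent result survives.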

Certainly I do not disagree with your conclusion about APIs. Indeed, I would argue it's not subtle at all; I think it's widely believed to be difficult to do deeply functional languages or things like closures without garbage collection. (Where 'difficult' means 'impossible to deliver something in practice anybody would want to use', although we will see how the closures coming in C or C++ (I forget which) turn out.)

Written on 21 April 2010.

Last modified: Wed Apr 21 02:15:49 2010