C's malloc() and free() APIs are reasonable APIs for C

September 10, 2022

I've written about how the free() API means that C memory allocation needs to save some metadata, and before that about the effects of malloc() and free() on C APIs. However, despite these issues I think that C's malloc() and free() APIs are reasonable ones for C to have, and that they've probably helped C remain relevant over the years.

To start with, both malloc() and free() have what I would call a minimal API, where they take only the arguments that are really needed; each takes exactly one. You can't allocate memory without saying how much in some way, and you can't free memory without saying what to free, so it's hard to get a smaller API without some form of automatic handling of memory (including Rust's automatic lifetime management). Having memory allocation as an explicit API has also meant that you can readily write C code that doesn't allocate memory (except on the stack), or use entirely different allocation functions that you built yourself. Both OS kernels and embedded code tend to use something completely different from malloc() and free().
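As a rough illustration of "allocation functions that you built yourself", here is a minimal sketch of a bump allocator over a static buffer, the sort of thing small embedded code might use in place of malloc(). The names arena_alloc() and arena_reset() and the 4 KiB size are invented for this example, and it assumes a C11 compiler.

```c
#include <stddef.h>

/* Hypothetical fixed-size arena; the name and the 4 KiB size are illustrative. */
#define ARENA_SIZE 4096

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Return size bytes aligned for any object type, or NULL if the arena is full. */
void *arena_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    /* Round the current offset up to the next multiple of align (a power of two). */
    size_t start = (arena_used + align - 1) & ~(align - 1);
    if (start > ARENA_SIZE || size > ARENA_SIZE - start)
        return NULL;
    arena_used = start + size;
    return &arena[start];
}

/* Release everything at once; this sketch has no per-object free. */
void arena_reset(void)
{
    arena_used = 0;
}
```

Nothing here touches malloc() or free(), and nothing in the language objects; the cost is that individual objects can't be released, only the whole arena.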

That you don't have to pass a length (or any other qualifier) to free() is also a good fit with C's approach to strings, which don't have an explicit length field. Certain sorts of C code frequently deal with allocated strings of uncertain length and wind up freeing them; if free() required an explicit length, there would be a lot of additional strlen() calls. I think that the C malloc() and free() API (and realloc()) is quite a good fit as a simple string-focused memory allocator.
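To make the strlen() point concrete, here is a small sketch. join() is an illustrative helper, and sized_free() is a hypothetical stand-in for a free() that demands a length; here it simply wraps free().

```c
#include <stdlib.h>
#include <string.h>

/* Join two strings into a newly allocated one; the caller frees the result. */
char *join(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* also copies b's terminating NUL */
    return out;
}

/* Hypothetical length-taking free; a real one would use size instead of stored metadata. */
void sized_free(void *p, size_t size)
{
    (void)size;
    free(p);
}

/* With free(), the caller needs only the pointer. */
void discard_with_free(char *s)
{
    free(s);
}

/* With a length-taking free, the caller has to rediscover the size first. */
void discard_with_sized_free(char *s)
{
    if (s != NULL)
        sized_free(s, strlen(s) + 1);
}
```

Note that strlen(s) + 1 only matches the allocated size if the string was never shortened in place, so a length-taking free would add room for mistakes as well as extra calls.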

That C's memory allocation APIs are so minimal has allowed for a lot of experimentation with some approaches to memory allocation. There are a lot of more complicated allocators that can be hidden behind the C API, for things like size-class-based allocation and so on. A more feature-rich API (for example, one that had an explicit idea of 'arenas') might have foreclosed certain sorts of internal implementations, or at least made them more complicated.
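As a sketch of how something like size classes can sit behind a malloc()/free()-shaped interface, here is a toy allocator that keeps per-class free lists and records the class in a header before each block. The sc_malloc() and sc_free() names and the class sizes are invented for illustration, and it leans on the system malloc() as its backing store, which a real allocator would not do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative size classes: 16, 32, 64, ... 1024 bytes. */
#define NUM_CLASSES 7
#define MIN_CLASS_SIZE 16

/* Header stored just before each block, so sc_free() needs only the pointer. */
typedef union header {
    struct {
        size_t cls;           /* class index, or NUM_CLASSES for oversized blocks */
        union header *next;   /* free-list link, used only while the block is free */
    } h;
    max_align_t align;        /* keeps the user data suitably aligned */
} header;

static header *free_lists[NUM_CLASSES];

/* Smallest class that fits size; NUM_CLASSES means "too big for any class". */
static size_t class_for(size_t size)
{
    size_t cls = 0, cap = MIN_CLASS_SIZE;
    while (cls < NUM_CLASSES && cap < size) {
        cap *= 2;
        cls++;
    }
    return cls;
}

void *sc_malloc(size_t size)
{
    size_t cls = class_for(size);
    header *hd;
    if (cls < NUM_CLASSES && free_lists[cls] != NULL) {
        /* Reuse a previously freed block of the same class. */
        hd = free_lists[cls];
        free_lists[cls] = hd->h.next;
    } else {
        size_t cap = (cls < NUM_CLASSES) ? ((size_t)MIN_CLASS_SIZE << cls) : size;
        if (cap > SIZE_MAX - sizeof(header))
            return NULL;
        hd = malloc(sizeof(header) + cap);
        if (hd == NULL)
            return NULL;
        hd->h.cls = cls;
    }
    return hd + 1;
}

void sc_free(void *p)
{
    if (p == NULL)
        return;
    header *hd = (header *)p - 1;
    if (hd->h.cls < NUM_CLASSES) {
        /* Push the block onto its class's free list for later reuse. */
        hd->h.next = free_lists[hd->h.cls];
        free_lists[hd->h.cls] = hd;
    } else {
        free(hd);   /* oversized blocks go straight back to the backing allocator */
    }
}
```

Callers see the same one-argument shape as malloc() and free(); the classes, headers, and free lists stay an internal detail.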

(It's worth noting at this point that on most Unixes, the internals of C memory allocation were completely changed over the course of time, from sbrk() to mmap(). Nothing really noticed.)

Within the constraints of being a simple and general API and allowing relatively simple and efficient implementation on small systems (which is what Unix and C started on), I think that malloc(), free(), and their friends are at least a decent API. They've certainly proven to be remarkably durable and functional; although there are alternate implementations (and modern C libraries generally use much more complicated approaches), no alternate API has really caught on.

(Well, at least on Unix. My knowledge of Windows, macOS, and mobile environments is limited.)


Comments on this page:

Having memory allocation as an explicit API has also meant that you can readily write C code that doesn't allocate memory (except on the stack), or use entirely different allocation functions that you built yourself. Both OS kernels and embedded code tend to use something completely different from malloc() and free().

Having an explicit API precludes not having dynamic allocation as a part of the language. All of this is possible in better ways with Ada. In Ada, a variable can be locally declared by name, or dynamically allocated with new and deallocated with an instantiation of Unchecked_Deallocation. It's very easy to write Ada code that dynamically allocates naught, and this can also be enforced with pragmas. However, Ada 2012 also allows for customization of this, including arenas and everything else, by having real interfaces to such functionality.

In the C language, it's not the same way. I can declare a local variable by name, but can't dynamically allocate a variable by name. This is because the former is a part of the language, and the other isn't, except it still is. Imagine if all local variables required their sizes, since that's totally relevant information, apparently.

That this meshes well with the dumbassery involving zero-terminated strings is of little importance to those who work with fewer worthless limitations.

That C's memory allocation APIs are so minimal has allowed for a lot of experimentation with some approaches to memory allocation.
(It's worth noting at this point that on most Unixes, the internals of C memory allocation were completely changed over the course of time, from sbrk() to mmap(). Nothing really noticed.)

This is called abstraction, which is uncommon with the C language and UNIX, sure. The implementation strategies for APL have changed over the years too, so why isn't APL considered to be flexible so?

A more feature-rich API (for example, one that had an explicit idea of 'arenas') might have foreclosed certain sorts of internal implementations, or at least made them more complicated.

So other implementation strategies are good, because none of their special qualities may be directly used? Why not use Brainfuck more often? It has many implementation strategies.

They've certainly proven to be remarkably durable and functional; although there are alternate implementations (and modern C libraries generally use much more complicated approaches), no alternate API has really caught on.
(Well, at least on Unix. My knowledge of Windows, macOS, and mobile environments is limited.)

Other systems try to eliminate the related class of errors entirely, and only allow for automatic memory management.

By Flatfinger at 2023-02-05 17:04:35:

Were there not a need to be compatible with programs that would make non-portable assumptions about pointers returned by malloc(), one could specify that free(ptr) would be semantically equivalent to:

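   void free(void *p)
   {
     /* Function pointer stored just before the block (return type assumed void). */
     typedef void (*allocFunc)(void*, void*);
     if (!p) return;
     allocFunc f = *((allocFunc*)p - 1);
     /* A null function pointer marks storage that needs no release. */
     if (f) f(p, 0);
   }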

Implementations of `malloc()` would be required to store a pointer to a compatible allocation-adjustment function, but free() and realloc(), and more importantly user-code functions that call them, could be agnostic as to whether they were passed storage created via malloc(), a static buffer which is preceded by a null function pointer, or something else.
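A runnable sketch of this idea, under stated assumptions: the x_malloc() and x_free() names, the prefix union, and the adjust_fn signature (a new size, with 0 meaning "release") are all invented here and differ from the two-pointer signature in the comment above. Stepping back from the user pointer to reach the prefix is the same kind of non-portable pointer arithmetic the comment itself relies on.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical adjustment function: new_size of 0 releases the block. */
typedef void *(*adjust_fn)(void *p, size_t new_size);

/* Prefix stored before every block; the union keeps user data aligned. */
typedef union prefix {
    adjust_fn fn;
    max_align_t align;
} prefix;

/* Adjustment function for heap blocks created by x_malloc(). */
static void *heap_adjust(void *p, size_t new_size)
{
    prefix *pre = (prefix *)p - 1;
    if (new_size == 0) {
        free(pre);
        return NULL;
    }
    pre = realloc(pre, sizeof(prefix) + new_size);
    return pre ? pre + 1 : NULL;
}

void *x_malloc(size_t size)
{
    prefix *pre = malloc(sizeof(prefix) + size);
    if (pre == NULL)
        return NULL;
    pre->fn = heap_adjust;
    return pre + 1;
}

/* Does not care where the storage came from; it just calls the stored function. */
void x_free(void *p)
{
    if (p == NULL)
        return;
    adjust_fn f = ((prefix *)p - 1)->fn;
    if (f != NULL)
        f(p, 0);
}

/* A static buffer preceded by a null function pointer; x_free() on it is a no-op. */
static struct {
    prefix pre;      /* zero-initialized, so pre.fn is a null pointer */
    char data[64];
} static_buf;
```

Code that receives either x_malloc(32) or static_buf.data can call x_free() on it without knowing which one it got.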

Likewise, implementations of setjmp.h which don't need to be compatible with an existing ABI could specify that the first item of a jmp_buf should be a pointer to a function that receives a jmp_buf and rewinds the stack to it. If a set of functions used a thread-static pointer to a jmp_buf for error handling, code that needed to install error handlers would then be able to install its own jmp_buf into the stack unwinding process, even if it was processed by a different C implementation from the one that installed the original stack unwinding handler.

Written on 10 September 2022.

Last modified: Sat Sep 10 22:09:13 2022