The pervasive effects of C's malloc() and free() on C APIs

August 6, 2022

In my entry on the history of looking up host addresses in Unix, I touched on how from the beginning gethostbyname() had an issue in its API, one that the BSD Unix people specifically called out in its manual page's BUGS section:

All information is contained in a static area so it must be copied if it is to be saved. [...]

This became a serious issue when Unix added threads (this static area isn't thread safe), but was seen as a problem from the very beginning. Given that the static return area was known as an issue, why was the API written this way?
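To make the hazard concrete, here is a minimal sketch. It does not call the real gethostbyname(); lookup_static() and lookup_copy() are hypothetical names invented for illustration, and the "host:" prefix is just a placeholder result. The only point is that every call shares one static buffer, so a result must be copied before the next call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a gethostbyname()-style API: every call
   writes its answer into the same static buffer and returns it. */
const char *lookup_static(const char *name)
{
    static char result[64];
    snprintf(result, sizeof(result), "host:%s", name);
    return result;
}

/* Save a result the only safe way: copy it into caller-owned storage
   before anything calls lookup_static() again. */
void lookup_copy(const char *name, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s", lookup_static(name));
}
```

A caller that merely kept the pointer from lookup_static("alpha") would find it reading "host:beta" after a later lookup_static("beta"), since both calls return the same address.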

While I don't know for sure, I think we can point fingers at the hassles that dynamic memory allocation brings you in a C API. The gethostbyname() API returns a pointer to a 'struct hostent', which is (from 4.3 BSD onward):

struct  hostent {
   char  *h_name;     /* official name of host */
   char **h_aliases;  /* alias list */
   int    h_addrtype; /* address type */
   int    h_length;   /* length of address */
   char **h_addr_list; /* list of addresses */
};

If this structure is dynamically allocated by gethostbyname() and returned to the caller, either you need an additional API function to free it or you have to commit to what fields in the structure have to be freed separately, and how (ie, this is part of the API). Having the caller free things is also not all that simple. Since this structure contains embedded pointers (including two that point to arrays of pointers), there could be quite a lot of things for the caller to call free() on (and in the right order).
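As a rough sketch of what "the caller frees it" would mean, suppose a hypothetical variant of the structure (called dyn_hostent here) where the name, each alias, each address, both pointer arrays, and the structure itself are separate malloc() allocations, with both arrays NULL-terminated. None of this is how any real gethostbyname() works; the layout and the function names are assumptions made purely for illustration. The address list field is named h_addr_list, as in the real struct hostent.

```c
#include <stdlib.h>

/* Hypothetical dynamically allocated variant of struct hostent. */
struct dyn_hostent {
    char  *h_name;
    char **h_aliases;    /* NULL-terminated, each entry malloc()'d */
    int    h_addrtype;
    int    h_length;
    char **h_addr_list;  /* NULL-terminated, each entry malloc()'d */
};

/* Free every entry of a NULL-terminated array, then the array itself.
   Returns how many allocations were released. */
static int free_ptr_array(char **arr)
{
    int n = 0;
    if (arr == NULL)
        return 0;
    for (char **p = arr; *p != NULL; p++) {
        free(*p);
        n++;
    }
    free(arr);
    return n + 1;
}

/* Contents first, containers afterwards, the structure last.
   Returns the number of allocations released. */
int free_hostent_deep(struct dyn_hostent *h)
{
    int n = 0;
    if (h == NULL)
        return 0;
    n += free_ptr_array(h->h_aliases);
    n += free_ptr_array(h->h_addr_list);
    if (h->h_name != NULL) {
        free(h->h_name);
        n++;
    }
    free(h);
    return n + 1;
}
```

Even for a host with two aliases and one address, that is seven separate free() calls, and every caller would have to get all of them right.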

This issue isn't unique to gethostbyname(); it affects any C API that wants to return (in a conceptual sense) anything more complicated than a basic type or a simple structure. (Even in old C, simple structures can be 'returned' by having the caller pass in a pointer to the structure, as is done in stat().) C offers no good solution to the problem; either you add one or more 'free' functions to your API (one per dynamically allocated structure you're returning), or you document and thus freeze the process for freeing what you return, or you do what BSD opted to do in gethostbyname() and return a pointer to something static.

(Documenting what callers have to free implies that you can't later add extra fields to what you return unless they don't have to be freed separately.)
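For contrast, the stat() pattern mentioned above needs no freeing at all, because the structure is flat and the caller owns the storage. A small sketch, where file_size() is a hypothetical helper written only for this example:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return the size in bytes of the file at path, or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat sb;   /* caller-owned storage; nothing to free afterwards */

    if (stat(path, &sb) != 0)
        return -1;
    return (long long)sb.st_size;
}
```

This only works because struct stat has no embedded pointers; it does not stretch to something shaped like struct hostent.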

In POSIX, this API issue was eventually worked around with the first approach, when they added a freeaddrinfo() function to go with the new getaddrinfo(). This is the only particularly good solution, but it does mean that you get an increasing profusion of 'free something' functions, which serves as a disincentive to add APIs which would return something where you'd need such a function.
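In use, the pairing looks roughly like the following sketch. getaddrinfo() and freeaddrinfo() are the real POSIX functions; count_addresses() is a hypothetical helper written for this example, and the hint values are just one plausible choice.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Count the addresses getaddrinfo() returns for host, or -1 on error. */
int count_addresses(const char *host)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    /* The result is a linked list whose layout the caller never
       needs to know in order to release it. */
    for (ai = res; ai != NULL; ai = ai->ai_next)
        n++;

    freeaddrinfo(res);  /* one call frees the whole list */
    return n;
}
```

The caller never calls free() directly, which leaves the library free to change how the list is allocated.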


Comments on this page:

There is a third option that makes sense in some cases: require the caller to pass in a buffer, with the required size obtained by first calling the API with a null pointer. The caller then frees its own buffer at its convenience.
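(A minimal sketch of this suggestion, not taken from any real library: qualify_name() and qualify_name_alloc() are hypothetical functions, and the ".example.org" suffix is a placeholder. The size-query convention is similar in spirit to calling snprintf() with a null buffer.)

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical API: builds "<name>.example.org" in buf. It always
   returns the number of bytes needed, including the trailing NUL.
   With buf == NULL (or too small a buffer) it writes nothing. */
size_t qualify_name(const char *name, char *buf, size_t buflen)
{
    static const char suffix[] = ".example.org";
    size_t namelen = strlen(name);
    size_t need = namelen + sizeof(suffix);  /* sizeof counts the NUL */

    if (buf != NULL && buflen >= need) {
        memcpy(buf, name, namelen);
        memcpy(buf + namelen, suffix, sizeof(suffix));
    }
    return need;
}

/* Caller side: ask for the size, allocate, call again. The caller
   owns the buffer and releases it with plain free(). */
char *qualify_name_alloc(const char *name)
{
    size_t need = qualify_name(name, NULL, 0);
    char *buf = malloc(need);

    if (buf != NULL)
        qualify_name(name, buf, need);
    return buf;
}
```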

By Rob Mayoff at 2022-08-07 18:06:36:

Another solution: return a single allocation containing the struct hostent followed by the bytes of the other fields (the h_name, the h_aliases and what they point to, and the h_addr_list and what it points to). Now your caller only needs to call free once, on the pointer returned by gethostbyname, and you can add more fields later if needed.
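(A minimal sketch of this single-allocation layout, under simplifying assumptions: one name, no aliases, and one address. The struct flat_hostent type and hostent_single_alloc() are hypothetical names for illustration only.)

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with the same fields as struct hostent. */
struct flat_hostent {
    char  *h_name;
    char **h_aliases;
    int    h_addrtype;
    int    h_length;
    char **h_addr_list;
};

/* Build everything in one malloc() block, laid out as:
   [struct][pointer arrays][name bytes][address bytes] */
struct flat_hostent *hostent_single_alloc(const char *name, const void *addr,
                                          int addrlen, int addrtype)
{
    size_t namelen = strlen(name) + 1;
    size_t nptrs = 1 + 2;  /* aliases: {NULL}; addresses: {addr, NULL} */
    size_t total = sizeof(struct flat_hostent) + nptrs * sizeof(char *)
                   + namelen + (size_t)addrlen;
    struct flat_hostent *h = malloc(total);

    if (h == NULL)
        return NULL;

    /* The pointer arrays go right after the struct, which keeps them
       suitably aligned; the unaligned byte data goes last. */
    char **ptrs = (char **)(h + 1);
    char *bytes = (char *)(ptrs + nptrs);

    h->h_aliases = ptrs;
    h->h_aliases[0] = NULL;
    h->h_addr_list = ptrs + 1;

    h->h_name = bytes;
    memcpy(bytes, name, namelen);
    bytes += namelen;

    h->h_addr_list[0] = bytes;
    memcpy(bytes, addr, (size_t)addrlen);
    h->h_addr_list[1] = NULL;

    h->h_addrtype = addrtype;
    h->h_length = addrlen;
    return h;
}
```

Every pointer inside the structure points back into the same block, so a single free() on the returned pointer releases all of it.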

Last modified: Sat Aug 6 21:41:36 2022