The source of C's dependency hell for linking

February 19, 2013

C famously has a dependency hell problem for linking (both static and dynamic, although the static case is often more tractable). This is the problem of both which libraries you need (including the libraries needed by the libraries that you need) and what order you need to list them in; it often results in people cramming ever-increasing numbers of libraries into their compiler command lines in the hope that one of them supplies whatever symbol is still missing.
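To make the ordering half of the problem concrete, here is a small sketch in Python of how a traditional single-pass linker is commonly described as treating static libraries: each archive is scanned once, left to right, and a member is pulled in only if it defines a symbol that is undefined at that moment. This is a deliberately simplified model, not any real linker's algorithm, and the library and symbol names (libgfx.a, libmathx.a, draw, sqrt_fast) are made up for illustration.

```python
def link(objects, archives):
    """Toy model of a single-pass static link.

    objects:  list of (defined, needed) symbol-set pairs, always linked in.
    archives: ordered list of (name, members); each member is a
              (defined, needed) pair. Archives are visited once, in order.
    Returns the set of symbols still undefined at the end.
    """
    defined, undefined = set(), set()

    def add(member_defined, member_needed):
        defined.update(member_defined)
        undefined.update(member_needed)
        undefined.difference_update(defined)

    for obj_defined, obj_needed in objects:
        add(obj_defined, obj_needed)

    for _name, members in archives:
        pulled = True
        while pulled:  # rescan this one archive until nothing new is pulled
            pulled = False
            for member_defined, member_needed in members:
                if member_defined & undefined:
                    add(member_defined, member_needed)
                    pulled = True
    return undefined


# Hypothetical program: main needs draw, draw needs sqrt_fast.
main_obj = ({"main"}, {"draw"})
libgfx = ("libgfx.a", [({"draw"}, {"sqrt_fast"})])
libmathx = ("libmathx.a", [({"sqrt_fast"}, set())])

good = link([main_obj], [libgfx, libmathx])  # dependent before dependency
bad = link([main_obj], [libmathx, libgfx])   # same libraries, wrong order
```

In this model the first order resolves everything, while the second leaves sqrt_fast undefined because libmathx.a was scanned before anything needed it. Nothing in the inputs tells you the right order in advance; you find out by trying.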

As I alluded to in a comment on this entry, the root of that dependency hell is that C has only a single global namespace. With a single global namespace there is no explicit 'import' operation; global names can come from anywhere and appear from everywhere (in fact this is abused as a feature, where you can override or preempt a library routine). One way to put it is that in C, all global names from outside the current file are late-binding and scopeless. They can only be fully resolved or declared invalid at link time when the final binary is built. This leads naturally to libraries that themselves depend on and use global names which come from, well, somewhere, no one knows exactly where until link time.

(Global names often must be declared but this declaration is itself without scope or origin. There are many unfortunate things that result from this, including the potential mismatches between declarations and actual reality.)
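The scopeless nature of global names, and the preemption trick, can be sketched with an equally small Python model: one flat table for the whole program, where the first definition seen in link order wins and the code using a name has no say in where it comes from. This glosses over a lot (real linkers treat duplicate definitions among plain object files as errors, and dynamic linking has its own rules), and the file names here are hypothetical.

```python
def resolve(link_order):
    """Build the single global symbol table for a program.

    link_order: ordered list of (unit_name, names_defined).
    The first unit to define a name wins; later definitions are ignored.
    """
    table = {}
    for unit, names in link_order:
        for name in names:
            table.setdefault(name, unit)
    return table


# Hypothetical link: a private allocator listed ahead of the C library.
table = resolve([
    ("myalloc.o", ["malloc", "free"]),
    ("libc.a", ["malloc", "free", "printf"]),
])
```

Every caller of malloc in the whole program, including code inside other libraries, now gets the version from myalloc.o, while printf still comes from libc.a. No source file said it wanted either one from anywhere in particular.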

This is in stark contrast to a compiled language with a package system and explicit imports (such as Go). In those languages, names are always within the scope of a package and a competently implemented compiler environment reliably knows the dependencies (both direct and transitive) of a piece of code; it knows what packages the code has imported and used names from, and it knows what packages those packages need, and so on. It may not be able to find them on the filesystem, but it can at least tell you that this code needs the compiled forms of the following N packages. It can even throw in version numbers (or something more comprehensive) if it wants to.
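As a rough illustration of why explicit imports make this tractable, here is a Python sketch that computes the full dependency set of a package from nothing but each package's own import list. The package names and the import graph are invented for the example; they are not taken from Go or any real system.

```python
def transitive_deps(package, imports):
    """Return every package that `package` needs, directly or indirectly.

    imports: dict mapping a package name to the packages it imports.
    """
    seen = set()
    stack = list(imports.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(imports.get(dep, ()))
    return seen


# Hypothetical import graph.
imports = {
    "app": ["httpclient", "textfmt"],
    "httpclient": ["sockets", "bytesio"],
    "sockets": ["bytesio"],
    "textfmt": ["bytesio"],
    "bytesio": [],
}

deps = transitive_deps("app", imports)
```

The point is that the answer falls out of information the compiler already has. There is no equivalent table to walk in C, because a C source file never says where a global name is supposed to come from.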

My memory is that Plan 9 made some attempts to change this for C. If I remember right, Plan 9 basically moved to a model where there was one header file per library and each header file contained a pragma to tell the compiler what the library was. Of course this is not ANSI-compatible in the least but I don't think the Plan 9 people considered this much of a problem.
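Here is a sketch of what a tool could do with such a pragma, written in Python. The `#pragma lib "..."` spelling is my recollection of the Plan 9 convention and should be treated as an assumption, and the header contents (gfx.h, libgfx.a, draw) are hypothetical.

```python
import re

# Assumed spelling of the pragma: #pragma lib "libname.a"
PRAGMA_LIB = re.compile(r'^\s*#\s*pragma\s+lib\s+"([^"]+)"', re.MULTILINE)


def libraries_for(header_text):
    """Return the libraries that a header says its names come from."""
    return PRAGMA_LIB.findall(header_text)


# Hypothetical header for a hypothetical library.
gfx_h = '''
#pragma lib "libgfx.a"

void draw(int x, int y);
'''

libs = libraries_for(gfx_h)
```

With one header per library, including the header is enough to tell the toolchain which library to link, which gets you something close to an explicit import.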

In theory the library dependency problem can be dealt with; at the time you build a library (static or dynamic) you can 'link' everything as far as resolving all of the global names that the library needs, then note down where they all came from. In practice traditional Unix static libraries have never had this information and aren't built in ways that create it (a traditional static library is just an archive of object files). I think that some dynamic library formats have attempted to include this sort of dependency information where available as a hint to various parties.
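The 'resolve at library build time and note down the origins' idea can be sketched in Python as follows. This is only a model of the bookkeeping, not of any real archive or shared library format, and all of the names in it are hypothetical.

```python
def record_dependencies(needed, available):
    """Work out where each symbol a library needs would come from.

    needed:    set of global names the library uses but does not define.
    available: ordered list of (library_name, defined_symbols) present
               when the library is built.
    Returns (origins, unresolved): a symbol -> library map, plus the set
    of symbols that nothing available defines.
    """
    origins, unresolved = {}, set()
    for symbol in sorted(needed):
        for lib, defined in available:
            if symbol in defined:
                origins[symbol] = lib
                break
        else:
            unresolved.add(symbol)
    return origins, unresolved


# Hypothetical build of libgfx.a.
origins, unresolved = record_dependencies(
    {"sqrt_fast", "malloc", "mystery"},
    [("libmathx.a", {"sqrt_fast"}), ("libc.a", {"malloc", "free"})],
)
needed_libs = sorted(set(origins.values()))
```

If something like needed_libs were stored alongside the library, a later link could be told 'libgfx.a also requires these' instead of leaving people to discover it from undefined symbol errors.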

(And of course a C compiler environment could add support for a Plan 9-like pragma to say 'the stuff from this header file comes from this library' and then embed the resulting hint in the generated object files and so on. But I don't think anyone has. My cynical side suspects that it's just not considered an important problem.)

Written on 19 February 2013.

Last modified: Tue Feb 19 01:37:52 2013