Ways that I have lost the source code for installed programs

February 8, 2020

When I compared Go to Python for small-scale sysadmin tools, I said that one useful property of Python was that you couldn't misplace the source code for your deployed programs. This property recently hit home for me when I decided that maybe I should rebuild my private binary of my feed reader for reasons beyond the scope of this entry, and had to try to find the source code that my binary was built from. So here is an incomplete list of the ways to lose (in some sense) the source code for your installed programs, focusing on things that I have actually seen happen either at work or for my personal programs.

I don't think I've ever lost source code by deleting it. This is less because I keep careful track of what source code we need to keep and more because I tend to be a software packrat. Deleting things is work and disk space is cheap, so we might as well keep it around (and then at some point it's old enough to be of historical interest). We have deleted configuration files for a now-dead machine that were also being used by another machine, though, so we've come close.

The straightforward ways of losing source code, either by forgetting where it is or by moving it from where it originally was to some 'obvious' other spot, are two sides of the same coin and can be hard to tell apart when you're looking for the source (although if you run 'strings' on an executable to get some likely paths and they aren't there, things probably got moved). A dangerous and time-wasting variation on this is to start out with the source code in /some/where/prog, build it, rename the source directory to /some/where/prog-old, and then reuse /some/where/prog for a new version of the program that you were working on but didn't install. Now the paths embedded in the installed binary point to source code that it wasn't built from.
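A rough Python sketch of that 'strings' check: it pulls out runs of four or more printable bytes (the strings(1) default), keeps the ones that look like absolute paths, and reports whether each one still exists. The path pattern is an assumed heuristic that will miss paths with unusual characters and pick up some noise.

```python
import os
import re
import sys

# Runs of 4 or more printable ASCII bytes, which is what strings(1) reports by default.
_PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")
# Crude heuristic: an absolute path with at least two components, e.g. /a/b.
_ABS_PATH = re.compile(r"/[\w.+-]+(?:/[\w.+-]+)+")


def candidate_paths(data: bytes) -> list[str]:
    """Return unique path-like strings from binary data, in order of first appearance."""
    seen: dict[str, None] = {}
    for run in _PRINTABLE.findall(data):
        for path in _ABS_PATH.findall(run.decode("ascii")):
            seen.setdefault(path, None)
    return list(seen)


def report(binary: str) -> None:
    """Print each candidate path found in binary and whether it still exists."""
    with open(binary, "rb") as f:
        data = f.read()
    for path in candidate_paths(data):
        status = "exists" if os.path.exists(path) else "MISSING"
        print(f"{status:8} {path}")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        report(arg)
```

A cluster of MISSING paths under one directory is a decent hint that the tree was moved or renamed, although, as above, paths that do exist may now hold different code.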

A variant of this is to wind up with several different source code directories for the program (with different versions in them) with no clear indication of which directory and version was used to build the installed program. If directories have been renamed, the strings embedded in the executable may not help. If you're lucky you left the .o files and so on sitting around so you can at least match up the date of the installed program with the dates of the .o files to figure out which it was.
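The date matching can be partly automated. This sketch assumes the installed binary's modification time hasn't been changed since it was built or installed, and ranks each candidate directory by how long before the binary its newest .o file was written. Directories whose objects postdate the binary are dropped, since they were rebuilt after the installed copy was made. A small gap suggests, but doesn't prove, that a directory's objects were the ones linked. The function names are made up for illustration.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def newest_object_mtime(srcdir: Path) -> Optional[float]:
    """Newest modification time of any .o file under srcdir, or None if there are none."""
    return max((p.stat().st_mtime for p in srcdir.rglob("*.o")), default=None)


def rank_by_gap(binary_mtime: float, candidates: dict[str, float]) -> list[tuple[str, float]]:
    """Rank source dirs by how many seconds before the binary their newest .o was written.

    Directories whose newest .o is newer than the binary are dropped, since
    they have been rebuilt since the installed copy was made.
    """
    gaps = [(d, binary_mtime - t) for d, t in candidates.items() if t <= binary_mtime]
    return sorted(gaps, key=lambda item: item[1])


def main(binary: str, srcdirs: list[str]) -> None:
    binary_mtime = os.stat(binary).st_mtime
    candidates = {}
    for d in srcdirs:
        t = newest_object_mtime(Path(d))
        if t is not None:
            candidates[d] = t
    for d, gap in rank_by_gap(binary_mtime, candidates):
        print(f"{gap / 60:10.1f} min before binary  {d}")


if __name__ == "__main__":
    # Usage: INSTALLED-BINARY SRCDIR [SRCDIR ...]
    main(sys.argv[1], sys.argv[2:])
```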

Another way to lose source code is to start to change or update the code in your nicely organized source directory without taking some sort of snapshot of its state as of when you built the binary you're using. This one has bitten me repeatedly in the past, when I had the source directory I built from but it was no longer in the same state and I had no good way to go back. There are all sorts of ways to wind up here. You can be in the process of making changes, or you can have decided to merge together several divergent versions from different systems (with different patches and changes) into one all-good master version.
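One low-tech way to take that snapshot, with no version control involved, is to archive the source tree at install time and name the archive after a hash of the binary, so that the installed binary itself tells you which snapshot to unpack. This is a minimal sketch under a few assumptions: the archive directory lives outside the source tree, and you hash the binary as installed (stripping it during installation changes the hash). The function names and the command-line usage are hypothetical.

```python
import hashlib
import sys
import tarfile
from pathlib import Path
from typing import Optional


def _binary_digest(binary: Path) -> str:
    """Short SHA-256 prefix of the binary's contents, used to key the snapshot."""
    return hashlib.sha256(binary.read_bytes()).hexdigest()[:12]


def snapshot_source(srcdir: Path, binary: Path, archive_dir: Path) -> Path:
    """Archive srcdir as <binary name>-<digest>.tar.gz in archive_dir and return its path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    out = archive_dir / f"{binary.name}-{_binary_digest(binary)}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        # Store the tree under its own directory name, like a release tarball.
        tar.add(srcdir, arcname=srcdir.resolve().name)
    return out


def find_snapshot(binary: Path, archive_dir: Path) -> Optional[Path]:
    """Return the snapshot made for exactly this binary, or None if there isn't one."""
    candidate = archive_dir / f"{binary.name}-{_binary_digest(binary)}.tar.gz"
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    # Hypothetical usage: SRCDIR INSTALLED-BINARY ARCHIVE-DIR
    print(snapshot_source(Path(sys.argv[1]), Path(sys.argv[2]), Path(sys.argv[3])))
```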

(Merging together disparate versions was especially an issue in the days before distributed version control systems. We had a locally written MTA that was used by multiple groups across multiple systems, and of course we wound up with a whole cloud of copies of source code, of various vintages and with various local changes.)

The final way of 'losing' source code that I've encountered is to have the unaltered source code in a known place that you can find, but for it to no longer build in a current environment. All sorts of things can cause this if you have sufficiently old programs; compilers can change what they accept, header files change, your program only works with an old version of some library where you have the necessary shared library objects but not the headers, the program's (auto)configuration system no longer works, and so on. In the past a big source of this was programs that weren't portable to 64-bit environments, but old code can have all sorts of other issues as well.

Comments on this page:

By Tom Matthews at 2020-02-08 04:42:19:

I feel like a distributed source code repository like git would be useful in the above situations. Whether it’s Bitbucket (Atlassian) at work or GitHub for my personal scripts and code, it’s saved me more than once when I couldn’t find code or had deleted it locally by mistake.

One of the things I picked up from working on Apache and FreeBSD was the habit of embedding source control version strings in binaries. This was easy in the days of SCCS and CVS with the what and ident commands (or with strings if you know the version string format).
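The what and ident searches are simple enough to sketch in Python. what(1) looks for the marker "@(#)" and reports the text after it, up to the first double quote, '>', newline, backslash, or NUL. The ident pattern here is only a rough approximation of RCS's "$keyword: text $" search, not a faithful reimplementation.

```python
import re
import sys

# what(1): the text after "@(#)", up to the first '"', '>', newline, backslash, or NUL.
_WHAT = re.compile(rb'@\(#\)([^"\\>\n\x00]*)')
# Rough ident(1)-style match for "$Keyword: text $" (an approximation, not RCS's exact rules).
_IDENT = re.compile(rb"\$[A-Za-z]+: [^$\n\x00]* \$")


def what_strings(data: bytes) -> list[str]:
    """Return every what(1)-style identification string in data, whitespace stripped."""
    return [m.group(1).decode("latin-1").strip() for m in _WHAT.finditer(data)]


def ident_strings(data: bytes) -> list[str]:
    """Return every '$Keyword: text $' string in data."""
    return [m.group(0).decode("latin-1") for m in _IDENT.finditer(data)]


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            data = f.read()
        print(f"{name}:")
        for s in what_strings(data) + ident_strings(data):
            print(f"\t{s}")
```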

Git makes this a bit more tricky, because you have to add a build step that embeds the output of git describe in the binary for use by a --version option (git and BIND and my own programs do this). For extra fun you can wrap it in "@(#) $Version: YOUR VERSION HERE $" for use by what and ident :-)
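A minimal sketch of such a build step in Python, assuming a C program: it runs git describe and writes a small generated C file that holds the version both as a plain string (for a --version option) and wrapped in the "@(#) $Version: ... $" form. The --dirty flag marks builds made from a modified working tree, which also helps with the 'source changed since I built it' problem. The file name version.c and the symbols prog_version and prog_ident are hypothetical; the program would declare 'extern const char prog_version[];' and print it.

```python
import subprocess
import sys

# Characters that would cut a what(1) match short, break the C string literal,
# or end an ident(1) "$Keyword: text $" match early.
_UNSAFE = set('"\\>\n\x00$')


def git_describe(repo: str = ".") -> str:
    """Describe HEAD, e.g. 'v1.2-3-gabc1234-dirty'.

    --tags allows lightweight tags, --always falls back to an abbreviated
    commit hash when there are no tags, and --dirty appends '-dirty' if the
    working tree has uncommitted changes.
    """
    result = subprocess.run(
        ["git", "describe", "--tags", "--always", "--dirty"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def version_source(version: str) -> str:
    """Render a C file embedding version for --version and for what(1)/ident(1)."""
    if any(c in _UNSAFE for c in version):
        raise ValueError(f"unsafe version string: {version!r}")
    return (
        "/* Generated at build time; do not edit. */\n"
        f'const char prog_version[] = "{version}";\n'
        f'const char prog_ident[] = "@(#) $Version: {version} $";\n'
    )


if __name__ == "__main__":
    out = sys.argv[1] if len(sys.argv) > 1 else "version.c"
    with open(out, "w") as f:
        f.write(version_source(git_describe()))
```

Since git describe's output can change without any source file changing, it's probably simplest to have the build rerun this step every time rather than treating version.c as an ordinary dependency.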

By Greg A. Woods at 2020-02-11 17:04:39:

I remember a long time ago when I worked on a contract at CP Rail that there was a bit of a kerfuffle one day when people were roaming about searching filing cabinets. Turns out they had misplaced the only tape and printout of some COBOL program.

I'm sort of in a transition now of trying to share some of the projects I've been maintaining by uploading them to GitHub, and I'm sometimes finding it hard to figure out how best to maintain them afterwards. Git has many advantages, but my finger memory is far more finely tuned for using, e.g., SCCS for some things. This has now twice led me into minor confusion as to just what state a given project is in.

For the time being I've decided to continue to do primary work in at least two projects in their original SCCS archives, and to use conversion/migration tools to publish them. One of those projects is the migration tool itself, so it's a good test to have at least one other to use regularly the same way.

By Matthew Clairmont at 2020-02-16 15:05:25:

I feel that lost source code really is more of an organizational issue than a technical one. If you're not storing code in a version control system, GitHub or otherwise, you're putting yourself (or your team) in an unnecessary position for future headaches. Adding GitHub to your workflow typically takes all of 3 or 4 commands for basic version control.


Last modified: Sat Feb 8 00:01:03 2020