2013-07-06
Is it particularly useful to have old Unix source sitting around?
Over the years, the university collectively has acquired and accumulated quite a lot of Unix source code (by this I mean the copyright-restricted stuff from Unix vendors). We came by this source code not through any large deliberate action but instead because for a long time it was routine for us to ask vendors for system source and for vendors to give it to us for a nominal sum. In the early days people made significant use of the source code, modifying it to fix bugs and customize things as we needed; in later days we kept acquiring it more and more out of curiosity and habit.
(Partly this was because we no longer needed to change things so much, partly it was because open source tools got better, and partly it was because Unix itself got more modular through things like PAM.)
This source code has never been centralized or centrally tracked; instead it's spread out all across the university in various groups. As I sort of mentioned in an earlier entry, every so often some of it gets lost because a group throws away some ancient stuff (either explicitly or just by never migrating the data from an old system that they then turn off). After writing that earlier entry I found myself wondering if it would be worthwhile to change this a bit, for example just by polling the university's sysadmins so we can all get an idea of what historical Unix source code we collectively still have.
But this raises a question: is this stuff actually useful or important? If it's not then there's no real point in going through the effort to catalog or collect it (and it's difficult to feel enthused about doing the work).
My reluctant conclusion is that it isn't. If the Unix Heritage Society didn't exist, some of what we have would be of possible general (internal) interest, but TUHS already has things like V7 and the 4.x BSD source. Beyond that, I just can't think of any use beyond extremely rare curiosity (and the related rare use of settling obscure arguments). None of this source code is for anything even close to a live system and the days when people looked at vendor Unix source to learn from it are long over; there are much better options now.
(It's possible that I'd enjoy having a collection of Unix source around enough to justify doing the work, but it's a hard sell even to myself. The reality is that I barely look at this stuff and I'm not certain I've ever written a blog entry that actually depended on it.)
Part of me finds this a bit sad, because there was a day when all of this vendor Unix source was a big deal and it was thrilling and exciting to have access to it. But such is the passage of time.
2013-07-04
My version of the story of universities and Unix source code
In the beginning there was Research Unix Version 7 from Bell Labs (a part of AT&T). AT&T gave V7 away basically for free to universities and V7 didn't so much come with source code as intrinsically require that source code in order for you to install and operate it sensibly. On top of that V7 needed plenty of hacking to add features to it, so people did. As a result of this, plenty of universities acquired AT&T licenses for V7 (source code included).
When the UCB CSRG created BSD Unix from V7 they gave it away for free because they were, after all, an academic research organization. You had to have a V7 source license from AT&T because the BSDs contained AT&T code (which would, a decade later, contribute to a nasty set of lawsuits). 4.2 BSD was better than V7 but that didn't make it fully complete, so lots of people at universities hacked on it, fixed things, and sometimes did really odd things to it in order to support lots of nosy students on inadequate hardware. Like V7 before it, BSD was not infrequently just the raw material for a local computing environment rather than a 'ready for production' boxed pack of software.
(Early Unix was an environment of system programmers, especially once you started running into bugs and swapping bugfixes with other people.)
Soon various Unix vendors got into the action, licensing V7 from AT&T and adding modifications to the BSD base (including porting it to their own hardware and so on). When they sold the result to universities, the university system programmers asked the vendors for the source code because of course the system programmers were going to have to customize the vendor system. For various reasons early Unix vendors said 'sure, if you have an AT&T V7 license because this still descends from V7'. In practice source code became less and less necessary over time, and less and less used, as vendors successfully turned Unix into something you didn't need system programmers to run. Still, universities kept asking for source out of habit and just because, and vendors generally kept giving it to them.
(Or at least old departments and groups with a habit of this kept asking for source code. As Unix expanded into new groups and departments, the new people often didn't ask because they had no use for it.)
There were two complications with this picture. The first one is that at some point AT&T stopped giving out V7 licenses, even to universities (and I think this was fairly early); if your university hadn't gotten into the Unix game early enough and thus was unlucky enough to lack a V7 license, no one would or could give you any further source code. The second is that many vendors themselves started to become increasingly reluctant to give out their source code, making it harder to get or simply no longer making it available at all. My vague impression is that by the late 1990s, even universities were no longer getting source access (right when the free Unixes were coming along to make the whole issue moot, although I think that's just coincidence).
(These days the V7 source code is publicly available at the Unix Heritage Society. I have no idea what this does to assorted licensing issues.)
2013-07-03
You can re-connect() UDP sockets (portably)
A commentator on my entry on UDP sockets and sendto() noted:
You can disconnect or re-connect a UDP socket, at least under Linux. The man page says:
"If address is a null address for the protocol, the socket's peer address shall be reset."
They are absolutely correct. More than that, this is a portable socket
feature; you can find similar wording in the manpages for FreeBSD
and even Solaris 10, to name two Unixes that I checked manpages for.
Everyone supports both completely resetting a UDP socket back to the
disconnected state (as stated above) and simply changing the address
that it's connected to (you just connect() it to the new address).
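
As a minimal sketch of the re-connect case (in Python rather than C, purely for brevity), the following connects one UDP socket to two different local receivers in turn and sends a datagram to each with plain send(). The function name demo_reconnect, the loopback addresses, the message contents, and the two-second timeout are all assumptions made up for this illustration. It does not cover resetting the socket back to the disconnected state with a null address.

```python
import socket


def demo_reconnect():
    """Re-connect() one UDP socket to a second peer and send to each in turn.

    Hypothetical helper for illustration only; returns the datagrams that
    the two receivers got, in order.
    """
    # Two receivers on kernel-chosen loopback ports (port 0 = pick any).
    rx_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for rx in (rx_a, rx_b):
            rx.bind(("127.0.0.1", 0))
            rx.settimeout(2.0)  # don't hang forever if a datagram is lost

        # First connect(): send() now goes to receiver A.
        tx.connect(rx_a.getsockname())
        tx.send(b"first")

        # Second connect() on the same socket: the peer address is simply
        # replaced, so send() now goes to receiver B.
        tx.connect(rx_b.getsockname())
        tx.send(b"second")

        return rx_a.recv(64), rx_b.recv(64)
    finally:
        for s in (tx, rx_a, rx_b):
            s.close()


if __name__ == "__main__":
    print(demo_reconnect())
```

On a system with a working loopback interface, each receiver should get only the datagram that was sent while the socket was connected to it.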
In fact this caused me to become curious about how far back this
particular feature went, which is where the Unix Heritage Society's Unix Tree comes in really handy.
I will cut to the chase: this connect() behavior first appeared,
and was first documented, in the first 4.3 BSD release. It was
not in 4.2 BSD, where you really could only connect() a UDP
socket once.
(I actually read the online kernel source code to make sure of this,
because I thought it might have been in the code but just not
documented. No such luck. The relevant 4.3 BSD kernel source
actually has a rare comment to explicitly discuss the new behavior;
see soconnect().)
On an interesting note, this 4.3 BSD change apparently did not
immediately make it into commercial Unixes. The Unix Tree has
PDP-11 Ultrix 3.1 source online and its soconnect() has the
original 4.2 BSD behavior, although it was apparently released two
years after 4.3 BSD. Ultrix did eventually pick up this 4.3 BSD
behavior; we happen to have Ultrix 4.2 source still online and it
has this change. Based on kernel source again, SunOS also appears
to have picked up this change sometime between SunOS 3.5 and SunOS
4.1.
(The SunOS history page suggests that this change likely appeared in SunOS 4.0 when it became fully based on 4.3 BSD.)
(You may wonder why we have so much ancient Unix source code sitting around. The answer boils down to 'university sysadmins hate deleting source code, especially once it becomes an antique'. It's not like this stuff takes up much disk space, either; the entire Ultrix 4.2 source tree is only 210 MBytes. Instead the way we usually lose this historical stuff is by forgetting that it's there and then casually abandoning entire filesystems because they're just 'obsolete stuff'.)