2009-03-31
The SSD boom and the theoretical multicore revolution
For quite a while now, CPU vendors have been trying to persuade people to spend a great deal of money recoding applications to run on heavily multicore CPUs so that the CPU vendors could continue to sell expensive new CPUs (to put it one way). The motivation CPU vendors offer is that if you don't make the investment, system performance can't increase any more and then the wheels come off everyone's wagon, not just theirs.
Now consider the current SSD boom. Pretty much everyone agrees that one of the best ways to improve your system right now is to replace your hard drive with an SSD; for most people, their system becomes a lot more responsive and any number of things that they do get faster in practice. There are a lot of SSDs that are going to be sold in the next few years, and they will improve a lot of systems much more significantly than most CPU upgrades do.
Replacing hard drives with SSDs is only one of quite a few practical performance improvements waiting in typical systems. A lot of PC components have basically been put on the back burner for the past while in favour of chasing ever faster CPU speeds, sometimes to absurd degrees both in theory (just look at how much faster CPUs are than memory) and in practice (even before SSDs, one of the best performance improvements many people could make was not a faster CPU but more memory). Now improvements to these other components are a fruitful source of overall system performance improvements (and thus sales) for system vendors, even if this leaves the CPU vendors out in the cold.
It gets worse. The blunt unfortunate truth for CPU vendors is not just that both software firms and system manufacturers have options for selling upgrades, but that more and more people do not need more CPU power at all. It is hard to sell more CPU to someone who is already not using all of the one that they have, and knows it.
2009-03-26
The git version control system as a creation of the modern age
It has recently struck me that git (the distributed version control system that Linus Torvalds created) is very much a creation of the modern age. By that I don't mean that it was created recently or by modern, Internet-enabled methods; I mean that it only makes sense in the modern era of really cheap, really large disk space.
Earlier version control systems were created when disk space was much more expensive and not all that large relative to your source code. In that era, wasting disk space was a sin and a serious issue, so the version control systems spent a lot of effort (and gave up a lot of speed) in order to have efficient storage of closely related versions of the same file, using interleaved fragments (SCCS) or reverse deltas (RCS) or the like. (Then people agonized about how storing binary files made these schemes blow up, and invented binary delta formats, and so on.)
In the modern era, disks are huge relative to your source base and all of those complex compression schemes are premature optimizations and more or less pointless. So git started out by throwing the whole issue out the window and just taking the brute force approach of storing full files. Change one line? You stored another full copy of the file. In the modern age, the extra disk space didn't really matter, not compared to the simplifications that you gained from this approach. In the old era that would have been unthinkable, and the lack of 'efficient storage' would have gotten git laughed out of the room.
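To make this concrete, git's basic storage unit for file contents (the 'blob' object) really is just the whole file behind a small header, named by the SHA-1 hash of the result. Here is a minimal Python sketch of roughly how a loose blob gets stored (my own simplification; real git zlib-compresses objects on disk just like this, although it later grew pack files, which do use deltas, but full copies were the starting point):

    import hashlib, os, zlib

    def store_blob(objdir, content):
        # A git blob is the full file content behind a small header,
        # named by the SHA-1 of the result; change one line and you get
        # a new hash and a second full copy, with no deltas involved.
        blob = b"blob %d\x00" % len(content) + content
        sha = hashlib.sha1(blob).hexdigest()
        path = os.path.join(objdir, sha[:2], sha[2:])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(blob))  # compressed, but never delta-ed
        return sha

Storing by content hash also gets you deduplication of identical files for free, which is part of the simplification that the brute force approach buys.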
(I suspect that chasing efficient storage had other effects on the overall design of old version control systems. For example, is the obsession with explicitly tracking file renames partly driven by the storage inefficiency of not doing so?)
2009-03-22
An outline of a possibly easier IPv4 to IPv6 transition
While an idealized IPv4 to IPv6 transition (one where IPv6 is completely backwards compatible with the IPv4 network) is impossible, IPv6 could have done some things differently to make the transition easier. The outline of this easier theoretical transition goes like this.
First, you officially embed the IPv4 address space into the IPv6 address space, so that every IPv4 address is a valid IPv6 address inside a known area of the IPv6 space. You also need to add a rule that permits operating systems to make IPv4 connections to these addresses provided that suitable conditions are met (both endpoints have IPv4 addresses, you have IPv4 connectivity, and no IPv6-only flags are set).
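(As it happens, IPv6 already has an embedding of this general shape, the IPv4-mapped addresses in ::ffff:0:0/96 from RFC 4291; what this outline proposes is making the whole system treat such an embedding as first class. A quick Python illustration of the embedding itself:

    import ipaddress

    # Every IPv4 address has a well-known home inside the
    # IPv4-mapped area of the IPv6 address space.
    v4 = ipaddress.IPv4Address("192.0.2.1")
    v6 = ipaddress.IPv6Address("::ffff:" + str(v4))
    print(v6)              # ::ffff:c000:201
    print(v6.ipv4_mapped)  # 192.0.2.1, recovering the embedded address

)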
With this, operating systems can immediately start adding system calls that accept IPv6 addresses, even before they actually have an IPv6 protocol stack; the only IPv6 addresses and options that are accepted are the IPv4 subset, and the operating system just talks IPv4 to them. In turn this allows applications to immediately start being modified for IPv6, and in fact to only use IPv6 addresses and system calls if that simplifies their lives. As operating systems get actual IPv6 stacks, the system calls can start accepting general IPv6 addresses and talking IPv6 to them.
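A user-space sketch of what such a system call might do, using the IPv4-mapped range to stand in for the embedded subset (the function name and shape here are my own invention for illustration, not any real API):

    import ipaddress, socket

    def connect6(addr_str, port):
        # Hypothetical IPv6-only connect(): callers always hand it an
        # IPv6 address, but for addresses in the embedded IPv4 subset
        # it quietly opens a plain IPv4 connection instead.
        addr = ipaddress.IPv6Address(addr_str)
        v4 = addr.ipv4_mapped
        if v4 is not None:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((str(v4), port))
        else:
            # A general IPv6 address needs a real IPv6 stack underneath.
            s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
            s.connect((addr_str, port))
        return s

The point is that the caller never has to care which case it hit; it wrote IPv6-only code from day one.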
In effect, you would turn IPv6 into an easy superset of IPv4 from the perspective of programs (and somewhat for system administration). Programs would only deal with IPv6 addresses and you would mostly only bother configuring IPv6 on systems; it's just that some of the time you would be using a limited subset of IPv6 addresses and might have limited or no connectivity to non-subset addresses.
The goal of all of this is to shortcut the chicken and egg problem that dogs IPv6 deployment today; you could have gotten IPv6-enabled programs and systems out in the field much earlier than we actually have, and gotten them talking IPv6 in live deployments. Once you had a critical mass of IPv6 systems deployed and talking IPv6, general IPv6 addresses would be much more useful and easier to deploy.
This is a nice theoretical approach, but it has some practical engineering issues:
- you really want a protocol that IPv6-enabled stacks can use to find
  out if they can talk actual IPv6 to those special IPv4 subset
  addresses, or if they have to keep talking IPv4 to them behind the
  scenes. You can fake this for TCP (try IPv6 first and fall back to
  IPv4 if you get nowhere; the probe is sketched in the code after
  this list), but you need an actual way to tell for UDP et al.
- you more or less permanently complicate IPv6 routing tables, because
you necessarily embed in them the large and baroque IPv4 ones.
- despite programs being nominally IPv6-ready, you don't know if they
  can really deal with unrestricted IPv6 addresses until you start
  feeding such addresses to them. Since programs start out only dealing
  with a very restricted subset of IPv6 addresses, they may have
  committed all sorts of convenient hacks and quick conversions.
- similarly and more alarmingly to system and network administrators, you don't know what lurking security issues are waiting for you when you turn on real IPv6 connectivity and start talking to arbitrary IPv6 addresses. You can't assume that programs and systems which are secure with IPv4-in-v6 addresses will stay that way for general addresses.
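For the TCP half of the first point, 'try IPv6 first and fall back' is roughly the following connect loop (a sketch of the general technique, not any specific stack's behaviour; real stacks do smarter things):

    import socket

    def tcp_connect(host, port, timeout=5.0):
        # Work through the host's addresses (getaddrinfo normally sorts
        # IPv6 results first), trying each until a connection succeeds.
        # UDP has no handshake, so there is nothing to probe with there.
        last_err = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, type=socket.SOCK_STREAM):
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                return s
            except OSError as err:
                s.close()
                last_err = err
        raise last_err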
I suspect that the engineering issues are enough to sink this as a practical idea (or were, if anyone proposed something like it back at the dawn of IPv6).
(This entry and last entry were both sparked by reading DJ Bernstein's the IPv6 mess and then scratching my head. I still don't see how his desired transition would actually work.)
2009-03-21
Why the ideal IPv4 to IPv6 transition is impossible
The idealized and ideal IPv4 to IPv6 transition would work something like this: if you take a new machine and give it an IPv4 address, it can only talk to other machines with IPv4 addresses. But if you take a new machine and give it an IPv6 address, it can talk to machines with IPv6 addresses and also machines with IPv4 addresses. Then you could progressively give all your new machines IPv6 addresses, and eventually stop caring much about IPv4.
(Necessarily, any conversation with an IPv4-only machine has to be done with IPv4.)
Unfortunately, this idealized transition is impossible as far as I can see, and not because the IPv6 people 'fumbled' the design of the protocol by not making it backwards compatible with IPv4 this way; the backwards compatibility is itself impossible in general.
The fundamental problem is that each endpoint of an IPv4 conversation has to have a (unique) IPv4 address; when you talk to an IPv4 machine, it needs to know your IPv4 address so it can send packets back to you. What unique IPv4 address does an arbitrary IPv6 machine have?
You cannot make a one-to-one mapping of unrestricted IPv6 addresses to IPv4 addresses because there are far more IPv6 addresses than there are IPv4 addresses (2^128 of them versus 2^32); that huge address space is one of the big attractions of IPv6. If you restrict the available IPv6 addresses so that you can map them to IPv4 addresses, you're stuck with the limits of the IPv4 address space, limits that everyone wants to get away from by moving to IPv6.
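The scale of the mismatch is worth spelling out; this is pure pigeonhole arithmetic:

    ipv4_space = 2 ** 32     # about 4.3 billion IPv4 addresses
    ipv6_space = 2 ** 128    # about 3.4e38 IPv6 addresses
    print(ipv6_space // ipv4_space)  # 2**96 IPv6 addresses per IPv4 address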
You can skip requiring a one-to-one mapping between arbitrary IPv6 addresses and IPv4 addresses, but then you need some sort of address translation box between the two ends of the conversation, with all of the usual problems of address translation. Besides, once you introduce things like translation boxes you do not really have a nice transition any more; what you have is awkward backwards compatibility.
(There are some things that IPv6 could have done to potentially make a transition easier, but that's for another entry.)