Updating software to IPv6 is often harder than you might think
A while back D.J. Bernstein wrote what is now a famous rant about IPv6. For various reasons this DJB article is on my mind, and today I want to talk about one part of it that DJB casually handwaves: updating all software to support IPv6.
The obvious problem with software is that most of the traditional system APIs have specified IP addresses in fixed-size objects and with explicit, fixed types. Very little software has been written using generic APIs and variable-sized addresses, where you could just drop the bigger IPv6 addresses in without trouble; instead a lot of software knows that it is talking IPv4 with addresses that take up 4 bytes. Such software cannot just be handed IPv6 addresses, because they would overflow the space and various things would malfunction. Instead systems have been required to define an entirely new and larger 'address family' for IPv6, and then software has had to be updated to support it alongside IPv4.
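As a rough illustration of the fixed-size assumption (sketched in Python, which is my choice here and not what any particular program uses), here is a common IPv4-only idiom next to a family-aware one. The function names are hypothetical.

```python
import socket
import struct


def legacy_addr_to_int(addr: str) -> int:
    """IPv4-only idiom: dotted quad -> 4 packed bytes -> 32-bit integer."""
    # inet_aton() only understands IPv4, and "!I" unpacks exactly 4 bytes,
    # so both steps bake in the assumption that an address is 32 bits.
    return struct.unpack("!I", socket.inet_aton(addr))[0]


def packed_address(addr: str) -> bytes:
    """Family-aware version: 4 bytes for IPv4, 16 bytes for IPv6."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            # inet_pton() makes the caller name the family explicitly.
            return socket.inet_pton(family, addr)
        except OSError:
            continue  # not valid in this family; try the next one
    raise ValueError(f"not an IP address: {addr!r}")
```

The legacy function raises an error on '2001:db8::1', and any caller that stores the result of `packed_address()` into a 4-byte slot will break as soon as it sees a 16-byte IPv6 result.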
The first complication emerges here: not only do you need a new address family, you need new APIs that can accept and return the new address family. Sometimes you need the new APIs because old APIs were defined only as returning 4-byte IPv4 addresses; sometimes you need new APIs because tons of people wrote tons of code that just assumed old APIs only ever return 4-byte IPv4 addresses.
(You could break all that code, but that would be a recipe for a ton of bugs for years. Let's not go there.)
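Python's socket module shows the old/new API split in miniature: its documentation notes that `gethostbyname()` does not support IPv6 name resolution and points you at `getaddrinfo()` instead. A minimal sketch of code written against the newer API (the helper name is hypothetical) might look like this:

```python
import socket


def addresses_for(host: str, port: int) -> list[tuple[str, str]]:
    """Return (family name, address) pairs for host via getaddrinfo().

    gethostbyname() only ever hands back one dotted-quad IPv4 string;
    getaddrinfo() can return results in any address family, so callers
    have to be prepared for more than one kind of address.
    """
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        # sockaddr is (addr, port) for AF_INET and
        # (addr, port, flowinfo, scope_id) for AF_INET6.
        results.append((family.name, sockaddr[0]))
    return results
```

For example, `addresses_for("::1", 80)` gives `[('AF_INET6', '::1')]`, while a name with both A and AAAA records can give entries of both families.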
But the larger problem is that IP addresses don't confine themselves just to the networking layer of programs, where they get handled by generic system APIs that you can make cope with the new IPv6 addresses. Instead, in many important programs IP(v4) addresses ripple through all sorts of other code. This code may represent them in all sorts of ways and it may do all sorts of things to manipulate them, things that 'know' various facts that are only true for IPv4 addresses. For instance, I have several sets of code in several different languages that know how to make DNS blocklist queries given an 'IP address'. Depending on the language, it may split a string at every '.' or it may take four bytes in a specific order from a 32-bit integer or even a byte addr array.
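To make the DNS blocklist case concrete, here is a sketch in Python. The naive version is the string-splitting approach described above; the IPv6-aware version uses the nibble-reversed form that RFC 5782 specifies for IPv6 DNSxLs, which is the same form the standard library's `ipaddress` module produces for `ip6.arpa` reverse lookups. The zone 'dnsbl.example.org' and the function names are hypothetical.

```python
import ipaddress


def naive_dnsbl_query_name(addr: str, zone: str) -> str:
    # IPv4-only: reverse the dotted-quad octets. Given an IPv6 address
    # there is no '.' to split on, so this silently returns nonsense.
    return ".".join(reversed(addr.split("."))) + "." + zone


def dnsbl_query_name(addr: str, zone: str) -> str:
    ip = ipaddress.ip_address(addr)
    # reverse_pointer is e.g. '1.2.0.192.in-addr.arpa' for IPv4, or 32
    # reversed hex nibbles followed by '.ip6.arpa' for IPv6.
    suffix = ".in-addr.arpa" if ip.version == 4 else ".ip6.arpa"
    return ip.reverse_pointer[: -len(suffix)] + "." + zone
```

`naive_dnsbl_query_name("2001:db8::1", "dnsbl.example.org")` returns '2001:db8::1.dnsbl.example.org', while `dnsbl_query_name` returns a 35-label name starting '1.0.0.0.' and ending '8.b.d.0.1.0.0.2.dnsbl.example.org'. Note that the IPv6 version also had to know about canonical expansion of '::', not just a different separator.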
(Some of this code may be at some distance from actual network programs. Consider code that attempts to process web server log files and do things like aggregate traffic by network regions, or even just tell when the logs have IP addresses instead of hostnames.)
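Even the "is this field an IP address or a hostname?" check in log processing tends to be IPv4-shaped. A minimal Python sketch (the regex and function names are illustrative, not from any real log tool) contrasts a typical dotted-quad regex with letting the `ipaddress` module decide:

```python
import ipaddress
import re

# A typical IPv4-only test of the kind found in older log-processing scripts.
DOTTED_QUAD = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def looks_like_ipv4(field: str) -> bool:
    return DOTTED_QUAD.match(field) is not None


def is_ip_literal(field: str) -> bool:
    """True if field parses as either an IPv4 or an IPv6 address."""
    try:
        ipaddress.ip_address(field)
    except ValueError:
        return False
    return True
```

With the regex, an IPv6 client address such as '2001:db8::1' would be misclassified as a hostname; `is_ip_literal()` accepts it.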
All of this code needs to be revised for IPv6. Some of the revisions are simple. Others take more work and need to know things about, say, the typical and canonical ways of representing IPv6 addresses. Other code may need to be completely rethought from the ground up; for example, I have code that represents IP address ranges as pairs of '(start, end)' integers and supports various operations on them, including 'give me all of the IP addresses in this range set'. This works fine for IPv4 addresses, but the entire data structure may need to be totally redone for IPv6 and certain parts of the API might not make sense any more.
(And then there are the cases where IP addresses are stored in files and retrieved later. They are probably not being stored in large or arbitrarily sized fields, so the existing storage format may simply have no room for IPv6 addresses. So we're looking at database restructuring here, and also restructuring of field validators and so on.)
Then you have all of the stuff that knows how to talk about IP addresses, for example in configuration files for programs. Much of this is likely specific to IPv4 addresses, so both code and specifications will need to be revised for IPv6 addresses. In turn this may ripple through to cause difficulties or require changes to the configuration file language; you may need to make IPv6 addresses accepted with some sort of quoting if your language treats ':' in words specially, for example. All of this involves far more than mechanical code changes and code updates; we're well at the level of system architecture and design, with messy tradeoffs between backwards compatibility and well supported IPv6 addresses.
(Exim famously has a certain amount of heartburn with lists of IPv6 addresses in its configuration files because long ago ':' was chosen as the default list separator character.)
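Here is a sketch of the separator problem in a hypothetical configuration format whose lists default to ':' as the separator. Exim documents a way around this, where a list starting with '<' followed by a punctuation character uses that character as the separator instead; the parser below is only loosely modeled on that idea and is not Exim's actual list handling.

```python
from __future__ import annotations

import ipaddress


def parse_address_list(
    text: str,
) -> list[ipaddress.IPv4Address | ipaddress.IPv6Address]:
    """Parse an address list from a hypothetical config value.

    Default separator is ':'. A value starting with '<' plus a
    punctuation character (e.g. '<;') switches to that separator.
    """
    text = text.strip()
    sep = ":"
    if len(text) >= 2 and text[0] == "<" and not text[1].isalnum() and not text[1].isspace():
        sep, text = text[1], text[2:]
    items = [item.strip() for item in text.split(sep) if item.strip()]
    # ip_address() raises ValueError for fragments such as '2001' that
    # result from splitting an IPv6 address on ':'.
    return [ipaddress.ip_address(item) for item in items]
```

`parse_address_list("192.0.2.1 : 192.0.2.2")` works as it always did, `parse_address_list("<; 2001:db8::1 ; 192.0.2.1")` yields one IPv6 and one IPv4 address, and `parse_address_list("2001:db8::1")` fails because the default ':' separator chops the address into pieces.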
Of course, IP addresses are just the start of the problem; it spirals off in several directions from them. One direction is IPv6 netblocks and address ranges; there's kind of a new syntax there, and people have to rethink configuration files that currently designate ranges via syntax like '127.100.0.'. Another one is that there are various special sorts of IPv6 addresses that your systems may need to be aware of, like link-local addresses. A third is the broad issue of per-source ratelimits; a simple limit that's per IPv6 address may not work very well in an IPv6 environment where people have relatively large subnets pushed down to their home connections or whatever.
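A hypothetical Python sketch touching all three directions: converting a legacy trailing-dot IPv4 prefix such as '127.100.0.' into CIDR notation, noticing link-local sources, and keying ratelimits on a whole IPv6 prefix rather than a single address. Treating an IPv6 /64 as one "source" is an assumption on my part; some sites may want /56 or /48, depending on what ISPs delegate.

```python
import ipaddress

# Assumption: one IPv6 /64 counts as a single ratelimit source.
IPV6_RATELIMIT_PREFIX = 64


def legacy_prefix_to_network(prefix: str) -> ipaddress.IPv4Network:
    """Convert '127.100.0.' style prefixes (1 to 3 octets) to CIDR."""
    octets = prefix.rstrip(".").split(".")
    if not prefix.endswith(".") or not 1 <= len(octets) <= 3:
        raise ValueError(f"not a legacy prefix: {prefix!r}")
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.IPv4Network(f"{'.'.join(padded)}/{8 * len(octets)}")


def ratelimit_key(addr: str) -> str:
    """Bucket sources: exact address for IPv4, a whole prefix for IPv6."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.is_link_local:
        # Assumed policy: fe80::/10 addresses are only meaningful on one
        # link, so they are handled separately instead of being bucketed.
        raise ValueError("link-local source address")
    prefix = 32 if ip.version == 4 else IPV6_RATELIMIT_PREFIX
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
```

Here '127.100.0.' becomes 127.100.0.0/24, and '2001:db8:1:2:aaaa::1' and '2001:db8:1:2::ffff' share the key '2001:db8:1:2::/64', so one home connection can't dodge the limit by rotating through its own subnet.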
All of this can be done, but it all adds up to a significant amount of work, both in raw programming and in design and architecture to make the right decisions about how systems should look and work in an IPv6 enabled environment. It should be no surprise that progress has been slow overall (and occasionally buggy) and people continue to design, build, and hack together systems that are implicitly or explicitly IPv4 only.
(If you only have to deal with IPv4 today, some of the high level issues may be effectively invisible to you.)