Updating software to IPv6 is often harder than you might think

January 10, 2016

A while back D.J. Bernstein wrote what is now a famous rant about IPv6. For various reasons this DJB article is on my mind, and today I want to talk about one part of it that DJB casually handwaves: updating all software to support IPv6.

The obvious problem with software is that most of the traditional system APIs have specified IP addresses in fixed-size objects with explicit, fixed types. Very little software has been written using generic APIs and variable-sized addresses, where you could just drop in the bigger IPv6 addresses without trouble; instead a lot of software knows that it is talking IPv4, with addresses that take up 4 bytes. Such software cannot simply be handed IPv6 addresses, because a 16-byte IPv6 address overflows the space set aside for a 4-byte IPv4 one and things malfunction. Instead, systems have had to define an entirely new and larger 'address family' for IPv6, and software has then had to be updated to support it alongside IPv4.

The first complication emerges here: not only do you need a new address family, you need new APIs that can accept and return the new address family. Sometimes you need the new APIs because old APIs were defined only as returning 4-byte IPv4 addresses; sometimes you need new APIs because tons of people wrote tons of code that just assumed old APIs only ever return 4-byte IPv4 addresses.

(You could break all that code, but that would be a recipe for a ton of bugs for years. Let's not go there.)
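To make the old-API versus new-API split concrete, here is a minimal Python sketch. In Python's socket module, `socket.inet_aton()` and `socket.gethostbyname()` only deal in IPv4, while `socket.inet_pton()` and `socket.getaddrinfo()` take or report an explicit address family. The helper names `parse_literal` and `connect_any` are hypothetical, and this is an illustration rather than production code.

```python
import socket

def parse_literal(text):
    """Return (family, packed_bytes) for an IPv4 or IPv6 address literal.

    The old IPv4-only way, socket.inet_aton(text), always yields 4 bytes
    and rejects IPv6 literals outright.
    """
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            return family, socket.inet_pton(family, text)
        except OSError:
            continue  # not valid in this family; try the next one
    raise ValueError(f"not an IP address literal: {text!r}")

def connect_any(host, port, timeout=5.0):
    """Connect to host:port over whichever address family works.

    getaddrinfo() may return both AF_INET6 and AF_INET results, so this
    code never needs to know how big an address is.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            last_error = err
            sock.close()
    if last_error is None:
        raise OSError(f"no addresses found for {host!r}")
    raise last_error
```

The point of the second function is that the caller never sees a 4-byte or 16-byte address at all; the family travels along with the address.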

But the larger problem is that IP addresses don't confine themselves to the networking layer of programs, where they get handled by generic system APIs that can be made to cope with the new IPv6 addresses. Instead, in many important programs IP(v4) addresses ripple through all sorts of other code. This code may represent them in all sorts of ways and manipulate them in all sorts of ways, ways that 'know' various facts that are only true for IPv4 addresses. For instance, I have several sets of code in several different languages that know how to make DNS blocklist queries given an 'IP address'. Depending on the language, this code may split a string at every '.', take the four bytes of a 32-bit integer in a specific order, or walk through a byte addr[4] array.
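As an illustration, here is a hedged Python sketch of DNS blocklist query construction. The IPv4 branch is the classic 'split at every . and reverse' logic; for IPv6, the usual convention (RFC 5782, following ip6.arpa reverse DNS) is to reverse all 32 hex nibbles of the fully expanded address. The function name `dnsbl_query_name` and its `zone` argument are hypothetical.

```python
import ipaddress

def dnsbl_query_name(addr, zone):
    """Build the DNS name to look up for addr in a DNS blocklist zone."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # The classic IPv4-only logic: split on '.' and reverse the octets.
        labels = reversed(str(ip).split("."))
    else:
        # IPv6: expand to 32 hex digits (no '::' shorthand), drop the
        # colons, and reverse nibble by nibble.
        labels = reversed(ip.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone
```

For example, `dnsbl_query_name('192.0.2.99', 'dnsbl.example.org')` gives `'99.2.0.192.dnsbl.example.org'`, while an IPv6 address turns into 32 single-character labels. The IPv6 branch is not a small tweak of the IPv4 one; it needs to know about expanded versus compressed IPv6 text forms.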

(Some of this code may be at some distance from actual network programs. Consider code that attempts to process web server log files and do things like aggregate traffic by network regions, or even just tell when the logs have IP addresses instead of hostnames.)
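For the log file case, even 'is this field an IP address or a hostname?' can hide an IPv4 assumption. Here is a minimal Python sketch; `naive_is_ip` is a hypothetical stand-in for the kind of check older code might do, and the bracket handling assumes a log format that may wrap IPv6 literals in '[...]' as URLs do.

```python
import ipaddress

def naive_is_ip(field):
    # An IPv4-only heuristic: "all digits once the dots are removed".
    # Every IPv6 address fails this test and gets treated as a hostname.
    return field.replace(".", "").isdigit()

def is_ip_literal(field):
    """True if field is an IPv4 or IPv6 address rather than a hostname."""
    candidate = field.strip()
    # Assumption: some logs may bracket IPv6 literals, as in URLs.
    if candidate.startswith("[") and candidate.endswith("]"):
        candidate = candidate[1:-1]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return True
```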

All of this code needs to be revised for IPv6. Some of the revisions are simple. Others take more work and need to know things about, say, the typical and canonical ways of representing IPv6 addresses. Other code may need to be completely rethought from the ground up; for example, I have code that represents IP address ranges as pairs of '(start, end)' integers and supports various operations on them, including 'give me all of the IP addresses in this range set'. This works fine for IPv4 addresses, but the entire data structure may need to be totally redone for IPv6 and certain parts of the API might not make sense any more.
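Here is a hypothetical Python sketch of what happens to a '(start, end)' integer range set. The integer pairs survive, but each range has to be tagged with its address family, because `int()` of the IPv4 address 0.0.0.1 and of the IPv6 address ::1 are both 1. Membership and size still work for IPv6; 'give me all of the addresses' mostly doesn't, since a single IPv6 /64 holds 2**64 of them. The class and method names and the enumeration limit are assumptions for illustration.

```python
import ipaddress

class AddressRangeSet:
    """A set of inclusive (start, end) address ranges, kept per family."""

    def __init__(self):
        # Maps IP version (4 or 6) to a list of (start, end) integers.
        self._ranges = {4: [], 6: []}

    def add(self, first, last):
        start = ipaddress.ip_address(first)
        end = ipaddress.ip_address(last)
        if start.version != end.version or int(start) > int(end):
            raise ValueError(f"bad range: {first} - {last}")
        self._ranges[start.version].append((int(start), int(end)))

    def __contains__(self, addr):
        ip = ipaddress.ip_address(addr)
        value = int(ip)
        # Only compare against ranges of the same family.
        return any(lo <= value <= hi for lo, hi in self._ranges[ip.version])

    def size(self):
        # Overlapping ranges are counted twice; merging is omitted here.
        return sum(hi - lo + 1 for ranges in self._ranges.values()
                   for lo, hi in ranges)

    def all_addresses(self, limit=1 << 16):
        """Yield every address; raises OverflowError if there are too many."""
        if self.size() > limit:
            raise OverflowError(f"{self.size()} addresses is too many to list")
        for version, ranges in self._ranges.items():
            make = ipaddress.IPv4Address if version == 4 else ipaddress.IPv6Address
            for lo, hi in ranges:
                for value in range(lo, hi + 1):
                    yield make(value)
```

The enumeration method survives only as a guarded special case; for realistic IPv6 ranges it will almost always refuse, which is one of the API parts that stops making sense.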

(And then there are the cases where IP addresses are stored in files and retrieved later. They are probably not stored in large or arbitrary-sized fields, so the storage format may simply have no room for an IPv6 address. So we're looking at database restructuring here, and also at restructuring field validators and so on.)
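A Python sketch of the storage side, under a stated assumption: the 15-character column width is a hypothetical legacy schema sized for '255.255.255.255'. An uncompressed IPv6 address in hex needs up to 39 characters, and the packed binary form grows from 4 to 16 bytes. Note that short compressed IPv6 addresses such as '2001:db8::1' do fit in 15 characters, so a too-narrow column may fail only for some addresses. The function names are hypothetical.

```python
import ipaddress

# Hypothetical legacy schema: a VARCHAR(15) column, sized for the longest
# dotted-quad IPv4 address, "255.255.255.255".
LEGACY_TEXT_WIDTH = 15

def fits_legacy_column(addr):
    """True if the canonical (compressed) text form of addr fits."""
    return len(str(ipaddress.ip_address(addr))) <= LEGACY_TEXT_WIDTH

def to_blob(addr):
    """Pack an address as bytes: 4 bytes for IPv4, 16 bytes for IPv6."""
    return ipaddress.ip_address(addr).packed

def from_blob(blob):
    """Inverse of to_blob(); the blob length tells us the family."""
    if len(blob) == 4:
        return ipaddress.IPv4Address(blob)
    if len(blob) == 16:
        return ipaddress.IPv6Address(blob)
    raise ValueError(f"unexpected address length {len(blob)}")
```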

Then you have all of the stuff that knows how to talk about IP addresses, for example in configuration files for programs. Much of this is likely specific to IPv4 addresses, so both code and specifications will need to be revised for IPv6 addresses. In turn this may ripple through to cause difficulties or require changes to the configuration file language; for example, you may need to require some sort of quoting around IPv6 addresses if your language treats ':' in words specially. All of this involves far more than mechanical code updates; we're well into the level of system architecture and design, with messy tradeoffs between backwards compatibility and well-supported IPv6 addresses.

(Exim famously has a certain amount of heartburn with lists of IPv6 addresses in its configuration files because long ago ':' was chosen as the default list separator character.)
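A small hypothetical Python parser shows the problem. With ':' as the list separator, '::1' is silently shredded. Exim's documented escape hatch is to let a list begin with '<' followed by a replacement separator character (for example '<; 127.0.0.1 ; ::1'); the parser below imitates that idea but is not Exim's code and ignores Exim's other list rules.

```python
def split_config_list(value, default_sep=":"):
    """Split a configuration list, allowing a leading '<X' to pick separator X."""
    value = value.strip()
    sep = default_sep
    if len(value) >= 2 and value[0] == "<":
        sep = value[1]
        value = value[2:]
    # Empty items are dropped, which is exactly what hides the damage
    # when an IPv6 address is split on its own colons.
    return [item.strip() for item in value.split(sep) if item.strip()]
```

With this parser, `split_config_list("127.0.0.1 : ::1")` returns `['127.0.0.1', '1']` without any error, while `split_config_list("<; 127.0.0.1 ; ::1")` returns `['127.0.0.1', '::1']`. The silent mangling is the worrying part; nothing tells the administrator their IPv6 entry is gone.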

Of course, IP addresses are just the start of the problem; it spirals off in several directions from them. One direction is IPv6 netblocks and address ranges; there's kind of a new syntax there (prefix notation such as '2001:db8::/32'), and people have to rethink configuration files that currently designate ranges via syntax like '127.100.0.'. Another is that there are various special sorts of IPv6 addresses that your systems may need to be aware of, like link-local addresses. A third is the broad issue of per-source ratelimits; a simple per-address limit may not work very well in an IPv6 environment where people have relatively large subnets pushed down to their home connections or whatever, since a single source can then easily spread its traffic across a great many addresses.
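Python's `ipaddress` module covers several of these points, as in this sketch: CIDR netblocks replace string-prefix matching like '127.100.0.', `is_link_local` recognizes special addresses in both families, and a rate-limit key can be an IPv6 prefix rather than a single address. The /64 granularity is an assumption; a site whose users get a /56 or /48 would want a shorter prefix. The names `NETBLOCKS`, `in_netblocks`, `is_special`, and `ratelimit_key` are hypothetical.

```python
import ipaddress

# CIDR replaces prefix-string matching: '127.100.0.' becomes
# 127.100.0.0/24, and IPv6 blocks use the same notation.
NETBLOCKS = [ipaddress.ip_network(n) for n in ("127.100.0.0/24", "2001:db8::/32")]

def in_netblocks(addr):
    ip = ipaddress.ip_address(addr)
    # Testing an address against a network of the other family is False.
    return any(ip in net for net in NETBLOCKS)

def is_special(addr):
    """Flag link-local addresses (169.254.0.0/16 and fe80::/10)."""
    return ipaddress.ip_address(addr).is_link_local

def ratelimit_key(addr, ipv6_prefix=64):
    """Bucket for per-source rate limits.

    IPv4 keeps one bucket per address. For IPv6, one client may control
    a whole /64 (or more), so this buckets by prefix instead.
    """
    ip = ipaddress.ip_address(addr)
    prefix = 32 if ip.version == 4 else ipv6_prefix
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
```

With this, '2001:db8:1:2::a' and '2001:db8:1:2:ffff::1' share the key 2001:db8:1:2::/64, so rotating through addresses within that /64 no longer escapes the limit.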

All of this can be done, but it all adds up to a significant amount of work, both in raw programming and in design and architecture to make the right decisions about how systems should look and work in an IPv6 enabled environment. It should be no surprise that progress has been slow overall (and occasionally buggy) and people continue to design, build, and hack together systems that are implicitly or explicitly IPv4 only.

(If you only have to deal with IPv4 today, some of the high level issues may be effectively invisible to you.)

(I've written other stuff about the problems I see with DJB's IPv6 migration ideas in earlier entries.)

Last modified: Sun Jan 10 03:30:04 2016