Some versions of sort can easily sort IPv4 addresses into natural order

April 29, 2017

Every so often I need to deal with a bunch of IPv4 addresses, and it's most convenient (and best) to have them sorted into what I'll call their natural ascending order. Unfortunately for sysadmins, the natural order of IPv4 addresses is not their lexical order (i.e. what sort will give you by default), unless you zero-pad all of their octets. In theory you can zero-pad IPv4 addresses if you want, turning 58.172.99.1 into 058.172.099.001, but this form has two flaws: it looks ugly and it doesn't work with a lot of tools.
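As a small illustration of the problem, here is a Python sketch that compares plain lexical ordering with the zero-padded form. The addresses other than 58.172.99.1 are made up for the example, and zero_pad is a hypothetical helper, not something any tool provides.

```python
# Sample addresses; only 58.172.99.1 comes from the text, the rest are invented.
addresses = ["58.172.99.1", "9.10.0.2", "58.172.9.40", "128.100.3.7"]

# Plain string comparison, which is what sort does by default.
lexical = sorted(addresses)


def zero_pad(addr):
    """Zero-pad every octet to three digits, e.g. 58.172.99.1 -> 058.172.099.001."""
    return ".".join(octet.zfill(3) for octet in addr.split("."))


# Zero-padded addresses do sort lexically into natural order.
padded = sorted(zero_pad(a) for a in addresses)

print(lexical)
print(padded)
```

The lexical result puts 128.100.3.7 first and 9.10.0.2 last, which is exactly the wrong way around.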

(Some tools will remove the zero padding, some will interpret zero-padded octets as being in octal instead of decimal, and some will leave the leading zeros on and not work at all; dig -x is one interesting example of the last. In practice, there are much better ways to deal with this problem and people who zero-pad IPv4 addresses need to be politely corrected.)
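The octal ambiguity is easy to see in isolation. This sketch only shows the two possible readings of a zero-padded octet using Python's int(); it does not reproduce the behaviour of any particular tool, and parse_octal is a hypothetical helper.

```python
octet = "010"

# A tool that reads the octet as decimal sees ten ...
as_decimal = int(octet, 10)
# ... while one that follows the C convention of a leading 0 meaning octal sees eight.
as_octal = int(octet, 8)


def parse_octal(text):
    """Return the octal value of text, or None if it is not valid octal."""
    try:
        return int(text, 8)
    except ValueError:
        return None


print(as_decimal, as_octal)
# Octets such as 058 and 099 contain digits that do not exist in octal at all.
print(parse_octal("058"), parse_octal("099"))
```

So the padded form of 58.172.99.1 is not even a valid address under the octal reading, since 058 and 099 cannot be parsed that way.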

Fortunately it turns out that you can get many modern versions of sort to sort plain IPv4 addresses in the right order. The trick is to use its -V argument, which is also known as --version-sort in at least GNU coreutils. Interpreting IPv4 addresses as version numbers is basically exactly what we want, because an all-numeric MAJOR.MINOR.PATCH.SUBPATCH version number sorts in exactly the same way that we want an IPv4 A.B.C.D address to sort.
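To make the 'dot-separated sequence of numbers' idea concrete, here is a minimal Python sketch of the ordering that version sorting gives for all-numeric dotted strings. It is not how any version of sort implements -V; ipv4_key is a hypothetical helper and the sample addresses are mostly invented.

```python
def ipv4_key(addr):
    """Turn 'A.B.C.D' into the tuple (A, B, C, D) of integers."""
    return tuple(int(octet) for octet in addr.split("."))


addresses = ["58.172.99.1", "9.10.0.2", "58.172.9.40", "128.100.3.7"]

# Tuples compare element by element, so this orders by A, then B, then C, then D,
# with each component compared as a number rather than as text.
natural = sorted(addresses, key=ipv4_key)

print(natural)
```

Comparing component by component as numbers is all that is needed; the same comparison orders MAJOR.MINOR.PATCH.SUBPATCH version numbers.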

Unfortunately, as far as I know there is no way to sort IPv6 addresses into a natural order using common shell tools. The format of IPv6 addresses is so odd and unusual that I expect we're always going to need a custom program for it, although perhaps someday GNU sort will grow the necessary superintelligence.
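A custom program for this can be quite short. As a sketch, assuming Python 3 is available, the standard ipaddress module parses each address (including the '::' compressed forms) and its address objects compare numerically. The addresses below are invented examples from the documentation and link-local ranges.

```python
import ipaddress

addresses = [
    "2001:db8::10",
    "::1",
    "2001:db8::9",
    "fe80::1",
    "2001:db8:0:0:0:0:0:a",  # same prefix, written without '::' compression
]

# IPv6Address objects order by their 128-bit value, not by their text form.
ordered = sorted(addresses, key=ipaddress.IPv6Address)

for addr in ordered:
    print(addr)
```

Note that the output keeps each address in whatever textual form it was given; only the comparison uses the parsed value.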

This is a specific example of the kind of general thinking that you need in order to best apply Unix shell tools to your problems. It's quite helpful to always be on the lookout for ways that existing features can be reinterpreted (or creatively perverted) in order to work on your problems. Here we've realized that sort's idea of 'version numbers' includes IPv4 addresses, because from the right angle both they and (some) version numbers are just dot-separated sequences of numbers.

PS: with brute force, you can use any version of sort that supports -t and -k to sort IPv4 addresses; you just need the right magic arguments. I'll leave working them out (or doing an Internet search for them) as an exercise for the reader.
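For readers who would rather skip the exercise, one commonly used set of arguments splits on '.' and sorts each of the four fields numerically. The sketch below drives sort from Python so the result can be checked; it assumes a sort on $PATH that supports -t, -k and the n key modifier, and sort_ipv4 is a hypothetical wrapper.

```python
import os
import subprocess

# Split fields on '.', then sort on fields 1 through 4, each compared numerically.
SORT_ARGS = ["sort", "-t", ".", "-k", "1,1n", "-k", "2,2n", "-k", "3,3n", "-k", "4,4n"]


def sort_ipv4(addresses):
    """Pipe the addresses through the system sort and return the sorted lines."""
    result = subprocess.run(
        SORT_ARGS,
        input="\n".join(addresses) + "\n",
        capture_output=True,
        text=True,
        check=True,
        env={**os.environ, "LC_ALL": "C"},  # avoid locale-dependent surprises
    )
    return result.stdout.splitlines()


print(sort_ipv4(["58.172.99.1", "9.10.0.2", "58.172.9.40", "128.100.3.7"]))
```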

PPS: for the gory details of how GNU sort treats version sorting, see the GNU sort manual's section on details about version sort. Okay, technically it's ls's section on version sorting. Did you know that GNU coreutils ls can sort filenames partially based on version numbers? I didn't until now.

(This is a more verbose version of this tweet of mine, because why should I leave useful stuff just on Twitter.)

Sidebar: Which versions of sort support this

When I started writing this entry, I assumed that sort -V was a GNU coreutils extension and would only be supported by the GNU coreutils version. Unixes with other versions (or with versions that are too old) would be out of luck. This doesn't actually appear to be the case, to my surprise.

Based on the GNU Coreutils NEWS file, it appears that 'sort -V' appeared in GNU coreutils 7.0 or 7.1 (in late 2008 to early 2009). The GNU coreutils sort is used by most Linux distributions, including all of the main ones, and almost anything that's modern enough to be getting security updates should have a version of GNU sort that is recent enough to include this.

Older versions of FreeBSD appear to use an old version of GNU coreutils sort; I have access to a FreeBSD 9.3 machine that reports that /usr/bin/sort is GNU coreutils sort 5.3.0 (from 2004, apparently). Current versions of FreeBSD and OpenBSD have switched to their own version of sort, known as version '2.3-FreeBSD', but this version of sort also supports -V (I think the switch happened in FreeBSD 10, because a FreeBSD 10.3 machine I have access to reports this version). Exactly how -V orders things is probably somewhat different between GNU coreutils sort and FreeBSD/OpenBSD sort, but it doesn't matter for IPv4 addresses.

The Illumos /usr/bin/sort is very old, but I know that OmniOS ships /usr/gnu/bin/sort as standard and really you want /usr/gnu/bin early in your $PATH anyways. Life is too short to deal with ancient Solaris tool versions with ancient limitations.


Comments on this page:

By Greg A. Woods at 2017-05-03 13:14:58:

I continue to be amazed and dismayed by the number of people who seem to want to sort IPv4 addresses directly in their human-readable printed character representation using a complex multi-field numeric sort instead of in their natural internal binary representation. The overhead of the former method is excessive when compared to the latter, even including conversion from and to the human readable form as may be needed for the latter, even if you do it twice into another temporary file to be able to use "sort" on the command line or in a script. :-)

By cks at 2017-05-03 13:26:52:

There are a lot of situations where we maintain lists of IPv4 addresses in their textual representations; one obvious case is firewall configurations. Representing these in plain text and sorting them makes it much easier for people to keep track of what is and isn't included and to update things in a comprehensible way.

(I assume that when these firewall configurations are loaded into memory by the kernel, the IP addresses and CIDR ranges are encoded into binary and maintained in an efficient data structure, but that's not what I care about when it comes to configuration files. Configuration files are for people first and computers second, just like programming languages.)

By Greg A. Woods at 2017-05-03 14:16:18:

Yes, but why would you work so hard, and make the computer work so hard, to do complex multi-column numeric sorts when it is (nearly) trivial to convert to 32-bit internal numeric form, do the sort, and then convert back to quad-dotted form?

By cks at 2017-05-03 14:42:53:

'sort -V' is already there and presumably generally useful for more than IPv4 addresses. If it's inherited from ls as the coreutils documentation suggests, sorting IPv4 addresses is probably not its common usage case in general.

If I was sorting ten million IPv4 addresses on a frequent basis I might write custom code. But if I had ten million IPv4 addresses that I needed to sort on a regular basis, they would probably not be held in text at all. The relevant XKCD applies here (and also).

By Greg A. Woods at 2017-05-03 19:15:27:

I had never heard of 'sort -V' until today. :-)

(It's definitely not in POSIX nor in NetBSD, though now that FreeBSD has it, and OpenBSD has already taken it from FreeBSD, it is probably only a matter of time until it is in NetBSD too.)

I admit doing the conversion to/from plain integers is a bit of a pain to type in an AWK one-liner every time you need it, but a purpose-built converter is really easy! :-)

Of course it's not all that difficult to get a modern POSIX 'sort' to do it all with appropriate use of '-t', '-n', and '-k' -- though it is more work to type than "ip2n < file | sort -n | n2ip".
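The convert, sort, convert back pipeline can be sketched in Python as follows. The names ip2n and n2ip are taken from the comment above; these bodies are only one guess at what such converters would do, built on the standard ipaddress module, and the sample addresses are mostly invented.

```python
import ipaddress


def ip2n(addr):
    """Convert dotted-quad text to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(addr))


def n2ip(number):
    """Convert a 32-bit integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(number))


addresses = ["58.172.99.1", "9.10.0.2", "58.172.9.40", "128.100.3.7"]

# Equivalent in spirit to: ip2n < file | sort -n | n2ip
ordered = [n2ip(n) for n in sorted(ip2n(a) for a in addresses)]

print(ordered)
```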

The question is how easy it is for the user to ask for sorting by IP, and sort -V plainly beats doing it by hand in the cases where sort -V is a plausible choice.

When you’re SSHed into another machine without your usual ~/bin and all you need to do is sort one text file with IP addresses, it’s completely beside the point how easy it is to write a purpose-built converter. Nobody is going to start typing out a program for the job if they can just type sort -V, at least not just because of the waste of making the computer spend 1200 nanoseconds on the job instead of 12.

If you go to the trouble of putting a purpose-built IP sorter in the base system (be it OS vendor upstream or your own local standard OS image or whatever) then sure… it might make sense to use that. But again, once sort -V is already in there, why bother?

Well, maybe because the code is hot enough for the wasted cycles to matter. But that’s not a scenario where sort -V is applicable; you’re not likely to be using the shell then anyway.

Maybe you’re talking about people writing complex sorts in something more powerful than shell, where it’s not typically as easy to express sorting IPs in their string form as “sort -V”. There it’s typically much less code to convert to and sort binary IPs, and in that case you have a very good point.

But neither of these cases is, it seems to me, what Chris was talking about.

By Greg A. Woods at 2017-05-04 17:58:12:

None of the machines I do any sys-admin work on these days (macOS and NetBSD) have "sort -V".

If I'm stuck needing to sort a list of IP addresses on some machine where 'sort' is old/incapable then I'd happily type out the slightly awkward AWK code needed to convert IP addresses to/from integers. Every version of AWK can do it, and every version of 'sort' has '-n'. It's really not that difficult to figure out and type out every time it's needed, though I do tend to be one of those people who will copy such things into shell command aliases, or tiny script files.

Then again I might copy/paste some existing C (or Go) code that does the whole job and more, assuming of course there's a C (or Go) compiler on the target machine, and/or I can generate binaries elsewhere that'll run on it. For this particular task I have used this in the past: https://github.com/alaska/cidr-convert

Or I might copy the list to my local machine and sort it, etc., with appropriate tools I'm familiar with. I've been doing that kind of thing, even for relatively large data sets, for literally decades now (ever since my normal working environment gave me the ability to copy and paste between two terminal windows -- which has been far longer than I could normally copy files directly between remote filesystems).

By cks at 2017-05-04 18:49:38:

I am personally wary of using awk for anything involving large integers. Awk theoretically works in floating point numbers, so if the precision of its floating point numbers is not large enough, problems will ensue with your large integers. (I actually had this happen to me long ago with Unix timestamps, which is why it has stuck with me ever since.)

Modern versions of awk may use floating point types large enough to avoid this for the 32-bit integers that are IPv4 addresses, or they may be clever enough to keep integer values as actual integers until they have to be converted to a potentially inexact form, but in either case I feel that I'm gambling. I would much rather work with something (such as Python) where I can be completely sure that my integers are staying exact.

By Greg A. Woods at 2017-05-04 21:17:07:

Well on modern hardware any decent version of AWK, including pre-arbitrary-precision GNU awk, will support exact integers in the range of +/- 2^53.
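The precision question can be checked directly. This Python sketch assumes IEEE 754 floating point, where Python's float is a double (the type an awk built on C doubles would use) and struct's "f" format is single precision. It says nothing about what any particular awk actually does.

```python
import struct

MAX_IPV4 = 2**32 - 1  # 255.255.255.255 as an integer

# Double precision (53-bit significand) holds every 32-bit value exactly.
double_ok = int(float(MAX_IPV4)) == MAX_IPV4

# Single precision (24-bit significand) does not: the value gets rounded.
as_single = struct.unpack("f", struct.pack("f", float(MAX_IPV4)))[0]
single_ok = int(as_single) == MAX_IPV4

# Past 2^53, doubles can no longer represent every integer.
limit_hit = float(2**53) == float(2**53 + 1)

print(double_ok, single_ok, limit_hit)
```

So 32-bit IPv4 values are safe in doubles with plenty of room to spare, while anything stored in single precision would not be.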

BTW, I do agree 100% with your original sentiment in this posting, i.e. that, if available, 'sort -V' is a cool trick for sorting IPv4 addresses in their normal quad-dotted human representation.

(The most painful thing about using AWK for converting IP addresses is that it doesn't have any bitwise operators.)
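Without bitwise operators, the conversion has to be done with multiplication, division and remainder instead of shifts and masks. Here is that arithmetic sketched in Python rather than AWK; the function names are hypothetical, and the same steps should translate to AWK using int() and the % operator.

```python
def ip2n_arith(addr):
    """Dotted quad to integer using only multiply and add (no shifts)."""
    number = 0
    for octet in addr.split("."):
        number = number * 256 + int(octet)
    return number


def n2ip_arith(number):
    """Integer to dotted quad using only remainder and integer division (no masks)."""
    octets = []
    for _ in range(4):
        octets.append(str(number % 256))
        number //= 256
    return ".".join(reversed(octets))


value = ip2n_arith("58.172.99.1")
# The arithmetic form agrees with the usual shift-and-or form.
same_as_bitwise = value == (58 << 24 | 172 << 16 | 99 << 8 | 1)

print(value, n2ip_arith(value), same_as_bitwise)
```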

Written on 29 April 2017.

Last modified: Sat Apr 29 01:26:50 2017