Chris's Wiki :: blog/unix/SortingIPv4Addresses Commentshttps://utcc.utoronto.ca/~cks/space/blog/unix/SortingIPv4Addresses?atomcommentsDWiki2017-05-05T01:17:07ZRecent comments in Chris's Wiki :: blog/unix/SortingIPv4Addresses.By Greg A. Woods on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:f79d6482e862ebfcf0d8959c020a84402d2f27d0Greg A. Woods<div class="wikitext"><p>Well, on modern hardware any decent version of AWK, including pre-arbitrary-precision GNU awk, will support exact integers in the range of +/- 2^53.</p>
<p>BTW, I do agree 100% with your original sentiment in this posting, i.e. that, if available, 'sort -V' is a cool trick for sorting IPv4 addresses in their normal quad-dotted human representation.</p>
<p>(The most painful thing about using AWK for converting IP addresses is that it doesn't have any bitwise operators.)</p>
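<p>As a rough illustration of how the conversion can be done without bitwise operators, here is a minimal sketch in Python that restricts itself to multiplication, addition, integer division and remainder, which is the same arithmetic an AWK version would have to use. The function names <code>ip_to_int</code> and <code>int_to_ip</code> are made up for this example, and the sketch does no input validation.</p>

```python
def ip_to_int(addr):
    """Convert dotted-quad text to an integer using only * and +."""
    total = 0
    for octet in addr.split("."):
        # Multiplying by 256 stands in for a left shift by 8 bits.
        total = total * 256 + int(octet)
    return total


def int_to_ip(value):
    """Convert an integer back to dotted-quad text using only % and //."""
    octets = []
    for _ in range(4):
        # The remainder modulo 256 stands in for masking the low 8 bits.
        octets.append(str(value % 256))
        value //= 256
    return ".".join(reversed(octets))


n = ip_to_int("192.168.1.10")
print(n, int_to_ip(n))
```

<p>The largest value this can produce is 2^32 - 1, which is well inside the +/- 2^53 range mentioned above.</p>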
</div>2017-05-05T01:17:07ZBy Chris Siebenmann on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:5fa5b2623743e4f137768b793397ded53e81571bChris Siebenmann<div class="wikitext"><p>I am personally wary of using awk for anything involving large integers.
Awk theoretically works in floating point numbers, so if the precision
of its floating point numbers is not large enough, problems will ensue
with your large integers. (I actually had this happen to me long ago
with Unix timestamps, which is why it has stuck with me ever since.)</p>
<p>Modern versions of awk may use floating point types large enough to
avoid this for the 32-bit integers that are IPv4 addresses, or they may
be clever enough to keep integer values as actual integers until they
have to be converted to a potentially inexact form, but in either case
I feel that I'm gambling. I would much rather work with something (such
as Python) where I can be completely sure that my integers are staying
exact.</p>
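<p>The precision concern can be sketched in Python, assuming (as is true on essentially every current platform) that Python's <code>float</code> is an IEEE 754 double. The helper <code>via_float32</code> is a hypothetical name for this example; it simulates an awk built on single-precision floats by round-tripping a value through a 32-bit float. The timestamp is just an arbitrary odd value of Unix-timestamp size.</p>

```python
import struct


def via_float32(n):
    """Round-trip an integer through a 32-bit IEEE 754 float."""
    return int(struct.unpack("f", struct.pack("f", float(n)))[0])


ts = 1493946427          # arbitrary odd value, about the size of a Unix timestamp
max_ipv4 = 2**32 - 1     # 255.255.255.255 as an integer

# Single precision has a 24-bit significand, so neither value survives.
print(via_float32(ts) == ts)
print(via_float32(max_ipv4) == max_ipv4)

# Double precision has a 53-bit significand, so every 32-bit value is exact...
print(int(float(max_ipv4)) == max_ipv4)

# ...but exactness ends just past 2^53.
print(float(2**53 + 1) == float(2**53))
```

<p>Python's own <code>int</code> type is arbitrary precision, so none of this applies as long as the values stay integers.</p>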
</div>2017-05-04T22:49:38ZBy Greg A. Woods on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:c78b4dd3e94fccfbef23f306acfaf2e30f9eab30Greg A. Woods<div class="wikitext"><p>None of the machines I do any sys-admin work on these days (macOS and NetBSD) have "sort -V".</p>
<p>If I'm stuck needing to sort a list of IP addresses on some machine where 'sort' is old/incapable then I'd happily type out the slightly awkward AWK code needed to convert IP addresses to/from integers. Every version of AWK can do it, and every version of 'sort' has '-n'. It's really not that difficult to figure out and type out every time it's needed, though I do tend to be one of those people who will copy such things into shell command aliases, or tiny script files.</p>
<p>Then again I might copy/paste some existing C (or Go) code that does the whole job and more, assuming of course there's a C (or Go) compiler on the target machine, and/or I can generate binaries elsewhere that'll run on it. For this particular task I have used this in the past: <a href="https://github.com/alaska/cidr-convert">https://github.com/alaska/cidr-convert</a></p>
<p>Or I might copy the list to my local machine and sort it, etc., with appropriate tools I'm familiar with. I've been doing that kind of thing, even for relatively large data sets, for literally decades now (ever since my normal working environment gave me the ability to copy and paste between two terminal windows -- which has been far longer than I could normally copy files directly between remote filesystems).</p>
</div>2017-05-04T21:58:12ZBy Aristotle Pagaltzis on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:6a8912927a45181f3526ccb517e9b13134a3463cAristotle Pagaltzishttp://plasmasturm.org/<div class="wikitext"><p>The question is how easy it is for the user to ask for sorting by IP, and <code>sort -V</code> plainly beats doing it by hand in the cases where <code>sort -V</code> is a plausible choice.</p>
<p>When you’re SSHed into another machine without your usual <code>~/bin</code> and all you need to do is sort one text file with IP addresses, it’s completely beside the point how easy it is to write a purpose-built converter. Nobody is going to start typing out a program for the job if they can just type <code>sort -V</code>, at least not just because of the waste of making the computer spend 1200 nanoseconds on the job instead of 12.</p>
<p>If you go to the trouble of putting a purpose-built IP sorter in the base system (be it OS vendor upstream or your own local standard OS image or whatever) then sure… it might make sense to use that. But again, once <code>sort -V</code> is already in there, why bother?</p>
<p>Well, maybe because the code is hot enough for the wasted cycles to matter. But that’s not a scenario where <code>sort -V</code> is applicable; you’re not likely to be using the shell then anyway.</p>
<p>Maybe you’re talking about people writing complex sorts in something more powerful than shell, where it’s not typically as easy to express sorting IPs in their string form as “<code>sort -V</code>”. There it’s typically much less code to convert to and sort binary IPs, and in that case you have a very good point.</p>
<p>But neither of these cases is, it seems to me, what Chris was talking about.</p>
</div>2017-05-04T07:12:33ZBy Greg A. Woods on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:41c522efd03a78b26fe7917e3c3e1d33317ff25aGreg A. Woods<div class="wikitext"><p>I had never heard of 'sort -V' until today. :-)</p>
<p>(It's definitely not in POSIX or in NetBSD, though since FreeBSD now has it, and OpenBSD has already taken it from FreeBSD, it is probably only a matter of time until it is in NetBSD too.)</p>
<p>I admit doing the conversion to/from plain integers is a bit of a pain to type in an AWK one-liner every time you need it, but a purpose-built converter is really easy! :-)</p>
<p>Of course it's not all that difficult to get a modern POSIX 'sort' to do it all with appropriate use of '-t', '-n', and '-k' -- though more work to type than "ip2n &lt; file | sort -n | n2ip"</p>
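<p>The multi-key approach presumably means something along the lines of <code>sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n</code>; that exact invocation is an assumption, since the comment does not spell it out. The same idea can be sketched in Python by sorting on a tuple of the four octets taken as numbers. The name <code>octet_key</code> and the sample addresses are invented for the example.</p>

```python
def octet_key(addr):
    """Split on '.' and compare each field numerically, left to right."""
    return tuple(int(part) for part in addr.split("."))


addrs = ["10.0.0.10", "10.0.0.9", "192.168.1.1", "9.255.0.1"]

# A plain string sort would put "10.0.0.10" before "10.0.0.9".
print(sorted(addrs, key=octet_key))
```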
</div>2017-05-03T23:15:27ZBy Chris Siebenmann on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:2b81ce7cf189347cbdeebe3bf0307bd183620623Chris Siebenmann<div class="wikitext"><p>'<code>sort -V</code>' is already there and presumably generally useful for more
than IPv4 addresses. If it's inherited from <code>ls</code> as the coreutils
documentation suggests, sorting IPv4 addresses is probably not its
common use case in general.</p>
<p>If I were sorting ten million IPv4 addresses on a frequent basis I might
write custom code. But if I had ten million IPv4 addresses that I needed
to sort on a regular basis, they would probably not be held in text at
all. <a href="https://xkcd.com/1205/">The relevant XKCD</a> applies here (<a href="https://xkcd.com/1319/">and
also</a>).</p>
</div>2017-05-03T18:42:53ZBy Greg A. Woods on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:364217c2ecf6d4099ca84ab3604510c6bf531331Greg A. Woods<div class="wikitext"><p>Yes, but why would you work so hard, and make the computer work so hard, to do complex multi-column numeric sorts when it is (nearly) trivial to convert to 32-bit internal numeric form, do the sort, and then convert back to quad-dotted form?</p>
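<p>A minimal sketch of that convert, sort, convert-back approach, written in Python with the standard <code>ipaddress</code> module rather than any particular tool mentioned in this thread. The function name <code>sort_ipv4</code> is made up for the example, and it assumes every input line holds exactly one valid IPv4 address.</p>

```python
import ipaddress


def sort_ipv4(lines):
    """Sort dotted-quad strings by their 32-bit integer values."""
    # Convert each address to its integer form and sort numerically.
    numbers = sorted(int(ipaddress.IPv4Address(line.strip())) for line in lines)
    # Convert the sorted integers back to dotted-quad text.
    return [str(ipaddress.IPv4Address(n)) for n in numbers]


print(sort_ipv4(["192.168.1.1", "10.0.0.10", "10.0.0.9"]))
```

<p><code>ipaddress.IPv4Address</code> raises an exception on malformed input, so a real script would need to decide what to do with lines that are not plain addresses (CIDR ranges, comments and so on).</p>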
</div>2017-05-03T18:16:18ZBy Chris Siebenmann on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:56eb0b5d0c434b48ddc48fa5c674dcbfe64f1d3dChris Siebenmann<div class="wikitext"><p>There are a lot of situations where we maintain lists of IPv4
addresses in their textual representations; one obvious case is firewall
configurations. Representing these in plain text and sorting them makes
it much easier for people to keep track of what is and isn't included
and to update things in a comprehensible way.</p>
<p>(I assume that when these firewall configurations are loaded into memory
by the kernel, the IP addresses and CIDR ranges are encoded into binary
and maintained in an efficient data structure, but that's not what I
care about when it comes to configuration files. Configuration files are
for people first and computers second, just like programming languages.)</p>
</div>2017-05-03T17:26:52ZBy Greg A. Woods on /blog/unix/SortingIPv4Addressestag:CSpace:blog/unix/SortingIPv4Addresses:b4e905fdf06a46fdabec0c8656e0bff2abf5cba5Greg A. Woods<div class="wikitext"><p>I continue to be amazed and dismayed by the number of people who seem to want to sort IPv4 addresses directly in their human-readable printed character representation using a complex multi-field numeric sort instead of in their natural internal binary representation. The overhead of the former method is excessive when compared to the latter, even including conversion from and to the human readable form as may be needed for the latter, even if you do it twice into another temporary file to be able to use "sort" on the command line or in a script. :-)</p>
</div>2017-05-03T17:14:58Z