2008-02-25
The best way to shroud IP addresses
I wrote before about the limitations of hashes as privacy protectors, which leaves us with a not entirely theoretical question: suppose that you have to create a traffic accounting system, but you don't want to keep too much in your logs. What's the best way to protect user privacy in this situation?
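Our answer is to drop the first octet of the IP address. This gives much better privacy protection than dropping the last octet or even the last two octets, because it cuts against how IP address space is organized. With the last octet dropped, the 256 possible addresses all sit in the same /24, which usually means the same organization; with the first octet dropped, the 256 possibilities are scattered across unrelated parts of the address space.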
(One way to put this is that it often hides much more information to drop the most significant digit than the least significant digit.)
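As a rough illustration of the idea (not our actual code), here is a minimal Python sketch of dropping the first octet of an IPv4 address. The function name `shroud_ipv4` is made up for this example, and it assumes you are dealing with dotted-quad IPv4 addresses only.

```python
import ipaddress


def shroud_ipv4(addr: str) -> str:
    """Drop the first octet of a dotted-quad IPv4 address.

    'a.b.c.d' becomes 'b.c.d'. (Hypothetical helper for illustration.)
    """
    # Validate and normalize the input; raises ValueError if it is
    # not a well-formed IPv4 address.
    ip = ipaddress.IPv4Address(addr)
    octets = str(ip).split(".")
    return ".".join(octets[1:])


# Two addresses in entirely unrelated parts of the address space
# become indistinguishable once the first octet is gone.
print(shroud_ipv4("10.1.2.3"))
print(shroud_ipv4("203.1.2.3"))
```

Both calls produce the same shrouded value, which is the point: what gets stored no longer says which of 256 widely separated networks the traffic went to.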
This may not meet everyone's needs, but our particular circumstance is that we don't really care about traffic volume ourselves; we just need to be able to identify the users involved when the campus NOC contacts us with reports of high bandwidth usage. The NOC reports have the full off-campus IP addresses involved, so we can look for high traffic volume to the obscured version of the IP address.
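The lookup side might be sketched like this in Python. Everything about the record layout here is an assumption for the sake of the example: I'm pretending that the accounting data is a sequence of `(user, shrouded_remote, byte_count)` tuples, and `users_for_report` and `shroud` are hypothetical names, not anything from our real system.

```python
from collections import defaultdict


def shroud(addr: str) -> str:
    """Drop the first octet of a dotted-quad address: 'a.b.c.d' -> 'b.c.d'."""
    return addr.split(".", 1)[1]


def users_for_report(reported_ip, records):
    """Rank users by traffic to the shrouded form of a reported address.

    records is assumed to be an iterable of
    (user, shrouded_remote, byte_count) tuples; this layout is
    hypothetical.
    """
    target = shroud(reported_ip)
    totals = defaultdict(int)
    for user, remote, nbytes in records:
        if remote == target:
            totals[user] += nbytes
    # Highest-volume users first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


records = [
    ("alice", "51.100.7", 5_000_000),
    ("bob", "51.100.7", 120),
    ("alice", "0.2.99", 42),
]
# The NOC reports the full off-campus address; we shroud it and match.
print(users_for_report("198.51.100.7", records))
```

A match can in principle come from traffic to a different address that shares the last three octets, which is why this works best when you're looking for high volume: a coincidental collision that is also a bandwidth hog is unlikely.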
(Disclaimer: my co-workers came up with this idea, not me. I'm just sharing it because I think it's a useful approach to the problem.)
You do lose information with this, but then you lose information with any scheme that preserves user privacy; for example, we could not conclusively identify users who had visited an infected website (or who had zombied machines that were communicating with some master point). If we needed to do that sort of thing on a one-shot basis we would probably set up additional monitoring; if we needed to do it on an ongoing basis we would have to rethink the system.
(I suspect we'd wind up with hashed IP addresses.)