2010-07-29
Some brief notes on OpenSSH's known_hosts hashing
A number of current distributions of OpenSSH default to storing host names and IP addresses in ~/.ssh/known_hosts in a hashed form, in order to make it harder for an intruder to work out what other machines you have accounts on and access from this system (this is the HashKnownHosts option for ssh). Since I recently wound up digging into this and the details are underdocumented, here's what I know about how this works.
The summary is that this is your traditional one-way cryptographic hash. The specific hash is a SHA1-based HMAC, but I strongly suggest not writing any code that knows that. The host name or IP address is treated like a password and hashed together with a random salt; both the salt and the HMAC result are stored in the known_hosts line. Matching the line later is done by extracting the salt, HMAC'ing your candidate hostname with it, and seeing if you got the same HMAC result.
(The salt appears to have relatively strong randomness.)
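If you do want to see the mechanics spelled out, here is a minimal Python sketch of the hashing and matching described above. It assumes that the hashed host field has the form |1|<base64 salt>|<base64 HMAC> and that the salt is used as the HMAC key; the function names hash_host and host_matches are made up for illustration, and the 20-byte salt length is an assumption. For real work, ssh-keygen -F <hostname> does this lookup for you.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional

# Assumed prefix marking a SHA1-HMAC hashed host field.
HASH_MAGIC = "|1|"


def hash_host(host: str, salt: Optional[bytes] = None) -> str:
    """Return a hashed host field in the assumed |1|salt|hmac form."""
    if salt is None:
        # Assumption: a 20-byte random salt (the SHA1 digest length).
        salt = os.urandom(20)
    # The salt is the HMAC key; the host name is the message.
    digest = hmac.new(salt, host.encode("utf-8"), hashlib.sha1).digest()
    return (
        HASH_MAGIC
        + base64.b64encode(salt).decode("ascii")
        + "|"
        + base64.b64encode(digest).decode("ascii")
    )


def host_matches(hashed_field: str, candidate: str) -> bool:
    """Check whether a candidate host name produces the stored HMAC."""
    if not hashed_field.startswith(HASH_MAGIC):
        return False
    # Split the rest of the field into the salt and the stored HMAC.
    salt_b64, _, digest_b64 = hashed_field[len(HASH_MAGIC):].partition("|")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    # Redo the HMAC with the stored salt and compare the results.
    actual = hmac.new(salt, candidate.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)
```

Calling hash_host("example.org") twice gives two different fields, because each call picks a fresh salt, but host_matches() accepts "example.org" against either of them and rejects any other name.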
This means that checking to see if a particular host is present in a known_hosts file requires computing a separate HMAC for each line in the file. I imagine that this is not a problem in practice since most people have relatively short known_hosts files and SHA1 HMAC is relatively fast. As with unhashed hostnames, it's possible to have multiple entries for a given host in known_hosts, each with a different key; if all of the hostnames are hashed, this may not be at all obvious.
(See sshd(8) for how multiple entries for a single host work. The short answer is that OpenSSH considers itself to have found a known host key if any of them match.)
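To make the per-line cost and the multiple-entries issue concrete, here is a rough Python sketch that scans known_hosts lines for every entry matching a host. It assumes the same |1|<base64 salt>|<base64 HMAC> field format with the salt as the HMAC key; find_host_entries and field_matches are hypothetical names, and the parsing is deliberately simplified (it ignores wildcard patterns and [host]:port forms). It is not how OpenSSH itself is written.

```python
import base64
import hashlib
import hmac


def field_matches(host_field, candidate):
    """Check one host field, hashed or plain, against a candidate name."""
    if host_field.startswith("|1|"):
        # Hashed entry: redo the HMAC with this line's own salt.
        salt_b64, _, digest_b64 = host_field[3:].partition("|")
        salt = base64.b64decode(salt_b64)
        digest = hmac.new(salt, candidate.encode("utf-8"), hashlib.sha1).digest()
        return hmac.compare_digest(digest, base64.b64decode(digest_b64))
    # Plain entry: a comma-separated list of names (patterns ignored here).
    return candidate in host_field.split(",")


def find_host_entries(lines, candidate):
    """Return (line number, key type, key) for every line matching candidate."""
    matches = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0].startswith("@"):
            # Skip a leading marker such as @cert-authority or @revoked.
            fields = fields[1:]
        if len(fields) < 3:
            continue
        # Every hashed line costs one HMAC computation per candidate name.
        if field_matches(fields[0], candidate):
            matches.append((lineno, fields[1], fields[2]))
    return matches
```

With something like find_host_entries(open(path), "example.org"), where path is your known_hosts file, more than one returned tuple means the host has several entries, which is exactly the situation that is hard to spot by eye when the names are hashed.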
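Since everything needed to check a line (the salt and the HMAC result) is stored in the line itself, this all means that hashed known_hosts files are system independent and will continue working fine when moved to a different host.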
(As it turned out, the problem I was seeing was because my new test system had a different system known hosts file. Once I fixed that, everything worked, but I almost went off on a complete wild goose chase worrying about potential system dependent hashing of known_hosts. Having a hashed known_hosts did make it less obvious that the other host's key wasn't even in it, though.)