Wandering Thoughts


My current views on using OpenSSH with CA-based host and user authentication

Recent versions of OpenSSH have support for doing host and user authentication via a local CA. Instead of directly listing trusted public keys, you configure a CA and then trust anything signed by the CA. This is explained primarily (and tersely) in the ssh-keygen manpage, and at somewhat more length in articles like How to Harden SSH with Identities and Certificates (via, via a comment by Patrick here). As you might guess, I have some opinions on this.

I'm fine with using CA certs to authenticate hosts to users (especially if OpenSSH still saves the host key to your known_hosts, which I haven't tested), because the practical alternative is no initial authentication of hosts at all. Almost no one verifies the SSH keys of new hosts that they're connecting to, so signing host keys and then trusting the CA gives you extra security even in the face of the fundamental problem with the basic CA model.

I very much disagree with using CA certs to sign user keypairs and authenticate users system-wide, because it has the weakness of the basic CA model: namely, that you lose the ability to know what you're trusting. What keys have access? Well, any signed by this CA cert with the right attributes. What are those? Well, you don't know for sure that you know all of them. This is completely different from explicit lists of keys, where you know exactly what you're trusting (although you may not know who has access to those keys).

Using CA certs to sign user keypairs is generally put forward as a solution to the problem of distributing and updating explicit lists of keys. However, this problem already has any number of solutions, for example using sshd's AuthorizedKeysCommand to query an LDAP directory (see eg this serverfault question). If you're worried about the LDAP server going down, there are workarounds for that. It's difficult for me to come up with an environment where some solution like this isn't feasible, and such solutions retain the advantage that you always have full control over what identities are trusted and you can reliably audit this.
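As a sketch of what the sshd side of this looks like (AuthorizedKeysCommand and AuthorizedKeysCommandUser are real sshd_config options; the helper script path here is made up):

```
# sshd_config: get a user's authorized keys from an external command
# instead of (or in addition to) ~/.ssh/authorized_keys.
# /usr/local/sbin/ldap-authkeys is a hypothetical script that queries
# your LDAP directory and prints authorized_keys lines for the user (%u).
AuthorizedKeysCommand /usr/local/sbin/ldap-authkeys %u
AuthorizedKeysCommandUser nobody
```

sshd treats the command's output exactly like an authorized_keys file, so all of the usual key options and restrictions still work.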

(I would not use CA-signed host keys as part of host-based authentication with /etc/shosts.equiv. It suffers from exactly the same problem as CA-signed user keys; you can never be completely sure what you're trusting.)

Although it is not mentioned much or well documented, you can apparently set up a personal CA for authentication via a cert-authority line in your authorized_keys. I think that this is worse than simply having normal keys listed, but it is at least less damaging than doing it system-wide and you can make an argument that this enables useful security things like frequent key rollover, limited-scope keys, and safer use of keys on potentially exposed devices. If you're doing these, maybe the security improvements are worth being exposed to the CA key-issue risk.

(The idea is that you would keep your personal CA key more or less offline; periodically you would sign a new moderate-duration encrypted keypair and transport it to your online devices via eg a USB memory stick. Restricted-scope keys would be done with special -n arguments to ssh-keygen and then appropriate principals= requirements in your authorized_keys on the restricted systems. There are a bunch of tricks you could play here.)
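To make that workflow concrete, here's a sketch with ssh-keygen; the file names, the 'laptop' principal, and the two-week validity are all just examples:

```shell
# Sketch of a personal SSH CA (all names here are made up).
set -e
dir=$(mktemp -d)
cd "$dir"

# The CA keypair, which you'd keep more or less offline.
ssh-keygen -q -t ed25519 -N '' -f my-ca -C 'personal CA'

# A fresh keypair for one of your online devices.
ssh-keygen -q -t ed25519 -N '' -f laptop-key

# Sign it with a two-week validity and a 'laptop' principal;
# this produces laptop-key-cert.pub.
ssh-keygen -q -s my-ca -I 'laptop key' -n laptop -V +2w laptop-key.pub

# Inspect the certificate: principals, validity period, and so on.
ssh-keygen -L -f laptop-key-cert.pub
```

The matching authorized_keys line on a restricted system would then be something like 'cert-authority,principals="laptop" ssh-ed25519 AAAA...', using the CA's public key.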

Sidebar: A CA restriction feature I wish OpenSSH had

It would make me happier with CA signing if you could set limits on the duration of (signed) keys that you'd accept. As it stands right now, it is only the CA signing operation (ssh-keygen with the CA key) that sets any expiry on signed keys; if you can persuade the CA to sign with a key validity period of ten years, well, you've got a key that's good for ten years unless it gets detected and revoked. It would be better if the consumer of the signed key could say 'I will only accept signatures with a maximum validity period of X weeks', 'I will only accept signatures with a start time after Y', and so on. All of these would act to somewhat limit the damage from a one-time CA key issue, whether or not you detected it.

sysadmin/SSHWithCAAuthenticationViews written at 01:07:55; Add Comment


The fundamental practical problem with the Certificate Authority model

Let's start with my tweet:

This is my sad face when people sing the praises of SSH certificates and a SSH CA as a replacement for personal SSH keypairs.

There is nothing specifically wrong with the OpenSSH CA model. Instead it simply has the fundamental problem of the basic CA model.

The basic Certificate Authority model is straightforward: you have a CA, it signs things, and you accept that the CA's signature on those things is by itself an authorization. TLS is the most widely known protocol with CAs, but as we see here the CA model is used elsewhere as well. This is because it's an attractive model, since it means you can distribute a single trusted object instead of many of them (such as TLS certificates or SSH personal public keys).

The fundamental weakness of the CA model in practice is that keeping the basic CA model secure requires you to have perfect knowledge of all keys issued. Breaches prove this requirement false; with TLS CAs, we have repeatedly seen CAs that do not know all the certificates they mis-issued. Let me repeat that louder:

The fundamental security requirement of the basic CA model is false in practice.

In general, at the limits, you don't know all of the certificates that your CA system has signed nor do you know whether any unauthorized certificates exist. Any belief otherwise is merely mostly or usually true.

Making a secure system that uses the CA model means dealing with this. Since TLS is the best developed and most attacked CA-based protocol, it's no surprise that it has confronted this problem straight on in the form of OCSP. Simplified, OCSP creates an additional affirmative check that the CA actually knows about a particular certificate being used. You can argue about whether or not it's a good idea for the web and it does have some issues, but it undeniably deals with the fundamental problem; a certificate that's unknown to the CA can be made to fail.

Any serious CA based system needs to either deal with this fundamental practical problem or be able to explain why it is not a significant security exposure in the system's particular environment. Far too many of them ignore it instead and opt to just handwave the issue and assume that you have perfect knowledge of all of the certificates your CA system has signed.

(Some people say 'we will keep our CA safe'. No you won't. TLS CAs have at least ten times your budget for this and know that failure is an organization-ending risk, and they still fail.)

(I last wrote about this broad issue back in 2011, but I feel the need to bang the drum some more and spell things out more strongly this time around. And this time around SSL/TLS CAs actually have a relatively real fix in OCSP.)

Sidebar: Why after the fact revocation is no fix

One not uncommon answer is 'we'll capture the identifiers of all certificates that get used and when we detect a bad one, we'll revoke it'. The problem with this is that it is fundamentally reactive; by the time you see the identifier of a new bad certificate, the attacker has already been able to use it at least once. After all, until you see the certificate, identify it as bad, and revoke it, the system trusts it.

tech/CAFundamentalProblem written at 02:12:58; Add Comment


Old Unix filesystems and byte order

It all started with a tweet by @JeffSipek:

illumos/solaris UFS don't use a fixed byte order. SPARC produces structs in BE, x86 writes them out in LE. I was happier before I knew this.

As they say, welcome to old time Unix filesystems. Solaris UFS is far from the only filesystem defined this way; in fact, most old time Unix filesystems are probably defined in host byte order.

Today this strikes us as crazy, but that's because we now exist in a quite different hardware environment than the old days had. Put simply, we now exist in a world where storage devices both can be moved between dissimilar systems and are. In fact, it's an even more radical world than that; it's a world where almost everyone uses the same few storage interconnect technologies and interconnects are common between all sorts of systems. Today we take it for granted that how we connect storage to systems is through some defined, vendor neutral specification that many people implement, but this was not at all the case originally.

(There are all sorts of storage standards: SATA, SAS, NVMe, USB, SD cards, and so on.)

In the beginning, storage was close to 100% system specific. Not only did you not think of moving a disk from a VAX to a Sun, you probably couldn't; the entire peripheral interconnect system was almost always different, from the disk-to-host cabling to the kind of backplane that the controller boards plugged into. Even as some common disk interfaces emerged, larger servers often stayed with faster proprietary interfaces and proprietary disks.

(SCSI is fairly old as a standard, but it was also a slow interface for a long time so it didn't get used on many servers. As late as the early 1990s it still wasn't clear that SCSI was the right choice.)

In this environment of system specific disks, it was no wonder that Unix kernel programmers didn't think about byte order issues in their on disk data structures. Just saying 'everything is in host byte order' was clearly the simplest approach, so that's what people by and large did. When vendors started facing potential bi-endian issues, they tried very hard to duck them (I think that this was one reason endian-switchable RISCs were popular designs).
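The consequence is easy to see if you sketch what the same on-disk field looks like from each side (the field value here is made up):

```python
import struct

# A hypothetical 32-bit superblock field, as written under 'host byte
# order' rules by a little-endian (x86) and a big-endian (SPARC) machine.
value = 0x00012345
little = struct.pack('<I', value)   # what x86 puts on disk
big = struct.pack('>I', value)      # what SPARC puts on disk

print(little.hex(), big.hex())      # prints: 45230100 00012345
```

Read those four bytes back on the other architecture without swapping and you get a garbage value, which is exactly why you can't move a UFS disk between SPARC Solaris and x86 Solaris.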

In theory, vendors could have decided to define their filesystems as being in their current endianness before they introduced another architecture with a different endianness (here Sun, with SPARC, would have defined UFS as BE). In practice I suspect that no vendor wanted to go through filesystem code to make it genuinely fixed endian. It was just simpler to say 'UFS is in host byte order and you can't swap disks between SPARC Solaris and x86 Solaris'.

(Since vendors did learn, genuinely new filesystems were much more likely to be specified as having a fixed and host-independent byte order. But filesystems like UFS trace their roots back a very long way.)

unix/OldFilesystemByteOrder written at 23:04:43; Add Comment

Clearing SMART disk complaints, with safety provided by ZFS

Recently, my office machine's smartd began complaining about problems on one of my drives (again):

Device: /dev/sdc [SAT], 5 Currently unreadable (pending) sectors
Device: /dev/sdc [SAT], 5 Offline uncorrectable sectors

As it happens, I was eventually able to make all of these complaints go away (I won't say I fixed the problem, because the disk is undoubtedly still slowly failing). This took a number of steps and some of them were significantly helped by ZFS on Linux.

(For background, this disk is one half of a mirrored pair. Most of it is in a ZFS pool; the rest is in various software RAID mirrors.)

My steps:

  1. Scrub my ZFS pool, in the hopes that this would make the problem go away like the first iteration of smartd complaints. Unfortunately I wasn't so lucky this time around, but the scrub did verify that all of my data was intact.

  2. Use dd to read all of the partitions of the disk (one after another) in order to try to find where the bad spots were. This wound up making four of the five problem sectors just quietly go away and did turn up a hard read error in one partition. Fortunately or unfortunately it was my ZFS partition.

    The resulting kernel complaints looked like:

    blk_update_request: I/O error, dev sdc, sector 1362171035
    Buffer I/O error on dev sdc, logical block 170271379, async page read

    The reason that a ZFS scrub did not turn up a problem was that ZFS scrubs only check allocated space. Presumably the read error is in unallocated space.

  3. Use the kernel error messages and carefully iterated experiments with dd's skip= argument to make sure I had the right block offset into /dev/sdc, ie the block offset that would make dd immediately read that sector.

  4. Then I tried to write zeroes over just that sector with 'dd if=/dev/zero of=/dev/sdc seek=... count=1'. Unfortunately this ran into a problem; for some reason the kernel felt that this was a 4k sector drive, or at least that it had to do 4k IO to /dev/sdc. This caused it to attempt to do a read-modify-write cycle, which immediately failed when it tried to read the 4k block that contained the bad sector.

    (The goal here was to force the disk to reallocate the bad sector into one of its spare sectors. If this reallocation failed, I'd have replaced the disk right away.)

  5. This meant that I needed to do 4K writes, not 512 byte writes, which meant that I needed the right offset for dd in 4K units. This was handily the 'logical block' from the kernel error message, which I verified by running:

    dd if=/dev/sdc of=/dev/null bs=4k skip=170271379 count=1

    This immediately errored out with a read error, which is what I expected.

  6. Now that I had the right 4K offset, I could write 4K of /dev/zero to the right spot. To really verify that I was doing (only) 4K of IO and to the right spot, I ran dd under strace:

    strace dd if=/dev/zero of=/dev/sdc bs=4k seek=170271379 count=1

  7. To verify that this dd had taken care of the problem, I redid the dd read. This time it succeeded.

  8. Finally, to verify that writing zeroes over a bit of one side of my ZFS pool had only gone to unallocated space and hadn't damaged anything, I re-scrubbed the ZFS pool.
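As a cross-check on the offset arithmetic: the 512-byte sector number from the first kernel message and the 4k 'logical block' from the second are the same spot on disk, related by a factor of 8 (4096 / 512):

```shell
# Convert a 512-byte sector number to a 4k logical block number.
sector=1362171035
echo $((sector / 8))    # prints: 170271379, the bs=4k skip/seek offset
```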

ZFS was important here because ZFS checksums meant that writing zeroes over bits of one pool disk was 'safe', unlike with software RAID, because if I hit any in-use data ZFS would know that the chunk of 0 bytes was incorrect and fix it up. With software RAID I guess I'd have had to carefully copy the data from the other side of the software RAID, instead of just using /dev/zero.

By the way, I don't necessarily recommend this long series of somewhat hackish steps. In an environment with plentiful spare drives, the right answer is probably 'replace the questionable disk entirely'. It happens that we don't have lots of spare drives at this moment, plus I don't have enough drive bays in my machine to make this at all convenient right now.

(Also, in theory I didn't need to clear the SMART warnings at all. In practice the Fedora 23 smartd whines incessantly about this to syslog at a very high priority, which causes one of my windows to get notifications every half hour or so and I just couldn't stand it any more. It was either shut up smartd somehow or replace the disk. Believe it or not, all these steps seemed to be the easiest way to shut up smartd. It worked, too.)

linux/ClearingSMARTComplaints written at 00:51:13; Add Comment


Your SSH keys are a (potential) information leak

One of the things I've decided I want to do to improve my SSH security is to stop offering my keys to basically everything. Right now, I have a general keypair that I use on most machines; as a result of using it so generally, I have it set up as my default identity and I offer it to everything I connect to. There's no particular reason for this, it's just the most convenient way to configure OpenSSH.

Some people will ask what the harm is in offering my public key to everything; after all, it is a public key. Some services even publish the public key you've registered with them (Github is one example). You can certainly cite CVE-2016-0777 here, but there's a broader issue. Because of how the SSH protocol works, giving your SSH public key to someone is a potential information leak that they can use to conduct reconnaissance against your hosts.

As we've seen, when a SSH client connects to a server it sends the target username and then offers a series of public keys. If the current public key can be used to authenticate the username, the server will send back a challenge (to prove that you control the key); otherwise, it will send back a 'try the next one' message. So once you have some candidate usernames and some harvested public keys, you can probe other servers to see if the username and public key are valid. If they are valid, the server will send you a challenge (which you will have to fail, since you don't have the private key); if they are not, you will get a 'try the next one' message. When you get a challenge response from the server, you've learned both a valid username on the server and a potential key to target. In some situations, both of these are useful information.

(If the server rejects all your keys, it could be either that none of them are authorized keys for the account (at least from your IP) or that the username doesn't even exist.)

How do people get your SSH public keys if you offer them widely? Well, by getting you to connect to a SSH server that has been altered to collect and log all of them. This server could be set up in the hopes that you'll accidentally connect to it through a name typo, or it could simply be set up to do something attractive ('ssh to my demo server to see ...') and then advertised.

(People have even set up demonstration servers specifically to show that keys leak. I believe this is usually done by looking up your Github username based on your public key.)

(Is this a big risk to me? No, not particularly. But I like to make little security improvements every so often, partly just to gain experience with them. And I won't deny that CVE-2016-0777 has made me jumpy about this area.)

tech/SSHKeysAreInfoLeak written at 03:25:21; Add Comment


You can have many matching stanzas in your ssh_config

When I started writing my ssh_config, years and years ago, I basically assumed that how you used it was that you had a 'Host *' stanza that set defaults and then for each host you might have a specific 'Host <somehost>' stanza (perhaps with some wildcards to group several hosts together). This is the world that looks like:

Host *
   StrictHostKeyChecking no
   ForwardX11 no
   Compression yes

Host github.com
   IdentityFile /u/cks/.ssh/ids/github

And so on (maybe with a generic identity in the default stanza).

What I have only belatedly and slowly come to understand is that stanzas in ssh_config do not have to be used in just this limited way. Any number of stanzas can match and apply settings, not just two of them, and you can exploit this to do interesting things in your ssh_config, including making up for a limitation in the pattern matching that Host supports.

As the ssh_config manpage says explicitly, the first version of an option encountered is the one that's used. Right away this means that you may want to have two 'Host *' stanzas, one at the start to set options that you never, ever want overridden, and one at the end with genuine defaults that other entries might want to override. Of course you can have more 'Host *' stanzas than this; for example, you could have a separate stanza for experimental settings (partly to keep them clearly separate, and partly to make them easy to disable by just changing the '*' to something that won't match).
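For example, the overall layout might look like this (the specific options are just for illustration):

```
# Settings I never want overridden; the first value found wins.
Host *
   HashKnownHosts no

# Host specific stanzas go in the middle.
Host github.com
   IdentityFile /u/cks/.ssh/ids/github

# Genuine defaults, last so that anything above can override them.
Host *
   ForwardX11 no
   Compression yes
```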

Another use of multiple stanzas is to make up for an annoying limitation of the ssh_config pattern matching. Here's where I present the setup first and explain it later:

Host *.cs *.cs.toronto.edu
  [some options]

Host * !*.*
  [the same options]

Here what I really want is a single Host stanza that applies to 'a hostname with no dots or one in the following (sub)domains'. Unfortunately the current pattern language has no way of expressing this directly, so instead I've split it into two stanzas. I have to repeat the options I'm setting, but this is tolerable if I care enough.

(At this point one might suggest that CanonicalizeHostname could be the solution instead. For reasons beyond the scope of this entry, I prefer to have ssh leave this to my system's resolver.)

There are undoubtedly other things one can do with multiple Host entries (or multiple Match entries) once you free yourself from the shackles of thinking of them only as default settings plus host specific settings. I know I have to go through my .ssh/config and the ssh_config manpage with an eye to what I can do here.

sysadmin/SSHConfigMultipleStanzas written at 01:19:39; Add Comment


Some notes on SMF manifests (on OmniOS) and what goes in them

Recently, I needed to create a SMF manifest to run a script at boot. In most init systems, this is simple. SMF is not most init systems. SMF requires services (including scripts run at boot) to be defined in XML manifests. Being XML, they are verbose and picky, but fortunately there are some good general guidelines on what goes in them; the one I started from is Ben Rockwood's An SMF Manifest Cheatsheet. But there are a number of things it didn't say explicitly (or at all) that I had to find out the hard way, so here's some notes.

First, on OmniOS you'll find most existing SMF manifests under /lib/svc/manifest, especially /lib/svc/manifest/system. If you get confused or puzzled about how to do something, it's very worth raiding these files for examples.

What both Ben Rockwood's writeup and the documentation neglect to mention is that there is a fixed order of elements in the SMF manifest. The manifest is not just an (XML) bag of properties; the elements need to come in a relatively specific order. You can get all sorts of puzzling and annoying errors from 'svccfg validate' if you don't know this.

(The error messages probably make total sense to people who understand XML DTD validation. I am not such a person.)

For just running a script, everyone seems to set things so there is only a single instance of your SMF service and it's auto-created:

<create_default_instance enabled='false' />

(This comes right after the opening <service> tag.)

There is probably an art to picking your SMF dependencies. I went for overkill; in order to get my script run right at the end of boot, I specified /system/filesystem/local, /milestone/multi-user, /milestone/network, and for local reasons /network/iscsi/initiator. 'svcs' defaults to listing services in start order, so you can use that to fish around for likely dependencies. Or you can look at what similar system SMF services use.

(It turns out that you can put multiple FMRIs in a single <dependency> tag, so my SMF manifest is more verbose than it needs to be. They need to have the same grouping, restart_on, and type, but this is probably not uncommon.)

Although you might think otherwise, even a single-shot script needs to have a 'stop' <exec_method> defined, even if it does nothing. The one that services seem to use is:

<exec_method type='method'
             name='stop'
             exec=':true'
             timeout_seconds='3' />

The timeout varies but I suspect it's not important. Omitting this will cause your SMF manifest to fail validation.

If you just want to run a script from your SMF service, you need what is called a 'transient' service. How you specify that your service is a transient one is rather obscure, because it is not something you set in the overall service description or in the 'start' exec_method (where you might expect it to live). Instead it's done this way:

<property_group name='startd' type='framework'>
    <propval name='duration' type='astring' value='transient' />
</property_group>

This is directions for svc.startd, which is responsible for starting and restarting SMF services. You can thus find some documentation for it in the svc.startd manpage, if you already understand enough about SMF XML manifests to know how to write properties.

(Since it is an add-on property, not a fundamental SMF XML attribute, it is not to be found anywhere in the SMF DTD. Isn't it nice that the SMF documentation points you to the SMF DTD for these things? No, not particularly.)
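Putting all of these pieces together, a minimal transient-service manifest skeleton looks roughly like this; the service name, script path, and dependency are placeholders, and the element order is the one the DTD insists on:

```
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='yourorg-bootscript'>
  <service name='yourorg/bootscript' type='service' version='1'>
    <create_default_instance enabled='false' />
    <single_instance />

    <dependency name='multi-user' grouping='require_all'
                restart_on='none' type='service'>
      <service_fmri value='svc:/milestone/multi-user' />
    </dependency>

    <exec_method type='method' name='start'
                 exec='/local/sbin/bootscript' timeout_seconds='60' />
    <exec_method type='method' name='stop'
                 exec=':true' timeout_seconds='3' />

    <property_group name='startd' type='framework'>
      <propval name='duration' type='astring' value='transient' />
    </property_group>
  </service>
</service_bundle>
```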

Some documentation will suggest giving your SMF service a name in the /site/ overall namespace. I suggest using an organizational name of some sort instead, because that way you know that a particular service came from you and was not dropped in from who knows where (and it's likely to stand out more in eg 'svcs' output). Other people creating SMF packages are already doing this; for instance, pkgsrc uses /pkgsrc/ names.

(This is the kind of entry that I write because I don't want to have to re-research this. SMF was annoying enough the first time around.)

Sidebar: A quick command cheatsheet

SMF manifests are validated with 'svccfg validate <file>.xml'. Expect to use this often.

Once ready to be used, manifests must be imported into SMF, which is done with 'svccfg import <file>.xml'. If you specified that your service should default to disabled when installed (as I did here), you then need to enable it with the usual 'svcadm enable /you/whatever'.

In theory you can re-import manifests to pick up changes. In practice I have no idea what sort of things are picked up; for example, if you delete a <dependency> block, does it go away in the imported version when reimported? I'd have to experiment (or know more about SMF than I currently do).

Your imported SMF manifest can be completely removed with 'svccfg delete /you/whatever'. Normally you'll want to have disabled the service beforehand. The svccfg manpage makes me slightly nervous about this in some circumstances that are probably not going to apply to many people.

(Svccfg has an export operation, but it just dumps out information, it doesn't remove things.)

solaris/SMFServiceManifestNotes written at 01:15:00; Add Comment


Django, the timesince template filter, and non-breaking spaces

Our Django application uses Django's templating system for more than just generating HTML pages. One of the extra things is generating the text of some plaintext email messages. This trundled along for years, and then a Django version or two ago I noticed that some of those plaintext emails had started showing up not as plain ASCII but as quoted-printable with some embedded characters that did not cut and paste well.

(One reason I noticed is that I sometimes scan through my incoming email with plain less.)

Here's an abstracted version of such an email message, with the odd bits italicized:

The following pending account request has not been handled for at least 1 week.

  • <LOGIN> for Some Person <user@somewhere>
    Sponsor: A professor
    Unhandled for 1 week, 2 days (since <date>)

In quoted-printable form the spaces in the italicized bits were =C2=A0 (well, most of them).

I will skip to the punchline: these durations were produced by the timesince template filter, and the =C2=A0 is the utf-8 representation of a nonbreaking space, U+00A0. Since either 1.5 or 1.6, the timesince filter and a couple of others now use nonbreaking spaces after numbers. This change was introduced in Django issue #20246, almost certainly by a developer who was only thinking about the affected template filters being used in HTML.
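The punchline is easy to verify, and the filter you end up writing is short (the function name here is made up; in a real Django app it would be registered with @register.filter in a templatetags module):

```python
# U+00A0 (no-break space) is the two bytes C2 A0 in UTF-8, which is
# exactly the =C2=A0 that shows up in quoted-printable mail.
NBSP = u'\u00a0'
assert NBSP.encode('utf-8') == b'\xc2\xa0'

# A sketch of the reversing filter; 'unbreak_spaces' is a made-up name.
def unbreak_spaces(value):
    """Replace timesince's no-break spaces with plain ASCII spaces."""
    return value.replace(NBSP, u' ')

print(unbreak_spaces(u'1\u00a0week, 2\u00a0days'))  # prints: 1 week, 2 days
```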

In HTML, this change is unobjectionable. In plain text, it does any number of problematic things. Of course there is no option to turn this off or otherwise control the behavior. As the issue itself cheerfully notes, if you don't like this change or it causes problems, you get to write your own filter to reverse it. Nor is this documented (and the actual examples of timesince output in the documentation use real spaces).

Perhaps you might say that documenting this is unimportant. Wrong. In order to find out why this was happening to my email, I had to read the Django source. Why did I have to do that? Because in a complex system there are any number of places where this might have been happening and any number of potential causes. Django has both localization and automatic safe string quotation for things you insert in templates, so maybe this could have been one or both in action, not a deliberate but undocumented feature in timesince. In the absence of actual documentation to read, the code is the documentation and you get to read it.

(I admit that I started with the timesince filter code, since it did seem like the best bet.)

Is the new template filter I've now written sufficient to fix this? Right now, yes, but of course not necessarily in general in the future. Since all of this is undocumented, Django is not committed to anything here. It could decide to change how it generates non-breaking spaces, switch to some other Unicode character for this purpose, or whatever. Since this is changing undocumented behavior Django wouldn't even have to say anything in the release notes.

(Perhaps I should file a Django bug over at least the lack of documentation, but it strikes me as the kind of bug report that is more likely to produce arguments than fixes. And I would have to go register for the Django issue reporting system. Also, clearly this is not a particularly important issue for anyone else, since no one has reported it despite it being a three year old change.)

python/DjangoTimesinceNBSpaces written at 23:42:32; Add Comment

You aren't entitled to good errors from someone else's web app

This particular small rant starts with some tweets:

@liamosaur: Developers who respond to bad URLs with 302 redirects to a 200 page with error info instead of a proper 404 page should be shot into the sun

@_wirepair: as someone who does research for web app scanners, a million times this.

@thatcks: It sounds like web apps are exercising good security against your scanners & denying them information.

If you are scanning someone else's web application, you have absolutely no grounds to complain when it does things that you don't like. Sure, it would be convenient for you if the web app gave you all the clear, semantically transparent HTTP errors you could wish for that make your life easy, but whatever error messages it emits are almost by definition not for you. The developers of those web apps owe you exactly nothing; if anything, they owe you less than nothing. You get whatever answers they feel like giving you, because you are not their audience. If they go so far as to give you deliberately misleading and malicious HTTP replies, well, that's what you get for poking where you weren't invited.

(Google and Bing and so on may or may not be part of their audience, and if so they may give Google good errors and not you. Or they may confine their good errors to the URLs that Google is supposed to crawl.)

Good HTTP error responses (at least to the level of 404's instead of 302s to 200 pages) may serve the goals of the web app developers and their audience. Or they may not. For a user-facing web app that is not intended to be crawled by automation, 302s to selected 200 pages may be more user friendly (or simply easier) than straight up 404s. As a distant outside observer, you don't know and you have no grounds for claiming otherwise.

(There are all sorts of pragmatic and entirely rational reasons that developers might do things that you disagree with.)

It's probably the case that web app developers are better served over the long term by doing relatively proper HTTP error handling, with real 404s and so on (although I might not worry too much about the exact error codes). However this is merely a default recommendation that's intended to make the life of developers easier. It is not any sort of requirement and developers who deviate from it are not necessarily doing it wrong. They may well be making the correct decision for their environment (including ones to deliberately make your life harder).

(See also Who or what your website is for and more on HTTP errors, which comes at the general issue from another angle.)

PS: If you are scanning your own organization's web apps, with authorization, it may be worth a conversation with the developers about making the life of security people a little easier. But that's a different issue entirely; then 'our security people' are within the scope of who the web app is for.

web/NotEntitledToGoodErrors written at 00:50:21; Add Comment


A justification for some odd Linux ARP behavior

Years ago I described an odd Linux behavior which attached the wrong source IP to ARP requests, and said that I had a justification for why this wasn't quite as crazy as it sounds. The setup is that we have a dual-homed machine on two networks, call them net-3 and net-5. If another machine on net-3 tries to talk to the dual-homed machine's net-5 IP address, the dual-homed machine winds up sending out an ARP request on net-3 of the form:

Request who-has <net-3 client machine IP address> tell <net-5 IP address>

As I said at the time, this was a bit surprising as normally you'd expect a machine to send ARP requests with the 'tell ...' IP address set to an IP address that is actually on the interface that the ARP request is sent out on.

What Linux appears to be doing instead is sending the ARP request with the IP address that will be the source IP of the eventual actual reply packet. Normally this will also be the source IP for the interface the ARP request is done on, but in this case we have asymmetric routing going on. The client machine is sending to the dual homed server's net-5 IP address, but the dual homed machine is going to just send its replies directly back out its net-3 interface. So the ARP request it makes is done on net-3 (to talk directly to the client) but is made with its net-5 IP address (the IP address that will be on the TCP packet or ICMP reply or whatever).

This makes sense from a certain perspective. The ARP request is caused by some IP packet to be sent, and at this point the IP packet presumably has a source IP attached to it. Rather than look up an additional IP address based on the interface the ARP is on, Linux just grabs that source IP and staples it on. The resulting MAC to source IP address association that many machines will pick up from the ARP request is even valid, in a sense (in that it works).

(Client Linux machines on net-3 do pick up an ARP table entry for the dual homed machine's net-5 IP, but they continue to send packets to it through the net-3 to net-5 gateway router, not directly to the dual homed machine.)

There is probably a Linux networking sysctl that will turn this behavior off. Some preliminary investigation suggests that arp_announce is probably what we want, if we care enough to set any sysctl for this (per the documentation). We probably don't, since the current behavior doesn't seem to be causing problems.
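If we did want to turn it off, my reading of the documentation is that it would be a one-line setting, eg in /etc/sysctl.conf (untested here):

```
# arp_announce level 1: try to avoid announcing local addresses that
# are not on the subnet of the interface the ARP request goes out on.
# Level 2 would always pick the best local address for the interface.
net.ipv4.conf.all.arp_announce = 1
```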

(We also don't have very many dual-homed Linux hosts where this could come up.)

linux/ArpOddBehaviorJustification written at 00:38:13; Add Comment
