2007-03-31
Microsoft has another problem
As a sort of successor to a previous entry, here is a thesis:
Increasingly, people use Microsoft Windows not because they want to but because they have to.
(There are various reasons they have to: they're forced to by their employer, they need to use programs that only run on Windows, such as IE or Microsoft Office or various computer games, or even because the alternative costs too much.)
This is a problem for Microsoft, because customers that are only using you because they don't have a choice are extremely fickle customers; they are not a good source of long term success. It is much better to offer your customers something they find intrinsically valuable and actively like, instead of just being something they have to tolerate on the way to their real interest.
Now, this is only a thesis; I don't have enough exposure to the Windows world to know if it is actually true. But it is certainly my perception, and it is definitely my perception that it is not true of the Apple world; Mac users are using Macs because they want to and because they like them.
(It is certainly not true of my own Unix usage; I use Unix on my desktops because I actively like the environment. This makes me rather peculiar, all things considered.)
2007-03-30
What an Ethernet splitter looks like
Partly because I couldn't easily turn up any good pictures of them on the Internet, here's a picture of an Ethernet splitter in its natural habitat:

(Full sized version here.)
The grey thing with the scratches in the background is the side of my office bookcase. Since this is a university, it is a dun metal one, hence the shade. (It may technically be some species of green. At a certain level of dull colouration, it becomes hard to tell.)
You will note that wisely, we have our data drops numbered and labeled. Given the chaos to be found in our wiring area, this is a really good idea. We even have maps with them all noted down, which is glorious and kind of miraculous.
(Carefully cropped out of this picture is the pair of very surplus thicknet Ethernet cables that dangle down from the ceiling just to the right of the junction box. I suppose I should go push them entirely behind the bookcase.)
2007-03-27
The VPN routing problem
An end node machine connecting in through a VPN has two IP addresses: one that is inside the VPN (call it I), and its normal IP address outside the VPN (call it O). The outside address is sometimes called the wild side address, because it is accessible from the wild Internet.
A lot of writing about VPNs assumes that I and O are disjoint from each other, effectively on completely isolated networks; I talks to the corporate network behind the VPN and only the corporate network, and O talks only to the Internet and never to the corporate network (and vice versa; machines inside the corporate network never talk directly to O). This assumption makes routing possible and even simple: routes to the corporate network are established that point to the VPN, and the machine's default route remains going out its usual connection.
The problems with this come in if we violate the isolation assumptions; you wind up with asymmetric routing. If the corporate network tries to talk directly to O, the return packets will try to flow back over the VPN, which may or may not work. If the outside world tries to talk to I, the inside address, the return packets will try to flow back out over the end machine's regular Internet connection, which almost certainly won't work.
The core issue is that the machine really has two identities, the inside one and the outside one, and these identities need different routing. The inside identity should route everything over the VPN; the outside identity should route everything over the regular Internet connection.
Normal routing tables have no concept of separate identities; they pick where to send a packet based purely on what its destination address is. So things only completely work out when the routes for the two identities are completely distinct anyways: when the corporate network behind the VPN is in address space that's not reachable from the Internet.
(It can be publicly assigned and even nominally routed; the important thing is that a machine on the corporate network can't make a direct connection to O and a machine on the Internet can't make a direct connection to I.)
In this case, we can use the local IP address that a packet is coming from as a proxy for which identity of the machine sent it, and thus which connection it should go out over. If it comes from I, send it out over the VPN; if it comes from O, send it out over the regular connection.
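As a rough illustration, here is a minimal Python sketch of the difference between ordinary destination-based routing and picking the outgoing connection from the local source address. Everything in it is made up for the example: the addresses come from the documentation ranges, the link names `vpn` and `internet` are just labels, and the functions are a toy model, not how any real system implements policy based routing.

```python
import ipaddress

# Hypothetical addresses, chosen only for illustration.
INSIDE = ipaddress.ip_address("192.0.2.50")       # I, the address inside the VPN
OUTSIDE = ipaddress.ip_address("203.0.113.7")     # O, the wild side address
CORPORATE = ipaddress.ip_network("192.0.2.0/24")  # network behind the VPN

# An ordinary routing table: (destination network, outgoing link).
DEST_ROUTES = [
    (CORPORATE, "vpn"),
    (ipaddress.ip_network("0.0.0.0/0"), "internet"),  # default route
]


def route_by_destination(dst):
    """Pick the link by longest-prefix match on the destination only."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, link) for net, link in DEST_ROUTES if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def route_by_source(src, dst):
    """Pick the link from the local source address (the 'identity')."""
    src = ipaddress.ip_address(src)
    if src == INSIDE:
        return "vpn"
    if src == OUTSIDE:
        return "internet"
    # Anything else falls back to the ordinary routing table.
    return route_by_destination(dst)


# A corporate machine talked directly to O; this is our reply to it.
print(route_by_destination("192.0.2.10"))
print(route_by_source(OUTSIDE, "192.0.2.10"))

# An Internet machine talked to I; this is our reply to it.
print(route_by_destination("198.51.100.9"))
print(route_by_source(INSIDE, "198.51.100.9"))
```

In this toy model, destination-only routing sends the reply from O over the VPN and the reply from I out the regular connection, which is the asymmetric routing described earlier. Choosing by source address keeps each identity's traffic on its own connection.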
This sort of routing goes by the general name of 'policy based routing', and is unfortunately a very complicated field with no standardization between systems. For example, Linux can do a certain amount of it purely with routing magic, while at least some other systems put it in the hands of IP filtering systems.
(This problem doesn't come up with a pure IPSec VPN implementation, because the IPSec specification essentially requires you to have a second routing layer that can match on the source as well as the destination IP address of packets.)
2007-03-21
Users are almost always right
One of the rules of system design ought to be this:
When the users keep doing it wrong, the users are right and your system is wrong.
I don't mean this in the sense that accommodating users is the right thing to do; I mean this in a very pragmatic sense. If the users keep making some error, changing their behavior is going to be very difficult and therefore costly. (In a fight between user inertia and anything else, bet on user inertia.)
This means that changing your system so that what the users are doing is right is simply the easiest way to fix the overall situation. You can stick to your guns and try to educate users instead, but you're definitely trying to swim upstream.
The less ruthlessly pragmatic way to look at this is that, whatever we would like to imagine, users don't do things wrong out of some perverse desire to cause problems. Each user error has a why and a how behind it, much like 'pilot errors' in airplane accidents, and real progress only comes when we understand and fix these root problems.
The obvious corollary is that if you cannot change the system so that the way the users want to do it is right, you need to change the system so that it is impossible. Because clearly the users are going to keep on trying to do it that way no matter what you say, and stopping them entirely is generally better than letting them make mistakes.
(This whole issue is closely related to user education.)
2007-03-19
The three strata of virtualization
It struck me recently that you can view the general virtualization world as having three strata:
- at the first level, people are just interested in virtualizing and isolating user-level programs. This is the world of FreeBSD jails and Solaris zones.
- at the second level, people are interested in virtualizing the entire operating system. This is the world of server consolidation, hypervisors, paravirtualization and so on.
- at the third level, people actively want to virtualize even the hardware itself. This is often the world of testers and so on, who want to run systems that are as stock as possible.
Virtualization products fit into these strata too; each supports a particular level, and usually all higher levels (at an efficiency cost). For example, Xen is mostly a second level system at the moment (and IBM mainframes have been doing it for decades), while VMWare and Parallels are third level products, somewhat from necessity.
(If you want to do any sort of virtualization of an uncooperative operating system, you have no choice but to use full hardware virtualization.)
I like this view because it sheds light on why different groups of people interested in virtualization can find it hard to talk with each other: they're interested in different levels, and thus often have vastly different concerns and interests. With interests that are too different, people wind up not talking to each other but past each other. (And in the extreme, arguing past each other, relying on premises that are self-evident to them but foreign to the other side.)
This also points out a potential source of confusion: you can wind up using a product that supports a different level from the level you're actually interested in. Usually you wind up using a deeper level product and chafing at its obvious inefficiencies.
We aren't currently doing enough work with virtualization that this is a significant concern to us, but I'm going to have to remember it for the future. It's not enough to say something like 'use VMWare'; we need to understand why we're virtualizing something, and what we want to get out of it. That way we can keep focused on what's important and not get distracted into a technology-centered view of our problems.
(Yeah, yeah, this is probably obvious to lots of people already. I don't promise profound thoughts here, just my thoughts, and sometimes I'm slow to the party.)
2007-03-16
GRE is a translucent tunnel
I normally expect IP tunnels to be opaque, that is, to act as if they were physical links: the packet is sent down one end and pops out the other end unchanged, just as if it went over a connection between two routers, and the tunnel itself is indifferent to the details of the packets it is transporting. (The technical ISO way of describing this is that I expect IP tunnels to act entirely as layer 2 entities.)
However, one of the peculiarities of GRE is that it is a translucent tunnel, where some of the bits of the packet being tunneled show through (and are affected). In particular, GRE uses the real packet TTL.
More precisely, GRE encapsulated packets and the underlying real packets reuse each other's TTL. By default, the initial TTL of the encapsulated packet is the same as the real packet had when it got to the start point, and at the end of the tunnel the TTL of the de-encapsulated packet is whatever TTL the encapsulated packet arrived with.
I noticed this when I tried to do a traceroute of a GRE tunneled link. Because traceroute uses the packet TTL and GRE temporarily rewrites the origin IP address, everything after my GRE gateway went blank (the TTL was expiring and the message about it was going to my GRE gateway instead of to the system running traceroute). Given that there are about 20 hops between the endpoints of the GRE tunnel, I wouldn't be surprised if it was also affecting the general reachability of the far end of the tunnel.
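To make the mechanics concrete, here is a small Python simulation of where the 'TTL expired' message for a traceroute probe ends up. It is a deliberately simplified toy model, not a description of any real GRE implementation: the hop counts, the fresh outer TTL of 64 for the opaque case, and the function name `reply_goes_to` are all assumptions made for the example.

```python
OUTER_TTL_OPAQUE = 64  # assumed fresh TTL for the outer packet of an opaque tunnel


def reply_goes_to(ttl, before, inside, after, translucent):
    """Say who receives the reply to a probe sent with the given TTL.

    before: routers up to and including the tunnel's start point
    inside: routers the encapsulated packet crosses inside the tunnel
    after:  routers from the tunnel's end point onward
    """
    # Before the tunnel, routers see the real packet and its real source.
    for _ in range(before):
        ttl -= 1
        if ttl == 0:
            return "sender"

    # Inside the tunnel, routers only see the encapsulated packet, whose
    # source address is the gateway at the start of the tunnel.
    outer = ttl if translucent else OUTER_TTL_OPAQUE
    for _ in range(inside):
        outer -= 1
        if outer == 0:
            return "gateway"

    # A translucent tunnel copies the outer TTL back into the real packet.
    if translucent:
        ttl = outer

    for _ in range(after):
        ttl -= 1
        if ttl == 0:
            return "sender"
    return "sender"  # the probe arrived; the destination answers the sender


for translucent in (True, False):
    print(translucent, [reply_goes_to(t, 2, 3, 2, translucent) for t in range(1, 8)])
```

With two routers before the tunnel, three inside it and two after it, the translucent case sends the replies for probes with TTLs 3 through 5 to the gateway, so those hops are blank in traceroute. The opaque case returns every reply to the sender, and the whole tunnel counts as a single link.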
IPSec in transport mode uses (and alters) the regular packet headers, so GRE over transport mode IPSec is also affected by this. Tunnel mode IPSec is an opaque tunnel, so GRE over tunnel mode IPSec does not have this issue. As a result, I now have a very small IPSec tunnel.
(That tunnels temporarily rewrite the source IP address has interesting consequences for path MTU discovery; if the packet is larger than the MTU of the path the tunnel currently takes, the ICMP error packet will go to the source endpoint instead of the real source. I don't know if kernels are generally smart enough to rewrite the ICMP message a bit and send it on to the real source, if they update the tunnel MTU, or if the ICMP packet just gets dropped.)
2007-03-03
On useful front-panel LEDs
I moved up to ADSL recently and it's quite nice, but I miss one thing about my old Supra modem: I've been spoiled by its luxurious front panel display. The front panel of my ADSL modem is somewhat of a letdown and I've been missing information that I used to be able to get at a glance, in particular how saturated my link is.
The ADSL modem has four front-panel LEDs, all equally prominent:
Power ADSL DATA LAN
(DATA is on if the DSL line is active; LAN is on if the Ethernet has signal and then blinks off when the Ethernet is busy.)
This may look good, but in practice all the information I get is that something is going on with my link. LAN's blinking is redundant, since it only blinks when DATA does (and the DSL link will saturate long before the Ethernet). How much DATA blinks is almost completely opaque: does a nearly solid DATA mean that my down link is saturated, that my up link is saturated, or that both of them are alternately 50% busy?
To tell how saturated the link is, you need to split DATA into DOWN and UP (possibly killing off LAN in the process if you want to stay at four LEDs). This would keep the reassuring blinking when the DSL link was active, while giving people useful information about how busy things are.
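The ambiguity can be shown with a tiny Python sketch. It assumes, purely for illustration, that time is cut into equal slots and that the single DATA LED is lit in any slot where either direction carried traffic; the real modem's blink logic is not documented here, and the function names are made up.

```python
def lit_fraction(busy):
    """Fraction of time slots in which an LED driven by `busy` is lit."""
    return sum(busy) / len(busy)


def combined_data_led(down_busy, up_busy):
    """One DATA LED: lit whenever either direction is busy (an assumption)."""
    return lit_fraction([d or u for d, u in zip(down_busy, up_busy)])


SLOTS = 10
scenarios = {
    "down saturated": ([True] * SLOTS, [False] * SLOTS),
    "up saturated": ([False] * SLOTS, [True] * SLOTS),
    "alternating 50/50": (
        [i % 2 == 0 for i in range(SLOTS)],
        [i % 2 == 1 for i in range(SLOTS)],
    ),
}

for name, (down, up) in scenarios.items():
    # The combined LED versus separate DOWN and UP LEDs.
    print(name, combined_data_led(down, up), lit_fraction(down), lit_fraction(up))
```

Under these assumptions the combined LED is lit 100% of the time in all three scenarios, while separate DOWN and UP LEDs would show 100%/0%, 0%/100% and 50%/50% respectively.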
(The Supra's front panel would report what line speed it was currently getting, but I'm not counting that as a strike against the ADSL modem since it has an internal web server that gives me the same information. Annoyingly, the internal web server does not seem to report current ADSL down/up utilization levels. Maybe it's hiding somewhere in the SNMP data.)