Wandering Thoughts archives

2007-03-28

Dual identity routing with Linux's policy based routing

In the last entry I talked about the idea that an end node machine in a VPN actually has two distinct identities that need to be routed separately: the inside identity that should be routed over the VPN, and the outside identity that should be routed over the Internet. In this case each identity has an IP address associated with it; call the inside identity's IP address I and the outside identity's IP address O.

We can do this sort of routing with Linux's policy based routing, which fortunately is not as scary as it looks. To simplify somewhat, Linux does policy based routing by having multiple routing tables and then providing rules to pick between them. We need two new routing tables, one for each identity, but they don't need many routes; all they actually need is a default route.

So we set up the tables like this:

ip route add default dev VPN table 10
ip route add default dev REAL table 11

(Here VPN and REAL are your VPN and real network interfaces; you may need to add 'via GATEWAY' to the route for your real connection. My setup uses PPPoE DSL, so I can just throw my packets in the direction of ppp0 and have everything work right.)

Now we need two policy rules so that traffic explicitly from one or another of our identities is sent out the appropriate interface:

ip rule add from I iif lo priority 5000 table 10
ip rule add from O iif lo priority 5001 table 11

(The table numbers and the priorities are mostly arbitrary.)

This says that traffic explicitly from I should consult table 10 before theoretically falling through to the default main routing table. Similarly for traffic explicitly from O, which gets sent to table 11. Since tables 10 and 11 have a default route, we never actually fall through.
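Put together, the tables and rules so far can be sketched as a small shell function. This is only a sketch: the addresses and interface names in the usage line are placeholders (documentation-range addresses and a hypothetical gre1 device), and the function prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the commands for the two per-identity tables and rules.
# Arguments: inside IP (I), outside IP (O), VPN interface, real interface.
# The table numbers and priorities are the mostly arbitrary ones used above.
dual_identity_cmds() {
    echo "ip route add default dev $3 table 10"
    echo "ip route add default dev $4 table 11"
    echo "ip rule add from $1 iif lo priority 5000 table 10"
    echo "ip rule add from $2 iif lo priority 5001 table 11"
}

# Hypothetical example values; substitute your own I, O, VPN and REAL.
dual_identity_cmds 192.0.2.10 198.51.100.20 gre1 ppp0
```

Once the output looks right, it can be piped to a root shell to apply it.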

The final thing to do is to set up which networks default to using the inside or the outside identity when we connect to them. We do this in the regular routing table by choosing which interface a route points at. For example, to route 128.100.5.0/24 over my VPN connection I do:

ip route add 128.100.5.0/24 dev VPN

(Unless you change your default route, the default identity is your outside identity and thus you just need to add route entries for stuff you want to go over the VPN. Because the most specific route matches, you can make some large network go through the VPN, and then exempt specific subnets if you need to.)
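As an illustration of the 'large network with exemptions' case, here is a minimal sketch. The two networks are made up for the example, and the `run` wrapper is a hypothetical helper that only prints each command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 (as root) to actually apply the routes.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Send a whole network over the VPN, then carve one subnet back out.
# $1 = VPN interface, $2 = real interface. The networks are illustrative,
# and the second route may need 'via GATEWAY' on some connections.
route_with_exemption() {
    run ip route add 128.100.0.0/16 dev "$1"
    run ip route add 128.100.3.0/24 dev "$2"
}

route_with_exemption gre1 ppp0
```

The order of the two commands does not matter; the /24 wins for addresses inside it because it is the more specific match.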

This works because both the VPN and the REAL interfaces have local IP addresses associated with them, and the interface that a route points at determines the local IP address for outgoing connections that haven't explicitly specified one. (Connections that have specified the local IP address will get stolen by our explicit rules before we reach the main routing table.)

As a bonus you can make outgoing connections use a specific identity by specifying the origin address; for example, I could still connect to a machine in 128.100.5.0/24 as my outside identity by using 'ssh -b O <whatever>'. (This might be necessary if the VPN connection dies for some reason.)

(I owe a real debt to the documentation for the ip command, which turned out to be quite understandable once I actually sat down and read it carefully. There's a lot of useful and interesting stuff there.)

DualIdentityRouting written at 00:45:37

2007-03-24

Randomly engaging NumLock considered irritating

Dear Fedora Core 6 X server: please stop randomly turning my NumLock on. It's getting really old by now, especially since I use a BTC-5100C mini keyboard and so turning NumLock on sprinkles numbers around my typing instead of the letters that I expected.

(It also makes various fvwm2 operations not fire, since I'm not hitting shift+alt+mouse button, I'm 'hitting' shift+alt+numlock+mouse button. I'd tell fvwm2 to ignore the state of NumLock entirely, except it currently serves as a useful cue to me that hey, NumLock got turned on again.)

Perhaps this is some accessibility feature that I am accidentally waking up, but it seems unlikely; I'm running in a bare session, without the usual Gnome or KDE stuff started up. Nor is there any apparent pattern for when it happens, although it happens fairly infrequently and I probably don't notice it right away when it does.

PS: this is unlikely to be hardware failure since it is happening on two machines, although both have BTC-5100C keyboards. (I really like them.)

XServerNumlock written at 14:00:59

2007-03-16

Things I have learned while doing GRE tunnels on Linux

In no particular order:

  • point to point GRE tunnels have to be symmetric, where each end is a mirror image of the other. Otherwise the destination kernel will reject the inbound GRE packets, which makes sense from a security perspective once you think about it.

  • GRE tunnels require a local IP address before you can point routes at them; I suspect that this is generic behavior and is so that the kernel knows what default origin IP address to put on packets going out through them.
  • because GRE tunnels are network devices, you can give them a distinct local IP address, which becomes the default source IP address for anything routed over the tunnel.

  • GRE over IPSec over PPPoE requires a significantly smaller MTU than you might think for reliable operation. Do not assume that the kernel will get it right for you. (Especially if it doesn't know that the other end is using PPPoE.)
  • it helps to make sure that both ends are using the same MTU. Unlike with PPP, nothing automates this for you. (At least I think PPP automates this for you.)

  • because GRE tunnels provide an explicit source address and device, you can play some really peculiar routing tricks. You don't even seem to need policy based routing. My current trick is routing the subnet that the target of the tunnel is on over the tunnel itself, which makes my head hurt.

  • reading the documentation for the ip command is really useful; there are all sorts of powerful tricks lurking there. (And simple policy based routing is not as scary as it looks, honest.)
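The symmetry, local address, and MTU points can be sketched together in shell. Everything concrete here is an assumption made for illustration: the device name gre1, the endpoint and tunnel addresses, and especially the 60 byte IPSec allowance, which is a placeholder and not a measured figure. The functions only print commands and do arithmetic; nothing is configured.

```shell
#!/bin/sh
# Sketch: print the commands for one end of a point to point GRE tunnel.
# $1 = this end's outer IP, $2 = far end's outer IP,
# $3 = local IP to put on the tunnel device, $4 = MTU for the tunnel.
# The device name gre1 is an arbitrary choice for this example.
gre_end_cmds() {
    echo "ip tunnel add gre1 mode gre local $1 remote $2"
    echo "ip addr add $3 dev gre1"
    echo "ip link set gre1 mtu $4 up"
}

# Subtract per-layer overheads (in bytes) from a starting MTU.
# Usage: tunnel_mtu BASE OVERHEAD...
tunnel_mtu() {
    mtu=$1
    shift
    for overhead in "$@"; do
        mtu=$((mtu - overhead))
    done
    echo "$mtu"
}

# 1500 byte Ethernet, 8 bytes of PPPoE, 24 bytes of GRE over IPv4
# (20 byte outer IP header plus the 4 byte basic GRE header). The IPSec
# overhead depends on the mode and algorithms; 60 is only a placeholder.
mtu=$(tunnel_mtu 1500 8 24 60)

# Each end is the mirror image of the other: local and remote swap,
# and both ends get the same MTU.
gre_end_cmds 192.0.2.1 198.51.100.1 10.9.0.1/32 "$mtu"
gre_end_cmds 198.51.100.1 192.0.2.1 10.9.0.2/32 "$mtu"
```

Computing the MTU once and passing the same value to both ends is the point of the design; nothing negotiates it for you.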

Now all I have to do is figure out the best way to make all of this happen automatically on system boot. This may be kind of tricky, because I am using a totally manually set up IPSec (complete with direct invocation of setkey and fixed keys), and I only want to IPSec my GRE traffic, not anything else between the two endpoints involved.

(The Fedora ifup-ipsec will do everything except GRE-only IPSec; it wants to IPSec all traffic. I prefer not to, because that way I have an out if something goes wrong with IPSec.)
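One way the GRE-only restriction might be expressed with setkey is as a pair of security policies that match only protocol gre between the two endpoints. This is a sketch under assumptions the entry does not state: transport mode ESP and placeholder endpoint addresses. The matching security associations (with their fixed keys) are deliberately left out.

```shell
#!/bin/sh
# Sketch: print setkey policy lines that apply IPSec only to GRE
# traffic between two hosts. $1 = local outer IP, $2 = remote outer IP.
# Assumes transport mode ESP; the SAs themselves are not shown.
gre_only_policy() {
    cat <<EOF
spdadd $1 $2 gre -P out ipsec esp/transport//require;
spdadd $2 $1 gre -P in ipsec esp/transport//require;
EOF
}

# Placeholder addresses; review the output before feeding it to setkey.
gre_only_policy 192.0.2.1 198.51.100.1
```

The output is in the form that 'setkey -c' reads from standard input. Other traffic between the two hosts matches neither policy and so stays outside IPSec, which keeps the escape hatch open if IPSec breaks.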

GREThingsLearned written at 00:03:43

2007-03-15

An annoying limitation of Linux IPSec

I just got IPSec doing exactly what I need it to do. Unfortunately it turns out not to do me any good.

The problem is with tunnel mode IPSec, which has a cascade of issues. First, IPSec tunnels don't create actual network devices. This sounds harmless, except that you need a network device to establish routes and thus say things like 'network foo/24 is reachable over this IPSec tunnel'.

IPSec deals with this with black magic involving the Security Policy Database (SPD). If you add an SPD entry for a tunnel mode IPSec connection, IPSec magically starts routing packets for the target of that entry, without any sort of routing table entry being created.

Which creates my real problem: proxy ARP. The core kernel proxy ARP stuff only responds to ARP queries for things it knows it routes to, ie things that are in the routing table. No routing table entry, no answer to the ARP query for the IP address at the far end of the IPSec tunnel.

(And there seems to be no proxy ARP stuff apart from the core kernel stuff; I haven't found a working way to manually publish an ARP entry or the like. Both arp and ip sort of have options for it, but neither appears to do anything in practice.)

The net result is that although my IPSec connection works great between the two machines directly involved (and the SPD entries involved were trivial to write and worked the first time), it doesn't do me any good because I can't get any other machine to send packets for the far end of the IPSec connection to the near end.

So, in summary: Linux IPSec tunnels are incompatible with proxy ARP, which means that you can't use simple IPSec to push a single IP address on your subnet off to a remote machine. Presumably this hasn't been fixed before now because most people use IPSec tunnels to shuffle around full subnets that get routed explicitly.

(GRE over IPSec is clearly in my future. GRE works fine for this because GRE tunnels are real network devices, with all the resulting benefits.)

IPSecLimitation written at 17:39:00

2007-03-08

A belated set of more power consumption numbers

Following up on previous numbers, here are a few more (mixed in with some repeated from last time, for easier comparison):

Samsung SyncMaster 900NF 19" CRT, displaying stuff    83 watts
Dell 1907FP LCD, displaying stuff                     26 watts

Clearly, switching to LCDs can pay off. Bearing in mind that my computers seem to idle at around 75 to 90 watts, using a 19" CRT is roughly the equivalent of having a second computer running.
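To put the CRT versus LCD difference in yearly terms, here is a small shell arithmetic sketch. The usage patterns (always on, or an eight hour day) are assumptions for illustration, not measurements, and results are truncated to whole kWh.

```shell
#!/bin/sh
# Energy used per year, in whole kWh, by a load drawing $1 watts
# for $2 hours a day (defaults to 24, i.e. never turned off).
annual_kwh() {
    echo $(( $1 * ${2:-24} * 365 / 1000 ))
}

annual_kwh $((83 - 26))      # CRT minus LCD, left on all the time
annual_kwh $((83 - 26)) 8    # the same difference over an eight hour day
```

With the 57 watt difference from the figures above, this comes out to 499 kWh a year if the display is never turned off and 166 kWh a year for an eight hour day.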

Here are some interesting figures from one of my machines with two different graphics cards:

ATI X800 GT PCIe    idle                          98 watts
                    both cores busy              155 watts
ATI X300 PCIe       idle                          87 watts
                    displaying graphics           90 watts
                    compiling Firefox with -j4   145 watts

Unfortunately I don't have figures for the X800 being used just for displaying graphics (without CPU soakers active), but it looks like dropping down to a simpler graphics card saves you around 10 watts. Plus you get a passive heatsink instead of a fan, and the X300 still has both analog and DVI out (and it's not as if Linux can really use the extra graphics power of the X800 at the moment).

One of the things that strikes me about all of this is how comparatively little power a modern workstation system is likely to use. An idle machine with an active display is only a bit over a single 100 watt incandescent, and one that's as busy as I'm likely to get it is still under 200 watts (assuming an LCD instead of a CRT).

(Of course if I switched to compact fluorescents I wouldn't be using 100 watts per light fixture, but so far I strongly prefer the look of indirect incandescent light. This is ironic, given that when I look at LCDs I'm looking at filtered fluorescents, but I never claimed to be consistent.)

PowerConsumptionIII written at 00:44:10

2007-03-06

Why I don't like USB keyboards

Our problem with USB keyboards on Dell 2950s neatly illustrates why I remain deeply dubious about USB keyboards. It isn't that something broke; it's that it illustrates how complex USB is.

Specifically, USB keyboards either require a pile of sometimes fragile code to bring them up natively or require trusting a large amount of magic BIOS code being run behind your back. However, I really want my console keyboard to be simple; the simpler it is, the less likely it is for something going wrong in the rest of the kernel to affect it, the less support from fragile bits of the kernel it needs, and the sooner it can come up during early boot.

By contrast, the PS/2 keyboard interface is pretty simple; my understanding is that you more or less bang on a single port and read some bytes. Almost anyone can get it going, and in a pinch I believe you can do without interrupts.

(The sheer amount of kernel code that has to be working to deliver a magic SysRq from a USB keyboard is daunting.)

I know that this makes me an outlier; the spate of servers with only USB connectors would have convinced me, if nothing else. If you don't interact with your machine until it's showing the graphical login screen, this isn't really an issue you care about; the machine already has to run a pile of code to be usable, so what's a bit more?

Fortunately you can still get motherboards with PS/2 connectors, although I have to wonder how much longer that will be true. Perhaps someday they will be as endangered as plain three button mice (of which I have a carefully hoarded stock).

USBKeyboardDislike written at 23:21:34

2007-03-05

Some useful new Linux software RAID features

Courtesy of a pointer from the linux-kernel mailing list to the Gentoo wiki entry, here are two useful (relatively) new software RAID features:

  • Normally, if a RAID array goes out of sync the kernel assumes everything on the out of date drive(s) has to be rewritten. With write-intent bitmaps the kernel keeps track of what areas actually got written to and only resyncs them.

    (This brings Linux up to more or less parity with an equivalent Solaris DiskSuite feature.)

  • 'data scrubbing' reads the entire RAID array to check that all of the sectors are still good and, if one disk has a flaw somewhere, attempts to rewrite the sector using the good data from another disk.

    (This is superior to trying to dump the filesystem to /dev/null every so often because it scans all of the mirrors in a RAID-1 array, not just one of them.)

Both features need a modern kernel (according to the Gentoo wiki, at least 2.6.16); enabling and disabling write-intent bitmaps also needs a modern version of mdadm. Unfortunately, this means that our Ubuntu 6.06 and Red Hat Enterprise 4 machines are out of luck; the Ubuntu LTS kernel is at 2.6.15 or so, and the RHEL 4 kernel is all the way back at some version derived from 2.6.9. My Fedora Core 6 machines are good, though, which makes me happy.

(You can do data checking by hand by dd'ing from the raw devices every so often, and you probably should. And I'm not sure if the software RAID data scrubbing will give a clear and easily found report if it finds a bad sector it can't rewrite, although possibly your SMART drive monitoring will give you an alert.)
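For reference, here is a minimal sketch of how both features are usually driven from the shell. The array name md0 is a placeholder, and the second argument to the function (an alternate sysfs root) is a hypothetical addition that exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: ask the kernel to scrub one md array by writing "check" to
# its sync_action file. $1 = array name (e.g. md0); $2 = sysfs root,
# which defaults to /sys and is a parameter only to allow testing.
start_scrub() {
    action_file="${2:-/sys}/block/$1/md/sync_action"
    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file (not root, or no such array?)" >&2
        return 1
    fi
    echo check > "$action_file"
}

# Adding a write-intent bitmap to an existing array (needs root and a
# new enough mdadm); left commented out so nothing runs by accident:
#   mdadm --grow --bitmap=internal /dev/md0
#
# The by-hand alternative of reading a raw device end to end:
#   dd if=/dev/sda of=/dev/null bs=1M
```

Once a check is started with 'start_scrub md0', its progress shows up in /proc/mdstat.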

NewSoftwareRAIDFeatures written at 12:19:24



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.