Wandering Thoughts archives

2011-04-24

Notes to myself on the priorities of Linux routing policy rules

Linux's policy-based routing is done by writing a set of rules about what routing to do under various circumstances; this is implemented by 'ip rule' (and kernel code, of course). All policy rules have a priority, and rules are examined in priority order, from the lowest priority number to the highest. Historically, I've given each rule a different priority (with related rules getting priorities that are close to each other). However, this is not required and I've come to think that it's something of a mistake.

It's fine to have multiple rules at the same priority if they are what I'll call 'non-conflicting', i.e. if at most one of the rules can possibly match a given packet. Often this is easy to arrange, for example if you have a series of rules like:

ip rule add from IP1 iif lo table 10
ip rule add from IP2 iif lo table 11
ip rule add from IP3 iif lo table 12
ip rule add from IP4 iif lo table 13

Since a packet can only have one source IP address, at most one of these rules can match a given packet. Similarly, if you have a set of rules that match incoming interfaces to dispatch to a routing table, they can all have the same priority since a packet can only have a single incoming interface.
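
As a minimal sketch of what a group of rules with an explicit shared priority might look like, the following Python snippet builds and prints (but does not run) one 'ip rule add' command per source address. The priority of 5000, the table numbers, the 192.0.2.x documentation addresses, and the function name are all placeholders made up for illustration; 'priority' is the keyword that 'ip rule' accepts for setting a rule's priority.

```python
# Dry-run sketch: build 'ip rule add' commands that all share one priority.
# The priority, table numbers, and addresses are made-up placeholders.


def same_priority_rules(sources, priority, first_table):
    """Return one 'ip rule add' command per source IP, all at `priority`.

    Each source address gets its own routing table, numbered upward from
    `first_table`. A packet has exactly one source address, so at most one
    of these rules can match it and sharing a priority is safe.
    """
    if len(set(sources)) != len(sources):
        # Two rules for the same source at one priority would conflict.
        raise ValueError("duplicate source address: rules would conflict")
    commands = []
    for offset, src in enumerate(sources):
        table = first_table + offset
        commands.append(
            f"ip rule add from {src} iif lo table {table} priority {priority}"
        )
    return commands


if __name__ == "__main__":
    sources = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    for command in same_priority_rules(sources, priority=5000, first_table=10):
        print(command)
```

Because the priority is a single argument, a script that adds another address to the group does not have to look up which priority numbers are still free.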

(I'm confident that all of this is in the documentation, along with a warning that conflicting rules at the same priority have undocumented behavior. This is a note to myself.)

You can still give all of these rules different priorities, but I've come to believe that this is a mistake for two reasons. First, it makes you work harder than you need to when you're adding rules, especially if you're doing it in scripts. Second, it somewhat obscures your intentions. Someone looking at your policy rules later (including yourself) is going to have to work out whether the different priorities are important or are just something you did for no particular reason. Meanwhile, grouping all of the variants of the same rule together at the same priority makes it clearer how things are organized (at least as I think of it); you can immediately see what your functional groups of different rules are. And of course you get simpler scripts for automating things like adding an isolated interface, since you know ahead of time what priorities the various rules are going to have.

(Having said that, I'm probably not going to revise my existing policy based routing rulesets to shuffle the priorities around to go with my new insight because my current setup works okay. But if I find myself adding and removing rules a bunch, it will get tempting.)

IpRulesPriority written at 02:17:11

2011-04-23

The Upstart dependency problem

We just ran into another issue with how Upstart handles startup scripts. The simple way to put it is that Upstart glues together the startup script itself and ordering dependency information on when it needs to be run. This is a problem because the startup script is 'owned' by the package but dependencies can be system dependent, which means that local sysadmins need to change them.

(This is similar to the previous Upstart coupling problem.)

Now, the old /etc/init.d system didn't exactly have dependencies, but it at least did have ordering and there was a strong convention that packages left the ordering alone when they installed updates. This let sysadmins manipulate the startup ordering so that the local special dependencies worked out right.

With Upstart, the only way to modify or add ordering dependencies is to modify the actual /etc/init/ script. Even if the package management system then leaves it alone on package updates, you have to hope that the update didn't make any (important) changes to the startup script, changes that you will have to notice and propagate into your own local version. There is no way to change just the dependency information while having the package system manage the rest of things.

You know, I had thought that both the sysadmin world and the Linux world had learned a lesson about shoving unrelated information into the same file. In fact the Linux world has spent years carefully splitting monolithic files into separate, much easier to manage things; this is what gave us such things as /etc/cron.d (and these things are a great idea; it is much easier to manage files in a directory than sections of a file). It's sad to see Upstart taking a step back into the past.

(It is especially annoying because almost all of the /etc/init.d scripts converted to Upstart scripts have no greater dependency than 'start me in runlevel X, Y, and Z'.)

Sidebar: an example of why ordering dependencies are system-dependent

We have a chain of dependencies for a machine getting its NFS mounts up. For reasons beyond the scope of this entry, mounting any NFS filesystem requires that the SSH daemon be running on the client, and then our automounter replacement requires that a single NFS filesystem be mounted first. So far, so good; we can do all of this with our own custom init scripts with their own ordering without needing to change system ones.

Then we add our system of user-run webservers. The simple way to start user-run webservers on boot is for users to have crontabs with @reboot entries. Cron runs @reboot entries immediately on startup. This implies that on our web server (and in fact on any user-accessible server), cron startup depends on our NFS filesystems being mounted; if cron starts before then, user @reboot actions will fail because the files they're trying to use are on filesystems that haven't been mounted yet. We can't handle this dependency with new init scripts; we have to change cron's own dependency list.
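
The chain in this sidebar can be written down as a small dependency graph. This is only an illustrative sketch: the job names are hypothetical, it assumes that the automounter replacement is what finishes mounting the filesystems that cron's @reboot entries need, and it says nothing about how Upstart itself represents dependencies. It uses Python's standard graphlib module to work out a start order and then lists which package-owned jobs have picked up a local dependency.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the jobs that must start before it.
must_start_after = {
    "nfs-first-mount": {"ssh"},          # NFS mounts need the SSH daemon up
    "automounter": {"nfs-first-mount"},  # needs one NFS filesystem mounted
    "cron": {"automounter"},             # @reboot jobs need user filesystems
}

# Jobs whose startup scripts belong to a package rather than to us.
package_owned = {"ssh", "cron"}

# One valid start order that respects every dependency above.
boot_order = list(TopologicalSorter(must_start_after).static_order())

# A local dependency only hurts when the job that has to wait is
# package-owned, because that is the script we would have to edit.
needs_package_edit = sorted(
    job for job in must_start_after if job in package_owned
)

if __name__ == "__main__":
    print("boot order:", " -> ".join(boot_order))
    print("package scripts we must modify:", ", ".join(needs_package_edit))
```

Only the last edge lands on a job we don't own, which is exactly the case that separate init scripts can't cover.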

UpstartDependencyProblem written at 00:41:38

2011-04-02

How to use gdb to call getpeername()

A commenter on my entry about the process state observability gap suggested an evil hack to recover information about the destination of Unix domain sockets: use gdb on the process to call getpeername() on the socket and examine the result. Because I am sometimes a crazy person, I decided to work out how to do this in case I ever needed to do it for real.

Where this stuff hides in the GDB documentation was not obvious to me when I started. General information about gdb's expressions is in the section on printing and displaying data (section 10 in my GDB documentation, 'Examining data'). Information on calling functions in your program from gdb is in the section on altering execution (section 17 in my documentation, specifically 17.5).

First, getpeername() has one of those odd APIs. It takes three arguments: the file descriptor, a pointer to some sort of sockaddr structure in which to store the name of the peer, and a pointer to where it can read and write the length of the name buffer. Since we are trying to use this on a Unix domain socket, we'd normally use a sockaddr_un structure. You will probably not be so lucky as to be dealing with a program that already has an unused sockaddr_un sitting around ready to be hijacked for this, so we need to make our own. In fact, gdb probably won't even know what a struct sockaddr_un is, so we'll have to fake it.

(gdb takes its knowledge of structures from the program you're debugging, and most programs on most Linux systems don't have debugging information installed in the default configuration.)

This is the sort of situation where I present the gdb commands and then explain them, so:

print/x malloc(256)
This will be our sockaddr buffer. It's larger than a real struct sockaddr_un, but that's harmless and it saves us writing a program to determine exactly how big a sockaddr_un is. I'll assume that gdb calls what it prints here $1, which is true if you're starting from a clean new gdb session.

print/x malloc(4)
This is for the length of the sockaddr buffer (what the getpeername() manpage calls namelen).

print {int} $2 = 256
Set the length of our name buffer for the kernel.

print getpeername(<fd>, $1, $2)
If this prints anything besides zero, something went wrong. Print out errno and work out what.

x/s $1+2
Display the path part of our sockaddr_un buffer. I use x instead of print mostly because dumping memory is really what I'm doing and it feels right.

This is for a 32-bit Linux machine. If you're using a 64-bit machine you may need a different offset in the last command and maybe a different size for the second malloc.
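
The same four steps can be rehearsed outside gdb on a socket you control. This is a minimal Python sketch, not a replacement for the gdb session: it assumes a Linux system where ctypes.CDLL(None) exposes libc's getpeername(), where socklen_t is four bytes, and where the path starts two bytes into the address buffer. The function names and the throwaway socket are made up for illustration, and nothing here attaches to another process.

```python
import ctypes
import os
import socket
import tempfile


def peer_path_via_libc(fd):
    """Call libc's getpeername() on fd and return the raw peer path bytes."""
    libc = ctypes.CDLL(None, use_errno=True)

    # Step 1: an oversized buffer standing in for struct sockaddr_un.
    addr_buf = ctypes.create_string_buffer(256)
    # Step 2: the in/out length argument (assumed to be a 4-byte socklen_t).
    addr_len = ctypes.c_uint32(ctypes.sizeof(addr_buf))

    # Step 3: a non-zero return means failure; errno says why.
    if libc.getpeername(fd, addr_buf, ctypes.byref(addr_len)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # Step 4: skip the 2-byte family field and read the path that follows,
    # stopping at the first NUL (the equivalent of 'x/s $1+2').
    return addr_buf.raw[2:addr_len.value].split(b"\0", 1)[0]


def demo():
    """Connect to a throwaway Unix socket; return (bound path, peer path)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server, \
                socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            server.bind(path)
            server.listen(1)
            client.connect(path)
            return path, os.fsdecode(peer_path_via_libc(client.fileno()))


if __name__ == "__main__":
    bound, found = demo()
    print("bound path: ", bound)
    print("getpeername:", found)
```

If the two printed paths match on your machine, the offset of 2 in the last gdb command is right for that system's address layout.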

This is not something you really want to do in a program that you want to keep running unperturbed. Even if all goes well we've leaked some memory, and there are a number of ways that something could go badly if your timing with gdb is just unlucky enough (for example, the program could be in the middle of a malloc() or a free() of its own; libc is probably not written to be safely reentrant in this particular situation).

(Having worked all of this out, I would feel very silly if I didn't write it down and then needed it later. I'm sure that it's all completely obvious to people who use gdb regularly.)

GdbGetpeername written at 01:03:29


