Wandering Thoughts archives

2009-07-31

Using SystemTap to trace the system calls of setuid programs on Linux

Suppose that you have a setuid program that is failing mysteriously and you want to see what it's doing. With normal programs you can use strace, but not even root can strace a setuid program (if you try, the program runs non-setuid).

(Yes, strace has the -u option, but it doesn't help if the setuid program is being run as part of a whole chain of processes in a specific environment and you can't just run it directly. It would be nice if root could use 'strace -f ...' for this, but alas it doesn't work.)

On a Solaris system you could use DTrace for this. SystemTap is the rough Linux equivalent and, although much less polished and not as well documented, it does work. Here is the crude SystemTap script that I used:

probe syscall.* {
  # uid()/euid() let us see when the program is actually running setuid.
  en = execname();
  ui = uid();
  eui = euid();
  if (en == "<redacted>") {
    # name and argstr are magic variables supplied by the syscall tapset.
    printf("%s(%d): %s(%s)", en, pid(), name, argstr);
    if (ui != eui) {
      printf(" as %d/%d ", ui, eui);
    } else {
      printf(" as %d ", ui);
    }
  }
}

probe syscall.*.return {
  en = execname();
  if (en == "<redacted>") {
    # retstr is the decoded return value, again from the syscall tapset.
    printf("= %s\n", retstr);
  }
}

This produces output with system call arguments and return values helpfully decoded for you; it looks like:

<redacted>(14087): open("/etc/passwd", O_RDONLY) as 2315/0 = 3
[...]
<redacted>(14087): close(1) as 2315/0 = -9 (EBADF)

(In some ways this is nicer than DTrace. But the lack of documentation on what sort of information you can get about system calls and so on really hurts; I had to read the source for the syscall tapset in order to find out about name, argstr, retstr, and so on.)

Note that, despite the presence of the PID in the output, this isn't really useful for tracing if more than one instance of the program is running at once. That would take more SystemTap magic than I know so far (or worse output and some postprocessing). Also, since stap is kind of slow you'll want to run it with the -v flag so that you know when it's actually finished checking, compiling, and enabling your tracing.
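The 'worse output and some postprocessing' option is at least straightforward, since every trace line already starts with a 'name(pid):' prefix. Here is a hypothetical Python sketch of that postprocessing; the function name and the sample lines are mine, not anything the script above emits beyond that prefix:

```python
import collections
import re

def split_by_pid(lines):
    """Group trace lines by the pid in their 'name(pid):' prefix."""
    per_pid = collections.defaultdict(list)
    for line in lines:
        # Match the program name up to the first '(' and grab the pid.
        m = re.match(r'[^(]*\((\d+)\):', line)
        if m:
            per_pid[m.group(1)].append(line)
    return per_pid

# Two interleaved instances of the same program, as stap might print them.
sample = [
    'prog(14087): open("/etc/passwd", O_RDONLY) as 2315/0 = 3',
    'prog(14090): close(1) as 2315/0 = -9 (EBADF)',
    'prog(14087): close(3) as 2315/0 = 0',
]
groups = split_by_pid(sample)
for pid in sorted(groups):
    print("--- pid", pid, "---")
    for line in groups[pid]:
        print(line)
```

Feeding the real stap output through something like this (say, via sys.stdin) gives you one coherent chunk of trace per process instance.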

One of the things that the documentation isn't very clear about is that the execname() function returns the bare command name of the current process and not its full path. (There is probably a way to extract the full path if you need it. I didn't, so I didn't go digging.)

All in all, I would have to score my first real exposure to SystemTap as a reasonably pleasant experience. Although there were a bunch of frustrating bits, it did work, it gave me what I wanted to know, and it wasn't particularly difficult to do or to work out how to do it (and it didn't take particularly long).

SystemTapSetuidTracing written at 22:26:55

2009-07-22

A peculiar change in Linux flock() and fcntl() behavior

Here is one of those fun issues that cause me to pull out my hair (although it can give me a peculiar sense of satisfaction to track it down).

Suppose that you have two filenames, such as (not entirely hypothetically) .vacation.dir and .vacation.pag. As it happens, these filenames are actually hardlinks, so there is only one actual file involved. Now, suppose you have code that is like this C-oid pseudo-code:

struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                    .l_start = 0, .l_len = 0, .l_pid = getpid() };
int fd1 = open(".vacation.dir", O_RDONLY);
flock(fd1, LOCK_EX);
int fd2 = open(".vacation.pag", O_RDWR);
fcntl(fd2, F_SETLK, &fl);

If and only if you are on a sufficiently modern Linux and the files are on an NFS filesystem (possibly it depends on the NFS server), the fcntl() will fail with EAGAIN. If you don't know that the files are hardlinks, it may take you some time to realize what's going on, especially because many of the test programs you write will probably work fine (except when applied to that specific pair of files).

(But wait, it gets weirder. Replace the fcntl() with flock() and it will fail even on local filesystems and on older kernels. This behavior disagrees with the manpage, which is explicit in that separate file descriptors to the same file are treated independently. Updated: I was badly misreading the manpage and this is correct flock() behavior; treating the file descriptors independently means that separate locks on them will conflict, not that they won't. See the comments.)
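The corrected flock() behavior is easy to demonstrate without any hardlinks at all, since two open()s of one file are enough. Here is a minimal Python sketch (using the fcntl module's wrappers rather than C, and a scratch temporary file on a local filesystem rather than the real .vacation.* pair):

```python
import fcntl
import os
import tempfile

# A scratch file stands in for the hardlinked .vacation.* pair; two opens
# of one file behave just like opens through two hardlinked names.
fd, path = tempfile.mkstemp()
os.close(fd)
fd1 = os.open(path, os.O_RDONLY)
fd2 = os.open(path, os.O_RDWR)

fcntl.flock(fd1, fcntl.LOCK_EX)

# flock() locks belong to the open file description, so a second flock()
# through a different descriptor conflicts even inside a single process.
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    flock_conflicts = False
except BlockingIOError:
    flock_conflicts = True

# POSIX (fcntl) locks are per-process and, on a local filesystem, do not
# interact with flock() locks, so this one succeeds despite the flock()
# still held above.
fcntl.lockf(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
fcntl_conflicts = False

print("flock vs flock conflicts:", flock_conflicts)
print("fcntl vs flock conflicts:", fcntl_conflicts)
os.unlink(path)
```

On an NFS filesystem with a sufficiently modern kernel, the final lockf() would presumably be the call that fails instead, which is exactly the EAGAIN surprise described above.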

In this case, Ubuntu 6.06 is not sufficiently modern but Ubuntu 8.04 is, and guess what we just did today. (If you guessed 'upgraded our mail server', you win.)

Now, you might sensibly ask why we have code that is trying to do such a crazy thing in the first place. The answer is that the fcntl() is actually done in the gdbm library's dbm_open() function, which errors out if it fails. We don't want to error out, we just want to serialize access, so we have to add our own locking beforehand. That locking needs a file, and what better file to use than the other one of the DBM database files, since we know it has to exist?

(I am not sure what file to use as a replacement for the serialization, although clearly we need to find one.)

FlockFcntlChange written at 01:05:00

2009-07-21

Packages should not contain both tools and policies

If you install the Ubuntu mdadm package on a machine with no software RAID arrays, it will 'helpfully' email you every day to report:

checkarray: W: no active MD arrays found.
checkarray: W: (maybe uninstall the mdadm package?)

(The message comes from the /usr/share/mdadm/checkarray shell script.)

This is a terrible idea, for reasons beyond the fact that it's not actually an error. (It might be an error if the script bothered to check whether there were configured arrays that weren't active, but it doesn't.)

The larger issue is that it means that Ubuntu is forcibly joining together tools (the mdadm program and its manpage) with administrative policy (complaining if there are no active arrays). This is a serious mistake, because it forces your administrative policy down the throats of people who just want the tools. Administrative policies are not 'one size fits all'; they do not come anywhere near close to working for everyone. The more you forcibly join your administrative policies to the tools in your packaging system, the more you force people who want the tools but not your policy to step outside the packaging system, which is a terrible idea for all sorts of reasons.

(Also, you don't know all of the reasons that people have to install tools to start with. To feed my particular pet peeve, in a multi-machine environment there are any number of reasons to have 'inapplicable' packages installed on machines, starting with that it's nice to not have to worry about logging in to a specific machine just in order to read a manpage.)

The right way is to take one of two approaches: either put the tools in one package and your policies in a second one, or remove policy from your packages and just ship things that quietly work right. (Yes, you may miss some obscure corner cases, but people with obscure corner cases always have to do some work themselves.)

ToolPackagesVsPolicy written at 01:30:17

2009-07-17

The hard problem of live major release upgrades

I've held forth before on things that boil down to 'it would be really nice if my favorite Linux distribution had live release upgrades'.

One might think that this is an easy feature to add. After all, in a modern Linux an upgrade from one release to the next mostly consists of updating a huge number of packages, and everyone has excellent, very well tested support for doing that. So, just change things to get packages from the new release's repository and do a regular package update and you're mostly done, right?

(This is in fact what the Fedora yum-based upgrade method basically boils down to.)

But doing this reliably is actually a hard problem. There are a number of smaller issues, but the big issue is dependency chains for package updates combined with the need to keep things working on the running system. Consider a hypothetical situation: standard shell utilities use the new version of glibc (of course), the new version of glibc requires a new version of the kernel that supports new APIs that it's using, and of course, installing the new kernel requires a reboot. So much for your live upgrade.

The inevitable conclusion is that to make live upgrades work, you need at least core system components to be backwards compatible for at least one release. What counts as a 'core system component' here is one of those somewhat hard problems, and in general this is a constraint that developers may not be too happy about.

(It's also worth noting that backwards compatibility can be very hard to test thoroughly, and it's very easy for hidden dependencies to creep in.)

The smaller issues that I see are not insignificant either:

  • not all system changes made during an upgrade are package updates, so a live release upgrade has to be more than just a big batch of package updates.

  • sometimes packages are rearranged, split, renamed, and so on between releases in ways that the normal package manager can't handle (or at least not handle safely).

    (Fedora has traditionally handled this by running package updates in an unsafe 'overwrite everything' mode during release upgrades and hard coding a list of packages that need special handling.)

All of these problems are drastically simplified by being done from 'outside' of the system being upgraded, as happens during a non-live release upgrade. When you're working outside, you don't have to keep the system running during the whole process and as a result can do relatively violent changes. You also don't have to use the usual package update tools, tools that are explicitly designed to avoid horrible accidents during normal upgrades and package installations.

LiveUpgradeProblem written at 03:50:59

2009-07-08

Fedora and workstations (on Linux distributions for desktops)

Comments on the last entry brought up the question of Fedora on desktops (what I call 'workstations' out of tradition) and, in general, the issue of what distribution to run on them. It's a good question and I can't claim exhaustive experience, but here are my views.

I run Fedora on my own desktop and, despite my gripes, it generally works and works well. I deal with the short support period partly by accepting that I'll lose a few days to upgrades every year (or every six months if I'm ambitious) and partly by running Fedora versions well beyond their end of life date and accepting the potential security risks with my eyes open.

(I only run beyond EOL on my own machines, which are single user and relatively locked down, with very little exposed. As far as bugs in things like Firefox are concerned, I am either very safe or very exposed no matter what, because I am still running a custom compiled Firefox from quite a few years ago.)

For all that I don't particularly like Ubuntu, I think that it's your best choice if once-a-year upgrades (or reinstalls) are still too much disruption. It has a great package selection, relatively recent software versions, and even the normal version is supported for 18 months. And you can move between regular and LTS versions as your needs for stability versus current software versions change, especially since Ubuntu has good release to release upgrades.

I wouldn't run RHEL/CentOS for desktops unless they were basically a captive environment that was used only or almost entirely to run known applications. As general use machines I'd be concerned about a limited package selection and about having rather old software versions (at least eventually, since RHEL releases seem to be happening at most once every two years).

DesktopDistribution written at 01:10:17

2009-07-07

Why and why not Fedora

There's a certain perception that Fedora is the beta-quality testbed for Red Hat Enterprise (to condense a comment from an earlier entry), and this is why you shouldn't put it on any machine you care about. This isn't the case, but I think people wind up with this perception because they hear the accurate suggestion that you probably shouldn't put it on servers or production machines unless you really know what you're doing.

Fedora is supposed to be (and by and large is, for all that I sometimes gripe about it) a real, release quality Linux distribution. Fedora is unsuitable for 'business' or 'production' use for two reasons: versions aren't supported long enough, and it is willing to include relatively bleeding edge software versions.

The support period is the real killer, as each Fedora release is only supported for about a year (the standard support period is 'two releases plus a month', and Fedora does a release roughly every six months). For many organizations, us included, an update a year is both too much work and too destabilizing; we want to run servers and desktops without significant changes for much longer than that, and in some cases our users demand it.

(Also, a month is simply not enough for most places to build, validate, and deploy a new OS release, especially since most Fedora releases need time to stabilize to start with.)

That Fedora is willing to include very recent software has two effects. First, things sometimes break, don't work as well as desired, or have rough edges, and second, it means that the Fedora environment can change significantly from release to release. To put it one way, every Fedora release is a major release; there are no minor releases. And my perception is that Fedora has a bias towards shipping the most recent version of things instead of the 'most known to be stable' version.

(For example, Fedora 11 shipped with a Firefox 3.5 pre-release, now updated to the released version of Firefox 3.5. A more conservative distribution would have shipped with Firefox 3 and waited until the next release cycle before shipping something as new and significant as Firefox 3.5.)

All of this isn't particularly unique to Fedora. Ubuntu does much the same thing (although my perception is that their software versions are usually slightly less recent than Fedora's) for their regular releases.

(All of this ties into my (old) views of Linux distributions.)

FedoraWhyAndNot written at 01:06:50

