Wandering Thoughts archives

2008-07-31

A crude system verification method

Suppose that you have a system that you are not entirely confident in, and you want to see whether bits of it have been modified from stock. The easiest way is to use your packaging system's verification support, but let us suppose that your package system doesn't have support for this (or at least that the support is optional and not installed at the moment).

If you happen to have another theoretically identical system lying around (as we do), you can do a crude system verification with rsync:

rsync -n -a --delete -IOc root@hostA:/usr/ /usr/

Here hostA should be the machine that you want to verify, not the machine that you want to verify it against. This also assumes that you can do ssh root logins to hostA. Some of these options are not obvious; -O makes rsync ignore changed directory times, while -I and -c force rsync to always checksum files to see if they're different, instead of trusting the size and the timestamp.

(Package systems generally don't reset the directory modification time when they update programs in a directory, so directories like /usr/bin can naturally have different timestamps on different machines. Ignoring them saves you from drowning in noise.)
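As a rough sketch of how the comparison could be scripted, the Python below builds the same command and sorts rsync's report into categories. It makes one addition that is not in the command above: the -i (--itemize-changes) option, so that rsync prints one line per differing item (by default rsync works silently). The function names and category labels are made up for illustration, and the parser only handles the common itemized line shapes.

```python
def build_verify_command(host, path="/usr/"):
    """Build the dry-run rsync argv; -i is added so differences get listed."""
    if not path.endswith("/"):
        path += "/"  # trailing slash: compare the directory's contents
    return ["rsync", "-n", "-a", "--delete", "-I", "-O", "-c", "-i",
            "root@%s:%s" % (host, path), path]


def classify(line):
    """Map one line of --itemize-changes output to (kind, path), or None."""
    line = line.rstrip("\n")
    if line.startswith("*deleting"):
        # Present locally (the reference machine) but not on the remote host.
        return ("only-local", line[len("*deleting"):].strip())
    flags, sep, name = line.partition(" ")
    if not sep or len(flags) < 3:
        return None
    if flags[0] not in "<>ch." or flags[1] not in "fdLDS":
        return None  # not an itemized line (for example, a summary line)
    attrs = flags[2:]
    if set(attrs) == {"+"}:
        return ("only-remote", name)      # exists only on the remote host
    if attrs[0] == "c":
        return ("content-differs", name)  # checksum (or link target) differs
    return ("attributes-differ", name)    # only times, permissions, owner, etc.
```

To use it, you would run the list from build_verify_command("hostA") with something like subprocess.run(..., capture_output=True, text=True) and feed each line of the output to classify(). The 'content-differs' and 'only-remote' entries are the ones to look at first.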

This isn't likely to work on Linux machines that use prelinking, because prelinking can create different binaries even on machines with identical package sets.

Disclaimer: as a crude verification method, this should only be used if you are mostly confident in the system to start with. If you are not, remember the zeroth law of compromised systems.

RsyncSystemVerification written at 23:28:05

2008-07-20

Thinking about uses for (system) activity tracers

System activity tracers are a hot topic, with the best known one being Sun's DTrace. In thinking about this issue recently, I believe that there are three sorts of questions that they can be used to answer, or at least that I'm interested in having answered:

  • what is my system doing?

    Performance related tracing is one obvious subset of this, both in the 'what is taking all the time' sense and in the 'how long does some operation take' sense.

  • why is my system doing X, in the sense of 'what is doing X on my system'?

    Here you have some peculiar thing happening on your system and you want to trace it back to the program or system or action that causes it. For example, laptop people are interested in questions like 'what is accessing my hard drive' and 'what is waking up all the time'.

  • why is some part of my system doing what it is, or at least what information is it using to make the decisions about what it does?

The last of these is important for solving specific problems; often you know roughly what is going wrong and what program is responsible, but you don't know why and how it is going wrong because you can't see the program's decision making process or even the information it is getting to make the decision. For example, consider 'I can't NFS-mount a filesystem that I think I should be able to'.

In theory you could deal with this by having programs optionally log a lot of information. My personal feeling (partly from having dealt with programs that did copious logging if asked) is that it is better to have a single central interface for deciding what you want to watch and log than to try to give every program options to control all of this; it just scales better, and it's probably easier for program authors too (since they just have to make some hooks available, instead of building a dynamically reconfigurable debug logging system).
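To make the 'hooks plus one central interface' idea concrete, here is a minimal Python sketch. Everything in it is hypothetical (the ProbeRegistry class, the probe name 'mountd.access-check', and the toy access check); it is not how DTrace or any real tracer works. It only illustrates the division of labour: the program exposes a hook at its decision point, and a single registry decides what actually gets recorded.

```python
import fnmatch


class ProbeRegistry:
    """One central place that decides which probe points are recorded."""

    def __init__(self):
        self._patterns = []  # glob patterns naming the enabled probes
        self.events = []     # recorded (probe_name, info) tuples

    def enable(self, pattern):
        self._patterns.append(pattern)

    def fire(self, name, **info):
        """Called by programs at interesting points; a no-op unless enabled."""
        if any(fnmatch.fnmatchcase(name, p) for p in self._patterns):
            self.events.append((name, info))


REGISTRY = ProbeRegistry()


def check_mount_request(client, export, allowed_clients):
    """Hypothetical decision point that exposes its inputs through a probe."""
    allowed = client in allowed_clients
    REGISTRY.fire("mountd.access-check", client=client, export=export,
                  allowed_clients=sorted(allowed_clients), result=allowed)
    return allowed
```

The program author writes only the fire() call. Whoever is debugging later calls REGISTRY.enable("mountd.*") and gets to see both the decision and the information it was based on, without the program needing its own logging options.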

ActivityTracerUses written at 23:46:28

2008-07-11

The case of the mysteriously failing connections

One of the strange networking mysteries around here is that every so often, one of our login servers will report that outgoing mail was delayed because it could not connect to the mail server's SMTP port. There are several things that make this puzzling:

  • the connection is failing with 'host not reachable' errors, not 'connection refused' or the like
  • the mail server is up, running fine, and not loaded at all
  • the login servers and the mail server are on the same subnet, although they are not connected to the same switch.

This happens very infrequently, and every time we've seen it happen it's gone away when the mailer retried a bit later (which is one reason we haven't worried about it more).

Like the last mystery, I don't have any answers, but I do have a theory. First, the background: our login servers are all on a single switch, along with our compute servers. We know that during periods of high activity the switch is sending 'stop transmitting' Ethernet flow control frames to the login servers; we believe that the switch's uplink is saturated, since it's only got a gigabit uplink and is connecting eight or nine actively used machines that get all the important filesystems over NFS.

(We actually split the machines between two switches moderately recently; I don't know if we've seen the problem since then.)

So my theory is that during periods of high network activity when the switch is choked, the login server's ARP requests for the mail server's Ethernet address are getting dropped (either by the switch or by the login server's network driver). Linux does report 'host unreachable' if there's no answer to its ARP queries, and people send email from the login servers sufficiently infrequently that the necessary information could drop out of the local ARP cache.
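One way to gather evidence for this theory would be to look at the login server's ARP cache when a failure is reported. The following Python sketch parses the text of Linux's /proc/net/arp and reports whether a given IP address has a complete entry, an incomplete one (a query went out but nothing answered), or none at all. The addresses in SAMPLE are invented documentation addresses, not our real machines, and the function names are mine.

```python
ATF_COM = 0x02  # flag bit: the entry is complete (has a hardware address)

# Invented sample in the layout of /proc/net/arp.
SAMPLE = (
    "IP address       HW type     Flags       HW address            Mask     Device\n"
    "192.0.2.10       0x1         0x2         00:16:3e:00:00:01     *        eth0\n"
    "192.0.2.25       0x1         0x0         00:00:00:00:00:00     *        eth0\n"
)


def parse_proc_net_arp(text):
    """Return {ip: (hw_address, flags, device)} from /proc/net/arp text."""
    entries = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, hw_addr, _mask, device = fields[:6]
        entries[ip] = (hw_addr, int(flags, 16), device)
    return entries


def arp_state(text, ip):
    """Classify an address as 'complete', 'incomplete' or 'absent'."""
    entry = parse_proc_net_arp(text).get(ip)
    if entry is None:
        return "absent"
    return "complete" if entry[1] & ATF_COM else "incomplete"
```

On a real machine you would pass in open("/proc/net/arp").read() and the mail server's IP address. Seeing 'incomplete' for the mail server at the time of a failed connection would fit the dropped-ARP explanation; seeing 'complete' would argue against it.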

LocalConnectionMystery written at 00:43:46

2008-07-06

A small drawback to Wietse Venema's TCP Wrappers

Wietse Venema's tcpwrappers is mostly used for controlling access to services run from inetd or your local equivalent, where you are not expecting high performance or high load. However, it can be built as a library that you link your daemon against, and some daemons are.

(At least on Linux, both OpenSSH and the portmapper are built this way.)

It turns out that there is a small drawback to using tcpwrappers this way in some sorts of high-performance applications, specifically in applications where you expect a lot of connections or want to be able to dispatch connections very fast. The drawback is this:

Tcpwrappers does no caching of the hosts.allow and hosts.deny files.

Every time you call the tcpwrappers routines to check for host access, they open, read, and parse the files completely from scratch. If your files are small, this doesn't matter, but if they're large, you may be burning more CPU time on this than you expect.

(You're very unlikely to be hit with disk IO for reading the files; if you're getting any sort of connection volume they'll be in the filesystem cache.)

One useful thing to know for best performance is that tcpwrappers deals with the files strictly a line at a time (instead of parsing the entire file, then evaluating the parsed rules). This implies that it's worth putting the rules for the most common cases first if you have big files.
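The effect of rule order can be illustrated with a toy model in Python. This is not tcpwrappers' actual code and it handles only a small subset of the rule syntax ('daemon_list : client_list' lines, with ALL, leading-dot hostname suffixes and trailing-dot address prefixes); the function names are made up. What it shows is how a line-at-a-time, first-match scan does less work when the common case is near the top.

```python
def client_matches(pattern, host, addr):
    """Simplified client pattern matching (a subset of the real syntax)."""
    if pattern == "ALL":
        return True
    if pattern.startswith("."):   # hostname suffix, e.g. .example.org
        return host.endswith(pattern)
    if pattern.endswith("."):     # address prefix, e.g. 10.9.
        return addr.startswith(pattern)
    return pattern in (host, addr)


def first_match(lines, daemon, host, addr):
    """Scan rules in order; return (matching line number or None, lines examined)."""
    examined = 0
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        examined += 1
        daemons, _, clients = line.partition(":")
        daemon_list = daemons.replace(",", " ").split()
        client_list = clients.replace(",", " ").split()
        if daemon in daemon_list or "ALL" in daemon_list:
            if any(client_matches(p, host, addr) for p in client_list):
                return number, examined  # stop at the first matching line
    return None, examined
```

With 4,000 rarely matched 'sshd: 10.9.N.' style lines ahead of the one rule that matches most connections, every such connection examines 4,001 lines; move that rule to the top and it examines one. Since the files are reread on every check, that difference is paid per connection.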

(Big hosts.allow and hosts.deny are probably uncommon, but I once had a hosts.deny file that was over 4,000 lines long. That was admittedly a special case, and eventually got replaced with better technology.)

TcpwrappersDrawback written at 01:17:12

2008-07-03

Why system administrators like interpreted languages

Or, more specifically, why sysadmins like programs written in interpreted languages. I say this because we do; it is one reason for the enduring popularity of writing things in the Bourne shell, because in a sense the Bourne shell is the platonic ideal of an interpreted language on Unix systems.

Here's why I at least really like such programs:

  • you don't need a different copy for every different sort of system that you have, because
  • you don't have to compile your programs.
  • thus there is no bootstrapping required on new systems; you can simply grab a copy of the program and run.
  • also, that means that you don't have to keep the source somewhere; the program is the source and can't get lost.

Strictly speaking, some of these aren't advantages of writing in interpreted languages, they're advantages of using self-contained programs. You can certainly write large programs in interpreted languages that require being installed before they can run (and that have large lists of fragile dependencies).

(However, some language environments make it easy to skip the installation step, at least on a temporary basis, with features such as search paths that include the current directory by default. And these days any sensible system should make copying over a directory as easy as copying over a file, because everyone should ship with a version of rsync, right? Yes, I'm looking at you, Sun.)

This also explains why I would generally rather use a hacked together and limited shell script than a much nicer C program; the shell script is a lot less hassle for casual use.

Sidebar: the source code advantage (again)

Take it from me: the last advantage is a big one. When the program is its own source, not only can it never get lost but there is never any question about which version of the source was used to build the binary (and with what compile options and so on).

(And let's not get started about old source that won't build in a modern environment, leaving you with a bunch of work to do if the old binaries stop working someday.)

This isn't to say that interpreted languages make portability issues go away entirely; of course they don't (and sometimes shell scripts make it worse). But they at least make it obvious, because your program blows up right away.

(The more technical way to put it is that with interpreted programs, you don't have any difference between source compatibility and binary compatibility. Compiled programs can preserve binary compatibility while breaking source compatibility.)

SysadminsLikeInterpreters written at 01:03:49

