Wandering Thoughts archives

2008-08-03

A performance gotcha with syslogd

Stated simply: many versions of syslogd will fsync() logfiles after writing a message to them, in an attempt to make sure that the message makes it to disk in case something happens to the system immediately afterwards (it crashes, loses power, etc). This obviously can have an impact (sometimes a significant one) on any other IO activity going on at the time.

On some but not all systems with this feature, you can turn this off for specific syslog files by sticking a '-' in front of them; this is especially handy for high volume, low importance log files, such as ones you're just using for statistical analysis. (For example, one system around here has a relatively active nameserver that syslogs every query. You can bet that we have fsync() turned off for that logfile, and when we accidentally didn't we noticed right away.)
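
(For concreteness, on systems that support it, such as Linux's traditional syslogd, the '-' goes in front of the file name in the action field of syslog.conf. A rough sketch, with a made-up facility and file name:

    # normal entry: syslogd fsync()s the file after every message
    daemon.info         /var/log/named-queries.log
    # the leading '-' tells syslogd not to sync this file
    daemon.info         -/var/log/named-queries.log

Check your system's syslog.conf manpage before counting on this, since not every syslogd understands the '-' prefix.)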

(From the moderate amount of poking I've done, Solaris always does this and has no option to turn it off, FreeBSD only does this for kernel messages and can turn it off, and Linux's traditional syslog daemon always does this and can turn it off. I don't know about the new syslog daemon in Fedora. OpenBSD doesn't say anything in its manpages, but appears to always fsync().)

As a side note, if you really need syslog messages to be captured, I recommend also forwarding them to a remote syslog server. That way you have a much higher chance of capturing messages like 'inconsistency detected in /var, turning it read-only' (which has happened to us), and you have a certain amount of insurance against the clock on the machine going crazy.
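
(With a traditional syslogd, forwarding is a one-line syslog.conf entry using '@host'. A minimal sketch, where loghost.example.com is a made-up name for your central syslog server:

    # also send everything to the central syslog server
    *.*                 @loghost.example.com

Traditional syslogd forwards over UDP, so treat this as best-effort delivery rather than a guarantee.)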

(A central syslog server is also a convenient place to watch all of your systems at once and easily correlate events across them.)

sysadmin/SyslogFsyncIssue written at 23:48:23

First impressions of using DTrace on user-level programs

I've finally gotten around to trying out DTrace, and I have to say that it actually is pretty cool so far. I haven't used it on kernel-side stuff, just to poke at user programs, for which it makes a nice print-based debugger; it's easy to point DTrace at a process and see what's going on (easier than attaching a debugger and getting anywhere, in my opinion), and you have a bunch of interesting analysis options, none of which require you to sit there holding the debugger's hand.

(For example, it is easy to see all of the library calls that a program makes, or all of the calls that it makes to a specific shared library; this is a powerful way of working out how a program is making decisions.)
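
For instance, here is a sketch of the sort of one-liner I mean, using the pid provider to count every libc call a process makes (this assumes Solaris, where libc's module is libc.so.1, and 1234 stands in for the real process ID):

    # count calls into libc made by process 1234; dtrace prints the totals when you ^C it
    dtrace -n 'pid$target:libc.so.1::entry { @calls[probefunc] = count(); }' -p 1234

Because this is an aggregation, DTrace does the counting in the kernel and only shows you the summary at the end, which is part of what makes it so cheap to leave running.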

One drawback to using DTrace on userland programs is that DTrace is fundamentally a kernel debugger, so it does not give you direct access to a process's user-side memory and thus its variables and data structures. In order to get any of this sort of thing, you have to first copy it from user space with various DTrace functions, primarily copyin() for random data and copyinstr() for C-style strings. Another drawback is that DTrace has no looping and very little conditional decision making, which makes it hard to trace down complex data structures.

(I understand why DTrace has to have this basic model, but I really wish that the DTrace people had put more convenience functions in the DTrace language for this. And I don't understand the whole avoidance of an if or equivalent at all.)
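
To make copyinstr() concrete, here is a sketch of a one-liner that prints the path name every time a process calls open(); arg0 is a pointer into the process's address space, so it has to be copied in before it can be printed (1234 is again a stand-in for the real PID):

    # print the file names that process 1234 open()s
    dtrace -n 'syscall::open*:entry /pid == $target/ { printf("%s\n", copyinstr(arg0)); }' -p 1234

(The usual caveat applies: copyinstr() can fail if the string's page hasn't been faulted in yet, which is one of the little awkwardnesses of doing all of this from the kernel side.)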

That DTrace is not a user-level debugger is actually reassuring in one sense; user-level debuggers are traditionally rather invasive and do things like stopping the process you're monitoring while you hold its hand. This is alarming for a sysadmin, since the processes we want to monitor are often rather important system ones that we want disturbed as little as possible (and certainly not stopped and potentially aborted).

solaris/UserlandDtraceImpressions written at 01:02:59

