Wandering Thoughts archives

2014-05-19

A building block of my environment: sps, a better tree-based process list

Once upon what is now a very long time ago (it appears to have been back in 1987 or so), Robert Ward wrote a better version of BSD ps that he called sps. The signature feature of sps was that it displayed processes in a sort of tree form, with UID transitions marked. This was in the days before pstree and equivalents were even a gleam in anyone's eye, and anyway I maintain that sps's display is better than pstree, ptree, or the Linux ps option that will do process trees. I used sps happily for a number of years on BSD-based machines but then wound up dealing with the System V based SGI Irix and really missed it. Rather than take on the epic work of rewriting code that grubbed around in kernel data structures, I redid the features I cared about as a script that used a pile of awk code (well, nawk code) to post-process ps output (using the System V ps feature of printing out only specific columns in parseable ways).

(In the process I learned a great deal about how what are now ancient versions of awk and nawk handled attempts at things like recursion and how to fake local variables.)

Ever since then I have carried my sps script forward across OS after OS (SGI Irix 6, then Solaris, then Linux), adapting it slightly for each one. It remains my favorite way of getting process listings on Linux (partly because I fixed the Linux ps problem with long login names); on modern versions of Solaris ptree is almost as good, especially since our Solaris machines don't have users (and thus UID transitions).

(Jim Frost wrote a Linux version of sps back in 1998 and I used it for a while, but it has to be compiled, I don't think it's been updated for a long time, and I don't know if it still works on modern Linuxes. For that matter, I don't know where you'd still get the source code today.)

The output of sps looks like this:

Ty     User            PID CMD
[...]
       root           1192 /usr/lib/postfix/master
        |postfix      1197 qmgr -l -t fifo -u
        |postfix     22675 cleanup -z -t unix -u -c
        |postfix     22676 trivial-rewrite -n rewrite -t unix -u -c
        |postfix     22677 smtp -t unix -u -c
        |postfix      6205 pickup -l -t fifo -u -c
[...]
       root           6899 /usr/sbin/sshd -D
        |            22741 sshd
         |cks        22760 sshd
pts/0     *          22761 -rc
pts/0      |         22856 /bin/sh /u/cks/bin/bin.i386-linux/sps -A
pts/0       |        22858 ps -A -o user

This is a very small excerpt from 'sps -A' that shows the essential features (it's a small excerpt because modern Linux systems have a lot of processes even if they're not doing much).
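As a rough illustration of the general approach (post-processing ps columns into an indented tree that names a user only where the UID changes), here is a minimal Python sketch. It is not the actual sps script, which is written in awk; the function names `parse_ps` and `render_tree` are made up for this example, the column layout is a simplification of the real output above, and the terminal column and the `*` marker are left out entirely.

```python
from collections import defaultdict


def parse_ps(text):
    """Parse 'PID PPID USER ARGS' lines; the first line is assumed to be a header."""
    procs = {}
    for line in text.strip().splitlines()[1:]:
        pid, ppid, user, args = line.split(None, 3)
        procs[int(pid)] = (int(ppid), user, args)
    return procs


def render_tree(procs):
    """Return display lines: children indented under parents, user shown only on change."""
    children = defaultdict(list)
    for pid, (ppid, _user, _args) in sorted(procs.items()):
        children[ppid].append(pid)

    # A process whose parent is not in the listing starts a new tree.
    roots = [pid for pid, (ppid, _u, _a) in sorted(procs.items()) if ppid not in procs]

    lines = []

    def walk(pid, depth, parent_user):
        _ppid, user, args = procs[pid]
        # Print the user name only at a UID transition (hypothetical simplification).
        shown = user if user != parent_user else ""
        prefix = " " * depth + ("|" if depth else "")
        lines.append(f"{prefix + shown:<14} {pid:>6} {args}")
        for child in children[pid]:
            walk(child, depth + 1, user)

    for root in roots:
        walk(root, 0, None)
    return lines
```

The input text could come from something like `ps -A -o pid,ppid,user,args`; feeding that output to `parse_ps` and printing the lines from `render_tree` gives a listing in the same spirit as the excerpt above, although not in its exact format.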

If this sounds interesting I've put my current version of sps for Linux on the web here and there's also a lightly tested OmniOS version. Adaptation for other Unixes is left as an exercise for the interested.

One of the reasons I quite like my script version of sps, apart from its sheer usefulness, is that it shows how Unix evolves useful capacities over time (and how more CPU power makes them more feasible). In the BSD era ps was sufficiently hard-coded and awk was sufficiently limited that you'd probably have had a hard time duplicating the C version of sps with a script and if you had, the result would have been pretty slow and resource intensive. Move forward a decade or two and there's no serious problem with either. Today I doubt you could measure the impact of using a script for this and committing to modern gawk features would probably make this even easier.

(A truly modern version of sps would probably use Perl instead of trying to mangle everything with awk and other shell tools; Perl is now ubiquitous enough to make that perfectly viable. Since I'm not really fond of Perl, I'm not the right person to write that version of sps but feel free to go wild. I'd expect a Perl version to be smaller, better, and possibly faster.)

sysadmin/ToolsSps written at 23:59:59

Why desktop Linuxes want you to reboot after updates

Anyone who uses a mainline Linux desktop may have noticed a trend where the system more and more often wants you to either reboot or log out and log back in again after you apply distribution updates (the two are roughly equivalent in terms of disrupting you, so I'm going to treat them the same). You might wonder why Linux has been shifting towards this increasingly, well, Windows-like experience. While I don't have direct knowledge of the internal decisions of Linux distributions, as a system administrator I can certainly see the factors that are driving people towards this.

There are two basic problems. The first one is simply getting your new updates fully activated when they may be updating either long-running programs or shared libraries used by long-running programs (because the copy of the shared library that a program is using is usually fixed at the point it starts). Some of these programs may be things like browsers and email clients; others may be daemons that are deeply tangled into the desktop environment (or even the system environment) to the point where other things assume that they never exit or restart.

(Making a desktop environment that can survive random parts of itself restarting is actually quite a challenge. For instance you're going to need lots of programs to be able to safely serialize their state and then re-execute themselves, including security sensitive programs like ssh-agent. Many of them don't do this today, so you've got a lot of work ahead.)
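To make the first problem concrete: on Linux, a running process keeps using the files it mapped when it started, and if an update has since replaced one of them, the entry in /proc/<pid>/maps usually carries a " (deleted)" suffix. The Python sketch below looks for such entries. The function names are made up for this example, and it is only a heuristic: deleted temporary files and similar mappings show up the same way, so a hit does not prove that a library was updated.

```python
import os

DELETED_SUFFIX = " (deleted)"


def stale_mappings(maps_text):
    """Return the file paths marked '(deleted)' in the text of a /proc/<pid>/maps file."""
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no pathname
        path = fields[5].strip()
        if path.startswith("/") and path.endswith(DELETED_SUFFIX):
            stale.add(path[: -len(DELETED_SUFFIX)])
    return stale


def scan_proc(proc_root="/proc"):
    """Map each readable PID to the set of deleted files it still has mapped."""
    results = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "maps")) as f:
                stale = stale_mappings(f.read())
        except OSError:
            continue  # the process exited or we lack permission to read it
        if stale:
            results[int(name)] = stale
    return results
```

Running `scan_proc()` as root after applying updates gives a rough idea of which long-running processes would need a restart to pick up new code, which is exactly the set that a reboot takes care of wholesale.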

The second problem is that of making a partially updated environment work. You get such a partially updated environment when some running programs (or loaded shared libraries, or whatever) are the old, pre-update versions while others are the new post-update versions. Unlucky programs can also see a partially updated environment if they start during the update process and see some files from after the update and some files from before it. Pragmatically it's quite hard for a distribution to even test that stuff works in this sort of situation; there are a huge number of different combinations and things that can go wrong and most of this is upstream software that a distribution has little power over.

The easy way out for both problems is to tell you to either log out or reboot after updates have been applied, depending on what's been updated. A reboot guarantees that everything is running the current version and that all of it is mutually consistent (barring bugs in the actual updates). It may be overkill but it's simple and reliable overkill, and this has a certain attraction to distributions that want to just make things work.

(This isn't the same issue as offline updates, but it's closely related. Offline updates are an even more extreme version of this that try to avoid potential problems even while applying updates.)

linux/WhyRebootOnUpdates written at 01:22:19

