2014-05-19
A building block of my environment: sps, a better tree-based process list
Once upon what is now a very long time ago, it appears back in 1987 or so, Robert Ward wrote a better version of BSD ps that he called sps (cf). The highly useful signature feature of sps was that it displayed processes in sort of a tree form, with UID transitions marked. This was in the days before pstree and equivalents were even a gleam in anyone's eye, and anyway I maintain that sps's display is better than pstree, ptree, or the Linux ps option that will do process trees. I used SPS happily for a number of years on BSD-based machines but then wound up dealing with the System V-based SGI Irix and really missed it. Rather than take on the epic work of rewriting code that grubbed around in kernel data structures, I redid the important features I cared about as a script that used a pile of awk code (well, nawk code) to post-process ps output (using the System V ps feature of printing out only specific columns in parseable ways).
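For illustration, here is a minimal Python sketch of the core of such a post-processor (this is not the sps script itself, which is shell plus nawk). It assumes input shaped like the output of the POSIX 'ps -A -o user,pid,ppid,args' (a header line, then user, PID, parent PID and command line), builds the parent/child tree, and walks it recursively, printing the user name only for top-level processes and where it differs from the parent's user, so that UID transitions stand out. The output layout is only loosely modeled on sps's, and ps_snapshot, parse_ps and render_tree are hypothetical names.

```python
import subprocess
from collections import defaultdict


def ps_snapshot():
    """Run POSIX-style 'ps -A -o user,pid,ppid,args' and return its output."""
    return subprocess.run(
        ["ps", "-A", "-o", "user,pid,ppid,args"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_ps(text):
    """Parse 'USER PID PPID ARGS' lines (after a header line) into a dict keyed by PID."""
    procs = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # malformed or truncated line
        user, pid, ppid = fields[0], int(fields[1]), int(fields[2])
        args = fields[3] if len(fields) == 4 else ""
        procs[pid] = {"user": user, "ppid": ppid, "args": args}
    return procs


def render_tree(procs):
    """Return one output line per process, in tree order.

    Each child is indented under its parent; the user name is printed only
    for top-level processes and where it differs from the parent's user.
    """
    children = defaultdict(list)
    roots = []
    for pid, proc in procs.items():
        if proc["ppid"] in procs and proc["ppid"] != pid:
            children[proc["ppid"]].append(pid)
        else:
            roots.append(pid)  # parent not in the listing (e.g. PID 0)

    lines = []

    def walk(pid, depth, parent_user):
        proc = procs[pid]
        # A blank user field means "same user as my parent".
        user = proc["user"] if proc["user"] != parent_user else ""
        indent = "  " * depth + ("|" if depth else "")
        lines.append(f"{indent}{user:<8} {pid:>6} {proc['args']}")
        for child in sorted(children[pid]):
            walk(child, depth + 1, proc["user"])

    for root in sorted(roots):
        walk(root, 0, None)
    return lines
```

On a machine whose ps supports these POSIX -o fields, print("\n".join(render_tree(parse_ps(ps_snapshot())))) prints the whole tree. Recursion is fine here because process trees are shallow; in old awk the same walk is where you end up faking recursion and local variables.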
(In the process I learned a great deal about how what are now ancient versions of awk and nawk handled attempts at things like recursion and how to fake local variables.)
Ever since then I have carried my sps script forward across OS after OS (SGI Irix 6, then Solaris, then Linux), adapting it slightly for each one. It remains my favorite way of getting process listings on Linux (partly because I fixed the Linux ps problem with long login names); on modern versions of Solaris ptree is almost as good, especially since our Solaris machines don't have users (and thus UID transitions).
(Jim Frost wrote a Linux version of sps back in 1998 and I used it for a while, but it has to be compiled, I don't think it's been updated for a long time, and I don't know if it still works on modern Linuxes. For that matter I don't know where you'd still get the source code today.)
SPS output looks like this:

    Ty     User     PID   CMD
    [...]
           root     1192  /usr/lib/postfix/master
           |postfix 1197  qmgr -l -t fifo -u
           |postfix 22675 cleanup -z -t unix -u -c
           |postfix 22676 trivial-rewrite -n rewrite -t unix -u -c
           |postfix 22677 smtp -t unix -u -c
           |postfix 6205  pickup -l -t fifo -u -c
    [...]
           root     6899  /usr/sbin/sshd -D
           |        22741 sshd
           |cks     22760 sshd
    pts/0  *        22761 -rc
    pts/0  |        22856 /bin/sh /u/cks/bin/bin.i386-linux/sps -A
    pts/0  |        22858 ps -A -o user

This is a very small excerpt from 'sps -A' that shows the essential features (it's a small excerpt because modern Linux systems have a lot of processes even if they're not doing much).
If this sounds interesting I've put my current version of sps for Linux on the web here and there's also a lightly tested OmniOS version. Adaptation for other Unixes is left as an exercise for the interested.
One of the reasons I quite like my script version of sps, apart from its sheer usefulness, is that it shows how Unix evolves useful capacities over time (and how more CPU power makes them more feasible). In the BSD era ps was sufficiently hard-coded and awk was sufficiently limited that you'd probably have had a hard time duplicating the C version of sps with a script, and if you had, the result would have been pretty slow and resource intensive. Move forward a decade or two and there's no serious problem with either. Today I doubt you could measure the impact of using a script for this, and committing to modern gawk features would probably make this even easier.
(A truly modern version of sps would probably use Perl instead of trying to mangle everything with awk and other shell tools; Perl is now ubiquitous enough to make that perfectly viable. Since I'm not really fond of Perl, I'm not the right person to write that version of sps, but feel free to go wild. I'd expect a Perl version to be smaller, better, and possibly faster.)
Why desktop Linuxes want you to reboot after updates
Anyone who uses a mainline Linux desktop may have noticed a trend where more and more the system wants you to either reboot the system or log out and log back in again after you apply distribution updates (the two are roughly equivalent in terms of disrupting you, so I'm going to treat them the same). You might wonder why Linux has been shifting towards this increasingly, well, Windows-like experience. While I don't have direct knowledge of the internal decisions of Linux distributions, as a system administrator I can certainly see the factors that are driving people towards this.
There are two basic problems. The first one is simply getting your new updates fully activated when they may be updating either long-running programs or shared libraries used by long-running programs (because the copy of the shared library that a program is using is usually fixed at the point it starts). Some of these programs may be things like browsers and email clients; others may be daemons that are deeply tangled into the desktop environment (or even the system environment) to the point where other things assume that they never exit or restart.
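One rough way to see this first problem on a running Linux system is to look for processes that still have an old, since-replaced shared library mapped. Package managers normally install a new library by writing a new file and renaming it over the old one, so processes that already mapped the old copy keep using it, and the kernel shows such mappings in /proc/PID/maps with " (deleted)" appended to the path. The following is a minimal Python sketch under those assumptions: it is Linux-only, the ".so" test is a crude heuristic, reading other users' processes generally needs root, and stale_mappings and processes_needing_restart are hypothetical names.

```python
import os

DELETED = " (deleted)"


def stale_mappings(maps_text):
    """Return sorted paths of shared libraries that are mapped but no longer on disk.

    Each /proc/PID/maps line is: address perms offset dev inode [pathname].
    """
    stale = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no pathname
        path = fields[5]
        # Crude heuristic: a '.so' file whose mapping the kernel marks as deleted.
        if path.endswith(DELETED) and ".so" in path:
            stale.add(path[: -len(DELETED)])
    return sorted(stale)


def processes_needing_restart(proc_root="/proc"):
    """Map PID -> stale library paths for every readable process under proc_root."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "maps")) as f:
                stale = stale_mappings(f.read())
        except OSError:
            continue  # process exited, or we lack permission to read it
        if stale:
            result[int(entry)] = stale
    return result
```

Every PID this reports is running a mix of old and new code relative to what is on disk, which is exactly what a restart (or a reboot) is meant to clear up.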
(Making a desktop environment that can survive random parts of itself restarting is actually quite a challenge. For instance you're going to need lots of programs to be able to safely serialize their state and then re-execute themselves, including security sensitive programs like ssh-agent. Many of them don't do this today, so you've got a lot of work ahead.)
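To make "serialize their state and then re-execute themselves" a bit more concrete, here is a minimal Python sketch of the basic pattern under loudly stated assumptions: the program's state is JSON-serializable, it is handed to the new process image through a private temporary file named by a hypothetical --restore-state flag, and the program is a Python script run by sys.executable. Real programs such as ssh-agent have much harder versions of this problem (open sockets, in-flight requests, secrets you would rather never write to disk), which is a large part of why this is a lot of work.

```python
import json
import os
import sys
import tempfile

STATE_FLAG = "--restore-state"  # hypothetical flag naming the saved-state file


def save_state(state, path):
    """Atomically write state as JSON; mkstemp creates the file with mode 0600."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)  # atomic rename over any old state file
    except BaseException:
        os.unlink(tmp)
        raise


def load_state(argv):
    """Return state saved by a previous incarnation, or None on a fresh start."""
    if STATE_FLAG not in argv:
        return None
    path = argv[argv.index(STATE_FLAG) + 1]
    with open(path) as f:
        state = json.load(f)
    os.unlink(path)  # one-shot: don't leave the state lying around
    return state


def reexec_with_state(state, path):
    """Serialize state, then replace this process with the current on-disk program.

    os.execv does not return on success; the new image is whatever
    sys.executable and the script are on disk now, i.e. the updated versions.
    """
    save_state(state, path)
    script = os.path.abspath(sys.argv[0])
    os.execv(sys.executable, [sys.executable, script, STATE_FLAG, path])
```

At startup such a program would call load_state(sys.argv) and resume from the result if it is not None. Note that this sketch only works if the old and new versions agree on the state format, which is the partial-update problem again in miniature.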
The second problem is that of making a partially updated environment work. You get such a partially updated environment when some running programs (or loaded shared libraries, or whatever) are the old, pre-update versions while others are the new post-update versions. Unlucky programs can also see a partially updated environment if they start during the update process and see some files from after the update and some files from before it. Pragmatically it's quite hard for a distribution to even test that stuff works in this sort of situation; there are a huge number of different combinations and things that can go wrong and most of this is upstream software that a distribution has little power over.
The easy way out for both problems is to tell you to either log out or reboot after updates have been applied, depending on what's been updated. A reboot guarantees that everything is the current version and it's all coherent with each other (barring bugs in the actual updates). It may be overkill but it's simple and reliable overkill and this has a certain attraction to distributions that want to just make things work.
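The "depending on what's been updated" decision can be sketched as picking the most disruptive action that any updated package calls for. The package names below are purely illustrative guesses, not any distribution's actual policy, and Action and required_action are hypothetical names.

```python
from enum import IntEnum


class Action(IntEnum):
    NONE = 0    # no session-wide action needed
    LOGOUT = 1  # restart the desktop session
    REBOOT = 2  # restart everything


# Purely illustrative classification; real distributions keep their own lists.
REBOOT_IF_UPDATED = {"kernel", "glibc", "systemd", "dbus"}
LOGOUT_IF_UPDATED = {"gnome-shell", "gnome-session", "plasma-workspace"}


def required_action(updated_packages):
    """Return the most disruptive action that any updated package calls for."""
    action = Action.NONE
    for pkg in updated_packages:
        if pkg in REBOOT_IF_UPDATED:
            return Action.REBOOT  # nothing is more disruptive, so stop early
        if pkg in LOGOUT_IF_UPDATED:
            action = max(action, Action.LOGOUT)
    return action
```

Escalating to the single most disruptive requirement is what keeps this simple and reliable: once anything calls for a reboot, nothing else has to be reasoned about, and no partially updated combination is ever left running.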
(This isn't the same issue as offline updates, but it's closely related. Offline updates are an even more extreme version of this that try to avoid potential problems even while applying updates.)