2005-12-31
Notes on getting a Solaris hardware inventory
Being able to find out what hardware is in a random machine is one of those things you don't think about very much until you inherit responsibility for a bunch of machines that you didn't build yourself.
The best hardware inventory program I've used is SGI's hinv
(although it doesn't have enough disk information). Linux has decent
hardware inventory support, but not bundled into a single command; you
have to look through a bunch of /proc files and know a few commands
like lspci. Unfortunately, Solaris is less friendly.
The old-fashioned way to get hardware information is to look at the
kernel's boot messages; on Solaris this is in syslog or via dmesg.
However, these logs get aged away if the system has been up for a
while. (I've been known to arrange for kernel syslog messages to never
expire, but I haven't set that up on my Solaris systems yet.)
The best program seems to be prtdiag, which gives CPU, memory, and
some hardware slot information (and works for non-root users, always a
bonus). There's also prtconf and a number of others, but they don't
seem to give much additional useful information about hardware.
The names of things in /devices carry some information, but I suspect
a good familiarity with Solaris device driver names is needed for best
results. (Solaris /proc is for processes only, so there is nothing
like Linux's collection of informative files.)
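Since the information is spread across several commands, it can help to capture them all in one go while the machine is freshly booted. Here is a minimal sketch in Python that runs whichever of the commands mentioned above it can find; the function name and the command list are my own assumptions, no flags are passed, and on some Solaris releases prtdiag may not be on the default PATH at all, in which case it is simply reported as missing.

```python
import shutil
import subprocess

# Commands named in the text above. No options are passed, since which
# flags are useful depends on the Solaris release (an assumption here).
INVENTORY_COMMANDS = [
    ["prtdiag"],
    ["prtconf"],
    ["dmesg"],
]

def collect_inventory(commands=INVENTORY_COMMANDS):
    """Run each command found on PATH; return {name: output or None}."""
    report = {}
    for argv in commands:
        name = argv[0]
        if shutil.which(name) is None:
            report[name] = None  # not installed, or not on PATH
            continue
        # No check=True: a failing command should not abort the inventory.
        result = subprocess.run(argv, capture_output=True, text=True)
        report[name] = result.stdout
    return report
```

Saving the returned dictionary to a file somewhere off the machine sidesteps the problem of boot messages aging out of the logs.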
(People seem to use Magnicomp's sysinfo a fair bit, but it's
commercial software (with a 30 day free trial), and binary packages on
systems without real package managers make me twitchy. And its
installer has glitches that don't inspire confidence.)
2005-12-09
A surprising hazard of running as root all the time
We have some machines that are 'no user-operable parts inside'
setups; as part of that, they have no user logins, just root. (Yes,
yes, running as root all the time is bad, but on these boxes almost
all we'd ever do with a plain login is su to root.)
I'm attuned to all of the regular hazards of this, but today I
stumbled over a new one: how long it takes to notice that /var had
accidentally wound up mode 0750 (and owned by a group that had
hardly anything in it) on a Solaris 2.4 machine.
Of course, root doesn't get permission denied messages, and most of
the obvious things were running as root and kept on working. About the
only sign was a large collection of files called things like
'mailAAAa00087' scribbled in /var/tmp. It turned out that these
files were complaints from cron about being unable to run lp cron
jobs because it couldn't change to lp's home directory, and bounce
messages talking about 'lp... Can't create output'.
So I looked at lp's home directory, /usr/spool/lp, which looked
perfectly fine and I could even cd into it as root. Only when I
did 'su lp' and tried it did I get a 'permission denied' error and
started backtracking to discover the /var permissions problem.
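The trap here is that any check root makes by actually trying the access will succeed. A check that only looks at the permission bits gives the same answer no matter who runs it. The following is a rough Python sketch of that idea; the function name is made up, and it deliberately looks only at the 'other' execute bit, so it will also flag directories that are intentionally private.

```python
import os
import stat

def untraversable_ancestors(path):
    """Return (directory, mode) pairs on the way to path that lack o+x.

    Only the mode bits are inspected, so the result is the same whether
    or not the caller is root. Owner and group access are ignored.
    """
    problems = []
    # Resolve symlinks so the path is checked along its real route.
    current = os.path.realpath(path)
    while True:
        st = os.stat(current)
        if stat.S_ISDIR(st.st_mode) and not st.st_mode & stat.S_IXOTH:
            problems.append((current, stat.S_IMODE(st.st_mode)))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return problems
```

Run against lp's home directory, something like this would have pointed at /var directly instead of leaving me to find it by backtracking.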
Sidebar: so how did it happen?
What I think happened is that someone built a tar file of a
/var/named directory they wanted to move around, but instead of
tarring up the directory, they cd'd into the directory and tarred up
'.'. Then they moved it to this machine and accidentally untarred it
in /var instead of making a /var/named directory and untarring it
there. As part of unpacking, tar dutifully set the permissions on
all of the files and directories in the tarball, including '.'.
So the moral is: tarfiles that include . are annoying and
dangerous in more than one way.
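It is cheap to look for this before unpacking anything. As a hedged sketch (the function name is my own invention), Python's standard tarfile module can list the members of a tarball and pick out any entry that refers to the extraction directory itself, along with the mode it would impose:

```python
import os
import tarfile

def dot_members(tar_path):
    """Return (name, mode) for members that name the extraction directory."""
    with tarfile.open(tar_path) as archive:
        return [
            (member.name, member.mode & 0o7777)
            for member in archive.getmembers()
            # '.' and './' both normalize to '.'
            if os.path.normpath(member.name) == "."
        ]
```

An empty result means the tarball will not touch the permissions of whatever directory you unpack it in; anything else is worth a second look before you run tar in a directory like /var.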
2005-11-24
Solaris 9's slow patch installs
Yesterday was my first time installing the Solaris 9 recommended patch set on a production machine; we rolled it onto a basically unpatched server. Because it was a server, I did it in single-user mode (the patch set recommends this, as some patches in it say explicitly to apply them in single-user mode).
I already knew that installing the patch set was achingly slow on my test machine, but my test machine is an Ultra 10 so I wasn't surprised. The machine from yesterday was a Sun Fire V210, which has modern CPUs and, more importantly, modern amounts of memory and fast SCSI disks.
It still took an hour.
There are 134 patches in the patch set, so Solaris was only able to average a patch every 26 seconds. Considering how much work a modern machine can do in 26 seconds, I believe I can safely say that the Solaris patch install system is hideously inefficient.
(And, as previously noted, it spews incomprehensible and alarming messages on the screen.)
Fortunately it doesn't demand I answer any questions during its run, so next time around I'll know to just go back to my office for a while. Still, an hour is an irritatingly long time to have a production server down in single-user mode.
2005-11-19
Solaris 9 sendmail irritations
Here's how to give a system administrator a heart attack: the default
Solaris 9 sendmail configuration apparently allows other machines that
your Solaris machine thinks are in your local domain to relay through
you. I say 'apparently' because there's nothing in the sendmail.mc about
this, and nothing clear in the generated /etc/mail/sendmail.cf either.
In other fun discoveries, the default sendmail configuration is also set up to relay all your mail through a machine called 'mailhost' in your domain. We don't have such a machine in our subdomain here, so god knows where any administrative mail my test machine tried to send over the past month or so has wound up.
Solaris 9 was shipped in 2002, and Sun actually started to care about security by that point; for example, it ships with tcpwrappers. In 2002, I would have thought that Sun would know that any open relaying is a bad idea.
In fact it turns out that Solaris sendmail's default configuration has
other dubious features, even for 2002: for example, it will happily
accept MAIL FROM addresses without domains or with unresolvable
domains. None of this is set visibly and explicitly in their supplied
.mc files; it is hiding away in the 'solaris-generic' set of settings
that those use.
The light at the end of the tunnel is that Solaris 9 actually includes another set of settings, 'solaris-antispam'; changing from 'solaris-generic' to these will give you much stronger settings. (These are in fact the default Sendmail settings, so Solaris deliberately shipped with a less secure sendmail configuration that is more open to spam and abuse.)
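The switch itself is a one-word edit to the .mc file. As a sketch only: I am assuming here that the .mc file picks its settings with an m4 DOMAIN(`solaris-generic') line, which you should confirm against your own file, and the function name is made up. In Python the edit looks like this:

```python
import re

# Assumption: the .mc file selects its settings with a line such as
#   DOMAIN(`solaris-generic')dnl
GENERIC_DOMAIN = re.compile(r"DOMAIN\(`solaris-generic'\)")

def use_antispam_domain(mc_text):
    """Return mc_text with solaris-generic swapped for solaris-antispam."""
    new_text, count = GENERIC_DOMAIN.subn(
        "DOMAIN(`solaris-antispam')", mc_text
    )
    if count == 0:
        # Refuse to hand back an unchanged file as if it had been fixed.
        raise ValueError("no DOMAIN(`solaris-generic') line found")
    return new_text
```

The sendmail.cf still has to be regenerated from the edited .mc file afterwards; this only changes the source.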
2005-10-08
Solaris 9 'Power management'
I had another Solaris 9 learning experience today: I came in to find my ssh sessions to my Ultra 10 test machine dead, because the machine was powered off. This was more than a little bit disconcerting, since the last thing I had left it doing was installing the current Solaris 9 patch set. (It took sufficiently long that I'd had to go home before it finished.)
Powering the machine on showed not a normal boot sequence, but a message about restoring the system. This caused me to remember that when I had installed the system, I'd said yes to an offer to have 'power management' software installed. (Unfortunately the installer does not have very many clear explanations of what the software packages all do.)
In the PC world I usually operate in, 'power management' is things like spinning down disks and dropping into low-power CPU states when the machine is idle. In the SPARC world, it turns out that 'power management' is turning the machine off entirely.
Fortunately I was able to find the dtpower program after some quick
Googling. Unfortunately dtpower doesn't run over an ssh X connection
for some reason, so I had to fire up dtlogin, log in, and run it to
shut this feature off. (There is probably a way to fire up the X
server and the environment from a console login, but starting
dtlogin was faster than trying to figure it out.)
(This whole episode is my fault, not Solaris 9's. I should have read the documentation before firing up the installer, and certainly before answering installer questions I didn't fully understand. But at least I've stubbed my toe on this now, in case I ever have to deal with Sparcs that mysteriously power themselves off every so often.)
2005-10-06
First irritations with Solaris 9
As with Fedora Core 4, I haven't been using Solaris 9 long enough to have given it a fair shake. So instead of any sort of review, this is just a collection of things that have irritated me about it on first exposure.
I'll start with a nice simple one:
# useradd -m -c 'Chris Siebenmann' cks
UX: useradd: ERROR: Unable to create the home directory: Operation not applicable.
This is on a default configured Solaris 9 machine, straight out of the 'take more or less the defaults' install. Is it too much to ask that the apparent best way to add users from the command line actually works?
(The reason this fails is that /home, the location of nominal user
home directories, is actually an automounter setup. But useradd
doesn't know about this. Whoops. For extra bonus fun, you actually have
to make an entry in the /etc/auto_home automounter map to get things
to work.)
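For illustration, an automounter map entry is a key followed by a location of the form host:/path. The small Python sketch below formats one; the 'localhost' server and the '/export/home' directory are assumptions about where the real home directories live, not something the installer told me, and the function name is invented.

```python
def auto_home_entry(user, server="localhost", export_root="/export/home"):
    """Format one automounter map line: key, a tab, then host:/path.

    The default server and export_root are assumptions; use wherever
    the real home directories actually live.
    """
    if not user or any(ch.isspace() for ch in user):
        raise ValueError("user name must be non-empty with no whitespace")
    return "%s\t%s:%s/%s" % (user, server, export_root, user)
```

For the user above, auto_home_entry("cks") gives a line with the key 'cks' and the location 'localhost:/export/home/cks'. The real directory still has to exist at that location for the mount to work.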
Installer irritations:
- the installer asked me to reinsert a CD-ROM it had already asked for
(Solaris 9 Software disk 2, after the documentation). This is just
sloppy; you should be able to order your entire install series so
it asks for each CD-ROM only once.
- practically every time it had me swap CD-ROMs, it stopped to prompt
me if I really wanted things from the CD-ROM installed. This was
despite walking me through an entire earlier 'what stuff do you want
installed' step that led it to wanting those CD-ROMs.
- periodically it would pop up a dialog about continuing in 30 seconds
if I did nothing, or I could continue right away, or I could pause.
The first time I rolled my eyes and clicked 'Continue'. The next
time I realized that this dialog was obscuring a lower dialog with
informative options that I might wish to inspect and perhaps change.
- having previously wanted the sort of interaction normally seen in
needy young children, the installer decided to automatically reboot
at the end.
Update: mea culpa; this one is my fault. Right near the start, the Solaris 9 installer asks you if you want to automatically reboot at the end. (Then you are asked sixty zillion other questions so you forget this.)
I'd criticize the installer for not looking very pretty, but it was running on an 8 bit deep framebuffer. (Probably not a very fast one, either. Ultra 10s are not where you go if you want even 1998-era PC graphics basics, like 32-bit colour.)
Then there's the small issue of patch installer error messages, which are lovely things like:
Installation of 117067-01 failed. Return code 2.
[...]
Installation of 112233-12 failed. Return code 8.
Neither is a helpful error message. Do one or both of them mean that it's a patch not applicable to this system? Do one or both of them mean that something important has gone wrong during patch installation?
(It appears that return code 2 means 'update already installed' and return code 8 means 'this update isn't applicable to your system'. But to find this out I had to read the detailed error log. It would not have killed Sun to print an actually useful error message instead of 'Return code N'.)
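In the meantime, the translation can be done after the fact. Here is a small Python sketch that picks the failure lines out of the patch installer's output and attaches the meanings given above; it only knows about the two return codes I ran into, the meanings are what they appeared to be from the detailed log rather than anything authoritative, and the names are my own.

```python
import re

# Meanings as they appeared from the detailed error log for the two
# codes seen above. Every other code is deliberately left unexplained.
RETURN_CODE_MEANINGS = {
    2: "update already installed",
    8: "update not applicable to this system",
}

FAILURE = re.compile(
    r"Installation of (\d+-\d+) failed\. Return code (\d+)\."
)

def explain_failures(output_text):
    """Return (patch_id, code, meaning) for each failure line found."""
    results = []
    for patch_id, code_text in FAILURE.findall(output_text):
        code = int(code_text)
        meaning = RETURN_CODE_MEANINGS.get(
            code, "unknown; check the detailed log"
        )
        results.append((patch_id, code, meaning))
    return results
```

Anything that comes back as unknown is the part actually worth reading the detailed log for.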