Sometimes a little change winds up setting off a large cascade of things
(This is a sysadmin war story.)
We have a password master machine, which runs some version of Ubuntu LTS like almost all of our machines. More specifically, it currently runs Ubuntu 12.04 and we need to upgrade it to Ubuntu 16.04. Naturally, upgrading our password master machine requires testing, which is a good thing because I wound up running into a whole cascade of interesting issues in the process. So today I'm going to walk through how one innocent change led to one thing after another.
Back in the Ubuntu 12.04 days, we set our machines up so that /bin/sh
was Bash. I don't think this was the Ubuntu default for 12.04, but it
was the default in the Ubuntu LTS version we started with and we're
busy sysadmins.
In 2014, we changed our Ubuntu 14.04 machines from Bash to the
default of dash as /bin/sh
(after finding issues with Bash) but left the 12.04 machines
alone for various reasons.
(This change took place in stages, somewhat prompted by Shellshock, and we were fixing up Bashisms in our scripts for a while. By the way, Bashisms aren't necessarily a bug.)
Our password change process works in part by using a PAM module to
run a script that does important things like push the changed
password to Samba on our Samba servers (possibly there is a better
way to do this with PAM today, but there is a lot of history here
and it works). This script was written as a '#!/bin/sh' script,
but it turns out that it was actually using some Bashisms, which
had gone undetected before now because this was the first time we'd
tried to run it on anything more recent than our 12.04 install.
Since I didn't feel like hunting down all of the issues, I took the
simple approach; I changed it to start '#!/bin/bash' and resumed
testing.
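To illustrate the sort of thing that bites here (these are generic
examples, not lines from our actual script), constructs like '[[ ... ]]'
are accepted when /bin/sh is Bash but aren't part of POSIX sh and fail
under dash. A portable version of a simple prefix test, using the
made-up function name has_prefix, might look something like this:

```shell
#!/bin/sh
# Hypothetical example; not taken from our real passwd-postprocess script.

# Bash-only version (breaks when /bin/sh is dash):
#   if [[ "$1" == foo* ]]; then ...

# POSIX version: 'case' does the same glob match in any POSIX shell.
has_prefix() {
    # $1 is the string to check, $2 is the literal prefix we want.
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

if has_prefix "smbpasswd-update" "smb"; then
    echo "prefix matched"
fi
```

Fixing one of these is easy. Finding all of them in an old script is
the tedious part, which is why switching the '#!' line was tempting.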
I was immediately greeted by a log message to the effect that bash
couldn't run /root/passwd-postprocess because of permission denied.
It took quite a lot of iterating around before I found the real
cause; our PAM module was running the script directly from the setuid
passwd program, so only its effective UID was root and it turned out
that both Bash and dash (as /bin/sh) were freaking out over this,
although in different ways. Well, okay, I could fix that by telling
Bash that everything was okay by using '#!/bin/bash -p'.
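The mismatch involved is between the real UID and the effective UID.
As a rough sketch (uid_mismatch is a hypothetical helper name, not
something from our scripts), a script can look at the same thing the
shell looks at with the POSIX 'id' command; 'id -u' reports the
effective UID and 'id -ru' the real one. Bash's documented behaviour
when the two differ and '-p' isn't given is to set the effective UID
back to the real UID.

```shell
#!/bin/sh
# Illustrative sketch only; uid_mismatch is a hypothetical helper name.

# Succeeds (returns 0) when the effective UID differs from the real UID,
# which is the situation a script run straight from setuid passwd is in.
uid_mismatch() {
    [ "$(id -u)" != "$(id -ru)" ]
}

if uid_mismatch; then
    echo "effective UID $(id -u) != real UID $(id -ru)"
else
    echo "real and effective UID are both $(id -u)"
fi
```

In a Bash script without '-p', Bash has already reset the effective
UID by the time any of this runs, so I'd only expect it to report a
mismatch in a '#!/bin/bash -p' script.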
Things still failed, this time later on when our passwd-postprocess
script tried to run another shell script; that second shell script
needed root permissions, but because it started with only '#!/bin/sh',
its shell freaked out about the effective UID things and immediately
dropped privileges, causing various failures. At this point I saw
the writing on the wall and changed our PAM module to run
passwd-postprocess as root via setuid() (in the process I cleaned
up some other things).
So that's the story of how the little change of switching /bin/sh
from Bash to dash caused a cascade of issues that wound up with me
changing how our decade-old PAM module worked. Every step of the
way from the /bin/sh change to the PAM module modifications is
modest and understandable in isolation, but I find the whole
cascade rather remarkable and I doubt
I would have predicted it in advance even if I'd had all of the
pieces in my mind individually.
(This is sort of related to fragile complexity, much like performance issues.)