What happens when a modern Linux system boots without /bin/sh
It turns out that a whole lot of things explode when your system boots up with /bin/sh not working for some mysterious reason.
Here the mysterious reason was that there was an unresolved dynamic
library symbol, so any attempt to run /bin/sh failed
with an error message from the ELF interpreter.
The big surprise for me was just how far my systemd-based Fedora
23 machine managed to get despite this handicap. I certainly saw a
cascade of unit failures in the startup messages so I knew that
something bad had happened, but the first inkling I had of just how
bad it was came when I tried to log in as root on the (text)
console and the system just dumped me back at the login prompt.
Most of the system services had managed to start because their
.service files did not need
/bin/sh to run; only a few
things (some of them surprising) had failed, although a dependency
chain for one of them wound up blocking the local resolving DNS
server from starting.
The unpleasant surprise was how much depends on
/bin/bash working. I was able to log in as myself because I use
a different shell, but
root was inaccessible, my own environment relies on a
certain amount of shell scripts to be really functional, and a
surprising number of standard binaries are shell scripts these days
(/usr/bin/fgrep, for example). In the end I got somewhat lucky
in that my regular account had
sudo access and
sudo can be used
to run things directly, without needing
/bin/sh or root's shell
to be functioning.
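As a rough illustration of how one might check which commands are really shell scripts, here is a minimal Python sketch that looks at the `#!` line of each executable in a directory. The function names and the list of shell paths are assumptions made for this example, not anything taken from Fedora's tooling, and the check is deliberately simple (it ignores things like `env -S`).

```python
import os

# Assumed set of paths that count as "the shell" for this sketch.
SHELL_PATHS = ("/bin/sh", "/bin/bash", "/usr/bin/sh", "/usr/bin/bash")
SHELL_NAMES = ("sh", "bash")


def shell_interpreter(path):
    """Return the shell named on the file's #! line, or None if there is none."""
    try:
        with open(path, "rb") as f:
            first = f.readline(256)
    except OSError:
        return None
    if not first.startswith(b"#!"):
        return None
    words = first[2:].decode("utf-8", "replace").split()
    if not words:
        return None
    interp = words[0]
    # Handle the "#!/usr/bin/env bash" style of interpreter line.
    if os.path.basename(interp) == "env" and len(words) > 1:
        return words[1] if words[1] in SHELL_NAMES else None
    return interp if interp in SHELL_PATHS else None


def find_shell_scripts(directory):
    """List the executable regular files in directory that are shell scripts."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            if shell_interpreter(path) is not None:
                found.append(name)
    return found


if __name__ == "__main__":
    if os.path.isdir("/usr/bin"):
        scripts = find_shell_scripts("/usr/bin")
        print(len(scripts), "shell scripts found in /usr/bin")
```

Every file this reports will fail to run if the shell it names cannot start, no matter how it is invoked.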
(I mostly wound up using this to run
less to read logs and
reboot. If I'd been thinking more carefully, I could have used
sudo to run an alternate shell as root, which would
have been almost as good as being able to log in directly.)
Another pretty useful thing here is how systemd captured a great deal of the error output from startup services and recorded it in the systemd journal. This gave me the exact error messages, for example, which is at least reassuring to have even if I don't understand what went wrong.
What I don't have here is an exciting story of how I revived a
system despite its
/bin/sh being broken. In the end the problem
went away after I rebooted and then power cycled my workstation.
Based on the symptoms I suspect that a page in RAM got scrambled
somehow (which honestly is a bit unnerving).
As a side note, the most surprising thing that failed to start was
udev trying to run the install command for the sound card drivers
(snd_pcm). I suspect that this is used to restore
the sound volume settings to whatever they were the last time the
system was shut down, but I don't know for sure because things
didn't report the exact command being executed or whatever.
(My system has a 90-alsa-restore.rules udev rules file that tries
to run alsactl. It's not clear to me if udev executes RUN commands via
system(), which would have hit the issue, or in
some more direct way. Maybe it depends on whether the RUN command
seems to have anything that needs interpretation by the shell. I'm
pretty certain that at least some udev
RUN actions succeeded.)
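The difference between going through system() and executing a program directly can be sketched in Python. This is only an illustration of the general mechanism, not a description of what udev or sudo actually do internally. The helper names are invented for the example, and the broken shell is simulated with a nonexistent path; a shell that fails at dynamic link time, as here, would instead start and then exit with a failure status.

```python
import subprocess
import sys


def run_direct(argv):
    """Execute argv[0] with the given arguments; no shell is involved."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout


def run_via_shell(command, shell_path="/bin/sh"):
    """Run a command string as `shell_path -c command`, much as system() does."""
    result = subprocess.run(command, shell=True, executable=shell_path,
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Direct execution works regardless of the state of the shell.
    print(run_direct([sys.executable, "-c", "print('started')"]), end="")

    # Simulate an unusable shell (hypothetical path, assumed not to exist).
    try:
        run_via_shell("echo started", shell_path="/nonexistent/sh")
    except OSError as exc:
        print("could not start the shell:", exc)
```

Anything that takes the second path depends on the shell being able to start, even if the command itself is a perfectly healthy binary.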
Sidebar: What exactly was wrong
This was on my Fedora 23 office machine, where
/bin/sh is bash, and
bash was failing to start with a message to the effect of:
symbol lookup error: /bin/bash: undefined symbol: rl_unix_line_disc<binary garbage>
Bash does not mention a symbol with that exact name, but it does
want to resolve and use one whose name starts with rl_unix_line_disc.
This is an internal symbol (it's both used and defined in bash);
despite this, looking it up goes via the full dynamic linker symbol
resolution process (as determined with the help of
My guess is that the end of the symbol name was overwritten in RAM
with some garbage and that this probably happened in the Linux
kernel page cache (since it kept reappearing with the same message,
it can't have been in a per-process page).
Assuming I'm reading things correctly, the bytes of garbage are (in hex):
ae 37 d8 5f bf 6b d1 45 3a c0 d9 93 1b 44 12 2d 68 74
(less does not display these bytes in a way that fully captures
them. I had to run a snippet of journalctl's
raw output through 'od -t c -t x1' to get the exact hex.)