== What happens when a modern Linux system boots without _/bin/sh_

So [[this happened to me https://twitter.com/thatcks/status/705808960717193216]]:

> It turns out that a whole lot of things explode when your system boots
> up with /bin/sh not working for some mysterious reason.

Here the mysterious reason was that there was an unresolved dynamic library symbol, so any attempt to run _/bin/sh_ or _/bin/bash_ died with an error message from [[the ELF interpreter HowProgramsExecute]].

The big surprise for me was just how far my systemd-based Fedora 23 machine managed to get despite this handicap. I certainly saw a cascade of unit failures in the startup messages, so I knew that something bad had happened, but the first inkling I had of just how bad it was came when I tried to log in as _root_ on the (text) console and the system just dumped me back at the _login:_ prompt. Most of the system services had managed to start because their systemd _.service_ files did not need _/bin/sh_ to run. Only a few things (some of them surprising) had failed, although a dependency chain for one of them wound up blocking the local resolving DNS server from starting.

The unpleasant surprise was how much depends on _/bin/sh_ and _/bin/bash_ working. I was able to log in as myself because I use [[a different shell ../sysadmin/NonstandardShellAdvantage]], but obviously _root_ was inaccessible, my own environment relies on a certain number of shell scripts to be really functional, and a surprising number of standard binaries are shell scripts these days (_/usr/bin/fgrep_, for example).

In the end I got somewhat lucky in that my regular account had _sudo_ access, and _sudo_ can be used to run things directly, without needing _/bin/sh_ or root's shell to be functioning. (I mostly wound up using this to run _less_ to read logs and eventually _reboot_.
If I'd been thinking more carefully, I could have used _sudo_ to run an alternate shell as root, which would have been almost as good as being able to log in directly.)

Another pretty useful thing here is how systemd captured a great deal of the error output from startup services and recorded it in the systemd journal. This gave me the exact error messages, for example, which is at least reassuring to have even if I don't understand what went wrong.

What I don't have here is an exciting story of how I revived a system despite its _/bin/sh_ being broken. In the end the problem went away after I rebooted and then power cycled my workstation. Based on the symptoms, I suspect that a page in RAM got scrambled somehow (which honestly is a bit unnerving).

As a side note, the most surprising thing that failed to start was udev trying to run the install command for the sound card drivers (specifically _``snd_pcm''_). I suspect that this is used to restore the sound volume settings to whatever they were the last time the system was shut down, but I don't know for sure because nothing reported the exact command being executed.

(My system has a 90-alsa-restore.rules udev rules file that tries to run _alsactl_. It's not clear to me whether udev executes ((RUN+=)) commands via _system()_, which would have hit the issue, or in some more direct way. Maybe it depends on whether the RUN command seems to contain anything that needs interpretation by the shell. I'm pretty certain that at least some udev _RUN_ actions succeeded.)

=== Sidebar: What exactly was wrong

This was on my Fedora 23 office machine, where _/bin/sh_ is bash, and bash was failing to start with a message to the effect of:

.pn prewrap on

  symbol lookup error: /bin/bash: undefined symbol: rl_unix_line_disc

Bash does not reference a symbol with that exact name, but it does want to resolve and use ((rl_unix_line_discard)).
Interestingly, this is an internal symbol (it's both used and defined in bash); despite this, looking it up goes through the full dynamic linker symbol resolution process (as determined with the help of _``LD_DEBUG''_).

My guess is that the end of the symbol name was overwritten in RAM with some garbage and that this probably happened in the Linux kernel page cache (since it kept reappearing with the same message, it can't have been in a per-process page). Assuming I'm reading things correctly, the bytes of garbage are (in hex):

> ae 37 d8 5f bf 6b d1 45 3a c0 d9 93 1b 44 12 2d 68 74

(_less_ displays this as '((7_kE:ٓ^R-ht))', which doesn't fully capture it. I had to run a snippet of journalctl's raw output through 'od -t c -t x1' to get the exact hex.)
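For the curious, here is a small Python sketch that lays those garbage bytes out one by one and also decodes them as UTF-8, to show which of them could even form characters. It isn't what was actually used here (that was _od_). The function names are made up for illustration, and the layout only approximates _od -t c -t x1_ rather than reproducing GNU od's output exactly.

```python
# Hypothetical helpers for illustration only; the layout approximates,
# but does not exactly reproduce, GNU `od -t c -t x1`.
from typing import Dict, List

# The garbage bytes from the bash error message, copied from the hex above.
GARBAGE = bytes.fromhex("ae 37 d8 5f bf 6b d1 45 3a c0 d9 93 1b 44 12 2d 68 74")

# C-style escapes that od's -t c mode uses for some control characters.
ESCAPES: Dict[int, str] = {
    0x00: "\\0", 0x07: "\\a", 0x08: "\\b", 0x09: "\\t",
    0x0A: "\\n", 0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
}


def char_field(b: int) -> str:
    """Render one byte roughly the way `od -t c` does: printable ASCII as
    itself, a few control characters as C escapes, anything else as
    three-digit octal."""
    if b in ESCAPES:
        return ESCAPES[b]
    if 0x20 <= b < 0x7F:
        return chr(b)
    return f"{b:03o}"


def od_lines(data: bytes, width: int = 16) -> List[str]:
    """For each chunk, emit an octal offset plus a character line, then an
    aligned hex line; finish with the total length in octal, as od does."""
    lines: List[str] = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:07o}" + "".join(char_field(b).rjust(4) for b in chunk))
        lines.append(" " * 7 + "".join(f"{b:02x}".rjust(4) for b in chunk))
    lines.append(f"{len(data):07o}")
    return lines


def utf8_view(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for invalid bytes, to show which
    bytes a UTF-8 pager or terminal could treat as real characters."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print("\n".join(od_lines(GARBAGE)))
    print(ascii(utf8_view(GARBAGE)))
```

Decoding this way shows that only the pair ((d9 93)) forms a real non-ASCII character (U+0653, a combining Arabic mark). ((1b)) and ((12)) are the ASCII control characters ESC and ^R, and the remaining high bytes aren't valid UTF-8 where they appear, which is part of why a pager's rendering loses information.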