2012-07-16
Getting an Ubuntu 12.04 machine to give you boot messages
As part of a slow move towards Ubuntu 12.04, we recently worked on the problem that our 12.04 servers were pretty much not showing boot messages and in particular they weren't showing any kernel messages. Not showing boot messages is a big issue for servers because if anything ever stalls or goes wrong in the boot process you wind up basically up the creek without boot messages; you have a hung server and no clue what's wrong.
(Since I've gone through this with a 12.04 server that was hanging during boot, I can tell you that various bits of magic SysRq are basically no help these days.)
The main changes we needed to make were to /etc/default/grub, which magically controls the behavior of Grub2. There were two of them:
- change GRUB_CMDLINE_LINUX_DEFAULT to delete 'quiet splash'. On 12.04 servers without a serial console, we leave this blank.
- uncomment the 'GRUB_TERMINAL=console' line. Without this change the console stays blank for a while and only the later boot messages show. (I don't understand why this is necessary; my best understanding of the Grub2 documentation is that 'console' should be the default.)
We've also changed GRUB_TIMEOUT to 5 (seconds) and commented out GRUB_HIDDEN_TIMEOUT and GRUB_HIDDEN_TIMEOUT_QUIET. This causes the Grub2 menu to always show for five seconds, which I find much more useful than the default behavior of having to hold down Shift at exactly the right time in order to get the menu to show.
(I understand why a desktop install wants to hide the Grub menu by default, but this is the wrong behavior for a server.)
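For illustration, the /etc/default/grub edits described above can be expressed as a small Python function that rewrites the file's contents. This is a minimal sketch rather than anything we actually run: apply_server_grub_settings is a hypothetical name, and SAMPLE is an abbreviated stand-in for the stock 12.04 file, not a verbatim copy. It only transforms a string; writing the file back and running update-grub are left to you.

```python
import re


def apply_server_grub_settings(text: str) -> str:
    """Hypothetical helper: apply the server-friendly Grub2 settings to the
    text of an /etc/default/grub file and return the new text."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("GRUB_CMDLINE_LINUX_DEFAULT="):
            # Drop 'quiet splash' so boot and kernel messages are shown.
            line = 'GRUB_CMDLINE_LINUX_DEFAULT=""'
        elif re.fullmatch(r"#\s*GRUB_TERMINAL=console", stripped):
            # Uncomment the console terminal setting.
            line = "GRUB_TERMINAL=console"
        elif stripped.startswith("GRUB_TIMEOUT="):
            # Always show the menu for five seconds.
            line = "GRUB_TIMEOUT=5"
        elif stripped.startswith(("GRUB_HIDDEN_TIMEOUT=",
                                  "GRUB_HIDDEN_TIMEOUT_QUIET=")):
            # Comment these out so the menu is no longer hidden.
            line = "#" + stripped
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated, illustrative input (not the complete stock file).
SAMPLE = '''GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console
'''

print(apply_server_grub_settings(SAMPLE), end="")
```

The function leaves lines it doesn't recognize alone and is idempotent, so running it twice on the same file does no harm.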
Remember that after you change /etc/default/grub you have to run update-grub to get the change to take effect. Forgetting this step can leave you very puzzled and frustrated during testing (I speak from sad experience).
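One hypothetical way to catch a forgotten update-grub is to compare modification times: if /etc/default/grub is newer than the generated /boot/grub/grub.cfg, update-grub probably hasn't been run since the last edit. This is only a heuristic sketch; the function name is made up, the grub.cfg path assumes a standard BIOS Grub2 install, and touching either file by other means will fool it.

```python
import os


def grub_cfg_looks_stale(default_path: str = "/etc/default/grub",
                         cfg_path: str = "/boot/grub/grub.cfg") -> bool:
    """Heuristic: True if grub.cfg is missing or older than /etc/default/grub."""
    if not os.path.exists(default_path):
        return False  # nothing to compare against
    if not os.path.exists(cfg_path):
        return True  # update-grub has apparently never been run
    # A newer /etc/default/grub suggests edits that were never applied.
    return os.path.getmtime(default_path) > os.path.getmtime(cfg_path)
```

Calling grub_cfg_looks_stale() with no arguments checks the real files; the path parameters exist mainly so the check can be tried on scratch copies.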
(This is where I could insert a rant about the huge mess of complexity that is Grub2. I do not consider having a programming language for Grub menus to exactly be progress, especially not when they become opaque and have to be machine generated.)
The remaining change is to /etc/init/tty1.conf. By default the virtual console logins clear the screen when they start; on tty1, this has the effect of erasing the last screen's worth of boot-time messages. To tell getty not to do this, we add --noclear to the exec line:

exec /sbin/getty --noclear -8 38400 tty1
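If you manage several machines, the same edit can be scripted. A minimal sketch, assuming the getty line looks like the stock one shown above; add_noclear is a hypothetical name and SAMPLE_JOB is a simplified job file, not a full copy of the 12.04 tty1.conf.

```python
def add_noclear(conf_text: str) -> str:
    """Hypothetical helper: insert --noclear into the 'exec .../getty' line of
    an Upstart job file, leaving all other lines untouched."""
    out = []
    for line in conf_text.splitlines():
        words = line.split()
        if (len(words) >= 2 and words[0] == "exec"
                and words[1].endswith("getty") and "--noclear" not in words):
            words.insert(2, "--noclear")
            line = " ".join(words)  # note: normalizes spacing on this line
        out.append(line)
    return "\n".join(out) + "\n"


# Simplified, illustrative job file.
SAMPLE_JOB = '''# tty1 - getty
start on runlevel [2345]
stop on runlevel [!2345]

respawn
exec /sbin/getty -8 38400 tty1
'''

print(add_noclear(SAMPLE_JOB), end="")
```

Because it skips lines that already contain --noclear, running it again after an upgrade is harmless.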
Unfortunately the result of all of these changes isn't exactly perfect. We get kernel messages and no longer wipe out the messages Upstart prints about starting user-level services, but the 12.04 Upstart configuration doesn't print very many such messages. I believe that only the remaining /etc/init.d scripts really produce boot-time messages, and there are an ever-decreasing number of them; native /etc/init jobs don't seem to print much, if anything.
(There are ways to coax Upstart into logging messages about services, but I haven't found one that causes it to print 'starting <blah>' and 'done starting <blah>' on the console during boot.)
Things that don't work to produce more verbose boot messages
I've experimented with a number of options and arguments that seem like they should help but in practice don't. All of these are supplied on the kernel command line:
- debug=vc (from the initramfs-tools manpage): This prints relatively verbose debugging information from the /init script in the initial ramdisk. Unfortunately our problems have always been after this point, once the initial ramdisk has handed things over to the real Upstart init. (It is useful to verify that the Upstart init is being started with your debugging options, though.)
- --verbose (from the upstart manpage): In theory this makes Upstart verbose. In practice, I haven't been able to get it to print useful messages to the console that show which services are being started when (so that you can, say, identify which service is causing your boot to hang).
- '--default-console output' (from the upstart manpage combined with init(5)): My memory is that this dumps output (if any) from the actual commands being run to the console, but it still doesn't tell you which services are starting. If the problem command is hanging silently, you're no better off than before.
(For reasons kind of described in my entry on the kernel command line, --default-console can't be written with an = in the way that the upstart manpage shows it. Fortunately Upstart uses standard GNU argument processing, so we can write it with a space instead.)
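A simplified Python model of the kernel behavior behind this may help. It is based on the kernel's documented rule that parameters it doesn't recognize (and that contain no '.') are passed on to init, with 'name=value' forms going into init's environment and bare words becoming init's command line arguments. This is a sketch, not the kernel's code: split_for_init is a hypothetical name, and KNOWN_KERNEL_PARAMS is a tiny stand-in for the kernel's real parameter table.

```python
# Hypothetical, deliberately tiny stand-in for the kernel's parameter table.
KNOWN_KERNEL_PARAMS = {"ro", "root", "console", "quiet", "splash"}


def split_for_init(cmdline: str):
    """Simplified model: return (argv, envp) that init would receive."""
    argv, envp = [], {}
    for word in cmdline.split():
        name, sep, value = word.partition("=")
        # The kernel keeps parameters it knows and module params (with a '.').
        if name in KNOWN_KERNEL_PARAMS or "." in name:
            continue
        if sep:
            envp[name] = value  # 'name=value' lands in init's environment
        else:
            argv.append(word)   # bare words become init's arguments
    return argv, envp


print(split_for_init("ro root=/dev/sda1 --verbose --default-console output"))
print(split_for_init("ro root=/dev/sda1 --default-console=output"))
```

In the second call, '--default-console=output' ends up as an environment variable instead of an argument, so Upstart never sees the option; the space-separated form survives as two ordinary arguments.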
Sidebar: what caused our Ubuntu 12.04 machines to hang on boot
It turns out that our 12.04 servers will stall during boot if a filesystem listed in /etc/fstab is not present. This happens even if the filesystem is marked noauto. It's possible that this stall eventually times out; if so, the timeout is much longer than we're willing to wait.
As best as I can determine, this behavior is not directly caused by anything in /etc/init and thus is not easy for us to change.
No, we are not happy about this. This might be vaguely excusable for regular filesystems; it's inexcusable for noauto filesystems.
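Since the stall is triggered by missing devices, it can help to check for them before rebooting. The following is a hypothetical diagnostic sketch, not a fix for the stall: it lists /etc/fstab entries (noauto or not) whose device doesn't currently exist. It assumes UUID= and LABEL= entries can be resolved through the udev-maintained /dev/disk/by-uuid and /dev/disk/by-label symlinks, skips anything that isn't a device (proc, tmpfs, NFS mounts), and ignores fstab escapes such as \040 in labels.

```python
import os


def missing_fstab_devices(fstab_text: str, exists=os.path.exists):
    """Hypothetical diagnostic: return (mountpoint, device_path, is_noauto)
    for each fstab entry whose device path does not exist."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line; not our problem here
        spec, mountpoint, _fstype, options = fields[:4]
        if spec.startswith("UUID="):
            path = "/dev/disk/by-uuid/" + spec[len("UUID="):]
        elif spec.startswith("LABEL="):
            path = "/dev/disk/by-label/" + spec[len("LABEL="):]
        elif spec.startswith("/dev/"):
            path = spec
        else:
            continue  # proc, tmpfs, none, host:/export, ...
        if not exists(path):
            missing.append((mountpoint, path, "noauto" in options.split(",")))
    return missing
```

On a real machine you would pass it the contents of /etc/fstab; the exists parameter is there so the logic can be exercised without real devices.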
My arrogance about Unicode and character encodings
Yesterday I described how I could get away with ignoring encoding issues and thus how forced Unicode was and is irritating. However there is a gotcha in my approach, one that hides behind a bit of arrogance. Let me repeat the core bit of how my programs typically work:
What they process is a mixture of ASCII (for keywords, directives, and so on, all of the things the program had to interpret) and uninterpreted bytestrings, which are simply regurgitated to the user as-is in appropriate situations.
This simple, reasonable description contains an assumption: this approach assumes that any encoding will be a superset of ASCII, because it assumes that code can extract plain text ASCII from a file without knowing the file's encoding. This works if and only if the file's actual encoding is implemented as ASCII plus other stuff hiding around the edges, which is true for many encodings including UTF-8 but not for all of them.
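To make the assumption concrete, here is a minimal Python sketch of the bytes-only approach. The line format, KEYWORDS, and parse_directives are all hypothetical, invented for illustration rather than taken from any real program. Feeding it the same text in different encodings shows which encodings are ASCII supersets in the sense that matters here.

```python
# Hypothetical format: each line starts with an ASCII keyword; the rest of
# the line is an uninterpreted bytestring that is only ever handed back.
KEYWORDS = {b"title", b"include"}


def parse_directives(data: bytes):
    """Match ASCII keywords in raw bytes without decoding anything."""
    directives = []
    for line in data.split(b"\n"):
        keyword, _, rest = line.partition(b" ")
        if keyword in KEYWORDS:
            directives.append((keyword.decode("ascii"), rest))
    return directives


text = "title Café\ninclude x.conf\n"

# UTF-8 and Latin-1 encode ASCII characters as the same ASCII bytes, so the
# keywords are found without ever knowing the file's encoding.
print(parse_directives(text.encode("utf-8")))
print(parse_directives(text.encode("latin-1")))

# UTF-16 interleaves NUL bytes and EBCDIC (code page 500) uses different
# byte values for the same letters, so no keyword is recognized at all.
print(parse_directives(text.encode("utf-16-le")))
print(parse_directives(text.encode("cp500")))
```

The last two calls fail silently: the program doesn't crash, it just finds nothing, which is arguably worse.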
This is the arrogance of my blithe approach to ignoring character encoding issues. It assumes either that all character sets are a superset of ASCII or that any exceptions are sufficiently uncommon that I don't have to care about them. Of course, by assuming that my programs will never be used by people with such character sets I've ensured that they never will be.
The conclusion that I draw from this is that I can't ignore character encodings unless I'm willing to be somewhat arrogant. The pain of dealing with decoding and encoding issues is simply the price of not being arrogant.
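For contrast, here is a sketch of what paying that price looks like, using a hypothetical line format with ASCII-named keywords (KEYWORDS and parse_directives_text are made up for illustration). The bytes are decoded with an encoding the caller must supply before any keyword matching happens; that extra parameter, and the question of where its value comes from, is the cost.

```python
# Hypothetical keyword set for a made-up line-oriented format.
KEYWORDS = {"title", "include"}


def parse_directives_text(data: bytes, encoding: str):
    """Decode first, then match keywords on text rather than raw bytes."""
    directives = []
    for line in data.decode(encoding).splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword in KEYWORDS:
            directives.append((keyword, rest))
    return directives


sample = "title Café\ninclude x.conf\n"
for enc in ("utf-8", "utf-16", "cp500"):
    # Works for any encoding, but only because we were told which one it is.
    print(enc, parse_directives_text(sample.encode(enc), enc))
```

Every encoding now gives the same result, including ones that aren't ASCII supersets, but the program can no longer treat the rest of each line as opaque bytes; if it writes anything back out, it has to pick an output encoding too.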
(On the other hand it's still very tempting to be arrogant this way, for reasons that boil down to 'I can get away with it because the environments where it matters are probably quite rare, and it's much easier'.)