Getting an Ubuntu 12.04 machine to give you boot messages

July 16, 2012

As part of a slow move towards Ubuntu 12.04, we recently worked on the problem that our 12.04 servers were pretty much not showing boot messages and in particular they weren't showing any kernel messages. Not showing boot messages is a big issue for servers because if anything ever stalls or goes wrong in the boot process you wind up basically up the creek without boot messages; you have a hung server and no clue what's wrong.

(Since I've gone through this with a 12.04 server that was hanging during boot, I can tell you that various bits of magic SysRq are basically no help these days.)

The main changes are to /etc/default/grub, which magically controls the behavior of Grub2. We made two changes there:

  • change GRUB_CMDLINE_LINUX_DEFAULT to delete 'quiet splash'. On 12.04 servers without a serial console, we leave this blank.
  • uncomment the 'GRUB_TERMINAL=console' line. Without this change the console stays blank for a while and only the later boot messages show.

    (I don't understand why this is necessary; my best understanding of the Grub2 documentation is that 'console' should be the default.)

We've also changed GRUB_TIMEOUT to 5 (seconds) and commented out GRUB_HIDDEN_TIMEOUT and GRUB_HIDDEN_TIMEOUT_QUIET. This causes the Grub2 menu to always show for five seconds, which I find much more useful than the default behavior of having to hold down Shift at exactly the right time in order to get the menu to show.

(I understand why a desktop install wants to hide the Grub menu by default, but this is the wrong behavior for a server.)

Remember that after you change /etc/default/grub you have to run update-grub to get the change to take. Forgetting this step can make you very puzzled and frustrated during testing (I speak from sad experience).
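As a rough sketch, the /etc/default/grub edits described above could be scripted along the following lines. This assumes the stock 12.04 layout of the file (one setting per line, with GRUB_TERMINAL=console present but commented out) and a sed that accepts -i with a backup suffix. The function name verbose_grub_defaults is made up for illustration; it is not something Ubuntu or Grub2 ships.

```shell
#!/bin/sh
# verbose_grub_defaults [FILE]
# Hypothetical helper: apply the settings discussed above to a grub
# defaults file (default: /etc/default/grub). A copy of the original
# is left next to it as FILE.bak.
verbose_grub_defaults() {
    file="${1:-/etc/default/grub}"
    sed -i.bak \
        -e 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT=""/' \
        -e 's/^#[[:space:]]*GRUB_TERMINAL=console/GRUB_TERMINAL=console/' \
        -e 's/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=5/' \
        -e 's/^\(GRUB_HIDDEN_TIMEOUT\(_QUIET\)\{0,1\}=\)/#\1/' \
        "$file"
}

# Example (as root); left commented out so that sourcing this file
# changes nothing on its own:
#   verbose_grub_defaults /etc/default/grub && update-grub
```

Running update-grub straight after the edit, as in the commented example, is one way to avoid forgetting it. Trying the function on a scratch copy of the file first is probably wise.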

(This is where I could insert a rant about the huge mess of complexity that is Grub2. I do not consider having a programming language for Grub menus to exactly be progress, especially not when they become opaque and have to be machine generated.)

The remaining change is to /etc/init/tty1.conf. By default the virtual console logins clear the screen when they start; on tty1, this has the effect of erasing the last screen's worth of boot-time messages. To tell getty not to do this, we add --noclear to the exec line:

exec /sbin/getty --noclear -8 38400 tty1
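If you want to make this change by script rather than by hand, something like the following sketch might do. It assumes that the exec line in the stock file begins with 'exec /sbin/getty ' and that your sed accepts -i with a backup suffix. The function name getty_noclear is invented for this example.

```shell
#!/bin/sh
# getty_noclear [FILE]
# Hypothetical helper: add --noclear to the getty exec line of an
# Upstart tty job file (default: /etc/init/tty1.conf). Lines that
# already mention --noclear are left alone, so re-running is harmless.
getty_noclear() {
    conf="${1:-/etc/init/tty1.conf}"
    sed -i.bak \
        -e '/--noclear/!s|^exec /sbin/getty |exec /sbin/getty --noclear |' \
        "$conf"
}

# Example (as root), commented out so sourcing this file changes nothing:
#   getty_noclear /etc/init/tty1.conf
```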

Unfortunately the result of all of these changes isn't exactly perfect. We get kernel messages and now avoid wiping out whatever messages Upstart prints about starting user-level servers, but the 12.04 Upstart configuration doesn't print very many messages about that. I believe that only the remaining /etc/init.d scripts really produce boot time messages and there are an ever decreasing number of them; native /etc/init things don't seem to print many messages, if any.

(There are ways to coax Upstart into logging messages about services, but I haven't found one that causes it to print 'starting <blah>' and 'done starting <blah>' on the console during boot.)

Things that don't work to produce more verbose boot messages

I've experimented with a number of options and arguments that seem like they should help but in practice don't. All of these are supplied on the kernel command line:

  • debug=vc (from the initramfs-tools manpage): This prints relatively verbose debugging information from the /init script in the initial ramdisk. Unfortunately our problems have always been after this point, once the initial ramdisk had handed things over to the real Upstart init.

    (It is useful to verify that the Upstart init is being started with your debugging options, though.)

  • --verbose (from the upstart manpage): In theory this makes Upstart be verbose. In practice, I haven't been able to get this to print useful messages to the console so that you can see what services are being started when (so you can, say, identify which service is causing your boot to hang).

  • '--default-console output' (from the upstart manpage combined with init(5)): My memory is that this dumps output (if any) from the actual commands being run to the console but still doesn't tell you which services are starting. If the problem command is hanging silently, you're no better off than before.

(For reasons kind of described in my entry on the kernel command line, --default-console can't be written with an = in the way that the upstart manpage shows it. Fortunately Upstart uses standard GNU argument processing so we can write it with a space instead.)
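When experimenting with options like these, it can help to confirm after boot that they actually made it to the kernel command line and to init. Here is a minimal sketch, assuming a Linux /proc; the function name has_boot_arg is made up for illustration. It treats both spaces and NUL bytes as separators so that it can read either /proc/cmdline or /proc/1/cmdline.

```shell
#!/bin/sh
# has_boot_arg ARG [FILE]
# Hypothetical helper: succeed if ARG appears as a complete argument
# in FILE (default: /proc/cmdline). /proc/cmdline is space separated
# and /proc/1/cmdline is NUL separated, so both separators are turned
# into newlines before an exact, fixed-string line match.
has_boot_arg() {
    arg="$1"
    file="${2:-/proc/cmdline}"
    tr ' \000' '\n\n' < "$file" | grep -qxF -- "$arg"
}

# Examples, commented out so sourcing this file does nothing:
#   has_boot_arg debug=vc && echo "kernel was booted with debug=vc"
#   has_boot_arg --verbose /proc/1/cmdline && echo "init got --verbose"
```

Because the match is on whole arguments, the space-separated form '--default-console output' would show up as two separate arguments to check for.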

Sidebar: what caused our Ubuntu 12.04 machines to hang on boot

It turns out that our 12.04 servers will stall during boot if a filesystem listed in /etc/fstab is not present. This happens even if the filesystem is marked noauto. It's possible that this stall eventually times out; if this is the case, the timeout duration is much longer than we're willing to wait for.

As best as I can determine, this behavior is not directly caused by anything in /etc/init and thus is not easy for us to change.

No, we are not happy about this. This might be vaguely excusable for regular filesystems; it's inexcusable for noauto filesystems.
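Since we can't easily change the behavior, one defensive option is to check ahead of a reboot for /etc/fstab entries whose devices are missing. The following is only a sketch under some assumptions: it looks only at entries written as /dev/... paths, UUID=... or LABEL=..., and it takes 'present' to mean that the corresponding node exists under /dev. The function name missing_fstab_devices is invented for this example, and the check is not a statement about what the boot process itself tests.

```shell
#!/bin/sh
# missing_fstab_devices [FILE]
# Hypothetical helper: print "spec mountpoint" for each fstab entry
# (default file: /etc/fstab) whose device node does not exist.
# Comments, blank lines, and entries such as proc or NFS mounts
# are skipped.
missing_fstab_devices() {
    fstab="${1:-/etc/fstab}"
    while read -r spec mountpoint _rest; do
        case "$spec" in
            ''|'#'*) continue ;;
            UUID=*)  dev="/dev/disk/by-uuid/${spec#UUID=}" ;;
            LABEL=*) dev="/dev/disk/by-label/${spec#LABEL=}" ;;
            /dev/*)  dev="$spec" ;;
            *)       continue ;;
        esac
        [ -e "$dev" ] || printf '%s %s\n' "$spec" "$mountpoint"
    done < "$fstab"
}

# Example, commented out so sourcing this file does nothing:
#   missing_fstab_devices /etc/fstab
```

Any line it prints is a candidate for commenting out before the next reboot, noauto or not.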

Written on 16 July 2012.

Last modified: Mon Jul 16 22:36:23 2012