I hate hardware (Dell 2950 edition)

February 16, 2007

The Dell 2950 is a decent 2U server that has recently become popular around here. Like most recent servers, it has dispensed with PS/2 connectors and only has USB for the keyboard and mouse. It has four USB connectors, two on the front and two on the back.

If you plug your keyboard into the back USB connectors, the Ubuntu 6.06.1 x86_64 server kernel hangs with 'BUG: soft lockup detected on CPU#0!'. (Sometimes it recovers from this hang, sometimes not.)

If you plug your keyboard into the front USB connectors, everything works great.

I love hardware. I really do.

(Don't ask how long it took to find out just why our 2950s were locking up when we tried to install Ubuntu, or how close we came to pitching Ubuntu out a window, as other distributions had kernels that worked fine.)

Sidebar: where this error message comes from

This message comes from kernel/softlockup.c and gets generated if the CPU's kernel watchdog thread hasn't run in ten seconds. (Kernel watchdog threads are the 'watchdog/<N>' processes in ps output.)

This means that some bit of code has locked out scheduling for those ten seconds. Usually this means both that the code has a bug and that it has run into some sort of hardware problem it wasn't expecting, since people writing kernel code rarely knowingly allow ten second stalls.
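To make the mechanism described above concrete, here is a minimal Python simulation of the idea. It is not the kernel's code: the class and method names are hypothetical, time is plain seconds rather than jiffies, and there is only one CPU. The key point it shows is that the check runs from the timer tick, which keeps firing during a stall, while the timestamp is only refreshed when the watchdog thread actually gets scheduled.

```python
from typing import Optional

# Threshold taken from the post: complain if the watchdog thread hasn't run in ten seconds.
SOFTLOCKUP_THRESHOLD = 10.0  # seconds


class SoftLockupDetector:
    """Toy per-CPU soft lockup check (a hypothetical simulation, not kernel code)."""

    def __init__(self, cpu: int, now: float) -> None:
        self.cpu = cpu
        self.touch_timestamp = now  # last time the watchdog thread ran
        self.reported = False       # this sketch reports each stall only once

    def watchdog_thread_ran(self, now: float) -> None:
        # The watchdog thread only runs if scheduling works, so each run
        # proves the CPU is not locked up and resets the timestamp.
        self.touch_timestamp = now
        self.reported = False

    def timer_tick(self, now: float) -> Optional[str]:
        # Timer interrupts keep arriving even when some code has locked out
        # scheduling, so this check still runs during the stall.
        if self.reported:
            return None
        if now - self.touch_timestamp > SOFTLOCKUP_THRESHOLD:
            self.reported = True
            return f"BUG: soft lockup detected on CPU#{self.cpu}!"
        return None


detector = SoftLockupDetector(cpu=0, now=0.0)
detector.watchdog_thread_ran(1.0)
print(detector.timer_tick(5.0))   # None: the thread ran 4 seconds ago
print(detector.timer_tick(11.5))  # BUG: soft lockup detected on CPU#0!
```

Suppressing repeat warnings until the thread runs again is a choice made for this sketch; the real kernel's reporting details may differ by version.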

An inspection of what stack backtrace I can recover from the system is not particularly revealing as to what driver or bit of hardware might be at fault. It runs more or less:

atkbd_connect → atkbd_activate → i8042_interrupt → ps2_command → ps2_sendbyte → i8042_kbd_write → _spin_unlock_irqrestore → idr_get_new <interrupt>

(Transcribed by hand and accuracy not guaranteed, especially as I have omitted the offsets, which means that the code might actually be in a nearby static function or something. The Ubuntu 6.06.1 server install kernel is some version of 2.6.15, and there have been fixes to lib/idr.c since then.)


Comments on this page:

From 67.181.30.74 at 2007-02-16 22:56:03:

Most likely, the BIOS is the culprit. These days the i8042 is often emulated on top of USB, and this is a very hefty chunk of SMM BIOS code. If it holds your CPU for extended periods, the soft lockup watchdog can fire.

-- Pete

By cks at 2007-02-17 01:26:18:

If the real culprit is the BIOS, I wonder what the other kernels did differently to make it work. I suppose it might be as simple as poking the 'i8042 controller' in a slightly different way.

(The whole SMM BIOS stuff makes me twitch even from what little I know about it.)

By cks at 2007-03-07 11:25:55:

For more amusement, it turns out that Dell PowerEdge 860s have the reverse problem: the front USB ports cause the Ubuntu install to crash, but the rear USB ports are fine.

Written on 16 February 2007.

Last modified: Fri Feb 16 18:51:10 2007