An interesting hardware mystery

April 6, 2009

As I've written about, I have a problem machine. I recently had the chance to stop using it as my office workstation, which meant I could finally do some systematic testing on it. That testing has turned up some interesting yet mysterious results.

(It's remarkable, but perhaps not surprising, how much my mood has improved from getting a stable primary workstation and setting up these tests.)

The short summary is that the machine reliably crashes if I do significant disk activity while nothing is running that burns up CPU. (To be precise, the only CPU burner I have tested is the distributed.net client.)

The machine survives a whole bunch of tests; so far I have tried memtest86+, simply leaving it sitting idle, the distributed.net client ('dnetc'), dnetc plus continuous full speed bidirectional network traffic, dnetc plus lots of NFS activity, and dnetc plus repeatedly running bonnie++ and compiling the kernel. However, running just bonnie++ (with or without compiling the kernel) will kill the machine in short order. The most striking test I have done is to start dnetc, start bonnie++ and the kernel compile cycle, and then after a while kill the dnetc processes; the machine consistently panics within minutes.

(Since I normally ran the distributed.net client on this machine but stopped after the Fedora 10 upgrade, I may have had this hardware problem for quite some time without realizing it.)
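One way to see how clean (and how odd) the pattern is: tabulate the tests above against the two factors that seem to matter and check a simple hypothesis against them. This is a minimal Python sketch, not something I actually ran; the 'disk-heavy' and 'CPU burner' labels are my own simplification, the NFS test is left out because it isn't clear how much local disk I/O it involved, and memtest86+ is omitted because it runs outside the operating system.

```python
# Outcomes transcribed from the tests described above.
# Each entry: (test, disk_heavy, cpu_burner_running, machine_crashed).
# The two boolean "factors" are a simplifying assumption for illustration.
OBSERVATIONS = [
    ("idle", False, False, False),
    ("dnetc", False, True, False),
    ("dnetc + full-speed network traffic", False, True, False),
    ("dnetc + bonnie++ + kernel compiles", True, True, False),
    ("bonnie++ alone", True, False, True),
    ("bonnie++ + kernel compiles", True, False, True),
    ("bonnie++ + kernel compiles, dnetc killed partway", True, False, True),
]


def predicted_crash(disk_heavy: bool, cpu_burner: bool) -> bool:
    """Working hypothesis: a crash needs disk load with no CPU burner."""
    return disk_heavy and not cpu_burner


def contradictions(observations):
    """Return the names of tests whose outcome the hypothesis gets wrong."""
    return [name for name, disk, cpu, crashed in observations
            if predicted_crash(disk, cpu) != crashed]


if __name__ == "__main__":
    bad = contradictions(OBSERVATIONS)
    print("hypothesis fits every test" if not bad else f"contradicted by: {bad}")
```

The hypothesis fits every test in the table, which is exactly what makes it frustrating: it says when the machine dies, but not which component is at fault.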

All of this adds up to a puzzle: what bit of hardware is broken and needs to be replaced? If the failure mode were simpler there would probably be a clear suspect, but as it is I'm left scratching my head.

Sidebar: hardware details

The machine has an Asus M2N4-SLI motherboard, 2 GB of RAM, and, I believe, an AMD X2 4600+. It currently has a single SATA drive, but had two earlier (and those two drives are fine; they are running in my current office workstation). The graphics card is probably irrelevant, since the crashes have happened with both an ATI X300 and now an nVidia card of some description (running the open source drivers).

Written on 06 April 2009.
Last modified: Mon Apr 6 01:12:29 2009