Sometimes, firmware updates can be a good thing to do

February 17, 2017

There are probably places that routinely apply firmware updates to every piece of hardware they have. Oh, sure, with a delay and in stages (rushing into new firmware is foolish), but it's always in the schedule. We are not such a place. We have a long history of trying to do as few firmware updates as possible, for the usual reason; usually we don't even consider it unless we can identify a specific issue we're having that new firmware (theoretically) fixes. And if we're having hardware problems, 'update the firmware in the hope that it will fix things' is usually last on our list of troubleshooting steps; we tacitly consider it down around the level of 'maybe rebooting will fix things'.

I mentioned the other day that we've inherited a 16-drive machine with a 3ware controller care. As far as we know, this machine worked fine for the previous owners in a hardware (controller) RAID-6 configuration across all the drives, but we've had real problems getting it stable for us in a JBOD configuration (we much prefer to use software RAID; among other things, we already know how to monitor and manage that with Ubuntu tools). We had system lockups, problems installing Ubuntu, and under load such as trying to scan a 14-disk RAID-6 array, the system would periodically report errors such as:

sd 2:0:0:0: WARNING: (0x06:0x002C): Command (0x2a) timed out, resetting card.

(This isn't even for a disk in the RAID-6 array; sd 2:0:0:0 is one of the mirrored system disks.)

Some Internet searches turned up people saying 'upgrade the firmware'. That felt like a stab in the dark to me, especially if the system had been working okay for the previous owners, but I was getting annoyed with the hardware and the latest firmware release notes did talk about some other things we might want (like support for disks over 2 TB). So I figured out how to do a firmware update and applied the 'latest' firmware (which for our controller dates from 2012).

(Unsurprisingly the controller's original firmware was significantly out of date.)

I can't say that the firmware update has definitely fixed our problems with the controller, but the omens are good so far. I've been hammering on the system for more than 12 hours without a single problem report or hiccup, which is far better than it ever managed before, and some things that had been problems before seem to work fine now.

All of this goes to show that sometimes my reflexive caution about firmware updates is misplaced. I don't think I'm ready to apply all available firmware updates before something goes into production, not even long-standing ones, but I'm certainly now more ready to consider them than I was before (in cases where there's no clear reason to do so). Perhaps I should be willing to consider firmware updates as a reasonably early troubleshooting step if I'm dealing with otherwise mysterious failures.

Written on 17 February 2017.
« Waiting for a specific wall-clock time in Unix
robots.txt is a hint and a social contract between sites and web spiders »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Feb 17 01:27:00 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.