Wandering Thoughts archives

2022-02-02

Disk drives can have weird SMART values for their power on hours

Disk drive SMART attributes famously have meanings that are relatively drive and maker specific, which can make for fun misinterpretations if you don't have an up to date database of what they mean. However, some SMART attributes are old enough and widely used enough that their meaning is pretty well established. One of those is SMART attribute ID 9, "Power on hours" (sometimes reported in more precise units). For reasons beyond the scope of this entry I was recently looking at this attribute across our fleet's disks, and stumbled over something.

# smartctl -A /dev/sda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[...]
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always       -       931146h+18m+05.460s

This made me go "what", because if you translate that to more useful time units it's over a hundred years.
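As a quick check of that arithmetic, here is a minimal sketch of the conversion. It assumes an average year of 365.25 days; the helper name is made up for illustration.

```python
# Average hours in a year, assuming 365.25 days per year.
HOURS_PER_YEAR = 24 * 365.25


def hours_to_years(hours):
    """Convert a power on hours count into (approximate) years."""
    return hours / HOURS_PER_YEAR


# The hours part of the RAW_VALUE that smartctl printed above.
reported_hours = 931146
years = hours_to_years(reported_hours)
print(f"{reported_hours} hours is about {years:.1f} years")
```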

We have five drives that report these extremely high numbers; all of them are Intel 520 series SSDs, in several capacities ranging from 60 GB to 240 GB. As far as I can tell, we don't have any Intel 520 series SSDs that have a sensible value for this attribute, or at least none where smartctl reports a sensible value.

One possible explanation is that smartctl has the wrong data on what units this attribute value is in and how to decode it. The smartctl drive database lists the attribute's format as 'msec24hour32' for this drive series. Reported as a raw 48-bit hex value it is 0x4f98000e354a (interpreted in little endian byte order if I'm reading the smartctl manpage correctly), or '79 152 0 14 53 74' in the 'raw8' format of base-10 bytes.
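To make the decoding concrete, here is a rough sketch of pulling the hours field out of those raw bytes. It assumes that the 'raw8' bytes are listed most significant byte first and that 'msec24hour32' keeps the hours in the low 32 bits, which is what the format name suggests. The function name is hypothetical and this is not smartctl's actual code; in particular it makes no attempt to reproduce the milliseconds part.

```python
def decode_hours_from_raw8(raw_bytes):
    """Split six raw8 bytes into (48-bit value, hours, leftover upper bits).

    Assumption: raw_bytes is ordered most significant byte first, and the
    low 32 bits of the combined value are the hours counter.
    """
    value = 0
    for byte in raw_bytes:
        value = (value << 8) | byte
    hours = value & 0xFFFFFFFF   # low 32 bits: hours
    upper = value >> 32          # remaining bits, used for the msec part
    return value, hours, upper


# The raw8 bytes quoted in the text.
raw8 = [79, 152, 0, 14, 53, 74]
value, hours, upper = decode_hours_from_raw8(raw8)
print(f"hours field: {hours}")
```

Under those assumptions the hours field comes out as the same 931146 that smartctl printed, so if something is being misinterpreted it is more likely to be the units than the byte shuffling.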

We also have a number of drives that report low power on hours values even though I know they've been in service for much longer than the reported value. This happens with a number of different drive models, including at least one HD, and most drives of each model have plausible values; only a few have weird ones. My obvious guess is that the attribute has rolled over in the drive's firmware in some way, especially as other SMART attributes (such as 'LBAs written') have plausibly large values. This doesn't seem to be a matter of smartctl mis-interpreting the raw data; one drive that claims an implausible 56 power on hours also has raw byte values for this attribute of '0 0 0 0 0 56' (base 10, in smartctl's 'raw8' output format).
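One way to play with the rollover guess is to ask what the real value would be if the counter had wrapped some number of times. The sketch below is purely hypothetical: I don't know how wide any drive's internal counter is, so the 16-bit width is an assumption picked for illustration, and the function name is made up.

```python
# Average hours in a year, assuming 365.25 days per year.
HOURS_PER_YEAR = 24 * 365.25


def rollover_candidates(reported_hours, counter_bits, max_wraps=2):
    """List possible true hour counts if a counter_bits-wide counter wrapped.

    counter_bits is a guess, not something the drive reports.
    """
    period = 1 << counter_bits
    return [reported_hours + wraps * period for wraps in range(max_wraps + 1)]


# The drive that claims 56 power on hours, under a hypothetical 16-bit counter.
for hours in rollover_candidates(56, counter_bits=16):
    print(f"{hours} hours = {hours / HOURS_PER_YEAR:.1f} years")
```

If one of the candidates lines up with how long you know the drive has been in service, that is at least weak evidence for a wrap at that width.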

All of this is yet another lesson that various SMART things are not necessarily as useful as you might hope (like SMART Threshold numbers). We clearly can't use SMART's 'power on hours' attribute to find all of our very old drives or to confidently identify very new ones. Either we need to use SMART attributes that are less regular across drive models (like the various attributes for drive read and write volumes), or we need to use additional contextual knowledge such as 'this machine has been there for years without a drive change', or probably both.
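As a sketch of combining the attribute with contextual knowledge, something like the following could flag values that can't be right. Everything here is an assumption for illustration: the function name, the 15 year upper limit, and the idea that a drive in a server has been powered on for at least half of its time in service.

```python
from datetime import datetime

# Average hours in a year, assuming 365.25 days per year.
HOURS_PER_YEAR = 24 * 365.25


def classify_power_on_hours(reported_hours, in_service_since, now,
                            oldest_plausible_years=15, min_duty=0.5):
    """Return 'too_high', 'too_low', or 'plausible' for a reported value.

    in_service_since is when we know the drive went into the machine;
    oldest_plausible_years and min_duty are arbitrary illustrative limits.
    """
    if reported_hours > oldest_plausible_years * HOURS_PER_YEAR:
        return "too_high"
    in_service_hours = (now - in_service_since).total_seconds() / 3600
    if reported_hours < min_duty * in_service_hours:
        return "too_low"
    return "plausible"


now = datetime(2022, 2, 2)
since = datetime(2018, 1, 1)  # hypothetical install date
for hours in (931146, 56, 30000):
    print(hours, classify_power_on_hours(hours, since, now))
```

This only catches values that contradict what we already know; a drive with a rolled-over counter and no known install date would still look like a new drive.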

tech/SMARTWeirdPowerOnHours written at 23:08:37

