Disk drives can have weird SMART values for their power on hours
Disk drive SMART attributes famously have meanings that are relatively drive and maker specific, which can make for fun misinterpretations if you don't have an up to date database of what they mean. However, some SMART attributes are old enough and widely used enough that their meaning is pretty well established. One of those is SMART attribute ID 9, "Power on hours" (sometimes it has more precise units). For reasons beyond the scope of this entry I was recently looking at this attribute across our fleet's disks, and I stumbled over something.
# smartctl -A /dev/sda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[...]
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always       -       931146h+18m+05.460s
This made me go "what", because if you translate that to more useful time units it's over a hundred years.
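For the record, the conversion is simple arithmetic. Here is a quick sketch of it in Python; the 365.25-day year is just a convenient convention picked for illustration, not anything smartctl uses.

```python
# Convert the reported power on hours into years.
# HOURS_PER_YEAR uses a 365.25-day year (an assumption for illustration).
HOURS_PER_YEAR = 24 * 365.25

reported_hours = 931146  # the hours part of smartctl's 931146h+18m+05.460s
years = reported_hours / HOURS_PER_YEAR

print(f"{reported_hours} hours is about {years:.1f} years")
```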
We have five drives that report these extremely high numbers; all of them are Intel 520 series SSDs, although in several capacities ranging from 60 GB to 240 GB. As far as I can tell, we don't have any Intel 520 series SSDs that have sensible values for this attribute, or at least none where smartctl reports a sensible value.
One possible explanation is that smartctl has the wrong data on what units this attribute value is in and how to decode it. The smartctl drive database lists this attribute as 'msec24hour32' for this drive series, and having smartctl report it as a raw 48-bit hex value gives 0xf82e000e354a (interpreted in little endian byte order if I'm reading the smartctl manpage correctly, or '79 152 0 14 53 74' in the 'raw8' format of base-10 bytes).
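To see where the hours figure comes from, here is a minimal Python sketch of how I understand 'msec24hour32' to be decoded: the low 32 bits are hours and the bits above them are milliseconds. That split is my reading of the format name and of smartctl's behavior, not something taken from the drive's documentation, and the function name is made up for illustration. The raw8 bytes are assumed to be listed most significant first, which is what makes them line up with the reported hours.

```python
def decode_msec24hour32(raw):
    """Split a raw value into (hours, milliseconds).

    Assumes the low 32 bits are hours and everything above is
    milliseconds, which is how I understand smartctl's
    'msec24hour32' format (an assumption, not vendor documentation).
    """
    hours = raw & 0xFFFFFFFF
    msec = raw >> 32
    return hours, msec

# The raw 48-bit hex value quoted above.
raw_hex = 0xF82E000E354A

# The 'raw8' bytes quoted above, assumed most significant byte first.
raw8 = [79, 152, 0, 14, 53, 74]
raw_from_bytes = int.from_bytes(bytes(raw8), "big")

print(decode_msec24hour32(raw_hex))         # (931146, 63534)
print(decode_msec24hour32(raw_from_bytes))  # (931146, 20376)
```

Both raw forms give the same 931146 in the low 32 bits, matching what smartctl prints. The upper bits differ between the two, which would fit the milliseconds part changing between readings, but that is a guess.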
We also have a number of drives that report low power on hours values even though I know they've been in service for much longer than the reported value. This happens with a number of different drive models, including at least one HD, and most drives of each model have plausible values; only a few have weird ones. My obvious guess is that the attribute has rolled over in the drive's firmware in some way, especially as other SMART attributes (such as 'LBAs written') have plausibly large values. This doesn't seem to be a matter of smartctl mis-interpreting the raw data; one drive that claims an implausible 56 power on hours also has raw byte values for this attribute of '0 0 0 0 0 56' (base 10, in smartctl's 'raw8' output format).
All of this is yet another lesson that various SMART things are not necessarily as useful as you might hope (like SMART Threshold numbers). We clearly can't use SMART's 'power on hours' attribute to find all of our very old drives or to confidently identify very new ones. Either we need to use SMART attributes that are less regular across drive models (like the various attributes for drive read and write volumes), or we need to use additional contextual knowledge such as 'this machine has been there for years without a drive change', or probably both.
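As a rough illustration of combining the SMART value with contextual knowledge, here is a hedged Python sketch of a sanity check. Everything in it is hypothetical: the function name, the 20 year ceiling, the 50% duty cycle floor, and the in-service dates in the usage lines are invented for the example and would need tuning against a real fleet.

```python
from datetime import datetime

HOURS_PER_YEAR = 24 * 365.25

def classify_power_on_hours(reported_hours, in_service_since, now,
                            max_age_years=20, min_duty=0.5):
    """Classify a reported power on hours value.

    in_service_since is when we know the drive went into the machine
    (contextual knowledge, not SMART data). The thresholds are
    hypothetical defaults, not established limits.
    """
    # More hours than any drive we could plausibly own has existed for.
    if reported_hours > max_age_years * HOURS_PER_YEAR:
        return "implausibly high"
    # Far fewer hours than the time we know it has spent in service.
    known_hours = (now - in_service_since).total_seconds() / 3600
    if reported_hours < min_duty * known_hours:
        return "implausibly low"
    return "plausible"

now = datetime(2024, 1, 1)  # fixed date so the example is repeatable
print(classify_power_on_hours(931146, datetime(2015, 1, 1), now))
print(classify_power_on_hours(56, datetime(2019, 1, 1), now))
print(classify_power_on_hours(40000, datetime(2019, 1, 1), now))
```

This only catches values that disagree with what we already know about a machine; a rolled-over counter on a drive with no recorded history would still look fine.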