One reason to not trust SMART attribute data for consumer drives

January 17, 2021

In theory, disk drive SMART attributes should give us valuable information on how our disk drives are doing, how many problems they've already experienced, and how likely they are to fail (or how close they are to failure). In practice, there has always been a significant view among sysadmins (and other people) that consumer drives understate SMART attributes or flat out lie, and their failure data is often not trustworthy (although sometimes it's actually useful).

(More neutral and informational SMART attributes, like the drive temperature and the number of power on hours and power failures, are more commonly seen as trustworthy.)

On the one hand, this seems wrong and perhaps crazy, since the whole purpose of SMART attributes is to provide information on the health of the drive. On the other hand, it's not too hard to see pressures pushing consumer drive vendors in this direction. The reality of life is that a certain number of people who buy their drives will look at the SMART data, see something alarming, and decide to try to return the drive as 'failing' or 'failed'. The more honest that drives are about failure data, the more such people there will be. Even if drive vendors don't accept the returns, merely dealing with them consumes people's time and thus runs up your customer support expenses. It also causes your customers to be unhappy with you, since you're refusing to replace drives that the customer thinks are 'bad'.

(It's no good to say that only sophisticated buyers will look at SMART data, because the reality of life is that any number of helpful people are going to make 'analyze your drive's health through its SMART attributes' applications. These people will have varied views of what is alarming in SMART attributes, which guarantees that some of the programs will be unhappy about drives that the drive vendor considers at least 'not something we'll replace'.)

My perception is that this is more likely to happen with consumer drives, which are bought by a wide variety of people and usually in small quantities by each one, than with 'enterprise' drives, which tend to at least cost more and are often bought in larger quantities by the purchasers. On the other hand, large organizations with a lot of (enterprise) drives are perhaps more likely to keep a close eye on their drives, develop predictive models, and replace them before they fail outright, including wanting to return them to the drive maker for free replacements.

This sort of pressure is not really present for SMART attributes that are more neutral, such as drive temperature or power on hours, or that are explicitly excluded from warranty replacement, like the amount of data written to the drive. You may want to replace your SSD when the SMART attribute for that gets high, but the drive vendor will not give you a free replacement for it, the way they would if the drive was 'bad' (and still within warranty).
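(As a rough illustration of this split, here is a minimal Python sketch that sorts attributes from a 'smartctl -A' style table into neutral ones, wear ones that are excluded from warranty replacement, and failure-related ones. The attribute names are ones that smartmontools commonly reports, but which names a given drive uses varies by vendor; the grouping, the function names, and the sample table are all my assumptions for illustration, not any sort of standard.)

```python
# Hypothetical grouping of SMART attribute names; vendors differ, so treat
# these sets as illustrative assumptions rather than a definitive list.
NEUTRAL = {"Temperature_Celsius", "Power_On_Hours", "Power_Cycle_Count"}
WEAR = {"Total_LBAs_Written"}
FAILURE = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable"}

# Made-up sample in the column layout that `smartctl -A` prints for ATA drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       123456789
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value_string} from a smartctl -A style table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # this skips the header line and any surrounding text.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs


def classify(name):
    """Say which kind of attribute this is, in the sense used in this entry."""
    if name in NEUTRAL:
        return "neutral"
    if name in WEAR:
        return "wear"
    if name in FAILURE:
        return "failure"
    return "unknown"


def group_by_kind(attrs):
    """Group parsed attributes as {kind: {attribute_name: raw_value_string}}."""
    groups = {}
    for name, raw in attrs.items():
        groups.setdefault(classify(name), {})[name] = raw
    return groups


if __name__ == "__main__":
    for kind, members in sorted(group_by_kind(parse_attributes(SAMPLE)).items()):
        print(kind, members)
```

(Only the 'failure' group is where a drive vendor has an incentive to be optimistic. The other two groups are either neutral or explicitly not the vendor's problem.)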

Written on 17 January 2021.
