Chris's Wiki :: blog/tech/SSDDeathDisturbing Comments
https://utcc.utoronto.ca/~cks/space/blog/tech/SSDDeathDisturbing?atomcomments
Recent comments in Chris's Wiki :: blog/tech/SSDDeathDisturbing. (DWiki, updated 2018-12-12T14:42:42Z)

By A grumpy ex-sysadmin on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>I use the same workaround to make SSDs reliable that I used for mechanical HDs: use software RAID (firmware RAID is opaque because it's not free software), and try to have only one of each make and model of device in each array.</p>
<p>You can't predict defects, but you can reduce the chance of multiple simultaneous failures crashing an important system by not having a storage monoculture.</p>
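One way to act on the "one of each make and model per array" rule is to check an array's members before trusting it. The following is a minimal sketch, not the commenter's tooling. It assumes Linux, whole-disk member names such as `sda` (partitions like `sda1` do not appear directly under `/sys/block`), and that the sysfs `vendor`/`model` strings are a good enough proxy for "make and model". The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def device_model(dev: str, sys_block: Path = Path("/sys/block")) -> str:
    """Return "vendor model" as reported by sysfs for a whole-disk device like "sda"."""
    base = sys_block / dev / "device"
    # Not every device exposes both files (NVMe, for example, usually has no "vendor").
    parts = [
        (base / name).read_text().strip()
        for name in ("vendor", "model")
        if (base / name).is_file()
    ]
    # Devices with nothing readable are grouped under "unknown", which errs on the
    # side of flagging them as possible duplicates.
    return " ".join(p for p in parts if p) or "unknown"


def duplicate_models(members: dict[str, str]) -> dict[str, list[str]]:
    """Given {device: model}, return {model: [devices]} for models used more than once."""
    by_model: dict[str, list[str]] = defaultdict(list)
    for dev, model in members.items():
        by_model[model].append(dev)
    return {model: sorted(devs) for model, devs in by_model.items() if len(devs) > 1}


def check_array(devices: list[str], sys_block: Path = Path("/sys/block")) -> dict[str, list[str]]:
    """Look up each member's model and report any make/model shared by several members."""
    return duplicate_models({dev: device_model(dev, sys_block) for dev in devices})
```

With a hypothetical inventory such as `{"sda": "ExampleCo SSD-A", "sdb": "OtherCo SSD-B", "sdc": "ExampleCo SSD-A"}`, `duplicate_models` reports `{"ExampleCo SSD-A": ["sda", "sdc"]}`. An empty result means no two members share a model string. It does not mean the drives are free of a common defect, such as shared firmware across brands.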
<p>On the topic of opaque SSD issues, Intel's powertop recommends settings that can cause Intel SSDs in laptops with Intel chipsets to hang. The kernel repeatedly resets the SATA link for reasons seemingly unrelated to power management.</p>
<p>Specifically, powertop rates any "link_power_management_policy" setting other than "min_power" as "Bad", yet "min_power" is the setting that causes the problems. In Linux 4.19 the default can be changed in .config; CONFIG_SATA_MOBILE_LPM_POLICY=3, which corresponds to "med_power_with_dipm", is the best setting for my laptops.</p>
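On a kernel that wasn't built with that default, the same policy can be applied at runtime through the `link_power_management_policy` attribute. The following is a minimal sketch under these assumptions:

- libata exposes the attribute as `/sys/class/scsi_host/host*/link_power_management_policy`.
- Writing it requires root.
- The policy list below covers the names discussed here and may not match every kernel version.

The sysfs root is a parameter so the functions can be tried against a scratch directory. The function names are hypothetical.

```python
from pathlib import Path

# Policy names discussed in the comment plus the other two long-standing ones.
# Assumption: not an exhaustive list for every kernel version.
KNOWN_POLICIES = {"max_performance", "medium_power", "med_power_with_dipm", "min_power"}
DEFAULT_ROOT = Path("/sys/class/scsi_host")


def _policy_files(root: Path) -> list[Path]:
    # One attribute per SATA host (host0, host1, ...); non-libata hosts lack it.
    return sorted(root.glob("host*/link_power_management_policy"))


def current_policies(root: Path = DEFAULT_ROOT) -> dict[str, str]:
    """Map each SCSI host name to its current SATA link power management policy."""
    return {f.parent.name: f.read_text().strip() for f in _policy_files(root)}


def set_policy(policy: str, root: Path = DEFAULT_ROOT) -> list[str]:
    """Write `policy` to every host's attribute and return the names of the hosts changed."""
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unknown LPM policy: {policy!r}")
    changed = []
    for f in _policy_files(root):
        f.write_text(policy + "\n")  # equivalent to `echo policy > file`
        changed.append(f.parent.name)
    return changed
```

On a real system, `current_policies()` works as any user, while `set_policy("med_power_with_dipm")` must run as root. A value written this way does not survive a reboot. Letting powertop apply all of its "Good" tunables will presumably switch it back to `min_power`, since that is the setting powertop prefers.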
</div>
Posted 2018-12-12T14:42:42Z

By Jonas (https://github.com/galvez) on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>If I were running a data center, I would invest in electromagnetic shielding. IMHO, solar activity accounts for a lot of electronic damage that gets dismissed as random accidents.</p>
</div>
Posted 2018-12-12T02:20:27Z

By Jonas (https://github.com/galvez) on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>My guess? Go to <a href="https://www.swpc.noaa.gov/">https://www.swpc.noaa.gov/</a> and check the planetary K-index on December 10 :)</p>
</div>
Posted 2018-12-12T02:13:38Z

By Ryan P on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>Apparently SpinRite from www.grc.com can sometimes resurrect SSDs also.</p>
</div>
Posted 2018-12-12T01:34:53Z

By Shawn M. on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>Unfortunately, SSDs are real, physical objects, and they fail. :(</p>
<p>I know that sounds glib, but physical objects fail. CPUs and RAM fail even when they look physically fine; a microscopic investigation MIGHT reveal the cause, but the effort outweighs the knowledge gained.</p>
<p>The value in this event is the reminder of the value and complexity of redundancy: Keep backups. RAID for uptime is a good idea. Source good components.</p>
<p>It sounds like you already do the last one. As an example, early SSDs were even less reliable: I had an early run of OCZ SSDs fail in a handful of consumer desktops and laptops, five over the course of a year.</p>
<p>I learned the same lessons then. And, to be fair, I can still always come up with some external circumstance that would destroy my data. All I can do is back up, fail over, and monitor. (And make those things stupidly easy to do, so I keep doing them!)</p>
</div>
Posted 2018-12-11T21:08:02Z

By Greg A. Woods on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>Once upon a time it seemed to me that the controller on a hard drive was almost more likely to fail than the mechanical bits (once head crashes became less likely). Admittedly, the drive would develop spots where the rust was unreliable for whatever reason, but of course that doesn't cause a full-on failure, just a bad sector. However, a bad capacitor or weak diode somewhere in the digital logic could cause what appeared to be a complete failure of the drive, yet simply replacing the controller board let the drive keep being used without any data loss. I rebuilt more than one MFM drive just by swapping in the controller boards from drives with crashed heads.</p>
<p>Now that the whole drive is electronics through and through, failures are perhaps more apt to look like controller failures, even if, strictly speaking, that is a symptom rather than the actual cause: e.g. the controller locks up because it can no longer talk to one of the memory chips, and the controller firmware author didn't take that particular failure mode into account. Controller firmware shouldn't fail badly, but there's a lot of firmware in all modern drives, spinning rust or solid state, and I wouldn't be surprised if it were buggier now than ever before.</p>
</div>
Posted 2018-12-11T21:00:53Z

By dgilbert (http://blog.daveg.ca/) on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>Reading more... I strongly suspect it's controller failure. A number of cheap spinning drives in my ghetto home array have died suddenly with no SMART warning. Some seem to be unhappy with a SAS controller (rather than SATA; repurposing them <code>seems</code> to work fine)... Others just seem to die.</p>
<p>Bad cold solder joints? Excess heat somewhere bad?</p>
</div>
Posted 2018-12-11T20:34:02Z

By dgilbert (https://blog.daveg.ca/) on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>Congrats on /.</p>
</div>
Posted 2018-12-11T20:30:53Z

By BostonEnginerd (https://www.bostonenginerd.com) on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>If you saw how the sausage was made, you would be a bit horrified. Solid-state drives work by trapping little bits of charge inside silicon nitride / silicon dioxide layers over a transistor. As the devices have scaled smaller and have now gone 3D, manufacturers have had to use all sorts of tricks to make them work: data whitening so there is no local bias in charge, error correction, extra bits, etc. The actual data stored on the drive looks like noise and is extracted through the magic of math.</p>
<p>It's amazing that they work at all!</p>
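The whitening step can be illustrated with a toy scrambler. Each page is XORed with a pseudo-random keystream seeded per page. Even a page of all zeros is then stored as a roughly even mix of 0s and 1s, and XORing again with the same keystream recovers the data. The following is a minimal sketch, not any vendor's scheme. Real controllers use their own, generally undocumented, scramblers and seed derivation, and they layer ECC on top. The 16-bit Galois LFSR (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1) and the seed value are illustrative choices.

```python
def lfsr_keystream(seed: int, n: int) -> bytes:
    """Return n pseudo-random bytes from a 16-bit Galois LFSR (maximal length, period 65535)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero (an all-zero LFSR never leaves zero)")
    out = bytearray()
    for _ in range(n):
        byte = 0
        for _ in range(8):
            bit = state & 1
            byte = (byte << 1) | bit
            state >>= 1
            if bit:
                state ^= 0xB400  # taps 16, 14, 13, 11
        out.append(byte)
    return bytes(out)


def whiten(page: bytes, seed: int) -> bytes:
    """XOR a page with the keystream; applying it twice with the same seed restores the page."""
    keystream = lfsr_keystream(seed, len(page))
    return bytes(p ^ k for p, k in zip(page, keystream))


def ones_fraction(data: bytes) -> float:
    """Fraction of bits set to 1, a rough stand-in for 'charge balance'."""
    return sum(bin(b).count("1") for b in data) / (8 * len(data))


if __name__ == "__main__":
    page = bytes(4096)   # all zeros: the worst case for charge balance
    seed = 0x1D3A        # hypothetical per-page seed (e.g. derived from the page address)
    stored = whiten(page, seed)
    print(f"ones before: {ones_fraction(page):.3f}, after: {ones_fraction(stored):.3f}")
    assert whiten(stored, seed) == page  # reading back: descramble with the same seed
```

Scrambling is only one of the layers mentioned above. It evens out the stored bit pattern but does nothing to correct errors, which is the job of the ECC and the extra bits.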
</div>
Posted 2018-12-11T18:34:33Z

By Chip Overclock (https://chipoverclock.com) on /blog/tech/SSDDeathDisturbing
<div class="wikitext"><p>SSDs are indeed inscrutable. Worse, they are autonomous devices that do things behind the scenes no matter what your drive controller tells (or doesn't tell) them to do. I admit I use plenty of them, in pretty much every laptop, desktop, and server I own. But I'll never entirely trust them. I've been writing about this (and related issues) on my own blog: <a href="https://coverclock.blogspot.com/search/label/Storage">https://coverclock.blogspot.com/search/label/Storage</a>.</p>
</div>
Posted 2018-12-11T18:33:24Z