Wandering Thoughts archives

2015-03-11

The irritation of being told 'everyone who cares uses ECC RAM'

One of the hazards of hanging around ZFS circles is hearing, every so often, that everyone who cares about their data uses ECC RAM and if you don't, you clearly don't care (and should take your problem reports and go away). With Rowhammer in the news, this attitude may get a boost from other sources as well. Like other 'if you really care you'll do X' views, this attitude makes me reflexively angry because, fundamentally, it pretends that the world is a simple single-dimensional place.

The reality is that in the current world, picking ECC RAM on anything except server machines is generally a tradeoff. For this we may primarily blame Intel, who have carefully ensured that only some of their CPUs and motherboard chipsets support ECC. Although the situation is complex, ever-changing, and hard to decode, it appears that you need either server Xeon CPUs or lower-end desktop CPUs; the current and past middle of the road desktop CPU lines (i5 and i7) explicitly do not support ECC. Even with a CPU that supports ECC, you need a chipset and even a motherboard that does, and it's not clear to me what those are and how common they are.

(AMD gets its share of the blame, because apparently not all AMD CPUs, AMD chipsets, and motherboards support it either.)

Eliding a bunch of ranting, the upshot is that deciding you must have ECC is not trivial and will almost certainly force you to give up other valuable things in many cases. You'll probably sacrifice some combination of thermal efficiency, system performance, motherboard and system features, and sheer cost in order to get ECC, at least in the desktop space.

(These complications and tradeoffs are why my current desktop machines do not have ECC, although I would love to have it if I could. In fact I have a whole list of desired desktop motherboard features that are probably all more or less mutually exclusive, because desktop choices are suffering.)

For people to say that ECC should be your most important criterion anyways is, well, arrogance; it assumes that the world turns around the single axis of having (or not having) ECC and anything else is secondary. The real world is much more complex than that, especially given that not using ECC does not make your system aggressively dangerous in practice (even with lots of RAM). It follows that saying people who do not use ECC don't actually really care about their data is abrasively arrogant. It is the kind of remark that gets people to give you the middle finger.

It is a great way to make a lot of bug reports go away, though (and a certain number of people with them).

This applies to pretty much any specific technology, of course. ECC is just the current bugbear (or at least mine).

PS: the corollary to this is that system designs that are actively dangerous or useless without ECC RAM are not broadly useful designs, because plenty of machines do not and will not have ECC RAM any time soon. A 'must have ECC' design is in practice a server-only design, and maybe not even then; I don't know if ECC RAM is now actually mandatory on much or all server hardware so that, eg, our low-end inexpensive Dell 1Us will all have it.

(I'd like it if they all did, but I don't think we even thought about it when selecting the machines. We did specifically make sure to get ECC RAM on our new OmniOS servers, in part because ZFS people keep banging this drum.)

tech/UseECCIrritation written at 00:52:31

