The irritation of being told 'everyone who cares uses ECC RAM'

March 11, 2015

One of the hazards of hanging around ZFS circles is hearing, every so often, that everyone who cares about their data uses ECC RAM and if you don't, you clearly don't care (and should take your problem reports and go away). With Rowhammer in the news, this attitude may get a boost from other sources as well. Like other 'if you really care you'll do X' views, this attitude makes me reflexively angry because, fundamentally, it pretends that the world is a simple single-dimensional place.

The reality is that in the current world, picking ECC RAM on anything except server machines is generally a tradeoff. For this we may primarily blame Intel, who have carefully ensured that only some of their CPUs and motherboard chipsets support ECC. Although the situation is complex, ever-changing, and hard to decode, it appears that you need either server Xeon CPUs or lower-end desktop CPUs; the current and past middle-of-the-road desktop CPU lines (i5 and i7) explicitly do not support ECC. Even with a CPU that supports ECC, you need a chipset and a motherboard that do as well, and it's not clear to me which ones those are or how common they are.

(AMD gets its share of the blame, because apparently not all AMD CPUs, AMD chipsets, and motherboards support it either.)

Eliding a bunch of ranting, the upshot is that deciding you must have ECC is not trivial and will almost certainly force you to give up other valuable things in many cases. You'll probably sacrifice some combination of thermal efficiency, system performance, motherboard and system features, and sheer cost in order to get ECC, at least in the desktop space.

(These complications and tradeoffs are why my current desktop machines do not have ECC, although I would love to have it if I could. In fact I have a whole list of desired desktop motherboard features that are probably all more or less mutually exclusive, because desktop choices are suffering.)

For people to say that ECC should be your most important criterion anyway is, well, arrogance; it assumes that the world turns around the single axis of having (or not having) ECC and anything else is secondary. The real world is much more complex than that, especially given that not using ECC does not make your system aggressively dangerous in practice (even with lots of RAM). It follows that saying people who do not use ECC don't actually really care about their data is abrasively arrogant. It is the kind of remark that gets people to give you the middle finger.

It is a great way to make a lot of bug reports go away, though (and a certain number of people with them).

This applies to pretty much any specific technology, of course. ECC is just the current bugbear (or at least mine).

PS: the corollary to this is that system designs that are actively dangerous or useless without ECC RAM are not broadly useful designs, because plenty of machines do not and will not have ECC RAM any time soon. A 'must have ECC' design is in practice a server-only design, and maybe not even then; I don't know if ECC RAM is now actually mandatory on much or all server hardware, so that, eg, our low-end inexpensive Dell 1Us will all have it.

(I'd like it if they all did, but I don't think we even thought about it when selecting the machines. We did specifically make sure to get ECC RAM on our new OmniOS servers, in part because ZFS people keep banging this drum.)
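If you're in the same position of not being sure whether existing machines have ECC, one rough way to find out on Linux is to look at what the firmware reports. The following is a minimal sketch, not a definitive check: it assumes the dmidecode tool is installed and run as root, and it relies on the 'Error Correction Type' field that dmidecode prints for the physical memory array (DMI type 16). The function names are made up for illustration, and what the firmware claims is not proof that ECC is actually active.

```python
import re
import subprocess


def ecc_types(dmidecode_text):
    """Return every 'Error Correction Type' value found in dmidecode output."""
    pattern = r"^[ \t]*Error Correction Type:[ \t]*(.+)$"
    return [m.group(1).strip()
            for m in re.finditer(pattern, dmidecode_text, re.MULTILINE)]


def reports_ecc(dmidecode_text):
    """True if any memory array reports an ECC type (eg 'Multi-bit ECC').

    Values such as 'None', 'Unknown', or 'Parity' are treated as not ECC.
    """
    return any("ecc" in t.lower() for t in ecc_types(dmidecode_text))


def check_this_machine():
    """Run dmidecode (normally needs root) and report what the firmware claims.

    Assumption: 'dmidecode -t 16' is available on this system; type 16 is
    the Physical Memory Array record.
    """
    out = subprocess.run(["dmidecode", "-t", "16"],
                         capture_output=True, text=True, check=True).stdout
    return reports_ecc(out)
```

Calling check_this_machine() as root returns True or False for the local machine; reports_ecc() can also be fed saved dmidecode output collected from a fleet of servers.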

Comments on this page:

By liam at unc edu at 2015-03-11 11:40:44:

What annoys me is those people who say a FS like ZFS is useless if you don't have ECC memory. That negates all the other features that can be big wins which you can still gain using hardware without ECC support.

Goes along with all the 'you can't run ZFS if you don't have XXGB of memory' camp. I run ZFS on a machine with 2GB - and it runs slow compared to a bigger better brighter machine, but it runs fast enough to do the job I need it to do.

By steve at sk2 dot org at 2015-03-16 05:09:23:

There are a few ECC-supporting equivalents of Core i7s, marketed as Xeon E3s. They all support full VT-x/VT-d, ECC, and before it was disabled, TSX; some of them support graphics too. For example, the E3-1246v3 is equivalent to an i7-4771, the E3-1275Lv3 to an i7-4790T, and the E3-1276v3 to an i7-4790. In some cases they are cheaper than the equivalent i7... There are lower-speed variants (down to 3.1GHz for the E3-1220v3) which effectively end up being similar to i5s but with larger caches.

As you point out the toughest problem is finding a motherboard which properly supports ECC. Theoretically speaking Asus and ASRock produce workstation motherboards with ECC support, but there are reports of bugs in various forums and their support doesn't seem to be all that interested in fixing them. Traditional workstation vendors produce appropriate motherboards (Supermicro, Tyan), but they tend to be quite a bit more expensive than desktop motherboards with similar functionality.

I've got good results with Xeon E3s on Supermicro X10SAE and X10SAT motherboards, but that's just one data point.

Written on 11 March 2015.

Last modified: Wed Mar 11 00:52:31 2015