Decent Whitepaper on ECC Memory

kylef

Golden Member
Jan 25, 2000
1,430
0
0
Here is a fairly thorough whitepaper on ECC memory and why it's useful:
Corsair ECC Whitepaper

However, one thing the paper doesn't mention is the performance penalty of running the Hamming code ECC algorithm. Presumably, much of the checking can be done in parallel with the memory access itself; I'm guessing that if an error is detected, the memory controller could invalidate the read before the CPU actually "uses" it and then throw the appropriate interrupt... But as I said, I'm only guessing.
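For anyone wondering what "running the Hamming code" involves, here is a minimal Python sketch of a textbook extended Hamming (SEC-DED) code over a 64-bit word with 8 check bits, matching the 72-bit width of a typical ECC DIMM. The layout and the names `encode`, `decode` and `syndrome` are hypothetical and not taken from the whitepaper or any chipset; real memory controllers compute the check bits with parallel XOR trees in hardware (often using Hsiao-style codes rather than this exact layout), so the loops here say nothing about actual latency.

```python
# Toy SEC-DED (extended Hamming) code: 64 data bits -> 72-bit codeword.
# Hypothetical illustration; not any particular chipset's bit layout.

CODE_LEN = 72                                # bit 0 = overall parity, 1..71 = Hamming code
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]  # Hamming check bits sit at powers of two
# Every other position in 1..71 carries one data bit (64 positions in total).
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p & (p - 1) != 0]
assert len(DATA_POSITIONS) == 64


def syndrome(bits):
    """XOR together the indices of all set bits in positions 1..71."""
    s = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            s ^= pos
    return s


def encode(data):
    """Spread a 64-bit integer over the codeword and fill in the check bits."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    s = syndrome(bits)
    # Setting check bit p XORs p into the syndrome, so this drives it to zero.
    for p in PARITY_POSITIONS:
        bits[p] = 1 if s & p else 0
    bits[0] = sum(bits[1:]) % 2  # make total parity over all 72 bits even
    return bits


def decode(bits):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    s = syndrome(bits)
    odd = sum(bits) % 2  # 1 means an odd number of bits flipped
    if s == 0 and odd == 0:
        status = "ok"
    elif odd == 1 and s < CODE_LEN:
        # Assume a single-bit error at position s (s == 0: the parity bit itself).
        bits[s] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome (or a syndrome that
        # points past the word): the error is detected but cannot be fixed.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= bits[pos] << i
    return data, status


if __name__ == "__main__":
    word = 0xDEADBEEFCAFEF00D
    cw = encode(word)
    cw[37] ^= 1          # simulate a single-bit soft error
    print(decode(cw))    # single flip is corrected
    cw[50] ^= 1          # a second flip in the same word
    print(decode(cw))    # double flip is only detected
```

The extra overall-parity bit is what turns plain single-error correction into SEC-DED: a single flip gives odd total parity and a syndrome pointing at the bad bit, while a double flip gives even parity with a non-zero syndrome, which can be flagged but not repaired.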

I'm just curious whether anyone knows the ACTUAL performance hit involved, or knows of a link that explains it.

Kyle
 

kylef

I guess no one knows about ECC Performance then :(

Well, I'm off to bed! Have fun tweaking, ladies and gentlemen!

Kyle
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
It is a good paper.

I like this quote:
"Data from varying sources shows that systems with memories on the order of 256 Mbytes will experience single bit soft errors in DRAM at a rate somewhere between once a month and twice a year."

This could be read two different ways: 'the data from these sources indicates that an error will statistically occur sometime between once a month and once every six months', or 'the sources can't agree at all on how often it occurs; some say once a month while others say once every six months'. I think the casual reader would assume the first meaning, but the second interpretation is correct: Micron is the once-in-six-months figure, IBM is the once-a-month figure, and everyone else falls somewhere in between. The article also doesn't mention the effect of altitude on these estimates: increasing altitude to about a mile above sea level (e.g., Denver) increases the odds by 10x.
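To see why the two readings matter, here is a small Python sketch that turns the quoted rates into the chance of seeing at least one error in a month. It assumes errors arrive independently (a Poisson model), which is my simplification rather than anything from the whitepaper; the 10x altitude multiplier is the rough figure from the post above, and the name `prob_at_least_one` is hypothetical.

```python
import math

# Soft-error rates quoted for a ~256 MB system, in errors per year.
# IBM's figure is about once a month; Micron's is about twice a year.
RATES_PER_YEAR = {"IBM": 12.0, "Micron": 2.0}
ALTITUDE_FACTOR = 10.0  # rough multiplier at ~1 mile elevation, as claimed in the post


def prob_at_least_one(rate_per_year, years):
    """Chance of one or more errors, assuming independent (Poisson) arrivals."""
    return 1.0 - math.exp(-rate_per_year * years)


if __name__ == "__main__":
    for source, rate in RATES_PER_YEAR.items():
        for place, factor in (("sea level", 1.0), ("~1 mile up", ALTITUDE_FACTOR)):
            r = rate * factor
            p = prob_at_least_one(r, 1 / 12)
            print(f"{source:7s} {place:11s} {r:6.1f} errors/yr, "
                  f"P(at least one in a month) = {p:.2f}")
```

Under these assumptions the IBM figure gives roughly a 63% chance of an error in any given month at sea level, while the Micron figure gives roughly 15%, so the spread is a disagreement between sources rather than a tight estimate.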

As far as the performance hit goes, there is a thread about this somewhere on here where a user named "ruckb" (a memory design engineer) and I discussed this issue, and a bunch of people ran the SiSoft Sandra memory benchmark with ECC enabled and disabled. I have also participated in threads on other BBSes that looked into this.

The performance penalty occurs in the fairly rare "read-after-modified-write" case: a read from a memory location that was written to in the previous cycle. In that case there is a latency hit of two or three cycles (I can't remember which).

The penalty was approximately 4-5% in the Sandra benchmark, but it translated into a negligible hit in real-world benchmarks (such as 3DMark 2000). Quake 3 seemed the most noticeably affected, with a real-world difference of approximately 2%. That is within the error margin on a single system, but it showed up on multiple systems, so it is likely to be real.

Personally, I think the performance hit is so minor that it's not worth considering, especially compared to the cost and data integrity issues. In other words, cost and the importance of data integrity should be the only reasons to buy ECC or not, because a 1% performance difference in real-world apps is simply too insignificant to matter.
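One way to see why a 4-5% hit in a memory benchmark shrinks to a percent or two in real applications is an Amdahl-style estimate: only the fraction of run time that is actually memory-bound pays the penalty. The Python sketch below assumes the Sandra figure applies uniformly to that memory-bound time and to nothing else, which is a deliberate simplification; the memory-bound fractions are hypothetical, not measured, and the function names are my own.

```python
# Amdahl-style estimate: only the memory-bound share of run time pays the ECC penalty.


def overall_slowdown(memory_penalty, memory_bound_fraction):
    """Relative increase in total run time.

    New time = (1 - f) + f * (1 + p) = 1 + f * p, so the slowdown is f * p.
    """
    return memory_bound_fraction * memory_penalty


def implied_memory_fraction(observed_slowdown, memory_penalty):
    """Memory-bound fraction that would explain an observed real-world slowdown."""
    return observed_slowdown / memory_penalty


if __name__ == "__main__":
    SANDRA_PENALTY = 0.045  # midpoint of the ~4-5% memory-benchmark penalty
    for f in (0.10, 0.25, 0.50):  # hypothetical memory-bound fractions
        print(f"f = {f:.2f}: overall slowdown ~ {overall_slowdown(SANDRA_PENALTY, f):.2%}")
    # A ~2% hit (as reported for Quake 3) would mean roughly 44% memory-bound time.
    print(f"{implied_memory_fraction(0.02, SANDRA_PENALTY):.0%}")
```

Read this way, the Quake 3 result suggests a game that spends a large share of its time waiting on memory, while most applications would land well under 1%, which fits the conclusion that the hit is too small to drive the buying decision.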
 

kylef

Thanks for the info. I'll look for the thread. Hope your headache gets better soon :)

Kyle
 

pm

Oops, I edited the bit about the headache out... :) I figured, who really cares whether or not my head hurts. The thread was some time ago. Search for the word ECC in the title and only look at threads longer than a dozen posts, because this one was in the 20s IIRC.
 

kylef

Here is the thread. I hope the link works... I've never tried to "HTMLize" the results of a search before...

Kyle