Thanks, Sugadaddy. I think that proves it for Pentium IIIs and Thunderbird Athlons: L2 ECC doesn't affect performance at all, so leave it enabled.
Ruckb: << The calculation for right or wrong data should be done in parallel, on the fly, because if you are adding some additional time for every read, I would expect a bigger performance impact. >>
Seph: < The main reason that it is having trouble is usually bandwidth. The Thunderbird has a 64-bit bandwidth to the L2, which is why the P3 is able to gain on it and even surpass it in non-FPU situations at high clock speeds. The Thunderbird simply can't get all the data across like the P3 can with its 128-bit pipe. >
I agree with Ruckb here: you are performing the calculation in parallel. So it's not really a matter of bus bandwidth, because the check happens inside the cache unit; nothing extra is sent on the bus between the core and the L2.
< Well, as I understand it, it is a set bus. If ECC works like parity, using another bit (although not just checking for odds and evens, of course), the bus would have to drop a data bit and replace it with the ECC bit, right? >
No, ECC works similarly to, but not just like, parity. I can post up the 1-bit-corrected, 2-bit-detected ECC algorithm if you like; in fact, there's a sketch of it a little further down.
But anyway, essentially with both parity and ECC you are computing something like a checksum for the bus. Parity can only detect a single-bit error, and when an error is detected you have no idea which bit is hosed. 1-bit-corrected ECC can detect up to 2 bits being wrong and can correct a single bad bit (so if two bits are wrong it pulls a parity exception, but if one bit is wrong it corrects it). The checksum in the case of parity is usually one bit. In the case of ECC, the checksum size varies with the size of the memory chunk. I can't remember the counts for the various sizes (that's what I have books for), but Ruckb is right that it's 8 bits for a 64-bit value, and his guess of 10 bits for a 128-bit value is about right (the textbook minimum for correct-1/detect-2 on 128 bits works out to 9).
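To make the parity half of that concrete, here's a minimal sketch (my own toy, not anyone's shipping hardware): one parity bit over a 64-bit bus value. It flags a single flipped bit but can't say which bit it was, and an even number of flips cancels out and sails right through:

#include <stdint.h>
#include <stdio.h>

/* fold a 64-bit word down to one even-parity bit */
static int parity64(uint64_t x) {
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return (int)(x & 1);
}

int main(void) {
    uint64_t word = 0xCAFEF00Dull;
    int stored = parity64(word);            /* the 1-bit "checksum" */

    uint64_t hit = word ^ (1ull << 17);     /* one flipped bit */
    printf("1 flip:  mismatch=%d\n", parity64(hit) != stored);   /* 1 */

    hit ^= 1ull << 3;                       /* a second flip... */
    printf("2 flips: mismatch=%d\n", parity64(hit) != stored);   /* 0! */
    return 0;
}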
I keep saying "1-bit corrected" ECC because you can have higher levels of correction in ECC. You can add more bits to the checksum and make it "2-bit corrected ECC" (which I think can detect 4 bits being wrong, not 3 like you might expect, but I don't have my reference book in front of me to check this). You can keep adding bits for more and more protection, but nowadays 1-bit corrected ECC is what everyone calls "ECC", because that's all that's necessary.
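And since I offered: here's a sketch of the 1-bit-corrected, 2-bit-detected algorithm (Hamming SEC-DED) over a 64-bit word. I'm writing it from memory, so treat it as illustrative rather than any particular chip's implementation. Note it uses exactly the 8 check bits for 64 data bits I mentioned above: 7 Hamming bits plus 1 overall parity bit.

#include <stdint.h>
#include <stdio.h>

static int parity64(uint64_t x) {
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return (int)(x & 1);
}

/* codeword positions 1..71; powers of two are reserved for check bits */
static int data_pos[64];

static void init_positions(void) {
    int pos = 1;
    for (int b = 0; b < 64; b++) {
        while ((pos & (pos - 1)) == 0) pos++;   /* skip check-bit slots */
        data_pos[b] = pos++;
    }
}

/* each check bit covers the positions with that bit set in their index, so
   the 7 Hamming bits are the XOR of the positions of all set data bits */
static uint8_t hamming_checks(uint64_t data) {
    int c = 0;
    for (int b = 0; b < 64; b++)
        if ((data >> b) & 1)
            c ^= data_pos[b];
    return (uint8_t)c;
}

/* 8 stored check bits: low 7 = Hamming, top bit = overall (DED) parity */
static uint8_t ecc_encode(uint64_t data) {
    uint8_t c = hamming_checks(data);
    int overall = parity64(data) ^ parity64(c);
    return (uint8_t)(c | (overall << 7));
}

/* 0 = clean, 1 = single-bit error corrected, 2 = uncorrectable (2 bits) */
static int ecc_check(uint64_t *data, uint8_t check) {
    int syndrome = hamming_checks(*data) ^ (check & 0x7F);
    int odd = parity64(*data) ^ parity64(check);   /* 1 = parity mismatch */

    if (syndrome == 0)
        return odd ? 1 : 0;     /* odd: the overall parity bit itself flipped */
    if (!odd)
        return 2;               /* two bits hosed: pull the exception */
    for (int b = 0; b < 64; b++)
        if (data_pos[b] == syndrome) {
            *data ^= (uint64_t)1 << b;   /* flip the bad data bit back */
            return 1;
        }
    return 1;                   /* syndrome names a check bit; data is fine */
}

int main(void) {
    init_positions();
    uint64_t word = 0xDEADBEEFCAFEF00Dull;
    uint8_t check = ecc_encode(word);

    uint64_t hit = word ^ (1ull << 42);          /* one particle strike */
    printf("one flip:  status=%d recovered=%s\n",
           ecc_check(&hit, check), hit == word ? "yes" : "no");

    hit = word ^ (1ull << 42) ^ (1ull << 7);     /* two strikes */
    printf("two flips: status=%d (detected, not corrected)\n",
           ecc_check(&hit, check));
    return 0;
}

The trick in hamming_checks() is that the XOR of the positions of all set data bits computes all 7 check bits at once, and doing the same XOR over a received word hands you the error position directly as the syndrome.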
< Well, for the ECC to check the data, it has to be the final information. So then, instead of using the clock to send the data to the registers, memory, wherever, it is used to check data, and then sent on the next clock?>
Not sure that I follow here. If the ECC calculation on a read takes longer than simply reading the data out of the cache (which it does, in my experience), then you pull a CPU exception (the CPU equivalent of throwing down a penalty flag in US football) and then just do the equivalent of a branch-misprediction pipeline flush, i.e. the CPU says, "I'm doing the wrong thing, flush and redo it." This is a performance hit, of course, but you should only encounter it on those few unlikely occasions when an alpha or beta particle nukes the contents of a memory bit and flips it the other way around. That's rare enough that the performance hit is negligible.
It definitely is rare enough that you don't want to add a way to keep the pipeline going and then slide the correct data in place of the incorrect data. That would add a lot of control complexity, and it's unnecessary.
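Just to illustrate the control flow I mean (a toy model, mine, with a bare parity check standing in for the real SEC-DED logic): the consumer gets the word immediately, the check result shows up "late", and the rare mismatch raises a flush rather than trying to substitute corrected data into work that's already in flight.

#include <stdint.h>
#include <stdio.h>

static int parity64(uint64_t x) {              /* even parity of a word */
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return (int)(x & 1);
}

typedef struct { uint64_t word; int check; } line_t;

/* forward the raw word right away; the check "completes" afterwards and,
   on a mismatch, signals a flush-and-replay instead of a data substitution */
static uint64_t read_line(const line_t *l, int *flush) {
    uint64_t speculative = l->word;            /* goes down the pipe now */
    *flush = (parity64(speculative) != l->check);
    return speculative;
}

int main(void) {
    line_t clean = { 0x1234, 0 };
    clean.check = parity64(clean.word);
    line_t hit = clean;
    hit.word ^= 1ull << 9;                     /* particle strike */

    int flush;
    read_line(&clean, &flush);
    printf("clean read:  flush=%d\n", flush);  /* 0: pipe keeps moving */
    read_line(&hit, &flush);
    printf("struck read: flush=%d\n", flush);  /* 1: flush and replay */
    return 0;
}

The point is that the common case pays nothing: the check runs alongside the read, and only the once-in-a-blue-moon strike costs you a flush.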
Ruckb: what are your feelings on cosmic-ray statistics in memory? I've been quoting the numbers from that paper by the guy who does memory over at IBM (can't recall his name, but he presented at a conference I attended a few years ago... ISSCC, I think). The figure was (statistically) for DRAM: one bit flip per month per 256MB at sea level on 0.25um. You design memory; what kind of statistical error rates do you guys see/design to?
Nice discussion, Seph and Ruckb! Thanks for sending me the PM inviting me to participate, Ruckb.
Either of you read the cover story in Forbes (US business magazine) about Sun having bit-flipping problems in the L2 caches on their UltraSPARC systems? People were quoted seeing bit corruptions as often as once per month in the 2MB L2 cache modules at the elevation of cities like Denver, CO. The $50k+ systems would pull a parity error and crash, since the L2 on the UltraSPARC is parity protected, not ECC.
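Quick back-of-the-envelope tying those two numbers together (my arithmetic, with round figures, not anything from the paper or the article):

#include <stdio.h>

int main(void) {
    /* the IBM figure quoted above: ~1 flip per month per 256MB of DRAM */
    double bits       = 256.0 * 1024 * 1024 * 8;   /* 256MB in bits */
    double hours      = 730.0;                     /* roughly one month */
    double per_bit_hr = 1.0 / (bits * hours);      /* flips per bit-hour */

    /* FIT = failures per billion device-hours, usually quoted per Mbit */
    printf("%.0f FIT/Mbit\n", per_bit_hr * 1e9 * 1024 * 1024);   /* ~670 */

    /* naive bit-count scaling from 256MB down to a 2MB L2 cache */
    printf("one flip per %.0f months in 2MB at sea level\n", 256.0 / 2.0);
    return 0;
}

The naive DRAM scaling says a 2MB array should only see about one flip per decade at sea level, so a once-a-month rate in Denver only makes sense if the SRAM's per-bit rate was a good deal worse than that DRAM figure and the altitude multiplied the cosmic-ray flux several times over. Both sound plausible to me, but they're guesses on my part, which is why I'm curious what error rates you actually design to.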