- Apr 6, 2015
NB: Due to formatting limitations of vBulletin, I had no choice but to exclude certain information from this forum thread. Please download the full article (PDF) here (hosted by Google Drive).
Please download the full article in the link provided below for the details of my system and test procedure as well as other information.
Download full article (PDF) here (hosted by Google Drive)
Screenshots and pictures
CPU-Z Screenshot:
Corsair DRAM modules:
Memtest86 result:
Objectives of this article
- To increase awareness of a prevalent and insidious but little-known RAM instability issue;
- to beseech manufacturers to tighten their quality control as well as their screening and validation process; and
- to encourage the use of robust ECC techniques in all memory and storage technologies.
Description of the RAM instability issue
- According to a research paper entitled "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors", the charge stored in a DRAM cell can leak away when a nearby row is repeatedly activated, corrupting the data held in that cell. To quote:
"Activating the same row in DRAM corrupts data in nearby rows. ... We identify the root cause of disturbance errors as the repeated toggling of a DRAM row's wordline, which stresses inter-cell coupling effects that accelerate charge leakage from nearby rows. ... DRAM disturbance errors are caused by the repeated opening/closing of a row, not by column reads. ... Disturbance errors can be exploited by a malicious program to breach memory protection. ... We conclude that the coupling pathway responsible for disturbance errors may be independent of the process variation responsible for weak cells. ... Server-grade systems employ ECC modules with extra DRAM chips, incurring a 12.5% capacity overhead. However, even such modules cannot correct multi-bit disturbance errors. ... Disturbance errors are a general class of reliability problem that afflicts not only DRAM, but also other memory and storage technologies: SRAM, flash, and hard-disk."
- This RAM instability can be exposed by running the "Hammer Test" (test 13 in Memtest86). You can read up on the Hammer Test on PassMark's website, but here's the pertinent paragraph (with punctuation slightly amended):
"The Hammer Test is designed to detect RAM modules that are susceptible to disturbance errors caused by charge leakage. This phenomenon is characterized in the research paper 'Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors' by Yoongu Kim et al. According to the research, a significant number of RAM modules manufactured 2010 or newer are affected by this defect. In simple terms, susceptible RAM modules can be subjected to disturbance errors when repeatedly accessing addresses in the same memory bank but different rows in a short period of time. Errors occur when the repeated access causes charge loss in a memory cell, before the cell contents can be refreshed at the next DRAM refresh interval. This test 'hammers' rows by alternatively reading two addresses in a repeated fashion, then verifying the contents of other addresses for disturbance errors."
- After running a variety of stability tests and varying my system configuration, I have concluded that my brand-new pair of Corsair Dominator Platinum RAM modules is not 100% stable. While the modules pass Prime95, IntelBurnTest, the Windows Memory Diagnostic tool and the traditional Memtest86 tests, they consistently fail the Hammer Test. The details of my system and testing procedure are provided in the full article (PDF) linked in this post.
- I'm not the only one experiencing this problem. See this thread entitled "How to relate to errors in Hammer Test 13?"
In particular, I would like to draw your attention to post #13 by the Administrator:
"Many computers are fundamentally (slightly) unreliable in random ways. Maybe this doesn't matter for home use, but for medical devices, banking systems, flight control systems, etc., it is a big deal. ... Equally worrying is that our algorithm for provoking the problem is probably non-optimal. Meaning that with perfect knowledge of the addressing scheme on each CPU, the channels in use, RAM timings, etc., we could probably force even more errors. The current algorithm is fairly general and not targeted at any particular RAM setup or CPU."
as well as to this post by the OP:
"I am guessing it'll blow up slowly. Like the Samsung 840 EVO, who just went into round two. Everyone can measure the impact on 840 EVO. Still, Samsung are dragging their feet, trying to create firmware/software solutions. With Hammer 13, I could guess a myriad of PR speak: Within 'normalized' specifications... Negligible impact with normal usage... and so on :/ But as you rightly pointed out, there are systems where 1 single unintended bit flip can have a major impact. And you can bet many of them are using normal RAM where ECC would be sensible (cost)."
- There's also a Wikipedia page on this "Row Hammer" issue.
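To make the hammer-test procedure quoted from PassMark a bit more concrete (fill memory with a known pattern, alternately read two addresses in the same bank but different rows, then check every other address), here is a toy simulation. Everything about the "DRAM" in it is an assumption made up for illustration: `ToyBank`, its threshold-and-refresh disturbance model, `hammer_test` and all parameter values are hypothetical and reflect neither real DRAM physics nor Memtest86's actual implementation. What it does show is why the test alternates between two rows: in the model, only a row-buffer miss activates a row, so re-reading one row opens it once and then stops stressing its neighbours.

```python
import random


class ToyBank:
    """Hypothetical model of one DRAM bank, for illustration only.

    Assumed (not real DRAM physics): every activation of a row adds one unit
    of "disturbance" to each physically adjacent row; a row that reaches
    `threshold` units before the next refresh gets one bit flipped at a
    random position; every `refresh_interval` activations, all rows are
    refreshed and their disturbance counters reset.
    """

    def __init__(self, n_rows, row_bytes, threshold, refresh_interval, seed=0):
        self.rows = [bytearray(row_bytes) for _ in range(n_rows)]
        self.disturb = [0] * n_rows
        self.threshold = threshold
        self.refresh_interval = refresh_interval
        self.open_row = None  # row currently held in the row buffer
        self.activations = 0
        self.rng = random.Random(seed)

    def _activate(self, r):
        self.open_row = r
        self.activations += 1
        for victim in (r - 1, r + 1):
            if 0 <= victim < len(self.rows):
                self.disturb[victim] += 1
                if self.disturb[victim] == self.threshold:
                    # Charge leaked before refresh: flip one bit in the victim row.
                    col = self.rng.randrange(len(self.rows[victim]))
                    self.rows[victim][col] ^= 1 << self.rng.randrange(8)
        if self.activations % self.refresh_interval == 0:
            self.disturb = [0] * len(self.rows)  # periodic refresh restores charge

    def read(self, r, col):
        # Only a row-buffer miss opens (activates) a row; re-reading the
        # already-open row is served from the row buffer.
        if r != self.open_row:
            self._activate(r)
        return self.rows[r][col]

    def fill(self, pattern):
        # Writes are not modelled as activations in this toy.
        for row in self.rows:
            row[:] = bytes([pattern]) * len(row)
        self.disturb = [0] * len(self.rows)


def hammer_test(bank, row_a, row_b, iterations, pattern=0x00):
    """Fill with a pattern, alternately read two rows, then verify all other rows."""
    bank.fill(pattern)
    for _ in range(iterations):
        bank.read(row_a, 0)
        bank.read(row_b, 0)
    errors = []
    for r, row in enumerate(bank.rows):
        if r in (row_a, row_b):
            continue
        # Inspect stored contents directly as the verification pass.
        for c, value in enumerate(row):
            if value != pattern:
                errors.append((r, c, value))
    return errors


# Usage: hammering rows 10 and 12 disturbs their neighbours, rows 9, 11 and 13.
bank = ToyBank(n_rows=32, row_bytes=64, threshold=50, refresh_interval=1000)
print(sorted({r for r, _, _ in hammer_test(bank, 10, 12, iterations=400)}))  # [9, 11, 13]
```

Rerunning with `refresh_interval=20` (so every row is refreshed before it can reach the threshold) reports no errors, which mirrors PassMark's point that errors only occur if charge is lost before the next refresh. Passing the same row twice also reports no errors, because in this model only the first read activates it. On real hardware a tester must additionally make sure each read reaches DRAM rather than the CPU cache; the test code in the Kim et al. paper uses the x86 `clflush` instruction for that, whereas the toy model has no cache at all.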
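The 12.5% overhead quoted from the paper matches the classic SECDED (single-error-correct, double-error-detect) arrangement: one extra DRAM chip for every eight, i.e. 8 check bits stored alongside each 64-bit data word, and 8/64 = 12.5%. As a sketch of why such ECC "cannot correct multi-bit disturbance errors", the following is a textbook extended Hamming (72,64) code. This is an assumption for illustration: real ECC modules and memory controllers may use different or stronger codes, and `encode`/`decode` are hypothetical helper names, not any vendor's implementation.

```python
# Extended Hamming (72,64) SECDED sketch: textbook layout, not a specific vendor's ECC.
# Positions 1..71: powers of two hold the 7 check bits, the other 64 hold data.
# Index 0 holds an extra overall-parity bit, which adds double-error detection.
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]
assert len(DATA_POSITIONS) == 64


def encode(data):
    """Encode a 64-bit integer as a 72-bit SECDED codeword (list of 0/1 ints)."""
    code = [0] * 72
    for i, p in enumerate(DATA_POSITIONS):
        code[p] = (data >> i) & 1
    for k in range(7):
        check = 1 << k
        # Make the parity of every position whose index has bit k set even.
        code[check] = sum(code[p] for p in range(1, 72) if p & check) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over the whole codeword
    return code


def decode(code):
    """Return (status, data): status is "ok", "corrected" or "uncorrectable"."""
    code = list(code)
    syndrome = 0
    for p in range(1, 72):
        if code[p]:
            syndrome ^= p  # XOR of set positions is 0 for a valid codeword
    odd = sum(code) % 2 == 1  # overall parity mismatch: an odd number of flips
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome < 72:
        # Assume exactly one flip; syndrome 0 means the parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Remaining cases (e.g. two flips): detected, but not correctable.
        return "uncorrectable", None
    data = 0
    for i, p in enumerate(DATA_POSITIONS):
        data |= code[p] << i
    return status, data


word = 0x0123456789ABCDEF
cw = encode(word)
one = list(cw)
one[5] ^= 1                      # single-bit error
two = list(cw)
two[5] ^= 1
two[40] ^= 1                     # double-bit error
three = list(cw)
for p in (1, 2, 4):
    three[p] ^= 1                # triple-bit error
print(decode(one)[0], decode(two)[0], decode(three)[0])  # corrected uncorrectable corrected
print(decode(three)[1] == word)  # False: "corrected" to the wrong value
print((72 - 64) / 64)            # 0.125, the 12.5% overhead
```

A single flipped bit is repaired, two flipped bits are only flagged, and three flipped bits (here at positions 1, 2 and 4) produce a syndrome that looks like a single error at position 7, so the decoder silently "corrects" the word into wrong data. That is the sense in which multi-bit disturbance errors defeat this kind of ECC.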