Hi all,
I'm having a very frustrating / depressing problem. I've been using PCs for quite a while, and I've never heard of anything like this before - hopefully someone here has, and can help!
First, I'm not overclocking - settings are unchanged out of the box, stock. My goal with this system is stability & reliability more than anything.
E8400, GA-P35-DS3L
RAM is G.SKILL 4GB (2x2GB) DDR2-800 (5-5-5-15, "1.8-1.9 Volts", model # F2-6400CL5D-4GBPQ)
2 drives, a 640GB WD and a 1TB Seagate
I set everything up, things looked good.
I use Acronis for backups, so I ran a backup of the WD to the Seagate to see how fast it would go. I ran "validate" on the backup and I got an error. I thought this might be a bug in Acronis, so I tried something else - I created a very large file (30GB) and copied it from one drive to the other, and compared the two to see if they were identical. They were. For the hell of it, I tried it again... and this time they weren't!
What I've found is that if I copy this 30GB file, either WD->Seagate or Seagate->WD, there is about a 1 in 4 chance the copy will not match the original. But strangely, if I copy WD->WD or Seagate->Seagate, there are no errors (at least, so far)!
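For anyone who wants to repeat the copy-and-compare test, here is a rough Python sketch of the idea. It is not the tool used for the tests above; the function names and the chunk size are made up for illustration, and the paths are whatever you pass in.

```python
import hashlib
import shutil


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a 30GB file never has to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then re-read both files and report whether they match."""
    shutil.copyfile(src, dst)
    return file_digest(src) == file_digest(dst)


def count_mismatches(src, dst, runs):
    """Repeat the copy `runs` times and count how many copies differ (hypothetical helper)."""
    return sum(0 if copy_and_verify(src, dst) else 1 for _ in range(runs))
```

One caveat: with a small file, the re-read may be served from the OS file cache rather than from the disk, so a test file much larger than installed RAM is the more meaningful case.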
This is really disturbing... I've been using PCs for years and never had data corruption before. (Or maybe I have, but I've never noticed!)
If I copy little files, they're fine. I can copy 1GB files back and forth all day and get no errors.
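A back-of-the-envelope check on whether "1GB copies are fine" really contradicts "1 in 4 of the 30GB copies fail": the sketch below assumes, purely for illustration, that corruption hits each gigabyte independently and with the same probability. That assumption is not established by anything above, and if the errors only occur under conditions that large files create, the estimate does not apply.

```python
def per_unit_failure_probability(total_failure_probability, units):
    """Per-unit failure chance, assuming each unit fails independently."""
    return 1.0 - (1.0 - total_failure_probability) ** (1.0 / units)


def chance_of_all_clean(per_copy_failure_probability, copies):
    """Chance that `copies` independent copies all come out clean."""
    return (1.0 - per_copy_failure_probability) ** copies


# About 1 in 4 of the 30GB copies mismatched (figure from the post).
p_per_gb = per_unit_failure_probability(0.25, 30)
print(f"estimated chance a single 1GB copy is corrupted: {p_per_gb:.4f}")
print(f"chance of 50 clean 1GB copies in a row: {chance_of_all_clean(p_per_gb, 50):.2f}")
```

Under that assumption, a single 1GB copy would be corrupted only about 1 time in 100, so a run of a few dozen clean 1GB copies would not be surprising and does not by itself prove small files are safe.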
The system seems totally stable. I ran Prime95, Orthos, Memtest86, and Memtest86+ each for 12+ hours, all fine. I also ran the drive tests from WD & Seagate, both OK.
Someone on another forum mentioned trying "Windows Memory Diagnostic" - http://oca.microsoft.com/en/windiag.asp
It does not actually run under Windows; it boots from its own disk, like Memtest.
I ran it, and like everything else, it passed. However, it has an "extended mode", so I tried that, and I got 3000+ errors for a test called "MATS+ Uncached".
Now, I've never heard of this program, it has a (c) date of 2003, and it is from Microsoft (grin), so for all I know it's a bug in the program.
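For context, MATS+ is a standard "march" memory test from the memory-testing literature: write 0 to every cell, then walk upward reading 0 and writing 1, then walk downward reading 1 and writing 0. "Uncached" presumably means the test runs with the CPU cache disabled so that reads hit the DRAM itself. How Microsoft's tool actually implements it is an assumption here; the sketch below is only the textbook algorithm run against a simulated memory, and `SimulatedMemory` with its `stuck_at` faults is invented for illustration.

```python
class SimulatedMemory:
    """One-bit-per-cell memory; cells in `stuck_at` always read a fixed value."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        # address -> value the cell is stuck at (hypothetical fault model)
        self.stuck_at = dict(stuck_at or {})

    def __len__(self):
        return len(self.cells)

    def write(self, addr, bit):
        self.cells[addr] = bit

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def mats_plus(mem):
    """Run the MATS+ march and return the sorted addresses that misread."""
    n = len(mem)
    bad = set()
    # Element 1: write 0 everywhere (address order does not matter).
    for addr in range(n):
        mem.write(addr, 0)
    # Element 2: ascending addresses, read 0 then write 1.
    for addr in range(n):
        if mem.read(addr) != 0:
            bad.add(addr)
        mem.write(addr, 1)
    # Element 3: descending addresses, read 1 then write 0.
    for addr in reversed(range(n)):
        if mem.read(addr) != 1:
            bad.add(addr)
        mem.write(addr, 0)
    return sorted(bad)
```

A Python script like this cannot test real RAM (the interpreter and OS sit in between); it only shows what the march pattern does and why a cell that cannot hold a 0 or a 1 gets flagged.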
I tested each stick on its own, and the test passes. But, put them back in (dual channel mode), and 3000+ errors again.
Here's the really, really strange part...
I put them both in, but in single channel mode, and I only get 1 single error, at address e88e7fa0.
(I was expecting 0 errors, thinking the dual channel was causing it... but 1 error.)
The strange part is that the order that I put the sticks in matters!!! If I reverse the sticks, I don't get that 1 error!
Let's call one of the sticks "A" and one "B" (even though they should be identical, right?)
So, here's a chart (X = empty slot)... Channel 1 = slots 1 & 2, Channel 2 = slots 3 & 4.
A B X X = 1 error
B A X X = no errors
X X A B = 1 error
X X B A = no errors
Dual channel setup:
X B X A = 3000+ errors
B X A X = 3000+ errors
Either stick, by itself, in any slot = NO errors!
As for the data corruption - I've only done limited testing, but it seems to have gone away when I use only 1 stick of RAM. So, I think the two must be connected.
I don't have the money (or enough time left in the RMA window) to test different RAM or a different MB. I am just going to RMA both the MB and RAM and get different brands of both. I am too paranoid about losing data.
Has anyone used this Microsoft test and had it catch things the others didn't? I still have trouble believing Memtest, Prime95 etc all failed to catch this...
And why are the sticks passing individually? Why does it matter if I order them "B,A" vs "A,B" ?
This has been the month from hell...