Memtest error at stock settings; what to do?

QuixoticOne

I've got a memtest error on a new system I put together yesterday.

I'm not sure what the easiest next step is for pinpointing the fault with a minimum of tearing hardware apart and rerunning memtest many more times.

ABIT IP35-E Motherboard with latest BIOS upgrade installed.
4x2GB sticks of SuperTalent T8UB2GC5 2.1V / 5-5-5-15 DDR2-800,
Q6600-G0,
ANTEC Trio-550W PSU,
EVGA 8600GT-512,
WD 400 GB SATA drive.

I'm running everything at stock speed and voltage and timings as far as I can tell given the way the BIOS presents and executes those details.

The BIOS PC health monitor shows voltages and temperatures that look reasonable to me at first glance, at idle in the BIOS.

The error has shown up several times over several hours of testing, always at the exact same address and the exact same bit position. However, it does NOT happen on every test pass. I've only seen it during "Test 6, Moving Inversions", and even then on only around 20% of the passes, based on limited testing. When I restrict the scan to the 2GB range around the reported address and run Test #6 exclusively, the error reproduces more frequently than when I test all memory with every test pattern.
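Since the error only shows up on some passes, a clean pass or two proves little. A rough sketch of how many clean Test 6 passes it takes before "no error" means much, assuming the passes are independent and that the roughly 20% hit rate from my limited testing is about right (both are assumptions, and the function name is just for illustration):

```python
import math

def clean_passes_needed(hit_rate, miss_chance):
    """Smallest n with (1 - hit_rate)**n <= miss_chance.

    hit_rate:    assumed chance that one pass catches the fault (0.2 here)
    miss_chance: acceptable chance that a faulty setup survives n passes
    """
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 < miss_chance < 1.0:
        raise ValueError("miss_chance must be between 0 and 1")
    return math.ceil(math.log(miss_chance) / math.log(1.0 - hit_rate))

# With a 20% per-pass hit rate, how many clean passes before trusting the result?
print(clean_passes_needed(0.2, 0.05))
print(clean_passes_needed(0.2, 0.01))
```

If restricting the range really does raise the hit rate, fewer passes are needed, but I don't have a solid number for that.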

The fact that the error happens at the same address and bit position suggests a real, repeatable hardware fault as the cause.
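For comparing error reports across runs, the failing bit position is just the XOR of the expected and actual words that memtest reports. A small sketch; the values below are made-up placeholders, not my actual readings, and the 32-bit word width is an assumption about how the test reports data:

```python
def failing_bits(expected, actual, width=32):
    """Return the bit positions (0 = least significant) where the words differ."""
    diff = (expected ^ actual) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]

# Hypothetical example: the pattern written vs. the value read back.
expected = 0xFFFFFFFF
actual = 0xFFFFFBFF
print(failing_bits(expected, actual))
```

If every failing report gives the same address and the same single bit, that is consistent with one stuck or marginal cell or data line rather than random corruption.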

The fact that it passes every other memtest test type, as well as the BIOS POST memory test, indicates the fault must be fairly subtle, assuming it is a real error and not a software / BIOS / CPU bug or whatever.

The fact that everything is supposedly at stock speed / voltage / timing means this could be considered an RMA-able defect, unless the cause is a BIOS bug or an installation problem that I could fix myself.

I assume my DIMMs are in some kind of dual channel interleaved mode, though I have no idea how to (using math / system information) localize that error address to a specific DIMM. It would be really nice if the BIOS actually reported *which* DIMM slot corresponds to which physical address range given the mapping in use, but I guess that would be too obviously useful for them to consider adding code for, instead of something useless like, say, a boot logo that covers up all the useful POST information. Maybe there is some utility that could probe the memory controller registers and pin it down, though I imagine it would have to be motherboard-model specific, since they could solder the sockets in any order they wanted, so ... bleh.

Even the fact that the error is at a specific address / bit doesn't (AFAIK) mean it is a RAM fault. I suppose it could be in a DIMM, in the motherboard's northbridge, or in the CPU, depending on the set of circumstances that cause this problem.

So now what? Any tips on converting a hex address and bit number into a physical DIMM slot on the IP35-E given the way it maps the memory?

Since I bought all the DIMMs together as one batch, I suppose I could pull them all and RMA them all, though it has been more than 14 days since my purchase date, so I'd probably face a difficult or impossible fight trying to return them and would get directed to warranty service instead. Given that a single DIMM cost about $37, paying RMA shipping charges and spending a lot of time getting the RMA and sending the DIMMs off to warranty service would quickly approach the cost of the original product.

I'm fairly sure this is a DIMM problem given the fixed address, though I suppose it could be something flaky with the CPU/NB/MB/BIOS. So, any tips on confirming the nature of the error most easily, other than the obvious option of tearing apart another PC, swapping all the DIMMs back and forth, then playing "musical DIMM" to see which permutations of DIMM swaps leave the error unchanged and which ones move the error address or move the error to the other PC? I'm no big fan of pulling the DIMMs more often than necessary, given that I'm not even sure I can get them out with the CPU heatsink installed and the motherboard still in the case, so that would be a royal pain to do even once.

Has anyone had a similar kind of error that was NOT due to a RAM hardware fault or incorrect user selected voltage/timing choice, but was something like a bug in the CPU or motherboard / northbridge / etc.?
 

LOUISSSSS

Your best bet here would be trial-and-error testing: try each DIMM one at a time in memtest, all in the same slot. This way you can isolate the bad DIMM and/or bad slot...
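A rough sketch of how the one-DIMM-at-a-time results could be read. The labels and wording are made up for illustration, and it assumes each DIMM was tested alone in the same slot for enough passes to catch an intermittent error:

```python
def interpret(results):
    """results maps a DIMM label to True if that DIMM, tested alone, showed errors."""
    failing = [name for name, bad in results.items() if bad]
    if not failing:
        # Nothing fails alone: the fault may need all DIMMs installed to appear.
        return "no DIMM fails alone: suspect another slot, the channel, or four-DIMM loading"
    if len(results) > 1 and len(failing) == len(results):
        # Everything fails in this slot: the common factor is not the DIMM.
        return "every DIMM fails: suspect the slot, board, or CPU"
    return "suspect DIMM(s): " + ", ".join(failing)

# Hypothetical outcome of testing four sticks, one at a time, in the same slot.
print(interpret({"A": False, "B": True, "C": False, "D": False}))
```

If no stick fails alone, repeating the same run in a different slot would be the next thing to try.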