Heat causing errors in Memtest86+?

Synomenon

Lifer
Dec 25, 2004
10,542
6
81
Been running Memtest86+ v4.10 on my laptop. Ran it overnight last night. When I got up this morning, it showed one error from the first pass.

Turned the laptop off and left it for a couple of hours. After that couple of hours, I turned it on again and tried running Memtest86+ again. This time after only 14 minutes it showed 22 errors.

I turned the laptop off again and picked it up. The underside was very hot. I waited again for a couple of hours, then put it on one of those cooling pads to help it cool down while trying Memtest86+ again. Memtest86+ is currently running and has been now for three hours with no errors. It has completed one pass so far.

Could the errors from earlier have been caused by too much heat?

==============
Update 1:
Ok, just tested stick 1 in slot 1 for 6 hours / 3 passes. No errors found.

Used Memtest86+ v4.10 again for this test.

==============
Update 2:
Tested stick 2 in slot 1 for 7 hours / 4 passes. No errors found.

Used Memtest86+ v4.10 for this test. Two more test sessions to go (stick 1 in slot 2 and stick 2 in slot 2).

==============
Update 3:
Ok, just tested stick 1 in slot 2 for 11 hours / 6 passes. No errors found.

Used Memtest86+ v4.10 again for this test.

==============
Update 4:
Tested stick 2 in slot 2 for 6 hours, 30 minutes / 4 passes. No errors found.

Used Memtest86+ v4.10

==============
Update 5:
Tested a new set of RAM (also 8GB - two 4GB sticks) for 27 hours and 10 minutes / 8 passes. No errors.

Used Memtest86+ v4.10

==============
Update 6:
Tested the new set of RAM with HCi Memtest Deluxe. 205 percent coverage on all the RAM. 0 errors.


I think the errors before were just from the first set of RAM. I only had errors when both sticks from that first set were in at the same time.
 

repoman0

Diamond Member
Jun 17, 2010
4,470
3,311
136
Synomenon said: "Could the errors from earlier have been caused by too much heat?"

Sure. As you probably know, excess heat can be detrimental to the error-free operation of any PC component. Just like with overclocking, keeping temps down only increases your max clocks, but if temps are too high in the first place, your stuff won't even be stable at stock speed.

Sounds like you need to figure out how to keep your laptop cooler, or try some 'better' memory - really just memory that can handle its rated speed at very high temps.
 

Synomenon

It's the stock RAM that came with the laptop. Two 2GB DDR2 800 Elpida sticks. Elpida is supposed to be good, right?

Edit:
Sorry, it's not the stock RAM; it's 8GB of Corsair DDR2 800 (two 4GB sticks).
 

RebateMonger

Elite Member
Dec 24, 2005
11,588
0
0
Memtest is actually testing many things, including the CPU, memory management system, and even the memory slots themselves. Any of those could also be affected by overheating.
 

Synomenon

14 hours, 18 minutes / 4 passes and no errors.

I'll let it run for at least two more passes, then I'll try a different memory testing program.


I don't understand why it gave me errors yesterday and now, when it's on a cooling pad, it's not giving me any errors. I don't know now whether to trust the RAM or if it's some other faulty hardware in the laptop.
 

Synomenon

Ok, it's showing errors. Here's what it shows now:

Intel Core 2 1297 MHz
L1 Cache: 32K 18264 MB/s
L2 Cache: 3072K 8475 MB/s
L3 Cache: None
Memory: 8095M 2881MB/s
Chipset: Intel PM/GM45/47 (ECC: Disabled) - FSB: 199MHz - Type: DDR-II
Settings: RAM: 399MHz (DDR798) / CAS: 6-6-6-18 / Dual Channel

WallTime   Cached   RsvdMem   MemMap   Cache   ECC   Test   Pass   Errors   ECC Errs
26:16:50   8095M    252K      e820     on      off   Std    7      7        0

Error Confidence Value: 222
Lowest Error Address: 000b9c67350 - 2972.4MB
Highest Error Address: 000b9c67350 - 2972.4MB
Bits in Error Mask: 00400000
Bits in Error - Total: 1 Min: 1 Max: 1 Avg: 1
Max Contiguous Errors: 1
ECC Correctable Errors:
Errors per Memory Slot:
Code:
0: 1       4: 0       8: 0       12: 0
1: 1       5: 0       9: 0       13: 0
2: 0       6: 0       10: 0      14: 0
3: 0       7: 0       11: 0      15: 0
Code:
Test         Errors
-----------------------
0            0
1            0
2            0
3            0
4            2
5            0
6            0
7            0
8            0
9            0
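
For anyone reading the report above, here is a small sketch of how the address and mask fields can be decoded. It only reuses the two hex values from the report; the assumption that the mask describes a 32-bit test word is mine and is not stated in the output.

```python
# Values copied from the Memtest86+ report above.
error_address = 0x000B9C67350
error_mask = 0x00400000

# Memtest86+ prints the address in hex and in MB (taken here as 2**20 bytes).
address_mb = error_address / 2**20

# Assumption: the mask covers a 32-bit word, and each set bit marks
# one bit position that read back wrong.
failing_bits = [bit for bit in range(32) if (error_mask >> bit) & 1]

print(f"address: {address_mb:.1f} MB")
print(f"failing bit positions: {failing_bits}")
```

The lowest and highest error addresses are identical and the mask has a single bit set, so every logged error in this run points at the same bit of the same memory location.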
 

Synomenon

Just finished testing the RAM with HCi Memtest Deluxe v4.0.

The test ran for 20 hours and reached 201% coverage. 0 errors found.


HCi Memtest runs in Windows so it can't check the memory that the OS / system uses. I've read though that it stresses the system more than Memtest86+ does.

Don't know what to think now. Is the RAM bad or not? Are the errors being caused by heat produced when running the tests?
 

Synomenon

Ok, not knowing what to do next, I'm going to try testing each stick, one-by-one.

Should I test each stick in both slots (say six hours in each slot)?

- Stick 1 in slot 1 for 6 hours, then stick 1 in slot 2 for 6 hours.
- Stick 2 in slot 1 for 6 hours, then stick 2 in slot 2 for 6 hours.

Or is it good enough to test stick 1 in slot 1 overnight then stick 2 in slot 2 the next night?
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
try 1 dimm in 1 slot at a time.

poor quality ram can overload the bus and cause instability
 

Synomenon

So does "1 dimm in 1 slot at a time" mean this process?
- Stick 1 in slot 1 for 6 hours, then stick 1 in slot 2 for 6 hours.
- Stick 2 in slot 1 for 6 hours, then stick 2 in slot 2 for 6 hours.

Or is it good enough to test stick 1 in slot 1 and stick 2 in slot 2 (stick 1 in slot 1 for a couple of hours then remove and then stick 2 in slot 2 for a couple of hours)?
 

Synomenon

Ok, just tested stick 1 in slot 1 for 6 hours / 3 passes. No errors found.

Used Memtest86+ v4.10 again for this test.
 

Synomenon

Tested stick 2 in slot 1 for 7 hours / 4 passes. No errors found.

Used Memtest86+ v4.10 for this test. Two more test sessions to go (stick 1 in slot 2 and stick 2 in slot 2).
 

Synomenon

Ok, just tested stick 1 in slot 2 for 11 hours / 6 passes. No errors found.

Used Memtest86+ v4.10 again for this test.
 

Emulex

there ya go. mobo or ram is not good. cheap mobo and/or cheap ram sucks.

sorry. i've moved on to using quality ECC memory now - never had a failure- never had an ECC interrupt. I'm not an overclocker - i like my machines to never crash. and guess what ever since going to ECC ram - crashes don't happen. ever.
 

Synomenon

I haven't found an 11.6" laptop that can take ECC RAM. The laptop is an Acer Aspire 1810T.

I have another set of RAM coming in to test. Also, I still have to test stick 2 in slot 2.
 

Emulex

try another laptop - it's probably a cheap motherboard.

i've never really seen this with ddr2 except old macbooks/imacs.

ddr3 is a completely different story. total nightmare.

biggest EFF UP is mixing DDR3 with a Core2 processor (always stick ddr3 with i3/i5/i7)
 

Synomenon

Tested stick 2 in slot 2 for 6 hours, 30 minutes / 4 passes. No errors found.

Used Memtest86+ v4.10.


I guess I'll just chalk up those errors from the early tests to anomalies.
 

Emulex

to be frank i would call that intermittent failure. and that is the worst thing to have.

say if your car's fuel pump intermittently failed you'd probably replace something. likewise i would not ignore the error. since you do not have ecc memory your system has no way of knowing if corruption occurs and it usually leaves a nasty mess if it affects your filesystem.

keep testing man. but if i ever see errors in memtest i consider that critical.
 

Synomenon

Damn that sucks. It's not like I can just go buy another laptop either. I don't have the disposable income for that and this one is way past the store's return period.
 

Synomenon

Emulex, so Acer's motherboards suck. Which company makes laptops with good motherboards? Can you recommend any?
 

RebateMonger

Emulex said: "i've moved on to using quality ECC memory now - never had a failure- never had an ECC interrupt. I'm not an overclocker - i like my machines to never crash. and guess what ever since going to ECC ram - crashes don't happen. ever."

While I am an ECC fan, I was shocked to have two ECC memory failures this year on a friend's six-year-old Dell 400SC, being used as a desktop PC running 24/7.

It contained 750 MB of original Dell-provided ECC DDR3200 (Micron?). Both failures were in the first memory slot. The machine stopped and wouldn't come back up, warning of an uncorrectable memory error with both a text message and with diagnostics lights on the back of the Dell. It did seem to be a real memory failure because replacing the memory fixed the problem and simply moving the memory around didn't fix it.

Previously, I'd never seen ANY memory error on an ECC module, so having two modules fail in such a short time was surprising. I maintain several other Dell 400SCs and none of the others have experienced any memory errors.
 

Emulex

can you imagine all those machines on the planet without ECC? they would just never know, and corruption would continue until someone ran memtest86+ or, like most people, threw out the whole computer.

today servers have advanced ECC / chipkill. They can detect and fix more errors; they can disable an entire dimm and keep on trucking, similar to how they can disable an entire cpu. (a crash may still happen, depending on whether it's caught at POST or happens online).

HP and IBM have pioneered this because they know you can't put your servers in a giant lead chastity belt to prevent whatever (alpha? beta? gamma?) rays from flipping a bit.

Think back to failures that manifested themselves like bad ram. random disk corruption; bsod; etc. without ECC you would never know; and knowing is half the battle.
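
To make "detect and fix" concrete, here is a toy sketch of single-bit error correction using a Hamming(7,4) code. This is only an illustration of the idea: real ECC DIMMs protect 64 data bits with 8 check bits, and chipkill-style schemes are more elaborate still. The function names are made up for this example.

```python
def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]  # positions 1..7


def decode(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


word = encode([1, 0, 1, 1])
word[4] ^= 1  # simulate one flipped bit in "memory"
recovered, position = decode(word)
print(recovered, "corrected position:", position)
```

Without the three parity bits, that flipped bit would simply be read back as wrong data with no indication that anything happened, which is the situation a non-ECC laptop is in.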
 

Synomenon

With larger DIMM capacities becoming more common, why haven't manufacturers taken ECC mainstream?
 

Emulex

ECC covers more than just ram guys! consumer gear has no ECC.

servers run ecc end to end.

I've had routers with bad RAM or whatever corrupt an ftp transfer. both ends were perfect. talk about a nightmare, since it wasn't our network's router doing the corrupting.

ECC means a lot to me because i must know that from A to B - what i send - is what is going to be received.

It is highly underappreciated.

Study the Xeon, xeon chipset, raid controllers, etc end to end protection.

same cpu (i7, x58, intel matrix raid) - do you have ECC anywhere? no. intel has a reason why that same Xeon costs a ton more.

most underappreciated feature of workstations and servers. and precisely why i rock workstations and servers for my top employees.

** it costs 1 extra bit per byte and slows down access a little ** but so do condoms ;)


create random files - i have a set - use MD5 or SHA1/256 to validate them (iso's work great)
burn in:
1. target storage 5 cores
2. target iscsi/nas storage 3 cores

zip, target 1 local drives, target 2 network based.
unzip on destination local drives, destination network.
check file against md5/sha1/256
repeat for a day

Any failures = box is junk and service call to replace whatever it takes.
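
A rough sketch of that checksum burn-in loop, for anyone who wants to try it. It assumes a single randomly generated file and local directories standing in for the "target" and "destination" storage; the function names, file size, and pass count are placeholders, not anything from the procedure above.

```python
import hashlib
import os
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip(source: Path, target_dir: Path, dest_dir: Path) -> bool:
    """Zip source into target_dir, unzip into dest_dir, compare checksums."""
    expected = sha256_of(source)
    archive = target_dir / (source.name + ".zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=source.name)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return sha256_of(dest_dir / source.name) == expected


def burn_in(work_dir: Path, passes: int = 3, size: int = 4 << 20) -> int:
    """Run several round trips on a random file; return the failure count."""
    source = work_dir / "random.bin"
    source.write_bytes(os.urandom(size))
    target = work_dir / "target"
    dest = work_dir / "dest"
    target.mkdir(exist_ok=True)
    dest.mkdir(exist_ok=True)
    return sum(not round_trip(source, target, dest) for _ in range(passes))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("failures:", burn_in(Path(tmp)))
```

For a real burn-in you would point the target and destination directories at the local and network drives being tested, use much larger files, and loop for a day. Keep in mind that small files may be served from the OS cache rather than read back from the drive.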
 

Synomenon

Update 5:
Tested a new set of RAM (also 8GB - two 4GB sticks) for 27 hours and 10 minutes / 8 passes. No errors.

Used Memtest86+ v4.10