2 identical systems, but one overheats regularly and shuts down


Turbonium

Platinum Member
Mar 15, 2003
2,157
82
91
Have we looked at the temperatures 3 hours later? It would be interesting to see if they are now more in line with the other PC.
The temperatures are supposedly always, from the moment of booting up, at maximum (Tjmax).

Remember that from even a cold boot, BIOS thinks the difference between Tjmax and the actual temperature of the CPU is 0, meaning it's reportedly at Tjmax.
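To make the arithmetic behind that concrete, here is a minimal sketch. Intel's digital thermal sensor reports how far a core is below Tjmax, and monitoring tools derive the displayed temperature by subtracting that distance from Tjmax. The 100°C Tjmax value and the function name are assumptions for illustration, not something read from this BIOS.

```python
def core_temperature(tjmax_c: float, distance_to_tjmax_c: float) -> float:
    """Convert a 'distance to Tjmax' reading into an absolute temperature."""
    return tjmax_c - distance_to_tjmax_c


TJMAX_C = 100.0  # assumed Tjmax, for illustration only

# A reading stuck at 0 is displayed as Tjmax itself, even on a cold boot.
print(core_temperature(TJMAX_C, 0.0))

# A healthy idle core would normally sit well below Tjmax.
print(core_temperature(TJMAX_C, 55.0))
```

So a sensor that always reports a distance of 0 will always look like it is at Tjmax, regardless of how warm the chip really is.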
 

Turbonium

This is weirder still. Currently have the case vertical and covered, 100% CPU load, and it's been going for over an hour. The temps are rising very slightly (by one to two degrees on all fronts since an hour ago), but no reset or shutdown.

So confusing.
 

QuietDad

Senior member
Dec 18, 2005
523
79
91
You can show me all the links you want. I'm not intimately familiar with the Q9650, and I've been building computer systems starting with IBM 360s in the late 1970s. It may be that it's a bad CPU sensor, BUT to diagnose it right, there are three things you need to make it work: a good CPU, a good mount to an adequate fan, and airflow over the heat sink. Hundreds of systems later, I have found that it's usually improperly applied thermal paste (more is NOT better), followed by a box shoved under a desk or in a drawer. Occasionally it's a bad CPU. Testing the seating of the CPU and the airflow costs NOTHING to try (a tube of paste is $15 and does a lot of CPUs...), and those are what cause the most problems. If this solves the problem, you're up and running for nothing. Had you bought a new motherboard and CPU, shoved it back in, and it turned out to be airflow, you'd be out $250 minimum and still be stuck. It's called troubleshooting, not googling problems.

Hopefully this will work. I wouldn't be too worried about temp fluctuations after long periods of high load. I worry about rapid rises in temps over short periods of time. I have two Q6600s overclocked to 3 GHz running SETI 24/7, and while they may run "hot" by certain people's standards, they've been doing it for 2 years straight.

Time will tell.
 

Turbonium

I just realized that today, the temperature inside my apartment has been a good 1.5-2 degrees cooler than the last few days. Subtle, but it could mean all the difference.

I'm going to try testing it again once it gets a bit warmer.

In any case, I'm going to get some exhaust fans for these cases (see thread).
 

Turbonium

I'm going to request this thread be moved to the Cases & Cooling forum. It seems more appropriate.
 

biodoc

Diamond Member
Dec 29, 2005
6,338
2,243
136
Ok, I opened up the system, and the HSF looks seated perfectly fine, and has almost no dust buildup either (I used compressed air on it anyway to clean it up). Fan seems to be spinning fine as well, based on how easily and noiselessly it spins when applying compressed air.

I see no point in removing the HSF and reapplying it, given I'd have to use up more thermal paste.

Diode problem perhaps? I really need your help on this.

Here is a pic of the "overheating" internals:

The HSF "push-pin" on the lower left is clearly not seated properly. Compare it to the one on the lower right.

Your CPU (quad-core) has 4 thermal sensors embedded in the chip. So far we don't have a reading from those sensors. I really would like to see the output from "sudo sensors-detect". Would you please post it here?

This is what you're looking for to get a read on those 4 sensors in your CPU. Note the red bold line.

Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): yes
Module cpuid loaded successfully.
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 15h power sensors... No
Intel digital thermal sensor... Success!
(driver `coretemp')

Intel AMB FB-DIMM thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No


thanks, looking forward to seeing the output. :)
 

Turbonium

The HSF "push-pin" on the lower left is clearly not seated properly. Compare it to the one on the lower right.

Your CPU (quad-core) has 4 thermal sensors embedded in the chip. So far we don't have a reading from those sensors. I really would like to see the output from "sudo sensors-detect". Would you please post it here?

This is what you're looking for to get a read on those 4 sensors in your CPU. Note the red bold line.

Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): yes
Module cpuid loaded successfully.
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 15h power sensors... No
Intel digital thermal sensor... Success!
(driver `coretemp')

Intel AMB FB-DIMM thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No


thanks, looking forward to seeing the output. :)
The push pin has a bit of superficial "give" to it. I can easily put it back into the appropriate position. The important thing is that the HSF is firmly attached to the CPU heatspreader (some light pushing results in no change in the HSF position, nor does light pulling).

I'm about to try the sudo command you just mentioned. Will post again in a few moments.

Also, the cooling fins are caked with dust bunnies and other small rodents! Notice the dark patches between fins!
Um... The fins are perfectly flawlessly clean. That's just the angle of the shot. Look at the fins at 4 o'clock, and 10 o'clock.
 

Turbonium

I apparently didn't scroll up far enough in the original temp reports (sigh). How stupid of me.

Anyway, here is the healthy system at 100% CPU load:

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +91.0°C (high = +76.0°C, crit = +100.0°C)
Core 1: +88.0°C (high = +76.0°C, crit = +100.0°C)
Core 2: +89.0°C (high = +76.0°C, crit = +100.0°C)
Core 3: +88.0°C (high = +76.0°C, crit = +100.0°C)

Compare that to the troubled system moments after startup and in Linux environment (near idling), and with the case cover removed:

coretemp-isa-0000
Adapter: ISA adapter
Core 0: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 1: +97.0°C (high = +76.0°C, crit = +100.0°C)
Core 2: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 3: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)

-it either reports that, OR-

coretemp-isa-0000
Adapter: ISA adapter
Core 0: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 1: +97.0°C (high = +76.0°C, crit = +100.0°C)
Core 2: +100.0°C (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 3: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)

-OR-

coretemp-isa-0000
Adapter: ISA adapter
Core 0: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 1: +97.0°C (high = +76.0°C, crit = +100.0°C)
Core 2: +99.0°C (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 3: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)

It's basically at thermal max according to lm-sensors, and I take it it's throttling (perhaps that's why 2 or 3 of the 4 sensors are not reporting anything; I'm assuming those cores are literally disabled as a result of throttling?).
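For anyone who wants to compare readings like these without eyeballing them, here is a rough Python sketch that parses the "Core N" lines from pasted `sensors` output and labels each core against its own high and crit thresholds. The function name, the status labels, and the regular expression are all illustrative assumptions; the pattern is written against the output format shown in this thread and may need adjusting for other lm-sensors versions.

```python
import re

# Pattern for the "Core N" lines that lm-sensors prints for the coretemp driver.
CORE_LINE = re.compile(
    r"^Core (?P<core>\d+):\s+(?P<temp>N/A|[+-]?\d+(?:\.\d+)?°C)\s+"
    r"\(high = (?P<high>[+-]?\d+(?:\.\d+)?)°C, "
    r"crit = (?P<crit>[+-]?\d+(?:\.\d+)?)°C\)"
)


def parse_sensors(text):
    """Return one dict per 'Core N' line; temp is None when shown as N/A."""
    readings = []
    for line in text.splitlines():
        match = CORE_LINE.match(line.strip())
        if match is None:
            continue  # skip the chip name, the adapter line, and blank lines
        raw = match.group("temp")
        temp = None if raw == "N/A" else float(raw[:-2])  # drop the trailing "°C"
        high = float(match.group("high"))
        crit = float(match.group("crit"))
        if temp is None:
            status = "no reading"
        elif temp >= crit:
            status = "critical"
        elif temp >= high:
            status = "above high"
        else:
            status = "ok"
        readings.append(
            {"core": int(match.group("core")), "temp": temp, "status": status}
        )
    return readings


# Second report from the troubled system, as pasted above.
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 1: +97.0°C (high = +76.0°C, crit = +100.0°C)
Core 2: +100.0°C (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
Core 3: N/A (high = +76.0°C, crit = +100.0°C) ALARM (CRIT)
"""

for reading in parse_sensors(SAMPLE):
    print(reading["core"], reading["temp"], reading["status"])
```

Note that the sketch treats N/A as "no reading" rather than guessing at a cause; it says nothing about whether a core is throttled or disabled.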

Anyway, I'm going to try reseating the HSF later today (though I hate to for previously mentioned reasons). And yes, I do leave the troubled system off (other than for testing the temps purposes above).
 

biodoc

Wow, those are very high temps. I'm surprised both processors are still functional after 2 months at 85°C+.

Please don't take this question the wrong way. Did you pull the protective tape off the thermal paste before setting the HSF? Those temps are "no thermal paste" temps.
 

Turbonium

Wow, those are very high temps. I'm surprised both processors are still functional after 2 months at 85°C+.

Please don't take this question the wrong way. Did you pull the protective tape off the thermal paste before setting the HSF? Those temps are "no thermal paste" temps.
I'm 99.99% sure there was no tape on either one.

It's maybe because of the mATX cases they are in (and the fact that my apartment is really warm... about 26+ degrees).
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
Pull the HSF straight off - do not twist - so you can examine what the old paste looks like. It should be even over the entire surface. If you could post a picture, that would be great!

If the HSF does not have enough pressure against the CPU (with all four pins fully engaged) it is only a little better than no HSF.
When I installed my TRUE I put a penny in between to get more pressure. It is almost to the point of cracking the mobo :)
 

Turbonium

Pull the HSF straight off - do not twist - so you can examine what the old paste looks like. It should be even over the entire surface. If you could post a picture, that would be great!

If the HSF does not have enough pressure against the CPU (with all four pins fully engaged) it is only a little better than no HSF.
When I installed my TRUE I put a penny in between to get more pressure. It is almost to the point of cracking the mobo :)
I'll post a pic soon. Stay tuned.
 

Turbonium

Prepare to laugh...

nRflvl4.jpg


8UFSuVH.jpg


I tried reseating it, and the temps go straight to the 100 degree max once it goes to full load. I will try reseating again, but with thermal paste. If it's still borked, I will be at a loss.

I mean, that would mean something is wrong with the HSF. Or perhaps the thermal pad that is on it as pictured above (I mean what the heck... 2 months of hot temps and no melting and spreading out; and it's not like the HSF wasn't seated properly, though perhaps it wasn't as tight as it should have been, given the clips/pins are a bit wonky).

Anyway, I also tried applying some manual pressure for a few moments while it's at 100% load to see if it "melts" into place even a bit. I'll find out in a sec once I remove the HSF again.
 

Turbonium

Ok, I cleaned up the old Intel thermal pad (or more like defective space-filler), and replaced it with a horizontal line of Noctua NT-H1, and the temps are maxing out at around 83-84 degrees at 100% CPU load.

Better. Sixteen+ degrees better, to be exact. Still a bit hot though, but whatever.

I'm guessing the thermal paste that came with both HSFs was defective and dried up or something. I can't believe it hadn't melted and spread out properly. I bet the other HSF is the same, so I'm going to fix that one up too.
 

Turbonium

Ok, more weirdness...

I removed the working system's HSF as I said I would, and found this:

svzk1U6.jpg


xOyFiHD.jpg


Meaning the thermal pad on it was likely fine on this one!

No worries I figured (temps could only stay the same or get better after applying Noctua instead, right?).

Well I was wrong. Now, my temps are maxing out on this system instead... It goes straight to 100 degrees during full CPU load. :/

Sigh. I'm going to have to reseat this one now. A classic case of "if it's not broken, don't fix it"...
 

QuietDad

Just get them at or under 70°C and life will be fine. My personal belief is that you're a little heavy on the paste. I usually just put a pea-sized blob in the middle of the CPU, then press on the HSF as I seat it. But it was obviously working. I'd be worried that on the first one that overheated, the one with the stripped paste, the contact pad on the HSF is warped.
 

Turbonium

These HSFs are definitely wonky. That's the issue (the pins I mean).

Argh. Probably a bad batch from Intel.

Just get them at or under 70°C and life will be fine. My personal belief is that you're a little heavy on the paste. I usually just put a pea-sized blob in the middle of the CPU, then press on the HSF as I seat it. But it was obviously working. I'd be worried that on the first one that overheated, the one with the stripped paste, the contact pad on the HSF is warped.
I'm just putting a horizontal line of thermal paste down, about 1.25mm thick. It's really not that much at all.
 

Turbonium

Reseated the new problem system's HSF (turns out the stupid pin had detached itself... I'm telling you, it's a bad batch of HSFs).

Temps are currently as follows...

System 1 @ 100% CPU load (°C): 84, 74, 77, 80 (2700 rpm)
System 2 @ 100% CPU load (°C): 90, 85, 96, 99 (3000 rpm)

System 1 I can live with. System 2 however is a different story.
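To put numbers on the gap between the two systems, here is a small Python sketch that summarizes the per-core readings posted above. The variable and function names are made up for illustration; the temperatures are the ones reported in this post.

```python
from statistics import mean

# Per-core temperatures in °C at 100% CPU load, as posted above.
system_1 = [84, 74, 77, 80]  # fan at 2700 rpm
system_2 = [90, 85, 96, 99]  # fan at 3000 rpm


def summarize(temps):
    """Return the mean, hottest core, and hottest-to-coolest spread."""
    return {
        "mean": mean(temps),
        "max": max(temps),
        "spread": max(temps) - min(temps),
    }


s1 = summarize(system_1)
s2 = summarize(system_2)

# Core-by-core difference, System 2 minus System 1.
per_core_gap = [b - a for a, b in zip(system_1, system_2)]

print("System 1:", s1)
print("System 2:", s2)
print("Per-core gap:", per_core_gap)
print("Mean gap:", s2["mean"] - s1["mean"])
```

With these figures, System 2 averages 13.75°C hotter than System 1, and the gap is not even across cores: 6, 11, 19, and 19 degrees. An uneven gap like that would at least be consistent with uneven contact pressure, though the numbers alone can't confirm it.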

Note: System 2's HSF has a faulty pin that doesn't fully engage, so it leaves it slightly (very very slightly) loose on one of the four corners (it's still fairly firm). Still, would that cause it to overheat so much?

Going to idle System 2 until I figure this out. Please help!
 