Cruncher rebooting?

VirtualLarry

No Lifer
Aug 25, 2001
56,225
9,987
126
My secondary Q9300 @ 3.0, with HD4850, has been hard at work crunching PrimeGrid, and I wonder if the (temp.) load is just too much for it.

CoreTemp measured the cores having a max temp of 77C, and GPU-Z records the HD4850 at 6000RPM and 97C.

I think that it is more likely that the GPU is overheating than the CPU.

The HD4850 normally runs that hot, 90C+. I think 110C is shutdown temp, not sure though.

Wondering about the PSU and VRM temps too.

The PSU is a few years old, it's an Antec VP-450, which should be enough juice.

Note that when I've checked on it the last two days, I've found that it has rebooted on its own. It did not shut down, however.
 

petrusbroder

Elite Member
Nov 28, 2004
13,343
1,138
126
I recognize the problem - have had it too.
Heat is, as you say, probably the problem. I have sometimes solved it by cleaning the dust out of the computer's coolers (incl. the GPU's) and checking the fans (especially on the GPU).
Often it is not the GPU itself that overheats, but some of the VRMs on the graphics card.
In one instance I replaced the HSF unit of the graphics card and the problem was solved.
In another instance I directed a large fan into the computer's case, and that solved the problem.
In a third instance I changed the project running on the GPU: PrimeGrid generates (or generated?) more heat than e.g. Seti@Home or Collatz, and some PrimeGrid applications generate more heat than others.
I see this problem as a reason to do maintenance on the cruncher ...
Just my 2 cents ... ;)
 

VirtualLarry

No Lifer
Aug 25, 2001
56,225
9,987
126
My environment isn't hugely dusty, but it's probably time for a cleaning.

I checked the event log, and just before each "Critical" Kernel-Power error (unexpected shutdown) there was a regular Error entry, Bugcheck 0x00000124. I have to look that one up, but I think that it's video drivers.

Now, I recently switched my secondary Q9300 cruncher from VGA (having to manually swap the single VGA cable to my HDTV between my primary and secondary cruncher, through a DVI-to-VGA adaptor) to HDMI, using the included ATI DVI-to-HDMI dongle (I seem to have lost the other one), and then into a Monoprice mini HDMI switcher (powered by the devices' HDMI ports). I'm wondering if switching the HDMI port out from under the OS is causing it to crash, or if there is otherwise some funny HDMI weirdness going on.

Edit: Apparently Stop 0x124 is a hardware-related Machine Check Exception.
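For anyone matching event-log entries against stop codes, here is a minimal sketch of normalizing the zero-padded code from the log and looking it up. The table and the `describe_bugcheck` helper are hypothetical illustrations, not a complete list; only two well-known codes are included.

```python
# Hypothetical helper: a tiny lookup table, NOT a complete list of stop codes.
KNOWN_BUGCHECKS = {
    0x124: "WHEA_UNCORRECTABLE_ERROR (hardware error / machine check)",
    0x116: "VIDEO_TDR_FAILURE (display driver timeout)",
}


def describe_bugcheck(code_text: str) -> str:
    """Turn a string like '0x00000124' into a short description."""
    code = int(code_text, 16)  # base 16 accepts the '0x' prefix and zero padding
    name = KNOWN_BUGCHECKS.get(code, "not in this small table")
    return f"0x{code:X}: {name}"


print(describe_bugcheck("0x00000124"))
```

The point is only that 0x00000124 in the log and "Stop 0x124" are the same code, and that it names a hardware error rather than a video driver fault.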

I'll try giving it a cleaning. Maybe the hardware is just getting a little old. It's been "rode hard" all its life, but the overclock is so mild, I didn't think it would cause any damage, even long-term. I only notched up the vcore a tiny, tiny, bit over stock. Something under 1.3v, which is more than safe for 45nm Core2-era CPUs.

Edit: Perhaps it's jealous of the new member of the family, a Gateway 20" AIO PC I recently bought. :p

http://www.sevenforums.com/bsod-help-support/90980-bugcheck-124-a.html
http://msdn.microsoft.com/en-us/library/windows/hardware/ff557321(v=vs.85).aspx

Edit: I checked the primary cruncher and temps, and they were no higher than the secondary. Both were basically the same.

So I notched up the vcore on the secondary one step.
 

Fardringle

Diamond Member
Oct 23, 2000
9,183
751
126
Try running WhoCrashed on the machine. That can usually give a better idea of which program, driver, or device is causing the errors.

Those temperatures are pretty high, though, so I'd definitely give it a cleaning, make sure all of the fans are spinning properly, and maybe replace the thermal paste if it hasn't been done for a long time.
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
My secondary Q9300 @ 3.0, with HD4850, has been hard at work crunching PrimeGrid, and I wonder if the (temp.) load is just too much for it.

CoreTemp measured the cores having a max temp of 77C, and GPU-Z records the HD4850 at 6000RPM and 97C.

At the least your CPU is running too hot; max safe temp is 71.4C. Now maybe that's not causing the reboot, but it ain't healthy for it in the long run!
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
At the least your CPU is running too hot, max safe temp is 71.4C,
That is the Tcase temp. I am not sure what Tcase is, but for my i7 920 that site says it is 67.9C, and my i7 920 has been at 75-85C for 42,000+ hours!!
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Lol, interesting! At nearly 5yrs that's a long time! ;)

I just fired up CoreTemp & it shows a Tj. Max of 100C for my CPU (almost the same as the Q9400, bar speed & cache size), which should mean his 77C is well within limits, hmm.
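As a rough sketch of that comparison: the headroom is just Tj. Max minus the hottest core reading. The `tj_headroom` name is made up for illustration, and it is an assumption that the Q9300 reports the same 100C Tj. Max as the CPU checked above.

```python
def tj_headroom(core_temps_c, tj_max_c=100):
    """Degrees C between the hottest core and Tj. Max (assumed to be 100C here)."""
    return tj_max_c - max(core_temps_c)


# 77C is the max core temp reported in the first post.
print(tj_headroom([77]))  # 23
```

Note that the 71.4C figure is a Tcase value, so it is not directly comparable with the core readings used here.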
 

VirtualLarry

No Lifer
Aug 25, 2001
56,225
9,987
126
The primary Q9300 cruncher, which has been running 24/7 for a while with no known unexpected reboots in recent memory, reports:

Max TJ temps (C), using CoreTemp:
77
77
76
77

Max GPU temps (C), using GPU-Z:
GPU temp: 100C
shadercore: 106C

So I don't think that the temps on the secondary rig are the primary problem. I could be wrong. I usually don't have a pure PrimeGrid load on the CPU + GPU though (although I have for the last week or so).
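The reasoning above can be sketched as a quick side-by-side check, using only the readings posted in this thread. The dictionary layout and the `hotter_readings` helper are made up for illustration.

```python
# Readings posted in this thread (degrees C).
secondary = {"cpu_max_c": 77, "gpu_c": 97}                    # the rig that reboots
primary = {"cpu_max_c": max([77, 77, 76, 77]), "gpu_c": 100}  # the stable rig


def hotter_readings(suspect, reference):
    """Return the sensors where the suspect rig reads hotter than the reference."""
    return [name for name in suspect if suspect[name] > reference[name]]


print(hotter_readings(secondary, primary))  # []
```

An empty list means the rebooting rig is not running hotter than the stable one on either sensor. This fits the conclusion that temps alone are probably not the cause, although sensors that were not logged (e.g. the VRMs) could still differ.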
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Don't you mean measured Tj temps rather than max? Surely your Tj. Max is the same as my CPU's, at 100C?

Have you checked load voltages using a DMM? (Don't use BIOS readings unless you already know how they compare to real readings, as they tend to be inaccurate.)
Maybe your PSU is failing? (I had lots of Antec units fail on me, & I don't buy them any more! :p)
The +V rail tolerance is 5% max, +/-. It's also worth checking the ATX plug & socket for charring.
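To make the 5% figure concrete, here is a small sketch that turns it into an allowed range for each positive rail and checks a DMM reading against it. The function names are made up for illustration and the example readings are hypothetical, not measurements from this machine.

```python
NOMINAL_RAILS_V = (12.0, 5.0, 3.3)  # the positive ATX rails


def rail_limits(nominal_v, tolerance=0.05):
    """Return the (low, high) voltage limits for a rail at +/- tolerance."""
    return nominal_v * (1 - tolerance), nominal_v * (1 + tolerance)


def rail_in_spec(measured_v, nominal_v, tolerance=0.05):
    """True if a DMM reading falls within the tolerance band of its rail."""
    low, high = rail_limits(nominal_v, tolerance)
    return low <= measured_v <= high


for nominal in NOMINAL_RAILS_V:
    low, high = rail_limits(nominal)
    print(f"+{nominal}V rail: roughly {low:.2f}V to {high:.2f}V")

# Hypothetical readings taken under load:
print(rail_in_spec(11.9, 12.0))  # inside the band
print(rail_in_spec(11.2, 12.0))  # sagging below the band
```

A 12V rail that sags toward the low end only under full CPU + GPU load would fit random reboots better than an idle reading would, so the measurement is worth taking while crunching.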
 