Overclock suddenly unstable

keldog7

Senior member
Dec 1, 2005
235
0
0
Can anyone suggest why my overclocked chip has suddenly started running hotter, and is now prime unstable? No hardware changes (see sig) or BIOS changes were made. The only difference is that the machine was turned off for about 3 weeks after the initial overclock was set up, and it has been physically moved to a new location (gently, of course). Previously, it was 12 hours dual core prime stable with idle at 39 C and load at 46 C.
Now, I'm having failures on EITHER core (!) at random times...sometimes 3 minutes, sometimes 3 hours. Idle temps are now 44 C, with load up to 53 C. The room temperature IS up just a bit - maybe increased from 19 to 22 degrees C. Ambient temps in the case are running about 34 C - really unchanged from the previous location.

Can anyone help? Can a Scythe Ninja stop working/break? Would going from 46 to 53 degrees cause the chip to be prime unstable? Would you suggest I take it apart and re-mount the heatsink?...if so, A Zalman 9500 is going in its place...
-A
 

keldog7

Senior member
Dec 1, 2005
235
0
0
*sigh* somewhere inside, I knew that's the first thing which needed to be done... You know how hard it is to re-mount that HSF without pulling the MB out of the case?? Nigh impossible...hence the Zalman swap... Any other suggestions?
 

keldog7

Senior member
Dec 1, 2005
235
0
0
New location is *barely* warmer - maybe 1-2 degrees, no more. Ventilation out the back of the case might be a bit more impeded BUT internal case ambient temps are not really different. As for dust...all this again assumes the case internal temp is higher...and it isn't.
-A
 

robertk2012

Platinum Member
Dec 14, 2004
2,134
0
0
Open the side of the case and see what happens. Maybe your airflow changed somehow. Cable moved maybe?
 

keldog7

Senior member
Dec 1, 2005
235
0
0
Sorry - thought you were just throwing out a comment, and didn't expect that you were waiting for a reply. Anyway, I'm a bit dumbfounded and embarrassed. Let me explain...

Took side panel off while running - lots of airflow, no movement of cables etc., and the fans are running as expected. However, I had to take a bathroom break before I had a chance to put the case back together... When I came back I noticed the temperature readings were all way down. I had previously mis-read the GPU temp as the case ambient... Long story short, the GPU hasn't changed temp (was always 34), but the case ambient (when wide open) has dropped from 34 to 27 C. As a result, the processor is now idling at 38 C and running dual mprime (Prime95) at 47 C. I'll have to let it go for a while to see if it is once again prime stable. So clearly, this is an airflow issue.

However, once I have run mprime for a while, this raises another point...the change in operating temp from 47 to 53 may have made the chip unstable - yet it is still operating CLEARLY within AMD's spec of 65 C max for an Opteron. Can anyone comment on this? I can't believe that it's simply a result of overclocking, because they're all basically the same chip - just speed-binned differently. By implication, I suppose the question is: Does lowering the CPU chip's temperature always allow it to be clocked faster?

Cheers,
A
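
For anyone following the numbers, here is a minimal sketch of the headroom arithmetic from the post above. It only uses the figures quoted there (53 C load with the panel on, 47 C with it off, and the 65 C limit the poster attributes to AMD); the function name `headroom` is made up for illustration, and staying under the limit says nothing about whether an overclock is stable.

```python
# Temperatures in degrees C, taken from the post above.
AMD_MAX_C = 65       # Opteron limit as quoted by the poster
LOAD_CLOSED_C = 53   # load temperature with the side panel on
LOAD_OPEN_C = 47     # load temperature with the side panel off


def headroom(load_c, limit_c=AMD_MAX_C):
    """Degrees C left between a load temperature and the quoted limit."""
    return limit_c - load_c


if __name__ == "__main__":
    for label, temp in (("panel on", LOAD_CLOSED_C), ("panel off", LOAD_OPEN_C)):
        print(f"{label}: {temp} C under load, {headroom(temp)} C below the limit")
    print(f"difference from removing the panel: {LOAD_CLOSED_C - LOAD_OPEN_C} C")
```

Both cases sit well below the quoted limit, which is why the question of whether a 6 C change alone can break stability is worth asking.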
 

lopri

Elite Member
Jul 27, 2002
13,314
690
126
Originally posted by: keldog7
By implication, I suppose the question is: Does lowering the CPU chip's temperature always allow it to be clocked faster?

Cheers,
A

Yes.

Why do you think those hardcore OC'ers use dry ice, LN, etc.? ;) Their chips might do >3GHz with their fancy cooling devices, but that doesn't mean the same chips will do 3GHz @65C, although that might still be under AMD's specification. :D

AMD's specifications basically assume everything stock, normal environment.
 

keldog7

Senior member
Dec 1, 2005
235
0
0
Hmm. I guess I never really thought of it that way.
I assumed that using these other cooling methods allowed the voltage to be increased markedly (resulting in additional heat dissipation), so that the chip would be stable at faster clocks. I also assumed that provided the chip temp was in a good range, that with enough volts (within reason), it will usually run faster. This isn't the case?
Incidentally then, exactly how hot is an AMD Opteron when under full load, with the stock cooling solution? I guess that should be my target temperature to achieve stability? Something sounds fishy here...it sounds like you're telling me that as long as I keep the chip at X degrees or below, it will be prime stable... This is not in the "overclockers manual" as far as I know....

Does it make more sense to "bin" a chip by using a specified temperature? In this way, someone buying a new Opteron could get a chip which is "guaranteed to run at 2.2 GHz, at stock temp of 41C".
In any event, I've discovered that my chip is no longer stable, even now that I'm back down to a load temp of 46 C. Second core is crapping out at about the 2-3 hour mark when priming. Recall that it WAS stable at this temp...12 hours +. I suppose I'm going to have to cool it down some more?

-A
 

keldog7

Senior member
Dec 1, 2005
235
0
0
It's not the RAM. RAM proven stable with 90 hour MEMTEST86+ run at current settings, with processor multiplier decreased to 7x, using same FSB.
...that is, unless you can argue otherwise.
-A
 

robertk2012

Platinum Member
Dec 14, 2004
2,134
0
0
Originally posted by: keldog7
It's not the RAM. RAM proven stable with 90 hour MEMTEST86+ run at current settings, with processor multiplier decreased to 7x, using same FSB.
...that is, unless you can argue otherwise.
-A

There is an onboard memory controller in the CPU. It may be the cause of the instability. Just try it so we can rule that out.

Thanks

Oh, and temp isn't the only thing that causes instability, and just because you have good temps doesn't mean it will be stable.
 

keldog7

Senior member
Dec 1, 2005
235
0
0
I'm not sure you understand...the RAM was tested first at the rated speed, with CPU underclocked. 90 hours stable.
Then, the RAM was underclocked with a big divider, and the CPU was overclocked to its current 2.5 GHz (250MHz FSB, with a 10x multi). It was 12 to 18 hours stable (can't recall exactly, as it was 2 months ago). When I put the two together (both overclocked), it continued to be stable - but I got impatient at 8 hours.
Lately, my priming (2 instances) fails at anywhere from 2 minutes(!), up to 6 hours...doesn't always seem to be the same core either... Very odd.
Still, I asked for suggestions...
I'll retry both overclocks independently, and post the results in a few days (going skiing x4 days, the day after tomorrow...)
Oh, and I know that the temp isn't the only factor, but let's face it....it *was* stable, and the rest of the hardware didn't change...hmmm...might have re-enabled an onboard feature (eth1)... As I say, gimme a few days.
-A
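
A quick sketch of the clock arithmetic behind the two test setups described in this thread, for readers who want the numbers side by side. It assumes "same FSB" in the memtest run means the same 250 MHz reference clock as the overclock; the helper name `cpu_clock_mhz` is hypothetical.

```python
def cpu_clock_mhz(reference_mhz, multiplier):
    """Core clock in MHz: reference clock ("FSB" in the thread) times multiplier."""
    return reference_mhz * multiplier


REFERENCE_MHZ = 250  # figure given in the post; assumed to apply to both runs

if __name__ == "__main__":
    # Overclocked CPU run: 10x multiplier
    print("prime run:  ", cpu_clock_mhz(REFERENCE_MHZ, 10), "MHz")
    # Memory test run: multiplier dropped to 7x
    print("memtest run:", cpu_clock_mhz(REFERENCE_MHZ, 7), "MHz")
```

Under that assumption the 90 hour memory test ran the CPU at 1750 MHz rather than 2500 MHz, so it exercised the RAM at speed but not the CPU (including its on-die memory controller) at the overclocked core clock.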
 

robertk2012

Platinum Member
Dec 14, 2004
2,134
0
0
or the first time you might have just got lucky.

It's quite easy to change the memory to CAS 3 and then run prime.

You may just be at your max overclock. Was 2.5 as high as you could get with the memory turned down?
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,067
3,573
126
did u use the standard thermal paste? or did u use AS5? if you used AS5 did u only apply a little dot, vs covering the entire top of the cpu with AS5?

AS5 will help cooling if applied properly, but if u apply too much, it can do the exact opposite.

If u used the standard thermal paste then nm, there isn't a very big noticeable temp difference from using the standard white stuff vs AS5.
 

keldog7

Senior member
Dec 1, 2005
235
0
0
robertk2012: I never pushed past the 2.5 with the RAM. At 90 hours stable, that hardly sounds lucky...it also sounds damn stable.

aigomorla: I used AS Ceramique, with a small dot applied (after pre-treating the base of the heatsink) to the CPU. All of it done exactly as the Arctic Silver website tells you to.

Otherwise, still haven't had the time to re-run benchmarks...will have to wait until about March 5th.

Cheers,
A
 

nealh

Diamond Member
Nov 21, 1999
7,078
1
0
it seems to me this is related to drivers and software

I have a 165 and 170 I ran both Dual Prime 95 for 15 hrs then 2 days later same exact setting with no software changes ..it fails..retweak..goes again for 12-15hrs...bizarre


if you passed once with no issues..do not go back and retest..seems to be a waste

I think a lot of it is dll and xp

at one point having my monitor turned off by xp with certain video drivers seemed to be the issue with failing prime 95

I really believe Prime 95 needs a very stable winxp platform for proper testing..all the screwing around I do ..seems to cause problems
 

keldog7

Senior member
Dec 1, 2005
235
0
0
Well...I *really* doubt that XP and the .dll thing is the problem... Mainly because neither of those ugly kludges is installed on my system. Generally running Mandriva 2006 x86-64 build, with 2.6.12-17SMP kernel. As a result, I'm running "mprime" - the Linux version of "Prime95" - with A0 and A1 [processor] affinities. I'm also using memtest86+ with the bootable CD option, which is (unless I'm mistaken) booting from a Linux core variant directly into the application.

Anyway, informal testing so far supports the temperature suggestions, that others have made. If I keep her below 43 C, she generally runs stable.
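
As a side note on the per-core affinities mentioned above, here is a minimal sketch of the general Linux mechanism for pinning a process to one core. This is not how mprime sets its own affinities; it is just an illustration using Python's standard `os.sched_setaffinity` (Linux only, Python 3.3 or later). The helper name `pin_to_core` is hypothetical.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict process `pid` (0 means the calling process) to a single core.

    Returns the affinity set in effect afterwards.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # cores this process may currently use
    first = min(allowed)
    print("pinned to:", pin_to_core(first))
    os.sched_setaffinity(0, allowed)    # restore the original affinity
```

Processes started after the call inherit the affinity, so a stress test launched from a pinned parent stays on that core. Running one pinned instance per core makes it easier to tell which core produced a failure.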
 

professor1942

Senior member
Dec 22, 2005
509
0
0
Cut a hole in the side of your case and add another fan :p

Seriously though, look at the case in my link - it may look freakish but the CPU is 7 degrees (Celsius) cooler at idle and 10-12 cooler under load than when I put it in my Lian Li.
 

keldog7

Senior member
Dec 1, 2005
235
0
0
As far as cutting open my [relatively] new P180... Not a freaking chance.
*BUT* I have identified airflow (or at least the ambient air temp inside the case) as a MAJOR contributor to the problem in my PC case. I've got a whole pile of temperature benchmarks with the front door of the P180 open / closed, front fan grille in / out, front fan running / stopped, air filter in / out etc etc, all made under 100% LOAD. I was surprised by the results! The front grille, as open and innocuous as it looks, seems to make the biggest difference - I suspect because its angle alters the direction of the airflow in a non-ideal way.
Anyway, one of these days, I'll get around to posting the results for all to see (and flame my methods, I'm sure...)
-A