Odd OC problem -- Not for noobs

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
OK, so I have a 920 D0 and a Megahalem on an ASRock Extreme, and 6 GB of OCZ 1600 @ stock. At first I had it at 4.0, then down to 3.9, then to 3.8 (in the first few days). It was doing 3.8 24/7 at 100% load with HT on, 2 SMP F@H units, and 2 GPUs @ 100%. Then I burned out a 700 watt PSU (possibly due to a short in a SATA cable, not sure). So I put in a 1010 watt OCZ, and I was then able to maintain 3.9 again.

2 weeks later, all of a sudden it would lock up once a day. So back to 3.8 (just 190 instead of 195 BCLK). It couldn't go 24 hours without locking up. So down to 3.6 (just the multi down one). Then 3.4, then 3.2. After 3.2 failed with all else equal, I said WTF, and changed PSUs to a Corsair 750. No diff. So then I went to stock on all settings, except no EIST, C1E, etc., but it still has HT. So far, 3 days stable.
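For anyone following the numbers, the step-downs above are just BCLK times multiplier. This is a rough sketch only: the 195 and 190 BCLK values come from the post, while the multipliers (20, 19, 18, 17) and the 133 MHz stock BCLK are assumptions inferred from the quoted speeds, not settings the poster listed.

```python
# Core clock = BCLK (MHz) x CPU multiplier.
# Multipliers here are inferred from the speeds in the post (assumption).

def core_clock_ghz(bclk_mhz: float, multiplier: int) -> float:
    """Return the CPU core clock in GHz for a base clock and multiplier."""
    return bclk_mhz * multiplier / 1000.0

# (label, BCLK in MHz, multiplier)
settings = [
    ("3.9", 195, 20),
    ("3.8", 190, 20),
    ("3.6", 190, 19),
    ("3.4", 190, 18),
    ("3.2", 190, 17),
    ("stock", 133, 20),
]

for label, bclk, multi in settings:
    clock = core_clock_ghz(bclk, multi)
    print(f"{label:>5}: {bclk} x {multi} = {clock:.2f} GHz")
```

If the 3.6, 3.4 and 3.2 steps really were multiplier-only changes, BCLK stayed at 190 the whole time, so the uncore and memory clocks were not relaxed along with the cores.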

So after many years of OC'ing, this is very odd. And temps are not an issue. The most it ever saw was 1.35 vcore @ 3.9 and 68°C.

So what do you all think is the problem? Did I kill the motherboard? Remember, it still works fine @ stock, and it used to work fine for weeks @ 3.9. All other hardware is fine.

Oh, and I did memtest @ 1.65 vdimm, and it passed with flying colours @ the 3.8 GHz settings (just one hour).
 

faxon

Platinum Member
May 23, 2008
2,109
1
81
Sounds like your PSU failure damaged your board and/or chip somehow. You got another board you can try it in?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
Nope....Hence the post. But normally, when a PSU dies, it takes things out, not just kills your OC. And what about the 2 weeks @ 3.9 right after that ????

Very odd, but that is my first suspect.....Just hate to spend $200-300 to find out.

I do have 12 other 775 socket boards....(see sig) but I haven't converted to 1366 broadly yet.
 

Tsavo

Platinum Member
Sep 29, 2009
2,645
37
91
I'd guess mainboard.

Try juicing up the RAM a bit and/or relaxing the timings.

Also, it has been my experience that Memtest doesn't stress the piss out of the RAM enough. I've had RAM pass with flying colors but would wet itself doing intensive RAM operations.
 

Visaoni

Senior member
May 15, 2008
213
0
0
It sounds like one of those situations where you just have to try switching things out; in this case most notably the motherboard. Even if you do get it back up to where it was, you will probably never know exactly what happened. The two weeks of service after the psu failure is just odd.

At any rate, this is a little off topic, but I have to ask. Where the heck do you keep all of those active machines?
 

SanDiegoPC

Senior member
Jul 14, 2006
460
0
0
It sounds like one of those situations where you just have to try switching things out; in this case most notably the motherboard. Even if you do get it back up to where it was, you will probably never know exactly what happened. The two weeks of service after the psu failure is just odd.
Agree completely! Both counts.

At any rate, this is a little off topic, but I have to ask. Where the heck do you keep all of those active machines?

yea, what are all those machines doing? Running a company?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
All in my house..... 4 in the computer room, 4 in the dining room (it's a bachelor's house, never used), 1 in my son's room, 3 in my bedroom, and they all do F@H.

Just blew a breaker tonight. We used the microwave, and it was too much juice!
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
PWM partially damaged on board is a good bet. I had it happen on a DFI X48 board. Interestingly enough it would still OC a 65nm cpu but put a Yorkie in there and all hell would break loose! Even stock. It happens. It's one of the joys of overclocking - an excuse to buy something newer and better. ;)
 

nyker96

Diamond Member
Apr 19, 2005
5,630
2
81
When it comes to this type of troubleshooting, I find the process of elimination quite useful. Your cooling is no problem, and the PSU you swapped a few times should be no problem, so the only things that remain are the motherboard, CPU and RAM. Now, I had some Ballistix die on me back in the day and they just wouldn't start, so your symptoms don't suggest that. So what is left is the board/CPU, and I'd put my cash on the board being damaged. Could you do a visual inspection of the caps? See if any have leaks (sometimes it's subtle; look underneath to see if any are displaced or tilted. It might not be visible on a solid cap, but it's worth a check). Because all these boards nowadays are phased, when some caps go the board might not be gone; it might still limp along until you push it, like in an OC. I'd say it's the board, and stop throwing more PSUs at it, as you might just damage them as well. From your sig you probably don't have another 1366 board to test things out. I'd get the board replaced as the logical next step. Or if you can find a friend with a 1366 setup, swap the CPU/memory into his machine and see if it can do 3.8 there.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
Well, it seems everybody agrees, the motherboard is the culprit. I just can't believe it after 2 weeks @ 3.9, but.....It's the only thing that makes sense.
 

DrMrLordX

Lifer
Apr 27, 2000
22,702
12,652
136
What kind of PWM cooling does that ASRock board have on it anyway? I'm wondering if this was a temp-related issue or something else.
 

Gillbot

Lifer
Jan 11, 2001
28,830
17
81
PWM partially damaged on board is a good bet. I had it happen on a DFI X48 board. Interestingly enough it would still OC a 65nm cpu but put a Yorkie in there and all hell would break loose! Even stock. It happens. It's one of the joys of overclocking - an excuse to buy something newer and better. ;)

This would be my guess, it can handle the load at stock but the stress of OC makes it buckle.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
What kind of PWM cooling does that ASRock board have on it anyway? I'm wondering if this was a temp-related issue or something else.

ImageGallery.aspx

Looks pretty good to me !

Edit (why doesn't the image show....)
 

mav451

Senior member
Jan 31, 2006
626
0
76
My bet is CPU degradation/damage from that PSU failure, considering that the same mobo could sustain both 3.9 GHz and stock speeds. CPUs can degrade... but I don't think I've seen mobos degrade, especially since the IMC is now on the CPU. I mean we've got the QPI I/O and southbridge... but I don't really put those at fault when it can still run stock and be stable.

Seems to me if you're varying CPU speeds for stability that the CPU is the key here.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
My bet is CPU degradation/damage from that PSU failure, considering that the same mobo could sustain both 3.9 GHz and stock speeds. CPUs can degrade... but I don't think I've seen mobos degrade, especially since the IMC is now on the CPU. I mean we've got the QPI I/O and southbridge... but I don't really put those at fault when it can still run stock and be stable.

Seems to me if you're varying CPU speeds for stability that the CPU is the key here.

The only way to tell is try another motherboard. I agree 1366 boards are a ripoff. $500 is a LOT for a (non server) motherboard! I doubt we'll see cheap Amptron 1366 boards. Even if we did I would not want one!

As far as PWM cooling goes despite the factory supplied solution appearing adequate (most are) you may suffer from a poor mount or insufficient airflow. I always check temps (the Classified shows VRM temps in BIOS which is nice and revealed a mounting issue - 70°C stock in BIOS - 123°C under load! That would've blown out if left alone methinks.) before continuous deployment under heavy load. IR thermometers even with adjustable emissivity (REQUIRED when shooting metal!) are affordable now and no techie should be without one IMO. ;)
 

mav451

Senior member
Jan 31, 2006
626
0
76
I'm just wondering how he was able to run for so long if the PWM temps were bad, and then for the two weeks with the new PSU. If it was a temperature issue, why didn't it fail earlier, and why does it still fail at a clock as low as 3.2 GHz?

That said, he can probably just touch the PWM heatsink to see if it's working, which I think it would be if it was working well before the PSU failure.
 

fixxxer0

Senior member
Dec 28, 2004
357
0
0
I had a similar problem where I had a power supply fail on me and my OC was no longer stable.

I couldn't go much above the stock FSB without having issues. Luckily the board still ran stable at stock settings so I just kept it at stock, but it definitely lost its ability to OC.

A leaking/defective/exploded capacitor could just coincidentally happen around the same time too... you never know. But I would bet money that a new board would allow you to go back to the chip's maximum.
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Did you try running with a single DIMM installed? I had a similar issue with a C2Q a while back where all sticks passed memtest with flying colors, but would get intermittent errors. After testing single sticks in different slots, one of the DIMMs ended-up being the culprit. This would at least eliminate the RAM from the equation and leave just the CPU/MB as the culprit.
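The single-stick approach is easier to keep straight if you write the combinations down first. A minimal sketch, assuming three sticks and three slots (the stick and slot labels are made up for illustration, and the pass/fail results are entered by hand after each run):

```python
from itertools import product

def single_dimm_plan(sticks, slots):
    """List every (stick, slot) pairing so each stick is tested alone in each slot."""
    return list(product(sticks, slots))

def suspects(results):
    """Given {(stick, slot): passed}, return (bad_sticks, bad_slots).

    A stick is suspect if it never passed in any slot it was tested in;
    a slot is suspect if no stick ever passed in it.
    """
    sticks = {stick for stick, _ in results}
    slots = {slot for _, slot in results}
    bad_sticks = sorted(
        s for s in sticks
        if not any(ok for (stick, _), ok in results.items() if stick == s)
    )
    bad_slots = sorted(
        s for s in slots
        if not any(ok for (_, slot), ok in results.items() if slot == s)
    )
    return bad_sticks, bad_slots

# Hypothetical labels: three sticks, three slots.
plan = single_dimm_plan(["A", "B", "C"], ["slot1", "slot2", "slot3"])
for stick, slot in plan:
    print(f"test stick {stick} alone in {slot}")
```

Nine runs is a lot of waiting when the lockup takes 12+ hours to show, so in practice you would probably stop as soon as one stick or slot stands out.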
 

deimos3428

Senior member
Mar 6, 2009
697
0
0
2 weeks later, all of a sudden it would lock up once a day. So back to 3.8 (just 190 instead of 195 BCLK). It couldn't go 24 hours without locking up. So down to 3.6 (just the multi down one). Then 3.4, then 3.2. After 3.2 failed with all else equal, I said WTF, and changed PSUs to a Corsair 750. No diff. So then I went to stock on all settings, except no EIST, C1E, etc., but it still has HT. So far, 3 days stable.
I'll just throw this other possibility out there as a long shot. It's possible the instability has nothing to do with overclocking at all.

It didn't seem to make a difference whether you're at 3.2 or 3.9 on the 1010W OCZ, or which PSU you used at 3.2. Stock worked but that's also way down at 2.66 in ultra-stable land, possibly stable enough to compensate for other instabilities.

You also mentioned a PSU failure from an unknown source, and that you blew a breaker. If it was the breaker that is attached to the PSU, you may have a problem with the AC circuit.

But yeah, it's probably the motherboard. Try reseating it in case there's just something shorting it out before replacing it.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
Any aftermarket software that checks that? Also, it will run even 3.9 for 12-18 hours before it locks up.

Also, this is in an Antec 900 case with all the fans on medium. Lots of airflow. And the ambient is 68°F now, since the window is open with a fan blowing in, and it's 32°F out there. (4 comps in this room)
 

fixxxer0

Senior member
Dec 28, 2004
357
0
0
What OS? 64-bit, I assume?

Are you getting minidumps generated when the computer crashes? Open up those minidumps if you are. It could be a driver flaking out like I had... my ATI Win7 x64 drivers were causing strange intermittent lockups.

After reinstalling Windows I finally got everything working fine again.
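A quick way to see whether any minidumps exist, and when they were written, is sketched below. It assumes Windows is writing small dumps to the default `C:\Windows\Minidump` folder; if the dump location was changed in the Startup and Recovery settings, the path will differ.

```python
from datetime import datetime
from pathlib import Path

def list_minidumps(dump_dir):
    """Return (filename, modified time) for each .dmp file, newest first."""
    folder = Path(dump_dir)
    if not folder.is_dir():
        return []  # folder missing: no dumps have been written there
    dumps = sorted(
        folder.glob("*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in dumps]

if __name__ == "__main__":
    # Default small-dump location (assumption: not changed from the default).
    for name, when in list_minidumps(r"C:\Windows\Minidump"):
        print(f"{when:%Y-%m-%d %H:%M}  {name}")
```

An empty list tells you something too: a hard freeze that never reaches a blue screen usually leaves no dump behind, which would point away from a driver crash and back toward hardware.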
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,097
16,014
136
it locks up. No video at all. Win XP 32 bit. There is no screen saver at all (the GPU F@H client hates that). Blank screen in 10 minutes is all.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
it locks up. No video at all. Win XP 32 bit. There is no screen saver at all (the GPU F@H client hates that). Blank screen in 10 minutes is all.

Sounds like the box needs to be pulled and tested - memtest, occt, linpack etc. Disable ALL power management and screen savers and see what happens. I abhor the vanishing box! D:
 

dguy6789

Diamond Member
Dec 9, 2002
8,558
3
76
It's the motherboard I believe. I once experienced the exact symptoms you describe a few years back with a Pentium D. Degrading overclock, complete lockups started happening to a once completely rock solid and stable system. After a pretty long time of troubleshooting and being baffled, I looked very carefully at the motherboard and noticed that the cpu power port on the motherboard was damaged. Replaced the motherboard, problem solved.