Odd OC problem -- Not for noobs

Markfw · Dec 29, 2009

OK, so I have a 920 D0 and a Megahalems on an ASRock Extreme, with 6 gig of OCZ 1600 @ stock. At first I had it at 4.0, then down to 3.9, then to 3.8 (in the first few days). It was doing 3.8 24/7 at 100% load with HT on, 2 SMP F@H units, and 2 GPUs @ 100%. Then I burned out a 700 watt PSU (possibly due to a short in a SATA cable, not sure). So I put in a 1010 watt OCZ, and I was then able to maintain 3.9 again.

2 weeks later, all of a sudden it would lock up once a day. So back to 3.8 (just 190 instead of 195 BCLK). It couldn't go 24 hours without locking up. So down to 3.6 (just the multi down one). Then 3.4, then 3.2. After 3.2 failed with all else equal, I said WTF, and changed PSUs to a Corsair 750. No diff. So then I went to stock, all settings at default except EIST, C1E, etc. disabled, but HT still on. So far, 3 days stable.

So after many years OC'ing, this is very odd. And temps are not an issue. The most it ever saw was 1.35 vcore @ 3.9 and 68C.

So what do you all think is the problem? Did I kill the motherboard? Remember, it still works fine @ stock, and it used to work fine for weeks @ 3.9. All other hardware is fine.

Oh, and I did memtest @ 1.65 vdimm, and it passed with flying colours @ the 3.8 GHz settings (just one hour).
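For reference, the clock steps described in this post follow from core clock = BCLK × multiplier. Here is a minimal sketch in Python, assuming a multiplier of 20 at the 3.9 and 3.8 settings (inferred from the 195 and 190 BCLK figures) and 19 for the 3.6 step ("just the multi down one"). The function name is only illustrative, and the post does not say how the 3.4 and 3.2 steps were reached, so they are left out.

```python
def core_clock_mhz(bclk_mhz, multiplier):
    """Core clock in MHz as base clock (BCLK) times CPU multiplier."""
    return bclk_mhz * multiplier


# (BCLK in MHz, multiplier) pairs; the multipliers are assumptions
# inferred from the figures in the post, not values the poster stated.
steps = [
    (195, 20),  # the 3.9 setting
    (190, 20),  # the 3.8 setting: same multi, BCLK dropped from 195 to 190
    (190, 19),  # the 3.6 setting: multi down one at the same BCLK
]

for bclk, multi in steps:
    ghz = core_clock_mhz(bclk, multi) / 1000
    print(f"{bclk} x {multi} = {ghz:.2f} GHz")
```

Dropping BCLK also lowers the memory and uncore clocks that are derived from it, while dropping only the multiplier leaves those alone, which is why the two kinds of step are worth telling apart when narrowing down what is unstable.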

faxon · Dec 29, 2009

Sounds like your PSU failure damaged your board and/or chip somehow. You got another board you can try it in?

Markfw · Dec 29, 2009

Nope....Hence the post. But normally, when a PSU dies, it takes things out, not just kills your OC. And what about the 2 weeks @ 3.9 right after that ????

Very odd, but that is my first suspect.....Just hate to spend $200-300 to find out.

I do have 12 other 775 socket boards....(see sig) but I haven't converted to 1366 broadly yet.

Tsavo · Dec 29, 2009

I'd guess mainboard.

Try juicing up the RAM a bit and/or relaxing the timings.

Also, it has been my experience that Memtest doesn't stress the piss out of the RAM enough. I've had RAM pass with flying colors but would wet itself doing intensive RAM operations.

F1N3ST · Dec 29, 2009

Shit's weak.

Visaoni · Dec 29, 2009

It sounds like one of those situations where you just have to try switching things out; in this case most notably the motherboard. Even if you do get it back up to where it was, you will probably never know exactly what happened. The two weeks of service after the psu failure is just odd.

At any rate, this is a little off topic, but I have to ask. Where the heck do you keep all of those active machines?

SanDiegoPC · Dec 30, 2009

Visaoni said:
It sounds like one of those situations where you just have to try switching things out; in this case most notably the motherboard. Even if you do get it back up to where it was, you will probably never know exactly what happened. The two weeks of service after the psu failure is just odd.

Agree completely! Both counts.

At any rate, this is a little off topic, but I have to ask. Where the heck do you keep all of those active machines?

Yeah, what are all those machines doing? Running a company?

Markfw · Dec 30, 2009

All in my house..... 4 in the computer room, 4 in the dining room (it's a bachelor's house, never used), 1 in my son's room, 3 in my bedroom, and they all do F@H.

Just blew a breaker tonight. We used the microwave, and it was too much juice!

Rubycon · Dec 30, 2009

PWM partially damaged on board is a good bet. I had it happen on a DFI X48 board. Interestingly enough it would still OC a 65nm cpu but put a Yorkie in there and all hell would break loose! Even stock. It happens. It's one of the joys of overclocking - an excuse to buy something newer and better. 😉

nyker96 · Dec 30, 2009

When it comes to this type of troubleshooting, I find the process of elimination quite useful. Your cooling is no problem, and the PSU you swapped a few times should be no problem, so the only things that remain are the motherboard, CPU and RAM. Now, I had some Ballistix die on me back in the day and they just wouldn't start, so your symptoms don't suggest that. So what is left is the board/CPU, and I'd put my cash on the board being damaged.

Could you do a visual inspection of the caps? See if any have leaks (sometimes it's subtle; look underneath and see if any are displaced or tilted; it might not be visible on a solid cap but it's worth a check). Because all these boards nowadays are phased, when some caps go the board might not be gone; it might still limp along until you push it, like in an OC. I'd say it's the board, and stop throwing more PSUs at it, you might just damage them as well. From your sig you probably don't have another 1366 board to test things out, so I'd get the board replaced as the logical next step. Or if you can find a friend with a 1366 setup, swap the CPU/mem into his machine and see if it can go 3.8 there.

Markfw · Dec 30, 2009

Well, it seems everybody agrees, the motherboard is the culprit. I just can't believe it after 2 weeks @ 3.9, but.....It's the only thing that makes sense.

DrMrLordX · Dec 30, 2009

What kind of PWM cooling does that ASRock board have on it anyway? I'm wondering if this was a temp-related issue or something else.

Gillbot · Dec 30, 2009

Rubycon said:
PWM partially damaged on board is a good bet. I had it happen on a DFI X48 board. Interestingly enough it would still OC a 65nm cpu but put a Yorkie in there and all hell would break loose! Even stock. It happens. It's one of the joys of overclocking - an excuse to buy something newer and better. 😉

This would be my guess, it can handle the load at stock but the stress of OC makes it buckle.

Markfw · Dec 30, 2009

DrMrLordX said:
What kind of PWM cooling does that ASRock board have on it anyway? I'm wondering if this was a temp-related issue or something else.

Looks pretty good to me !

Edit: why doesn't the image show....

mav451 · Dec 30, 2009

My bet is CPU degradation/damage from that PSU failure - considering that the same mobo could sustain both 3.9Ghz and stock speeds. CPUs can degrade... but I don't think I've seen mobos degrade - especially since the IMC is now on the CPU. I mean we got the QPI I/O and southbridge...but I don't really put those at fault when it can still run stock and be stable.

Seems to me if you're varying CPU speeds for stability that the CPU is the key here.

Rubycon · Dec 30, 2009

mav451 said:
My bet is CPU degradation/damage from that PSU failure - considering that the same mobo could sustain both 3.9Ghz and stock speeds. CPUs can degrade... but I don't think I've seen mobos degrade - especially since the IMC is now on the CPU. I mean we got the QPI I/O and southbridge...but I don't really put those at fault when it can still run stock and be stable.

Seems to me if you're varying CPU speeds for stability that the CPU is the key here.

The only way to tell is try another motherboard. I agree 1366 boards are a ripoff. $500 is a LOT for a (non server) motherboard! I doubt we'll see cheap Amptron 1366 boards. Even if we did I would not want one!

As far as PWM cooling goes, even though the factory-supplied solution appears adequate (most are), you may suffer from a poor mount or insufficient airflow. I always check temps before continuous deployment under heavy load. The Classified shows VRM temps in BIOS, which is nice, and that revealed a mounting issue: 70°C stock in BIOS, 123°C under load! That would've blown out if left alone, methinks. IR thermometers, even ones with adjustable emissivity (REQUIRED when shooting metal!), are affordable now and no techie should be without one IMO. 😉

mav451 · Dec 30, 2009

I'm just wondering how he was able to run for so long if the PWM temps were bad, and then the two weeks with the new PSU. If it was a temperature issue, why didn't it fail earlier and then why does it still fail with a clock as low as only 3.2Ghz?

That said, he can probably just touch the PWM heatsink to see if it's working, which I think it would be if it was working well before the PSU failure.

fixxxer0 · Dec 30, 2009

i had a similar problem where i had a power supply fail on me and my OC was no longer stable.

i couldn't go much above the stock FSB without having issues. luckily the board still ran stable at stock settings so i just kept it at stock, but it definitely lost its ability to OC.

a leaking/defective/exploded capacitor could just coincidentally happen around the same time too... never know. but i would bet money that a new board would allow you to go back to the chip's maximum

exar333 · Dec 30, 2009

Did you try running with a single DIMM installed? I had a similar issue with a C2Q a while back where all sticks passed memtest with flying colors, but would get intermittent errors. After testing single sticks in different slots, one of the DIMMs ended-up being the culprit. This would at least eliminate the RAM from the equation and leave just the CPU/MB as the culprit.

deimos3428 · Dec 30, 2009

Markfw900 said:
2 weeks later, all of a sudden it would lock up once a day. So back to 3.8 (just 190 instead of 195 BCLK). It couldn't go 24 hours without locking up. So down to 3.6 (just the multi down one). Then 3.4, then 3.2. After 3.2 failed with all else equal, I said WTF, and changed PSUs to a Corsair 750. No diff. So then I went to stock, all settings at default except EIST, C1E, etc. disabled, but HT still on. So far, 3 days stable.

I'll just throw this other possibility out there as a long shot. It's possible the instability has nothing to do with overclocking at all.

It didn't seem to make a difference whether you're at 3.2 or 3.9 on the 1010W OCZ, or which PSU you used at 3.2. Stock worked but that's also way down at 2.66 in ultra-stable land, possibly stable enough to compensate for other instabilities.

You also mentioned a PSU failure from an unknown source, and that you blew a breaker. If it was the breaker that is attached to the PSU, you may have a problem with the AC circuit.

But yeah, it's probably the motherboard. Try reseating it in case there's just something shorting it out before replacing it.

Markfw · Dec 30, 2009

Any aftermarket software that checks that? Also, it will run even 3.9 for 12-18 hours before it locks up.

Also, this is in an Antec 900 case with all the fans on medium. Lots of airflow. And the ambient is 68F now, since the window is open with a fan blowing in, and it's 32F out there. (4 comps in this room)

fixxxer0 · Dec 30, 2009

what OS? 64bit i assume right?

are you getting minidumps generated when the computer crashes?? open up those minidumps if you are, it could be a driver flaking out like i had... my ATI win7 x64 drivers were causing strange intermittent lockups.

after reinstalling windows i finally got everything working fine again.

Markfw · Dec 30, 2009

it locks up. No video at all. Win XP 32 bit. There is no screen saver at all (the GPU F@H client hates that). Blank screen in 10 minutes is all.

Rubycon · Dec 30, 2009

Markfw900 said:
it locks up. No video at all. Win XP 32 bit. There is no screen saver at all (the GPU F@H client hates that). Blank screen in 10 minutes is all.

Sounds like the box needs to be pulled and tested - memtest, occt, linpack etc. Disable ALL power management and screen savers and see what happens. I abhor the vanishing box! D:

dguy6789 · Dec 30, 2009

It's the motherboard I believe. I once experienced the exact symptoms you describe a few years back with a Pentium D. Degrading overclock, complete lockups started happening to a once completely rock solid and stable system. After a pretty long time of troubleshooting and being baffled, I looked very carefully at the motherboard and noticed that the cpu power port on the motherboard was damaged. Replaced the motherboard, problem solved.
