Phenom 9600 Large FFT problems

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Long story short, it's no longer stable at stock configuration. With "normal" voltages on everything and "auto" speed for everything, Prime95 Large FFT fails after about an hour. TLB patch on or off makes no difference; it fails in both cases after the same amount of time. I tried boosting northbridge and CPU voltages by 0.025V while keeping stock speeds on everything, but it still fails after an hour.

What's really upsetting me is how the test is failing. It's not just the standard error where Prime stops and the icon changes from green to red. The computer locks up and requires a hard reset. What kind of crash is that? It's not a temperature problem since all 4 cores run at about 47C.

What should I try doing next? Underclock the CPU? Underclock something else? Boost voltage a bit more? Kick the computer really hard?
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
This motherboard's BIOS doesn't allow memory tweaking. I can change it from the stock 800mhz down to something slower like 533mhz, but I can't change any of the timings. I'll give this a shot, but more ideas are always welcome.

I can't swap out memory sticks either since this computer only has 1 stick and I don't have any spares.
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
While the HSF doesn't have any dust in it, I noticed the northbridge is so hot that I can't hold my thumb against it. I touched a few of the larger capacitors and they are also incredibly hot.

I'll try underclocking it and lowering voltages all around. This computer's motherboard is one of those $50 ones on newegg so it's possible it really is not designed to power something as large as the phenom 9600. These things are rated at 125W which is really quite a bit.
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
I think I found the problem. I increased the FSB voltage by 0.05 and now it's prime stable.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: ShawnD1
The computer locks up and requires a hard reset. What kind of crash is that? It's not a temperature problem since all 4 cores run at about 47C.

It is a fairly common failure when you give WAAAAY too little voltage. On my laptop chip, when underclocked, I get hard lockups at 0.8v and below; random lockups and blue screens at 0.9v; "errors" in OCCT and Linpack, without a lockup, at 1.0 to 1.1v; and at 1.2v everything passes. (My VID is 1.4v.)

Have you been overclocking it? How old is it? As voltage passes through ANY printed circuit, the circuit physically degrades over time, causing gradually increasing leakage, and the voltage has to be raised to match. That is, a FRESH CPU can be undervolted safely, but you will have to keep raising the voltage bit by bit as it ages. The "default" is intentionally set high enough that 90% of CPUs will last 3 years at stock voltage and clockspeed without a voltage increase and without errors. (At least that is how Intel does it, as they have publicly stated; I don't know how AMD does it.)

as you increase the voltage to maintain stability, degradation occurs faster, and eventually it just fries the processor...
If it is out of warranty, well, tough luck. You can wring some more use out of it by increasing the voltage, but pretty soon it will die on you. Try bumping it by 0.1 or 0.2v over stock. If it is the CPU aging, then that should allow it to run well right now...
If it doesn't work, there is a good chance that it is your ram or mobo that is failing...

you could also have a CPU failure that no amount of extra voltage will fix...

be warned that 0.2v might be too much for some processors, I am not familiar with the phenom. You should look up the agreed upon "maximum safe overvolt" and just try that.

Also, if you have been overclocking while leaving your mobo on "auto" voltage, there is a good chance it overvolted your chip to UNsafe voltages, which damaged it.
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Originally posted by: cusideabelincoln
Try fiddling with the memory: Removing sticks, relaxing timings.

The instability problems came back with a vengeance last week and I've been trying to figure out what is going on. Some old posts on the AMD forum suggested running the memory below rated speed such as 667mhz instead of the memory's rated 800mhz. I set it to 667mhz and it worked. This system was only about 10 minutes OCCT stable at stock speeds, but it has been crunching away for more than 8 hours since lowering the ram to 667mhz.

Does it sound like this would be caused by my ram being literally the cheapest ram on Newegg or does it sound like a faulty memory controller? The reason I ask is that the person on the AMD forum had the exact same processor as me and other AMD people seemed to know what the problem was right away.
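
For a sense of what dropping the memory from 800 to 667 costs, here is a rough sketch of the peak-bandwidth arithmetic. It assumes DDR2 on a single 64-bit channel (the thread only says the machine has one stick), and the function name is made up for illustration. Real-world throughput will be lower than these theoretical peaks.

```python
# Peak theoretical bandwidth of one memory channel.
# Assumption: DDR2 on a single 64-bit (8-byte) channel; not stated in the thread.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_mb_s(mega_transfers_per_s: int) -> int:
    """Return peak bandwidth in MB/s for a data rate given in MT/s."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER


rated = peak_bandwidth_mb_s(800)    # the memory's rated speed
lowered = peak_bandwidth_mb_s(667)  # the setting that turned out stable
loss_pct = 100 * (rated - lowered) / rated
print(f"{rated} MB/s -> {lowered} MB/s ({loss_pct:.1f}% less)")
```

So the stable setting gives up roughly a sixth of the theoretical peak, which is usually a small price for a machine that no longer locks up.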
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: ShawnD1
While the HSF doesn't have any dust in it, I noticed the northbridge is so hot that I can't hold my thumb against it. I touched a few of the larger capacitors and they are also incredibly hot.

Originally posted by: ShawnD1
I think I found the problem. I increased the FSB voltage by 0.05 and now it's prime stable.

If it was stable and now it isn't then that suggests something has degraded to the point that it is no longer able to operate stably with a given voltage at that temperature (shmoo plots are temperature dependent).

While increasing the voltage (and in concert the operating temps) to reach the next stability plateau is an option, doing so will only increase the rate of degradation of the component itself.

If your cause-and-effect analysis is correct, and the northbridge needs more voltage to operate stably at an already hand-searing temperature, then an alternative solution would be to jury-rig (aka ghetto-style) a fan in the case to blow more air over the components of interest, in an attempt to lower the temperature and bring the stability plateau down to the existing operating voltage.

This will have an ancillary effect of reducing the rate of degradation in the affected component(s) as well.
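
To illustrate what a temperature-dependent shmoo plot looks like, here is a toy sketch that prints a pass/fail grid over voltage and temperature. The pass/fail rule is invented purely for illustration (it is not measured data for any northbridge or CPU), and the function names are hypothetical.

```python
# Toy shmoo plot: pass/fail over a voltage x temperature grid.
# The rule in passes() is made up for illustration, not measured data.

def passes(voltage: float, temp_c: int) -> bool:
    """Hypothetical rule: the minimum stable voltage rises with temperature."""
    required = 1.10 + 0.002 * (temp_c - 40)
    return voltage >= required


def shmoo(voltages, temps):
    """Return one text row per temperature; '+' is pass, '.' is fail."""
    rows = []
    for t in temps:
        cells = "".join("+" if passes(v, t) else "." for v in voltages)
        rows.append(f"{t:>3}C {cells}")
    return rows


voltages = [1.05, 1.10, 1.15, 1.20, 1.25]
temps = [40, 60, 80, 100]
for row in shmoo(voltages, temps):
    print(row)
```

In this made-up model, each hotter row needs a higher voltage before it starts passing. Read the other way, cooling the part moves it to a row where the existing voltage is already enough, which is the point of adding a fan instead of raising voltage.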
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Originally posted by: heyheybooboo
What motherboard/chipset?

I'm not at home right now and I don't know the exact model, but I know it's a 740G chipset and it's the cheapest Gigabyte motherboard listed on Newegg.
 

Zap

Elite Member
Oct 13, 1999
22,377
7
81
Originally posted by: ShawnD1
This computer's motherboard is one of those $50 ones on newegg so it's possible it really is not designed to power something as large as the phenom 9600. These things are rated at 125W which is really quite a bit.

Most cheap mobos are rated for up to 95W CPUs, so you may be unduly stressing your motherboard. I would do like Idontcare suggests and rig a fan to blow down on the chipset area. You can also try running with the case side off and a desk fan blowing into it.

EDIT: I remember that Gary Key (motherboard reviewer here at AnandTech) had some issues with running higher-wattage CPUs, including killing several motherboards.

AMD 780G Motherboards

They basically had a spectacular run of blowing up motherboards by putting 125W CPUs in them, some failing at stock speeds even.
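
A back-of-the-envelope sketch of why the 95W versus 125W rating matters to the board's power circuitry: at CPU core voltages, those wattages translate into very large currents. The 1.25 V core voltage below is an assumption for illustration, and TDP is only a rough stand-in for actual power draw.

```python
# Current the motherboard's VRMs must supply: I = P / V.
# Assumption: ~1.25 V core voltage; TDP is used as a proxy for real draw.
CORE_VOLTAGE = 1.25


def vrm_current_amps(tdp_watts: float, core_voltage: float = CORE_VOLTAGE) -> float:
    """Return the approximate current in amps for a given power and voltage."""
    return tdp_watts / core_voltage


for tdp in (95, 125):
    print(f"{tdp} W at {CORE_VOLTAGE} V -> {vrm_current_amps(tdp):.0f} A")
```

Under these assumptions, a 125W part asks for roughly 100 A where a 95W part asks for about 76 A, so a board built around the lower figure runs its VRMs well past what they were sized for.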
 

cusideabelincoln

Diamond Member
Aug 3, 2008
3,275
46
91
Originally posted by: ShawnD1
Originally posted by: cusideabelincoln
Try fiddling with the memory: Removing sticks, relaxing timings.

The instability problems came back with a vengeance last week and I've been trying to figure out what is going on. Some old posts on the AMD forum suggested running the memory below rated speed such as 667mhz instead of the memory's rated 800mhz. I set it to 667mhz and it worked. This system was only about 10 minutes OCCT stable at stock speeds, but it has been crunching away for more than 8 hours since lowering the ram to 667mhz.

Does it sound like this would be caused by my ram being literally the cheapest ram on Newegg or does it sound like a faulty memory controller? The reason I ask is that the person on the AMD forum had the exact same processor as me and other AMD people seemed to know what the problem was right away.

Well, I was just taking a stab in the dark with the memory. It seemed like the first place to start.

I will say that across several forums I notice people having problems with the B2 stepping Phenom processors (9600, 9500). Whether it's unexplained poor performance, CnQ errors, or just plain instability, I'm beginning to think those chips had a lot of defects. I have not, though, noticed any similar complaints about the B3 stepping (9550, 9850) processors. But all of this is just anecdotal.

I would do what IDC suggested and make sure you rule out cooling (heating) as the "problem". So get a fan on that northbridge.

As for the problem Zap mentioned, the area around the CPU socket is where the VRMs are located, so you'll want to keep them cool. With higher-wattage processors, many of the cheap boards' VRMs can't handle it and they go boom. I don't think this is the problem in your case, but it could be. Can you monitor the voltages that the motherboard is actually delivering to its components?
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
^ ^ ^ ^ ^
What they said.

And as you noted undervolting will help yah a bunch. I think the issue is that the 9600BE is a 95w proc that acts like 125w - :D

Try something like 1.1 --- 1.15 ---- 1.2v and see if you can run stable at stock clocks...
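
The stepping procedure above can be sketched as a simple search: try the lowest candidate voltage first and stop at the first one that survives testing. In this sketch `is_stable` is a hypothetical callback standing in for a manual stress-test run (set the voltage in the BIOS, run Prime95 or OCCT, note pass or fail), and `fake_test` is a made-up stand-in so the example runs.

```python
# Walk candidate voltages from lowest to highest; stop at the first stable one.
# `is_stable` is a hypothetical callback standing in for a real stress-test run.
from typing import Callable, Iterable, Optional


def lowest_stable_voltage(candidates: Iterable[float],
                          is_stable: Callable[[float], bool]) -> Optional[float]:
    """Return the lowest candidate that passes, or None if none pass."""
    for v in sorted(candidates):
        if is_stable(v):
            return v
    return None


# Stand-in for real testing: pretend this chip needs at least 1.15 V.
def fake_test(voltage: float) -> bool:
    return voltage >= 1.15


print(lowest_stable_voltage([1.10, 1.15, 1.20], fake_test))
```

If every candidate fails, the function returns `None`, which in practice means voltage alone is not the fix and it is time to look at cooling, memory speed, or the board.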
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
I have a buddy that built a rig based on a B2 step Phenom, I forget the exact model, but it had some monster sale at Newegg so he picked up the CPU cheap. He was never able to run it stable with a 64-bit OS for some reason. It was always crashing, not even overclocked.

Those were pretty crappy procs, IMHO.
 

Yukmouth

Senior member
Aug 1, 2008
461
0
0
OCCT and Prime 95 are a real PITA with this 720BE. I'm stable for about eight hours in either, then I get a BSOD/restart at 3.8ghz or *any* frequency above stock. I have an MSI C45.

I can't get this thing to crash in day to day use, video encoding, audio encoding, or any game I throw at it.

But it crashes in OCCT and Prime :confused: ... I've got a love/hate thing going on right now. At $100 bucks I could care less as long as it works for what I need it for. It'd be nice to claim 100% stable in these stupid stress tests though.

FYI, AMD overdrive stress testing does not crash my system at these speeds :roll:.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Yukmouth
OCCT and Prime 95 are a real PITA with this 720BE. I'm stable for about eight hours in either, then I get a BSOD/restart at 3.8ghz or *any* frequency above stock. I have an MSI C45.

I can't get this thing to crash in day to day use, video encoding, audio encoding, or any game I throw at it.

But it crashes in OCCT and Prime :confused: ... I've got a love/hate thing going on right now. At $100 bucks I could care less as long as it works for what I need it for. It'd be nice to claim 100% stable in these stupid stress tests though.

FYI, AMD overdrive stress testing does not crash my system at these speeds :roll:.

I'm not saying it's a guaranteed type thing, but the fact that your system doesn't crash when running those apps doesn't mean the maths are actually being computed correctly while they run.

For all you know there might be some 3.11-3.10=0.0 type math errors going on and silently corrupting your install and data files.

Not that you are ever ensured this isn't happening when oc'ing, even if occt and prime95 stable, but being occt and prime95 stable at least puts you that much farther away from the point at which silent computation errors are going to occur.
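
A minimal sketch of how a silent computation error can be caught without a crash: run the same deterministic workload several times and compare the results. The workload and function names here are made up for illustration. Note the limitation: this only compares the machine against itself, so a result that is consistently wrong would still pass, whereas a stress test that checks against known-good answers would not be fooled that way.

```python
# Detect silent errors by repeating a deterministic computation and
# comparing digests. Workload and names are illustrative only.
import hashlib


def workload(n: int) -> str:
    """Deterministic floating-point workload; returns a digest of the result."""
    acc = 0.0
    for i in range(1, n + 1):
        acc += (i * 3.11 - i * 3.10) / i
    return hashlib.sha256(repr(acc).encode()).hexdigest()


def find_mismatches(runs: int, n: int = 100_000) -> int:
    """Run the workload `runs` times; count runs that disagree with the first."""
    reference = workload(n)
    return sum(1 for _ in range(runs - 1) if workload(n) != reference)


print("mismatches:", find_mismatches(runs=5))
```

Any nonzero mismatch count means identical inputs produced different outputs, which is exactly the kind of error that never shows up as a lockup or a blue screen.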
 

Yukmouth

Senior member
Aug 1, 2008
461
0
0
Originally posted by: Idontcare
Originally posted by: Yukmouth
OCCT and Prime 95 are a real PITA with this 720BE. I'm stable for about eight hours in either, then I get a BSOD/restart at 3.8ghz or *any* frequency above stock. I have an MSI C45.

I can't get this thing to crash in day to day use, video encoding, audio encoding, or any game I throw at it.

But it crashes in OCCT and Prime :confused: ... I've got a love/hate thing going on right now. At $100 bucks I could care less as long as it works for what I need it for. It'd be nice to claim 100% stable in these stupid stress tests though.

FYI, AMD overdrive stress testing does not crash my system at these speeds :roll:.

I'm not saying it's a guaranteed type thing, but the fact that your system doesn't crash when running those apps doesn't mean the maths are actually being computed correctly while they run.

For all you know there might be some 3.11-3.10=0.0 type math errors going on and silently corrupting your install and data files.

Not that you are ever ensured this isn't happening when oc'ing, even if occt and prime95 stable, but being occt and prime95 stable at least puts you that much farther away from the point at which silent computation errors are going to occur.

Such a problem should make Windows run terribly, should it not? Data recovery errors, GUI glitches, etc.? None too happy about that highly probable point though, as I patiently await these AMD price drops. My 720 is not the best with voltage either; it seems like more recent quad cores need less than my 720 to reach 3.8ghz.

I've never had to re-install my OS on this setup for any reason, but I'd imagine the downclocking of cool & quiet is a huge help in avoiding errors.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Yukmouth
Such a problem should make Windows run terribly, should it not? Data recovery errors, GUI glitches, etc.? None too happy about that highly probable point though, as I patiently await these AMD price drops. My 720 is not the best with voltage either; it seems like more recent quad cores need less than my 720 to reach 3.8ghz.

I've never had to re-install my OS on this setup for any reason, but I'd imagine the downclocking of cool & quiet is a huge help in avoiding errors.

The pentium FDIV bug didn't cause windows to crap out or freeze, but it did cause all kinds of havoc on engineering calcs and programs.

At any rate, I wasn't trying to make you feel bad or crappy about your rig, or feel less secure about using it. I was just trying to give some rationale for why people endeavor to get their rig to a happy place that doesn't give them any symptoms hinting at underlying silent corruption of their files.

There is no way to 100% rule it out even on a stock computer, but operating a system which is known to be less than stable at some point in testing with a rigorous application like prime95 or occt certainly raises a flag that the probability of having silent corruption is now higher on that rig than it would be if you lowered temps or clockspeed enough to reduce or eliminate the instability symptoms.

It's just a "look for fire where there is smoke" kind of situation: you've got the smoke (the symptom, instability with an application), so the only question is whether you have a fire (the root cause of the smoke, data corruption in less rigorous applications, etc.).
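
The Pentium FDIV bug mentioned above is a handy concrete case of wrong math without a crash. The sketch below uses the widely circulated test operands for that bug: dividing and then multiplying back should return the original number, and affected Pentiums were reported to come back off by 256 instead. The function name is made up for illustration, and the tolerance is an arbitrary small value chosen to allow for ordinary floating-point rounding.

```python
# Divide-then-multiply check using the operands commonly cited for the
# Pentium FDIV bug. On correct hardware the residual is essentially zero.
x = 4195835.0
y = 3145727.0


def fdiv_residual(a: float, b: float) -> float:
    """Return a - (a / b) * b, which should be essentially zero."""
    return a - (a / b) * b


residual = fdiv_residual(x, y)
print("residual:", residual)
print("looks OK" if abs(residual) < 1e-6 else "division result is suspect")
```

The machine in that case kept running normally the whole time, which is why "it doesn't crash" and "it computes correctly" are separate questions.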