Question Possible IMC failure? Phenom II X4 960T

mikeymikec

Lifer
May 19, 2011
17,704
9,559
136
This is a bit of an odd one.

I built a computer back in 2010/2011 which had a CPU upgrade a few years ago, spec as follows:

AMD Phenom II X4 960T (which I had been using in my own PC previously, unlocked all six cores no trouble), 6 cores activated
4GB RAM
ASUS M4A89GTD PRO/USB3, BIOS 2101 (not the latest)
Onboard graphics, ATI HD 4290, UMA + Sideport
Seagate 500GB HDD
DVD drive
Corsair VX450W PSU
Windows 7, all updates installed

Symptoms:
Problem came on suddenly (Monday morning it refused to boot properly); the computer has never previously had any stability issues
BSODs seemingly at random (PFN_LIST_CORRUPT, HAL_INITIALIZATION_FAILED, MEMORY_MANAGEMENT)
Graphics corruption (even at the Windows login screen)

When I started investigating, the first thing I saw was the BIOS saying "5 cores activated" rather than 6, which made me think it could be to do with core unlocking, so I immediately switched that off. Then, as the randomness of the BSOD messages is a dead ringer for bad RAM, I ran a couple of cycles with memtest 4.3: no errors found. However, Windows would soon flounder with the above symptoms.

I wondered whether onboard graphics / graphics RAM issues could be playing a role, so I tried a spare graphics card. Windows booted and then hung about 20 minutes in, then would BSOD (same messages) on startup. I ran another load of memory tests with memtest86+ 5.01: no errors found. I swapped out the RAM for a spare module; Windows still hung on startup, and would even hang on the F8 startup menu after up to 30 seconds.

It's very odd that memory apparently so faulty that Windows would routinely hang or BSOD didn't show up at all during testing; I would have thought that either memtest would be flagging errors virtually immediately.

Somewhere along the way I tried reactivating the cores and running a memtest but it very quickly switched off the PC. Deactivate extra cores, try again, no problems with memtest.

Also, with the 960T in there, the system has hung a couple of times on the BIOS screen at 'initialising USB controllers' (only a trackball and keyboard wireless receiver plugged in).

While I'm never a big believer in "the CPU went wrong!", I felt that all the issues were consistent with a dodgy CPU, so I swapped it out for a spare X2 240: no problems since. (Windows didn't like me swapping back to the onboard graphics and stopped on the loading screen; I had to install the graphics driver in safe mode even after uninstalling the card's driver, then no problems since.)

Other possibilities I've considered:
Faulty memory, seems unlikely given I swapped out the memory and the system still hung
PSU, seems unlikely given how specific the BSODs are, and that the system seemed more stable despite more load from an old graphics card, and why no instability during memtest?
Board, possibly but again given how specific the BSODs are, seems unlikely to me.
Windows, given the hangs outside of Windows and the hang on the F8 menu, I think I can rule this out.

Ideally I'd like a way to test the IMC thoroughly. I have a spare Phenom II X4 in another system which I could try, and I suppose I'll end up putting it in this system anyway to try and bring the spec as close to the 960T as it can be. That should also rule out the board if the system continues to play ball.
 

mikeymikec

What kind of NB voltage and vDIMM were you pushing on that chip?

Nothing changed from BIOS defaults. I've done a 'load setup defaults' on the BIOS at some point during troubleshooting so I wouldn't be able to check the original figures, though in my experience this board doesn't try to do anything crazy like auto-OC (I had the same model board for my own PC for some years).
 

DrMrLordX

Lifer
Apr 27, 2000
21,629
10,841
136
Then I would not automatically finger the IMC, unless unlocking the cores somehow caused long-term damage to the NB. I've never heard of that, but it's possible . . .
 

mikeymikec

Right now the computer is calling me a liar: I put the 960T back in (4 cores only), and the system is working fine so far (including my attempts to make it hang on the F8 startup screen).

I'm running Prime95 blend in Windows atm.
 

DrMrLordX

Nope. It looks like it's command line only, is there a particular mode / switch you'd run it with to do the test you'd advise, and for how long etc?

You can launch it okay from the Windows UI. It'll open a text window for you. Use the following options:

0 - Benchmark Pi
1 - Multi-threaded
5 - 500k decimal digits

(assuming you're still stuck on 4 GB of RAM)

Run that at least three times in a row. If it crashes, something's amiss.
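
If you'd rather not babysit the repeats by hand, a small wrapper along these lines could run a test several times and stop at the first failure. This is only a sketch under assumptions: `run_repeatedly`, `all_passed` and `STAND_IN_CMD` are names made up for illustration, and the command is a placeholder, since the benchmark here is driven through its menu and I'm not assuming any particular command-line switches for it. It also only catches an application-level crash or non-zero exit; a BSOD or hard hang will take the script down too, which is an answer in itself.

```python
import subprocess
import sys

# Placeholder workload (an assumption for illustration): a Python process that
# exits cleanly. Swap in the real stress-test command line on your machine.
STAND_IN_CMD = [sys.executable, "-c", "pass"]


def run_repeatedly(cmd, runs=3):
    """Run cmd up to `runs` times, stopping at the first non-zero exit code.

    Returns the list of exit codes collected so far.
    """
    codes = []
    for attempt in range(1, runs + 1):
        code = subprocess.run(cmd).returncode
        codes.append(code)
        print(f"run {attempt}/{runs}: exit code {code}")
        if code != 0:
            break
    return codes


def all_passed(codes, runs=3):
    """True only if every one of the `runs` attempts finished with exit code 0."""
    return len(codes) == runs and all(code == 0 for code in codes)
```

Calling `all_passed(run_repeatedly(STAND_IN_CMD))` gives a single yes/no for the three-in-a-row check.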
 

mikeymikec

Prime95 has run for nearly an hour without errors. I don't like my only workable theory at this point being that swapping the processor out eliminated something awry in the BIOS, so when I reinstated the 960T, it's being used as it should.

I'm considering running something a bit graphics-intensive on it to see whether there are any more graphics-related symptoms ready to come out of the woodwork.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,339
10,044
126
Somewhere along the way I tried reactivating the cores and running a memtest but it very quickly switched off the PC.
To me, "switching off the PC" is almost always a power or temp issue. Either the VRMs on the board are shot, or the PSU is marginal or flaky. That would be my best estimation. It can cause symptoms that are similar to flaky RAM.

I would:
1) Re-seat the CPU and RAM, blow dust out of CPU/RAM sockets, VRMs/VRM heatsinks, and CPU heatsink/fans.
2) Clear CMOS, using jumper
3) Replace CMOS battery, if system has been in use or worse, storage, for 5 years or longer.

(Might, if replacing CMOS battery, do the CLR_CMOS afterwards too.)

Also, go into BIOS and set AHCI mode or IDE mode as appropriate to the installed copy of Windows, after clearing CMOS. Leave everything else at default settings, to establish a baseline.

Run Memtest86+, Prime95, etc., hammer it after those procedures. See if it powers-off, or crashes.

Here's a working theory, too: unlocking the extra cores put an extra load/strain on the board's VRMs, and now that they're weakened, the CPU isn't getting enough current or voltage to maintain stability, and portions of the CPU, such as the memory controller, are dropping out momentarily or flaking out.

What CPU TDP is the board specified for?

I have a friend who had an ASRock 760/780/785G-something AM2+ board that was specified for 140W TDP CPUs. We never dropped a Phenom II in there (although I had purchased one on eBay for him to do so, at one point), but his quad-core Athlon II X4 640 (3.0GHz, 100W TDP) ran just fine for nearly 10 years or so at stock speeds and never really flaked out or crashed.

We even ended up putting in 4x4GB DDR2-800 in there (from a China-seller, cheap, they were OEM branded sticks, could have been fake stickers, who knows, but it worked). An Athlon II, with 16GB of RAM on a DDR2 platform! He was spoiled! (Yes, had an SSD, and a GPU, too.)

Edit: Keep in mind that the X2 240 is a bit of a lower-powered CPU compared to the 960T, especially with unlocked cores. If that board was specified for a 95W TDP, then unlocking those two extra cores could have put it into the 125W TDP category, and might have been (slowly) overloading the VRMs. If the board was specified for a 140W TDP, as my friend's was, then perhaps this doesn't apply.
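
To put rough numbers on that, here is the headroom arithmetic as a minimal sketch. The 95W figure is the 960T's stock TDP class and 125W is the class I'm guessing it lands in with six cores active; TDP is a thermal design figure rather than measured draw, so treat the output as a ballpark only. `tdp_headroom_w` is just an illustrative name.

```python
def tdp_headroom_w(board_rated_tdp_w, cpu_tdp_w):
    """Board's rated CPU TDP minus the CPU's TDP class, in watts.

    A negative result means the CPU class exceeds what the board is rated for.
    """
    return board_rated_tdp_w - cpu_tdp_w


STOCK_TDP_W = 95      # 960T as sold, four cores
UNLOCKED_TDP_W = 125  # assumed class with six cores active (a guess, not a measurement)

for board_w in (95, 140):
    for label, cpu_w in (("stock", STOCK_TDP_W), ("unlocked", UNLOCKED_TDP_W)):
        margin = tdp_headroom_w(board_w, cpu_w)
        print(f"board rated {board_w}W, CPU {label} ({cpu_w}W): headroom {margin:+d}W")
```

On a 95W board the unlocked case comes out 30W over the rating; on a 140W board it still has 15W to spare.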
 

mikeymikec

Heat: The CPU heatsink did have some dust in, but it wasn't an alarming level of dust and both memtest programs were reporting CPU temps of about 60C. However, when I swapped the 960T back in, I cleaned out the HSF and it's made a noticeable difference from the fan speed alone, so I imagine the CPU temps must have dropped significantly too. Readouts in HW_Monitor seem fine (57C on load, stock heatsink).

The board is capable of handling 140W CPUs.

VRMs: I would have thought that, with the amount of hammering I've given it since, I'd have pushed the VRMs at least as hard as whatever they theoretically couldn't consistently handle before. Having said all of this, I'm not overly fond of reactivating the extra two cores, at least for the time being.

I've also run fishgl (the website) to thrash the onboard graphics for 50 minutes (procexp verified 100% gpu usage and maxed out video RAM). Combined with Prime95 for an hour, I'm fairly confident the problem is gone considering that Windows before could barely stay up for 10 minutes under non-100% load.

I'll probably cycle round those tests again. Unfortunately I don't have a spare new battery so that is a job for another time.
 

VirtualLarry

Heat: The CPU heatsink did have some dust in, but it wasn't an alarming level of dust and both memtest programs were reporting CPU temps of about 60C. However, when I swapped the 960T back in, I cleaned out the HSF and it's made a noticeable difference from the fan speed alone, so I imagine the CPU temps must have dropped significantly too. Readouts in HW_Monitor seem fine (57C on load, stock heatsink).
Yeah, some of those FM1, FM2, and Phenom II-era CPUs would "flake out" at temps above 63-65C, depending on the chip. I think they were specified with a max operating temp of 62.5C. (Correct me if I'm wrong.) 60C is darn close.
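
As a quick worked check of how close that is, here is a sketch using the two readings quoted above (about 60C before cleaning, 57C after). The 62.5C limit is my recollection rather than a verified spec, and the 5C "comfort" threshold is an arbitrary number picked for illustration.

```python
ASSUMED_MAX_TEMP_C = 62.5  # recalled limit from this post, not verified
TIGHT_MARGIN_C = 5.0       # arbitrary comfort threshold, chosen for illustration


def thermal_margin_c(reading_c, limit_c=ASSUMED_MAX_TEMP_C):
    """Degrees C left between a temperature reading and the assumed limit."""
    return limit_c - reading_c


def is_tight(reading_c, limit_c=ASSUMED_MAX_TEMP_C, tight_c=TIGHT_MARGIN_C):
    """True if the remaining margin is below the comfort threshold."""
    return thermal_margin_c(reading_c, limit_c) < tight_c


for label, reading_c in (("before cleaning", 60.0), ("after cleaning", 57.0)):
    margin = thermal_margin_c(reading_c)
    verdict = "tight" if is_tight(reading_c) else "ok"
    print(f"{label}: {reading_c}C, margin {margin}C ({verdict})")
```

That leaves 2.5C of margin before the clean-out and 5.5C after, with four cores active in both cases.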
 

VirtualLarry

Found this thread through web search:


Same rig?

Edit: Oh, ignore the comments in that thread from TB. He was a known troll.

Edit: To expand and modify my comments vis-à-vis the 95W TDP versus 125W TDP on the VRMs... maybe it's not the VRMs (if the board is specified up to 140W TDP, it most likely isn't), but the CPU heatsink.

Are you using a stock 95/100W AMD heatsink? Maybe you need the quad-heatpipe 125W heatsink? Those two extra cores unlocked, sure are bumping up the heat. In that past thread, you mention running at 60C or 63C or something, stock heatsink, under load. That's right at the limit, which makes me wonder, if, all this time, you've been using the 95/100W AMD stock heatsink, and not something beefier, not taking into account that the two unlocked cores would push TDP up to the next bin.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Phenom IIs were certainly temp sensitive. Back when I ran Phenom IIs (I had a 550BE at 3.6GHz and a 965BE at 4GHz), keeping them below ~60C was important.

But it should also be noted that the chip was sold as a quad core for a reason. One of those other two cores did not pass AMD's standards. Yes, there could be the case where they needed quads, and sold a perfectly fine hex that was cut down. But there is no way of knowing that.
 

moinmoin

Diamond Member
Jun 1, 2017
4,950
7,659
136
Deactivate extra cores, try again, no problems with memtest.
Wait, am I following you correctly in that you have no problems with your X4 960T when running with 4 cores (as it's supposed to be) instead of the full 6 cores? If all the issues you describe are due to the additional two cores I'd personally leave them off permanently.
 

mikeymikec

Wait, am I following you correctly in that you have no problems with your X4 960T when running with 4 cores (as it's supposed to be) instead of the full 6 cores? If all the issues you describe are due to the additional two cores I'd personally leave them off permanently.

The problem carried on after deactivating the extra 2 cores, then disappeared when switching out CPUs and back again. I haven't reactivated the extra 2 cores because I wanted to be certain that the problem is gone for good.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Aug 22, 2001
28,482
20,570
146
The problem carried on after deactivating the extra 2 cores, then disappeared when switching out CPUs and back again. I haven't reactivated the extra 2 cores because I wanted to be certain that the problem is gone for good.
I like the flaky BIOS hypothesis. Or perhaps the CPU just needed to be pulled and reseated, and putting the other CPU in may have been superfluous. I don't know if it is a contact issue or what, but reseating works sometimes when nothing else does, IME.

BTW, I have a 960T right now, and I'm astonished how well it performs, all things considered.
 

Charlie22911

Senior member
Mar 19, 2005
614
228
116
Bad electrolytic capacitor(s) somewhere? If you are nerdy enough to have access to some test equipment at home, you could check ripple on your rails and the ESR and capacitance of your caps.
Tantalum and ceramic caps fail short IIRC, so I wouldn’t bother with those.
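
If you do get scope readings, something like this sketch could flag rails that are out of line. The limits are the peak-to-peak ripple figures commonly cited from the ATX12V design guide as I remember them (120 mV on +12V, 50 mV on +5V and +3.3V), so check them against the guide before trusting the result. The example readings are made up, and `check_ripple` / `failing_rails` are illustrative names.

```python
# Commonly cited ATX ripple/noise limits in mV peak-to-peak (verify against the guide).
ATX_RIPPLE_LIMIT_MV = {"+12V": 120, "+5V": 50, "+3.3V": 50}


def check_ripple(measured_mv_pp):
    """Map each measured rail to True if its ripple is within the limit."""
    return {
        rail: mv <= ATX_RIPPLE_LIMIT_MV[rail]
        for rail, mv in measured_mv_pp.items()
    }


def failing_rails(measured_mv_pp):
    """Sorted list of rails whose measured ripple exceeds the limit."""
    results = check_ripple(measured_mv_pp)
    return sorted(rail for rail, ok in results.items() if not ok)


# Made-up readings purely for illustration, not measurements from this system.
example_readings = {"+12V": 95, "+5V": 62, "+3.3V": 30}
print("rails over limit:", failing_rails(example_readings))
```

With those invented numbers only the +5V rail would be flagged.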
 