Question I make CPU fan booboo and cause unrecoverable CPU error?

Jul 27, 2020
16,339
10,349
106
Some of you may know that an XFX RX 580 card that had previously been used for mining blew up my Z77 mobo's PCIe lanes when I was running the Pimax VR setup tool on it and clicked the Edge browser. The card's fans went into a full screaming mode that sounded really scary, like it was going to blow up, and the PC froze. The RX 480 also had the same issue because AMD didn't prevent the card from pulling more than 75W of power from the PCIe slot.

Fast forward to yesterday. My 12700K was idling on the High Performance power plan. I was doing something on my laptop and was going to watch a movie on my OLED connected to my 12700K. Suddenly, the fans of my Gigabyte Aorus Master RX 6800 went into Serious Sam kamikaze mad mode. I checked and the PC had frozen. Rebooted it and everything was fine. But what the heck is this? Why am I the only one that this happens to? The PC wasn't even doing anything!

I'm guessing that this is an issue with the Adrenalin drivers, since that's the only thing common between my two incidents. I will check the Adrenalin log today in the Program Files folder (hope it's there) and see what's up.

Anyone else have this happen, EVER?

Please don't let me feel that my relationship with AMD is cursed.
 
Jul 27, 2020
16,339
10,349
106
For whatever reason, your system is unstable. Dial back any custom settings, like RAM timings, undervolts, etc.
I'm well aware of that. But what does my system being occasionally unstable have to do with the graphics card going crazy? Makes no sense, unless the Adrenalin drivers send a kill command to the GPU upon crashing.

That is a bad and potentially serious flaw (imagine if I had been AFK for an hour or so and came back to a damaged GPU whose fans wore themselves out and the GPU possibly also went beyond the max temperature threshold).
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
I'm well aware of that. But what does my system being occasionally unstable have to do with the graphics card going crazy? Makes no sense, unless the Adrenalin drivers send a kill command to the GPU upon crashing.
Think of the card fans going full tilt as a precautionary measure, since the card has no guidance anymore. Like a severed limb that can no longer talk to the central nervous system :) Depending on the card brand and implementation, this default speed can be quite high. My Sapphire Pulse Vega 56 was quite a noise maker in this mode, while the new Asus TUF 6800XT is comparatively silent.

RAM errors can be nasty and hard to understand from the user perspective. For example, two weeks ago I updated my UEFI and, since I had some free time one evening, I thought I should restart my efforts for a DDR4 4000 OC on a dual rank set. I didn't get far, as memory training was already a problem even with relaxed timings, but something else intervened while I was rebooting the system and undoing the OC. I forgot about the OC; more exactly, I forgot to double check that everything was back to a safe config for work the next day.

A week later I decided to play a new game. Initially the game had some random crashes, but I figured it was a buggy one. The next evening the crashes became too frequent, and I was frustrated with the poor quality of the engine. On the third evening I finally got a blue screen and that's when I knew something was wrong: I checked the UEFI and saw the 4000 OC that had somehow managed to train and stay relatively stable during work for 1 week+, except while playing the game.

After dialing back the OC to safe settings the game never crashed again. The AMD driver didn't hate me, it was merely screaming in pain while juggling with a nondeterministic system.

That is a bad and potentially serious flaw (imagine if I had been AFK for an hour or so and came back to a damaged GPU whose fans wore themselves out and the GPU possibly also went beyond the max temperature threshold).
The fans are rated for that speed. The card won't overheat with fans running hard.

The AMD driver is extremely rugged nowadays; if your system locks up, it must be something really bad. For example, a borderline bad RDNA2 overclock usually results in a driver reset instead of a system reset. Also think about my example above: my system was unstable and still the machine crashed on just one occasion, the rest of the time it was the game that crashed. It even threw out a prompt to send debug info to the developer, which meant I was even able to save my progress before the game locked up completely.

So, go back to the basics. Stock CPU voltage, JEDEC timings on the RAM. Do some stability testing, run like this for a few days.
 

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
I'm well aware of that. But what does my system being occasionally unstable have to do with the graphics card going crazy? Makes no sense, unless the Adrenalin drivers send a kill command to the GPU upon crashing.

That is a bad and potentially serious flaw (imagine if I had been AFK for an hour or so and came back to a damaged GPU whose fans wore themselves out and the GPU possibly also went beyond the max temperature threshold).
The fans can run at 100% for about five years.

A day or so will not matter.

The GPU is not running wide open; it likely went into standby mode.

FYI, your CPU fan also went to max speed, along with the mainboard AIO. If the mainboard had a bridge cooling fan, that would have gone to max speed too.
 
Jul 27, 2020
16,339
10,349
106
FYI, your CPU fan also went to max speed, along with the mainboard AIO. If the mainboard had a bridge cooling fan, that would have gone to max speed too.
What is this thing called? Why is it happening to me 2nd time around with a different system? Why can't I find similar issues faced by anyone else out there?

Instead of going full mad banshee, why doesn't everything just shut down?
 

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
Instead of going full mad banshee, why doesn't everything just shut down?
Everything in your computer is controlled by the CPU when it is running. All the fans, lights, VRMs, everything. None of it thinks on its own after being initialized by the UEFI.

In order for anything to shut down, they need to receive a command from the CPU to shut down.


If no commands are sent, after a certain time interval every component in your system goes into FAILSAFE.


For fans, failsafe is max speed, because that provides max cooling.



Your CPU is hard crashing. Instant failure, it just ceases to function. Your CPU is dependent on three things: itself, RAM, and power.


Why can't I find similar issues faced by anyone else out there?
This is a very common issue. Standard behavior for unrecoverable CPU hardware errors. You're not the only person to have experienced it.

When I overclock my RAM timings too hard, my system does the same thing.

Instead of going full mad banshee, why doesn't everything just shut down?
The fan controller no longer has communication. It does not know what is happening. If it just shut down and the CPU or whatever component it was cooling was still active, said component would melt a hole through the circuit board.

FAILSAFE means assuming worst case scenario. For a fan, that is usually MAX speed for MAX cooling.
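
To make the failsafe idea concrete, here is a minimal toy sketch in Python. It is not how any real fan controller firmware is written; the `FanController` class, the 2 second timeout, and the duty values are all made up for illustration. It only shows the watchdog logic: if no command arrives within the timeout, assume the worst and go to max speed.

```python
import time


class FanController:
    """Toy model of a fan controller with a watchdog failsafe (hypothetical)."""

    FAILSAFE_DUTY = 100  # percent; max cooling when the host goes silent

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed watchdog interval
        self._clock = clock             # injectable so the demo can fake time
        self._duty = 30                 # arbitrary starting duty, in percent
        self._last_command = clock()

    def set_duty(self, percent):
        """Called by the host (CPU/driver) while it is still alive."""
        self._duty = max(0, min(100, percent))
        self._last_command = self._clock()

    def current_duty(self):
        """Duty the fan actually runs at right now."""
        if self._clock() - self._last_command > self.timeout_s:
            return self.FAILSAFE_DUTY   # host went silent: assume worst case
        return self._duty


# Demo with a fake clock: the host sets 35%, then "hard crashes".
now = [0.0]
fan = FanController(timeout_s=2.0, clock=lambda: now[0])
fan.set_duty(35)

now[0] = 1.0
healthy = fan.current_duty()    # still within the timeout

now[0] = 5.0
crashed = fan.current_duty()    # no command for 5 s, failsafe kicks in

print(healthy, crashed)
```

The controller never learns why the host went quiet. It only notices the silence, which is why a hard CPU crash and screaming fans go together.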

What is this thing called?
unrecoverable CPU fault
 

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
was going to watch a movie on my OLED connected to my 12700K
Did you plug in an HDMI cable to do that?

If so, have you considered the possibility of a dirty/noisy ground on the HDMI cable? Or on the OLED TV?

It is possible the issue is not with your computer at all, but a defective power supply on the TV or something hooked up to the TV.

Have you tried plugging the TV and computer into the same outlet / powerstrip?
 
Jul 27, 2020
16,339
10,349
106
Have you tried plugging the TV and computer into the same outlet / powerstrip?
Both on the same powerstrip.

I do use HDMI from the PC to the OLED. If that were the issue, it would happen more frequently. So far only once with the new build.

This is only the 2nd time ever that this has happened to me. First with a Z77 mobo where I was trying to use the Pimax VR headset and now this sudden hard crash. At least with VR, I was doing something that could be considered hard work for the CPU/GPU. But hard crashing on an idling desktop is just stupid and so random.
 

ZGR

Platinum Member
Oct 26, 2012
2,052
656
136
Crash during idle is quite common when something ain't quite right. I had a 1070 that caused that until I reinstalled Windows. But a fresh Windows install won't help you here.

What kit of RAM are you using? I'd do what was suggested above where you put your RAM back to default JEDEC timings to play it safe and go from there.
 
Jul 27, 2020
16,339
10,349
106
What kit of RAM are you using?
G.Skill EXPO kit with no XMP profile with a Z790 mobo.

I'm waiting for the hard crash to happen two more times. Too lazy to haul the huge case to the living room and disassemble the heatsink to replace the RAM (have a different kit I can try). Installed kit is useless anyway. Doesn't go over DDR5-4600.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,409
2,443
146
You could run some stress tests, specifically I would do memtests and CPU stress tests. Also, did you check the event viewer? Like for critical errors, and any related WHEA errors.
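
If clicking through Event Viewer is a pain, the same check can be scripted. Below is a rough Python sketch that shells out to the built-in Windows `wevtutil` tool and lists recent System-log entries from the WHEA-Logger provider, plus Kernel-Power event ID 41 (logged when Windows restarts without a clean shutdown). The `build_query` helper and the limit of 20 events are my own choices for the example, not anything official.

```python
import subprocess
import sys


def build_query(provider, event_id=None, max_events=20):
    """Build a wevtutil command listing recent System-log events from one provider.

    Hypothetical helper for this example; only the wevtutil flags are real.
    """
    xpath = f"Provider[@Name='{provider}']"
    if event_id is not None:
        xpath += f" and (EventID={event_id})"
    return [
        "wevtutil", "qe", "System",
        f"/q:*[System[{xpath}]]",   # XPath filter on the event's System section
        f"/c:{max_events}",         # cap the number of events returned
        "/rd:true",                 # newest events first
        "/f:text",                  # human-readable output instead of XML
    ]


WHEA = build_query("Microsoft-Windows-WHEA-Logger")
KERNEL_POWER_41 = build_query("Microsoft-Windows-Kernel-Power", event_id=41)

# Only try to run the queries on Windows, where wevtutil exists.
if __name__ == "__main__" and sys.platform == "win32":
    for cmd in (WHEA, KERNEL_POWER_41):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or "(no matching events)")
```

No WHEA entries does not prove the hardware is fine, since a true hard lock may not leave the CPU any time to log anything, but entries that line up with the crash time are a strong hint.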

If the system is hard freezing, as in everything gets stuck on the screen and is unresponsive, including the clock, but the picture is still there, that sounds like a CPU fault to me. It can happen at load or idle. I had a 5900X that would randomly do that before I sent it in for RMA.
 
Jul 27, 2020
16,339
10,349
106
If the system is hard freezing, as in everything gets stuck on the screen and is unresponsive, including the clock, but the picture is still there, that sounds like a CPU fault to me. It can happen at load or idle. I had a 5900X that would randomly do that before I sent it in for RMA.
I'll check the event log.

It once froze with the screen being displayed but the fans didn't go haywire. I just rebooted it coz I know the RAM is not 100% stable at the aggressive timings I've configured (28-30-30-60) and that happened with the iGPU.

When the fans went crazy, it happened with the RX 6800 and the screen was black. The system was idle on the desktop and the OLED was in standby mode. When the fans ramped up, I turned the OLED on but there was no HDMI signal, so I rebooted the PC and it was fine after that.
 
Jul 27, 2020
16,339
10,349
106
Ahem...while looking in the BIOS yesterday, I saw that the CPU fan was N/A in hardware monitor but chassis fan speed was being reported. Guess I brilliantly plugged the CPU fan in the wrong header. Chassis fan was monitoring the MB temp which would be VRM temp? So it wasn't going much more than 900 RPM.
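
A quick toy sketch in Python of why that matters. The curve points and temperatures below are made-up numbers, not anything read from my BIOS, and `fan_rpm` is just a hypothetical helper doing linear interpolation. The point is only that the same fan curve gives very different speeds depending on which temperature it is fed.

```python
def fan_rpm(temp_c, curve):
    """Linearly interpolate RPM from (temp_c, rpm) points sorted by temperature."""
    if temp_c <= curve[0][0]:
        return curve[0][1]          # below the first point: minimum speed
    for (t0, r0), (t1, r1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return r0 + (r1 - r0) * (temp_c - t0) / (t1 - t0)
    return curve[-1][1]             # above the last point: maximum speed


# Hypothetical fan curve and temperatures, for illustration only.
CURVE = [(30, 600), (50, 900), (70, 1500), (85, 2000)]
mb_temp = 50    # what the chassis header was watching (assumed value)
cpu_temp = 85   # what the CPU could be doing under load (assumed value)

rpm_following_mb = fan_rpm(mb_temp, CURVE)
rpm_following_cpu = fan_rpm(cpu_temp, CURVE)
print(rpm_following_mb, rpm_following_cpu)
```

With the fan following the motherboard sensor, it sits near the bottom of the curve no matter how hot the CPU gets.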

Maybe that was the cause of instability? I have changed the option to monitor CPU and I can now hear the fan (was surprised why the system was so silent before). I'm such a n00b :blush:
 
Jul 27, 2020
16,339
10,349
106
I'm kinda surprised that ASROCK has no n00b protection in their BIOS. It SHOULD have warned me that it's not getting any signal from the CPU fan header. It's a Sonic branded mobo. Does Sonic know anything about building PCs? No. So they should have taken that into account. How are Sonic fans supposed to know something if Sonic himself has no clue about it? That's a FAIL, ASROCK.
 