Question Several different blue screens, can't find a cause

rogerdv

Member
Dec 2, 2010
149
4
81
I received a new PC for work a month ago, and a week later I started getting frequent blue screens. The most frequent are PFN list corrupt and System thread exception not handled, but I have also seen Kernel mode heap corruption, Driver IRQL not less or equal, and Internal power error. I have no idea what all those errors have in common. I reinstalled the system and the errors persist, which leads me to think it is a hardware problem, especially because the PC got hit so hard during the trip that a heatsink on the motherboard (an ASUS ROG Zenith Extreme Alpha) got bent at one corner.

My boss insists that it is not a hardware problem, but something I installed that infected the PC, or some driver installed by DriverPack. He used the PC without problems for a week in the USA before bringing it here, and the problems started here, when we added a couple of additional cards (eSATA and SATA expansion cards) and a dozen hard drives. For him, it is definitely not the hardware or the hit: he thinks that a broken motherboard either works or doesn't, and that the errors prove his point (some website says the power error can even be caused by a virus). Any suggestions about what could be wrong here?
 

Steltek

Diamond Member
Mar 29, 2001
3,042
753
136
Is the machine set up to write kernel dumps when it crashes? If so, use something like the free version of WhoCrashed to do a crash dump analysis. If it is a probable hardware problem, the crash dump analysis will probably say so. If it is driver-based, the analysis will probably give you the drivers involved for further investigation.

Question: Did he happen to ship the thing with the CPU cooler installed? If so, is it water cooled or air cooled?
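
If it helps with the first question, the dump setting can also be read straight from the registry. Below is a minimal Python sketch, assuming Python 3 is available on the machine. It reads the `CrashDumpEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl` and maps it to the commonly documented dump types. The function names `describe_dump_setting` and `read_dump_setting` are made up for this example.

```python
import sys

# Commonly documented meanings of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_setting(value: int) -> str:
    """Translate a CrashDumpEnabled value into a readable label."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def read_dump_setting() -> int:
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # standard library, but only present on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print("Crash dump setting:", describe_dump_setting(read_dump_setting()))
    else:
        print("This check only applies to Windows.")
```

Anything other than "none" means Windows should leave a dump behind for WhoCrashed to read after the next blue screen.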
 

rogerdv

Yes, the motherboard was shipped inside another case, with a water-cooling system. We reassembled it here in our larger case, with the same water-cooling system.
 

rogerdv

This is the WhoCrashed report:

Code:
Computer name: DESKTOP-URBK0PG
Windows version: Windows 10 , 10.0, version 1903, build: 18362
Windows dir: C:\Windows
Hardware: ASUSTeK COMPUTER INC., ROG ZENITH EXTREME ALPHA
CPU: AuthenticAMD AMD Ryzen Threadripper 2950X 16-Core Processor AMD8664, level: 23
32 logical processors, active mask: 4294967295
RAM: 34219630592 bytes (31.9GB)

Crash Dump Analysis


Crash dumps are enabled on your computer.

Crash dump directories:
C:\Windows
C:\Windows\Minidump

On Mon 3/23/2020 12:27:38 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\032320-37968-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1BC810)
Bugcheck code: 0x50 (0xFFFFF8FC40000000, 0x0, 0x0, 0x6)
Error: PAGE_FAULT_IN_NONPAGED_AREA
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that invalid system memory has been referenced. 
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. 
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time. 



On Mon 3/23/2020 12:27:38 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\MEMORY.DMP
This was probably caused by the following module: fltmgr.sys (FLTMGR!FltIsCallbackDataDirty+0x2EE)
Bugcheck code: 0x50 (0xFFFFF8FC40000000, 0x0, 0x0, 0x6)
Error: PAGE_FAULT_IN_NONPAGED_AREA
file path: C:\Windows\system32\drivers\fltmgr.sys
product: Sistema operativo Microsoft® Windows®
company: Microsoft Corporation
description: Administrador de filtros del sistema de archivos de Microsoft
Bug check description: This indicates that invalid system memory has been referenced. 
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. 
The crash took place in a file system driver. Since there is no other responsible driver detected, this could be pointing to a malfunctioning drive or corrupted disk. It's suggested that you run CHKDSK.



On Fri 3/20/2020 12:06:42 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\032020-39390-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1BC810)
Bugcheck code: 0xA0 (0x608, 0xFFFFA30D8CF2A018, 0x0, 0x0)
Error: INTERNAL_POWER_ERROR
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This bug check indicates that the power policy manager experienced a fatal error. 
This is likely to be caused by a hardware problem. This problem might also be caused because of overheating (thermal issue). 
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time. 



On Fri 3/20/2020 11:53:59 AM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\032020-37750-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1BC810)
Bugcheck code: 0xA0 (0x608, 0xFFFFA08F4FE57058, 0x0, 0x0)
Error: INTERNAL_POWER_ERROR
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This bug check indicates that the power policy manager experienced a fatal error. 
This is likely to be caused by a hardware problem. This problem might also be caused because of overheating (thermal issue). 
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time. 


On Fri 3/20/2020 6:50:49 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\032020-48390-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1BC810)
Bugcheck code: 0xA0 (0x608, 0xFFFF860F9DFCC278, 0x0, 0x0)
Error: INTERNAL_POWER_ERROR
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This bug check indicates that the power policy manager experienced a fatal error. 
This is likely to be caused by a hardware problem. This problem might also be caused because of overheating (thermal issue). 
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time. 


On Thu 3/19/2020 11:34:32 AM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\031920-37671-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1BC810)
Bugcheck code: 0x4E (0x99, 0x3D9C3B, 0x2, 0x60002800028A63A)
Error: PFN_LIST_CORRUPT
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that the page frame number (PFN) list is corrupted. 
This bug check belongs to the crash dump test that you have performed with WhoCrashed or other software. It means that a crash dump file was properly written out. 
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.

Not very useful, at least for me.
 

Steltek

My first question is: if it was in a case, how the heck did the motherboard get hit hard enough to dent a heatsink? That shouldn't be possible, unless your boss left something reasonably heavy inside the case that came loose during transport (like a radiator or video card, perhaps?). My second question is: what else on the board might it have hit? I suspect your boss's attitude on the issue is dictated by the fact that he has no intention of admitting that he might have destroyed a $600 motherboard because he didn't bother to package it correctly for shipment. If he had shipped anything other than a Threadripper like this (due to the TR torqued socket design), the CPU probably would have pulled out of the socket and been destroyed, too.

As far as errors go, you've got a dented heatsink and three documented kernel crashes there that were potentially a result of overheating. I'd concentrate on that first as the other two could be red herrings (i.e. a result of file system corruption resulting from the prior errors). If you run the machine under load for a while, then shut down and enter the BIOS, are the chipset temps in the BIOS monitoring out of line?
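
To make that count easy to redo as new dumps come in, here is a rough Python sketch that tallies a WhoCrashed text report by bug check name. It assumes the report is saved as plain text in the same layout as the one posted above, and it treats entries with the same timestamp and bug check code as one crash (the minidump and MEMORY.DMP entries for 3/23 describe the same event). The name `tally_crashes` and the shortened sample text are only for illustration.

```python
import re
from collections import Counter

# Patterns for the three report lines we care about. The Error pattern
# tolerates an optional BBCode [URL=...] wrapper around the name.
HEADER = re.compile(
    r"On (\w{3} \d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) your computer crashed"
)
BUGCHECK = re.compile(r"Bugcheck code: (0x[0-9A-Fa-f]+)")
ERROR = re.compile(r"Error: (?:\[URL=[^\]]*\])?([A-Z_0-9]+)")


def tally_crashes(report: str) -> Counter:
    """Count distinct crashes per bug check name."""
    seen = set()        # (timestamp, code) pairs already counted
    counts = Counter()
    when = code = None
    for line in report.splitlines():
        if m := HEADER.search(line):
            when, code = m.group(1), None
        elif m := BUGCHECK.search(line):
            code = m.group(1)
        elif (m := ERROR.search(line)) and when and code:
            key = (when, code)
            if key not in seen:
                seen.add(key)
                counts[m.group(1)] += 1
    return counts


# Shortened sample in the same layout as the posted report.
SAMPLE = r"""
On Mon 3/23/2020 12:27:38 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\032320-37968-01.dmp
Bugcheck code: 0x50 (0xFFFFF8FC40000000, 0x0, 0x0, 0x6)
Error: PAGE_FAULT_IN_NONPAGED_AREA

On Mon 3/23/2020 12:27:38 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\MEMORY.DMP
Bugcheck code: 0x50 (0xFFFFF8FC40000000, 0x0, 0x0, 0x6)
Error: PAGE_FAULT_IN_NONPAGED_AREA

On Fri 3/20/2020 12:06:42 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\032020-39390-01.dmp
Bugcheck code: 0xA0 (0x608, 0xFFFFA30D8CF2A018, 0x0, 0x0)
Error: INTERNAL_POWER_ERROR
"""

if __name__ == "__main__":
    for name, count in tally_crashes(SAMPLE).most_common():
        print(f"{count} x {name}")
```

Run against the full report, a tally like this should separate the INTERNAL_POWER_ERROR crashes from the rest, which is the group worth chasing first.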

You might try to wipe Windows and reinstall the system again with only necessary drivers (i.e. AMD's chipset drivers, your video drivers, and Ryzen Master for temperature monitoring to start) and run a Prime95 torture test on both the CPU and the memory. If there is a heating issue there, it'll probably be apparent pretty quickly.
 

rogerdv

We are also wondering how that happened. The hit damaged a corner of the case and deformed the whole structure (we had to cut part of it to fit a motherboard in there and just close it). Maybe at the airport they let it fall from some height, like 20-30 meters.
Today I'm supposed to do a clean install: remove all hard drives but the M.2, install the official drivers that I downloaded yesterday, and see what happens. I have been looking at temperatures, but according to the AIDA sensors they seem to be normal, around 40 C.
 

Steltek

I think that is a solid plan. Main thing is to eliminate everything except the bare basics to determine if the base system is stable or not.

Once you have it in a baseline bootable state, if it were me I'd run an all-core Prime95 torture test on it with the case sealed up, to put some serious stress on the CPU and memory subsystems and see if it bluescreens.