Question When is it appropriate to pull and replace, versus upgrade piece-meal? (Freezing, black screens, won't power-off)

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
My main rig is a Ryzen 3600 (6C/12T), with 240mm CoolerMaster AIO LC kit for AM4, decent case, optical drive, 4x8GB Trident RGB 3600 DDR4 RAM, and currently, 2x Asus TUF GTX 1660 Super cards, mining on GPUs and CPU. Antec 750W Gold PSU. Storage is a pair of Intel 660p 1TB QLC NVMe drives.

I've been having more reboots lately, and they're happening more often since I started doing PrimeGrid for a race. I should note here that I had my RAM @ XMP 3600 (runs mostly fine), FCLK @ 1800 (to match RAM), and CPB and PBO DISABLED, which means that my CPU maxes out at 3.60GHz, rather than 4.0GHz all-core, and my temps under water are mostly maxed @ 70C. Yet I get reboots.
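
For anyone following the clock math: DDR4 is double data rate, so the actual memory clock is half the advertised transfer rate, and running 1:1 means FCLK matches that clock. A quick Python sketch of the arithmetic (the function name is just for illustration):

```python
def matched_fclk_mhz(ddr_rate_mts: int) -> float:
    """Return the FCLK (MHz) that runs 1:1 with a given DDR transfer rate (MT/s)."""
    # DDR moves data twice per clock, so the memory clock is half the rated speed.
    return ddr_rate_mts / 2


for rate in (3200, 3600):
    fclk = matched_fclk_mhz(rate)
    print(f"DDR4-{rate}: memory clock {fclk:.0f} MHz, so 1:1 FCLK is {fclk:.0f} MHz")
```

That is why 1800 is the matching FCLK for a 3600 kit, and a 3200 kit would pair with 1600.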

The blue-screen is something really obscure, something about kernel access levels and IRQLs and locks, but not the usual "IRQL greater or equal" one.

I don't know if it's my RAM, my CPU, my mobo, my PSU, or GPUs or GPU drivers, or even my Windows 10 installation (fully patched, AFAIK).
I could start swapping components, run for a week or two to test, and keep testing, but what if the problem is more-or-less "systemic", "old system"?

I don't particularly WANT to replace the whole thing, but I could. I have some X370 mobos with 5GbE-T ports on them (ASRock Professional), and I could get some 3900X CPUs to drop in for $450 or so, I believe. Along with some 3600 RAM, and some fresh new SSDs (have a couple of unused 2TB ones).

Maybe I should update my AMD chipset drivers, and my NVidia GPU drivers, to start with, that would be something that would be basically zero-effort to try and fix the issue.

Edit: OK, updated my AMD chipset drivers for B450 for Windows 10, there were new ones, and updated NVidia drivers to the newest "Studio Drivers".

Edit: Oh, whoops, my current mobo is an Asus B450-F ROG STRIX Gaming ATX. Newest BIOS for Win11.
 

DAPUNISHER

Super Moderator and Elite Member
Moderator
Aug 22, 2001
23,767
6,109
146
I enjoy the troubleshooting process, and challenging myself to diagnose the problem in as few steps as possible. Hence, I always go part by part, starting with whatever I think is the most likely culprit. For the "ain't nobody got time for that" type, throw the cards in a new platform with fresh windows. If the problem persists, you know it is one of the cards. If not, problem solved.

I don't know how stressful your workloads are on the ram, but that is my first suspect. Never had a 3600 that would play well with 3600MHz ram without tweaking (ironic?) And on the 350&370 boards, never did get it fully dialed in. I think 3466 was the best I ever got rock solid on those. All ASRocks though, so there's that.

Second guess is software, be it the DC or mining stuff, drivers which you have addressed, or windows being windows. I would see what the health on any storage is looking like too.
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
Well, I recently took advantage of a sale @ Newegg for a Ryzen R9 5900X (12C/24T). I received it and installed it in place of my 3600. Ran it for a bit, temps were OK, didn't change thermal paste, still had some MX4 on the bottom, 70-75C temps mining on all 24T (64MB L3 cache, great for mining).

Even decided to stop mining on the CPU, and did some PrimeGrid on the CPU, 3x tasks of 8T ea., did a couple of batches of WUs, no issues.

So I suspended PrimeGrid (no more tasks, on this box), and went back to CPU mining. There was a new version of NH that I installed (3.0.6.8, I think), mining on 24T, along with both GTX 1660 Super cards.

Bought some $12.99 "blue switch" RGB mech. keyboards from Newegg. Came out to the PC, plugged a keyboard into my USB hub plugged into my front-panel port, where my existing Rosewill Mech. RGB keyboard was plugged in and my MSI gaming mouse, and the PC wouldn't wake up from black-screen. HDD light blinked occasionally, weird. It was still mining, at last glance, just kind of off in la-la land. Maybe it couldn't allocate enough VRAM to build the desktop again. Who knows, pure speculation on my part.

I force rebooted, unplugged the Rosewill keyboard, plugged in this generic one, and powered-on. I was able to access the BIOS, Saved and Exited, with no apparent changes, went into Windows, seems good so far.

Will keep an eye out for any crashes or reboots. I had actually thought that it could be the processor, and now that I changed it, that it would be OK, but apparently, not the case. Maybe PSU, maybe video cards, maybe mobo. Probably mobo, honestly, it's seen better days. But maybe the mobo is a victim of a rogue PSU.

Will likely be swapping more parts in the future, if this keeps up.
 

killster1

Banned
Mar 15, 2007
6,208
473
126
just leave the mobo in but pull the cpu and ram to run on a bench with a new mobo.. im not a big fan of asus or b450's.. of course i didnt even think about your raided qlc drives.. :) glad the new cpu isnt doing the same thing at least. seems like every part in the rig is expensive except the board, is there going to be a am4 ddr5 board you are waiting for ? ( maybe im dreaming and the next mobo will be am5 and not work with the 5900x)
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
of course i didnt even think about your raided qlc drives
The two 660p 1TB NVMe drives are no longer in RAID-0. Ever since my last re-format months ago, I split them up to OS / Windows drive, and Steam game drive.

Also, I had another black-screen freeze, this time, I couldn't even hold down the power button to force power-off.

I swapped in 2x kits of 2x16GB (so 4x16GB total) of Team Group Vulcan Z DDR4-3200 (grey), in place of my Trident RGB 4x8GB DDR4-3600.

Hopefully that helps? Otherwise, I'm going to replace PSU next, and then finally, mobo, probably.

I wonder if possibly my AIO cooler pump is dying when I'm not looking, though? Maybe it's temp-related? Although, I would think that it might reboot on me if temps get too high.
 

mikeymikec

Lifer
May 19, 2011
15,248
5,392
136
The only contribution I can make to this topic is that I built a 5800X-based rig recently, and the first Windows install I did had BSOD issues (the IRQL one) until I installed the AMD drivers. The second install I did (for various reasons), I put the drivers in straight away, no BSODs.

In hindsight I find it odd that I had to find out the hard way, pre Win10 I always would have put drivers in first. Win10's auto install everything is making me lazy?
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
Still crashing, got a couple of BSODs, "KERNEL_SECURITY_CHECK_FAILURE" and "PAGE_FAULT_IN_NONPAGED_AREA".

Sounds like RAM, right? I swapped out the 4x8 Trident RGB DDR4-3600 (@ XMP 3600 1.35V), with 4x16 Team Vulcan Z DDR4-3200 (@ XMP 3200, presumably 1.35V).

Also, I did run "mdsched.exe", which reboots and does the MS memory test, no errors reported.
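
For reference when searching on these, each stop name has a numeric bug check code. Below is a small Python lookup covering only the stop names that have come up in this thread; it is a convenience table, not a diagnosis, and the helper name is made up for illustration:

```python
# Windows bug check (stop) codes for the names mentioned in this thread.
BUG_CHECKS = {
    "IRQL_NOT_LESS_OR_EQUAL": 0x0000000A,
    "PAGE_FAULT_IN_NONPAGED_AREA": 0x00000050,
    "KERNEL_SECURITY_CHECK_FAILURE": 0x00000139,
}


def describe(stop_name: str) -> str:
    """Format a stop name with its bug check code, if it is in the table."""
    code = BUG_CHECKS.get(stop_name.upper())
    if code is None:
        return f"{stop_name}: not in this table"
    return f"{stop_name}: bug check 0x{code:08X}"


for name in BUG_CHECKS:
    print(describe(name))
```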

Looking like:
1) PSU
2) mobo
3) maybe it's hotspotting, and not reported by my temp monitor in HWMonitor, and I DO need to change up the thermal paste, or add more.
4) Maybe my UPS is overloaded. It's not beeping / alarming, as I would expect it to, if overloaded, but my second rig right here with GTX 1660, and 2x RX 6600, Phenom II 1045T X6, 16GB DDR3, may be pushing the UPS over the edge if they are both on the UPS. Will have to see.

Edit: Temps and fans all look OK to me, using HWMonitor. CPU package temp 70-75C, CPU fan (rad) 2000RPM, Pump 2200 RPM. Making a little bit o' noise, so it should be running.
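
A rough way to sanity-check item 4 is to add up nominal power for both rigs and compare it against the UPS rating. The sketch below uses vendor TDP / board-power figures as stand-ins for real draw (actual draw under mining can be higher or lower). The 900 W UPS rating, the 90% PSU efficiency, and the 50 W "everything else" allowance are placeholders, since none of those are stated in the thread. A wall-power meter would give real numbers.

```python
# Nominal component power in watts (vendor TDP / board power, not measured draw).
RIG_MAIN = {"Ryzen 9 5900X": 105, "GTX 1660 Super #1": 125, "GTX 1660 Super #2": 125}
RIG_SECOND = {"Phenom II X6 1045T": 95, "GTX 1660": 120, "RX 6600 #1": 132, "RX 6600 #2": 132}

OTHER_PER_RIG_W = 50   # assumption: motherboard, RAM, drives, fans, pump
PSU_EFFICIENCY = 0.90  # assumption: roughly Gold-level efficiency
UPS_RATED_W = 900      # assumption: placeholder, substitute the real UPS watt rating


def wall_draw_w(components, other_w=OTHER_PER_RIG_W, efficiency=PSU_EFFICIENCY):
    """Estimate AC draw at the wall from nominal DC component power."""
    dc_load = sum(components.values()) + other_w
    return dc_load / efficiency


def ups_load_fraction(rigs, ups_rated_w=UPS_RATED_W):
    """Return the combined estimated draw as a fraction of the UPS watt rating."""
    return sum(wall_draw_w(rig) for rig in rigs) / ups_rated_w


fraction = ups_load_fraction([RIG_MAIN, RIG_SECOND])
print(f"Estimated UPS load: {fraction:.0%} of rated capacity")
```

Anything near or above 100% would make the UPS a plausible suspect; well below that, it probably is not the cause.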
 

killster1

Banned
Mar 15, 2007
6,208
473
126
dont think it could possibly be the cooling, the ram was switched and working fine? i wouldnt try 4 sticks i would use 1 then 2 then 3 then 4.
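
The one-then-two-then-three-then-four approach can be sketched as a simple test plan. This is only an illustration: the stick labels and the `is_stable` callback are hypothetical, and in practice "stable" means running the usual workload long enough to trust the result.

```python
def incremental_plan(sticks):
    """Yield growing sets of DIMMs: one stick, then two, and so on."""
    for count in range(1, len(sticks) + 1):
        yield sticks[:count]


def first_suspect(sticks, is_stable):
    """Return the stick added at the first unstable step, or None if every step passes."""
    for installed in incremental_plan(sticks):
        if not is_stable(installed):
            return installed[-1]
    return None


# Example with a made-up result: pretend any configuration containing stick "C" crashes.
sticks = ["A", "B", "C", "D"]
for step in incremental_plan(sticks):
    print("Test with:", step)
print("First suspect:", first_suspect(sticks, lambda installed: "C" not in installed))
```

A failure at a given step points at the stick just added, but it could also be that slot, or simply the memory controller struggling with more DIMMs, so it is worth re-testing the suspect stick on its own.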
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
Oh, yeah, Firefox tabs crashing, too. They almost never crash. Maybe that 4x16GB was a bit too much?

What I don't get, either, is that when I installed the 5900X in place of the 3600 and left the 4x8GB DDR4-3600 @ XMP 3600, I was getting POST beeping, like it was having trouble "training" the RAM with the 5900X.

Maybe my particular 5900X just has a particularly bad IMC or something. It was in fact working with the other RAM at 3600, though, but it seemed like it took a few POST-RESET-REBOOT cycles (boot loops) to get it to come up. I don't get those with the 4x16GB 3200 RAM, but now I don't know. This 4x16GB 3200, is CAS16-18-18-xx. Maybe I need to mess with sub-timings to get a dual set of kits functional.
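
As a side note on the timings, CAS latency in clock cycles only means something once converted to time. A minimal Python sketch of the usual conversion (latency in ns = CL x 2000 / transfer rate in MT/s); the function name is just for illustration:

```python
def cas_latency_ns(cl: int, ddr_rate_mts: int) -> float:
    """Convert CAS latency from clock cycles to nanoseconds for a DDR transfer rate."""
    # One memory clock lasts 2000 / rate nanoseconds, because DDR transfers twice per clock.
    return cl * 2000 / ddr_rate_mts


print(f"DDR4-3200 CL16: {cas_latency_ns(16, 3200):.2f} ns")
```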
 

Hans Gruber

Golden Member
Dec 23, 2006
1,401
494
136
Three quick tips. Go into your BIOS and load default settings for everything, then reboot. Go back into the BIOS and increase the memory voltage to 1.4v @ 3600MHz. Reboot and make sure everything is working. Then power down, flip the switch on the back of your power supply to off, and hold down the power button on the case for 10 seconds. Wait another minute or two, then power up the system. Let it simmer for half a day, then start doing whatever you were doing before.

If everything is stable for a day or two, slowly back down the memory voltage 0.01v at a time (1.39v, 1.38v, and so on) until you find the lowest voltage that doesn't BSOD. My other theory is that it takes a few days for the new ASUS BIOS to take with Windows and your PC.
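
The back-it-down procedure can be sketched as a simple loop. This assumes 0.01 V steps (reading the 1.39v, 1.38v examples that way) and a floor of 1.35 V, the XMP voltage mentioned earlier in the thread. The `is_stable` callback is hypothetical: in real life it is "run for a day or two and see if it BSODs".

```python
def lowest_stable_voltage(start_v, floor_v, step_v, is_stable):
    """Walk DRAM voltage down from start_v and return the last value that was stable.

    Returns None if even start_v is unstable.
    """
    # Work in integer millivolts so repeated subtraction cannot drift.
    start_mv, floor_mv, step_mv = (round(v * 1000) for v in (start_v, floor_v, step_v))
    last_good = None
    for mv in range(start_mv, floor_mv - 1, -step_mv):
        if not is_stable(mv / 1000):
            break  # first unstable setting: stop and keep the previous one
        last_good = mv / 1000
    return last_good


# Example with a made-up stability result: pretend anything below 1.37 V crashes.
print(lowest_stable_voltage(1.40, 1.35, 0.01, lambda volts: volts >= 1.37))
```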
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
That's a good thought. Maybe me filling all four DRAM slots with 16GB DIMMs (double-sided? I don't know, they have heatspreaders), may take a slight bump UP in the vDIMM. Let me try that.
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
Bumped DRAM voltage to 1.3700, left VSoc at Auto, rebooted, beeped then went into Windows 10. So I shut down, powered-on, went into BIOS, set Vdimm to 1.3600, Vsoc to 1.10V, and Saved and Restarted, went into Windows 10 without any beeping, we'll see, fingers crossed.
 

Hans Gruber

Golden Member
Dec 23, 2006
1,401
494
136
Larry, you have to juice the RAM a bit more than 1.37v. If you have Hynix RAM, 1.39v will do just fine. The other thing: you should run everything at stock settings for at least 24 hours before making any changes. That is the let-it-simmer approach. After BSODs and hangs, you have to let the ghosts in the machine settle down and stabilize.
 

killster1

Banned
Mar 15, 2007
6,208
473
126
wow super good info here.. i wasn't sure about calling ghost busters but after this i know i have to. good idea with high voltage ram all these timings are so hard to hammer down but a big bump should take care of any of those unstable timings.
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
PBO off, installed newest NV driver, swapped in 4x16GB Team Vulcan DDR4-3200 @ XMP, set Vdimm to 1.3600V, set Vsoc to 1.10V, still rebooting on me.

Last time, I moved the mouse to wake it, and the chassis beeped, and it rebooted.

About ready to do a full re-build, with new PSU and B550-F version of ROG STRIX Gaming mobo.
 

Ajay

Lifer
Jan 8, 2001
11,218
5,033
136
My vote too. Taking ~10 days to debug one system is pretty crazy. Just check your PSU first (not sure what you're using). When we used to have Dells at one place I worked, the first thing we'd do is check the PSU - sometimes a bad PSU produces very strange errors.
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126
Well... I work a little bit slower these days... anyways, the PSU is an Antec EDG 750W Gold-rated. But it was dormant for years in storage before being put into use.
 

Hans Gruber

Golden Member
Dec 23, 2006
1,401
494
136
My vote is to remove 2 sticks of memory and see if that helps. If I could be one of the voices in your head, you would have a working system. It's not a good idea to start a new system with 64GB of memory.
 

VirtualLarry

No Lifer
Aug 25, 2001
53,288
7,709
126

I'm ashamed to be taking tips from an LTT video, but I disabled "Global C-States", and have been mining on the CPU for nearly two days without a crash/freeze. Going to keep an eye on it.
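
One way to judge whether two quiet days mean anything: if crashes had been arriving at random with some average gap, the chance of a crash-free stretch happening purely by luck falls off exponentially. The sketch below assumes crashes are independent events (a Poisson process) and uses a made-up average gap of 12 hours, since exact crash times were not recorded here.

```python
import math


def chance_of_lucky_streak(hours_without_crash, mean_hours_between_crashes):
    """P(no crash in the window) if nothing was actually fixed, under a Poisson model."""
    return math.exp(-hours_without_crash / mean_hours_between_crashes)


def hours_needed(mean_hours_between_crashes, max_luck=0.05):
    """Crash-free hours needed before luck alone is less likely than max_luck."""
    return -mean_hours_between_crashes * math.log(max_luck)


MEAN_GAP_H = 12  # assumption: placeholder for the typical time between crashes
print(f"Chance 48 quiet hours is luck: {chance_of_lucky_streak(48, MEAN_GAP_H):.1%}")
print(f"Quiet hours needed for 95% confidence: {hours_needed(MEAN_GAP_H):.0f}")
```

The rarer the crashes were before the change, the longer the quiet run needs to be before it counts as evidence.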
 

Hans Gruber

Golden Member
Dec 23, 2006
1,401
494
136
Run everything stock for a week. That means go into bios and reset the board to default settings and F10. After a week you can tweak your memory but nothing else.
 

DAPUNISHER

Super Moderator and Elite Member
Moderator
Aug 22, 2001
23,767
6,109
146

I'm ashamed to be taking tips from an LTT video, but I disabled "Global C-States", and have been mining on the CPU for nearly two days without a crash/freeze. Going to keep an eye on it.
Why? He has the money to bankroll talent, and Anthony is talent. AMD and C state bugs go way back, so it won't be surprising if that is the fix.
 
