Question New build, random BSODs in Win11 23 or 24H2, with updates or none

mikeymikec

Lifer
May 19, 2011
20,949
16,186
136
Spec:

AMD 4300G
8GB Kingston Fury DDR4-3200 RAM
Samsung 980 PRO 1TB (latest firmware)
Corsair CX500 PSU
DVD drive
MSI B550M-PRO-VDH (latest BIOS)

It's a spec I've used a number of times before without issue, but this time I'm getting BSODs, I think almost exclusively when the computer is idle. So far I've memtest86'd it twice, run the Windows memory diagnostic once, Prime95'd it for half an hour and Furmark'd it for half an hour. It handled the upgrade from 23H2 to 24H2 fine, sfc and dism came back fine, and I've tried writing about 15GB of data to it without issue. I also ran a full test on the SSD with Samsung Magician. There's nothing useful in the logs: the BSODs aren't recorded at all (the only record is that a crash dump failed to be written). One really curious thing is that I would run a test like Prime95, which it handled fine, and then a minute or two after stopping the test it would crash. Another time I installed the display driver from USB (a known reliable driver, but I've also tried a more recent one), rebooted even though it didn't ask for it, and shortly after the reboot it BSOD'd. Latest AMD chipset drivers FWIW, not that I've ever had any problems with them.

I've had CRITICAL_PROCESS_DIED twice, KMODE_EXCEPTION_NOT_HANDLED once, UNEXPECTED_STORE_EXCEPTION once (never seen that one before!), and that may be all the BSODs in the last 24 hours. The theory I'm working on is that it's related to the graphics hardware, because about three times with the driver installed I've noticed a curious symptom: I would, say, right-click on an object and get a drop shadow in the top-left corner of the screen as if there's a menu there, but it's basically the desktop with an orphan drop shadow. I'm running the computer without an Internet connection or display driver for the moment. I have a sneaking suspicion that the crashes started shortly after I installed the graphics driver for the first time, but that's a change I would make pretty early on during a clean Windows install.

Despite my cardinal rule not to do a clean install when I'm troubleshooting stability issues, I've experienced two previous occasions purely with Win11 where an update screwed up the works and a clean install did the trick, but a second clean install of 23H2 has not helped (first install started on 23H2 then WU offered the upgrade).

If it crashes again then I guess I'll disconnect all the not-strictly-necessaries like extra USB ports, the DVD drive, that sort of thing, but it's all a bit weird.
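For anyone wanting to double-check the "nothing in the logs" part, here is a minimal sketch that asks the System event log for the two entries I'd expect around a crash. It assumes Event ID 41 (Kernel-Power, logged after an unclean shutdown) and Event ID 1001 (source BugCheck, logged when Windows did manage to record the bugcheck). `build_query` and `run_query` are hypothetical helper names, and the `wevtutil` call only does anything useful on Windows.

```python
import subprocess
import sys


def build_query(event_id: int, count: int = 10) -> list[str]:
    """Build a wevtutil command listing the newest System events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # plain-text output
    ]


def run_query(event_id: int, count: int = 10) -> str:
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_query(event_id, count),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    # Assumed IDs: 41 = unclean shutdown, 1001 = bugcheck recorded.
    for eid in (41, 1001):
        print(f"--- Event ID {eid} ---")
        print(run_query(eid) or "(no matching events)")
```

If only the ID 41 entries show up and there is never a 1001, that would match what I'm seeing: Windows knows it went down uncleanly but never got as far as recording the bugcheck.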
 

mikeymikec

I decided to pursue this as a graphics related issue, and I was just about to swap out the 4300G with a spare when I thought that if the system is allocating half a gig of system RAM to the integrated graphics, maybe it always allocates the same address space, maybe that address space is in faulty RAM, and maybe memtest86 hasn't picked up the issue because it never tests that address space. I decided to swap out the RAM with a spare module first as it's the less messy option, but it didn't help: with the graphics driver installed, it hung while the screen was off.

This morning I swapped out the processor, and so far so good, uptime 4hr 20min so far (I've disabled sleep mode while testing).
 

mikeymikec

When posting my previous post, I wondered how soon is too soon to be getting my hopes up for success. It turns out that it was too soon.
 

mikeymikec

Just to make things extra spicy, it turns out the stock heatsink on the spare processor has a faulty fan, which kinda sorta invalidates this morning's testing even though the symptom was the same as last time. I can't say I'm buying it either. Anyhoo, all unnecessaries unplugged.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Aug 22, 2001
32,019
32,484
146
Post back with the solution please. The fact it is a go-to combo for you says to me that it's probably the board, since you swapped RAM and APU already.

My TS list would be -
disable fast startup
Try the R.ID drivers
RMA the board

I did not waste inordinate amounts of time on cheap builds I was not going to make a lot on.
 

mikeymikec

What are R.ID drivers?

I always disable fast startup :) I've more or less come to the conclusion that it's got to be the board at this point, but for the sake of being as thorough as possible (I don't want egg on my face from swapping out the board and having the same problem again), I've done a clean install onto a spare SSD. Admittedly, a stability issue that looks graphics related turning out to be the SSD is something I would have difficulty wrapping my head around, but what nags at me are the two symptoms: the BSOD codes are random, and the crash dump is *never* written, so Windows has no record that a BSOD even happened, just an improper shutdown. Both symptoms scream "improper handling of data" which logically puts the ball in the RAM or SSD's court, but beyond that...
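To confirm the "dump is never written" symptom without trusting a viewer tool, a rough sketch like the one below just looks for the files themselves. It assumes the default dump locations under the Windows directory (`MEMORY.DMP` and the `Minidump` folder); if the dump path has been changed in the startup and recovery settings it will look in the wrong place. `find_dumps` is a hypothetical helper name, and reading `C:\Windows\Minidump` may need an elevated prompt.

```python
from pathlib import Path


def find_dumps(system_root: Path) -> list[Path]:
    """Return crash dump files found under system_root, newest first."""
    found: list[Path] = []

    # Full or kernel dump, assuming the default file name.
    full_dump = system_root / "MEMORY.DMP"
    if full_dump.is_file():
        found.append(full_dump)

    # Small dumps, assuming the default Minidump folder.
    minidump_dir = system_root / "Minidump"
    if minidump_dir.is_dir():
        found.extend(
            p for p in minidump_dir.iterdir() if p.suffix.lower() == ".dmp"
        )

    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    dumps = find_dumps(Path(r"C:\Windows"))
    if not dumps:
        print("No dump files found")
    for dump in dumps:
        print(dump, dump.stat().st_size, "bytes")
```

An empty result after several BSODs would back up the idea that the write to disk is failing at crash time, which is at least consistent with a storage path problem rather than a purely graphics one.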
 

mikeymikec

Will they give me some extra troubleshooting tool I can use?
https://sourceforge.net/projects/radeon-id-distribution/

NirSoft's BlueScreenView showed no crashes earlier, this one can't either.
https://www.resplendence.com/whocrashed
If you are running the ram without XMP, try bumping the voltage up. Swapping to different storage wouldn't hurt either.

It's on XMP and has been from the start.
 

mikeymikec

I did a clean install onto a spare SSD (using the same slot as the original drive), and ran that install with a video driver for 16 hours straight without any issues. As it was first thing this morning and I didn't have any better ideas at the time, I put the original SSD back in, set the screen timeout to a minute and while it didn't immediately have problems, I checked on it about 30 minutes later and it had hung with the screen in standby.

One difference between the 980 PRO (with Samsung's heatsink) and the spare SSD (an old Intel one, but still NVMe) is that the latter has fewer connections on the underside of the connector whereas the Samsung drive has all of them. I am dreading the idea of explaining to the supplier or manufacturer that the SSD is faulty but the fault only shows up when the screen goes off :D I'm wondering whether it could be a slot issue. To determine that, I coincidentally have a spare 990 PRO and am running a clean install on that right now.
 

mikeymikec

The system crashed multiple times in half an hour with the 990 PRO in the same slot, same symptoms as before.

As I've been using the second SSD slot the whole time because of the Samsung 980 with integrated heatsink, I've now switched the 990 PRO to the first M.2 slot. I think it's going to be stable.

I'm confused though. The Intel SSD I tried that was stable is a 660p drive, PCIe 3.0 x4 interface. This AM4 system will be running the M.2 slots at PCIe 3.0 because of the Ryzen-G processor. Therefore a) why does the Intel drive have fewer connections on the underside of the connector and b) why would they be relevant?

If it turns out to be the second M.2 slot, it's going to be so weird to explain this to my supplier, but it's doubly weird because if I had used a non-heatsinked SSD in the first slot in the first place then I would never have discovered this issue; a reminder that there's no such thing as a complete testing regimen for a new PC build.
 

mikeymikec

It crashed again. I'm now trying a spare PSU because I think that's a more likely fault than a board having two faulty M.2 slots or another fault that affects M.2 slots.

If it turns out to be this Corsair CX550 PSU, I swear to god I'm never going to buy another Corsair CX PSU again! (I had a run-in with them in around 2012 when the ultra-reliable VX450W stopped being made: I switched to the CX430 and ended up with 3 out of 5 PSUs faulty. I only switched to them this time because they rate significantly better on the PSU tier list than the Be Quiet! System Power 10; technically I've been using the 11 range though, which isn't on the list yet.)
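To keep the swap testing straight, here is a small sketch that tabulates the SSD and slot trials as read from the posts above and shows which outcomes each SSD and each slot has been seen with. The trial list is a reading of the thread, not a log. `outcomes_by_factor` is a hypothetical helper, and a single 16-hour stable run is obviously thin evidence.

```python
from collections import defaultdict

# Trials as read from the posts above: (ssd, m2_slot, outcome).
TRIALS = [
    ("980 PRO", 2, "crash"),   # original build
    ("660p", 2, "stable"),     # spare Intel SSD, 16 hours
    ("980 PRO", 2, "crash"),   # original SSD put back, hung within ~30 minutes
    ("990 PRO", 2, "crash"),   # multiple crashes in half an hour
    ("990 PRO", 1, "crash"),   # first M.2 slot
]


def outcomes_by_factor(trials):
    """Map each factor to {value: set of outcomes seen with that value}."""
    summary = {"ssd": defaultdict(set), "slot": defaultdict(set)}
    for ssd, slot, outcome in trials:
        summary["ssd"][ssd].add(outcome)
        summary["slot"][slot].add(outcome)
    return summary


if __name__ == "__main__":
    for factor, values in outcomes_by_factor(TRIALS).items():
        for value, outcomes in values.items():
            print(f"{factor}={value}: {sorted(outcomes)}")
```

Laid out like this, both slots have seen crashes and the only stable run is the 660p, which is why a fault that sits upstream of the slots (PSU or board) looks more plausible than two bad M.2 slots.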