Question New build, random BSODs in Win11 23 or 24H2, with updates or none

mikeymikec

Lifer
May 19, 2011
20,966
16,203
136
Spec:

AMD 4300G
8GB Kingston Fury DDR4-3200 RAM
Samsung 980 PRO 1TB (latest firmware)
Corsair CX500 PSU
DVD drive
MSI B550M-PRO-VDH (latest BIOS)

It's a spec I've used a number of times before without issue, but this time I'm getting BSODs, I think almost exclusively when the computer is idle. I've memtest86'd it twice and used the Windows memory diagnostic once, Prime95'd it for half an hour, and Furmark'd it for half an hour. It handled the upgrade from 23H2 to 24H2 fine, sfc and dism came back fine, and I've tried writing about 15GB of data to it without issue. I ran a full test on the SSD with Samsung Magician. There's nothing useful in the logs, and the BSODs aren't recorded there (the only record is that a crash dump failed to be written). One really curious thing: I would run a test like Prime95, which it handled fine, then it crashed a minute or two after I stopped the test. Another time I installed the display driver from USB (a known reliable driver, but I've also tried a more recent one), rebooted even though it didn't ask for it, and shortly after the reboot it BSOD'd. Latest AMD chipset drivers FWIW, not that I've ever had any problems with them.

I've had CRITICAL_PROCESS_DIED twice, KMODE_EXCEPTION_NOT_HANDLED once, UNEXPECTED_STORE_EXCEPTION once (never seen that one before!), and that may be all the BSODs in the last 24 hours. The theory I'm working on is that it's related to the graphics hardware, because I've noticed a curious symptom about three times when the driver is installed: I would, say, right-click on an object and get a drop shadow in the top-left corner of the screen as if there's a menu there, but it's basically the desktop with an orphan drop shadow. I'm running the computer without an Internet connection or display driver for the moment. I have a sneaking suspicion that the crashes started shortly after I installed the graphics driver for the first time, but that's a change I would make pretty early on during a clean Windows install.

My cardinal rule is not to do a clean install when I'm troubleshooting stability issues, but I've had two previous occasions, purely with Win11, where an update screwed up the works and a clean install did the trick. This time a second clean install of 23H2 has not helped (the first install started on 23H2, then WU offered the upgrade).

If it crashes again then I guess I'll disconnect all the not-strictly-necessaries like extra USB ports, the DVD drive, that sort of thing, but it's all a bit weird.
 

mikeymikec

I decided to pursue this as a graphics-related issue, and I was just about to swap out the 4300G with a spare when I thought: if the system is allocating half a gig of system RAM to graphics, maybe it always allocates the same address space, maybe that address space is in faulty RAM, and maybe memtest86 hasn't picked up the issue because it never tests that address space. I decided to swap out the RAM with a spare module first as it's the less messy option, but it didn't help: with the graphics driver installed, it hung while the screen was off.

This morning I swapped out the processor, and so far so good, uptime 4hr 20min so far (I've disabled sleep mode while testing).
 

mikeymikec

When posting my previous post, I wondered how soon is too soon to be getting my hopes up for success. It turns out that it was too soon.
 

mikeymikec

Just to make things extra spicy, it turns out the stock heatsink on the spare processor has a faulty fan, which kinda sorta invalidates this morning's testing even though the symptom was the same as last time. I can't say I'm buying it either. Anyhoo, all unnecessaries unplugged.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
32,032
32,501
146
Post back with the solution please. The fact it is a go-to combo for you says to me that it's probably the board, since you swapped RAM and APU already.

My TS list would be -
disable fast startup
Try the R.ID drivers
RMA the board

I did not waste inordinate amounts of time on cheap builds I was not going to make a lot on.
 

mikeymikec

What are R.ID drivers?

I always disable fast startup :) I've more or less come to the conclusion that it's got to be the board at this point, but for the sake of being as thorough as possible (I don't want egg on my face from swapping out the board and having the same problem again), I've done a clean install on to a spare SSD. Admittedly I would have difficulty wrapping my head around a stability issue that looks graphics-related turning out to be the SSD, but two symptoms nag at me: the BSOD codes are random, and the crash dump is *never* written; Windows has no record that a BSOD even happened, just an improper shutdown. Both symptoms scream "improper handling of data", which logically puts the ball in the RAM or SSD's court, but beyond that...
 

mikeymikec

Will they give me some extra troubleshooting tool I can use?
https://sourceforge.net/projects/radeon-id-distribution/

NirSoft's BlueScreenView showed no crashes earlier, this one can't either.
https://www.resplendence.com/whocrashed
If you are running the ram without XMP, try bumping the voltage up. Swapping to different storage wouldn't hurt either.

It's on XMP and has been from the start.
 

mikeymikec

I did a clean install onto a spare SSD (using the same slot as the original drive), and ran that install with a video driver for 16 hours straight without any issues. As it was first thing this morning and I didn't have any better ideas at the time, I put the original SSD back in, set the screen timeout to a minute and while it didn't immediately have problems, I checked on it about 30 minutes later and it had hung with the screen in standby.

One difference between the 980 PRO (with Samsung's heatsink) and the spare SSD (an old Intel one, but still NVMe) is that the latter has fewer connections on the underside of the connector whereas the Samsung drive has all of them. I am dreading the idea of explaining to the supplier or manufacturer that the SSD is faulty but the fault only shows up when the screen goes off :D I'm wondering whether it could be a slot issue. To determine that, I coincidentally have a spare 990 PRO and am running a clean install on that right now.
 

mikeymikec

The system crashed multiple times in half an hour with the 990 PRO in the same slot, same symptoms as before.

As I've been using the second SSD slot the whole time because of the Samsung 980 PRO with integrated heatsink, I've now switched the 990 PRO to the first M.2 slot. I think it's going to be stable.

I'm confused though. The Intel SSD I tried that was stable is a 660p drive, PCIe 3.0 x4 interface. This AM4 system will be running the M.2 slots at PCIe 3.0 because of the Ryzen-G processor. Therefore a) why does the Intel drive have fewer connections on the underside of the connector and b) why would they be relevant?

If it turns out to be the second M.2 slot, it's going to be so weird to explain this to my supplier, but it's doubly weird because if I had used a non-heatsinked SSD in the first slot in the first place then I would never have discovered this issue; a reminder that there's no such thing as a complete testing regimen for a new PC build.
 

mikeymikec

It crashed again. I'm now trying a spare PSU because I think that's a more likely fault than a board having two faulty M.2 slots or another fault that affects M.2 slots.

If it turns out to be this Corsair CX550 PSU, I swear to god I'm never going to buy another Corsair CX PSU! I had a run-in with them in around 2012: when the ultra-reliable VX450W stopped being made, I switched to the CX430 and ended up with 3 out of 5 PSUs faulty. I only switched to them this time because they rate significantly better on the PSU tier list than the Be Quiet! System Power 10 (technically I've been using the 11 range, though, which isn't on the list yet).
 

DAPUNISHER

You picked an appropriate avatar; you are like a dog with a bone. :D If swapping PSUs is no joy, bump up the VDimm. Turn off XMP if the extra juice doesn't solve it.
 

mikeymikec

It crashed again. I've never adjusted DDR4 memory voltages before, I expect with XMP that the Kingston FURY DIMM is at 1.35v. Would you still try this even though I've tried two identical modules already and this is the normal module I've been using with this model of board?

For now I'm switching off XMP only to see if that makes any difference. - edit - before I finished typing this last sentence, it crashed :)
 

mikeymikec

I'd bump it to 1.375v-1.4v for testing, even with XMP off.
You took too long in responding :p I replaced the board in the meantime, now beginning the same test with the original SSD with its most recent install already going. I just hate that I'm seemingly having to wait up to 8 hours before I can start being confident that the problem is sorted.

The original board still has a CPU in, so in theory I could hook that up without a case.

One thing that was really bizarre was that the original board was firmly stuck in the case (a recent model I've started using, CiT Elite). The board raisers for this case have a little indent at the top that allows a board to get firmly wedged in, seemingly.

One virtually useless troubleshooting tip I've learned through this experience: I am not an impatient person (I feel that some of my customers are, in that they complain about how long the screen takes to come back on when waking a computer), but I was relying on the screen coming back on, and sometimes it took 5-7 seconds to respond while I was figuratively biting my nails waiting for the answer. I found that using the caps lock light is a much quicker test in this situation to know whether the computer is still responding :) The screen comes back on a few seconds later, not so bad.

!!!!! ARGH! IT JUST CRASHED WITH THE REPLACEMENT BOARD !!!!

Uh, both RAM modules out of the same pack have the same problem?? I've got a standard Kingston DDR4 module as opposed to these Fury modules, I'm going to try that next.
 

mikeymikec

Hopefully different ram is the fix.
Yup. I think it was one of my first suspects because of the random BSOD codes (though they seem to have settled down to CRITICAL_PROCESS_DIED which is atypical for RAM), so that's why I came back to it. What I'm hating right now is that all the easy answers have probably been ruled out at this point, so it's probably going to be a long shot solution. Two faulty boards, two faulty memory modules, two faulty PSUs, processors....

Interestingly, in the crash before last, the caps lock response happened successfully then it BSOD'd :D

The only component I haven't tried a spare of is the case, but I just don't see it as being a possibility. The assumption bothers me though.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
8,216
3,130
146
Certainly sounds like a RAM issue to me. I know you mentioned you tried prime95, but did you run memtest?
 

mikeymikec

memtest86 11.3 is a standard part of my build routine, and it passed that. (4 passes is the standard that the free version allows you to do; an older free version allows a custom number of passes to be set, but IIRC it wasn't compatible with AMD9000, so I updated my memtest86 memory stick to 11.3.) I didn't bother checking the second module.

It crashed with the third module (plain Kingston non-Fury, DDR4-3200). I had to reset the BIOS, presumably because of the old XMP config, then the BIOS detected the new module and ran it at 3200MHz automatically. I then went straight into my current standard test for this problem (23H2, no updates, video driver, monitor set to power off within 1 minute, no sleep mode).

I'm now trying a janky PSU that the case came with because it's the newest non-Corsair spare PSU I've got. Impressively quiet PSU, I'm surprised.

If this doesn't do the trick, then logically I have to come back to the SSD, because the Intel SSD seemingly worked fine. I'm just trying to get my head around the apparent facts in this respect though:

SSD1: Samsung 980 PRO with heatsink = problem
SSD2: Intel 660p no heatsink = seemingly no problem
SSD3: Samsung 990 PRO no heatsink = problem

Another build I've done recently which is extremely similar to this has a 990 PRO 2TB in, I didn't have any issues with that build (though my paranoia is starting to creep in, suggesting "well maybe you didn't test it enough...."). I normally use Samsung PRO SSDs with this board.

When I swapped boards, I purposefully did not update the BIOS because the BIOS update is a recent one that I only applied to this board and the previous PC I built a week or three ago with the 2TB SSD in.

One other note: I would say most of the time when Windows "crashes", the monitor is off and Windows fails to respond to standard wake-up techniques like mouse movement or a keypress. Sometimes it BSODs, and usually that BSOD is CRITICAL_PROCESS_DIED.
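
Since Windows keeps no record of these hangs, one rough way to pin down when each one happened would be a heartbeat log. This is only a minimal Python sketch under my own assumptions, not something used in this thread: the function names, the `heartbeat.log` file name, and the 10-second interval are all hypothetical. It appends a timestamp to a file at a fixed interval and asks the OS to flush each line to disk, so the last line approximates the last moment the machine was still alive.

```python
import os
import time
from datetime import datetime
from pathlib import Path


def write_heartbeat(log_path: Path) -> str:
    """Append the current local time to log_path and ask the OS to flush it to disk."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(stamp + "\n")
        log.flush()
        os.fsync(log.fileno())  # reduce the chance the line is lost in a hard hang
    return stamp


def last_heartbeat(log_path: Path):
    """Return the last timestamp written, or None if nothing has been logged yet."""
    if not log_path.exists():
        return None
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return lines[-1] if lines else None


def run(log_path: Path, interval_s: float = 10.0, beats=None) -> None:
    """Write a heartbeat every interval_s seconds; beats=None means run until interrupted."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(log_path)
        count += 1
        if beats is None or count < beats:
            time.sleep(interval_s)


# Hypothetical usage: start this before walking away from the machine, e.g.
#   run(Path("heartbeat.log"))
# and after the next hang, read the time of death (to within one interval) with
#   last_heartbeat(Path("heartbeat.log"))
```

One caveat with this approach: if the storage path itself is what's misbehaving, the final lines may never reach the drive, so a log that stops well before the hang was noticed would be a hint in its own right.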
 

mikeymikec

660P needs firmware update.
I'll try to remember. A customer gave me their old broken laptop which was fairly high end, I scrapped it for parts and didn't even realise that this is a 500GB drive until this weekend :)

7hr uptime and counting with the janky PSU. I'm trying not to get my hopes up...
 

Steltek

Diamond Member
Mar 29, 2001
3,341
1,084
136
I wouldn't be absolutely shocked if it turns out to be the PSU. Especially with the machine not writing logs or crash dumps - that just screams bad PSU to me.

Corsair CX500 PSUs (most especially the ones with the green colored labels) do not have a good reputation for either quality or longevity. They are Corsair's bottom of the barrel "junk line" PSUs.

The CX500 with the gray colored labels (that replaced the green label ones) are a little better, quality-wise, but honestly not by much.
 

mikeymikec

@Steltek they're rated pretty well on the PSU tier list, it's why I changed over to them (BQ's System Power 10 at the lower end especially has much worse ratings though I've never had any problems with them, and admittedly I've been using the 11 range for a while now which isn't yet on the list). But yes, ~18hr uptime on janky PSU.

As a general fyi - the Corsair CX550s I've been using have a yellow box, the PSUs themselves have black/white labelling.
 

In2Photos

Platinum Member
Mar 21, 2007
2,559
2,764
136
I've used about a half dozen Corsair PSUs in builds over the years, both CX and RMx units. I had 1 RMx650 that died about a week after installing it. 1 CX400 that had a fan start making noise after about 8 years of use. Otherwise they have been great. But honestly I don't think I have had any other PSU failures either from other brands, even cheap ones in pre-builts. I know that's probably nowhere near as many as you have used though.

I would have figured your issues were RAM or motherboard related, surprised at the PSU being the culprit. I guess that's one of the things that keeps us interested in PCs, they keep us on our toes!
 

mikeymikec

@In2Photos I've got a spare CX430 that made it through the whole life of a PC I built for a customer. Maybe the unreliable ones can sometimes be a case of what's known in the car industry as a "Friday build"?

When I started this business >20 years ago and desktops ruled the roost, it was really common for builders big and small (including mine) to use the PSU that came with the case, so I became familiar with a phenomenon of ~6 year old PCs with a cheapo PSU that just blew. These days there's not enough of a sample for me to make statements about notoriously poor brands/models; probably >90% of the desktop PCs I see are my own builds and I can't remember the last one of those that blew. The last one I remember having to replace was another CX430 years later, and only when the customer wanted an upgrade: the best choice was to go from a SATA SSD to NVMe, and the PSU just couldn't supply the juice on the 3.3V rail. IIRC there was an odd symptom then too (like crashing after the first reboot of a Windows install), I think I started a thread about it here.