Question Help Troubleshooting Hard Crashes / Restarts

delta-v

Junior Member
May 1, 2023
9
4
41
About a month ago, my computer started to randomly restart, going straight to a black screen and then rebooting. I have been struggling to diagnose the source of the problem. The first few crashes resulted in a WHEA-Logger event 18 error. After trying the usual troubleshooting methods (updating all drivers, making sure Windows is up to date, disabling XMP, SFC /scannow, etc.), I contacted AMD and they ended up replacing my CPU. With the replacement CPU installed, my computer is still crashing, this time with a blue screen and the stop code WHEA_UNCORRECTABLE_ERROR. When I can get back into Windows (as sometimes this error repeats itself), I see that an event 46 error has occurred.

I have run several memory tests and so far I have not found any errors with the memory itself. This leads me to wonder if it is the motherboard at this point, as it is about the only part common to all of these errors (with the exception of the PSU). Does this make sense to you?

My system configuration is:
  • Gigabyte X570S Aorus Master
  • AMD R9 5900X
  • Corsair Vengeance LPX CMK64GX4M2D3600C18 (4x 32GB)
  • EVGA RTX 3090 FTW3
  • Seasonic PRIME 1000
  • Samsung 980 Pro 2TB (2x)
So far, all individual components have seemed fine during testing, so if there is anything else that you can recommend to try, I would really appreciate the help.
 

In2Photos

Golden Member
Mar 21, 2007
1,381
1,443
136
Ok. Does this happen quickly once in Windows or does it take time? Any idea what your temps are on things like CPU, GPU, RAM, NVMe? Are you doing anything strenuous when it crashes or just light use? One thing to try is taking out 2 RAM sticks. What did you use to test the RAM? Windows or memtest86? Do you have a spare PSU you could try?
 

delta-v

It happens randomly. Sometimes it happens during boot, sometimes a few minutes after logging in to Windows. Other times it will work fine for a while and then crash when idle, or it will crash during use (games, CAD, spooling up VMs, etc.). There has been no consistent trigger for the crashes.

All temps that I can measure via either my BIOS or HWiNFO are great. CPU peaks at 65-70°C under load, GPU peaks between 70-80°C, and all others are typically between 35-50°C.

I have tested all 128GB of RAM with both Windows' memory test and memtest86. I will try pulling out two of the DIMMs. If the machine ends up working that way, would you suspect the motherboard or the RAM? I had an old Z97 motherboard that had one memory channel fail over time, but that refused to even attempt to boot with more than 2 DIMMs after that.

I do not have a spare PSU that is capable of running this machine. The only one not in use is a 10-year-old 650W unit that was starting to have problems when I retired it from active use last winter.
 

delta-v

I decided to enable the XMP profile and then rerun memtest86. All tests passed. Therefore, I now think that the issue is probably not faulty hardware, but something with Windows/drivers. I'm going to try to dig through the Windows event logs and see if there is any other commonality with the errors besides event 46.
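
For anyone wanting to do the same kind of digging, here is a minimal sketch of tallying events by source and ID to look for a pattern. It assumes the events have already been exported to CSV text with `ProviderName` and `Id` columns (that column layout, the helper names `load_rows` and `tally_events`, and the sample rows are all hypothetical and for illustration only, not taken from the actual machine's log).

```python
import csv
import io
from collections import Counter


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per logged event."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def tally_events(rows):
    """Count how often each (provider, event ID) pair appears."""
    return Counter((row["ProviderName"], int(row["Id"])) for row in rows)


# Made-up sample rows, only to show the expected shape of the input.
SAMPLE = """ProviderName,Id,LevelDisplayName
Microsoft-Windows-WHEA-Logger,18,Error
Microsoft-Windows-WHEA-Logger,46,Error
Microsoft-Windows-Kernel-Power,41,Critical
Microsoft-Windows-WHEA-Logger,46,Error
"""

if __name__ == "__main__":
    counts = tally_events(load_rows(SAMPLE))
    # Most frequent pairs first, so recurring sources stand out.
    for (provider, event_id), count in counts.most_common():
        print(f"{count:3d}  {provider}  event {event_id}")
```

A tally like this only shows which sources recur; it does not say which component is at fault.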
 

delta-v

After lots of hours dealing with each component's manufacturer's support team, and having nearly all of the components replaced under warranty, replacing the PSU seems to have solved the issue. I finally was able to reliably boot into Windows yesterday for the first time since July. Throughout the troubleshooting process, I ended up getting a different CPU, RAM, SSD, PSU, and having the motherboard "inspected" by Gigabyte (not sure what they did exactly, they only said it passed inspection and returned it to me a month later).
 

delta-v

Well, I spoke too soon. I tried running a stress test with Prime95 after reinstalling Windows. It worked for about 20 minutes, then the crashes started again. It has since crashed multiple times at idle and during the boot process. Once it even failed to POST after a reboot and required a full power cycle to get it to boot.

I managed to borrow a GTX 1070 to rule out the GPU, but it made no difference.

Gigabyte is now re-RMA'ing the motherboard since they are the only company that did not replace their product. Hopefully that finally solves it.

If Gigabyte can't find the issue, I'm going to just buy a different motherboard at this point. Anyone have any good recommendations for an AM4 motherboard with 2+ M.2 slots that can handle 128GB of RAM?
 

Tech Junky

Diamond Member
Jan 27, 2022
3,185
1,053
96
I've had good luck with ASRock so far on both Intel and AMD. I just picked up a PG Lightning X670E for $160 used off Amazon. It's been stable even at 100%+ CPU.
 

delta-v

Thanks Tech Junky. I've had an old ASRock FM2A88X-ITX+ motherboard with an A10-7850K in my HTPC since 2014 without any issues. Longest lasting motherboard I've ever owned, now that I think about it. I'll see what they offer that meets my requirements.
I've gone through a couple of ASUS boards, a couple of Gigabyte boards, and an MSI board (in three different systems) in the time that I've been using that ASRock board...
 

Tech Junky

@delta-v

I dug through all the options when rebuilding, and there are nuances between different MFGs when it comes to these AMD setups vs Intel. There doesn't really seem to be much of a standard approach on the AMD side, which explains a lot of the gripes you see online about them.

MSI - they do things extremely oddly when it comes to PCIe lanes, going as far as allocating an x2 and as low as Gen3 on their latest boards
Asus / GB - they both do odd splits on the x16 when using the 2nd / 3rd slot: x8 / x4 / x2
ASR - was the only one to be consistent: x16 / x4 / x1

The biggest thing for me when looking at this build, though, was to avoid auto-splitting the slot lanes when using more than the top slot. Most other boards at a reasonable price auto split to x8/x8, which in my original plan of using the top slot for storage / OCuLink (x4/x4/x4/x4) would have been an impediment to using all 16 lanes on a single card. I have since had to add a GPU for media, though, so the split is no longer as high a priority. I'm kind of pondering another look at boards, but really the PGL fits my other needs with its number of slots, as all 4 are being used. Everything else only has 2-3 slots. I could trade off a slot if a board has dual NIC / TB, but then they sacrifice other functions by adding those options. It's all kind of screwy trying to make the most of the platform, and hopefully things will get better on the refresh. The biggest hurdle is the x4 Gen4 link from the X670 chipset being a bottleneck compared to Intel with an x8 uplink. Not sure why AMD didn't bump the bandwidth to match or beat Intel.
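
To put rough numbers on the uplink comparison above, here is a back-of-the-envelope sketch. It uses the published per-lane PCIe signaling rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding, and it assumes the x8 uplink runs at Gen4-equivalent speed. The function name `link_gb_per_s` is just a hypothetical helper, and the result is theoretical one-direction bandwidth before protocol overhead, not a measured figure.

```python
# Raw signaling rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}

# Gen3 and Gen4 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    gbit_per_lane = GT_PER_S[gen] * ENCODING_EFFICIENCY
    return gbit_per_lane / 8 * lanes  # 8 bits per byte


if __name__ == "__main__":
    x4 = link_gb_per_s(4, 4)  # x4 Gen4 chipset link
    x8 = link_gb_per_s(4, 8)  # x8 link, assumed Gen4-equivalent
    print(f"x4 Gen4: {x4:.2f} GB/s")
    print(f"x8 Gen4: {x8:.2f} GB/s")
```

On paper that works out to roughly 7.9 GB/s for the x4 link against roughly 15.8 GB/s for an x8 link, shared by everything hanging off the chipset.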
 

delta-v

Well, this is turning into a bit of a comedy of errors on Gigabyte's part... Their service department finally shipped the motherboard back to me again after having it since the beginning of October. This was the second time they had the motherboard. After weeks of them not replying to my calls/emails, I finally got hold of someone over the phone. He asked if I had heard from their troubleshooting team at all and I told him that I had not. He then looked into the matter a bit, and told me that they had found the chipset was faulty. Two weeks later, I got an email stating that it was ready to ship and asking if there was anything else that I wanted them to test. I responded as quickly as I could that I wanted them to test a configuration with 128GB of RAM like I have been using. They never responded and shipped my board to me a few days later.

I got the system reassembled, and immediately ran into the same type of problems (random crashes to black screen and restart, occasional BSODs with Machine Check Exceptions). I even tried reinstalling Windows again, which seemed promising until the installation finished and then it crashed when I started to download the latest Nvidia driver. It has not booted properly since.

I recorded the two-digit debug LED display in slow motion during three boot-loop cycles and sent both Gigabyte's and AMD's support a link to the video. There are no consistent error codes, but the system will not boot.

As a reminder, everything with the exception of the GPU and motherboard is new from the various manufacturers. I have tried two different GPUs now, and there is no difference in the behavior. Therefore, I can't help but think that this motherboard is severely broken. I also worry that when I used the new CPU to troubleshoot the issue, the motherboard damaged the CPU in some way.

I'm beyond ready to throw in the towel on this. I have been fighting this problem since July, and Gigabyte's support has been essentially worse than useless (due to the costs of shipping the motherboard back to them multiple times and them keeping it there for months without any communication). So, I broke down and ordered a new ASRock motherboard, CPU, and RAM yesterday.

If Gigabyte can miraculously do the right thing after this, I will use the board and the 5900X to upgrade my workshop machine, replacing my older, partially broken Gigabyte motherboard and i7-5775C that have been barely limping along with a single RAM slot still functioning for the last three years. Even that still works better than this Aorus Master at this point.
 