Question New build issues with GPU causing Win11 to be corrupted

JWMiddleton

Diamond Member
Aug 10, 2000
5,686
172
106
I have a new case, 1,200W PSU, 2TB M.2 SSD and a MicroCenter combo (Ryzen 9 7900X, ASUS ROG Strix B650E-F and G.Skill DDR5-6000 RAM). It is replacing many of the items in the system in my sig. I got the system running last week and installed a bunch of software, and all was working well. I had a few minor issues, like a tab sometimes crashing in Firefox when watching a YouTube video. Also, I'd start Unigine Heaven running and go off and do something else. When I returned, Heaven would have closed and I'd be back at the desktop. I had made a few BIOS changes such as enabling EXPO, reducing the TDP limit from 95 to 80, and a few other suggestions that would help FPS. No CPU overclocking.

I was running a Radeon RX 6900 XT with the latest drivers. I loaded Steam to try some games and 3DMark. Well, Time Spy said it had started, but the screen was black. It was hard to even get out of it. I tried a few other small games like Quake II RTX, as it wasn't much of a download. It worked, but rebooted the system as I exited. Then the system had a problem booting, as if the boot drive was corrupted. The more I tried to fix it, the worse it got. I was ready to box it all up and take it back to MC.

This morning I decided to clear the BIOS, so it was in its default state. I took out the GPU and unplugged drives D: & E:. Running on the Radeon Graphics IGP, I did a fresh install of Windows 11 without the network. Then I installed all the drivers from a USB stick, got it on the network, and started graphics testing. It seemed to be working fine. Then I installed a known-good RX 580 GPU and the previous version of the drivers. It was fine until I started to run 3DMark. The screen went blank and would not come back without a reset. When it tried booting into Windows, it said it needed to go into Recovery Mode. This didn't help, and again it seemed to get worse the more I tried. At one point I could not click on the Start menu without getting a critical error message. I shut down the system, took out the GPU and rebooted. It did the recovery thing and is now working fine on the IGP. I've had Heaven running and it hasn't crashed for about an hour.

20230716_180655.jpg

So, same issue with 2 different GPUs. What is the problem? Motherboard or PCIe 8-pin power or ???

John
 

Tech Junky

Diamond Member
Jan 27, 2022
3,418
1,148
106
Switch back to Intel or get a different board.

Could be the RAM. The crashing Firefox tab would be the indicator, as RAM is usually the cause of those crashes.
 

JWMiddleton

Hi TJ,

I had a doctor's appointment in town, so I've been gone a while. I left the system running Heaven for at least 4 hours and no issues. Thus, I'd think it has nothing to do with the RAM. It is going to be a PITA due to the weight of the system, but I'm going to try the PSU in my sig.

If you read this thread you will see why I gave AMD a try (again): "AARrrgghh...I really f'ed up...Question RESOLVED..."
 

Tech Junky

I'm aware.

I've contemplated switching to AMD as well, as my first build back in the '90s was AMD. I just can't bring myself to do it, though. It seems the polish they used to have on their setup has diminished, and some things don't get resolved by their engineers until the next CPU release. For my use, a lot of it would be mitigated because I would be going for more of a server build and not gaming. I just can't see paying the premium for the same performance as Intel, though. It's a considerable difference in price for CPU/MOBO. The one thing I'm eyeing AMD for is bifurcation of slots, which allows a cheaper M.2 RAID build (a $100 AIC vs $500) and the full speed of Gen4 drives vs an aggregate with Intel using PLX switching.

4 hours of idle game running won't trigger the same issue w/o interaction. Parameters need to change to invoke the issue if it's RAM related.

It could just be the MOBO, being that GB isn't what it used to be back in the day. Admittedly, I don't keep as tight a grasp on AMD as on Intel, though. I just make general observations when news gets posted and file them away for future questions / issues. It would be easier to troubleshoot the issues if you weren't changing platforms, since you could potentially swap parts between the 2 setups to test.

Since the timer is ticking on a return maybe order a different board off Amazon to test everything with and if there's still an issue at least you can send it back to MC for a refund before the return window closes. Amazon's an easy return either way. It's just dicey when dealing with a bigger purchase like this and having stuff not working 100%. This is why I badger people to build right away instead of letting things sit in the box for weeks and then have to deal with an RMA instead of a return.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,369
10,067
126
Have you monitored your temps on the M.2 drive, while operating / testing? Did the mobo come with a heat shield/spreader for the M.2? Are you using it?

Just wondering, if you're using the primary M.2 slot next to the CPU or under a GPU slot, and the heat from the GPU is causing NAND ERRORs while operating?

Just a shot in the dark. Didn't see you post much about storage, except to mention a fresh install of Win11 on a 2TB M.2.

It's not an ADATA, by any chance, is it? Those, IME, can be problematic.
 

JWMiddleton

First of all, I ignored the Intel jab. The drive is the infamous Samsung 980 Pro with the latest firmware, so it should be immune to the reported issues. The SSD is installed just above the metal backplate of the GPU. The M.2 slot does have a heat shield, plus the case is the Lancool 216 with two 160mm fans in the front and a 120mm in the rear. So, plenty of cooling. Samsung Magician reports 41C. I don't know how to monitor the SSD while gaming.

Now for the fun stuff. After testing it with only the iGPU and a different PSU, I put back the RX 580 and NO PROBLEM! I even killed two chickens in CS:GO. :) Did some 3DMark as well with good results. Then I switched back to the 1,200W PSU and it worked with the RX 580. So, all that was left to do was put back in the RX 6900 XT. It seems to be working as well. So, WTF??

John
 
Jul 27, 2020
16,541
10,541
106
Windows getting corrupted is no joke. That's a very serious issue with hardware. Something is heating up or power supply isn't good enough (some important voltage rail could be underpowered). I would first of all remove the Samsung drive from the equation. I don't trust this crappy company anymore. I've read all kinds of horror stories about their monitors.

Since you have an ASUS mobo, go into BIOS and reset all settings to default. Make sure that all the CPU/IMC voltages are within spec. Also make sure that you are on the latest BIOS, even if it's a beta BIOS.
 

JWMiddleton

"Hmm. SSD sounds fine, then. Original 1200W PSU now working, too? Interesting."
The 1200W is new; I bought it when I got the case. So, I questioned if it could be the issue. I've had the Seasonic 750W for a few years. Both are 80+ Platinum.

"So, now everything works?"

"HWINFO will log temps across the system while it's open."
I went back and looked at HWINFO and found two sections about the drive; one did have temps. See thumbnail.

Works? Well, NO... I ran Valley for over an hour and all was fine. Then I started Superposition 1080p Extreme. After a short period it hung, then rebooted. It took the system a few tries to get back into Windows. So, I still don't know what is causing it. I found this on my desktop this morning, as it seems I didn't post it before going to bed.

This morning as I was lying awake at 5:30, I thought I'd found a way to reproduce the problem: run Superposition in 1080p Extreme mode. So, I got up and tried it. I think it ran, but I don't see that I have the results saved. I went to the BIOS and turned on EXPO and Resizable BAR. I booted into Windows and Superposition crashed almost immediately. After it rebooted, it seemed fine. I started Superposition again, but at 1080p Medium, then 1080p High. Both ran fine. I started 1080p Extreme and it crashed, but it recovered without rebooting the system. See thumbnail.

I'm going to try with another GPU and see if I can get it to fail.

John

HWINFO64 SSD Temp.png
Superposition Crash.png
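
For anyone wondering how to watch SSD temps during a gaming session: one option is to let HWiNFO write its sensor log to a CSV file while the benchmark runs, then summarize it afterwards. Below is a minimal Python sketch, assuming the log is a plain CSV with one header row. The column names in the sample are hypothetical, so check the header of your own log and adjust the search text (and the file encoding, if needed).

```python
import csv
import io
from typing import Dict, Iterable, Tuple


def summarize_columns(lines: Iterable[str], name_contains: str) -> Dict[str, Tuple[float, float, float]]:
    """Return {column name: (min, max, mean)} for every column whose
    header contains name_contains (case-insensitive)."""
    reader = csv.reader(lines)
    header = next(reader)
    wanted = [i for i, name in enumerate(header)
              if name_contains.lower() in name.lower()]
    values = {i: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i >= len(row):
                continue  # short or truncated row
            try:
                values[i].append(float(row[i]))
            except ValueError:
                continue  # blank cell, repeated header, or non-numeric text
    return {header[i]: (min(v), max(v), sum(v) / len(v))
            for i, v in values.items() if v}


if __name__ == "__main__":
    # Hypothetical sample data; a real log will have different column names.
    sample = io.StringIO(
        "Time,Drive Temperature [C],GPU Temperature [C]\n"
        "07:00:01,41,55\n"
        "07:00:03,47,78\n"
        "07:00:05,52,81\n"
    )
    for name, (low, high, mean) in summarize_columns(sample, "Drive Temp").items():
        print(f"{name}: min {low:.0f} C, max {high:.0f} C, mean {mean:.1f} C")
```

To use it on a real log, pass an open file instead of the sample, e.g. `summarize_columns(open("log.csv", newline=""), "Drive Temp")`, where `log.csv` is a placeholder name. A high maximum on the drive column during Superposition would point toward heat; a flat reading would make the SSD temperature theory less likely.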
 

JWMiddleton

On ordering a different board from Amazon to test with: I tried that with a CPU for my dead Intel board and sent it back. It took 3 weeks and a bunch of contact to get the refund. I'm still trying to figure out what to do, which must be done before 8/1.

On resetting the BIOS and updating it: I'm not installing anything on that machine other than benchmarks to stress the system until I get it figured out. I updated the BIOS before starting the build. Yesterday I cleared CMOS to reset everything back to default, although this morning I set EXPO and turned on Re-BAR. ***Just got something useful as I am multitasking*** I started up Heaven with HWINFO64 running to monitor the CPU/IMC voltage. It crashed, and I got a message from AMD that the driver had crashed. That went away, and then I got this error that seems to be pointing to RAM. Now to get Memtest running.
 

Attachments

  • Screenshot 2023-07-18 071725.png

JWMiddleton

I just finished testing the Samsung 980 Pro 2TB M.2 SSD. The Extended scan failed with the attached error message. It doesn't say that the drive was bad, just that it encountered an error and could not continue. So, I tried each of the other tests and all passed. BTW, the drive does have the latest firmware.
 

Attachments

  • Samsung Magician Extended SMART Diag Scan.png
  • Samsung Magician SMART Diag Scan.png
  • Samsung Magician Short San.png
  • Samsung Magician Error Msg.png
  • SSD Firmware Up to date.png

JWMiddleton

"Could be flaky IMC too."
I changed the GPU to an RX 6750 XT and the drivers would not install. The error pointed to a lot of possible file corruption. So, I am trying the system with a different M.2 SSD and Windows 10.

"Still gonna keep it?"
Well, if a fresh install on another SSD doesn't fix it, then it is going back.

"I'm still going with the memory with the above error messages about memory corruption."
You could be right. I just wish I had some other DDR5 sticks to test with.
 