Question Repeated restarts (8x in ~5 hours)


Steelbom

Senior member
Sep 1, 2009
455
22
81
Hi,

I'm having issues with my pc restarting abruptly. There is no BSoD or error otherwise. It just restarts.

I'm trying to diagnose what the issue is. I checked the event logs but did not see anything under system that stood out to me.

My first suspicion is memory. (I can't run my 2x32GB 5600MHz kit at 5600MHz -- only at 4800MHz.)

Could anyone point me in the right direction on where to start?

Specs:
Windows 11 Pro 10.0.22621 Build 22621
AMD 7950X
Gigabyte X670E Aorus Extreme
2x32GB 5600MHz
Gigabyte 3090 Ti
ASUS 3050 Aero
2x Ark Odyssey (one into the 3090 Ti via HDMI, one into the 3050 via HDMI)
1x Samsung 49" G9 Neo (DP into the 3090 Ti)

I've got some Windows updates to do but have been unable to run long enough to do a full backup of my OS drive. Working on that...

Cheers,
SB
 

Steelbom

@Steelbom As a general rule I'd have everything listed in Device Manager as having properly installed drivers. I think previously you said you just disabled it.
I see. The Intel driver was up-to-date (apparently) but I disabled it anyway. I didn't want to click "Uninstall device" as I'm not sure if it's easy enough to install it again.
I'm kinda glad that you experienced these issues and successfully fixed them. Enlightening thread. Should help someone else encountering the same issues in future.
Yeah, hopefully!
@Steelbom so is the only hardware change you've made to remove the 3050?
Correct. I will do a summary at the bottom of this post.
Which Bios/AGESA version are you running?
I am running F6H (AGESA 1.0.0.4). I was previously on F6C (1.0.0.3).

----

Summary

- Backed up Windows + made restore point (no other programs running)
- Updated Windows to latest version
- Updated graphics drivers to latest
- Enabled Core Isolation
- Removed all USB except mouse/keyboard
- Ran dism, sfc and chkdsk (some corruption was fixed successfully)
- (Crashing still a problem at this point)
- Disabled Intel Driver, Bluetooth Driver
- Updated BIOS from F6C to F6H, which included a microcode update to AMD AGESA 1.0.0.4 and "improved memory compatibility".
- Performed memtest86 for 1 full pass (not very long) (success)
- Rebooted and my monitor wouldn't turn on. The 3050 wasn't showing up in my system.
- Removed 3050 and no problems since doing all the same actions/operations.
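For anyone following along, the repair step in the summary could be scripted roughly like this. It is only a sketch: the post doesn't say which flags were used, so the `DISM`, `sfc` and `chkdsk` invocations below are the commonly used ones (an assumption), and `REPAIR_COMMANDS` and `run_repairs` are names made up for this example. The real run needs an elevated prompt.

```python
import platform
import subprocess

# Commonly used repair commands. The flags are an assumption; the summary
# above only says that dism, sfc and chkdsk were run.
REPAIR_COMMANDS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],  # repair the component store
    ["sfc", "/scannow"],                                      # verify protected system files
    ["chkdsk", "C:", "/scan"],                                # online file system scan of C:
]


def run_repairs(commands=REPAIR_COMMANDS, dry_run=True):
    """Run each command in order and return (command string, exit code) pairs.

    With dry_run=True, or on a non-Windows machine, nothing is executed and
    the exit code is reported as None.
    """
    results = []
    for cmd in commands:
        if dry_run or platform.system() != "Windows":
            results.append((" ".join(cmd), None))
            continue
        completed = subprocess.run(cmd)
        results.append((" ".join(cmd), completed.returncode))
    return results


if __name__ == "__main__":
    for line, code in run_repairs(dry_run=True):
        print(line, "->", code)
```

Running DISM before sfc is the usual order, since sfc pulls its replacement files from the component store that DISM repairs.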


Going fine so far! Work, gaming, a little stress test. Been fine since my last post!

---

Thanks again for all the help!
 

Steelbom

And it's back :coldsweat:

Between my last post and yesterday I only had one reboot. However, this morning I've had about 4 in a short period.

Some select entries from Event Viewer:

Windows Logs/Setup
1 of 4 instances of system store corruption have been repaired. Unrepaired corruptions may lead to failures in future system servicing.

Windows Logs/System
Source: Disk - Disk 6 has been surprise removed.
Source - Bonjour Service: mDNSCoreReceiveResponse: Received from 192.168.80.1:5353 23 B.6.1.E.E.3.7.8.3.B.1.2.6.D.3.2.0.0.0.0.0.0.0.0.0.0.0.0.0.8.E.F.ip6.arpa. PTR
Source - Kernel-Power: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
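The Kernel-Power entry above is normally logged as event ID 41, so one way to see whether the reboots cluster is to export those events and look at the gaps between them. A rough sketch, assuming the export comes from `wevtutil` in text format and that each event carries a `Date:` line shaped like the example in the comment (the exact output layout is an assumption, and `reboot_times` / `gaps_in_minutes` are made-up helper names):

```python
import re
from datetime import datetime

# Query for the ten most recent Kernel-Power "unclean reboot" events (ID 41).
# Run it from an elevated prompt on Windows and save the output to a file.
WEVTUTIL_QUERY = [
    "wevtutil", "qe", "System",
    "/q:*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]",
    "/f:text", "/c:10", "/rd:true",
]

# Assumed shape of a timestamp line in the text output, e.g.
#   Date: 2024-01-05T09:12:44.123
DATE_RE = re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def reboot_times(text):
    """Return the event timestamps found in the text, oldest first."""
    stamps = [datetime.strptime(m, "%Y-%m-%dT%H:%M:%S") for m in DATE_RE.findall(text)]
    return sorted(stamps)


def gaps_in_minutes(times):
    """Minutes between consecutive reboots; short gaps indicate a cluster."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Lining the gaps up against what was running at the time (WSL2, a game, idle) may show a pattern that the raw log does not.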


Interestingly... I'm not sure what Disk 6 is (although I do have 6 drives, only 5 are labelled disk 1-5 in the Disk Management app).

I feel like I need to do a complete clean install of Windows 11... this one has been used for several years now.

@Steelbom As a general rule I'd have everything listed in Device Manager as having properly installed drivers. I think previously you said you just disabled it.
There is still an unidentified PCIE device but I'm not sure what it could be.
 

Tech Junky

Diamond Member
Jan 27, 2022
3,825
1,343
106
Disk 6 has been surprise removed.
Windows is dumb and it might think a partition is a disk in some instances.

If you boot to a Linux livecd and open disks it will show you all of your disks.


only 5 are labelled disk 1-5
Should be 0-5

5 would be disk 6.
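To spell out the counting, here is a tiny sketch. It assumes the tool numbers drives from 0, and `disk_labels` is a made-up helper, not anything Windows provides:

```python
def disk_labels(drive_count, first_index=0):
    """Labels a tool would show if it numbers drives starting at first_index."""
    return [f"Disk {i}" for i in range(first_index, first_index + drive_count)]


zero_based = disk_labels(6)     # numbering starts at 0
one_based = disk_labels(6, 1)   # numbering starts at 1

# With six drives, the last zero-based label is "Disk 5"; a label of
# "Disk 6" only appears if counting starts at 1 or a seventh device exists.
print(zero_based[-1], one_based[-1])
```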

If there's a power issue or data issue it might cause the crash if it's loose.

When I had random reboots I thought it was power but it turned out to be a driver issue causing it in the end.
 

mikeymikec

Lifer
May 19, 2011
20,992
16,236
136
@Steelbom

My advice is *never* do a clean OS install unless you're reasonably certain that you've dealt with the hardware problem.

Ignore the bonjour message.

Data corruption / loss is most directly a storage problem, but it could be a symptom of say faulty memory. You've done one pass with memtest86, I think I'd set it to run overnight (I tend to set an absurd number of passes like 99 on v7.4, there's no way it'll finish overnight). Seeing the SMART stats for the drives wouldn't be a bad idea, for example bad sectors or CRC errors would be a good lead there.
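For SATA drives, one way to check those attributes without relying on a tool's overall "good" verdict is to read smartctl's JSON output directly. A minimal sketch, assuming a smartmontools version with JSON support (`smartctl -j -A <device>`) and that the attribute table sits under `ata_smart_attributes` / `table` with `id` and `raw.value` fields, which is my understanding of the format rather than something from this thread. `nonzero_watched` and `load_report` are hypothetical helper names.

```python
import json

# Attribute IDs worth a look in a stability investigation. The labels are
# the usual meanings, but vendors differ, so treat them as hints only.
WATCHED_ATTRIBUTES = {
    5: "Reallocated sector count",
    197: "Current pending sector count",
    198: "Offline uncorrectable sector count",
    199: "UDMA CRC error count",
}


def load_report(path):
    """Load a report saved with e.g. `smartctl -j -A /dev/sda > sda.json`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def nonzero_watched(report):
    """Return {label: raw value} for watched attributes with a raw value above 0."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    flagged = {}
    for attr in table:
        label = WATCHED_ATTRIBUTES.get(attr.get("id"))
        raw = attr.get("raw", {}).get("value", 0)
        if label and raw > 0:
            flagged[label] = raw
    return flagged
```

An empty result only means these four counters are zero; it doesn't clear the drive.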
 

Steelbom

Windows is dumb and it might think a partition is a disk in some instances.

If you boot to a Linux livecd and open disks it will show you all of your disks.



Should be 0-5

5 would be disk 6.

If there's a power issue or data issue it might cause the crash if it's loose.

When I had random reboots I thought it was power but it turned out to be a driver issue causing it in the end.
I see, I see. Thanks. I have been considering whether it's a power issue.
@Steelbom

My advice is *never* do a clean OS install unless you're reasonably certain that you've dealt with the hardware problem.

Ignore the bonjour message.

Data corruption / loss is most directly a storage problem, but it could be a symptom of say faulty memory. You've done one pass with memtest86, I think I'd set it to run overnight (I tend to set an absurd number of passes like 99 on v7.4, there's no way it'll finish overnight). Seeing the SMART stats for the drives wouldn't be a bad idea, for example bad sectors or CRC errors would be a good lead there.
All right, I'll try to do a longer memtest86. See if we can rule out the memory or not.

Smart status seems OK. Everything is "good".

The reason I'm thinking about the clean install is in case I have conflicting drivers. My current drive has been through several different AMD CPUs and from an AMD GPU to NVIDIA GPU.... maybe something went wrong along the way.
What's the most recent hardware change you did (even a new connected USB counts)?
Added two small external keypads. That's about it though. I've unplugged my webcam, one of the keypads (which was the most recent) and an external raid bay. Just have Mouse+KB+Mic+DAC now.
 

mikeymikec

Smart status seems OK. Everything is "good".

"Good" depends on the program. CrystalDiskInfo, AFAIK, ignores CRC errors; frankly, I'd regard any count above 0 as noteworthy.

The reason I'm thinking about the clean install is in case I have conflicting drivers. My current drive has been through several different AMD CPUs and from an AMD GPU to NVIDIA GPU.... maybe something went wrong along the way.

Added two small external keypads. That's about it though. I've unplugged my webcam, one of the keypads (which was the most recent) and an external raid bay. Just have Mouse+KB+Mic+DAC now.

Yes, but if you rely on this PC then you're potentially just making more work for yourself. For example, a PC that is corrupting data a fair bit could end up failing to install Windows until you find the hardware issue, but now you've deprived yourself of an OS with which to run more tests. You're also potentially destroying evidence of a problem.
 

Steelbom

"Good" depends on the program. CrystalDiskInfo AFAIK ignores CRC errors, which frankly I'd regard >0 as noteworthy.
Do you have a recommendation? (Paid software is fine)

Yes, but if you rely on this PC then you're potentially just making more work for yourself. For example, a PC that is corrupting data a fair bit could end up failing to install Windows until you find the hardware issue, but now you've deprived yourself of an OS with which to run more tests. You're also potentially destroying evidence of a problem.
I wouldn't wipe my drive -- I would set up a new nvme drive and copy data over. So I should have access to both but I'll leave this a bit later until I've done some more tests like you suggested.

>>>

On a similar note, I am having a sleep issue again. The computer goes to sleep, monitors turn off, then fans spin very loudly for a split second and then the monitor turns back on at the lock screen.

Also going to try the F7b BIOS and see if that is improved at all. I'll also drop the RAM back to 4800MHz (from 5600MHz) just in case.
 

Tech Junky

The easiest method to rule out the OS is to boot Linux off a USB drive and let it sit. If it's driver related it won't reboot. If it's hardware then it might reboot with some stress.

As to the wipe, the above gives you the ability to boot and copy. I wouldn't copy anything though if there's a chance the corruption is in that data causing the issue.

Another thing for the GPU is use ddu to strip out all of the drivers. If you have other drivers remove them from device manager and reinstall fresh.
 

mikeymikec

Do you have a recommendation? (Paid software is fine)

If you don't understand the stats then post screenshots. You may want to blank out the serial numbers though.

On a similar note, I am having a sleep issue again. The computer goes to sleep, monitors turn off, then fans spin very loudly for a split second and then the monitor turns back on at the lock screen.

A trawl through the system event log at the time of the sleep transition attempt might bring something up of interest. There's also a powercfg command that will analyse what happened during a triggered sleep transition which may tell you which device is being a naughty boy. TBH I doubt that investigating this will bring you to the root cause of the main problem though so I'd put this on the backburner unless you run out of things to investigate.
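The post doesn't say which powercfg command is meant, so the sketch below simply gathers a few sub-commands that I know exist and that are relevant to wake problems. The selection, the labels, and the names `POWERCFG_CHECKS` and `collect_wake_info` are my own; some of these need an elevated prompt.

```python
import platform
import subprocess

# powercfg sub-commands relevant to sleep/wake problems. Which one the post
# above has in mind is not stated; this list is only a selection.
POWERCFG_CHECKS = {
    "last wake source": ["powercfg", "/lastwake"],
    "devices allowed to wake the PC": ["powercfg", "/devicequery", "wake_armed"],
    "active wake timers": ["powercfg", "/waketimers"],
    "requests blocking sleep": ["powercfg", "/requests"],
}


def collect_wake_info(checks=POWERCFG_CHECKS):
    """Return {label: command output}; values are None when not on Windows."""
    on_windows = platform.system() == "Windows"
    info = {}
    for label, cmd in checks.items():
        if not on_windows:
            info[label] = None
            continue
        done = subprocess.run(cmd, capture_output=True, text=True)
        info[label] = done.stdout.strip()
    return info


if __name__ == "__main__":
    for label, output in collect_wake_info().items():
        print(f"== {label} ==\n{output}\n")
```

If a network adapter shows up under the wake-armed devices, that would fit a machine that wakes straight back up after sleeping.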

Also going to try the F7b BIOS and see if that is improved at all. I'll also drop the RAM back to 4800MHz (from 5600MHz) just in case.

Getting rid of any OC is a good idea, though try to avoid changing multiple things per attempt to correct the problem.
 

Steelbom

The easiest method to rule out the OS is to boot Linux off a USB drive and let it sit. If it's driver related it won't reboot. If it's hardware then it might reboot with some stress.

As to the wipe, the above gives you the ability to boot and copy. I wouldn't copy anything though if there's a chance the corruption is in that data causing the issue.

Another thing for the GPU is use ddu to strip out all of the drivers. If you have other drivers remove them from device manager and reinstall fresh.
Good idea. My only concern with booting into another drive is the reboots are typically quite rare so I may need to stay in that OS for several weeks to see a reboot occur. And, the reboot often occurs whilst working in WSL2, but I'm not sure if that's a coincidence or not.

I didn't use DDU, but I did run the AMD Cleaner utility to remove catalyst and some old AMD drivers (now I'm on a NVIDIA card).
If you don't understand the stats then post screenshots. You may want to blank out the serial numbers though.
I'll attach them here.

A trawl through the system event log at the time of the sleep transition attempt might bring something up of interest. There's also a powercfg command that will analyse what happened during a triggered sleep transition which may tell you which device is being a naughty boy. TBH I doubt that investigating this will bring you to the root cause of the main problem though so I'd put this on the backburner unless you run out of things to investigate.
Good idea. I've done this for the reboots but not the sleeping issue. For now, I've tried turning on ErP (bios level) and turning off Wake from LAN (bios+OS level).

Getting rid of any OC is a good idea, though try to avoid changing multiple things per attempt to correct the problem.
I was mistaken -- it was already at 4800MHz. I must have disabled the OC earlier.
 

Attachments

  • d-drive.jpg
  • h-drive.jpg
  • e-drive.jpg
  • g-drive.jpg
  • F drive.jpg
  • c-drive.jpg

Tech Junky

My only concern with booting into another drive is the reboots are typically quite rare so I may need to stay in that OS for several weeks
Well, considering how often it's been rebooting I would think you would know within 24-48 hours if it's a hw issue or windows.

While it's in Linux run some stress tests to force a potential reboot.
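If no dedicated stress tool is on the live USB, something along these lines would at least load the cores and memory bus for a while. It is a crude sketch and no substitute for a proper stress or memory tester: it hashes the same buffer over and over on several threads and checks that every digest matches. `hash_rounds` and `consistency_stress` are made-up names, and the default sizes are arbitrary.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_rounds(data, rounds):
    """Hash the same buffer repeatedly and return the set of digests seen."""
    return {hashlib.sha256(data).hexdigest() for _ in range(rounds)}


def consistency_stress(workers=None, rounds=200, size_mb=8):
    """Load several cores and return True if every digest matched.

    Healthy hardware must produce the same digest every time, so more than
    one distinct digest hints at a CPU or memory fault.
    """
    workers = workers or os.cpu_count() or 1
    data = os.urandom(size_mb * 1024 * 1024)
    # hashlib releases the GIL for large buffers, so threads run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: hash_rounds(data, rounds), range(workers)))
    digests = set().union(*results)
    return len(digests) == 1


if __name__ == "__main__":
    print("all digests identical:", consistency_stress())
```

On a machine with this symptom, a sudden reboot during the run is the more likely signal than a mismatched digest.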
 

mikeymikec

G and F drive are reporting >0 "number of error information log entries". According to the interwebs this is not alarming in itself, but as part of a stability investigation I'd consider looking into it. Apparently this attribute is incremented when an SSD encounters an error that it deems worthy of reporting back to the OS and making an internal log entry for. Apparently that log can be retrieved in Linux using the smartctl terminal program. I imagine the likes of Samsung Magician or whatever other manufacturer's utility software would be able to retrieve this information too. It might be worth contacting the relevant manufacturers to find out how to access this information and decipher it.
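As a rough sketch of the smartctl route: the commands below are standard smartmontools usage, with `/dev/nvme0` as a placeholder device path. The JSON key names (`nvme_smart_health_information_log`, `num_err_log_entries`, `media_errors`, `critical_warning`) are what I understand smartctl's JSON output to use, so treat them as an assumption; `nvme_flags` is a made-up helper.

```python
# Fields from the NVMe health log that are worth flagging when above zero.
# Key names follow smartctl's JSON output as I understand it (an assumption).
NVME_FIELDS = {
    "critical_warning": "Critical warning flags",
    "media_errors": "Media and data integrity errors",
    "num_err_log_entries": "Error information log entries",
}

# Placeholder device path; substitute the drive under investigation.
HEALTH_COMMAND = ["smartctl", "-j", "-a", "/dev/nvme0"]        # health summary as JSON
ERROR_LOG_COMMAND = ["smartctl", "-l", "error", "/dev/nvme0"]  # the error log itself


def nvme_flags(report):
    """Return {label: value} for watched health fields with a value above 0."""
    log = report.get("nvme_smart_health_information_log", {})
    return {
        label: log[key]
        for key, label in NVME_FIELDS.items()
        if log.get(key, 0) > 0
    }
```

A nonzero entry count with zero media errors would be consistent with the "not alarming in itself" reading, but the log entries themselves say more than the counter.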

"Rare stability issues" - my earlier advice about memtest86, I'd consider running it for as long as you can. The default is 4 passes which is usually adequate to find memory problems in my experience, but some take more testing than that. Alternatively running the default 4 pass configuration on several occasions might yield some positive results. Also, it might be worth asking in the memory subforum as others have made suggestions for allegedly better alternatives to memtest86 but I can't remember what those were.
 

Steelbom

Thanks again guys. Just dropping an update so far:

I suspect resolving the PCIE issue that was showing up in Device Manager has resolved the problem.
I noticed when I tried to update to the latest chipset drivers, it said there was an error updating something related to PCIE in the logs/summary.
I solved it by uninstalling the previous chipset drivers and re-installing the new ones which worked (the PCIE thing installed correctly) and the error from Device Manager disappeared.

I still had some sleep issues which I resolved by setting "Wake from LAN" to OFF in both BIOS and Windows and also enabling "ErP".

- Still got to do the memtest and am keeping an eye on the drives
 