Question Resets/reboots with no crash dump (another weird problem)

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
HA! I got another weird problem. I am cobbling together a system using a mobo from a Dell Precision 3650 Tower (workstation): Intel W580 chipset for 10th/11th gen processors, Xeon W-1250. I was able to update the BIOS to the latest (via flash drive) and get Windows 11 Pro 24H2 (UEFI mode with Secure Boot ON) installed, and then things started getting screwy. Latest drivers for Intel chipset, RST/AHCI, graphics.

When the display is connected to the integrated Intel UHD graphics (via DP), within a couple minutes of the desktop loading, the system just spontaneously resets and reboots with no crash dump. Unpredictable, but usually within about 90 seconds of the desktop loading, though it has made it up to about 5 minutes. There is nothing in Event Viewer except the entry about Windows not shutting down properly. And no crash/kernel/memory dumps.

After a bunch of trial and error, I managed to learn that when I install a graphics card in the PCI-E x16 PEG slot, it doesn't reset. So right now you're thinking, AH HA, the integrated graphics must be defective. NO! It stays stable while I continue to use the Intel processor graphics for display output, EVEN when I disable the dGPU in Device Manager so that I can be sure the OS and applications are utilizing the integrated UHD graphics for everything and not offloading to the dGPU, confirmed by using Task Manager to view the utilization of the graphics adapter. Everything passes CPU tests, 3D benchmarks, video decoding/encoding tests, etc!

When I pull that darned dGPU, the resets return! It's something to do with the integrated graphics, but it is not defective. I ran PCMark 10 for like two hours and 3DMark for like two hours on the Intel graphics. PERFECT! I also tested while connected to the dGPU card and it is 100% stable too.

I have reset the BIOS once, loaded UEFI Defaults twice, and disabled almost all of the integrated peripherals/devices and USB controllers except for the back panel. Changed primary display enumeration to Auto, Onboard, and dGPU (tried all three). PCI decoding above 4GB on and off, ReBAR off (which is always safe). Rolled back to the oldest BIOS version it permits, several versions back. No change.
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Clean install of W10 on #2 (identical) motherboard results in same behavior. Using integrated graphics alone, spontaneous reboots. Install any graphics card in PEG (processor) slot, stable as a rock THROUGH the integrated graphics, not using the dGPU card.

I suppose next is to acquire another 10th/11th gen CPU w/IGP. OR just run with graphics card installed. I wanted to run headless but a spare $30 graphics card is cheaper than a Xeon. :p
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
How I stumbled on this by accident: I was intending to run from a dGPU to eliminate the onboard graphics as the culprit, just for troubleshooting. But after installing the dGPU card, I got distracted for five minutes, came back to the PC and mistakenly connected the display cable back into the IGP ports instead of the dGPU. Was running the PC for like an hour thinking AH HA the processor IGP was 'bad' before I realized, "Wait, I'm still plugged into the IGP port...WTF?"
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Got a Rocket Lake i5-11600K to rule out the Comet Lake Xeon W-1250. Same problem, but even worse! Unlike before, I can't even get it to load Windows SETUP. As soon as it starts loading the OS (WinPE) boot files = reboot! I have reset/cleared the BIOS, loaded UEFI defaults, etc. No change.

When I insert a PCI Express graphics card, keep primary display adapter in BIOS to Onboard or Auto, with monitor connected to the onboard iGPU, everything works as you would expect!

So it's not the CPU. Something's gotta be going on with the chipset/BIOS, low-level PCI/PCIe resource configuration or assignment (firmware). GAAAAHHHH!
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Yah man. Using dGPU or IGP runs and runs. As long as that PCI Express card is inserted, as though it were a darned security dongle or something. I tested Quadro P620 and GTX 1660 Super (not at the same time). Both run at their full speed/lane configurations.
 

mikeymikec

Lifer
May 19, 2011
20,895
16,148
136
Are there any remotely relevant and odd BIOS options that might have a bearing on this situation? I'm thinking this is Dell we're talking about after all, so if it included some proprietary tech that made other stuff go wonky, it wouldn't surprise me.

Have you had this system from new? If not, I wonder whether there is a PCIE dongle available for it. An old AM3 board I used to use (ASUS M4A89GTD PRO / USB3) had what it called a PCIE switch card. If you wanted to use a graphics card in the board, you had to use the second slot down and put the switch card in the first slot; without it you would only get an 8x link for the graphics card.
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Not that I have seen. I looked through the marketing materials and service guide for options. There are some device options not populated because I don't have all the hardware for them, but they are all options in BIOS to enable/disable. Anything I don't have populated, I have disabled. Now that I finally plugged a fan into the "SYSFAN" header, I don't receive any pre-boot errors at all.

As I mentioned, I wanted to run headless and plug into the onboard video as needed, but I guess I could keep a cheapo $30 graphics card in there. But I also don't want to put something into service that may be moribund, just needing another month or three to finally go kaput. Performance of all devices is consistent with the specs too, no reported configuration problems or driver issues. NUTTIN! I've never seen a case where adding in a device healed a board that starts out broken. Usually the opposite case. :D

There is a brand new BIOS from DELL as of today! Notes do not mention anything relevant, just more security updates/patches but I'll give it a go.
 
Jul 27, 2020
28,038
19,143
146
I'm suspecting some driver+BIOS setting issue.

To rule the driver out, try using the Microsoft Basic Display adapter.

You may need to install some Sysinternals tool that logs what applications are loaded at startup etc.

Going over that data (it may be a lot, especially if the logging is extensive), you may be able to pinpoint the offending process.

I once got a pat on my back from my IT manager because there was a certain lame workaround they were using due to giving up on finding a solution for some sort of serious freezing issue on ALL Dell Latitude laptops and I was informed of that workaround as part of my briefing on how to prepare new laptops for our employees. Well, I'm too nosy to keep doing that like a drone so I looked at the processes, noticed that there was some odd encryption application being installed as part of the standard Dell software suite. I skipped installing that since no one was using it and it was only being installed because everyone thought it was required else why would Dell include it for installation?

The issue was solved and just like that, the workaround vanished from the standard installation procedure of those laptops :D
 

In2Photos

Platinum Member
Mar 21, 2007
2,557
2,762
136
Is it possible that one of the pins in the PCIE slot is bent so when the card is removed it makes contact with the other side? Inserting a card pushes the pin back away and everything works fine?
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Is it possible that one of the pins in the PCIE slot is bent so when the card is removed it makes contact with the other side? Inserting a card pushes the pin back away and everything works fine?

It happens with two identical motherboards. I checked one over and no pin derangement, either in PCI Express slot or CPU socket.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
8,205
3,126
146
This is very odd, my only thought is it must be something defective with the Dell firmware or something...
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Tried a different PSU (Antec) and that seemed to improve things, I could boot to the desktop and it ran great! I even ran 3DMark one pass and thought OMG it was the PSU all along? But 15 or 20 minutes of usage = reboot.

The ONLY configuration that is stable is when a graphics card is inserted into the PEG slot. I noted that when I was able to run that pass of 3DMark Night Raid with no graphics card inserted, the result was ~9600 on the Intel graphics. When I insert a graphics card BUT run the benchmark on the iGPU (Intel UHD), the result is lower ~8600. This I have verified twice now, in each configuration (when I was able to get that far without a graphics card inserted). Another interesting note is that when I changed the 'rendering device' from NVIDIA graphics card to the Intel graphics, the application warned "rendering device is not connected directly to the selected display" but in fact the display IS plugged to the onboard graphics port, which is the Intel UHD graphics. It further suggests to me there is some kind of PCIe routing, lane reversal or switching bug here.

So changing these things seems to be altering something, getting a lot further than half-way through OS loading but in the end, whether it is 5, 10, 15, or 20 minutes, it will reboot spontaneously. With graphics card inserted = UPTIME FOR HOURS AND HOURS.
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Nope, there is no bugcheck or crash dump. It's a spontaneous reboot or reset. The only thing in Event Viewer is the notification that Windows was not shut down properly, which gets logged at the next boot.
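
For anyone who wants to check the same thing on their own box, here is a rough Python sketch. It assumes the "not shut down properly" entries are the usual Kernel-Power event ID 41 records in the System log, and that they carry a `BugcheckCode` data field. It shells out to `wevtutil` (Windows only) and reports whether each event recorded a stop code. The function names are my own; treat it as a starting point, not a diagnostic tool.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# XPath filter for Kernel-Power event 41 (assumed to be the relevant event).
QUERY = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"


def bugcheck_codes(events_xml):
    """Return (timestamp, BugcheckCode) pairs from concatenated <Event> XML."""
    # wevtutil prints bare <Event> elements, so wrap them in a single root.
    root = ET.fromstring("<Events>" + events_xml + "</Events>")
    results = []
    for event in root.findall("e:Event", NS):
        created = event.find("e:System/e:TimeCreated", NS)
        code = event.find("e:EventData/e:Data[@Name='BugcheckCode']", NS)
        if created is None or code is None or code.text is None:
            continue  # skip events that lack the fields we need
        results.append((created.get("SystemTime"), int(code.text)))
    return results


def query_recent_events(count=5):
    """Ask wevtutil for the newest matching events (Windows only)."""
    cmd = ["wevtutil", "qe", "System", "/q:" + QUERY,
           "/c:" + str(count), "/rd:true", "/f:xml"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__" and sys.platform == "win32":
    for stamp, code in bugcheck_codes(query_recent_events()):
        kind = "no bugcheck recorded" if code == 0 else "bugcheck 0x%X" % code
        print(stamp, kind)
```

A code of 0 on every event would be consistent with what I'm describing: Windows never got the chance to bugcheck, so there is nothing to dump.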
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Understood but is automatic restart enabled or disabled?

It is enabled, automatic crash dump as well. That option only applies to an invoked 'soft' restart. I am experiencing spontaneous reboots, analogous to what happens when pressing Ye Olde "hard reset" button/switch of PC eras gone by (which many motherboards continue to have, though increasingly fewer chassis offer the button). Or as the Three Finger Salute will still do, if you are in a DOS/console environment that does not intercept those keystrokes to invoke a "safe" restart prompt.
 
Jul 27, 2020
28,038
19,143
146
But 15 or 20 minutes of usage = reboot.
Did it spontaneously reboot while a workload was running? How about running CB R23 indefinitely with HWiNFO logging turned on? If it doesn't reboot, then you know it's something to do with light boosting instability. Otherwise, the last entry of the log may contain some clue.
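
To look at that last entry without scrolling a huge file, something like this rough Python sketch could work. It assumes the sensor log is a comma-separated file with a header row; the column names in the demo ("Time", "CPU Temp", "Core VID") and the file name in the comment are made-up placeholders, not HWiNFO's exact headers.

```python
import csv
from collections import deque


def last_samples(lines, n=3):
    """Return the last n data rows of a CSV log as dicts keyed by header."""
    reader = csv.DictReader(lines)
    # A deque with maxlen keeps only the newest n rows while streaming.
    return list(deque(reader, maxlen=n))


# Example with made-up column names and values:
demo = [
    "Time,CPU Temp,Core VID\n",
    "10:00:00,41,0.85\n",
    "10:00:02,43,0.91\n",
    "10:00:04,42,0.88\n",
]
for row in last_samples(demo, n=2):
    print(row)

# For a real log, something like this (file name and encoding are assumptions):
# with open("hwinfo_log.csv", newline="", encoding="latin-1") as f:
#     print(last_samples(f))
```

If the reset hits mid-write, the final row may be cut short; `csv.DictReader` fills the missing columns with `None`, so a truncated last row is itself a hint about when the reset landed.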
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Did it spontaneously reboot while a workload was running? How about running CB R23 indefinitely with HWiNFO logging turned on? If it doesn't reboot, then you know it's something to do with light boosting instability. Otherwise, the last entry of the log may contain some clue.

Man, it has rebooted in every loading scenario that comes after POST: two seconds after hand-off to the OS, as the desktop was loading, with one browser window open while I was reading (nothing else running), and at idle when I was away from the computer with nothing running (except Windows). It has never rebooted before OS hand-off, e.g. in UEFI/BIOS Setup or during POST.

I ran CB 2023 for 30 minutes. Hit TjMAX @ 100°C. With the graphics card inserted. No problem.
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
I got these mobos for cheap, NEW field service replacements that did not even have a Service Tag, Express Service Code, asset tag, ownership or manufacturing date set, both booted up initially in BIOS Manufacturing Mode.

I saw a convo somewhere maybe a year ago, someone had several Precision 3650 Towers purchased when this model was newly shipping, i.e. early revisions, and motherboards had to be replaced due to system instability or mobo failures.

The seller had "more than 10" available. I'm thinking now there was a reason they were so cheap and he had so many available.... :rolleyes::p
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
Yesterday, I experienced one spontaneous reboot with the graphics card inserted. While on YouTube. Actually the video had finished, I was scrolling through other recommended or related video links. First and only time! Nothing in Event Viewer. Not sure what to make of it. One off?

I am developing testing fatigue and might just decide to scrap both mobos to ewaste. Fuuuuugh
 
Jul 27, 2020
28,038
19,143
146
I am developing testing fatigue and might just decide to scrap both mobos to ewaste. Fuuuuugh
Assuming it's a Windows issue, you could test drive Linux if internet browsing is going to be the main task. When you are not using the system, just leave something on it running, even some message in Notepad or whatever the analogue is in Linux and keep checking from time to time if it rebooted.
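
A slightly more automatic version of "leave something running and keep checking" is a heartbeat file. The Python sketch below is only an illustration with names I made up: `write_heartbeats` appends a timestamp every minute, and `find_gaps` flags any hole in the timestamps. It assumes the writer gets restarted after each boot (a cron `@reboot` entry or similar), otherwise the file simply stops growing.

```python
import os
import time


def write_heartbeats(path, interval=60.0):
    """Append the current Unix time to `path` every `interval` seconds."""
    while True:
        with open(path, "a") as f:
            f.write("%d\n" % time.time())
            f.flush()
            os.fsync(f.fileno())  # push the line to disk before a hard reset can eat it
        time.sleep(interval)


def find_gaps(stamps, interval=60.0, slack=2.0):
    """Return (last_seen, next_seen) pairs where heartbeats stopped for too long."""
    limit = interval * slack
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


# Example: heartbeats every 60 s with one long hole between 120 and 400.
print(find_gaps([0, 60, 120, 400, 460]))  # [(120, 400)]
```

A gap only shows that the writer stopped for a while. Pairing it with the system's own boot records would be needed to confirm it was a reset and not just the script dying.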
 

tcsenter

Lifer
Sep 7, 2001
18,929
563
126
That's it! I got reboot #2 with the graphics card inserted. I'm making an 'executive decision': both mobos are going to ewaste. I think there is some wonky component or VRM on the mobo. What interaction makes it less symptomatic, or masks it, when the PEG slot is populated, I don't know. But time to move on.

Thanks for replies!
 