Question Random crashes, need help narrowing down the source

In2Photos

Golden Member
Mar 21, 2007
1,629
1,651
136
I have an old PC, built in 2009: i7-920, MSI X58 Pro motherboard, 24GB RAM, MSI GTX 1660 Super, Corsair 650W PSU. Over the last month or so it has been crashing frequently. I started to wonder about the PSU as it's 12 years old, but I'm starting to lean in another direction and figured I'd ask here for some help.

This PC is in my office and runs Folding at Home 24/7 unless I'm playing games on it, which isn't that often these days due to lack of time. I used to have an occasional crash, maybe once every 4-5 months, but over the last month or so there has been at least one a week, typically more. Temps on everything are upper 60s to low 70s during Folding so I don't think there is a temp problem. I have a slight OC on the CPU at 3.6GHz and one on the GPU, but I have tried removing the GPU OC and it still crashes.

Every time I notice it has rebooted I check the event viewer, but all it says is that the PC was not shut down cleanly. I looked into the lack of crash dumps and evidently I did not have a page file set up. My C drive is only a 128GB SSD, and with 24GB of RAM it looks like I need a healthy page file, but I only have 30GB of free space on the drive. Maybe some advice here?

Yesterday I noticed that Windows Defender updated about 3 hours before the crash, so I started looking back further and, sure enough, every time the PC crashed it was 3-12 hours after a Defender update. There seem to have been several updates this week and the PC has crashed after each one. That trend continued back to the beginning of June, where I stopped looking.

Has anyone heard of anything like this before? I have disabled Windows Defender for now to see if that is in fact the problem or just coincidence.
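One way to test whether the Defender timing is more than coincidence is to compare how often a crash lands within 12 hours of an update against how much of the calendar falls inside such a window anyway. The sketch below assumes the crash and update times have been copied out of Event Viewer by hand. The timestamps, the function names, and the 12-hour window are illustrative assumptions, not values from the actual logs.

```python
from datetime import datetime, timedelta

def hours_since_last_update(crashes, updates):
    """For each crash, hours since the most recent earlier update (None if there is none)."""
    updates = sorted(updates)
    gaps = []
    for crash in sorted(crashes):
        earlier = [u for u in updates if u <= crash]
        if earlier:
            gaps.append((crash - earlier[-1]).total_seconds() / 3600)
        else:
            gaps.append(None)
    return gaps

def hit_rate(gaps, max_hours):
    """Fraction of crashes (with a known gap) that fall within max_hours of an update."""
    known = [g for g in gaps if g is not None]
    if not known:
        return 0.0
    return sum(g <= max_hours for g in known) / len(known)

def baseline_coverage(updates, start, end, max_hours):
    """Fraction of the period [start, end] that lies within max_hours after any update."""
    window = timedelta(hours=max_hours)
    covered = timedelta(0)
    cursor = start  # end of the time already counted, so overlaps are not counted twice
    for u in sorted(updates):
        lo = max(u, cursor)
        hi = min(u + window, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / (end - start)

# Made-up example timestamps (assumption: replace with times from your own logs).
updates = [datetime(2021, 7, 1, 8, 0), datetime(2021, 7, 3, 9, 0), datetime(2021, 7, 5, 7, 30)]
crashes = [datetime(2021, 7, 1, 11, 0), datetime(2021, 7, 3, 20, 0)]

gaps = hours_since_last_update(crashes, updates)
rate = hit_rate(gaps, 12)
baseline = baseline_coverage(updates, datetime(2021, 7, 1), datetime(2021, 7, 7), 12)
print("gaps (hours):", gaps)
print("crashes within 12 h of an update:", rate)
print("share of the period within 12 h of an update:", baseline)
```

If updates arrive nearly every day, most of the calendar sits within 12 hours of one, so a 100% hit rate says little on its own. The pattern is only interesting if the hit rate is well above the baseline share.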
 

Kamrooz

Member
Apr 14, 2013
28
3
71
Not sure if an update could cause the problem. You could try rolling back or doing a fresh install with an older build and see if it goes away. But a 12 year old PSU can cause issues for sure if it's no longer capable of delivering the required load, considering its age.

But as always, I suggest running memtest to check for faulty memory. The number of times I've come across stability issues due to faulty RAM is astonishing. But yeah, 12 years is quite a lot. I'd probably pick up a PSU from a local retailer and see if it solves the problem. If it does, you know the culprit; if it doesn't, you've got 30 days at most places to return the new PSU.

Is your system blue screening? Is it outputting dump files? If so, you can do some research on running WinDbg to analyze the dump files and see if they all point to one specific cause. That would narrow it down to a software issue and a possible solution.
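A quick way to check whether any dump files exist is to look in the usual default locations: MEMORY.DMP in the Windows directory and the Minidump subfolder. The sketch below assumes those defaults have not been changed in the Startup and Recovery settings. The function name is made up for illustration, and reading these folders may require an elevated prompt.

```python
from pathlib import Path

def find_dump_files(windows_dir):
    """Return crash dump files under a Windows directory, newest first."""
    root = Path(windows_dir)
    candidates = []

    # Default location of the kernel/complete dump (assumes the setting was not changed).
    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        candidates.append(full_dump)

    # Default folder for small memory dumps.
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(
            p for p in minidump_dir.iterdir()
            if p.is_file() and p.suffix.lower() == ".dmp"
        )

    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)

# Prints nothing if no dump files are found (or if the path does not exist).
for dump in find_dump_files(r"C:\Windows"):
    print(dump, dump.stat().st_size, "bytes")
```

An empty result after a crash means there is nothing for WinDbg to open yet, which is itself a useful data point.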
 

Steltek

Diamond Member
Mar 29, 2001
3,042
753
136
Memory testing as previously suggested is a very good idea, as is swapping the PSU.

Some other ideas to ponder.

Make sure the system is writing kernel dumps. Then install the free version of either WhoCrashed or NirSoft's BlueScreenView, as either of these programs can analyze Windows dump files (antivirus may object to BlueScreenView, but it is a legitimate program so don't worry about any warnings). If the system was already set up to write kernel dumps, you can run the installed software immediately. Otherwise, check it after the next time the system shuts down.

If the system is not writing a dump file at all, it means that it is just turning off. That can be heat related, or it can mean that you have bad hardware (e.g. a bad PSU, a bad video card, or a bad motherboard). I'd say your CPU is fine, as you have good temps and a CPU going bad is a rare thing.
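The System event log can help separate "it blue screened" from "it just turned off". Windows logs Event ID 41 from Microsoft-Windows-Kernel-Power after an unclean restart, and its BugcheckCode field is zero when no stop error was recorded. Event ID 6008 is the "previous shutdown was unexpected" entry. The sketch below assumes the events have already been exported into dictionaries; the dictionary keys and the function name are hypothetical, chosen only for illustration.

```python
def classify_restart(event):
    """Label one exported System-log record (a dict with hypothetical keys)."""
    event_id = event.get("id")
    provider = event.get("provider", "")

    if event_id == 41 and provider == "Microsoft-Windows-Kernel-Power":
        # A non-zero BugcheckCode means a stop error (blue screen) was recorded.
        code = int(event.get("bugcheck_code", 0))
        return "bugcheck recorded" if code != 0 else "no bugcheck recorded"

    if event_id == 6008:
        return "unexpected shutdown logged"

    return "other"

# Made-up sample records (assumption: real exports will carry more fields).
sample = [
    {"id": 41, "provider": "Microsoft-Windows-Kernel-Power", "bugcheck_code": 0},
    {"id": 6008, "provider": "EventLog"},
    {"id": 7036, "provider": "Service Control Manager"},
]
for record in sample:
    print(record["id"], "->", classify_restart(record))
```

A run of "no bugcheck recorded" entries is consistent with a power loss or hard hang rather than a driver crash, though it does not prove which one.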

How long has it been since you blew the dust out of the system, the heatsinks, and the PSU? Does it shut off if you run it with the side of the case open and a fan blowing in it? Does it shut off if you do this and remove the overclock (resetting the BIOS to defaults)?

There is also another possibility regarding overheating. I dimly recall that the X58 Pro (I had one for a while, years ago), even running stock and new out of the box, had issues with the northbridge temps running just below Intel's thermal limits. You might check the northbridge temps in the BIOS (or install temperature monitoring software). If the northbridge is overheating (which for that chipset is anything closely approaching 100C), this could be the issue as it could be tripping a thermal protection shutoff. If this turns out to be the case, the only fix is to remove both the northbridge and southbridge heatsinks (I think maybe they had a heat pipe connecting the two heatsinks???) and replace MSI's especially sucktastic thermal material (which is probably even worse now, after 12 years, than it was new) with something good (any decent CPU TIM will work). This was a widespread issue, so even without looking I am sure there will be a few videos on YouTube showing what to do.
 

solidsnake1298

Senior member
Aug 7, 2009
302
168
116
The issue you are describing sounds a lot like the problems I had this time last year when I was folding@home for COVID. I was diagnosing another problem I had where my audio setup would hum every minute or so. Turns out I had a ground loop caused by my laser printer. I turned off the laser printer when it wasn't in use and the hum stopped. But I also noticed that my computer no longer randomly crashed when folding@home was running. I hypothesized that the ground loop was causing the voltage going into the PSU to fluctuate and dip below a certain level for long enough that the system powered off, given the high, constant power requirements of CPU+GPU folding.

I wonder if you also have a ground loop.
 

In2Photos

Good idea, I'll run memtest. I have yet to see a blue screen, but I'm not at the machine when it crashes. Unfortunately there are no dump files either.

I'm not sure if the lack of dump files is due to the system just turning off or if it had anything to do with the lack of a page file. The PC is clean. I blow it out frequently and thermal paste was changed about a year ago on the CPU. Good memory on the northbridge!!! I tried a few methods to get the temps down years ago, eventually using some new thermal paste and nylon bolts to secure the stock heatsink down, but when I started folding that didn't work well enough so I picked up a heatsink with a small fan. My temps max out at 60C now.
I haven't added anything to the system in a while so it would have to be something that has gone bad. I have a few USB devices connected but haven't noticed anything else like a hum or lights flickering or things dropping in and out. But I will certainly keep looking, thanks.
 

solidsnake1298

My PC does have a pagefile configured and I did not get any dump files either. The event viewer showed the same thing you are seeing: everything was fine, then suddenly there were logs indicating that it was booting up, with messages that the last power down was not graceful.

That was the weird thing. My lights didn't flicker. Nor did my wife notice any hum on her headphones. But she was driving her headphones with the onboard sound card, behind all the capacitors and filtering on the motherboard and PSU. Whereas my audio setup is discrete with an external DAC and headphone amp with their own power supplies. My monitors didn't flicker. The only signs were my UPS occasionally switching over to battery (there was an audible click, which also tipped me off to some sort of electrical problem), the hum in my speakers/headphones (not connected to UPS), and my computer crashing when under a very heavy load (550-600W while CPU+GPU folding on a 750W PSU).
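For a sense of how much headroom those numbers leave, the 550-600W draw can be expressed as a fraction of the 750W rating. This is only back-of-the-envelope arithmetic: it assumes the draw and the rating are measured the same way (wall power versus DC output is not stated), and the function name is made up.

```python
def load_fraction(draw_watts, psu_rated_watts):
    """Return the draw as a fraction of the PSU's rated capacity."""
    if psu_rated_watts <= 0:
        raise ValueError("PSU rating must be positive")
    return draw_watts / psu_rated_watts

# Figures quoted in the post: 550-600 W while folding on a 750 W PSU.
for draw in (550, 600):
    print(f"{draw} W on a 750 W PSU: {load_fraction(draw, 750):.0%} of rated capacity")
```

At roughly three quarters to four fifths of rated capacity, sustained around the clock, there is not much margin left for a dip in input voltage.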

Ground loops wouldn't be caused by anything on your computer, but by other devices/appliances on the same circuit. In my case it was a laser printer with a 3 prong plug.

Assuming your Memtest doesn't reveal a memory stability problem, examine what other things are connected with 3 prong plugs on the circuit your computer is on and try turning them off or unplugging them.