Bad Power Supply, Motherboard Memory Socket, or Other?

SamirD

Golden Member
Jun 12, 2019
1,489
276
126
www.huntsvillecarscene.com
So I acquired this Optiplex 990 with an i5-2500, 8GB of RAM (4x2GB), a 250GB drive, and Win7 a while back.

I locked it down with Reboot Restore Rx and used it, with a portable version of Chrome, as a desktop to RDP into.

Once in a while the system would disconnect from RDP and disappear, and when I went to look at the screen it had blue screened with a page fault error. No biggie, just reboot it and use it again. But this kept happening.

I tested the ram using the dell diagnostics built into the system and all is well. Tested it using memtest as well with no issues.

Ran into a sweet deal on a 16GB upgrade, swapped it in, and ran the Dell diagnostics and memtest--both passed. It worked without issue for quite a while until one day it had just shut off and wouldn't even boot again. After leaving it unplugged for a few weeks, I tried messing with it again last night and the Dell diagnostic lights on the front indicated a memory configuration issue (1 and 3 lit).

I pulled out half the memory (2x4gb, still dual channel), and the system booted again. I shut it down and pulled all the modules and swapped their locations and all 16GB booted fine. Ran dell diag fine, ran memtest fine.

At this point I started to suspect an issue with a memory socket, since I've seen issues with both the old set of RAM and the new. So I downloaded the Intel Processor Diagnostic Tool (IPDT) and ran it on burn-in overnight. Woke up with it saying 'pass'.

I thought okay, seems like it's okay, so I attempted to rdp into it, which worked fine. Tried to launch portable chrome and the system disconnects rdp. Go over to it and again it has the page fault error and blue screen.
:(

I don't know if the power supply is a bit weak (it's only the stock Dell 265W, and it has always acted a little odd, taking a few seconds to turn on and sounding like there are relays clicking inside it at startup), or if a bad memory socket is causing the page fault, or if the CPU or CPU socket is bad. I haven't tried swapping the power supply yet.

I haven't tried a linux live cd yet, and I really don't know which one would be good to stress test it on to replicate any errors, so I'm looking for suggestions on that.
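As one rough option that runs on any live CD with Python installed, here is a minimal sketch of a user-space memory pattern test. The function name `pattern_test` and the sizes are made up for illustration. It only touches whatever memory the OS hands the process, so it is not a substitute for memtest. It just puts some load on RAM from inside a running OS to see whether errors or crashes show up.

```python
def pattern_test(size_mb=64, passes=2):
    """Fill a buffer with fixed bit patterns and count bytes that read back wrong."""
    size = size_mb * 1024 * 1024
    errors = 0
    for _ in range(passes):
        # 0x00 / 0xFF are all-clear and all-set; 0x55 / 0xAA alternate bits.
        for pattern in (0x00, 0xFF, 0x55, 0xAA):
            buf = bytearray([pattern]) * size        # allocate and fill the buffer
            matching = buf.count(bytes([pattern]))   # read every byte back
            errors += size - matching                # anything else is a mismatch
    return errors


if __name__ == "__main__":
    # Hypothetical sizes; raise size_mb to cover more of the installed RAM.
    bad = pattern_test(size_mb=64, passes=2)
    print("mismatched bytes:", bad)
```

A nonzero count, or a crash while it runs, would point toward hardware. A clean run proves little on its own.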

I also haven't tried re-seating the i5 to see if that changes anything. I also have an i3 that I can try if reseating the i5 doesn't change anything, but I don't know if using the i3 will replicate the same architecture under which the error occurs. I do have another i5 installed in another system I could try as well as an i7, but I'd rather not mess with working systems to fix this one.

Open to ideas and suggestions on things to try. I'm usually really good at nailing down bad parts and swapping them out, but I haven't dealt with potential motherboard issues like this before. Thank you in advance!
 

VirtualLarry

No Lifer
Aug 25, 2001
56,343
10,046
126
At first pass, sounds like a software error? Have you tested this scenario on the same OS / patch-level, on a different PC? What if you cloned the existing HDD / SSD, moved it to a different PC, and tried the same thing with it? (Keep the software, as much as possible, and change out the hardware underneath it, just to test?)
 

SamirD

At first that's what I thought too, and I figured adjusting the swap file like I read online would fix it.

I didn't try the same os and same patch on another pc because I don't have another one with this exact configuration. I know the different drivers can make a huge difference if it's software so I didn't go down that road.

In today's testing I thought I had narrowed it down to the hard drive. I cloned it to another known good drive using an external hard drive cloner, and the clone had the same problem. So either the SATA cable is bad, which I highly doubt since I even tried a newer cable, or it's something else.

I didn't try the cloned drive in another PC because I highly doubt it would even boot, as any other system I have is completely different hardware. But this is a pretty standard config I run on Win7-based systems and I haven't had issues before.

All this being said, I now think it's most likely a software issue, as multiple Linux live CDs work fine. I also noticed that Reboot Restore Rx seems to be messing with a second volume that it shouldn't be touching: whenever something tries to write to that drive, instant bluescreen.

So I've restored the system default setup (which I had imaged with Clonezilla) onto the new drive and will try rebuilding the setup from scratch and see what happens. I think having the installer for Reboot Restore Rx on my secondary drive did something weird, where RR was trying to keep that drive unchanged as well. We'll find out, as I'm going to install RR from C: this time and make sure there aren't any other partitions on the drive for it to mess with. Then I'll add the additional drive and see what happens. Getting closer...
 

VirtualLarry

Could this simply be a "memory-leak"-type error?

If you have insufficient pagefile space and you are running some sort of HDD lock-down program, could that program/driver be preventing the OS from "growing" the pagefile to accommodate more memory usage, so that the OS throws a blue screen because of OOM errors?
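To put rough numbers on that idea: on Windows the commit limit is approximately physical RAM plus the total pagefile size, so a pagefile that cannot grow caps how much memory can be committed. A minimal sketch of the arithmetic follows. The function names and the example figures are hypothetical, not measurements from this machine.

```python
def commit_limit_gb(ram_gb, pagefile_gb):
    """Approximate commit limit: physical RAM plus total pagefile size."""
    return ram_gb + pagefile_gb


def would_exhaust_commit(ram_gb, pagefile_gb, committed_gb):
    """True if the committed memory would exceed the approximate commit limit."""
    return committed_gb > commit_limit_gb(ram_gb, pagefile_gb)


if __name__ == "__main__":
    # Hypothetical case: 16 GB of RAM with a pagefile stuck at 2 GB.
    print(commit_limit_gb(16, 2))            # limit with the small, locked pagefile
    print(would_exhaust_commit(16, 2, 20))   # a 20 GB commit charge would not fit
    print(would_exhaust_commit(16, 16, 20))  # the same charge fits with a 16 GB pagefile
```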
 

SamirD

That's what I was thinking initially, until I turned off the program, changed the swap file, and re-enabled the program. With 16GB of RAM and a 16GB swap file, it shouldn't need to swap at all when just copying files over the network using Explorer, and that's when it hit.

I only remembered later that the second volume on the drive, which doesn't need a swap file, seems to be listed as having one, which would probably mess up Reboot Restore. I'll find out as soon as I rebuild this config and test again.
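For checking which volumes actually have a swap file configured, here is a small sketch that reads the `PagingFiles` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management`. It assumes each entry looks like `C:\pagefile.sys 16384 16384` (path, then initial and maximum size in MB). The helper names and the sample entries are made up for illustration.

```python
import sys


def pagefile_drives(entries):
    """Return the sorted drive letters that appear in PagingFiles-style entries."""
    drives = set()
    for entry in entries:
        parts = entry.split()
        if not parts:
            continue                      # skip empty strings
        path = parts[0]
        # Only count entries that start with a real drive letter, e.g. "C:\".
        if len(path) >= 2 and path[0].isalpha() and path[1] == ":":
            drives.add(path[0].upper())
    return sorted(drives)


def read_paging_files():
    """Read the PagingFiles value from the registry (Windows only)."""
    import winreg  # imported here so the rest of the file loads on any OS
    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "PagingFiles")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print(pagefile_drives(read_paging_files()))
    else:
        # Hypothetical sample entries, for trying the parser off Windows.
        sample = [r"C:\pagefile.sys 16384 16384", r"d:\pagefile.sys 0 0"]
        print(pagefile_drives(sample))
```

If a drive letter other than C shows up here, that would match the suspicion that the second volume has a swap file it doesn't need.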
 

SamirD

So as an update to this--I rebuilt the config from scratch, this time with a permanent 16GB swap file (for the 16GB of RAM) and with Reboot Restore installed while only a single drive (C:) was present, and everything has been working fine with no reboots for about 2 weeks now. Glad that it's a software issue, and I think I've solved it with this particular configuration.

I plan to image this configuration and then move it from my working drive (which runs a lot hotter than I want for the environment this system will get deployed in) back onto the original drive. If it works fine then I'm done. :D