Dead Computer

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
A storm came through and took out my power. Even though I had a nice Tripp Lite isolated surge protector, my computer still seems to have kicked the bucket. It will boot up to a point and then go into Windows 7 Startup Repair. The repair doesn't seem to find any issues, but it still won't boot.

I am wondering if anyone knows some good methods of isolating the problem, so I can try to fix it? I have gotten the programs in the Overclocking CPU/GPU/Memory Stability Testing Guidelines thread in CPU and Overclocking, but I can't figure out how to create a boot disk to actually run them on my PC.

I do have a nice Fluke 87 multimeter, but I am not sure how to troubleshoot the power supply. Any tips on how to troubleshoot my PC? I would greatly appreciate any help you can give. My PC components are listed in my signature. The C300 is the boot drive.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
I just figured that maybe my BIOS settings were corrupted somehow, and I adjusted the storage configuration from IDE to AHCI, and now it boots. Although, I am still unsure if there is any hardware damage. I should be able to run those diagnostics I linked now though.

EDIT: I was a little quick on the trigger. The computer still crashes to a hard reboot after a couple of minutes. I'll check whether my fans are all spinning, as that was the cause the last time I had this problem.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Does your machine crash under stress tests or just normal usage? If it crashes during stress tests only, then the storm didn't necessarily do anything; there could have been a pre-existing problem that you didn't know anything about.

While stress-testing, monitor your temperatures with HWMonitor. That'll give you some indication of whether or not the machine is resetting due to heat. That seems unlikely on a modern build, though; I'd expect it to throttle instead of reset.

Also, if your BIOS settings were reset to default after a power outage, your CMOS battery is most likely dead. It's a little coin cell on the motherboard, likely a standard CR2032.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Does your machine crash under stress tests or just normal usage? If it crashes during stress tests only, then the storm didn't necessarily do anything; there could have been a pre-existing problem that you didn't know anything about.

While stress-testing, monitor your temperatures with HWMonitor. That'll give you some indication of whether or not the machine is resetting due to heat. That seems unlikely on a modern build, though; I'd expect it to throttle instead of reset.

Also, if your BIOS settings were reset to default after a power outage, your CMOS battery is most likely dead. It's a little coin cell on the motherboard, likely a standard CR2032.

Right now it crashes under normal usage. In fact it doesn't stay on long enough to even try to run anything. I'll change the battery though.

It does not appear to be a heat issue, at least not for the CPU. I'm not sure about the chip set though.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Since you have a P67, unfortunately you can't easily rule out the GPU. Try disconnecting any extra drives (HDD, ODD) and removing all but one DIMM (try several single sticks in succession). If it still crashes, try swapping in a spare GPU (maybe you have an older one lying around?). If it still crashes after that, you most likely have a PSU or motherboard problem.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Yeah. Since it was caused by a power outage, I kind of expect the PSU and Motherboard are damaged. Any power surge would have to pass through them first, although it's possible they weren't damaged by the surge. I'll try testing the memory first. Thanks for the advice so far!
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
I made a bootable CD to run Memtest86+, and so far I've come across 76 errors with the test only a third done. I may have a bad stick of RAM, or maybe the memory controller is shot. I can't tell which yet, but I'll test each stick individually to narrow it down.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
The fact that it stays running long enough to test the RAM is a good sign. If there are that many errors, you may abort the test now, and start testing modules individually. The memory controller is on the CPU die. I suppose it is possible for it to be damaged and the machine still boot, but I see it as an unlikely event.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Ok. I ran each stick through one pass, and all 4 passed. Now I'm guessing it might be the actual slot? Maybe I should change the settings around and try them all again first.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
If they all pass individually, the next step is to test one slot at a time. I presume you used the same slot to test all four? Now you must test the other three slots.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
I ran 1 pass on each stick on slot 0, and they all passed. I next tried 1 pass on slot 0 and 1, with two sticks, and that passed. I next tried 2 passes with slot 0 and 2 (dual channel) and that passed. So I just ran all four sticks overnight, and I had one failure for 4 passes. I am quite confused.

20140802_075050.jpg


At least the error was in the same location, but I am unsure which slot that would be.
 

Deders

Platinum Member
Oct 14, 2012
2,401
1
91
Were you connected to the router via Ethernet cable at the time?

A lightning storm recently took out one of my ports and one on the router. The earth pin seemed to direct the rest of the surge away from my computer, but the ports that were connected via Ethernet cable are now gone.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Were you connected to the router via Ethernet cable at the time?

A lightning storm recently took out one of my ports and one on the router. The earth pin seemed to direct the rest of the surge away from my computer, but the ports that were connected via Ethernet cable are now gone.

No. I am connected wirelessly. The only connection I have is through my power cord.

I am able to boot into Windows now, even though I didn't change anything other than moving the RAM around. I'm running HCI Memtest86 now, and I found one error so far: "Pair 78406676 does not store values correctly"
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
At this point I would usually start substituting parts, starting with different RAM, but you may not have extra parts with which to do this. Looks like it is going to be a difficult problem to track down.

Actually, the best thing might be to test those sticks in another machine.
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
I would start with two sticks in slots 0 and 2 (dual channel) and see how it runs (just running normally, no memtest).
If it crashes, try slots 0 and 1
If that crashes, try slot 0 alone
If that crashes, try slot 1 alone
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
I ran 1 pass on each stick on slot 0, and they all passed. I next tried 1 pass on slot 0 and 1, with two sticks, and that passed. I next tried 2 passes with slot 0 and 2 (dual channel) and that passed. So I just ran all four sticks overnight, and I had one failure for 4 passes. I am quite confused.

20140802_075050.jpg


At least the error was in the same location, but I am unsure which slot that would be.

So if slots 0, 1, 2 all passed without errors, and now you're getting errors when populating slot 3, that means that slot 3 is likely the issue.
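
A rough sketch of that elimination logic in Python, using the pass/fail runs from the quoted post. The function name is made up for illustration, and it assumes a bad slot fails on its own regardless of how many other slots are populated, which may not hold if the problem only shows up under full load:

```python
def suspect_slots(passing_runs, failing_runs):
    """Return slots that appear in a failing run but in no passing run."""
    cleared = set().union(*passing_runs)  # every slot seen in a clean run
    failed = set().union(*failing_runs)   # every slot present during a failure
    return failed - cleared


# Runs as described in the quoted post: each set is the slots populated.
passing = [{0}, {0, 1}, {0, 2}]
failing = [{0, 1, 2, 3}]

print(sorted(suspect_slots(passing, failing)))  # [3]
```

If slot 3 later passes in a run of its own, it drops out of the result and the list comes back empty, which would point at load rather than at any single slot.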
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
So if slots 0, 1, 2 all passed without errors, and now you're getting errors when populating slot 3, that means that slot 3 is likely the issue.

Good catch mfenn. Lightning is an amazing thing: one slot damaged, and nothing else along the bus, nearby bus, the CPU end of the bus or the power system (assuming this is indeed the extent of the damage).

For example, when I worked in PC repair I had a customer come in with computer issues: the network connection went out, and that was it. It was due to lightning, and only that chip was fried: the rest of the south bridge and nearby components were fine.
 

westom

Senior member
Apr 25, 2009
517
0
71
I am able to boot into Windows now, even though I didn't change anything other than moving the RAM around.
Your symptoms are classic for numerous suspects, including a power system problem. Nothing done so far has exonerated or accused anything, so nothing has been accomplished yet. Start by exonerating some suspects. IOW use your meter.

Set the meter to 20 VDC. Attach its black probe to the chassis. With computer powered off (but connected to AC mains), touch the red probe to a purple wire from PSU to where it connects to motherboard. IOW push its probe into the nylon connector to read a number somewhere around 5 volts. Record that number to three digits.

Do the same for the green and gray wires, first the green and then the gray. Monitor each voltage as the power button is pressed, and record it both before and while the button is pressed.

Finally, measure any one each of the red, orange, and yellow wires either while or after the power button is pressed. If any do not rise to a stable voltage, then note which ones rise or fall faster. Again, numbers must be to three digits.

Report those numbers to learn about many components in your power system. Either the system will be exonerated, or your strange symptoms (including those diagnostic reports) will be explained.
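
For anyone logging those readings, here is a minimal Python sketch of a first-pass sanity check. The post does not say what thresholds it will apply, so the +/-5% window is an assumption taken from the commonly quoted ATX tolerance for the positive rails, and the readings are made-up placeholders. Green (PS_ON#) and gray (PWR_OK) are signal lines rather than rails, so they are left out. A DC reading also says nothing about ripple.

```python
# Nominal voltages by standard ATX wire colour.
# purple = +5 V standby, red = +5 V, orange = +3.3 V, yellow = +12 V
NOMINAL = {"purple": 5.0, "red": 5.0, "orange": 3.3, "yellow": 12.0}
TOLERANCE = 0.05  # assumed +/-5 % window, not a figure from the post


def check_rail(colour, measured):
    """Return True if the measured voltage is within tolerance of nominal."""
    nominal = NOMINAL[colour]
    low = nominal * (1 - TOLERANCE)
    high = nominal * (1 + TOLERANCE)
    return low <= measured <= high


# Placeholder readings for illustration only, not real measurements.
readings = {"purple": 5.08, "red": 5.02, "orange": 3.31, "yellow": 11.3}

for colour, volts in readings.items():
    status = "in range" if check_rail(colour, volts) else "OUT OF RANGE"
    print(f"{colour:7s} {volts:6.2f} V  {status}")
```

With the placeholder numbers above, only the yellow (+12 V) reading falls outside the window.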
 
Feb 25, 2011
17,000
1,628
126
Good catch mfenn. Lightning is an amazing thing: one slot damaged, and nothing else along the bus, nearby bus, the CPU end of the bus or the power system (assuming this is indeed the extent of the damage).

For example, when I worked in PC repair I had a customer come in with computer issues: the network connection went out, and that was it. It was due to lightning, and only that chip was fried: the rest of the south bridge and nearby components were fine.
Many years ago, I had a power surge (blown breaker, not lightning) that knocked out my computer. One memory slot out of four died.

Six months later, another one stopped working.

Then some of the USB ports on the motherboard stopped working.

Slow death.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Alright,

I just completed a set of tests where I tested all 4 sticks in both dual-channel configurations. DIMM 0 and 2, and DIMM 1 and 3. In each, I tested stick 1 and 2, 1 and 3, and 1 and 4. This took most of the day. My results ended up with 0 errors.

This appears to rule out the RAM sticks themselves as the culprit. It also appears to rule out the individual DIMM slots. My guess is that fully loading the IMC might be causing some issues when I populate all four slots. But I am not sure of that either. I'll have to think of other ways to remove possible issues.
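
The test matrix described above can be written out as a short Python sketch. The stick numbers 1 to 4 and the helper name are just labels for illustration; the slot pairs are the two dual-channel configurations mentioned in this post:

```python
from itertools import product

# The two dual-channel slot pairs and the stick pairs tried in each.
SLOT_PAIRS = [(0, 2), (1, 3)]
STICK_PAIRS = [(1, 2), (1, 3), (1, 4)]


def build_plan(slot_pairs, stick_pairs):
    """Return one {slot: stick} mapping per test run."""
    return [
        dict(zip(slots, sticks))
        for slots, sticks in product(slot_pairs, stick_pairs)
    ]


plan = build_plan(SLOT_PAIRS, STICK_PAIRS)
for run in plan:
    print(run)

# Which slots and sticks the plan exercised at least once.
slots_used = {slot for run in plan for slot in run}
sticks_used = {stick for run in plan for stick in run.values()}
print(len(plan), sorted(slots_used), sorted(sticks_used))
```

That comes to six runs touching every slot and every stick, but never more than two at a time, so a fault that only appears with all four slots populated would not show up in it.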
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
Having all four DIMMs in is pretty taxing on the memory subsystem, and will reveal any weakness. As a stopgap, you could try overvolting the RAM slightly and see if that cleans up the signal. But clearly something has been damaged, so better save up for some repairs. No doubt the problem will become more severe as time passes, as dave hypothesizes.
 

westom

Senior member
Apr 25, 2009
517
0
71
But I am not sure of that either. I'll have to think of other ways to remove possible issues.
Defective memory can work at room temperature (70 degrees F) and fail hard at the other normal operating temperature (i.e. 100 degrees F). That (and not testing for 24 hours) is called burn-in testing. Heat is a diagnostic tool that identifies defective hardware. Unfortunately, many blame heat (rather than hardware) for failures. Memory testing is best performed at the uppermost and lowest temperatures.

Defective memory also does not correspond to your original symptoms. Also significant, and a reason not to suspect memory, was the pattern of your errors: they were at various addresses and bits, further implying a failure created by something elsewhere.

What I described is what better-trained techs do first, since the 'foundation' of a computer is its power system. Do you also plane down sticking doors in a house? Or first inspect the house's foundation to identify and fix those doors? Same concept. Until its power system is known good (without doubt or speculation), numerous strange and intermittent failures may cause you confusion.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Defective memory can work at room temperature (70 degrees F) and fail hard at the other normal operating temperature (i.e. 100 degrees F). That (and not testing for 24 hours) is called burn-in testing. Heat is a diagnostic tool that identifies defective hardware. Unfortunately, many blame heat (rather than hardware) for failures. Memory testing is best performed at the uppermost and lowest temperatures.

Defective memory also does not correspond to your original symptoms. Also significant, and a reason not to suspect memory, was the pattern of your errors: they were at various addresses and bits, further implying a failure created by something elsewhere.

What I described is what better-trained techs do first, since the 'foundation' of a computer is its power system. Do you also plane down sticking doors in a house? Or first inspect the house's foundation to identify and fix those doors? Same concept. Until its power system is known good (without doubt or speculation), numerous strange and intermittent failures may cause you confusion.

I tested the memory first, because it was the easiest thing to test. I could run Memtest86 without booting into Windows, so I did. Since I found errors during my first run, I continued to test to narrow down the issue. So far, I have learned that the memory only produces errors when the slots are fully populated.

I'm checking the idle voltage on my PSU now. While it will be a relatively quick test, I won't be able to tell how clean the voltage is, nor will I have the loaded voltage. But it will give me some data. I am wary of my PSU anyway, since any power surge I had came through that component first. Unfortunately it is also one of the most expensive components to replace. Well, I'm starting to put the cart before the horse now, talking about replacements when I haven't even isolated the issue yet.

After I verify the idle voltage, I plan to increase VDIMM and see if that cuts out the memory failures. While that is likely only a temporary fix, it will give me some time to research new motherboards and PSUs.
 

westom

Senior member
Apr 25, 2009
517
0
71
I'm checking the idle voltage on my PSU now. While it will be a relatively quick test, I won't be able to tell how clean the voltage is, nor will I have the loaded voltage.
Follow the directions. Post those numbers. The reply will report on a long list of suspects INCLUDING how clean that voltage is.

What happens when all memory slots are in use? If a power system problem exists, then voltage does change enough to create intermittents.

Appreciate the generations of experience behind these recommendations. Also do not ignore comments about heat.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Well, that didn't work. Without the MB connector connected, the PSU won't turn on. I can probe the back plane, but that would require removing the MB from the chassis and would increase my risk of shorting signals. Any ideas on other ways to test the output from the PSU? I'm sure I'm missing something here.

In the meantime I'm going to fully populate the DIMMs and increase the VDIMM value by 2 notches, and see if that removes the memory errors.