CPU/motherboard and hard disk failed - why?

Cato2

Junior Member
Jul 31, 2010
6
0
0
Not sure if anyone can help with this, as it's tough to investigate after the fact... but the short version is that a PC that I built 4 months ago had a major hardware failure in at least 2 components that should be independent. I'd really like to know why, and whether I can prevent a recurrence.

I diagnosed and rebuilt the PC by component swapping after this failure, and narrowed down the failing components to:

- CPU or motherboard (Core i3 530 and Gigabyte GA-H55M-UD2H)
- Hard drive (Samsung F3 1TB, HD103SJ)

The PC was on an APC UPS (Back-UPS), but I don't have any record of power events as the UPS was switched off for quite a while. I could check the other UPSs in the house to see if they recorded anything.

After the failure (seemed to be after a 0x3B BSOD but unclear how that's involved), the system would not boot - PSU powered up fans and the 4 LEDs lit up on motherboard, and about half of the time I would get a single beep on PC speaker, and half of the time no beep. Then it would reboot after spinning fans for a few seconds. Looked a bit like a RAM problem, but removing CMOS battery for a few minutes didn't help.

POST code shown on a Startech PCI card (added after failure) was 00. I tested with different CPU fan (stock), removed all SATA/PATA connections, removed 1 RAM stick, etc. Also tried a PCI graphics card, which the BIOS defaults to.

I ordered new instances of the same model PSU, RAM, CPU and motherboard. I built with new CPU+motherboard, testing with the new PSU and old PSU, and the old RAM, and it booted OK (though only 1 RAM stick). I didn't want to blow the new CPU or motherboard so I haven't tested them against the old ones.

The hard drive seemed to work for the C and D partitions, and I booted Win7 using this drive with the rebuilt PC, then realised that some parts of the disk were generating large numbers of unrecoverable read errors. The errors only showed up under Ubuntu; under Win7 there were no errors in the event log about the hard disk, only the BSOD. However, while in Win7 I did have another BSOD (0x7A).

I've now replaced that drive with a new one of the same model.

I have a complete backup of the C and D drives of the PC as of 1 hour before the BSOD, so I can see most of the event log up to that point. (Image backups using the excellent ShadowProtect.) I've also got the original BSOD (0x3B) details from the event log.

A power event seems the most likely cause, but the APC UPS includes AVR that should stop that (even though the battery is now 3 years old and the PC is powered up 24/7).

If it was 2 independent failures, I would have thought the hard drive would have shown progressively more read errors, as it did immediately on Linux afterwards. If it was the CPU/motherboard failing first, how could that create bad blocks on the hard disk?

Any ideas at all on what might have happened? Your help would really be appreciated on this one, as I'm completely at sea here...

Full components (original and rebuilt PC):

- Intel Core i3 530 2.93GHz
- Gigabyte GA-H55M-UD2H
- RAM: Corsair CMX4GX3M2A1600C9 - 4GB (2x2GB) XMS3 DDR3 PC3-12800 (1600), Non-ECC, Unbuffered, CAS 9-9-9-24, XMP, 1.65V
- Hard drive: Samsung HD103SJ - Spinpoint F3 1TB
- Optical drive: Samsung SH-S223C DVD writer
- Antec P183 Case
- Corsair CX400W PSU - 400W
 

ox246xo

Member
Jul 30, 2010
26
0
0
Hmm. Seeing as how the culprit is either the old CPU or motherboard AND the old hard drive is failing, I'd have to say it's either a power problem of some sort or it's just coincidence that the hard drive is failing too... though as you said I'd think a failing hard drive would be progressive :hmm:

I'm honestly surprised that the old PSU is working with the new components. Unless it has some odd intermittent problem that blew some of the old components and could potentially harm the new ones.
 

Bartman39

Elite Member | For Sale/Trade
Jul 4, 2000
8,878
51
91
I agree it seems more likely it's a power event, and just because your backup unit has AVR does not mean it's working correctly... Also, if that's the case (faulty backup unit) then you may have a bigger problem, like possibly a loose neutral at the meter or main breaker panel... This is very common, and there could even be a loose neutral connection coming from the power pole or at the weatherhead, which I have seen before as well... Any other strange electrical issues in the home...?
 

ItsAlive

Golden Member
Oct 7, 2005
1,147
9
81
Sorry to hear about your PC troubles. I have a couple of quick questions.

1) Were you overclocking at the time of failure?

2) Does your motherboard have a Lotes or Foxconn CPU socket?
If it is Lotes the logo should be printed on the lid of the CPU socket. If the lid is bare then it's likely Foxconn.

3) You say you tested with 1 RAM stick, did you find one stick to be bad?


Some guys at XS found a flaw in some 1156 motherboards equipped with Foxconn CPU sockets. I don't know if this pertains to your specific situation or not, but it very likely could be the culprit. Have you disassembled the CPU from the motherboard and inspected the pins/socket for burns or signs of melting?

Foxconn skt. 1156 flaw
 

Cato2

ox246xo said: "Hmm. Seeing as how the culprit is either the old CPU or motherboard AND the old hard drive is failing, I'd have to say it's either a power problem of some sort or it's just coincidence that the hard drive is failing too... though as you said I'd think a failing hard drive would be progressive :hmm:"

There were a LOT of errors on the drive - I gave up with GNU ddrescue (which is very good at recovering failing disks) after 120 plus errors in 34 GB out of 1TB. So I think it must be a common event that hit the CPU/motherboard and the hard drive.
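
To put those ddrescue numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes 1 TB = 1000 GB and, for the projection only, that errors are spread evenly across the disk. That second assumption is purely illustrative and probably wrong, since bad blocks tend to cluster. The function names are made up for this example.

```python
def error_density(errors: int, scanned_gb: float) -> float:
    """Read errors per GB over the region scanned so far."""
    if scanned_gb <= 0:
        raise ValueError("scanned_gb must be positive")
    return errors / scanned_gb


def naive_projection(errors: int, scanned_gb: float, total_gb: float) -> float:
    """Linear extrapolation to the whole disk (assumes uniform errors)."""
    return error_density(errors, scanned_gb) * total_gb


# Figures from the post: 120+ errors in the first 34 GB of a 1 TB drive.
density = error_density(120, 34)
projected = naive_projection(120, 34, 1000)
print(f"{density:.2f} errors/GB, ~{projected:.0f} errors if uniform over 1 TB")
```

Either way, a few errors per GB on a 4-month-old drive is far beyond normal wear, which fits the idea of a single damaging event rather than gradual decay.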

Incidentally there was a 250 GB PATA drive in the PC which survived all this, and was the source of backups (I also have the backups on a server).

ox246xo said: "I'm honestly surprised that the old PSU is working with the new components. Unless it has some odd intermittent problem that blew some of the old components and could potentially harm the new ones."

If the common event was a really bad power surge, I'd have thought the PSU and the UPS would both be fried, but they seem OK... I should really have swapped in the new PSU, but it was the end of a very long three days diagnosing, rebuilding and recovering data.

One thing I've learnt is to keep spares since this is a key work PC - if I'd had a full spare PC ready to go, I could have recovered everything from backup in about 2 hours and then diagnosed the fault out of band.
 

Cato2

Bartman39 said: "I agree it seems more likely it's a power event, and just because your backup unit has AVR does not mean it's working correctly... Also, if that's the case (faulty backup unit) then you may have a bigger problem, like possibly a loose neutral at the meter or main breaker panel... This is very common, and there could even be a loose neutral connection coming from the power pole or at the weatherhead, which I have seen before as well... Any other strange electrical issues in the home...?"

The backup unit (UPS) hasn't been self-tested since the failure, so I will do that - good point.

Electrical issues generally in house - good point, here's what I know:

- We have quite a flaky mains power supply into the home (this is a rural part of the UK) - brownouts or short blackouts (usually just a few seconds, sometimes 10 minutes) occur every 3-6 weeks or so, hence all the UPSs. The failure happened at 1am so it's hard to know if this was an issue, though I could check my other UPS (also an APC RS model).

- The iron did cause the fuse box (RCD, "consumer unit" in UK terms) to trip a couple of times at 9am the morning after the failure, which may indicate something isn't right about this unit. The fuse in the iron's plug didn't blow. However, this doesn't happen very often, maybe every few months.

- I have a server on a line conditioner - this didn't crash at the time of the main PC failure, only 8 hours later when the iron caused the RCD to trip.

So, maybe something is going on with the electrics - not sure what I should ask an electrician to check though.
 

ehume

Golden Member
Nov 6, 2009
1,511
73
91
Cato2 said: "So, maybe something is going on with the electrics - not sure what I should ask an electrician to check though."

As Bartman39 suggested, have him/her check the integrity of your ground, neutral, etc. Electricians can be lifesavers.
 

Bartman39

Cato2 said: "not sure what I should ask an electrician to check though"

The neutral line for sure... Don't know about the UK but I figure they have the same basic 2 hot lines coming in and a neutral line, which gives 2 phase power (I know voltage and Hz are different over there), but high consumption stuff like A/C units and dryers use the 2 phase connection while most everything else uses a single phase line... You can have them check the voltage on both phases coming in, and they should be within 2 volts of each other... Then they should also check the continuity of the neutral line... A good electrician should know what to look for... :thumbsup:
 

Cato2

ItsAlive said: "Sorry to hear about your PC troubles. I have a couple of quick questions. 1) Were you overclocking at the time of failure?"

No, just using stock clocks with RAM settings on the Corsair XMP profile. In fact I think the CPU clock was about 2.8 GHz.

I was using an aftermarket HSF (Zalman CNPS10X Performa) with a 120mm fan, and CPU temps were quite low, about 20-25C at idle. However, after the failure I noticed the fan wasn't spinning up initially - possibly due to dust or a resistor that reduced its voltage to 7V or so. Connecting an Intel HSF to the CPU fan connector (but not replacing the Zalman HSF on the CPU) didn't make the original PC boot though.

I did have a lot of problems originally when building with OCZ Gold RAM, which I RMAed. I didn't keep detailed notes on the Corsair replacement, but I aimed for a Vdimm of 1.64V and a QPI/Vtt no more than 0.5V below that. Speed should have been 1333 MHz, though it's just possible I didn't notice the XMP profile setting the speed to 1600 MHz.

Using the old RAM in the new setup, I am not able to get both sticks working - I tried various things including failsafe defaults - and even getting 1 stick working at 1.65V and a Vtt of 1.3V or so is quite difficult.
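
For anyone wanting to sanity-check their own settings against the 0.5V gap between Vdimm and QPI/Vtt mentioned above, a minimal sketch follows. The 0.5V limit is taken from this post rather than from a datasheet, and the function names are hypothetical.

```python
MAX_GAP_V = 0.5  # assumed limit: Vtt should sit no more than 0.5V below Vdimm


def vtt_gap_ok(vdimm: float, vtt: float, max_gap: float = MAX_GAP_V) -> bool:
    """True if Vtt is within max_gap volts below Vdimm."""
    return (vdimm - vtt) <= max_gap


def min_vtt(vdimm: float, max_gap: float = MAX_GAP_V) -> float:
    """Lowest Vtt that still respects the assumed gap, rounded to 2 decimals."""
    return round(vdimm - max_gap, 2)


print(vtt_gap_ok(1.65, 1.30))  # gap of about 0.35V -> True
print(vtt_gap_ok(1.65, 1.10))  # gap of about 0.55V -> False
print(min_vtt(1.64))           # lowest Vtt for the original 1.64V Vdimm
```

By this check the 1.65V / 1.3V combination is inside the assumed limit, so the voltage gap alone would not explain why the old sticks are hard to get working.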

ItsAlive said: "2) Does your motherboard have a Lotes or Foxconn CPU socket? If it is Lotes the logo should be printed on the lid of the CPU socket. If the lid is bare then it's likely Foxconn."

The lid is bare - I was aware of the Foxconn socket issue when I bought the motherboard, but I thought that had been resolved by then (end of March this year).

ItsAlive said: "3) You say you tested with 1 RAM stick, did you find one stick to be bad?"

I was mainly focused on getting the whole system to boot, so I only put 1 stick in. I did run memtest86 for 3 hours on that stick without errors. However, I haven't tested the other stick to see if it's bad or not, due to time constraints.

ItsAlive said: "Some guys at XS found a flaw in some 1156 motherboards equipped with Foxconn CPU sockets. I don't know if this pertains to your specific situation or not, but it very likely could be the culprit. Have you disassembled the CPU from the motherboard and inspected the pins/socket for burns or signs of melting?

Foxconn skt. 1156 flaw"

Good question - just checked the CPU and socket and they look fine, no burning or melting.

Finally, just wanted to say thanks to all who have replied so quickly, and with such good questions!
 

Cato2

Bartman39 said: "The neutral line for sure... Don't know about the UK but I figure they have the same basic 2 hot lines coming in and a neutral line, which gives 2 phase power (I know voltage and Hz are different over there), but high consumption stuff like A/C units and dryers use the 2 phase connection while most everything else uses a single phase line... You can have them check the voltage on both phases coming in, and they should be within 2 volts of each other... Then they should also check the continuity of the neutral line... A good electrician should know what to look for... :thumbsup:"

Thanks for those tips - will book an electrician in ASAP. This might be of interest for those who understand this stuff: http://en.wikipedia.org/wiki/Electrical_wiring_in_the_United_Kingdom

The UPS on the PC that failed has just passed a self-test, which the APC software is supposed to run every 2 weeks anyway. Its records show last time it cut in was 14th June (I guess the software records these events, the UPS is fairly dumb and doesn't record events when PC is off).

Summary so far based on all the input here:

- electrics in house may have a problem
- PSU may have an intermittent fault, will replace with the new one
- UPS appears OK
- the Foxconn CPU socket and CPU seem OK - no sign of burns

A common event seems most likely, but I still can't work out how a power event could be so bad that it fries the CPU/motherboard and hard drive, passing through good quality UPS and PSU, without frying them... Seems very odd.

Curious if anyone has ideas about the hard drive - still need to run the Samsung diagnostics on it, but it was electrically OK, as I could boot Win7 off it without errors on day 2 with the new CPU/board. However, a little later on day 2, I tried a Win7 repair install, booting off this drive, and it failed to complete the boot a few times. So perhaps it was deteriorating rapidly, or perhaps it's just that Ubuntu and Win7 recovery files were worst hit by bad blocks.

I'm going on a trip next week but will certainly respond as and when I have Internet access, possibly from my new Android phone which is a nice distraction.

When I get back, I will try the old CPU in a cheap P55 Intel desktop board I got off eBay, and see if it blows that :)
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,377
126
It would be really odd if the CPU was dead. Mobos die far more often than CPUs (at least 100x, I'd say), for all kinds of reasons; CPUs very rarely bite the dust, and usually only as the result of extreme overclocking/overvolting.
 

Arkaign

Oh, and Samsung HDDs are generally awful :( It's odd really, most Samsung products are quite good, but their drives just aren't.
 

rickon66

Golden Member
Oct 11, 1999
1,823
15
81
I would take a long look at your PSU. Just because some voltage is apparent (lights on the MB and fans spinning) does not mean that it is good. I have been working on computers for almost 20 years and PSUs are the place I look first, especially when components have been fried.