Random hard lockups - I give up!

Delitus

Member
Oct 27, 2004
Hi all,

So this is a problem I have been trying to resolve for over a year and a half. My problem, as mentioned in the title, is random lockups: the mouse stops moving, the keyboard stops responding, the sound loops, and the only thing I can do next is hit the reset button. The lockups appear to happen at *random* - even while idling at the desktop - although they happen *much* more frequently under graphical workloads. The system can run 3DMark for 24 hours with no problems and then suddenly decide to die. I notice this problem most frequently during the following activities: running PhysX-supported programs, watching movies, and having multiple windows of Ragnarok Online (an MMO game) open. On rare occasions I also get garbled artifacts on certain window elements (like the address bar), which temporarily disappear with a mouse click. Sounds like a video card problem, right? Alas, the problem appears to be more complicated than that.

My system specs, at the time when the issue was first noticed, were as follows:

PSU : Coolermaster Real Power Pro 1000W
Mobo: Asus Striker Extreme
CPU : Core 2 QX6850
RAM : 4x 1GB Corsair XMS2 Dominator DDR2-1066
GPU : 2x Leadtek GeForce 8800GTX
Running Windows Vista Ultimate x64.

Given the nature of the problem, I first suspected the video cards. I tried disabling SLI and testing the cards one by one; the lockups became noticeably less frequent, but they continued to happen. Testing the two cards in a different computer, however, did not produce any problems at all.
As the cards ran rock-solid in a different computer, I then decided to check the stability of the overall system. Thinking that it could be a case of bad memory, I ran memtest86 and Prime95 for 24 hours each, and neither reported any errors.
Perhaps it was a weak power supply? I was skeptical at this point, as the above PSU *should* have no problem powering the above parts, and voltages appeared to be normal. I chose to give it a try anyway and tested using an OCZ 850W with a minimal config - one card, one memory module, no optical drives, etc. Sure enough, the problem happened less frequently, but it still occurred at times.
I then checked the temperatures. The cards were around 85 degrees Celsius at full load, which to my knowledge is acceptable for a G80. The CPU ran at around 55 degrees Celsius with Prime95 stressing all four cores. The chipsets looked okay too, although I cannot recall their temperatures now. I am using an Ultra-120 Extreme, with two 120mm fans blowing into the chassis and two blowing out through the back.
It was after this point that I began making various eccentric attempts in hopes of resolving the issue. I think I have covered just about every possibility - here is a list of what I have tried:

- scanning all hard disks for bad sectors
- fresh install of OS + service pack, trying both x86 and x64
- using the latest, greatest drivers from NVIDIA
- both default drivers and latest nForce drivers
- testing using a different hard drive
- overvolting (but not too much!) CPU/RAM/chipsets, loosening memory timings, underclocking
- disabling/enabling various BIOS settings
- updating firmware/BIOS
- testing using two other motherboards (a different Striker, and a Striker II Formula)
- testing using different video cards (9500GS, 9800GTX+)
- testing a different brand card (XFX)
- cleaning the PCI-E slots
- using a different PCI-E slot
- using memory from the motherboard's QVL list
- trying a different keyboard/mouse
- unplugging every USB device connected to the computer
- connecting to a UPS to smooth out possible bad power source
- trying Windows 7

...and I am posting here because none of them have worked. I am still stuck with this problem after almost two years and have yet to find a cause. Given this time period, I think immature drivers or a buggy Vista can be ruled out by now. (This happened on XP, too.) I was intending to use this PC for at least four years, but its unreliable nature makes it nearly unusable. The reason I cannot just return the damn thing is that... well, individually, every component appears to work just fine, from what I have gathered above. The same cannot be said of the system as a whole.

My question is... why? I hear of other people running similar configurations well, and even getting decent overclocks. I am not even *trying* to overclock - I just want a reliable system.

Please free me from this fiendish headache! I am offering $20 as a sign of gratitude to whoever can correctly identify and eliminate this problem, once and for all. (A new computer does not count! :])

Thanks in advance.

My current to-try list, from comments:
- different CPU. I will have to find another processor to test this.
- different PSU? So far, a Coolermaster, an OCZ and a Corsair have been used for testing, each producing the same problem. They could have been three unlucky duds, although that seems unlikely.
 
OILFIELDTRASH

May 13, 2009
Have you tried updating the bios on your mobo?

Or updating drivers on all hardware including chipset?

Tried using another CPU in your system, configured just the way it is, with only the CPU changed?

Tried using only 1 stick of RAM? Different slots?

Checked to see that your mobo is not shorting out somehow?

Checked all hardware for any visible signs of damage: leaking caps, burn spots, etc.?

Have you tried putting all this same hardware into another case? Maybe your case has some wires crossed or something, and when you hook up your audio, USB or whatever, it shorts out?

Go through checking your hardware over again. You must have missed something.

 

Andrew1990

Banned
Mar 8, 2008
Hmm, do you have an old processor lying around, like a Pentium 4 or Celeron, that you can test with? If you have tried all of that, I would test with a different CPU just in case. It is very unlikely, but I have seen weirder.

 

videogames101

Diamond Member
Aug 24, 2005
Whoa, slow down, man! I believe you ruled out the PSU way too early; this sounds like a classic failing power supply.

The PSU could still be the problem. Lowering the load on the PSU might not reveal all problems, because PSUs can be borked even at low load. I'd try a different PSU actually in your system before you rule it out. Considering that fewer components = fewer lockups, everything points to the PSU.


(Unless it's your CPU, but the odds of that are absolutely minuscule.)
 

Delitus

Member
Oct 27, 2004
OILFIELDTRASH:
Yes, I have indeed tried/checked all of those. The BIOS for both the motherboard and the video cards is the latest, and the problem has occurred on an isolated testbed. The hardware appears to be in good condition, and I have already tried swapping out most parts to check. All cables are connected securely.

Blain:
I have two hard drives in a 4x 3.5" rack, which is cooled by a 120mm fan blowing in. The two are spaced apart to allow adequate airflow.

Andrew1990:
Unfortunately, I don't have one, so I cannot test this. It would certainly be odd/surprising to see this work with a different processor, though!

videogames101:
I have also tried an OCZ 850W PSU, as mentioned above, but the problem still occurred. I am pretty sure this is a power-related problem too, but I am stumped, as the PSU has already been swapped out! I also once took it to the local PC shop for testing, where they used a Corsair PSU and replicated the freezing problem. They too seem to be having trouble finding the exact cause.

Thanks for all your replies. It does feel as if I have missed something simple, but for the life of me, I really cannot figure it out. All suggestions and ideas are welcome!
 

Lunyone

Senior member
Oct 8, 2007
I've had a very similar problem with an AMD system, and I'm wondering if it's the CPU too.
 

Delitus

Member
Oct 27, 2004
No RAID; at the moment, I am only using a single hard disk.

In the meantime, I am seeking a replacement CPU with which I can test this.
 

videogames101

Diamond Member
Aug 24, 2005
This is nuts. It would seem, then, that the CPU may indeed be the problem. You've ruled out everything else, so that's all that's left to try.
 

Delitus

Member
Oct 27, 2004
I've managed to get my hands on a P4 550, with which I'll be testing. I'll post my results when done.

Also - still open to suggestions! Please let me know of any ideas.
 

Andrew1990

Banned
Mar 8, 2008
If the CPU doesn't work out, then maybe try replacing the SATA cables?
 

VirtualLarry

No Lifer
Aug 25, 2001
Sounds kind of like my problem. My PC passes all stability tests for 24 hours but still reboots sometimes; once it took 28 days, but it still rebooted. I don't know why. It seems to happen at 8x400 but not at 8x350.

It seemed like classic PSU instability at first, but I swapped PSUs and it still happens. I don't know what the gremlins are.

Btw, have you tried installing a 40mm chipset fan over the system chipset?
 

geneSW

Member
May 29, 2009
Random, eh? Nothing is ever truly random... Something is causing these events. OK, have you completely reseated everything? New cables, both inside and out? It still sounds like a CPU problem... but I wouldn't quite rule out something as simple as a faulty power cable.
 

Beanie46

Senior member
Feb 16, 2009
I'd almost wager the problem lies in your motherboard and that wonderful nVidia 680i chipset. Maybe the heatpipes aren't quite making good contact with all the VRMs around the CPU. Maybe the problem lies elsewhere in the motherboard.....heck, even solid caps do fail, and you've got those on your board. Maybe your NB is overheating and not registering with your monitoring software. Heck, who knows.....but I'd honestly try out a new mb, any kind that most of your present components can fit onto, and give that a try. Even an el cheapo Intel-chipset-based ECS mb with one video card slot would rule out the mb as your source.....and you might be surprised when the problems cease. :)

But given that you've tested pretty much every part, and I seriously doubt it's the CPU since CPU failures are so exceedingly rare, I'd spend more time looking at the motherboard.
 

Absolution75

Senior member
Dec 3, 2007
I'm going to agree with this. Sometimes if you put a certain set of components together, they just don't work all that well.

Built a PC for a friend, fairly mid-range at the time. It had an ECS 650i in it (680is were way too expensive at the time, and the 650i was a good price for an SLI-enabled board - he still had 2x 6800GTs). I changed every piece of hardware in it (even RMA'd the MB). It was never 100% stable: random lockups at least once a day.

He finally built a new system and kept the old one as a backup PC. I sold him my 680i for a couple of cases of beer when I got my (current) Gigabyte board. The PC now never locks up.

Definitely a bad choice of original chipset. I will definitely never pay a premium for an SLI board, and hence will probably never go SLI again (for any PC I build).
 

Lunyone

Senior member
Oct 8, 2007
I have an AMD X2 5000+ BE system that I built for a friend's relative. It can run all of the benchmarks and stress tests without a hitch, but when the owner plays some games, it locks up at random times. I originally thought it was the memory, because it was 2.1V memory and the mobo only supported 1.8V max! So I got another mobo that supported up to 2.2V, but that didn't change anything. I even went out and got 1.8V memory to see if that changed anything, but still no change. I still think the CPU/GPU is at fault, but I don't know what to do at this point. Maybe his Seagate 7200.10 HD has some issues that I'm not aware of, but it's hard to say, since I don't have easy access to the system for testing.