Shutdowns while playing games

pol II

Member
Oct 4, 2004
173
0
0
Please help. My computer crashes inexplicably when playing games on an SLi system. The crashes go straight to shutdown: no BSOD, no nothing. It is as if the plug were simply pulled from the wall and all power to the system is lost. It requires a hard reboot. Game/graphics titles include UT (1999, DX or OGL rendering), HL2, Doom 3, and 3DMark. These crashes happen with minimal installed software/hardware, although I have not experienced crashes with either video card running on its own. I had originally attributed it to the Creative drivers for my old Audigy 2 ZS, but it happened even after a reinstall of the OS with no sound card (or drivers). The case is open and airflow is unrestricted. I cannot determine anything that is common across crashes, i.e., it can crash after 2 min of UT or after 2.5 hrs of HL2.

Obviously, my principal suspect is SLi...

ASUS A8N-SLi deluxe BIOS 1007
AMD FX-53 @ 2.41 (HTT = 201)
2x EVGA GeForce 7800GTX (450/1.2)
2x 512MB Ballistix single-sided PC3200 @ 201MHz, 2.5-3-3-6-2T
HDA Mystique 7.1 soundcard
WD Raptor 74GB 10K RPM
Lite-On DVD ROM
PSU is an Enermax Noisetaker P series EG701P-VE SFMA Rev. 2.0 600w
Thermalright XP-120 w/ 120mm Papst fan @7 -- 12v (rheostat)

6.53 NVIDIA chipset drivers (also crashed with less-stable overall 6.66)
77.77 display drivers (also happened with two previous driver releases on two 6800 Ultras and on two 7800GTX)

BIOS is 1007, but had 1013 and 1002, which crashed, too.
PnP OS set to "no" in BIOS

Both video cards have been replaced.
I am on my third motherboard (RMA'd the first two due to a faulty mem controller and a faulty SATA controller).
Memory has been replaced.
I have used both water and air to cool the CPU, in both cases, crashes occur when temps are well within normal limits.
CMOS has been reset as many times as XP has been installed (~10x)

Software Highlights:

Windows XP SP1 (Also crashes with SP2)
Logitech mouseware
Norton internet security (also crashed with free AVG and Zone Alarm)
I have not disabled/tweaked any Window services.

Troubleshooting highlights:

Temps are well under max for all components (software monitoring)
CPU passes Prime95 >12hrs.
RAM passes memtest >8hrs.
Rails are stable, as determined from digital multimeter. Has crashed while I was watching multimeter and running rthdribl; 12v and 5v rails solid.
Passes DxDiag, including direct draw and D3d 7 - 9
HDD passes long Western Digital Diagnostics
Sound card shares IRQ #18 with video card #1. I have deleted the sound card in Device Manager and reinstalled it; XP still assigns it to the shared IRQ. Crashes happen without the sound card or drivers after a fresh install anyway.
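For anyone repeating the multimeter test, here is a minimal sketch for checking readings against the ±5% window that the ATX specification allows on the +12V, +5V and +3.3V rails. The function names are made up for illustration, and the readings in the usage note are placeholders, not measurements from this system. Keep in mind that a handheld multimeter only updates a few times per second, so a very brief dip could slip past it even when every reading looks fine.

```python
# Tolerance allowed on the positive ATX rails (+12V, +5V, +3.3V).
TOLERANCE = 0.05


def rail_limits(nominal, tolerance=TOLERANCE):
    """Return the (low, high) voltage window for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def in_spec(nominal, reading, tolerance=TOLERANCE):
    """True if a measured voltage sits inside the allowed window."""
    low, high = rail_limits(nominal, tolerance)
    return low <= reading <= high


def check_readings(readings):
    """Take (nominal, measured) pairs and return (nominal, measured, ok) tuples."""
    return [(nominal, measured, in_spec(nominal, measured))
            for nominal, measured in readings]
```

Usage would look like `check_readings([(12.0, 12.1), (5.0, 4.6)])`, where the second value in each pair is whatever the meter shows under load.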

This has been a problem for me for going on six months now, along with a number of returns and exchanges. I don't really want to dump any more $$ into this, and I am getting sick and tired of it.

Any suggestions appreciated. I have done some more troubleshooting, but I want to let you guys know that heat and components seem to be okay, when tested. Thanks.

Are there any programs out there that could shed some light (diagnostically) on this? Some sort of log that would have an entry milliseconds before shutdown? Windows Event viewer is not helpful in this respect.
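One rough way to get that kind of log is a small heartbeat script that appends a timestamp to a file a couple of times per second and forces each line to disk. This is only a sketch, assuming Python is installed; the file name, interval and function names are my own choices. It cannot say why the power dropped, only roughly when, and drive write caching may still swallow the final line.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="milliseconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp} {note}\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the line to the drive now
    return stamp


def last_heartbeat(path):
    """Return the final line of the log, or None if the log is empty."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return lines[-1] if lines else None


def run(path="heartbeat.log", interval_s=0.5):
    """Keep logging until interrupted with Ctrl+C or until the power drops."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Start `run()` in a console before launching a game; after the next shutdown, `last_heartbeat("heartbeat.log")` gives the last moment the machine was still alive, to within the chosen interval. That timestamp can then be lined up against temperature or voltage logs from a monitoring tool.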
 

mechBgon

Super Moderator, Elite Member
Oct 31, 1999
30,699
1
0
What brand & model of power supply is it?

edit: while I'm asking, let me ask the usual sto0pid thing... is your EZ Plug hooked up? Also, which PCI slot is your sound card in, and what voltage are you giving the RAM?
 

pol II

Hi, the PSU is an Enermax Noisetaker P series EG701P-VE SFMA Rev. 2.0 600w. Dual 12v rails (18a 12v1; 17a 12v2). The EZ plug is hooked up, and the sound card is in the lowest PCI slot (according to the mobo manual, it should not share with anything else). The RAM is running on auto-detect, so I am not sure of the value. I wanted to keep things as close to BIOS default as possible, with the exception of the HTT. The comp will crash whether HTT is at 200 or 201, unfortunately. I have run the RAM at SPD (2-2-2-8-1T) at 2.8v without incident, except for the crashing. I have just been letting the BIOS decide what to do with the memory this past month(s) as I troubleshoot.

I have read some old rumblings about the first consumer dual 12v-rail Enermax PSUs having "Issues" with the A8N-SLi deluxe, but all seems well with only one video card in.
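To put the rail ratings in perspective, here is a quick sketch that turns the amp figures above into watts (P = V x I) and subtracts an estimated load per rail. The rail names and amp ratings come from this post; the load numbers in the usage note are made-up placeholders, since I do not know how the connectors are split between the rails or what the cards really draw. Many dual-rail units also list a combined 12v limit that is lower than the sum of both rails, so the PSU label is worth checking too.

```python
# Amp ratings of the two 12v rails, as quoted in the post above.
RAILS_12V = {"12V1": 18.0, "12V2": 17.0}


def rail_watts(volts, amps):
    """Power available on a rail: P = V * I."""
    return volts * amps


def capacity(rails, volts=12.0):
    """Return the wattage each rail can deliver on its own."""
    return {name: rail_watts(volts, amps) for name, amps in rails.items()}


def headroom(rails, loads_w, volts=12.0):
    """Subtract estimated loads (watts per rail, hypothetical) from capacity.

    A negative result means the estimate exceeds that rail's rating.
    """
    cap = capacity(rails, volts)
    return {name: cap[name] - loads_w.get(name, 0.0) for name in cap}
```

`capacity(RAILS_12V)` works out to 216 W on 12v1 and 204 W on 12v2. Calling `headroom(RAILS_12V, {"12V1": 100.0, "12V2": 250.0})` with those placeholder loads would show 12v2 over its rating by 46 W, which is the sort of imbalance that could trip over-current protection with two cards while staying fine with one.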
 

mechBgon

The first reasons I can think of that a system would completely shut down as if the plug had been pulled out of the wall:

1) the plug WAS pulled out of the wall :D

2) I put my foot on the On/Off rocker switch on my surge suppressor :eek:

3) My uninterruptible power supply couldn't handle the load when the utility power took a dip and forced the UPS to take over, so the UPS kicked out

4) The PSU overheated and kicked off. Is your Noisetaker's rear fan cranked up to maximum? If not, that would be easy to try as a fact-finding step. You got some serious 12V demand going on there, maybe it needs more cooling.

5) The CPU overheated, with help from two high-end video cards heating the air around it, and told the mobo to shut down. You could always try cranking up the Papst to full voltage as a fact-finding step, if you haven't already.


Sorry for the silly long-shot ideas, but I imagine you're willing to entertain any suggestions at this point :eek: Also, I would go right ahead and keep the memory at 2.8 volts, that's what Crucial specs it for if I recall correctly. But I don't see that causing complete power-offs.
 

pol II

All your suggestions are well-received, thanks :) I have thought about PSU overheating, and the fan is cranked to the max. However, the "max" is meant to be silent, not necessarily to move tons of air; this ain't no Delta. I tried blowing the dust out of it last week, and I'll do it again. I think I'll also open it again and spray the innards with some compressed air if things look dirty. I was focusing a bit on software in my own mind (SLi driver immaturity, even though it has been out ~1 year), but the idea that the PSU could be suspect makes absolute sense. At this point I am leaning towards selling one card; I may lean that way even more if cleaning the PSU has no effect. One option is to get a well-cooled PSU with a decent warranty, e.g., PCP&C. But that is ~$220; couple that with the possibility of selling one of these GTX's at $350 or so, and I am looking at a $570 differential...

To address your numbered suggestions:

1. The plug was not pulled out of the wall; I have checked many times because I am the kind of guy that would indeed forget to check something like that if I did not make a list :p

2. Same with the surge suppressor; obvious suggestion, but I am sure it happens to many folks at one time or another (it has to me before).

3. No UPS now; I returned the one I had in order to save up for a better model.

4. Addressed above, I like this suggestion; definitely a good one to try for myself and anyone in the future who finds this thread on a search.

5. Have done so already, and have watched the comp shut down with load temps ~44 C (running prime95 small fft and rthdribl).

Thanks for the tips, I'll update soon when I clean the PSU and give it some time to test...
 

mechBgon

If it were me, I'd also consider taking the mobo out of the case, laying it on cardboard, threatening it a bit, and running it isolated from the case. That's an even less likely cure, but what the heck, you could leave it that way overnight and see what happens (besides the neighbors wondering why their TV reception went wild ;)).
 

pol II

Well, it appears that this problem may be RAM-related, in some way or another. The system is running on a test bench, outside the case with just the CPU, one stick of RAM, a single video card, and DVD-ROM. All non-essential controllers, ports, etc have been disabled in the BIOS. As of late, the computer has decided to refuse to soft boot. I can tell it to reboot through windows, or I can enter the BIOS and exit; in either case, the computer will not turn back on *most* of the time. It does like to tease me though. The HDD, fans, etc will spin up, but no video will appear and I cannot access a thing unless I cut power completely by turning off the PSU. I ended up switching sticks of RAM, and it rebooted six times in a row; unfortunately, after that, it will boot only 1 time out of ten or so.

When it does boot, I have been testing it. Either stick of RAM passes memtest86 for > 9 hours at either SPD timings (2-2-2-8-1T) or at board-given timings (2.5-3-3-8-2T). The CPU/mem will pass prime95 >15hrs at mobo-assigned frequency and voltages (2.41GHz, 1.525v) or at factory specs (2.4GHz @ 1.5v). It will pass prime95 in either blended mode (some RAM tested) or small FFTs. No overheating anywhere (~50 C CPU load).

This is the most ornery, obstinate, and bull-headed thing I have ever yelled at!

So, forgetting all about crashes during games at this point, I simply want to be able to reboot without having to cycle the power. I have since reflashed the BIOS (floppy method, not Windows method), reset the CMOS via jumper, etc. I wonder if this nascent booting problem is just the latest in a series of symptoms of something more basic. What this "something more basic" could be, I don't know, other than a few things:

1. CPU memory controller - should not pass memtest or prime95 if this is bad, but I would like to know if there is some other way to test whether this is the problem

2. Memory itself - although on ASUS's QVL list, perhaps there is something I am missing

3. Uh, a FOURTH bad motherboard? ASUS claims that they test their mobos for 3 hours before sending them out for RMA, but my second RMA of this model had no working PCI slots and a bum SATA controller, so it is painfully obvious that at least a few slip through the cracks. I have spoken with ASUS several times over the phone (totaling several hours), and they don't want to lay blame on the motherboard... AFTER I tell them that it would be my fourth one (they suggested an RMA until they found out).

If anyone has any suggestions of what to try, I welcome them, no matter how "simple" they may seem. Also, any ideas on whether there are additional tests out there for the CPU/memory controller would help me determine whether the CPU is, at least in part, at fault. Thanks :)
 

CyraKrin

Senior member
Dec 25, 2003
523
2
0
Take the side off your case and try running it. I've had similar problems, all due to overheating, especially when converting video, and I had to resort to getting a fan and putting it next to my box to keep it cool.
 

pol II

Hello, and thanks for the suggestion. The components are running outside the box on a test bench, with very good airflow. CPU temps max around 50 C, and video around 80 C. Memory is lukewarm to the touch.

I'm on the phone with ASUS lvl 3 tech support now and I'll report back any resolution.

Edit: Left a message (second message left in as many days). I doubt I'll get a callback. Still soliciting suggestions.