Why do I now BSOD in every game, but never during a stress test?

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
I suspect I damaged my APU by overvolting the NB too aggressively in pursuit of a GPU overclock; every game either quits or BSODs on me within a few minutes.

What I don't get is why, if I've actually done hardware damage, my system has no problem running AIDA's system stability test, which taxes everything - including the GPU. Likewise for F@H or BOINC.

Is there something a game does to a GPU that a desktop app pushing the GPU to full clock doesn't do?

Pretty sure I've ruled out software/driver issues as I've gone as far as wiping my drive and doing a fresh install with very little 3rd party software, on top of only Windows Updates.
 

WaTaGuMp

Lifer
May 10, 2001
21,207
2,506
126
I just had a BSOD coming out of sleep mode. I have had 2 so far this week, not sure what caused them. I have had zero lockups gaming, but plenty of "game stopped responding" errors. These have been mostly in Black Ops 2 and BF3. No errors in memtest, no errors running Prime, no errors or lockups when I ran Kombustor. Maybe just bad coding for the games, I dunno. Running 13.3 beta drivers also. Heck, I even had the "driver stopped responding and has recovered" message while scrolling here.
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
What happens when you run heaven?

I let it run for a good 15 minutes. Headache inducing frame rate from an APU, but it was perfectly stable for a good 15 minutes until I stopped it manually. Ultra quality, Extreme tessellation and 8x AA too.

And that's what's weird, because I'm hugely overclocked at the moment with regard to the northbridge; 2200MHz clock, up from 1800, 2133 RAM in lieu of the 'supported' 1866, and a GPU clock at 1169MHz from 800MHz.
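To put those numbers in perspective, here is a minimal sketch that turns the stock and overclocked figures quoted above into percentages. The dictionary keys and the function name are just illustrative; only the clock values come from the post.

```python
# Stock vs. overclocked figures quoted in the post
# (MHz for the NB and GPU clocks, DDR3 speed rating for the RAM).
CLOCKS = {
    "northbridge": (1800, 2200),
    "ram": (1866, 2133),
    "gpu": (800, 1169),
}


def overclock_percent(stock, overclocked):
    """Return how far above stock a setting is, in percent."""
    return (overclocked - stock) / stock * 100.0


for name, (stock, oc) in CLOCKS.items():
    print(f"{name}: {stock} -> {oc} (+{overclock_percent(stock, oc):.1f}%)")
```

The GPU clock is by far the biggest jump of the three, at roughly 46% over stock.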

Stepping back down doesn't seem to make any difference in stability either, as I've already tried it.

I just had a BSOD coming out of sleep mode. I have had 2 so far this week, not sure what caused them. I have had zero lockups gaming, but plenty of "game stopped responding" errors. These have been mostly in Black Ops 2 and BF3. No errors in memtest, no errors running Prime, no errors or lockups when I ran Kombustor. Maybe just bad coding for the games, I dunno. Running 13.3 beta drivers also. Heck, I even had the "driver stopped responding and has recovered" message while scrolling here.

Yeah, sounds similar. Except I haven't yet found a game that will run for more than a few minutes as of this weekend.

I'm 13.3 Beta as well, but I even used AMD's removal tool and then reinstalled all of their chipset drivers and fell back to the 13.1 official release.

No difference.

The few times I've caught a BSOD, it's been about 'atikmpag' or 'atikmdag', I forget which.

Even went through all of this a little while ago; http://windows7themes.net/fix-atikmpag-sys-atikmdag-sys-blue-screen-errors-bsod.html

No help.

Do you have a sound card ?

Nope, just the on board.
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
What about underclocked with a slight voltage bump?

I only ask because you're concerned about possible damage.

Also did you reset manually, or via clear cmos?
 

max347

Platinum Member
Oct 16, 2007
2,335
6
81
What about underclocked with a slight voltage bump?

I only ask because you're concerned about possible damage.

Also did you reset manually, or via clear cmos?

Yeah, I would reset the cmos if I were you and then start testing from there.
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
I'll screw around more with BIOS settings in the morning, but with the settings as they are I can run Heaven, AIDA + Prime (simultaneously) and DC clients until the cows come home.

Yet I just finished reinstalling StarCraft II and immediately upon launching, at a default res of 1024x768 with every detail option in the toilet: freeze, hang, freeze, hang, CRASH.

I just can't understand how I can run apps without issue that stress the hardware more than any game but almost immediately after launching any game, regardless of quality settings, I have to force a restart, BSOD or get an App Has Stopped Responding notification.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Might use different instructions. Remember, most stress tests only cover a handful of instructions.
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
If you feel you've ruled out your hardware, then that leaves drivers.

If you've already gone and done a clean driver install, I'd suggest backing up your data and trying a fresh OS install.
 

bgt

Senior member
Oct 6, 2007
573
3
81
I remember once overclocking my PC (2500K) far out of spec, and it totally ruined the Windows install. After a reinstall... no problems.
 

bgt

Senior member
Oct 6, 2007
573
3
81
Might use different instructions. Remember, most stress tests only cover a handful of instructions.
I've found the same with my 7950. Trying to run 1400/1600 settings, it only crashes in games, not in benches. So now I've found a "best" average setting of 1250/1250. Since I got a better PSU (Seasonic X series 650W) I can do almost anything at the 1300/1500 setting... the only catch is the power draw :eek:
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Most blue-screen crashes are caused by memory errors. The stress tests might work the card out very heavily, but not hit nearly as wide a range of memory addresses as a real game does when it's loading and swapping textures.
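A toy sketch of that idea, not a real memory test: it models memory as a range of numbered cells and compares which cells a small, repetitive working set touches against a workload that walks the whole range. All names, sizes, and the "bad cell" location are made up for illustration.

```python
TOTAL_CELLS = 10_000   # hypothetical size of the memory pool
N_ACCESSES = 100_000   # accesses performed by each workload
BAD_CELL = 7_531       # hypothetical marginal cell that only fails when touched


def tight_loop_workload(working_set, n_accesses):
    """Stress-test stand-in: hammer the same small buffer over and over."""
    return [i % working_set for i in range(n_accesses)]


def scattered_workload(total_cells, n_accesses, stride=7919):
    """Game stand-in: jump around the whole pool.

    The stride is coprime with total_cells, so the walk eventually
    visits every cell instead of cycling through a small subset.
    """
    return [(i * stride) % total_cells for i in range(n_accesses)]


def coverage(accesses, total_cells):
    """Fraction of distinct cells touched at least once."""
    return len(set(accesses)) / total_cells


tight = tight_loop_workload(working_set=256, n_accesses=N_ACCESSES)
scattered = scattered_workload(TOTAL_CELLS, N_ACCESSES)

print("tight loop coverage:", coverage(tight, TOTAL_CELLS))
print("scattered coverage: ", coverage(scattered, TOTAL_CELLS))
print("tight loop hits bad cell:", BAD_CELL in set(tight))
print("scattered hits bad cell: ", BAD_CELL in set(scattered))
```

Both workloads do the same number of accesses, so "load" looks identical, but only the scattered one ever reaches the bad cell. Real hardware is obviously far messier than this; the point is only that heavy load and wide address coverage are different things.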
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Centauri

Blue screen of death = software saying there's an error.

That can be hardware, but it can also be software.

You could attempt a reinstall and see if your system is still unstable.
If the issue is still there after you downclock and give it a tiny overvolt,
maybe you corrupted some system files that affect Windows stability.
That's why you get a BSOD when you game, and not when you stress test.


When you're gaming, you're using a lot of DirectX and gods know whatever other files from Windows.
While you're stress testing, less so.

So it might be a software issue, and not a hardware one.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Benchmarking is normally GPU heavy, but because the benchmark is on rails there is normally very little going on to update the game world, and hence the CPU is not being stressed. But when you run a game the CPU is being stressed quite a bit, and that's when instability in the CPU and its memory subsystem starts to show.

One thing to always be aware of is that the maximum CPU overclock that is stable in CPU-only benchmarks is normally higher than what stays stable once the GPU is loaded as well. The graphics pipeline stresses different parts of the CPU's instruction set, and you often need to come back from the edge to get the whole machine stable. Likewise, the GPU overclock you can reach on its own is often higher than what holds up alongside a CPU overclock.
 

Rvenger

Elite Member, Super Moderator, Video Cards
Apr 6, 2004
6,283
5
81
1.5v on the northbridge shouldn't hurt anything if there's adequate cooling. I wouldn't relate that to degradation just yet.
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
Get OCCT

http://www.ocbase.com/

And FIRST test your CPU & mem (run the medium and large OCCT tests). I suspect it's a more complicated issue (NB data transfer) rather than your card itself. As I understand it, the medium/large OCCT tests can give you some idea whether it's something with the NB.

Don't focus just on the card, the fact that Heaven etc runs alright makes me suspicious. I also assume you already tested all standard components of your system, memory etc.?
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
Centauri

Blue screen of death = software saying there's an error.

That can be hardware, but it can also be software.

You could attempt a reinstall and see if your system is still unstable.
If the issue is still there after you downclock and give it a tiny overvolt,
maybe you corrupted some system files that affect Windows stability.
That's why you get a BSOD when you game, and not when you stress test.


When you're gaming, you're using a lot of DirectX and gods know whatever other files from Windows.
While you're stress testing, less so.

So it might be a software issue, and not a hardware one.

agree!

Open CMD as administrator and run:
sfc /scannow

That does a system file check.
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
Thanks for the suggestions everybody. Though as I mentioned in the OP, I've already wiped my hard drive and done a clean install.

And here's a crash dump analysis from when the system crashed as I installed software updates overnight;

--------------------------------------------------------------------------------
Crash Dump Analysis
--------------------------------------------------------------------------------

Crash dump directory: C:\Windows\Minidump

Crash dumps are enabled on your computer.

On Fri 3/29/2013 5:30:55 AM GMT your computer crashed
crash dump file: C:\Windows\Minidump\032813-10920-01.dmp
This was probably caused by the following module: atikmpag.sys (atikmpag+0x98A4)
Bugcheck code: 0x116 (0xFFFFFA80073684E0, 0xFFFFF880043568A4, 0x0, 0x2)
Error: VIDEO_TDR_ERROR
file path: C:\Windows\system32\drivers\atikmpag.sys
product: AMD driver
company: Advanced Micro Devices, Inc.
description: AMD multi-vendor Miniport Driver
Bug check description: This indicates that an attempt to reset the display driver and recover from a timeout failed.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: atikmpag.sys (AMD multi-vendor Miniport Driver, Advanced Micro Devices, Inc.).
Google query: Advanced Micro Devices, Inc. VIDEO_TDR_ERROR



On Fri 3/29/2013 5:30:55 AM GMT your computer crashed
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: atikmpag.sys (0xFFFFF880043568A4)
Bugcheck code: 0x116 (0xFFFFFA80073684E0, 0xFFFFF880043568A4, 0x0, 0x2)
Error: VIDEO_TDR_ERROR
file path: C:\Windows\system32\drivers\atikmpag.sys
product: AMD driver
company: Advanced Micro Devices, Inc.
description: AMD multi-vendor Miniport Driver
Bug check description: This indicates that an attempt to reset the display driver and recover from a timeout failed.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: atikmpag.sys (AMD multi-vendor Miniport Driver, Advanced Micro Devices, Inc.).
Google query: Advanced Micro Devices, Inc. VIDEO_TDR_ERROR

And already tried this to resolve it. Any other suggestions?
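For anyone reading the dump, here is a rough sketch of pulling the numbers out of the report text above. The parameter labels follow my understanding of Microsoft's description of bug check 0x116 (VIDEO_TDR_ERROR); the function names are made up, and treating "second parameter minus the reported offset" as the driver's load address is an inference from this report, not something the tool states.

```python
import re

# Lines copied from the crash report above.
BUGCHECK_LINE = "Bugcheck code: 0x116 (0xFFFFFA80073684E0, 0xFFFFF880043568A4, 0x0, 0x2)"
MODULE_LINE = "This was probably caused by the following module: atikmpag.sys (atikmpag+0x98A4)"

# Labels as I understand the 0x116 documentation (treat as an assumption).
PARAM_LABELS = [
    "TDR recovery context pointer",
    "pointer into the responsible driver module",
    "error code of the last failed operation",
    "internal context-dependent data",
]


def parse_bugcheck(line):
    """Return (code, [p1, p2, p3, p4]) as integers."""
    match = re.search(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("not a bugcheck line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def parse_module_offset(line):
    """Return (module_name, offset) from a 'module+0xOFFSET' reference."""
    match = re.search(r"\((\w+)\+(0x[0-9A-Fa-f]+)\)", line)
    if match is None:
        raise ValueError("no module+offset reference found")
    return match.group(1), int(match.group(2), 16)


code, params = parse_bugcheck(BUGCHECK_LINE)
module, offset = parse_module_offset(MODULE_LINE)

print(f"bug check 0x{code:X}")
for label, value in zip(PARAM_LABELS, params):
    print(f"  {label}: 0x{value:X}")

# Inferred load address of the driver: faulting address minus the offset.
module_base = params[1] - offset
print(f"{module} appears to be loaded at 0x{module_base:X}")
```

Both dump entries point at the same address inside atikmpag.sys, so they look like two records of one crash rather than two separate failures.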
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
Also, before letting the system run last night I lifted the voltage on the CPU from its undervolt to 1.325 and lowered the GPU clock by 10% without touching the voltage. Still crashed while simply downloading stuff^^^
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
Things just got real. After another crash induced reboot, not even a BSOD, I decided to just load all BIOS defaults and see where that got me.

Instead of the Windows desktop, I got a black screen covered in artifacts and an immobile cursor.