How to test stability of the GPU overclock?

Timmah!

Golden Member
Jul 24, 2010
1,565
914
136
Well, yesterday, for the first time since I put my computer together last August, I got a BSOD. It happened while playing Black Ops multiplayer, more precisely after maybe an hour and a half, during the loading of the next map.
Obviously there might be several reasons for that, and it has happened only ONCE so far, but I suspect it's either the CPU or the GPU, as both are overclocked, or else it's down to an insufficient PSU.
Let's start with the CPU. It's a 980X @ 3913 MHz with HTT and all power features on, at stock vcore (set to Normal, which reads as 1.25625 V in the BIOS; CPU-Z shows 1.184 V at load). I have had these clocks only since last week; for the previous half a year I ran it at 3.78 GHz (one speed bin lower) at the same settings with no issues at all. So it's definitely a possibility, although I tested it with 5 runs of IntelBurnTest (IBT) at the maximum stress setting (using all 12 GB of RAM) and it passed, reporting it as stable.
Then the GPU. It's a GTX 590 and I have had it for only 2 weeks. I overclocked it to 670/1340/1800 MHz. I did not test it with any benchmark, but it rendered for almost 3 hours in Octane Render, which loads both cores to the max, and again it was stable.
Lastly the PSU. I have a Seasonic S12D 750W. This might be a bit on the edge with an overclocked 980X and an overclocked GTX 590, but it's a good PSU, both overclocks are only mild, AND most importantly I played Black Ops on only ONE of the GPU cores (multi-GPU was off). I guess a 750 W PSU should be enough for such a CPU + basically a GTX 570/580 + 2 SSDs + 1 WD Green disk, right?
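
The "is 750 W enough" question can be sanity-checked with a rough power budget. The sketch below is only an estimate under stated assumptions: the CPU and GPU figures are the vendors' nominal stock ratings, while the drive wattages, the lump sum for board/RAM/fans, and the 15% overclock margin are guesses made up for illustration, not measurements of this system.

```python
# Rough power-budget sketch. Every wattage here is a nominal or assumed
# figure, not a measurement of the system discussed in this thread.
NOMINAL_WATTS = {
    "cpu_i7_980x": 130,     # vendor TDP at stock clocks
    "gpu_gtx590": 365,      # vendor board power at stock, both GPU cores loaded
    "ssd": 3,               # assumed draw per SSD
    "hdd_wd_green": 6,      # assumed draw for one WD Green disk
    "board_ram_fans": 60,   # assumed lump sum for everything else
}


def estimated_load(oc_margin=0.15):
    """Sum the nominal draws, padding CPU and GPU by an assumed OC margin."""
    cpu = NOMINAL_WATTS["cpu_i7_980x"] * (1 + oc_margin)
    gpu = NOMINAL_WATTS["gpu_gtx590"] * (1 + oc_margin)
    rest = (2 * NOMINAL_WATTS["ssd"]
            + NOMINAL_WATTS["hdd_wd_green"]
            + NOMINAL_WATTS["board_ram_fans"])
    return cpu + gpu + rest


def headroom(psu_watts, load_watts):
    """Watts left over on the PSU at the estimated load."""
    return psu_watts - load_watts


if __name__ == "__main__":
    load = estimated_load()
    print(f"estimated load: {load:.0f} W")
    print(f"headroom on 750 W: {headroom(750, load):.0f} W")
```

With these assumed figures the estimate lands at roughly 640 W with both GPU cores loaded, so a game running on a single GPU core should sit well below that. A wall-socket power meter would give a far better answer than any such estimate.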

So my question is: from your experience, what is the most likely cause of the BSOD? Is there any way to find this out "immediately"? As the CPU passed IBT, I reckon it's more probably the GPU, but should I run IBT for longer? Maybe 10 runs instead of 5, or even more, to be 100 percent sure?
What about the GPU stress tests, Kombustor/FurMark, how do they work? Is the computer guaranteed to BSOD while running them if the GPU overclock is the problem? If not, how will I find out? Will the app tell me it's stable, in a similar manner to IBT, or what will happen? How long should I run them to be sure?
For now I have clocked the GPU back to the factory clocks and plan to play the game like this for a few days (in fact I already did for an hour, no problem). If there are no more BSODs, I suppose it would be fair to say the culprit was indeed the GPU. Still, I would be interested in some faster, less "trial and error" method of testing.
Final question: is there any way to create overclocking profiles for the GPU? I mean that it would automatically overclock to 670 MHz only when I run Octane and clock back down to the factory settings when I shut it down.


Thanks for your answers...
 

Timmah!

Oh, thanks for the suggestion, finally somebody could be arsed to respond... :p
So OCCT then. This is basically my issue: I know there is FurMark and Kombustor and ATITool and the EVGA OC utility and 3DMark 11, but I have no idea which one is the best to use. I read several sites, and every other one says this one is bad, that one is sh!t...
Not to mention I have no idea how I would know whether it's stable, unless it BSODs again. I suppose there is some kind of error checking built into OCCT to tell me it's not stable, right? How long do I need to run it?
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
The issue with most GPU stress programs is that they don't error check; you have to notice any visual artifacts yourself, even if it's only a pixel or two, which is near impossible. OCCT error checks for you and reports the errors. This is why tons of people think they have an awesome overclock but don't realize they are really getting artifacts and are not 100% stable. An unstable system doesn't always BSOD.

You just need to check the error checking box in OCCT and it will count the errors on screen while running the test. 10-30 min will be fine for a test length.
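
To make the idea of automatic error checking concrete: instead of a person watching for a stray pixel, a program compares the output against a known-good result and counts the mismatches. The sketch below illustrates only that idea. It is not OCCT's code, and the function name and the tiny sample frames are made up for the example.

```python
def count_pixel_errors(reference, rendered):
    """Count pixels that differ between two equally sized frames.

    Frames are lists of rows; each row is a list of pixel values.
    """
    if len(reference) != len(rendered):
        raise ValueError("frames must have the same number of rows")
    errors = 0
    for ref_row, out_row in zip(reference, rendered):
        if len(ref_row) != len(out_row):
            raise ValueError("rows must have the same length")
        # A single wrong value is invisible to the eye but trivial to count.
        errors += sum(1 for a, b in zip(ref_row, out_row) if a != b)
    return errors


if __name__ == "__main__":
    reference = [[0, 0, 0], [255, 255, 255]]
    glitched = [[0, 0, 0], [255, 254, 255]]  # one pixel off by one
    print(count_pixel_errors(reference, glitched))
```

A one-value difference like the one in `glitched` would never be spotted on screen, which is why a counter that stays at zero is stronger evidence than "it looked fine".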
 

Timmah!

Thank you.
One more thing: when the BSOD occurred, it was on screen only for a second before the machine rebooted, so I was not able to read what it said. Is there any way to find that out now? I mean, does Windows dump these messages somewhere for the user to inspect or send to Microsoft for error checking?

EDIT: Oh, never mind, I finally managed to google it.
 

Timmah!

Ok, so I ran OCCT for 30 min with multi-GPU off, as it was off when the BSOD occurred. The settings were the obligatory 1920x1200 at 60 Hz and shader complexity at the default of 0. The max temp on the loaded core was 85 degrees C, the framerate was about 224, and no errors were found. BTW the image looked static, nothing rotating like the furry ball in FurMark. This is normal, right?

Anyway, I was looking into the Windows Event Log and found data regarding that BSOD. It says:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
<EventID>41</EventID>
<Version>2</Version>
<Level>1</Level>
<Task>63</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000002</Keywords>
<TimeCreated SystemTime="2011-05-16T20:17:47.724410900Z" />
<EventRecordID>91029</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="8" />
<Channel>System</Channel>
<Computer>Machine</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="BugcheckCode">59</Data>
<Data Name="BugcheckParameter1">0xc0000005</Data>
<Data Name="BugcheckParameter2">0xfffff800036d58b1</Data>
<Data Name="BugcheckParameter3">0xfffff8800f5311e0</Data>
<Data Name="BugcheckParameter4">0x0</Data>
<Data Name="SleepInProgress">false</Data>
<Data Name="PowerButtonTimestamp">0</Data>
</EventData>
</Event>

Any experience with this?
I googled the bugcheck code 59 part a bit, and apparently it's this:
Bug Check 0x3B: SYSTEM_SERVICE_EXCEPTION
It seems it could be related to a driver failure, but could it be the result of a failed overclock as well?
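
The event stores the bugcheck code in decimal (59), while the bug check reference lists it in hex (0x3B). A minimal Python sketch of that lookup step, assuming the event XML has been copied out of Event Viewer as text; the function name and the trimmed `SAMPLE` string are illustrative, not part of any Windows tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (taken from the event itself).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def bugcheck_from_event(xml_text):
    """Return (bugcheck code as hex string, list of the four parameters)."""
    root = ET.fromstring(xml_text)
    data = {d.get("Name"): d.text
            for d in root.findall("e:EventData/e:Data", NS)}
    code = int(data["BugcheckCode"])  # stored as a decimal number
    params = [data[f"BugcheckParameter{i}"] for i in range(1, 5)]
    return f"0x{code:02X}", params


# Trimmed copy of the event, keeping only the EventData part.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<EventData>
<Data Name="BugcheckCode">59</Data>
<Data Name="BugcheckParameter1">0xc0000005</Data>
<Data Name="BugcheckParameter2">0xfffff800036d58b1</Data>
<Data Name="BugcheckParameter3">0xfffff8800f5311e0</Data>
<Data Name="BugcheckParameter4">0x0</Data>
</EventData>
</Event>"""

if __name__ == "__main__":
    code, params = bugcheck_from_event(SAMPLE)
    print(code, params)
```

The first parameter, 0xc0000005, is the access-violation status code; the bug check itself is the 0x3B value.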
 

betasub

Platinum Member
Mar 22, 2006
2,677
0
0
Quite probably: the overclock may (under a specific set of conditions) cause the driver to crash. Normally with Windows 7 these crashes are caught by the OS and the driver is reloaded. Did I miss you mentioning your OS and driver version?
 

Timmah!

I run Win7 Pro 64-bit with all the recent updates and Nvidia 270.61 display drivers.
I just finished a 2 1/4 hour long Black Ops gaming session. I played that long only to see if the BSOD would happen again, and it finally did. The funny thing is, I already wanted to shut the game down, but decided to let one last map load, as it crashed during loading yesterday... and so it did this time.
 

Blitz KriegeR

Senior member
Jan 30, 2005
261
0
0
5 passes certainly is not enough in IBT. You don't need to run it with all 12GB of RAM, however. Test with 1 or 2 GB (like most apps will use) but do at least 20-25 passes. This will keep the memory system in the loop but not max it out. We're trying to isolate the CPU, so running with all 12GB of RAM in use just makes the test take forever. On my system a pass takes about 20 sec for the 1GB test or about 60 sec for the 2GB test.

As far as the GPU goes, my best experiences were with EVGA Precision as the tuning/monitoring app and OCCT for the initial error check (~30 min), and then actual gaming for 100% verification (2+ hrs).
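
For planning purposes, the run time implied by the per-pass figures above is simple arithmetic. The helper below is just that arithmetic; the function name is made up, and the per-pass times are the ones quoted for one particular system, so they will differ elsewhere.

```python
def ibt_run_minutes(passes, seconds_per_pass):
    """Total wall-clock time in minutes for a given number of IBT passes."""
    return passes * seconds_per_pass / 60


if __name__ == "__main__":
    # Per-pass times as quoted above: ~20 s at 1 GB, ~60 s at 2 GB.
    for label, secs in (("1 GB", 20), ("2 GB", 60)):
        print(f"25 passes at {label}: {ibt_run_minutes(25, secs):.1f} min")
```

Even the longer 2 GB setting keeps a 25-pass run under half an hour on that system, which is the point of not loading all 12 GB.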
 

Timmah!

Thanks for the advice, I will try it this way tomorrow.

BTW I debugged the recent minidump and it points to ntkrnlmp.exe as the possible source of the problem.

Here is the whole analysis:


Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\Minidump\051711-14227-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: srv*
Executable search path is:
Windows 7 Kernel Version 7600 MP (12 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7600.16792.amd64fre.win7_gdr.110408-1633
Machine Name:
Kernel base = 0xfffff800`0361c000 PsLoadedModuleList = 0xfffff800`03859e50
Debug session time: Tue May 17 22:34:05.232 2011 (UTC + 2:00)
System Uptime: 0 days 4:45:16.760
Loading Kernel Symbols
...............................................................
................................................................
.................................
Loading User Symbols
Loading unloaded module list
..................................
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 3B, {c0000005, fffff8000368b8b1, fffff8800c0d01e0, 0}

Probably caused by : ntkrnlmp.exe ( nt!KiSystemServiceGdiTebAccess+7a )

Followup: MachineOwner
---------

5: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff8000368b8b1, Address of the instruction which caused the bugcheck
Arg3: fffff8800c0d01e0, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

Debugging Details:
------------------


EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP:
nt!KiSystemServiceGdiTebAccess+7a
fffff800`0368b8b1 480f433547872300 cmovae rsi,qword ptr [nt!MmUserProbeAddress (fffff800`038c4000)]

CONTEXT: fffff8800c0d01e0 -- (.cxr 0xfffff8800c0d01e0)
rax=0000000000000020 rbx=fffffa800a3cdb60 rcx=00000000000002b4
rdx=0000000000010000 rsi=00000000012df8c8 rdi=fffff8800c0d0bc8
rip=fffff8000368b8b1 rsp=fffff8800c0d0bb0 rbp=fffff8800c0d0ca0
r8=0000000001b41d90 r9=0000000000000000 r10=fffff8000398ca30
r11=00000000002ff730 r12=00000000004ed6a0 r13=000007fefeda0000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe cy
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010283
nt!KiSystemServiceGdiTebAccess+0x7a:
fffff800`0368b8b1 480f433547872300 cmovae rsi,qword ptr [nt!MmUserProbeAddress (fffff800`038c4000)] ds:002b:fffff800`038c4000=????????????????
Resetting default scope

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

BUGCHECK_STR: 0x3B

PROCESS_NAME: svchost.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 0000000076d7ff0a to fffff8000368b8b1

STACK_TEXT:
fffff880`0c0d0bb0 00000000`76d7ff0a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceGdiTebAccess+0x7a
00000000`012df8a8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76d7ff0a


FOLLOWUP_IP:
nt!KiSystemServiceGdiTebAccess+7a
fffff800`0368b8b1 480f433547872300 cmovae rsi,qword ptr [nt!MmUserProbeAddress (fffff800`038c4000)]

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: nt!KiSystemServiceGdiTebAccess+7a

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

IMAGE_NAME: ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 4d9fdd34

STACK_COMMAND: .cxr 0xfffff8800c0d01e0 ; kb

FAILURE_BUCKET_ID: X64_0x3B_nt!KiSystemServiceGdiTebAccess+7a

BUCKET_ID: X64_0x3B_nt!KiSystemServiceGdiTebAccess+7a

Followup: MachineOwner
---------

5: kd> lmvm nt
start end module name
fffff800`0361c000 fffff800`03bf8000 nt (pdb symbols) C:\Program Files\Debugging Tools for Windows (x64)\sym\ntkrnlmp.pdb\DE7B3DD8AC5343B3B4874BAB3F4599DD2\ntkrnlmp.pdb
Loaded symbol image file: ntkrnlmp.exe
Mapped memory image file: C:\Program Files\Debugging Tools for Windows (x64)\sym\ntoskrnl.exe\4D9FDD345dc000\ntoskrnl.exe
Image path: ntkrnlmp.exe
Image name: ntkrnlmp.exe
Timestamp: Sat Apr 09 06:14:44 2011 (4D9FDD34)
CheckSum: 00547734
ImageSize: 005DC000
File version: 6.1.7600.16792
Product version: 6.1.7600.16792
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 1.0 App
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft® Windows® Operating System
InternalName: ntkrnlmp.exe
OriginalFilename: ntkrnlmp.exe
ProductVersion: 6.1.7600.16792
FileVersion: 6.1.7600.16792 (win7_gdr.110408-1633)
FileDescription: NT Kernel & System
LegalCopyright: © Microsoft Corporation. All rights reserved.


Any ideas?
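
When comparing several dumps, the two summary lines of the debugger output above (the `BugCheck` line and the `Probably caused by` line) are the ones worth collecting. A small Python sketch that pulls them out of saved output text; the regular expressions are written against the exact line shapes shown above and are an assumption about that format, not a documented interface.

```python
import re

# Patterns matched against the summary lines as they appear in the log above.
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
CAUSE_RE = re.compile(r"Probably caused by\s*:\s*(\S+)")


def summarize(log_text):
    """Extract bugcheck code, arguments, and blamed module from debugger text."""
    match = BUGCHECK_RE.search(log_text)
    if match is None:
        return None
    cause = CAUSE_RE.search(log_text)
    return {
        "code": int(match.group(1), 16),
        "args": [int(a.strip(), 16) for a in match.group(2).split(",")],
        "module": cause.group(1) if cause else None,
    }


SAMPLE = """BugCheck 3B, {c0000005, fffff8000368b8b1, fffff8800c0d01e0, 0}

Probably caused by : ntkrnlmp.exe ( nt!KiSystemServiceGdiTebAccess+7a )
"""

if __name__ == "__main__":
    info = summarize(SAMPLE)
    print(hex(info["code"]), hex(info["args"][0]), info["module"])
```

A blamed module of ntkrnlmp.exe is the kernel itself, which by itself says little about the root cause.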
 

Blitz KriegeR

To be honest, if it was an overclock that caused the crash, it doesn't matter at all what file/app/driver crashed. The only file I'd even look at is nvlddmkm or any other nvXxxX file; that's a graphics driver file, and the crash could then be induced by either a CPU or a GPU OC.

In terms of BSODs from overclocking, the error code is actually more useful. It should be located in the middle of the screen, offset to the left. I used to know the top 3, but the biggest one that comes to mind is 0x0000000001, which is 95% a CPU instability.
 

sticks435

Senior member
Jun 30, 2008
757
0
0
From his debugging above it's: 0xc0000005
 

happy medium

Lifer
Jun 8, 2003
14,387
480
126
I agree, most times the comp reboots I have found the issue to be the PSU. As happy said, lower the OC to stock and run it for a week and see if you get any issues.

Sorry for the edit, but yes this is my second choice. :thumbsup:
 

DirkGently1

Senior member
Mar 31, 2011
904
0
0
If you really want something that will cause an unstable OC to fail try looping the Unigine Heaven Demo. If your card is stable doing that it is good to go. A lot of people found that overclocks that were 'stress test' stable with Furmark, et al, would fail in Crysis 2 for instance. Heaven is currently the best indicator we have.
 

Timmah!

Yeah, thank you all again. Now I just have to decide where to start, with the GPU or the CPU. I reckon I'll try those 25 passes of IBT at 3.9 GHz, and if it ends up stable, I'll turn my attention to the GPU and proceed with Unigine and the EVGA Precision app.
Anyway, as the debugging tool did not directly blame any Nvidia file, I am now leaning more towards it being a CPU stability issue, given that I now know 5 passes of IBT are not enough, and given that the game fails while only one GPU core is in use. Octane Render will keep working for hours with both cores loaded (and the CPU basically idling in that period), which makes it difficult to believe it's a GPU failure. To be fair, it was a pleasant surprise that the CPU passed IBT at 3.9 GHz and stock volts at all; beforehand I expected that to be the point at which more juice would start to be required. So maybe that guess was right, but of course I will be wiser when I test it later today.
The only thing that irks me is the possibility of an insufficient PSU, because that could be solved only by replacing it with a stronger PSU or by running everything at stock speeds, and obviously I do not want to do either :p I saw that OCCT has some kind of PSU test. How does it work?
 

Rifter

The OCCT power supply test basically tries to make the system draw as much power as possible by running the CPU and GPU tests at the same time to maximize usage of both. It's fairly effective.

I would run a battery of tests to make sure the CPU is stable if you suspect it's not. Don't just stick to IBT; run Prime95 as well. I usually run Prime95 for 24-48 hours, and if it doesn't crash, it's usually stable.

Don't rule out the PSU though.
 

Timmah!

Ok, I will have to make a list of all the tests I have to do :-D

Anyway, here is one more piece of info. I sent my minidump and system status to the Win7 forums, and the guy who checked it wrote back:

Hi Timmaigh and welcome

This one was a memory exception Related to Livemouclass.sys NT Caps-lock Ctrl Swapper from Systems Internals.

I would remove it to test.



So I might start with this after all. Have you encountered this Livemouclass.sys error before?
 

Rifter

I have not, but I spend 90% of my computer time in Linux and only boot Windows to game. You would probably get more info on this specific error either in our operating system forum section or on a Windows forum.
 

Timmah!

Ok, so I followed the advice regarding IBT and let it run 25 times with 2 GB of RAM... and on the 5th run it failed! I clocked the CPU back down to the 3.78 GHz I ran without any BSODs until last week, ran those 25 runs again, and it passed successfully.
So I believe I found the culprit. I still have to play the game for at least 2 hours to be sure, as it might still be GPU or PSU related, as said before. But hopefully it will be fine.

Just a question: for now I clocked the CPU back, because I do not want to increase the voltage, but if I decided I wanted to run at 3.9 GHz after all :p, what should I do?
I mean, I would want to OC it in the same manner, via the multiplier, and keep Turbo and the power-saving features. I do not want to play around with RAM and Vtt and PLLs and stuff.
There are 3 possibilities in the BIOS (I have a Gigabyte X58A-UD7 rev 1.0):
AUTO... which is obviously irrelevant
NORMAL... which I run now. I suppose it won't let my Vcore go past the max default value, no matter the clocks, so that would be the reason why my OC failed, right? But it will go down when idle...
Then there is the DVID thing, a Vcore offset
and finally numerical values...

I wonder now which one to use. If I use a numerical value, for example 1.28 V (with Normal set, it reads 1.25625 V, so I suppose that is the default Vcore), will it still decrease when idle, like with the Normal setting? I intend to keep C1E and SpeedStep on.
Or do I need to use the DVID offset on top of Normal, which would just raise the voltage by the offset value when needed (at load) and otherwise act as before, dropping at idle etc.? For example, setting it to 0.025 V would mean 1.25625 + 0.025 = 1.28125 V. Do I need to touch LLC when using the offset?

Am I missing something?
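
The offset example in the post above is plain addition, and it can be checked exactly with decimal arithmetic. The sketch only redoes that sum. It does not model how the board actually applies DVID at load or idle, and the function name is made up for illustration.

```python
from decimal import Decimal


def vcore_with_offset(base_volts, offset_volts):
    """Add a DVID-style offset to a base Vcore, using exact decimal math."""
    return Decimal(base_volts) + Decimal(offset_volts)


if __name__ == "__main__":
    # Values from the post: Normal reads 1.25625 V, example offset 0.025 V.
    print(vcore_with_offset("1.25625", "0.025"))
```

Passing the values as strings keeps them exact, which avoids the small rounding noise binary floats would add to numbers like 1.25625 + 0.025.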
 

Rifter

You will likely need to raise the Vcore and the VTT voltage to get near 4 GHz.

I have my Vcore and QPI (VTT) both at 1.35 V to reach my 4.2 GHz overclock.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
If you can GPU render for 3+ hrs, your GPU is fine. It doesn't surprise me that it turned out to be the CPU.

BTW, what are you rendering that takes 3 hrs on your GPUs? I'm guessing it's some sort of animation and not a single frame.
 

Timmah!

Yeah, I suppose it was too good to be true anyway that my CPU could be stable at the same voltage with an almost 550 MHz overclock. I reckon a 500 MHz overclock is about the limit at which you need to bump the vcore to stay stable on most Intel chips since the 45nm Core 2 CPUs. Maybe I am wrong though :D

It's a single frame actually, a family house interior rendered at 3840*2400 resolution (though it would probably take the same amount of time at 1920*1200). It takes so long because Octane Render, which I use, is a fully unbiased renderer. It uses the fabled raytracing/pathtracing to render the images, similarly to the CPU-based Maxwell, which is useless, though, unless you own a render farm :p
Before I switched to Octane, I used V-Ray on the CPU. That was basically the reason why I bought the 980X, and man, it was really fast; it could render a similar interior in HD in under 20 minutes. But it was biased and used pre-computed lighting solutions, and although I believe I finally learned to do some okayish-looking exteriors, I could not find a way to do a proper interior render. The lighting never looked as good as with Octane.
 

Timmah!

You will likely need to raise the Vcore and the VTT voltage to get near 4 GHz.

I have my Vcore and QPI (VTT) both at 1.35 V to reach my 4.2 GHz overclock.

Do you need to bump the QPI voltage if you overclock only the core part via the multiplier? I thought I would need to touch voltages other than vcore only if I did a BCLK overclock, which would mean overclocking the RAM and uncore as well...