Unstable system, tried a lot of things, kind of desperate now.

albi33

Junior Member
May 25, 2015
4
0
0
Hello! This is my first post here so first of all: nice to meet you!

-------------------

Here is a TL;DR; the full post with details is below:
The system crashes (freeze, then reboot) when I'm actively playing (not when the game is running but idle). I have tried a lot of fixes: a GPU RMA, reformatting and changing my OS three times, buying a new PSU and CPU cooler, clearing the CMOS, and flashing the BIOS of both the motherboard and the GPU. The only thing that worked (but NOT satisfactorily): underclocking the memory and core clocks by 200 MHz.

------------------

I'm having a lot of issues with my new computer.
I have a history of building my own PCs, so I know a thing or two, but this is the first time I've had so much trouble with a problem that seems to have so many possible solutions at first.

Here it is: my system is really unstable. It crashes, a lot, but only during specific circumstances: when I'm actively using it, especially gaming.

Here are the components:
MB: ASRock Z97 Pro3
Proc: i5 4690
GC: GIGABYTE GV-N970IXOC-4GD GeForce GTX 970 4GB 256-B
PSU: EVGA 850 G2
Ram: Crucial Ballistix Sport 16GB Kit (8GBx2) DDR3 1600 MT/s (PC3-12800)
I also have two SSDs (my main one, new, for my gaming Windows system, and an older one I still use to dual-boot Linux for work) and a 3TB hard drive for all the data.

My tower is also very well cooled, with three Noctua 120mm fans (added to the two that came with the case, a Fractal Design Arctic White) and an NH-U12S CPU cooler.

I was using Windows 7 when the first crashes occurred (this was in January; I bought the above configuration as a Christmas gift). Since then I have switched to Windows 8.1 and now the technical preview of Windows 10. The crashes occurred on every OS, even on clean installations with no additional software except Steam and my current games (The Witcher 3 atm).

Here is a more detailed description of the symptoms: I start the game, load my save, everything is fine. In game, as long as I don't do anything, it can stay up without any crashes for several hours (I was doing that at first as a way to test whether everything was stable).
When I start playing, after anywhere from 15 minutes to a couple of hours, I get a big freeze, an audio feedback loop and, if I wait long enough, a reboot.

At some point I copied the content of a minidump crash file (below), but on my current Windows 10 install I haven't figured out how to get a memory dump when it crashes. It doesn't look like a BSOD either: I don't see anything on the screen, it just freezes and crashes.

*** WARNING: Unable to verify timestamp for nvlddmkm.sys
*** ERROR: Module load completed but symbols could not be loaded for nvlddmkm.sys
Probably caused by : nvlddmkm.sys ( nvlddmkm+7a2dc0 )

Followup: MachineOwner
---------

3: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa8018a6b190, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff88005203dc0, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000004, Optional internal context dependent data.

Debugging Details:
------------------


FAULTING_IP:
nvlddmkm+7a2dc0
fffff880`05203dc0 48ff251996edff jmp qword ptr [nvlddmkm+0x67c3e0 (fffff880`050dd3e0)]

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

CUSTOMER_CRASH_COUNT: 1

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

ANALYSIS_VERSION: 6.3.9600.17298 (debuggers(dbg).141024-1500) amd64fre

STACK_TEXT:
fffff880`05ec6a48 fffff880`054cc134 : 00000000`00000116 fffffa80`18a6b190 fffff880`05203dc0 ffffffff`c000009a : nt!KeBugCheckEx
fffff880`05ec6a50 fffff880`0549f867 : fffff880`05203dc0 fffffa80`106c9000 00000000`00000000 ffffffff`c000009a : dxgkrnl!TdrBugcheckOnTimeout+0xec
fffff880`05ec6a90 fffff880`054cbf43 : fffffa80`ffffd846 00000000`00000000 fffffa80`18a6b190 00000000`00000000 : dxgkrnl!DXGADAPTER::Reset+0x2a3
fffff880`05ec6b40 fffff880`0559c03d : fffffa80`1ad201e0 00000000`00000080 00000000`00000000 fffffa80`106be410 : dxgkrnl!TdrResetFromTimeout+0x23
fffff880`05ec6bc0 fffff800`0312f0ca : 00000000`0103906e fffffa80`103a7b50 fffffa80`0c6e19e0 fffffa80`103a7b50 : dxgmms1!VidSchiWorkerThread+0x101
fffff880`05ec6c00 fffff800`02e83be6 : fffff880`03165180 fffffa80`103a7b50 fffff880`0316ffc0 fffffa80`0f201da0 : nt!PspSystemThreadStartup+0x5a
fffff880`05ec6c40 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16


STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+7a2dc0
fffff880`05203dc0 48ff251996edff jmp qword ptr [nvlddmkm+0x67c3e0 (fffff880`050dd3e0)]

SYMBOL_NAME: nvlddmkm+7a2dc0

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 54b0548e

FAILURE_BUCKET_ID: X64_0x116_IMAGE_nvlddmkm.sys

BUCKET_ID: X64_0x116_IMAGE_nvlddmkm.sys

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:x64_0x116_image_nvlddmkm.sys

FAILURE_ID_HASH: {1f9e0448-3238-5868-3678-c8e526bb1edc}

Followup: MachineOwner
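
For anyone skimming the dump above, the few fields that matter can be pulled out with a short Python script. This is only a rough sketch: the function name `summarize_bugcheck` and the one-entry status table are made up for illustration, and `SAMPLE` is a trimmed copy of the output above. The one decoded fact is that the low 32 bits of Arg3, `0xC000009A`, are the NTSTATUS value `STATUS_INSUFFICIENT_RESOURCES`.

```python
import re

# Assumption: a tiny illustrative lookup table; only the code seen in this dump is listed.
NTSTATUS_NAMES = {
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}

def summarize_bugcheck(text):
    """Extract bugcheck name/code, blamed image and Arg3 status from '!analyze -v' text."""
    summary = {}
    # Header line such as "VIDEO_TDR_FAILURE (116)"; the number is hexadecimal.
    m = re.search(r"^(\w+) \(([0-9a-fA-F]+)\)\s*$", text, re.MULTILINE)
    if m:
        summary["name"] = m.group(1)
        summary["code"] = int(m.group(2), 16)
    m = re.search(r"^IMAGE_NAME:\s*(\S+)", text, re.MULTILINE)
    if m:
        summary["image"] = m.group(1)
    m = re.search(r"^Arg3:\s*([0-9a-fA-F]+)", text, re.MULTILINE)
    if m:
        # Arg3 is sign-extended to 64 bits; the NTSTATUS is the low 32 bits.
        status = int(m.group(1), 16) & 0xFFFFFFFF
        summary["status"] = status
        summary["status_name"] = NTSTATUS_NAMES.get(status, "unknown")
    return summary

# Trimmed copy of the relevant lines from the dump above.
SAMPLE = """\
VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
IMAGE_NAME: nvlddmkm.sys
"""

summary = summarize_bugcheck(SAMPLE)
```

Bugcheck 0x116 blaming `nvlddmkm.sys` only says that the Nvidia display driver failed to recover from a timeout. On its own it does not distinguish a driver bug from failing GPU hardware.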

Here are the steps I took to try to fix it:

- Reformatted and tried with Windows 8.1.
- Sent my GPU in for RMA; it was processed and I got it back, but the problem was still there.
- Cleared the CMOS of my motherboard. Double checked all the parameters in my BIOS (like memory voltage etc.). Flashed the BIOS to the newest version.
- Flashed my GPU bios to the newest version.
- Bought the PSU + CPU cooler listed above and redid all the cable management of my tower. Now the GPU never goes above 66°C even on high load with benchmarks like Valley (which are not causing any crashes).
- Checked my memory with memcheck.

There is only one thing that helped: underclocking my GPU (-200 MHz on both memory and core clocks). In that case, I don't get the crashes anymore. But that's just not acceptable to me: I didn't buy all this hardware to lose 5 to 10% of performance, especially since I'd also like to be able to overclock at some point, and the GTX 970 is supposed to be a good card for that.

So here it is. Would you have, by any chance, any other ideas about how to fix these problems? Do you think it could be the motherboard? I really don't want to send it in for RMA: it would be very complicated for me to get by without a desktop, since I mostly work from home on it, so I'd like to keep that at the bottom of the list of things to do. Are there any ways to test the stability/integrity of the motherboard?

Thanks!
 

GagHalfrunt

Lifer
Apr 19, 2001
25,284
1,998
126
I've had RAM that could pass memtest all day long and was still unstable in Windows. Try over-volting or underclocking the RAM to see if that helps and pick up a memory torture tester that runs in Windows rather than in DOS.
 

C1

Platinum Member
Feb 21, 2008
2,416
122
106
Go into your BIOS & inspect the available options for graphics. (This might be in "Chip Configuration" panel.)

There should be some different options worth changing/trying. In my case (ASUS MB), for example, there is a choice for video memory cache mode. The default is a conservative value, but setting it to the more advanced option "greatly improves the display speed", provided that your display card supports it. There are other options available to try.

If it is any consolation, when I built my rig, the video graphics card brand/type was probably the most problematic component regarding interoperability of the system and required me to try several brand/models before I was able to achieve something that I could live with (as I was running four different types of operating systems).

I suppose too that, since you are in the BIOS, you could try some different RAM timings, although, depending on the board, that might have to be done through jumpers (i.e., located somewhere on the MB).
 

albi33

Junior Member
May 25, 2015
4
0
0
Thanks C1, I went to check in my BIOS (well, UEFI now) but didn't find anything really relevant to the issue. I had already checked the memory voltage and parameters, and everything looked all right.

GagHalfrunt, I'm running a Prime95 test atm, I'll let it run overnight, I'll update here with the results. Thanks!
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
Unfortunately, you are going to have a difficult time finding a solution running a Preview OS. Stick with 7 or 8.1 until you get the problem sorted out, then play with 10 to your heart's content.

One thing I would suggest is running the game in windowed mode and using a program such as CPUID Hardware Monitor to watch your video card temps.

If you continue to experience blue screens, and they are not caused by high temps, try a different video driver.
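
The temperature monitoring suggested above can also be scripted, so that the last readings before a freeze end up in a file. The sketch below is not from the thread: it assumes the `nvidia-smi` tool that ships with the Nvidia driver is on the PATH, and the helper names (`parse_reading`, `read_gpu`, `log_gpu`) and the log file name are hypothetical.

```python
import os
import subprocess
import time

# Fields requested from nvidia-smi: temperature (C), power draw (W), core and memory clocks (MHz).
QUERY = "temperature.gpu,power.draw,clocks.gr,clocks.mem"
KEYS = ("temp_c", "power_w", "core_mhz", "mem_mhz")

def parse_reading(line):
    """Turn one CSV line such as '66, 145.20, 1316, 3505' into a dict of floats.

    Fields the card does not report (e.g. '[N/A]') become None.
    """
    values = []
    for field in line.split(","):
        try:
            values.append(float(field.strip()))
        except ValueError:
            values.append(None)
    return dict(zip(KEYS, values))

def read_gpu():
    """Query the first GPU once via nvidia-smi (assumed to be installed)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_reading(out.splitlines()[0])

def log_gpu(path, interval_s=1.0, samples=None):
    """Append timestamped readings to `path`; runs until interrupted if samples is None."""
    count = 0
    with open(path, "a") as log:
        while samples is None or count < samples:
            r = read_gpu()
            stamp = time.strftime("%H:%M:%S")
            log.write(f"{stamp},{r['temp_c']},{r['power_w']},{r['core_mhz']},{r['mem_mhz']}\n")
            # Flush and sync every sample so the last line survives a hard freeze.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
            count += 1
```

Running something like `log_gpu("gpu_log.csv")` in a second window while playing should leave a record of temperature, power draw and clocks right up to the crash.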
 

albi33

Junior Member
May 25, 2015
4
0
0
Well, I tried pretty much everything I could think about before jumping on Windows 10.

The thing is, after a couple of days of trying a bunch of fixes (modifying settings in the Nvidia panel and the registry, installing several benchmarking tools, hardware monitors and antivirus programs, trying 5 different drivers, etc.), there comes a moment when you start to think that you may have fixed the issue, but your computer is so bloated from everything you tried in the meantime that it could be worth just restarting from scratch.

This is what happened to me after I bought the new PSU and CPU cooler last Saturday and it was still crashing afterwards.

Now, I know using a preview OS is not optimal for what I'm trying to do, but in my eyes it was another thing I had to try before starting another RMA process,
especially since I'm working full time on my desktop at the moment, so I can't really afford to go a month (the duration of the last RMA) without a graphics card or a motherboard.
 

silicon

Senior member
Nov 27, 2004
886
1
81
If you think the RAM is OK, then possibly there is an intermittent fault on the board somewhere. Have you tried running the setup out of the case? It could be a problem with the board making contact with the case.
 

albi33

Junior Member
May 25, 2015
4
0
0
So, I tried using my card in my friend's computer and the "good" news is, it crashed.
So yeah, now I'm sure it comes from the GPU. I bought a new one and everything seems to work perfectly fine.

One thing I noticed with the new card: the temps stay around 55°C when playing, while the faulty one easily went up to 65°C.
The power usage % seems weird too: with the new one it's stable around 40% at full load (even with some overclocking), while the bad one was close to 90% without any O/C.

So now I guess I just have to send it back for RMA again and ask for a replacement this time.

Thanks for all the suggestions guys!
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
Glad you figured it out! It's sad that you sent a card back only to find that it hadn't been touched. I wonder if there is just a bad HSF mounting on the bad card, but that is an issue the manufacturer should be addressing.
 

inachu

Platinum Member
Aug 22, 2014
2,387
2
41
You know what? Maybe you did nothing wrong except at the start of the pc build.

For instance, did you know that applying motherboard drivers last will make a system less stable?

For the majority of a PC build, the PC should be off the network. Once the OS load is complete, apply the motherboard drivers first. After the motherboard drivers, install the NIC driver, then video, then sound.

I have seen cases where sound should go before video, and sometimes video should be installed right after the motherboard drivers, but each and every board has its own vices.

Never rely on the drivers that Microsoft gives you at PC build time unless the board is certified for that OS version.

Many smaller companies may not have certified drivers for a motherboard, but the drivers do work; their argument is that they refuse to pay the $140,000 it costs to get drivers certified by Microsoft.
 

inachu

Platinum Member
Aug 22, 2014
2,387
2
41
Sounds like it suffered heat damage. Fans are cheap to replace.
Maybe try ball-bearing fans, as they last longer. Everyone has a favorite for their PC case.