Advice on troubleshooting instability

benandjerry

Member
Aug 18, 2009
38
0
61
First of all, I apologize for the length of this post; I only noticed after getting a little carried away. I hope it's not TL;DR, since I tried to describe everything in as much detail as possible. I'd very much appreciate it if you can bear with me.

Let me first list my specs:
Asus M5A97 R2.0
FX-6300, currently at 4.4 GHz (22x multiplier)
EVGA GTX 650 Ti Boost (factory OC)
8 GB Corsair Vengeance 1600
Samsung EVO 250 GB
Cooler Master G550M Bronze PSU
Windows 8.1

The CPU is the only part that's currently overclocked, and I haven't verified that it's 100% rock stable. Here is what happened. I initially ran it at 4.65 GHz with a 210 MHz bus. I messed around with voltages and clocks quite a bit, and although I could never be bothered to run 20+ hours of stress testing to make sure it was rock solid, I was satisfied enough with its stability for the time being. I could stress it for an hour or so without any issues and did not have a single crash in real-world usage for weeks. So even if it wasn't 100% stable, there certainly weren't severe stability issues.

I also tried overclocking the GPU and could push it a bit, the memory in particular, but I only did some benchmarking and ran it at stock after that. Everything was stable until a couple of days ago, when Titanfall started crashing repeatedly, even though I had played it, and other games, for extended sessions before. Since then, much simpler tasks (browsing, document work, etc.) have also caused random crashes.

In a couple of crashes I've had artifacts across the screen, which led me to believe the GPU is the issue; a few other times Windows has told me the NVIDIA driver stopped responding and has recovered. Most of the time it just goes, though: black screen, white screen, no signal. Sometimes I have to hard reset, sometimes it just reboots. Another thing I've noticed: every once in a while the monitor (an old CRT) ticks, quickly shrinks the image and then goes back to normal. At first (before the instability started) I thought the monitor was on the verge of dying, but now I'm thinking maybe it's the GPU. I'm pretty sure it didn't do this when I first connected this box, and I certainly have no memory of it ever doing this before. Needless to say I'm using the VGA connector with it, so there's a converter in between.

I was running the latest stable NVIDIA drivers (335.23, I think) when this started occurring. I have tried removing and reinstalling them, both via the Control Panel and via Driver Sweeper's successor, whatever its name was. I have done the same with the new 337.50 beta, multiple times with both versions. I have also unplugged and reseated the GPU, cleared CMOS, and set everything up again.

Before you say "set the CPU to stock, it could be the cause": I have done so, and the crashes continued. Yesterday I thought I might have fixed it, went into Titanfall, and it crashed within a few seconds. With the CPU clock set to stock, exactly the same.

What really bugs me is that it all came out of nowhere while playing a game. I had not been messing with drivers and had not modified or added any hardware, which leads me to believe it's more likely a hardware issue than a software issue.

I have run CPU stress tests for several hours, and while I had issues with 2 modules in Prime95 yesterday after about 3 hours, I'd say that isn't reason enough for it to go from rock solid in real life to this. I have since decreased the multiplier by 0.5x, and as I said, the issues don't go away even at stock. CPU temps are fine; I think it hovers in the mid 50s in Prime95 at the current clock and voltage (1.35 V in BIOS, NB set to 1.15 V).

I have run memtest, though in Windows, as I don't have a USB stick or DVD drive at hand at the moment: no issues after close to an hour at full load. No RAM issues were reported in Prime95 Blend either.

My prime suspects at the moment are:
1. GPU (fairly strongly, until today, when after one crash the PC wouldn't POST at all until I cleared CMOS, which led me to suspect the motherboard)
2. Motherboard
3. Some kind of PSU issue



My ideas for further troubleshooting with the tools at hand:
1. Create another partition and do a clean Windows 8.1 install with clean drivers to rule out software issues. Will this cause any problems if I later want to remove the new partition and go back to my current install? I have quite a lot of software configured that I don't want to redo unless software turns out to be the problem.

2. Try an old Corsair 400 W PSU I have lying around. This should still be sufficient, albeit without a ton of headroom, right?
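A rough way to answer the 400 W question is to add up each component's likely draw and compare the total with the PSU rating. The sketch below does that in Python; every wattage in it is an assumption (nameplate TDPs, padded for the overclocks), not a measured figure, and the dictionary and function names are hypothetical.

```python
# Back-of-the-envelope DC load estimate for the build described above.
# All wattages are ASSUMED ballpark figures, not measurements.

ASSUMED_DRAW_W = {
    "FX-6300 (95 W TDP, padded for the overclock)": 140,
    "GTX 650 Ti Boost (134 W TDP, padded for the factory OC)": 140,
    "Motherboard, 8 GB RAM, SSD, fans": 50,
}


def estimated_load_w(draws: dict[str, int]) -> int:
    """Sum the assumed per-component draw in watts."""
    return sum(draws.values())


def headroom_fraction(load_w: float, psu_w: float) -> float:
    """Fraction of the PSU's rated output left unused at the given load."""
    return 1 - load_w / psu_w


if __name__ == "__main__":
    load = estimated_load_w(ASSUMED_DRAW_W)
    # The spare Corsair unit and the current Cooler Master G550M.
    for psu_w in (400, 550):
        print(f"{psu_w} W PSU: ~{load} W load, "
              f"{headroom_fraction(load, psu_w):.0%} headroom")
```

With these guesses the 400 W unit would sit at roughly 80% of its rating under a combined CPU and GPU stress load, which is workable on paper. For an older unit, the 12 V rail rating on its label is the more telling number to check, since the CPU and GPU draw mostly from that rail.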


Unfortunately I don't have another box to test the GPU in.

If you managed to bear with me this long (and I hope some did), what are your thoughts on this? If anyone could answer the two questions above about my next steps, I'd very much appreciate it.
 

Ketchup

Elite Member
Sep 1, 2002
14,558
248
106
I have been having issues with NVIDIA drivers on my 660 lately, specifically with PhysX. I don't know whether it's related to your issue (I do not have any tearing issues and haven't exhausted the possible causes), but I thought it worth mentioning. The latest drivers that don't exhibit the issues are 331.93. Actually, I can run the latest beta drivers fine, as long as I don't install the PhysX driver that comes with them.

So you might want to try an earlier version, such as 331, and see what happens.

When you say crash, are you getting a blue screen? If so, try to get an app to read the minidump, such as
http://www.resplendence.com/whocrashed

I really don't like how this is pointing to the GPU (especially its memory). I'd be curious to see what the minidump points to, and whether the older drivers help at all.
 

benandjerry

Much appreciated. I'm not getting a BSOD, it just goes. Will have a look at that app. Event Viewer reveals nothing.

I'll try without PhysX, as well as the older drivers you mentioned, and report back later.

Isn't it a bit odd that it was stable for so long and then ran into such severe instability, if it's driver related, when I didn't touch the drivers at all during that time?
 

Ketchup

Well, the drivers you mentioned (335.23, which you were running when this started) have only been out for a month, which isn't a long period of time in my book.
 

benandjerry

That's true, but as I said, it really was rock solid for weeks.

I have turned the CPU down to stock again and have had a crash since. I installed WhoCrashed; it shows a few reports, but far from all of the crashes have been registered there.


Error: DPC_WATCHDOG_VIOLATION
Bug check description: The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. This problem might be caused by a thermal issue.


Error: VIDEO_TDR_ERROR
file path: C:\Windows\system32\drivers\nvlddmkm.sys
product: NVIDIA Windows Kernel Mode Driver, Version 337.50
company: NVIDIA Corporation
description: NVIDIA Windows Kernel Mode Driver, Version 337.50
Bug check description: This indicates that an attempt to reset the display driver and recover from a timeout failed.



Error: APC_INDEX_MISMATCH
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that there has been a mismatch in the APC state index.



Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem. This problem might be caused by a thermal issue.




Those are the ones registered. I'll try installing without PhysX tomorrow, otherwise I'll revert to 331.xx. I just get the feeling this is hardware related. No thermal issues, and I don't use Wi-Fi at the moment.
 

Ketchup

Yeah, I hope it's drivers, but it still looks like the video card is having issues.
 

benandjerry

You, sir, are very quick to reply; I appreciate the insight!

Is there any reason at all to believe it's the motherboard? I had one occasion where it failed to POST until I reset CMOS. I was thinking (though I personally don't think this is the issue) that it's not really a high-end board, and 4.65 GHz is a decent OC for an FX-6300 on air. But neither temps nor voltages have been out of line: ~1.35 V, and the CPU at around 65C during max stress, though hardly that in real-world usage.

Not sure how RMA works for you, but with my reseller I'd end up paying a £25 fee if the card turns out not to be faulty, which is kind of steep on a £100 card. Not to mention I don't have a spare card to use in the meantime, so I'm just trying to be as certain as possible.

I just reinstalled 337.50 without PhysX, so we'll see how that goes.
 

benandjerry

I've had a few crashes since installing only the 337.50 drivers without PhysX. Starting a video in my media player appears to be a common way to make it crash, another GPU-related task...

I'll give the 331.xx drivers a shot at some point too, but I'm not holding my breath at this point.