Frequent Crashes But Nothing Fails Stability Testing...? Help?

chazdraves

Golden Member
May 10, 2002
1,122
0
0
For 4 months now, I've been dealing with a problem that drives me crazy. I built a new rig with 8GB DDR3 1600 Crucial Vengeance, an AMD Phenom II X4 965 CPU, ASUS M4A87TD EVO Mobo, Crucial 650W PSU, and a GeForce 560Ti. Shortly after I got it up and running (clean install of Win 7 HP 64-bit), I noticed random lockups, mostly during gaming. A little research told me this was a TDR crash. The crashes almost always recovered, but they happened too frequently to be tolerable. They occurred both during gaming (very frequently) and occasionally during more mundane tasks like Painter 12 or even Adobe Flash. Since the error was a video driver failure, I decided to send back the GPU. Having an AMD CPU and an AM3 board, I thought perhaps a Radeon card would pair better and purchased a 2GB 6950.

The new card came and seemed good, but within the first day, the problems were back. I came to believe that it was a fault of TDR/Win 7 and that my hardware was fine (it is all brand new, of course, and all running at stock speeds). I let it go as it was for a while until I couldn't stand it any longer and bought a PS3. I've just recently come back to the issue because I want it fixed. I'm going on the notion that TDR knows what it's doing and there actually is an issue, but I can't find it. Here's what I've done so far:

I've maintained all of the current Catalyst drivers without any change from any of them.

I've disabled TDR entirely - this helped some of the time but now crashes result in hard lockups though they seem less frequent.

I ran Memtest86+ for 2.5 hours last night at both my SPD speeds and rated speeds - both passed without any issue.

I ran Prime95 for over an hour this morning (again, nothing is OC'd, just stock speeds/coolers) and it ran like a champ at 100% the whole time without a single error or warning.

I ran FurMark 3 times and again no problems.
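For anyone following along: the TDR behaviour mentioned above is controlled by registry values under the GraphicsDrivers key, with `TdrLevel` (0 turns detection off, 3 is the default recover-on-timeout) and `TdrDelay` (timeout in seconds, default 2). Below is a minimal Python sketch that only renders the text of a .reg file for those two values. The helper name `render_tdr_reg` and the example delay of 8 seconds are illustrative assumptions, not something taken from this thread.

```python
# Sketch: build the text of a .reg file for the TDR settings.
# TdrLevel and TdrDelay are the registry value names; the function name
# and the example numbers are hypothetical choices for illustration.

TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def render_tdr_reg(tdr_level=3, tdr_delay_seconds=2):
    """Return .reg file text that sets TdrLevel and TdrDelay as DWORDs."""
    if tdr_level not in (0, 1, 2, 3):
        raise ValueError("TdrLevel must be 0, 1, 2 or 3")
    if tdr_delay_seconds < 0:
        raise ValueError("TdrDelay must not be negative")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        f'"TdrLevel"=dword:{tdr_level:08x}',          # 8 hex digits per DWORD
        f'"TdrDelay"=dword:{tdr_delay_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Example: keep recovery enabled but allow the GPU 8 seconds to respond.
    print(render_tdr_reg(tdr_level=3, tdr_delay_seconds=8))
```

The script writes nothing to the registry itself; the output would still have to be saved as a .reg file and imported by hand, followed by a reboot.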

I then went back to video gaming this morning. Played the Homefront Demo for 20 minutes without issue. Started into the Space Marine demo and it made it 5 minutes before hard-locking my system...

Ultimately, I use my computer as a productivity machine, and it works pretty well for that without issue, but I'd like to be able to put all of that hardware to some games now-and-again. Any ideas? Motherboard? I don't think it's the PSU because a) it's well over the requirements and a reliable brand and b) it never fails during stress tests which tend to be more demanding than games.

Also, I do not have another system to try these parts out in, so I need to be able to diagnose within my current setup.

Thank you all kindly for reading all of this. I know that TDR issues are EXTREMELY common and very frustrating. I've spent hours reading possible solutions, but most of them relate to drivers, disabling TDR, faulty PSUs, or RAM settings - all of which check out on my system...

- Chaz
 

hondaf17

Senior member
Sep 25, 2005
763
16
81
Sorry you're having problems. I'd try replacing the RAM. From my experience I believe you have to run memtest longer than 2.5 hours to get a good indication. I had instability problems in a recent build and ran memtest overnight with 0 errors. Still had the problems so I started replacing hardware items. Replaced the RAM (and changed nothing else) and BOOM system was just fine. Start with the RAM and good luck.

Also, since you switched from Nvidia to AMD GPU, have you done a reinstall of W7? Might be worth this to ensure all Nvidia drivers/components are off the PC and then install latest of CCC.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I don't think it's the PSU because a) it's well over the requirements and a reliable brand and b) it never fails during stress tests which tend to be more demanding than games.

I understand that you don't suspect your PSU, but it was the first thing that I thought of before I even entered this thread. You really need to get another PSU just to swap out with your existing one and confirm the existing one is good.

Get something local with a decent return policy so you can take it home, try it, decide if it makes a difference and return it if it does not.

Don't get a PSU from a retailer that requires you to lie or do anything unethical to return it like claim it doesn't work or something. There are plenty of sellers out there that have a "100% satisfaction guaranteed" type return policy with small restocking fees like 10% or 15%.

Do the ethical thing and pay the restocking fee if you find out you don't need the PSU, but definitely go to the trouble of nailing this down and confirming for sure that it is not your PSU because it really is number one on the suspect list at this moment in your debugging process.

Good luck.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Maybe it only happens at max load (both CPU and GPU) and not when you run just one stability test.

I.e., it could be that your PSU isn't taxed enough by a single test, but something that's demanding on both CPU and GPU pushes it over the top.

It could also be that those stability tests aren't run long enough to detect anything. Maybe giving 0.25v more to your northbridge/CPU/RAM, etc., will make it stable enough not to crash.

It could be something entirely unrelated, say software-wise (not necessarily drivers), some conflict or something. Are you running a fresh install of Windows?
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Ah heck, you guys really figure it's the PSU, eh? I'm afraid I'm in a small enough town that I don't have a local retailer, but I can always order one off of Amazon if needed. Problem is that it's not consistent. For example, I just now updated to the latest drivers again and found that Space Marine (which played fine yesterday without CCC installed - I had removed it as a test) now crashes as soon as the menu loads. After that, I fired up Bad Company 2 and played it on the highest settings for over an hour just now without a single hiccup.

Here's the other thing I've noticed. Sometimes, when I load into Bad Company 2, I'll see menu items flickering - like texture flicker or an artifact issue. If I load the game this way, it's guaranteed to hard-lock my system within the first 5 minutes. On the other hand, if I notice the issue, close the program, and restart it, it'll play just fine for as long as I feel like playing. I believe I was noticing the same thing with Space Marine except that Space Marine gets into the 3D right away at the menu which causes the hard lock. Now that I've had Bad Company 2 running, I'm going to go back to the SM Demo and see if that locks as it did earlier.

It's like it's got to get it "out of its system" before it's good to go.

Is there a stress test I can run on both GPU/CPU to eliminate the PSU? Can I run Prime 95 and Furmark?

Thanks for all of the ideas, I'm willing to listen to anything intelligent right now - this is driving me nuts!
- Chaz
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Okay, another update. I ran Prime 95 Blend Test AND Furmark 1080P Burn-In test at the same time for 27 minutes. CPU usage was consistently 100% and GPU was consistently 100% the whole time. GPU temps peaked at 80C and maintained mostly 77C after starting at 53C. This has to be more power than a game would ever consume. Even still, it had no issues at all - not a single artifact and Prime95 reported no warnings and no errors. I then loaded up the Space Marine demo (since the latest drivers seem to have broken it). I got past the menu and into the game but saw some issues. I restarted and tried again. This time, the graphical oddities were gone but it only lasted 6 minutes before (with no warning) the whole thing hard-locked. Again, TDR is still disabled, otherwise I assume it would have recovered. This was after being on for 2 hours and having played 1 hour of BC2 and 27 minutes stress testing with P95 and FurMark...

That kind of power draw should rule out the PSU, should it not? I still say software - maybe some odd setting that should be disabled or enabled? Maybe drivers? Maybe Windows?
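As a rough sanity check on that reasoning, a power budget can be sketched in a few lines of Python. Only the 650 W rating comes from this thread. The per-component wattages below are assumed ballpark figures (about 125 W for the CPU, about 200 W for the graphics card, 75 W for everything else), not measurements from this machine, and the names are made up for the example.

```python
# Rough steady-state power budget. Every component figure here is an
# assumed ballpark value for illustration, not a measurement.

PSU_RATED_WATTS = 650  # rating quoted in the thread

ASSUMED_DRAW_WATTS = {
    "cpu": 125,   # assumed: nominal TDP class of the CPU
    "gpu": 200,   # assumed: typical board power under full load
    "rest": 75,   # assumed: motherboard, RAM, drives, fans
}


def load_fraction(draw_watts, psu_watts):
    """Return (total draw in watts, fraction of the PSU rating used)."""
    if psu_watts <= 0:
        raise ValueError("PSU rating must be positive")
    total = sum(draw_watts.values())
    return total, total / psu_watts


if __name__ == "__main__":
    total, fraction = load_fraction(ASSUMED_DRAW_WATTS, PSU_RATED_WATTS)
    print(f"Estimated draw: {total} W ({fraction:.0%} of rated capacity)")
```

With these assumed numbers the estimate sits well under the rating, which fits the intuition in the post. It only covers sustained draw, so it says nothing about brief spikes or a unit that is simply defective.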

- Chaz
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
That kind of power draw should rule out the PSU, should it not? I still say software - maybe some odd setting that should be disabled or enabled? Maybe drivers? Maybe Windows?

yea that should rule out the PSU.

Time to try reinstalling windows?
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Ooof... That'd be a heck of a thing. Bearing in mind this is usually a work computer - there'd be a good deal to set up again. Any ideas to fix it sans reinstall? Any further testing ideas?

I just tried disabling PCI-e Spread Spectrum in the BIOS for a kick... Running out of ideas.

- Chaz
 

thilanliyan

Lifer
Jun 21, 2005
12,060
2,273
126
I had a CPU overclock that could pass all stability testing but once in a while in Dawn of War 2 it would crash, and DoW2 doesn't fully load the CPU (Phenom 2 X6).

Try upping the CPU volts a bit to see if that nixes the problem. If that fixes the problem and your CPU is at stock then it's likely a CPU problem. Could be memory related too.

Also, are you running the latest Catalyst drivers, and any other drivers? Updated motherboard BIOS?
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
I haven't touched the BIOS (for better or worse). Latest Catalyst drivers and everything else up-to-date.

Actually, I just disabled PCI-e Spread Spectrum and haven't had a crash since - so far... Any thoughts on that? Seems unlikely. I also re-enabled TDR to make the crashes less severe.

Gonna continue trying to find a lock-up now.

- Chaz
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Going on 2 hours now without a crash - and I've run it hard. Could it really be something that senseless?

- Chaz
 

tigersty1e

Golden Member
Dec 13, 2004
1,963
0
76
An hour of Prime95 would hardly be called stable.

The Prime95 blend test runs through a sequence of tests that needs about 12 hours to complete 1 full loop.

The sequence determines how long each test runs. Some tests load the CPU fully, others go back and forth through the L2/L3 cache, and some exercise the RAM.
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
I see your point, of course. I just mean relative to what you'd likely experience in CPU usage with a game which is what's causing the issues. I don't care if it can run Prime95 for 12 hours, I care if it can run Bad Company 2 for 2 hours, but I do know what you're getting at and you're right.

Anyhow, it's mostly been good since then. I did have one glitch that TDR recovered, but that's the only one in hours. What I noticed was some artifacts at the Space Marine main menu. I exited out and restarted it as has worked before but the artifacts remained. Ultimately, it crashed for a couple of seconds and then ran smooth for... well... up until now and going forward.

Could the PCI-e Spread Spectrum really be related?

- Chaz

Oh... and I will look into the latest BIOS update. I should get around to that, but it is only 4 months old.
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
I just wanted to post one more update, in case it should help anybody else as well as myself. Further testing has revealed/reminded me of one more complexity. Specifically, the longer I have my computer on, the more likely failures are to occur. In this instance, I've had my computer on for roughly 6 hours and most of that there has been a game or stress test running without issue. I even left Space Marine running for roughly 1.5 hours with no issues. When I came back to finish the demo, things performed as expected. I then decided to move to Bad Company 2 and immediately saw artifacts at the menu. As I've grown accustomed, I exited and restarted. Unfortunately, the artifacts persisted this time and the game force-closed. I opened again, it force-closed. I restarted the computer without allowing any cool down time, started BC2, saw the usual "first-run" artifacts, exited, restarted, everything's beautiful as if there was never a problem at all.

I guess the simple cure is to restart before playing anything and expect an error on the first run. This still seems like a software issue to me, but at least the understanding of how to manipulate it makes it very tolerable. If anyone has any further thoughts on what to try, I certainly haven't given up hope to fix it completely, but at least I can live with it as it is now, now that I understand it better.

Regards,
- Chaz
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
I realize this post is likely forgotten by everyone else, but in case it should help anyone in the future, I just wanted to post another update. More extensive testing proved that even my previous solution was not entirely stable. Specifically, I started noticing a number of crashes under Dawn of War II. So, I finally had sense enough to dig into the BIOS and discovered it was a revision from 2010. There have been 3 patches since this revision, most of which focused on stability. Needless to say, I flashed the BIOS and have been giving it another run-through. I also went to the ASUS website and grabbed all of the official drivers for my motherboard. So far, things seem mostly good. I did have one TDR error the first time I loaded DoW II, and later I had a full-on BSOD and a forced restart, but that BSOD is occasionally reported with DoW II and is not a normal error for my machine, so I'm discounting it. I've put in another few hours since and have yet to see even so much as a TDR error.

Time will tell. Thanks to everyone for the advice. Should this cure my problem, I'll post one last time in hopes it might help someone else searching across Google.
- Chaz
 

darckhart

Senior member
Jul 6, 2004
517
2
81
hey thanks for posting this. i hope you have figured out what the culprit is. the annoying thing is the randomness of the crashes. same type of use, but this time no problems?! what?! i feel your pain since i'm having the same kind of problem (lockups or bsod) when gaming, but all other tasks would be fine (even cad, comsol, matlab). i've run linx at 25 passes to check cpu oc stability. check. i ran unigine heaven for 2 hrs at max settings. check. i ran them both together for 1.5 hr. check. i've updated bios and drivers, tweaked bios settings, swapped hardware, unplugged all unnecessary peripherals, etc. nope. disabled my cpu oc, and am waiting for the results as i use it now. *shrugs* anyway, good luck!
 

Plimogz

Senior member
Oct 3, 2009
678
0
71
edit: Hell, did you solve it by updating your MB BIOS? well, now I feel like an idiot. Ah well, happy to hear that you solved it. I leave my underwritten bad advice unchanged despite this edit, because I don't believe in hiding my idiocy. Which is pretty stupid in itself, I suppose :p

...

I say -- and yeah, this might sound random, but humour me -- I say RAM.

If you have more than one stick installed, pull all but one and try that. Switch in every stick one at a time, until you've either crashed your system with all of them, or isolated a faulty stick.

If you don't want to do that, run memtest+ all night long, taking the time to set it up to only loop test 5.

I hope this gives you results. 'Cuz otherwise you're fast running out of options which don't involve getting your hands on another PSU, reaching for your Windows install disc or, last but not least (or at least the least painful, so you probably should try it first :)) cleaning out those video drivers and reinstalling 'em while crossing your fingers and offering up a little prayer to a minor deity of your choice.

Oh, and assuming you have a good idea of what a minor voltage bump is, no harm in slightly increasing voltages across the board (CPU, NB, RAM) to see if perhaps that leads you down a promising diagnostic path.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
I have not read through all the posts in this thread, but I was just experiencing artifacts and driver crashes over the weekend with my video card. I thought for sure one of the RAM modules on the card had gone bad, as clocking down my GPU's RAM was the only way to fix what was wrong. Lo and behold, nothing was wrong with my hardware; the video card just needed to be reseated in the PCI-E slot. After I took it out, examined it, and put it back into the computer, it works 100% again.
 

WMD

Senior member
Apr 13, 2011
476
0
0
The problem is most likely your video card. Furmark is now a terrible test of GPU stability. The driver's built-in protection throttles the output, so it does not load up the card anywhere near as heavily as games do.

Did you overclock? If you did, revert to stock clocks. If you did not overclock, then try underclocking it by 100 MHz and test for stability in games. This will help confirm whether the problem is with the card.
 

happy medium

Lifer
Jun 8, 2003
14,387
480
126
I have not read through all the posts in this thread, but I was just experiencing artifacts and driver crashes over the weekend with my video card. I thought for sure one of the RAM modules on the card had gone bad, as clocking down my GPU's RAM was the only way to fix what was wrong. Lo and behold, nothing was wrong with my hardware; the video card just needed to be reseated in the PCI-E slot. After I took it out, examined it, and put it back into the computer, it works 100% again.

Ha, this happened to me a few times. I figured out it was the dual 10-inch subs next to the tower that were shaking the card loose. :):awe:
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Well, I really have put a lot into figuring this out, but I'm afraid I still don't have it yet. Updating the chipset drivers and all of the mobo drivers along with the new BIOS revision has helped reduce the frequency of the issue noticeably. Specifically, today I think I had a good 4 hours in DoW II and 6.5 hours of "on-time" before I experienced 5 TDR resets within 5 minutes. A quick restart, one fast TDR at the main menu, and it was good again for hours.

This seems to be the pattern now (and previously). For some reason, the stability goes downhill at almost exactly 6 hours every time. I would think it a hardware problem except a reset always fixes the problem - that is to say that the hardware never gets a chance to cool down and yet still works fine.
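One way to check whether the roughly 6 hour pattern is real rather than an impression is to note the boot time and crash time of each session and compute the uptime at each failure. The sketch below assumes such a log is kept by hand. The timestamps in it are made-up placeholders, not data from this machine, and the function names are hypothetical.

```python
# Sketch: summarize how long the machine had been up at each crash.
# The sample timestamps are invented placeholders for illustration only.
from datetime import datetime
from statistics import median


def uptime_hours(boot, crash):
    """Hours elapsed between a boot time and the crash that followed it."""
    return (crash - boot).total_seconds() / 3600.0


def summarize(events):
    """events: list of (boot_time, crash_time) datetime pairs."""
    if not events:
        raise ValueError("need at least one (boot, crash) pair")
    hours = sorted(uptime_hours(boot, crash) for boot, crash in events)
    return {
        "count": len(hours),
        "min": hours[0],
        "median": median(hours),
        "max": hours[-1],
    }


if __name__ == "__main__":
    sample = [  # placeholder log entries
        (datetime(2011, 9, 1, 9, 0), datetime(2011, 9, 1, 15, 30)),
        (datetime(2011, 9, 2, 9, 0), datetime(2011, 9, 2, 15, 0)),
        (datetime(2011, 9, 3, 10, 0), datetime(2011, 9, 3, 15, 30)),
    ]
    print(summarize(sample))
```

If the real numbers cluster tightly around one uptime regardless of what was running, that points toward something that accumulates over a session rather than load at the moment of the crash.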

For myself, I believe TDR is doing its job in my situation, but I also believe the problem is related to Windows. What I've read shows this to be very common in Win 7 and Vista also regardless of 64/32-bit. I've seen a lot of people fix it with new drivers, a lot of people fix it with a new PSU/GPU, and a good number of cases where RAM timing/bad RAM was the culprit. As is mentioned in almost every other forum discussing this exact problem, it seems we are all experiencing these TDR lockups as the result of entirely individual issues. Personally, I'm more convinced now that mine is a software issue. It's possible that a re-install might rectify the situation, but I don't fully believe that's the base of the problem (in my case). It's also possible it might be a RAM issue as the RAM I purchased (because I've never had to worry in the past so long as I bought a reputable brand) is not listed on the Guaranteed Compatibility lists at either Crucial or ASUS, but, again, it still doesn't quite feel right either.

Anyhow, I'm ---><--- this close to going back to a pre-fabbed laptop and forgetting about this rig, but I'll dig a little further. I think I've got it to a manageable point that I could live with, but it's not perfect, and I really don't think I'm going to find the issue. Moreover, there is a limit to how much time I can commit to such a silly problem, but I want to be certain the hardware isn't at fault should I choose to sell it off.

Well, that was a bit lengthy. I appreciate everyone's help and thoughts in this forum. For the sake of anyone enduring these TDR issues, please feel encouraged to continue using this thread to keep the discussion alive. I'll chime in again in another day or two with the latest on my end.

Regards,
- Chaz
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Ha, this happened to me a few times. I figured out it was the dual 10-inch subs next to the tower that were shaking the card loose. :):awe:

On a further note: that is awesome.

And I did try re-seating all of my RAM/GPU/CPU/PCI cards/HDD/etc. several times to no avail.

- Chaz

Further edit: 2 posts came in while I was typing my response. If you read the thread above, you'll see I've already switched from a GeForce 560Ti to a Sapphire Radeon 2GB 6950 and still have the exact same problem. You're likely correct about Furmark, but I feel fairly confident ruling out the GPU in my case.
 

happy medium

Lifer
Jun 8, 2003
14,387
480
126
Have you used Driver Sweeper to clean out all Nvidia and AMD drivers and start over? What version of DirectX are you running, and do you have the latest runtime installed?
If so, reinstall it anyway.
Do you have the latest chipset drivers and BIOS installed?

Make sure nothing in your computer is overclocked, and set the BIOS to defaults.
Play some games and try that.

If that does not work, I suggest a fresh Windows install.

If that does not work, maybe it is your PSU, but I'm thinking it's a faulty motherboard.
 

chazdraves

Golden Member
May 10, 2002
1,122
0
0
Well, I've decided to go straight over the deep end with this one. I've backed everything up and completely wiped my HDD to install Ubuntu 11.04. If the problem persists in Ubuntu, we can likely conclude it's hardware related; if the problem disappears, it's likely the issue was Windows/driver related. Of course, the problem is finding a decent game in Linux...

As for the previous fixes, the end result was a computer that almost always gave one quick TDR crash at game launch and then performed flawlessly for roughly 6 hours before it became unstable but could be completely rectified with a simple restart. Not bad by any stretch, but not fixed. This should really help me narrow it down.

Regards,
- Chaz