Looking for double check: strange crashes, thinking I have PSU or Mobo issue...

praktik

Member
Jan 23, 2009
40
0
0
Hey Anandtech! Looking for some expert fellow enthusiasts to give me a double check on my situation and confirm I am following a good methodology to solve a very strange crashing issue which to me, feels like a PSU issue.

Setup:
OS: Win 8.1 x64
CPU: i7 4790k running stock
Mobo: Gigabyte Z97X-UD5H-BLK rev 1.0
Ram: 16 gigs running stock @ 2.4GHz
SSD: Intel 520 256GB
GPUs: 2xEVGA Titan SC in SLI
PSU: ~4 year old Corsair AX1200 (bought early after line was introduced)

Situation:
1) Came back from honeymoon super excited to play Civ: Beyond Earth - updated to the new game-ready Nvidia drivers, which gave me a BSOD, and the OS recovery tools could not fix the resulting inability to boot Windows. At this point I figured a coincidental driver crash had corrupted my OS - fine, I thought - let's reinstall!
2) Reinstall OS as UEFI and keep it bare bones, Windows update only, Nvidia driver, Civ: Beyond earth
3) Have an odd crash here and there - and only ~20 mins into first session of game
4) Wanted to reinstall again - ended up in a loop, unable to install Windows due to UEFI issues. Since I had been installed as legacy for the longest time with no issues, I reinstalled again completely legacy.
5) Event viewer - can never see a faulting application cause the crash in either Application or System.

Testing results -

Given fresh reinstalls and crashes even with stable, older versions of Nvidia driver (or no driver), concluding i have a fault somewhere in the hardware chain.

1) DOS bootable tests:
- Memory - Can run the Windows memory tester and DOS-booted memtest x86 for 12+ hours with no errors reported and no crashes. Conclusion: Memory ok!
- CPU - when booted into a DOS tester using Linpack and a few other utilities, which I expect also run Linpack-like algorithms - NO PROBLEMS - can run for hours. Conclusion: CPU ok!
2) Windows tests:
- Prime95 - crash happens <5mins with no errors in GUI and on reboot, none found in "results" file
- OCCT - intense GPU testing maxing out cards can run for an hour with no issues. Power supply testing with CPU running and GPUs going with draw of ~800watts (just under capacity of my APC battery backup - never hit this even when gaming intensely on the TITANs) and this ran for an hour until i stopped it manually
- OCCT CPU Linpack in windows - can run for an hour no errors no issues

Conclusions + Next steps (where I need your help!)

1) Latest NVIDIA driver release is problematic - see issues in a lot of places and even affected 980s in SLI for a recent HardOCP article
2) Memory+CPU are likely ok - or would see errors in DOS booted tests
3) Prime95 - can see many with similar setups running this test with no issues - the fact that it crashes my whole computer within a few mins makes me think there is a real HW fault somewhere

However it is very strange that I get no errors in the Prime95 test (increasing confidence in my CPU?) and that OCCT tests never generate errors or crashes.

Problem source candidates: PSU or Mobo - could have a flaky 12V rail or flaky delivery of that rail through the mobo
4) Don't have a backup PSU of sufficient wattage, so will buy a new PSU next week. If it turns out I don't have a PSU issue, I'll keep it handy for t-shooting; if the PSU is the issue, it replaces my Corsair AX1200.
5) If the new PSU has the same issues, will seek out another mobo and do a whole mobo replacement - REALLY don't want to do that given the task and how long it will take

Questions for group:

Am I on the right track here? Can anyone offer a rationale for how Prime95 could flush out an issue no other test can? Is it that it taxes the CPU in a way that's more intense than Linpack-based tests, which could stress the 12V rail?

I guess I am confused cause the OCCT "power supply" test should be drawing a lot from the 12V rail and I don't get a crash within mins - that thing ran for an hour with no problem!

There is something that Prime95 is exposing that nothing else is - though I did get that crash in-game with Civ after 20 mins and a few odd ones outta nowhere.
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
The Prime95 crash is a little concerning. I wonder if it is running out of power or overheating.

One test I can think of would be to run the test with one or no video cards, see if it crashes, see what your temps are like.

Also, have you updated the board to the F6 BIOS, which is catered toward the K chips?
 

praktik

Yes it is F6!

I can try with one or the other GPU, just unsure if I can do video from the CPU with the i7 4790k. I'm always on discrete so never cared much if I could (like with my wife's i5 in our HTPC, I know I can run video there)
 

Burpo

Diamond Member
Sep 10, 2013
4,223
473
126
Never a mention of temps.. Have you had a look at CPU fan?
 

praktik

Cause temps are pretty good especially at stock. The OCCT tests push it to 70-80 range, Prime doesn't get a chance to heat it up past that point before crash starts.

Have a Noctua D-14 and its all in a well ventilated Silverstone TJ11

I am thinking a heat issue is unlikely unless i'm getting reported wrong temps
 

Ketchup

I don't have a Noctua D-14, but 80 seems pretty high on stock clocks.
 

praktik

Looks well in range from what I see on other similar platforms. That said, stays hotter and for longer on OCCT without crashing so should I really be worrying that much about temp?
 

praktik

Hey so booted up AIDA64 and did a fresh Prime95 run, and this was the temp when the computer shut off (no BSOD, just straight to OFF and back on, like hitting the reset button, same as all the other Prime95 crashes)

Temps:

CPU 72 °C
CPU Package 80 °C
CPU IA Cores 80 °C
CPU GT Cores 50 °C
CPU Core #1 80 °C
CPU Core #2 77 °C
CPU Core #3 80 °C
CPU Core #4 74 °C

As I understand it these chips are good up to 95 before they start throttling and get into the unsafe zone. Nowhere near that with these...
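For anyone wanting to put a number on "nowhere near", here is a minimal Python sketch that computes the margin for each sensor. It assumes the 95 figure mentioned in this post is the throttle point (that is the poster's understanding, not a verified spec value), and the function and variable names are made up for illustration.

```python
# Readings copied from the AIDA64 snapshot in this post (degrees C).
readings = {
    "CPU": 72,
    "CPU Package": 80,
    "CPU IA Cores": 80,
    "CPU GT Cores": 50,
    "CPU Core #1": 80,
    "CPU Core #2": 77,
    "CPU Core #3": 80,
    "CPU Core #4": 74,
}

# Assumption: 95 C is the throttle point as quoted in the post, not a checked spec.
THROTTLE_LIMIT_C = 95


def headroom(temps, limit):
    """Return how many degrees each sensor sits below the limit."""
    return {name: limit - value for name, value in temps.items()}


margins = headroom(readings, THROTTLE_LIMIT_C)
hottest = max(readings, key=readings.get)  # first sensor with the highest reading

for name, margin in margins.items():
    print(f"{name:13s} {readings[name]:3d} C  headroom {margin:3d} C")
print(f"Hottest sensor: {hottest}, smallest margin: {min(margins.values())} C")
```

Under that assumption the hottest sensors still have 15 degrees of margin at the moment of the shutdown, which fits the argument that heat alone is an unlikely trigger.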
 

Ketchup

praktik said: Looks well in range from what I see on other similar platforms. That said, stays hotter and for longer on OCCT without crashing so should I really be worrying that much about temp?

You are pretty close to throttling temps with an aftermarket cooler running the chip at stock speed. So yeah, it will take hotter, I am just concerned about the numbers you are seeing with the equipment you have.

Next thing that drew my attention was your memory. Try dropping it from 2400 MHz to 1866 or 1600 at stock timings and see if that makes any difference.
 

praktik

Ya probably won't be able to push this one much - feel my thermal paste application job during install shoulda fit the bill - got a good quality one in there.

One day may even want to pursue the idea of taking off the spreader as many do with these CPUs to get at the thermal interface underneath.

But at moment I'm thinking thermally I should be cool enough so that I can worry about doing that when I have the luxury of an otherwise stable system....

So will likely go ahead with the PSU route though will check all cabling and PSU extensions for any signs of issues in one last inspection just in case. I would imagine a loose cable though would cause issues more severe than mine.

Another hour of OCCT passed without incident, temps maxing out 2-3 degrees above the reported temps during Prime95
 

Ketchup

Ketchup said: You are pretty close to throttling temps with an aftermarket cooler running the chip at stock speed. So yeah, it will take hotter, I am just concerned about the numbers you are seeing with the equipment you have.

And I am going to back that off somewhat. I am looking at my numbers on that test and they were only a couple degrees below yours (77-76-75-68). While I would hope yours would have done a little better, it's probably nothing to worry about. However, I would like to hear if the lower memory speed helps at all.
 

praktik

Can confirm the following:
- RAM requires 1.65 V and that is what's set in the BIOS
- changed to 1600 MHz and the crash in Prime95 still happened after only a few mins with the exact same symptoms as before

Was hoping that would do it cause I'm looking at some amount of annoying work swapping PSU....
 

praktik

ARGH!

So have an EVGA 1200 Supernova P2 - 10 year warranty and rock solid platform and cheap price were all drawing points. Getting into my mid-30s now so having an extra PSU around is not as much of a pain in my wallet as it may have been a decade ago.

So that's what I have now! My orig Corsair Ax1200 is now a troubleshooting backup PSU..;)

Replaced the PSU - have some slightly cleaner routing now and reduced cables - but Prime still crashed within the typical 3-4 mins.

I have to think I am experiencing some kind of mobo issue - given the CPU tests passing 6 ways from sunday in all other types except in windows w/ Prime 95 (dont have a boot Prime I can use to confirm crash there but other bootable tests run forever with no problem).

So looking for advice, just a brief recap of process so far:

1) Mem tests run in windows and in boot run forever with no issues
2) same with all CPU tests in boot (stress test options in a bootable CD using linpack (sp?) type stuff from the looks of it, OCCT in windows runs fine)
3) Running loads of 300-850 watts in a variety of these tests caused no crashes in OCCT - Prime draws far less (just over 300). I am looking at the screen readout of my APC backup to get these figures.
4) CPU temps are not near throttling - mid70s-80 @ load and have reached higher temps in OCCT than in Prime95 before the crash.
5) Stressing GPUs at 100% each can go for an hour before I get bored and stop it manually...
6) Disabled one GPU and still same behaviour - Prime crashes comp in 2-4 mins
7) Same behaviour replicated through 3 windows 8.1 reinstalls - last two I did *nothing* except install testing and diagnostic software before going through the tests. No change.
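
The elimination logic in this recap can be sketched roughly in Python. This is a simplification, assuming that any part which was swapped or changed while the Prime95 crash stayed the same can be cleared (disabling one GPU, for example, does not strictly prove the other card is healthy). The suspect labels and the function name are hypothetical, chosen just for illustration.

```python
# Parts considered as possible culprits in the thread.
suspects = {"psu", "gpu", "os_install", "ram", "cpu", "mobo"}

# (what was changed, which suspect it targets, did the Prime95 crash persist?)
# Entries paraphrase the posts above; the suspect labels are hypothetical names.
interventions = [
    ("swapped in a new PSU", "psu", True),
    ("disabled one GPU", "gpu", True),
    ("reinstalled Windows 8.1 three times", "os_install", True),
    ("dropped RAM to 1600 MHz", "ram", True),
]


def remaining_suspects(suspects, interventions):
    """Drop every suspect whose swap/change left the crash unchanged."""
    cleared = {target for _, target, persisted in interventions if persisted}
    return suspects - cleared


print(sorted(remaining_suspects(suspects, interventions)))
```

Only the CPU and the mobo have not been swapped or changed at this point, so they are what the sketch leaves on the list.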

My next steps - looking straight at the mobo. I believe now my problem must either be mobo (strong suspicion) or CPU with weird issue that is invoked only under the Prime condition (weak suspicion).

Both would be under warranty - but wondering now about sticking with Gigabyte or not...

So:

1) Can pursue warranty replacement and wait for shipping from gigabyte - see if replacement mobo does it (maybe they'll make me send this one first)

2) purchase alternative mobo (considering Asus Z97-WS as a bit of an upgrade - the PLX chip is more attractive now that I realize how limited the PCIe lanes are, and I'm not seeing that latency is a real issue any more)

Either way - any other people can offer a culprit here or ever experienced similar bizarre issue? I am leaning towards 2), if it fixes issue seek redress with gigabyte, maybe I can return the board for another product, or get a replacement from them I could use (i guess) as another backup mobo to test with if I ever, god forbid, have another crappy problem like this while I enjoy the Asus!
 

praktik

Might grab a mobo on the way home tonight... anyone else with a tip I should try short of mobo replacement?
 

inachu

Platinum Member
Aug 22, 2014
2,387
2
41
Wait you are overclocking it and calling it a bad motherboard?

Try running the pc at stock native speeds and see how it runs.
 

praktik

Ya it's stock... Been stock through all of this.... Had the tiniest mild overclock which I rolled back immediately as the first step in t-shooting
 

praktik

Ok! Have an Asus Maximus Formula now to swap in - thinking this should resolve things, will report back tomorrow
 

praktik

So far so good - much smoother experience than with the gigabyte, Asus definitely had a more refined product. All is well and prime is not failing at all with the new motherboard.

An expensive bit of troubleshooting this was! One weird issue - I guess I have to chalk it up to a flaky Gigabyte motherboard (when I took the CPU out all the pins looked fine too). Was hoping for more with the "black edition" being more tested and guaranteed, but I suppose anything can happen!