
Need help troubleshooting intermittent crashes/BSODs!

hybrid2d4x4

Junior Member
Hello all. A computer I built for a friend is acting up intermittently (but often enough to be bothersome). The obvious symptoms are background app/process crashes, usually right after boot, and BSODs (esp. while shutting down, at the "Windows is Shutting Down" screen). Basically it's a lottery whenever you power it on. Sometimes everything works perfectly; other times you get odd app behavior and crashes and you have to try again. Before I go on, here are the specs:

Gigabyte P55M-UD2 (F8 BIOS- latest as of a week ago)
i5 750 @ stock speeds, auto voltage; cooling via CoolerMaster 212+
2x2 G.Skill Ripjaws rated for 1333MHz 8-8-8-21@1.5V but running 9-9-9-24@1.5V
ATI 4770
WD 160GB SATA (old)
LG IDE CD burner (old)
Antec 300 case /w Antec EarthWatts 380W PSU
Temps are really good: idle- high 20s for CPU, ~45 GPU; load- high 40s CPU, low-mid 50s GPU /w vRAM+VRMs in the 60s. Ambient= 18-20degC


Currently running Win7 HP x64, but the issues occurred on the previously installed WinXP OS, so I'm leaning towards hardware issues.
I spent most of one entire day doing HDD diags including full surface tests, mostly using WD's Data Lifeguard utilities, and there was not a single error.
I ran the Windows Memory Diagnostic (extended test, 10 loops) and it was going fine for ~8hrs until I plugged my iPod into the USB port to charge and it did a hard reboot. Reseated RAM, etc. at this point.
I've been running Memtest86+ v4.00 for the past few days, but it's been inconsistent. Initially, running both sticks threw errors within 3 hours. Running one stick at a time was error-free for 3.5-4hrs for both sticks. Running both threw errors, especially when trying to run the 2 sticks in single-channel mode (in fact it didn't even detect the 2nd stick when running them in the 2 adjacent middle slots, #'d 1&4). Tried reseating the CPU and ran the P95 torture test to make sure the heatsink seating is ok (for temps). Temps were great, but one thread failed in ~1hr.
Went back to running Memtest, and it went for a lil over 8hrs on both sticks without errors before I called it a night. The next day I let it go at it again and it survived over 20hrs in Memtest without any errors.
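One way to make the inconsistency easier to see is to tally the Memtest runs above per configuration. This is only a rough sketch: the labels "stick 1" and "stick 2" are made up for illustration, the hours are the approximate lower bounds quoted above, and `None` marks a run whose duration wasn't stated.

```python
from collections import defaultdict

# (configuration, approx. hours run, errors seen), taken from the runs described above.
# Labels are hypothetical; hours are rough lower bounds; None = duration not stated.
runs = [
    ("both sticks", 3, True),
    ("stick 1 alone", 3.5, False),
    ("stick 2 alone", 3.5, False),
    ("both sticks, single channel", None, True),
    ("both sticks", 8, False),
    ("both sticks", 20, False),
]

def summarize(runs):
    """Count passing/failing runs and total error-free hours per configuration."""
    summary = defaultdict(lambda: {"passes": 0, "fails": 0, "clean_hours": 0.0})
    for config, hours, errors in runs:
        entry = summary[config]
        if errors:
            entry["fails"] += 1
        else:
            entry["passes"] += 1
            entry["clean_hours"] += hours or 0.0
    return dict(summary)

for config, stats in summarize(runs).items():
    print(f"{config}: {stats['passes']} pass, {stats['fails']} fail, "
          f"{stats['clean_hours']:.1f} error-free hours")
```

A configuration that both passes and fails across runs (as "both sticks" does here) points to something intermittent rather than a cleanly bad stick, which is why swapping in known-good RAM is the more decisive test.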

This inconsistent behavior in benchmarks/diags is pissing me off. Other than testing with known-good RAM (which I'll hopefully be able to do if a friend "trades" me their RAM for a few days), I'm starting to run out of ideas. It looks to me like it could be the RAM, mobo or CPU, but can't really tell which. Anyone have any further ideas or any CPU-specific benchmarks/diags to run (preferably outside an OS environment)? Any help would be appreciated...
 
Antec EarthWatts 380W PSU

I'm thinking this may be a little on the light side for this computer. Booting up is usually a pretty high load situation as everything is spinning up all at once. If you have access to a higher rated PS try that out in it and see if it helps. I haven't used anything less than 450 in over 5 years. In Anand's review of that board they used a 750 watt supply in their testbed. Not saying it needs that much, but 380 just seems a bit light to me for running a processor of that caliber. The behaviour you are describing would dovetail with power-related instability, so I would definitely explore that as a possible cause.
 
Hey Hybrid, I think you are on the right track. I have had a bad processor before (P4 2.8c), and it took 6 months of insanity to find the problem. Unfortunately, it is practically impossible to isolate which piece of hardware is the problem by means of software. I do not understand why good diagnostic software has not been created and made available by more hardware engineering companies.

I was able to borrow memory and a video card from a friend in the very next college dorm room, but he wasn't really willing to let me switch CPUs (they were compatible), which would have done it. I spent a ton of time on the phone with Kingston (their customer service is insanely good - HyperX series customers get escalated straight to T3 tech support), RMAed my Asus MB, etc. I finally noticed that the crashes occurred less often when I turned off HT, then I paid some local shop $50 to switch the CPU and mess around a bit and see if the problem remained.

Anyways, I just checked again, and there still doesn't seem to be any CPU testing software. You could run the OCCT PSU test for a while, which should reliably crash the computer if it is a power issue (not the most likely). Otherwise you're pretty much stuck with borrowing or buying some more memory to switch and test with.

I think Memtest is reliable, as in it will almost always produce errors with bad memory, and never false positives. Same with the WD diagnostics: if they all passed, you can surely rule out the HDD. Sucks that it isn't even your system, but that comes with doing work for other people.

I don't know what you're using to test/monitor your GPU core temp with, but I doubt it stays in the low 50s under real full load. I have a Sapphire HD4770 with 'Arctic Cooling' and it gets up to 68 at stock. I'm actually having problems with my computer since I put it in, still trying to figure out whether it is video memory problem or drivers or what (I might have damaged the MB when I installed it!).
 
I'm thinking this may be a little on the light side for this computer. Booting up is usually a pretty high load situation as everything is spinning up all at once. If you have access to a higher rated PS try that out in it and see if it helps.

Thanks for the input, though it never has problems powering up/POSTing which is the high load situation you describe. I have a Watt-meter and while I haven't tested this system's draw, based on the draw of my 2 systems, I'd guess it idles at around 65-85W and peaks between 150 and 200W, so 380W should provide more than enough headroom. The thing that makes me slightly suspect the PSU (or case grounding?) is the situation where plugging in a USB device caused a hard reboot.
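For what it's worth, the headroom math behind that guess can be sketched like this. The 150-200W peak figures are my estimates from above, not measurements of this system, and `headroom` is just an illustrative helper.

```python
def headroom(psu_watts, peak_draw_watts):
    """Return the fraction of the PSU's rated capacity left unused at peak draw."""
    return 1.0 - peak_draw_watts / psu_watts

PSU_WATTS = 380  # Antec EarthWatts rating

# Guessed peak draw range for this system (estimate, not a measurement).
for peak in (150, 200):
    print(f"peak {peak} W -> {headroom(PSU_WATTS, peak):.0%} headroom")
```

A wall meter reads AC draw, which includes the PSU's conversion losses, so the actual DC load on the PSU would be somewhat lower than these numbers. Total wattage also says nothing about per-rail limits or a unit that's simply faulty, so this only shows that the rating itself shouldn't be the bottleneck.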


Thanks for the insights, Athadeus. I'll play around with BIOS settings such as the Turbo boost, VT and all the "extra" features and see if that makes any difference. It'd be nice if I could find a diagnostic tool that gave me consistent results though, so I'd know if the changes I make actually make a difference...
Athadeus said:
I don't know what you're using to test/monitor your GPU core temp with, but I doubt it stays in the low 50s under real full load. I have a Sapphire HD4770 with 'Arctic Cooling' and it gets up to 68 at stock. I'm actually having problems with my computer since I put it in, still trying to figure out whether it is video memory problem or drivers or what (I might have damaged the MB when I installed it!).
The GPU temp is what shows up in GPU-Z, SpeedFan, and (I think) RealTemp. The 4770 is an XFX model with the dual-slot cooler/blower fan (like on reference 4870s) that exhausts outside the case. It's not the quietest, but it's effective at keeping temps down. I also flashed the VGA BIOS with a custom temp/fan speed lookup table, so it's specifically geared to ramp up more progressively than the stock setup, where the fan stays at 30% until the temp hits a certain threshold (either 50 or 55 at the core) and then jumps to 56%. Case has very good airflow as well. Though to be fair, I took the measurements while running the FarCry2 benchmarks in loops, not something unrealistically stressful like RTHDR (or whatever it was called) or Furmark.

By the way, these intermittent issues have been going on pretty much for 6 months as well. Also, probably unrelated, the ethernet on the mobo hasn't worked since day 1. If I can narrow down the instability to CPU or MOBO by means of testing with known-good RAM, do you guys think that we'd be able to get the MOBO replaced under warranty for the definite Ethernet issue + potential cause of system instability?
 
Seriously, you built that system for your friend and gave it to him with an inoperable NIC? What were you thinking? Didn't that make you think there might be a problem with the motherboard? Why would you think that the fact the NIC has never worked since day one would be unrelated?
 