System unstable under load: MB to blame?

dbr1

Member
Jan 23, 2011
Asus Z87 Pro
Intel 4770
G.Skill DDR3-1600 32 GB
Corsair RM650
Asus GTX 660Ti (latest drivers)
Crucial MX100 SSD
Win 7 64

Has become extremely unstable.

BSOD/freezes/crash to startup intermittently but especially under load.

My troubleshooting:

1) Memtest64 with every stick of memory and all together, all checked out.
2) Switched out PSUs with one from a stable system, same problem. Also tested with a PSU test device: pass.
3) New hard drive, fresh install of Windows with new drivers installed, same problem
4) BIOS update to most recent version, same problem
5) Ran Aida64 Extreme stress test and the system crashes within about 3 seconds of hitting 100% CPU load. CPU temperature indicating about 53C when the BSOD hits.
(Idle CPU temps about 28C)
6) Ran StressLinux and seemed OK, max temps on CPU cores at about 61C


Crash dump analysis:
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.


What is the problem? If a driver is the problem, which one? All drivers installed from Asus.

Is the motherboard itself a problem? I think it must be, because all the other components have been swapped into other systems and work OK.

If it is the MB, how can I prove it definitely?

Getting a little crazy here. Thinking of just buying a new MB to see if that fixes it...
 

Burpo

Diamond Member
Sep 10, 2013
Doubtful it's the motherboard. Most likely it is a driver. Just because you used Asus drivers doesn't mean they're correct or current. To find the crashing address you would need to be versed in debugging and in tracking down the violation (or find and change the offending drivers). IRQL_NOT_LESS_OR_EQUAL is a common crash, seen often in the field, and can result from a violation of the IRQL rules the system enforces. Buying a new motherboard shouldn't be the answer.
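To make the "crashing address" point concrete: according to Microsoft's bug check reference, a 0x0000000A (IRQL_NOT_LESS_OR_EQUAL) bugcheck carries four parameters: the address that could not be accessed, the IRQL at the time, a bit field describing the access, and the address of the instruction that made the access. Below is a minimal Python sketch that only unpacks those numbers; the function name and dictionary keys are hypothetical, and the sample values are copied from the 0x0000000a Event ID 1001 entry posted later in this thread. Naming the responsible driver still takes the dump file and a kernel debugger.

```python
# Minimal sketch: unpack the four parameters of bug check 0x0000000A
# (IRQL_NOT_LESS_OR_EQUAL). Field meanings follow Microsoft's bug check
# reference; the function name and dict keys are hypothetical.

IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def decode_irql_not_less_or_equal(p1: int, p2: int, p3: int, p4: int) -> dict:
    """Return a readable summary of a 0xA bugcheck's parameters."""
    return {
        # Parameter 1: virtual address that could not be accessed
        "bad_address": f"0x{p1:016x}",
        # Parameter 2: IRQL at the time of the fault
        "irql": p2,
        "irql_name": IRQL_NAMES.get(p2, "above DISPATCH_LEVEL"),
        # Parameter 3, bit 0: 0 = read, 1 = write
        "access": "write" if p3 & 0x1 else "read",
        # Parameter 3, bit 3: set for an execute access (on CPUs that report it)
        "execute": bool(p3 & 0x8),
        # Parameter 4: address of the instruction that made the access
        "faulting_instruction": f"0x{p4:016x}",
    }


if __name__ == "__main__":
    # Values from the 0x0000000a Event ID 1001 entry in this thread
    info = decode_irql_not_less_or_equal(
        0x00005A800B499C08, 0x2, 0x1, 0xFFFFF800030F9061
    )
    for key, value in info.items():
        print(f"{key}: {value}")
```

The fourth parameter is the "crashing address"; a debugger such as WinDbg maps it to the module it belongs to.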
 

BonzaiDuck

Lifer
Jun 30, 2004

Don't "jump the gun" on the motherboard just yet.

1) Are you overclocking the processor?
2) Does Aida-64 verify the stock RAM settings? Or did you overclock/tweak the RAM -- speed, timings, voltage, et al before you tested it the first time? Or after? Are you sure the RAM is running at the speed for the intended BIOS settings?

3) [IMPORTANT] What is the BSOD stop code? You likely didn't take the time to write it down, since you don't mention it. THEREFORE: Go to Control Panel -> Administrative Tools -> Event Viewer. "Critical" events should be summarized at the top of the listings under "Administrative Events." Trace back to Event ID 41. You should also have, after the reboot, a bugcheck entry associated with the critical stop or BSOD. You should be able to find the stop code in the form "0x00000nnn"; the last three hex digits are generally all that is needed to identify it.

4) Make an inventory of all hardware items and their drivers. Collect the LATEST driver updates from the hardware vendors' websites and put them on a thumb drive.

5) Begin uninstalling and reinstalling all the hardware drivers one at a time.

6) Re-evaluate. This might not be the end of your troubles.

You may want to check for OS corruption, maybe before you undertake the steps I outlined. Run a full CHKDSK on your system/boot drive, and from Windows run SFC /SCANNOW in a CMD window opened with administrative privileges ("Run as administrator").

You may want to begin turning off ("Disable") any hardware features on the motherboard that you don't use at the moment, or which you can reconfigure easily so that they can be turned off.

Also consider (as an alternative to the "latest drivers" advice in step 4) an earlier version of the video driver. Otherwise, drivers are the first thing to check off and eliminate as causes.
 

dbr1

1) Are you overclocking the processor?

No.

2) Does Aida-64 verify the stock RAM settings? Or did you overclock/tweak the RAM -- speed, timings, voltage, et al before you tested it the first time? Or after? Are you sure the RAM is running at the speed for the intended BIOS settings?

Verified: right now the RAM is at stock 1333. It is rated to 1600 with an XMP profile, but I am not using that now; figured I'd keep it simple.


3) [IMPORTANT] What is the BSOD stop code? You likely didn't take the time to write it down, since you don't mention it. THEREFORE: Go to Control Panel -> Administrative Tools -> Event Viewer. "Critical" events should be summarized at the top of the listings under "Administrative Events." Trace back to Event ID 41. You should also have, after the reboot, a bugcheck entry associated with the critical stop or BSOD. You should be able to find the stop code in the form "0x00000nnn"; the last three hex digits are generally all that is needed to identify it.

Slightly confused. In Event Viewer there are multiple critical events, with Event ID 41 Task Category (63). Some have this information:

System
  Provider Name: Microsoft-Windows-Kernel-Power
  Provider Guid: {331C3B3A-2005-44C2-AC5E-77220C37D6B4}
  EventID: 41
  Version: 2
  Level: 1
  Task: 63
  Opcode: 0
  Keywords: 0x8000000000000002
  TimeCreated SystemTime: 2014-10-19T17:41:59.382808500Z
  EventRecordID: 2453
  Execution ProcessID: 4
  Execution ThreadID: 8
  Channel: System
  Computer: 4770-PC
  Security UserID: S-1-5-18
EventData
  BugcheckCode: 26
  BugcheckParameter1: 0x41201
  BugcheckParameter2: 0xfffff680003781c0
  BugcheckParameter3: 0x7c3001017a74d025
  BugcheckParameter4: 0xfffffa8008480d00


and

System
  Provider Name: Microsoft-Windows-Kernel-Power
  Provider Guid: {331C3B3A-2005-44C2-AC5E-77220C37D6B4}
  EventID: 41
  Version: 2
  Level: 1
  Task: 63
  Opcode: 0
  Keywords: 0x8000000000000002
  TimeCreated SystemTime: 2014-10-20T01:55:36.664824900Z
  EventRecordID: 5429
  Execution ProcessID: 4
  Execution ThreadID: 8
  Channel: System
  Computer: 4770-PC
  Security UserID: S-1-5-18
EventData
  BugcheckCode: 270
  BugcheckParameter1: 0x1f
  BugcheckParameter2: 0xfffff8a00cad9980
  BugcheckParameter3: 0x0
  BugcheckParameter4: 0x4db0a
  SleepInProgress: false
  PowerButtonTimestamp: 0

etc. Not sure if this is the useful info you were talking about.


4) Make an inventory of all hardware items and their drivers. Collect the LATEST driver updates from the hardware vendors' websites and put them on a thumb drive.

In progress, along with the rest of your suggestions.

Thanks, will update.

I will not click 'Buy' on any new MBs yet!
 

BonzaiDuck

See if you can find a (red) error Event ID 1001, likely time-stamped in the same window of time that includes an Event ID 41 (critical) error.

The description in the "General" tab should say "The computer has rebooted from a bugcheck. The bugcheck was: 0xNNNNNNNN", where each N is a hex digit (0-9, A-F). That should be the stop code for the BSOD.
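One detail that may explain the confusion over the Event ID 41 entries posted above: the Kernel-Power event reports BugcheckCode as a decimal number, while the Event ID 1001 text prints it in hex. The thread's own data bears this out: 26 and 270 convert to 0x1A and 0x10E, and those two codes appear in the Event ID 1001 list with the same first parameters (0x41201 and 0x1f). A minimal Python sketch of the conversion; the helper name is hypothetical.

```python
# Minimal sketch: Event ID 41 (Kernel-Power) stores BugcheckCode in decimal,
# while Event ID 1001 prints it as 0xNNNNNNNN. The helper name is hypothetical.


def to_stop_code(bugcheck_code: int) -> str:
    """Format a decimal BugcheckCode as an 8-digit hex stop code."""
    return f"0x{bugcheck_code:08x}"


if __name__ == "__main__":
    # BugcheckCode values from the two Event ID 41 entries in this thread
    for code in (26, 270):
        print(code, "->", to_stop_code(code))
```

So each Event ID 41 entry and its matching Event ID 1001 entry appear to be the same crash logged twice, not two separate crashes.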
 

Burpo

Check Device Manager for more than one audio device. If so, disable the audio drivers (one at a time) and check for crashes.
 

redzo

Senior member
Nov 21, 2007
1) Memtest64 with every stick of memory and all together, all checked out.

Sometimes you have to run "memtest86+" for days in order to identify memory (RAM) corruption or malfunction. Memtest86 (the open-source one or the commercial one) is the only reliable way to test RAM.
Try using the PC with only one RAM module installed. If you don't experience BSODs anymore, you'll know where to look next.
 

dbr1

I found the following Error events with Event ID 1001:

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000000a (0x00005a800b499c08, 0x0000000000000002, 0x0000000000000001, 0xfffff800030f9061). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: .

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000003b (0x00000000c0000005, 0xfffff80002eefb8a, 0xfffff8800b4497a0, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: .

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000001a (0x0000000000005003, 0xfffff70001080000, 0x00000000000095f3, 0x000000002fd08009). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: .

The computer has rebooted from a bugcheck. The bugcheck was: 0x000000d1 (0xfffff88016a0a208, 0x0000000000000007, 0x0000000000000000, 0xfffff8800f1e1804). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 101914-19422-01.

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000010e (0x000000000000001f, 0xfffff8a00cad9980, 0x0000000000000000, 0x000000000004db0a). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: .

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000001a (0x0000000000041201, 0xfffff680003781c0, 0x7c3001017a74d025, 0xfffffa8008480d00). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: .

Your computer was not assigned an address from the network (by the DHCP Server) for the Network Card with network address 0x240A641D496A. The following error occurred: 0x79. Your computer will continue to try and obtain an address on its own from the network address (DHCP) server.

So it looks like the Stop codes are:

0000000a
0000003b
0000001a
000000d1
0000010e
0000001a

That's 6 crashes with 5 different codes (0000001a shows up twice). Does that mean these crashes over the last couple of days all come from different causes?

Where do you go from here with this data?
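A minimal Python sketch for tallying the Event ID 1001 messages: a regular expression pulls out the stop code and its four parameters, and a counter groups the crashes by code. The names in `STOP_NAMES` come from Microsoft's bug check code reference; the table, the function names, and the shortened message strings (the leading "The computer has rebooted from a bugcheck." sentence and the dump path are trimmed) are illustrative.

```python
import re
from collections import Counter

# Names from Microsoft's bug check code reference (only codes seen in this thread)
STOP_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x10E: "VIDEO_MEMORY_MANAGEMENT_INTERNAL",
}

# Matches "The bugcheck was: 0x0000000a (p1, p2, p3, p4)"
BUGCHECK_RE = re.compile(r"bugcheck was: (0x[0-9a-fA-F]+) \(([^)]*)\)")

# Shortened copies of the six Event ID 1001 messages posted in this thread
SAMPLE_MESSAGES = [
    "The bugcheck was: 0x0000000a (0x00005a800b499c08, 0x0000000000000002, 0x0000000000000001, 0xfffff800030f9061).",
    "The bugcheck was: 0x0000003b (0x00000000c0000005, 0xfffff80002eefb8a, 0xfffff8800b4497a0, 0x0000000000000000).",
    "The bugcheck was: 0x0000001a (0x0000000000005003, 0xfffff70001080000, 0x00000000000095f3, 0x000000002fd08009).",
    "The bugcheck was: 0x000000d1 (0xfffff88016a0a208, 0x0000000000000007, 0x0000000000000000, 0xfffff8800f1e1804).",
    "The bugcheck was: 0x0000010e (0x000000000000001f, 0xfffff8a00cad9980, 0x0000000000000000, 0x000000000004db0a).",
    "The bugcheck was: 0x0000001a (0x0000000000041201, 0xfffff680003781c0, 0x7c3001017a74d025, 0xfffffa8008480d00).",
]


def parse_bugchecks(messages):
    """Yield (stop_code, [parameters]) for every message that matches."""
    for msg in messages:
        match = BUGCHECK_RE.search(msg)
        if match:
            code = int(match.group(1), 16)
            params = [int(p, 16) for p in match.group(2).split(", ")]
            yield code, params


def tally(messages):
    """Count crashes per stop code."""
    return Counter(code for code, _ in parse_bugchecks(messages))


if __name__ == "__main__":
    counts = tally(SAMPLE_MESSAGES)
    print(f"{sum(counts.values())} crashes, {len(counts)} distinct stop codes")
    for code, n in counts.most_common():
        print(f"0x{code:08x} {STOP_NAMES.get(code, 'unknown')}: {n}")
```

A spread of unrelated stop codes like this does not by itself identify one cause; it is simply the list the rest of the thread works from.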
 

Burpo

If you don't want to try disabling drivers one at a time, then a fresh install of Windows is in order. Something isn't right with your Windows configuration. Have you tried booting to "Last known good configuration"?
 

Ketchup

Elite Member
Sep 1, 2002
Do you have any FireWire drivers/hardware on this computer?

Are you using the SATA connections connected to the ASMedia controller for anything?

Are you using the wifi card that comes with the motherboard?
 

dbr1

Do you have any FireWire drivers/hardware on this computer?

No

Are you using the SATA connections connected to the ASMedia controller for anything?

No

Are you using the wifi card that comes with the motherboard?

Is installed/available, but not connected.
 

dbr1

If you don't want to try disabling drivers one at a time, then a fresh install of Windows is in order. Something isn't right with your Windows configuration. Have you tried booting to "Last known good configuration"?

Have disabled all audio devices except one.

Will now go ahead and disable all devices that are not essential and see if that solves it.

Update: disabled all unnecessary devices, still crashes.
 

dbr1

If you don't want to try disabling drivers one at a time, then a fresh install of Windows is in order. Something isn't right with your Windows configuration. Have you tried booting to "Last known good configuration"?


This is a fresh install after the last one had the same problems.
 

Burpo

And if you start in Safe Mode, does it crash? If not, it's a driver. Start over & don't load any drivers. Add them one at a time & find which one is causing the problem.
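The add-one-at-a-time procedure can be sketched as a short loop. This is a minimal Python sketch under loud assumptions: `is_stable` is a hypothetical callback standing in for what is really a manual reboot plus a stress-test run (for example, the Aida64 stability test) with a given set of drivers installed, and the driver names in the usage example are made up.

```python
from typing import Callable, Iterable, Optional, Set


def find_offending_driver(
    drivers: Iterable[str],
    is_stable: Callable[[Set[str]], bool],
) -> Optional[str]:
    """Add drivers one at a time and return the first one whose addition
    makes the stability test fail, or None if every step passes.

    `is_stable` is a hypothetical stand-in for a reboot plus a stress-test
    run with the given set of optional drivers installed.
    """
    # Baseline: if the system already fails with no optional drivers,
    # the loop below would wrongly blame whichever driver came first.
    if not is_stable(set()):
        raise RuntimeError("unstable even with no optional drivers loaded")

    loaded: Set[str] = set()
    for driver in drivers:
        loaded.add(driver)
        if not is_stable(set(loaded)):
            return driver
    return None


if __name__ == "__main__":
    # Made-up driver names and a fake test in which "audio" causes crashes
    candidates = ["chipset", "lan", "audio", "gpu"]
    print(find_offending_driver(candidates, lambda loaded: "audio" not in loaded))
```

The baseline check is the part that matters here: a system that still crashes before any optional driver is added points away from the add-on drivers rather than at whichever one happens to be tried first.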
 

dbr1

And if you start in Safe Mode, does it crash? If not, it's a driver. Start over & don't load any drivers. Add them one at a time & find which one is causing the problem.

Just tried a test run of Aida64 'System Stability Test' in safe mode.

Crashed after about 3 minutes.

What does this tell us?
 

Burpo

Something's not right? lol. It may be the motherboard. Did you try the CPU in a different board?
 

dbr1

Sometimes you have to run "memtest86+" for days in order to identify memory (RAM) corruption or malfunction. Memtest86 (the open-source one or the commercial one) is the only reliable way to test RAM.
Try using the PC with only one RAM module installed. If you don't experience BSODs anymore, you'll know where to look next.

I did run with only one RAM module at a time, made no difference.
 

dbr1

Something's not right? lol. It may be the motherboard. Did you try the CPU in a different board?

No, I haven't tried the CPU in another board yet; I don't have a spare one around. Maybe I'll just buy a new one anyway.

Question is, if it is the motherboard, is there a way to prove it?

Other than buying another unit of the exact same motherboard, dropping it in with the exact same configuration, and seeing if it works? (Maybe this isn't a bad plan; I could just return it if it fails too...)
 

Ketchup

Have you tested it with the CPU graphics only (video card and its drivers removed)?
 

dbr1

Problem apparently diagnosed!!! (but not solved exactly)

I took the whole system and plugged it into an outlet on a different circuit, right next to the house's main service panel. It has been running the Aida64 Extreme stress test for about 3 hours now at 100% CPU utilization. The previous record was only a few minutes before a crash.

So the problem apparently has nothing to do with the MB, the Windows install, or anything else, I guess, but is an issue with the power supplied TO the power supply.

Any ideas how to fix this? Would a UPS help? Is this a problem with voltage fluctuations or what?


UPDATE: Crashed after about 3 hours of the Aida64 stability test. I still have to figure that the power quality from the outlet is the problem, right? It went from crashing within a few minutes of that stress test to running it for hours.
 

redzo

If you suspect the PSU, it's important to test with another one. It's the only way you can tell if that's the cause. I would not advise you to buy components at this point, because you do not know the cause of the issue and it could be a financial loss.
If you still cannot find a spare PSU, you can at least try to minimize the PSU load by removing the dedicated GPU (660 Ti) and using the onboard one.
 

dbr1

If you suspect the PSU, it's important to test with another one. It's the only way you can tell if that's the cause. I would not advise you to buy components at this point, because you do not know the cause of the issue and it could be a financial loss.
If you still cannot find a spare PSU, you can at least try to minimize the PSU load by removing the dedicated GPU (660 Ti) and using the onboard one.

No, I don't think it's the PSU. I think the problem has to do with the power outlet I'm plugging into; see my last post.

I have in fact switched PSUs and it made no difference.