Help with sporadic BSODs

kentatihc

Junior Member
Aug 12, 2004
Hello,

This is in reference to my posting on the anandtech.com troubleshooting message board ( http://forums.anandtech.com/messageview.cfm?catid=32&threadid=1373231&enterthread=y )

To sum up: over the summer, my computer has been spontaneously rebooting at startup, though only occasionally (1-2 times a week; sometimes 2 weeks pass between incidents). I have completely restored the system and have installed very little since, to see how it behaves. I haven't received the "double fault" exception again, which was the one I was worried about, but I still get the occasional BSOD and reboot on startup.

Here is my story since the above posting:

The first time I restored the system, I got a BSOD while installing my printer, and the machine rebooted automatically. From that point on, Windows would BSOD on every startup with a variety of STOP errors, including STOP 0x000000CD, STOP 0x00000050, and STOP 0x0000008E. I even got a STOP 0x0000007F Double Fault exception. I chose to restore the "last known good configuration" of Windows and was able to get back into Windows with no problems. I restored again anyway, just to be sure.

I quit using Windows Verifier because it seemed to make the system less stable overall. I also removed the aspi32 driver, because it is an unsigned driver that XP does not use and it is known to cause problems with XP. I installed all of the updates from Windows Update; all of them were non-critical, such as the Windows Journal Viewer. I experienced a couple of lockups and BSODs last Saturday (8/14), but I was using Windows Verifier at the time, which seems to make the system less stable. (I have stopped using Verifier to check drivers.)

On Saturday (8/21) I received a BSOD for the first time since the previous Saturday (8/14). It was a STOP 0x00000050 PAGE_FAULT_IN_NONPAGED_AREA.

I have been using my computer a lot over the last couple of weeks to try to reproduce these errors; it has been running at least 18 hours each day. The errors seem to occur only when the computer is first starting up, anywhere from immediately at startup to 30 seconds after the Windows desktop appears.

I have tried each of the Corsair 3200 LLPRO 512 MB memory modules individually to see if either of them is the problem, and I have now received a BSOD with each of them installed on its own. I have also been running Memtest86 overnight for the last week, and it has never reported an error.

I have no way of checking the MB or CPU, however. To sum up the errors / activity over the last couple of weeks:

Aug 11, 2004: BSOD at startup, before the Windows logo even appeared on the screen. It was a STOP 0x000000CD PAGE_FAULT_BEYOND_END_OF_ALLOCATION (8715b000, 00000000, 805204bb, 00000000)

Aug 11, 2004: I removed the aspi32 driver from the system, and set verifier to monitor the only other unverified driver on the system, symlcbrd.sys (This is a Norton Systemworks driver).

Aug 14, 2004: The computer booted to Windows, but whenever the mouse hovered over the taskbar the cursor showed an hourglass and I couldn't do anything. I could click on desktop icons, but then the computer hung. When I rebooted into Windows, I got a BSOD STOP 0x0000008E (0xC0000005, 0x80608EC5, 0xA9905980, 0x00000000)
I also had two lockups (but no errors) when booting to the desktop that day.
Finally, I removed one of the memory modules, and later that night I got a BSOD STOP 0x0000000A (0x80FFFFE8, 0x00000002, 0x00000001, 0x8051617B) while running a Norton AntiVirus scan. Apart from the two "Double Fault" exceptions I received about a month or two ago, this is the only BSOD I have gotten after Windows had finished booting. Because I had Windows Verifier checking the Norton driver, I assumed that could have caused it, so I disabled Verifier and haven't used it since.

Aug 15, 2004: Ran Windows Update and installed everything available (all non-critical updates). Downloaded and installed updated modem and sound drivers (all other drivers are up to date). An interesting note here: the newest drivers for my sound card (Sound Blaster Audigy 2 ZS Platinum Pro) added several unsigned drivers to the system (ctac32k.sys, ctaud2k.sys, ctoss2k.sys, ctprxy2k.sys, ctsfm2k.sys, emupia2k.sys, ha10kx2k.sys, and hap16v2k.sys).
I also swapped my memory modules.

Aug 16, 2004: No errors, but once when I shut the computer down, a program was not responding; I don't remember which one.

Aug 17, 2004: No errors, but when I shut down the computer, ccApp.exe (a Norton file) was not responding.

Aug 21, 2004: Received a BSOD once the computer loaded into Windows: STOP 0x00000050 (0xF7FA9AF7, 0x00000001, 0x8059E7D6, 0x00000000) PAGE_FAULT_IN_NONPAGED_AREA.
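The STOP codes in this timeline are easier to compare side by side once their parameters are labeled. Below is a minimal Python sketch that does this for the two codes that carry a read/write flag. The symbolic names are the ones in Microsoft's bug check reference; the parameter layouts (0x0A: address, IRQL, read/write flag, instruction address; 0x50: address, read/write flag, instruction address) are assumed from that reference as it applies to XP-era systems, and `normalize` and `describe` are hypothetical helpers, not part of any Windows tool.

```python
# Symbolic names for the STOP codes quoted in this thread (Microsoft bug check reference).
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x4E: "PFN_LIST_CORRUPT",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x8E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0xCD: "PAGE_FAULT_BEYOND_END_OF_ALLOCATION",
}


def normalize(code: int) -> int:
    """Clear the 0x10000000 flag, so 0x1000008E is treated like 0x8E."""
    return code & ~0x10000000


def describe(code: int, params: tuple) -> str:
    """Hypothetical helper: one readable line per crash (XP-era layouts assumed)."""
    code = normalize(code)
    name = BUGCHECK_NAMES.get(code, f"STOP 0x{code:08X}")
    p1, p2, p3, p4 = params
    if code == 0x0A:
        # P1 = address referenced, P2 = IRQL, P3 = 0 read / 1 write, P4 = faulting instruction
        access = "write" if p3 & 1 else "read"
        return f"{name}: {access} of 0x{p1:08X} at IRQL {p2} by instruction 0x{p4:08X}"
    if code == 0x50:
        # P1 = address referenced, P2 = 0 read / 1 write, P3 = faulting instruction
        access = "write" if p2 & 1 else "read"
        return f"{name}: {access} of 0x{p1:08X} by instruction 0x{p3:08X}"
    # Other codes: print the raw parameters only.
    raw = ", ".join(f"0x{p:08X}" for p in params)
    return f"{name} ({raw})"


if __name__ == "__main__":
    # The Aug 14 and Aug 21 crashes from the timeline above.
    print(describe(0x0A, (0x80FFFFE8, 0x2, 0x1, 0x8051617B)))
    print(describe(0x50, (0xF7FA9AF7, 0x1, 0x8059E7D6, 0x0)))
```

For the Aug 14 and Aug 21 crashes it reports a write to 0x80FFFFE8 at IRQL 2 and a write to 0xF7FA9AF7. Labeling the parameters does not identify the responsible driver; that still takes the kernel debugger (`!analyze -v` in WinDbg) run against the minidumps.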

One other interesting note: each time I set up Norton after restoring, Norton warns me for the first 3-4 reboots that something is attempting to change the Norton settings. The system scans free of viruses. I have downloaded and run Stinger, and I have run the Norton online virus scan, with no problems reported. I have also scanned the hard drive on a friend's computer, and I have scanned all of the backup CDs I have made over the last 3 months on my computer, my friend's computer, and my computer at work; no virus was detected.

Also, Windows XP SP2 is not yet available to me; I just checked Windows Update.

The fact that the problem almost always happens at startup, and only rarely, suggests a driver problem to me, but the variety of STOP errors and the "double fault" exceptions suggest a hardware problem. It happens with either memory module installed, so I don't think it is the RAM. Any ideas? I can provide some of the minidump files if needed.

Thanks in advance for any information that you can provide!

System Information:

Asus P4C800-E Deluxe Motherboard
Pentium 4 3.2 GHz Processor
PowerColor Radeon 9800XT Video Card
2 Corsair 3200 LLPRO 512 MB RAM
Sound Blaster Audigy 2 ZS Platinum Pro Sound Card
Antec TrueBlue 480 Watt Power Supply
2 Western Digital 200 GB 7200 RPM HDDs in RAID 0 configuration
Plextor PX-708a DVD-RW Drive
Broadxent V.92 PCI Faxmodem
Lite On LTR-52327S CD-RW Drive
Windows XP Professional SP1

Thank you again,

Kent
 

dclive

Elite Member
Oct 23, 2003
The other errors could be lots of things. Check out my sig, read through it, and do all of those steps (particularly driver updates and anything Windows Update presents to you, though it sounds like you've done this already). If the problems persist *after* you've done the updates, run MPS Reports (also found at that URL) and mail me the minidumps and .CAB file.

(Mostly cut & pasted from your other post...) :)
 

kentatihc

Junior Member
Aug 12, 2004
Thank you for your reply dclive!

I followed the advice from your previous post (thanks again!), and on Saturday I emailed you the files your troubleshooting post asks for. Let me know if you did not get them and I will try sending them again.

Thanks for all of your help!

- Kent
 

dclive

Elite Member
Oct 23, 2003
Guilty as charged! My e-mail had big problems this weekend. I'm sorry for the inconvenience. If you sent anything, please re-send. The latest MPS Reports and the latest dumps (again, made *after* all the driver updates) are always appreciated.
 

kentatihc

Junior Member
Aug 12, 2004
I don't have the files with me at work, so I will send them first thing when I get home tonight. Thanks ;-)

- Kent
 

kentatihc

Junior Member
Aug 12, 2004
I will update the BIOS after posting this information. Unfortunately, I don't have the minidump from 8/14; I was experimenting with the settings and minidumps weren't being created that day. I do have several other minidump files from these crashes, however. I will dig them up and get them to you!

Thanks for the info!

- Kent
 

dclive

Elite Member
Oct 23, 2003
Let's hold off - I'd really just like to see anything *after* all the other BIOS/driver/etc. changes were done....
 

kentatihc

Junior Member
Aug 12, 2004
I will wait until the next BSOD then. Just to add some background to the problem, I checked out all of the minidump files before I restored the computer and here are the dates / errors:

3/28/2004 1000008E (c0000005, bf9ddd18, a4872bec, 0) KERNEL_MODE_EXCEPTION_NOT_HANDLED

5/7/2004 1000008E (c0000005, 805386ce, 9efc3be0, 0) KERNEL_MODE_EXCEPTION_NOT_HANDLED

6/19/2004 1000008E (c0000005, 9ff742af, 8514f4a8, 0) KERNEL_MODE_EXCEPTION_NOT_HANDLED

6/20/2004 0000000A (00000003, 0000001c, 00000000, 804f5715) IRQL_NOT_LESS_OR_EQUAL

7/4/2004 0000000A (00000003, 0000001c, 00000000, 804f5715) IRQL_NOT_LESS_OR_EQUAL

7/9/2004 0000000A (00000003, 0000001c, 00000000, 804f5715) IRQL_NOT_LESS_OR_EQUAL

7/20/2004 0000007F (00000008, 80042000, 00000000, 00000000) UNEXPECTED_KERNEL_MODE_TRAP (DOUBLE FAULT)

7/23/2004 0000007F (00000008, f793bd70, 00000000, 00000000) UNEXPECTED_KERNEL_MODE_TRAP (DOUBLE FAULT)

7/25/2004 0000000A (00000003, 0000001c, 00000000, 804f5715) IRQL_NOT_LESS_OR_EQUAL

So it started off happening very rarely (more than a month apart), then more often. I find it interesting that the 6/20, 7/4, 7/9, and 7/25 0000000A errors have exactly the same four parameters every time, but I don't know whether that is relevant. Also, the double fault exceptions happened within 3 days of each other (they were the only ones that happened after the computer had already started up and had been running for a while), and they have not recurred yet...
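The observation about the repeated 0x0A parameters can be checked mechanically once the list is transcribed. A minimal Python sketch, assuming the values are copied exactly as posted (with the 9-digit first parameter of the 0x8E entries read as 0xC0000005, since a 32-bit value has 8 hex digits); `repeated_signatures` and `dates_for` are illustrative names only:

```python
from collections import Counter

# The pre-restore minidumps listed above as (date, STOP code, parameters).
# 0x1000008E is recorded as 0x8E, which has the same meaning.
CRASHES = [
    ("3/28/2004", 0x8E, (0xC0000005, 0xBF9DDD18, 0xA4872BEC, 0x0)),
    ("5/7/2004", 0x8E, (0xC0000005, 0x805386CE, 0x9EFC3BE0, 0x0)),
    ("6/19/2004", 0x8E, (0xC0000005, 0x9FF742AF, 0x8514F4A8, 0x0)),
    ("6/20/2004", 0x0A, (0x3, 0x1C, 0x0, 0x804F5715)),
    ("7/4/2004", 0x0A, (0x3, 0x1C, 0x0, 0x804F5715)),
    ("7/9/2004", 0x0A, (0x3, 0x1C, 0x0, 0x804F5715)),
    ("7/20/2004", 0x7F, (0x8, 0x80042000, 0x0, 0x0)),
    ("7/23/2004", 0x7F, (0x8, 0xF793BD70, 0x0, 0x0)),
    ("7/25/2004", 0x0A, (0x3, 0x1C, 0x0, 0x804F5715)),
]


def repeated_signatures(crashes):
    """Return {(code, params): count} for every exact signature seen more than once."""
    counts = Counter((code, params) for _, code, params in crashes)
    return {sig: n for sig, n in counts.items() if n > 1}


def dates_for(crashes, signature):
    """List the dates on which a given (code, params) signature occurred."""
    return [date for date, code, params in crashes if (code, params) == signature]


if __name__ == "__main__":
    for (code, params), n in repeated_signatures(CRASHES).items():
        hex_params = tuple(hex(p) for p in params)
        print(f"STOP 0x{code:02X} {hex_params} x{n}:", dates_for(CRASHES, (code, params)))
```

It finds exactly one repeated signature, the 0x0A crash, on all four dates; the 0x8E and 0x7F crashes each have different parameters. An exactly repeating signature is often read as a sign of one consistent code path rather than random corruption, though by itself it does not say which driver or component is responsible.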

I will follow up with any other BSODs that I get and I will upgrade to SP2 when it is available to me. If the above provides any further clues, please let me know!

- Kent
 

dclive

Elite Member
Oct 23, 2003
Stop 7F can be a BIOS issue with an out-of-date BIOS, particularly with Pentium 4 machines with hyperthreading. http://support.microsoft.com/?id=842465 matches up fairly well.
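Both 0x7F dumps in the list above have 0x00000008 as their first parameter, which for this bug check is the trap number. A small lookup sketch, assuming the trap values listed for STOP 0x7F in Microsoft's bug check reference; `trap_name` is a hypothetical helper:

```python
# First parameter of STOP 0x7F (UNEXPECTED_KERNEL_MODE_TRAP) is the trap number.
# Only the values named in Microsoft's bug check reference are listed here.
TRAPS = {
    0x0: "divide by zero error",
    0x4: "overflow",
    0x5: "bounds check fault",
    0x6: "invalid opcode",
    0x8: "double fault",
}


def trap_name(params: tuple) -> str:
    """Name the trap in a STOP 0x7F parameter tuple (hypothetical helper)."""
    trap = params[0]
    return TRAPS.get(trap, f"trap 0x{trap:X} (not in this table)")


if __name__ == "__main__":
    # The 7/20 and 7/23 crashes from the minidump list.
    for params in [(0x8, 0x80042000, 0x0, 0x0), (0x8, 0xF793BD70, 0x0, 0x0)]:
        print(trap_name(params))
```

Both dumps decode to a double fault, so they are the same kind of event rather than two unrelated traps. The table covers only the common trap numbers; anything else is reported raw.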

It sounds like you're having a variety of issues.

I think the Adaptec ASPI32 problem is resolved. The 7F will probably be resolved with the BIOS update (I'm hoping). Otherwise, the CPU is typically damaged, overheating, etc. Let's put some more emphasis there - *typically*. I'm not saying there's any problem yet; right now it's just an educated guess.

Let's see what future dumps show.
 

kentatihc

Junior Member
Aug 12, 2004
I have downloaded the updated BIOS and will install it when I get home from work. For several days after I got the double fault, I monitored the motherboard and CPU temperatures and voltages while running looping FPS timedemos, and the temperature peaked at approximately 57 degrees Celsius after about 14 hours of continuous timedemos. I never found a problem with temperature, voltage, or fan activity.

Other than these occasional BSODs, the system has run smoothly. I will continue to note any abnormal activity and will post anything that I find.

Thank you for all of your time on this. Your help is greatly appreciated!
 

kentatihc

Junior Member
Aug 12, 2004
I installed the latest BIOS for my system (1017) two nights ago. Last night I ran Windows Update because XP SP2 was available.

When I installed SP2 I got the error messages:

Update.exe Application Error
The instruction at 0x0112af90 referenced memory at 0xffffffff. The memory could not be "read". Click OK to terminate the program.

Followed by:

Update.exe Application Error
The instruction at 0x77f581bd referenced memory at 0x08055800. The memory could not be "written". Click OK to terminate the program.

Windows Update shows the install status of SP2 as "failed", and the details show error code D0000005. When I reboot, the computer comes up normally, and System Properties shows Windows XP Professional Service Pack 2. I went to Add or Remove Programs, removed SP2, rebooted a couple of times, and everything was back to normal. I then installed SP2 again, and while it was backing up the registry I got a BSOD:
STOP 0x0000004E (0000008F, 0000F2EF, 0000B93F, 00000000) PFN_LIST_CORRUPT

This is the first time that I have seen this "flavor" of BSOD. I looked it up, and it seems to be commonly caused by faulty RAM. However, I seem to get these BSODs with either memory module installed. (The module installed now is a different one from the one that was in for the 8/21 BSOD.)

Please let me know if you have any ideas... I will probably swap out my RAM again and try to install XP SP2 again...

Thanks,

Kent
 

dclive

Elite Member
Oct 23, 2003
Sounds good. At this point, given the dumps and the wide spread in issues, it's starting to look like hardware. I don't see any software cause.
 

kylef

Golden Member
Jan 25, 2000
It could still theoretically be caused by a poorly written driver that is overwriting a kernel-mode buffer or writing over kernel mode code. Those manifest as a variety of BugCheck codes and typically point to everything BUT the actual culprit.

Have you tried disabling as many drivers as you can afford to disable and running with minimal hardware for a while?

The other possibility is random registry and/or pagefile corruption due to a slowly failing disk drive or faulty disk controller.

Honestly, the only way to work through truly random errors (those that don't point to any one specific device driver) is to remove or replace one component at a time until you've eliminated everything except the motherboard / disk controller / memory controller / etc.
 

kentatihc

Junior Member
Aug 12, 2004
Well, the third time was the charm: I got XP SP2 to install successfully. I am thinking the problem is hardware related as well. If I hadn't gotten BSODs with each memory module individually, I would suspect the RAM at this point; as it is, it seems to me that the motherboard or CPU would be the place to start. Which piece of hardware would you start with?

Thanks,

Kent
 

kylef

Golden Member
Jan 25, 2000
Well, unfortunately, either way you're looking at a complete re-install.

I would not suspect the processor, to be honest. They rarely fail if they boot successfully, and generally they just freeze if something catastrophic fails. There are rumors going around about a specific microcode bug in certain P4 cores, but they manifest rarely.

I would certainly be suspicious of a memory problem, but you mentioned above that you ran MemTest86 for multiple hours without seeing a failure. MemTest86 has missed problems before, mind you, but a negative result from MemTest86 tends to suggest that we look elsewhere.

I would consider trying a different hard disk and replacing the controller cable (especially if it's an old IDE ribbon cable). Disk corruption is terribly difficult to detect. Have you tried running chkdsk /f and restarting? You can also download an HD diagnostic utility from WD/Seagate/Maxtor and see if it finds any problems.

Otherwise, if I were you I would certainly try to disable as many drivers in device manager as possible and see if that helps. If so, then re-enable the drivers one by one until you find the one that seems to be causing trouble.
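Re-enabling drivers one at a time needs up to one test round per device, and with crashes a week or more apart, each round is long. A halving search (a swapped-in technique, not something suggested in this thread) needs roughly log2(n) rounds instead. A minimal Python sketch, assuming exactly one faulty component and a reliable pass/fail test; `find_culprit` and `crashes_with` are hypothetical names, and in practice each call stands for a long soak test with only that subset installed:

```python
from typing import Callable, Sequence


def find_culprit(candidates: Sequence[str],
                 crashes_with: Callable[[Sequence[str]], bool]) -> str:
    """Halving search for a single faulty component.

    Assumes exactly one candidate causes the crashes and that
    crashes_with(subset) reliably reports whether the system still crashes
    with only `subset` installed (in practice, one long soak test per call).
    """
    suspects = list(candidates)
    if not suspects:
        raise ValueError("no candidates to test")
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        # Under the single-culprit assumption, no crash means it is in the other half.
        suspects = first_half if crashes_with(first_half) else suspects[mid:]
    return suspects[0]


if __name__ == "__main__":
    parts = ["sound card", "modem", "DVD-RW drive", "CD-RW drive"]
    rounds = []

    def fake_test(subset):
        # Hypothetical: pretend the modem is bad. Nothing in the thread says so.
        rounds.append(list(subset))
        return "modem" in subset

    print(find_culprit(parts, fake_test), "found in", len(rounds), "test rounds")
```

The catch is the assumptions: if two components interact to cause the crash, or a test period is too short to trust a crash-free result, the search can discard the wrong half, so re-testing the final suspect on its own is a sensible check.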

If you can't EVER make the system stable, then it may indeed be a motherboard problem.
 

dclive

Elite Member
Oct 23, 2003
Originally posted by: kylef
Well, unfortunately, either way you're looking at a complete re-install.

I would not suspect the processor, to be honest. They rarely fail if they boot successfully, and generally they just freeze if something catastrophic fails. There are rumors going around about a specific microcode bug in certain P4 cores, but they manifest rarely.

There's no rumor - it's fact. Search on STOP 7F and you'll see specific recommendations about microcode updates on Microsoft's site. It's more common on HT multiproc Xeon boxes, but it can happen (and cause issues) with other (usually HT) boxes. http://support.microsoft.com/default.aspx?scid=kb;en-us;842465&Product=winxp details this. It also shows the error he's gotten in the article. His BIOS also was out of date at the time of that error. No rumor...

I would certainly be suspicious of a memory problem, but you mentioned above that you ran MemTest86 for multiple hours without seeing a failure. MemTest86 has missed problems before, mind you, but a negative result from MemTest86 tends to suggest that we look elsewhere.

I would consider trying a different hard disk and replacing the controller cable (especially if it's an old IDE ribbon cable). Disk corruption is terribly difficult to detect. Have you tried running chkdsk /f and restarting? You can also download an HD diagnostic utility from WD/Seagate/Maxtor and see if it finds any problems.

Otherwise, if I were you I would certainly try to disable as many drivers in device manager as possible and see if that helps. If so, then re-enable the drivers one by one until you find the one that seems to be causing trouble.

If you can't EVER make the system stable, then it may indeed be a motherboard problem.

All good suggestions, but I'd just uninstall the device in question for testing... (by that I mean I'd remove it, physically, or disable it in the BIOS of the machine if it's integrated.)
 

kentatihc

Junior Member
Aug 12, 2004
Thanks for the reply, kylef. When I restored earlier this month, I didn't install (and still haven't installed) several items that seem to have caused problems, such as the printer or any burning software.

I agree that it could be caused by a driver on the system (which is what Microsoft Online Crash Analysis keeps saying when I submit the dump file), but I haven't been able to pinpoint it, and since a week or two can pass between crashes, it has been very difficult to troubleshoot.
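The difficulty of testing with crashes a week or two apart can be put in numbers. A minimal Python sketch, assuming crashes arrive independently at a steady average rate λ per day (a Poisson process; this is a simplifying assumption, not something measured here). If nothing has actually been fixed, the chance of seeing no crash in t days is e^(-λt), so the quiet period after which a still-faulty system would have crashed with the given confidence is ln(1/(1 - confidence))/λ:

```python
import math


def days_needed(crashes_per_week: float, confidence: float = 0.95) -> float:
    """Crash-free days after which a still-faulty system would have crashed
    with probability `confidence`.

    Assumes crashes arrive independently at a steady average rate (a Poisson
    process), so P(no crash in t days) = exp(-rate * t).
    """
    rate_per_day = crashes_per_week / 7.0
    return math.log(1.0 / (1.0 - confidence)) / rate_per_day


def chance_of_no_crash(crashes_per_week: float, days: float) -> float:
    """Probability that an unfixed system goes `days` days without a crash."""
    return math.exp(-crashes_per_week / 7.0 * days)


if __name__ == "__main__":
    for rate in (2.0, 1.0, 0.5):  # the "1-2 times a week, sometimes 2 weeks" range
        print(f"{rate} crashes/week: {days_needed(rate):.0f} crash-free days needed")
    print(f"1 crash/week, 7 quiet days by luck: {chance_of_no_crash(1.0, 7):.0%}")
```

At one crash per week this gives about 21 crash-free days for 95% confidence, and an unfixed system would still go a full week without crashing about 37% of the time, which is why short test periods say little here.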

As for disabling drivers, I haven't experimented with that much. (I assume you mean disabling them via msconfig? I am fairly new to all of this and am learning more than I ever wanted to know while troubleshooting this issue!)

Would you suggest spending some time experimenting with msconfig before looking at replacing system components?

Thanks,

Kent
 

kentatihc

Junior Member
Aug 12, 2004
Thanks for all of the suggestions. I ran chkdsk /f and it didn't find any problems (0 bytes in bad sectors). I will look for the HDD diagnostic software on Western Digital's site and give that a try.

- Kent
 

dclive

Elite Member
Oct 23, 2003
Originally posted by: kylef
It could still theoretically be caused by a poorly written driver that is overwriting a kernel-mode buffer or writing over kernel mode code. Those manifest as a variety of BugCheck codes and typically point to everything BUT the actual culprit.

Have you tried disabling as many drivers as you can afford to disable and running with minimal hardware for a while?

The other possibility is random registry and/or pagefile corruption due to a slowly failing disk drive or faulty disk controller.

Honestly, the only way to work through truly random errors (those that don't point to any one specific device driver) is to remove or replace one component at a time until you've eliminated everything except the motherboard / disk controller / memory controller / etc.

Kyle,

Can you read these dumps and tell us what you see? The last three or four might point to something - let us know what you think....
 

kylef

Golden Member
Jan 25, 2000
Can you post the minidumps (should be about 64 KB each) from those last few crashes somewhere? Or email them to me, kylefarlow@yahoo.com, and I'll take a look in the kernel debugger.
 

kentatihc

Junior Member
Aug 12, 2004
kylef, I have emailed you the minidump files that I have from 8/11, 8/21, and 8/26. If you would like any of the other minidump files that I posted about earlier in this thread, let me know.

I downloaded the hard drive diagnostics tool from Western Digital and ran both the standard and extended tests on both drives; neither reported any errors. I reseated the cables on the hard drives, the CD-ROM drive, and the DVD-ROM drive. I will go and get new cables for them when I get a chance.

I am thinking of removing the sound card and dvd drive from my system and restoring again. If anybody has any other ideas, let me know!

- Kent
 

kylef

Golden Member
Jan 25, 2000
Well, I looked at each one of those dumps in the kernel debugger, and to the best of my determination you're looking at one of the following explanations (which you probably already knew):

1. Memory corruption due to some kind of hardware malfunction (RAM incompatibility with the motherboard, bad system RAM timings, etc.). I would try to set everything to default... especially the RAM timings. Set them as slow as possible for a while.

2. Memory corruption due to a bad driver overwriting a kernel pool buffer. I would try to remove as many devices as possible from your system to see if the instability goes away. If it does, then add devices back one by one until you've found the culprit. Removing the sound card is a must, as are any other cards that are unnecessary to boot your computer.

Without a more extensive kernel memory dump, it is impossible to see how bad the memory corruption is and where it's located. But I don't have much experience looking at the full kernel memory dumps anyway, so I probably wouldn't be of much assistance there. Hopefully one of the two suggestions above might help...
 

kentatihc

Junior Member
Aug 12, 2004
First of all, thank you very much kylef and dclive for taking the time to look into this problem for me!
Your help has been greatly appreciated!

I am currently waiting for another BSOD now that XP SP2 is installed on the system. It has "only" been six days since the last error occurred, so I expect to see it again in the next few days. Assuming that I do, I will remove the DVD drive, printer, sound card, and modem from the system, repartition the hard drives, and restore Windows to see if I can find the culprit. (It is going to be painful to be without a computer for the 2-4 weeks I believe this will take =( )

I have never overclocked anything on the system, but I will look into slowing down the RAM timings in the BIOS. I don't know much about memory timings, so I will read up on them and give it a try.
I am sure that my memory is compatible with the motherboard.

I am currently thinking that the problem is some kind of driver problem, because 95% of my BSODs have happened at system startup, or while installing a driver (or updating to XP SP2). I have never had a problem while running a game, which seems to me like it should be much more demanding on the CPU, motherboard, RAM, video card, sound card, etc. Does that sound like a reasonable conclusion, or am I just rationalizing at this point?