BSOD galore, can't get a stable system

Deathcharge

Member
May 6, 2005
76
0
0
My 5-month-old rig that was rock solid is giving me endless grief. It was overclocked for a while to start off with, until I started getting the dreaded BSOD (it was running at 2.5 GHz, 1:1, stock voltage).

I asked for help on this forum and after playing around a bit I decided to drop everything back to stock and not OC for the time being.

I installed a fresh copy of XP and left it running for over a day unattended with very little CPU activity. I came back and it had a BSOD. I rebooted and Windows would not boot because of a "missing" file (C:\WINXP\system32\config\system).

Tried repairing Windows but that didn't fix it (same error message with the missing system file). Tried reinstalling it, but that created an endless loop where the computer would POST but not boot from the HD: restart, POST, not boot, restart, etc. I suspect it was a HD boot sector problem.

I ended up quick formatting and reinstalling, which BSODed, so I did a full format (not a quick format) and reinstalled Windows.

Finally managed to get it running with the bare minimum drivers (Marvell LAN driver, latest Karajan sound driver from DFI and latest GeForce video drivers, 84.21). Also installed Firefox and Prime95.

Did memtest for 1 hour at 200 MHz, which was fine (memtested 2 weeks ago for 10 hours after the initial BSOD at 250 MHz with no errors). Ran Prime95 (first test, which is mainly CPU), which stopped after 2 hours 40 mins reporting a hardware problem.

Left the machine running at stock without any CPU load and it BSODed after less than 24 hours. Tried priming again and it does roughly 2 hours 40 mins on test 1 and only 40 mins on test 2; a message appears saying hardware failure, rounding = 0.5, expected less than 0.4.
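For context on that rounding message, here is a minimal sketch of the general idea behind a round-off check, not Prime95's actual code. FFT-based multiplication works in floating point, so every coefficient of the result should land very close to a whole number; the distance to the nearest integer is the round-off error, and a value near 0.5 means the hardware produced garbage. The function names and the tiny recursive FFT are illustrative assumptions; only the 0.4 limit comes from the message quoted above.

```python
import cmath

ROUND_OFF_LIMIT = 0.4  # limit taken from the error message in the post


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def multiply_digits(a, b):
    """Multiply two digit lists (least significant digit first) via FFT.

    Returns the raw convolution rounded to integers and the largest
    distance any coefficient was from a whole number.
    """
    size = 1
    while size < len(a) + len(b):
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (size - len(b)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    reals = [p.real / size for p in product]  # undo the inverse-FFT scaling
    rounded = [round(r) for r in reals]
    max_error = max(abs(r - n) for r, n in zip(reals, rounded))
    return rounded, max_error


def digits_to_int(digits, base=10):
    """Collapse (possibly un-carried) digits back into one integer."""
    return sum(d * base**i for i, d in enumerate(digits))


if __name__ == "__main__":
    conv, err = multiply_digits([5, 4, 3, 2, 1], [9, 8, 7, 6])  # 12345 * 6789
    print(digits_to_int(conv) == 12345 * 6789, err < ROUND_OFF_LIMIT)
```

On healthy hardware the error for an input this small is many orders of magnitude below the limit, which is why a reading of 0.5 at stock settings points at faulty hardware rather than at the software.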

I just want it to run at stock everything, no OC, but I can't get it stable enough. I am not sure what it is, but I am convinced that there is a problem somewhere. It is looking less likely that it is a software bug and I am suspecting a hardware malfunction, but I am not sure what to blame. It could be anything, so any advice would be great.

My specs are

DFI LANPARTY UT nF4 SLI-DR eXpert [BIOS 12/07/2005]
Opteron 146 CACJE
Thermalright XP-90 with 92mm Panaflo fan
Crucial Ballistix PC4000 2GB @3-4-4-8 -- BL2KIT12864Z503
256MB Leadtek VIVO 7800GT
Antec SLK3000B
Seagate Barracuda 7200.8 250GB
BenQ's DW1650
ENERMAX 535watt ATX 12V V2.01/EPS 12V
 

Tarrant64

Diamond Member
Sep 20, 2004
3,203
0
76
This does indeed sound like hardware failure. I would argue that it may just be the memory. You've memtested it, but just to be sure you could try running 1 stick at a time and see if there are any changes. This could also be a faulty power supply, I think. It's hard to really test this unless you have a spare one powerful enough to swap in. I have a DFI board (but for Socket 754) and it sometimes had issues with memory when I was OC'ing. I was thinking maybe the hard drive kept getting OC'ed too much, but the DFI boards have AGP/PCI locks, so that's not the case either.

Hmm...Remove everything but the essentials, RAM, VIDEO, CPU, MOBO and get it started and see if it will run stable with that. Add on new parts one by one and if it fails, then it could be PSU problem. Hope this helps somewhat. Maybe some other people have more recommendations.

good luck!
 

LW07

Golden Member
Feb 16, 2006
1,537
2
81
Maybe downclock the RAM. I got BSODs, and I downclocked my RAM from PC3200 to PC2700 and now it's rock solid stable. No BSODs in 2 months. So, I recommend that you try that.
 

LOUISSSSS

Diamond Member
Dec 5, 2005
8,770
54
91
I feel for you. I used to own a DFI LanParty board too; they are a poor second-tier motherboard company, and that's why everyone has compatibility issues and instability problems. I'm 98% sure you should return the board if you can, sell it, RMA it at the very least, or just buy another one. I'd recommend ASUS, the only motherboard maker with minimal hardware compatibility or instability problems, which DFI is full of. See dfi-street.com for SOME information and to commiserate with the other 1023910239 users with DFI motherboard problems, but going there didn't solve my problems at all.

If you don't want to change hardware, take out all your parts and reassemble everything to make sure you did it correctly. Reformat the hard drive(s). Reset the CMOS for 24 hours. Then reinstall Windows and test using only the LAN, motherboard and GPU drivers that came on the CD. Run each test individually: Prime95 for 8+ hours; either a memtest86+ boot disk for 10+ hours or a Windows-based memtest for 8+ hours (google it); and SuperPi for 5+ hours.

GL.
 

loafbred

Senior member
May 7, 2000
836
58
91
Here's my advice for isolating the cause:

Make a CD or DVD with "UBCD". link to UBCD image. NOT the Windows version. If you don't have software capable of burning an ISO image to CD, have a friend or helpful enemy do it for you.

Completely disconnect power and data cables from all drives, except for the optical drive that you intend to boot from.

Use only one dimm of memory. Check the motherboard manual to determine which slot is the best to use for one dimm.

In BIOS, disable onboard LAN, sound, gameport. Set boot sequence to boot from the optical drive, and disable all other drive entries in the boot sequence.

Leave the side off the case, and point a big fan into it, blowing plenty of cool air.

After booting from UBCD disk, run Prime95 (I think it's called Mersenne Prime on the disk menu). While it's under load, use a multitester to check 12v and 5v readings from a molex connector (if possible). If it's stable for eight hours or more, test the other memory dimm alone.

If it's stable without the HDD's attached, add one drive at a time, and turn on devices in BIOS until the problem reappears. Then try changing power supply connectors around, keeping fans on separate leads from the HDD's. You may just be putting too much load on one or two leads, or getting signal noise from one component to another (I used to hear about this occasionally, but not recently).
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: loafbred

After booting from UBCD disk, run Prime95 (I think it's called Mersenne Prime on the disk menu). While it's under load, use a multitester to check 12v and 5v readings from a molex connector (if possible). If it's stable for eight hours or more, test the other memory dimm alone.

thanks for the detailed advice, i am getting a multimeter on monday to measure the 12v and 5 v but how do i identify them? and do i do that while the pc is under load or idling? does it matter?

i reinstalled windows and left out the nvidia drivers for the onboard nic (using the marvell one instead) and it has been running for about 32 hours now with no crash (no load though, just idling)

i still can't prime as it fails after 45 mins :(
 

Ctrackstar126

Senior member
Jul 14, 2005
988
0
76
I know you said minimum drivers, but I know some NF4 boards have problems with nvidia's firewall. I had an evga nf-41 and it was giving me problems because of the nvidia firewall. I disabled the firewall and it's been as good as a sleeping baby. This may or may not pertain to you, but you had the problem I had so I figured some input was needed.
 

Slammy1

Platinum Member
Apr 8, 2003
2,112
0
76
You definitely might be seeing a CPU failure. Have you looked at the SMART status of your HDD? Without knowing much about your o/c I couldn't guess at what might have been done. If you're Memtest stable and Prime not stable, you've identified the butler in your PC mystery. I don't see how driver issues could cause Prime errors unless it's BIOS related.
 

DrMrLordX

Lifer
Apr 27, 2000
22,126
11,814
136
Originally posted by: Deathcharge


i still can't prime as it fails after 45 mins :(

Which Prime95 test are you running? Small FFTs, Large FFTS, or mixed?
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: Slammy1
You definitely might be seeing a CPU failure. Have you looked at the SMART status of your HDD? Without knowing much about your o/c I couldn't guess at what might have been done. If you're Memtest stable and Prime not stable, you've identified the butler in your PC mystery. I don't see how driver issues could cause Prime errors unless it's BIOS related.

yes, I checked the Seagate website and used their web-based utility to run the 3 diagnostic tests they offer; all came back OK.

the OC was straightforward: increase the FSB to 250 MHz (from 200). my memory is rated for 250 MHz operation so no need for dividers, and everything was solid at stock voltages.

i don't think it is driver related either, and i have updated the bios to the latest non-beta bios available
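The arithmetic behind that overclock can be sketched as follows. The multiplier of 10 is an assumption (it is what a 2.0 GHz stock clock on a 200 MHz reference implies, and it matches the 2.5 GHz figure from the first post); the function names are made up for illustration.

```python
STOCK_REFERENCE_MHZ = 200
CPU_MULTIPLIER = 10  # assumption: 2000 MHz stock clock / 200 MHz reference


def cpu_clock_mhz(reference_mhz, multiplier=CPU_MULTIPLIER):
    """CPU core clock = reference clock (the 'FSB' in the post) x multiplier."""
    return reference_mhz * multiplier


def ddr_rating(reference_mhz, divider=1.0):
    """Effective DDR data rate; divider 1.0 means memory runs 1:1."""
    return int(2 * reference_mhz * divider)


if __name__ == "__main__":
    for reference in (STOCK_REFERENCE_MHZ, 250):
        print(
            f"{reference} MHz reference: CPU {cpu_clock_mhz(reference)} MHz, "
            f"memory DDR{ddr_rating(reference)}"
        )
```

At 1:1 a 250 MHz reference puts the memory at DDR500, which is why PC4000 sticks need no divider, while stock settings run them at only DDR400.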
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: DrMrLordX
Originally posted by: Deathcharge


i still can't prime as it fails after 45 mins :(

Which Prime95 test are you running? Small FFTs, Large FFTS, or mixed?

i am running all of them; the middle one (max heat, whatever it's called) only runs for 45 mins, the rest run for 2 hours and 40 mins
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: Ctrackstar126
I know you said minimum drivers, but I know some NF4 boards have problems with nvidia's firewall. I had an evga nf-41 and it was giving me problems because of the nvidia firewall. I disabled the firewall and it's been as good as a sleeping baby. This may or may not pertain to you, but you had the problem I had so I figured some input was needed.

i read this quite a bit on google, so i did a clean install and didn't install any of the nforce drivers, including the onboard nic and the firewall software. it seems a bit more stable now but i still can't prime
 

DrMrLordX

Lifer
Apr 27, 2000
22,126
11,814
136
Originally posted by: Deathcharge

i am running all of them; the middle one (max heat, whatever it's called) only runs for 45 mins, the rest run for 2 hours and 40 mins

Okay, Large FFTs is the "max heat" test. Since you're failing that one the most quickly, I'm going to make a leap of faith here and assume that you've got some kind of RAM problem. If it was Small FFTs you failed most quickly, I'd finger the CPU.

It still might be a CPU failure, but I'll be damned if I've ever seen an unstable CPU fail more quickly on Large FFTs than Small FFTs.

Anyway, you tried swapping out RAM yet? put in a single stick of cheap DDR400 or something and see if you can Prime with that.
 

Slammy1

Platinum Member
Apr 8, 2003
2,112
0
76
I'd agree w/ DML, but going from rock solid to BSODs indicates a component failure, and memory generally stands up to stress pretty well. Memtest stability is an indication, not proof, of correct RAM operation. You're running the advanced Memtest? How are your temperatures in Prime? You might be seeing OS problems in the Prime test.
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: DrMrLordX


Anyway, you tried swapping out RAM yet? put in a single stick of cheap DDR400 or something and see if you can Prime with that.

well i tried the UBCD as pointed out in an earlier post and selected the prime 95 option; it starts running and fails after less than 2 mins (twice)

then tried booting into windows and i get the windows logo (starting windows with the progress bar) and then a BSOD (IRQL_NOT_LESS_OR_EQUAL)

restart boot into windows everything is fine, take out a stick (ie running with one stick only) and prime for 4 hours on large FFTs that was stable no probs there.

just took that stick out and swapped it with the other one and started priming an hour ago no probs so far usually this test would have failed after 45 mins

i am thinking that either there is a mobo problem (ie the memory slot on the mother board that is used when running dual channel) or perhaps a memory problem when running dual channel (not sure if the latter is a possibility but you never know)
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: Slammy1
You're running the advanced Memtest? How are your temperatures in Prime? You might be seeing OS problems in the Prime test.

um, not sure if it is the advanced memory test. with the DFI eXpert board the memory test is in the bios, so you just enable it there and restart, and it will load memtest and start testing

i haven't touched the advanced options and wouldn't know how to set it up, so if you can help that would be great

Temps are another funny one: the DFI eXpert board suffers from a temperature bug where it under-reports temps by up to 15C. my idle temps are around 22-24 and load temps 30-31, as reported by MBM 5 with the settings from DFI-Street

as for the OS problems in prime, i get the same error message "hardware failure" when booting from the UBCD and priming from there without booting into windows
 

seanp789

Senior member
Oct 17, 2001
374
0
0
My friend had the same problem after OC'ing an Opteron 165. He couldn't get stable at stock... turned out to be the CPU itself. Try testing the CPU in another rig.
 

Deathcharge

Member
May 6, 2005
76
0
0
Originally posted by: seanp789
My friend had the same problem after OC'ing an Opteron 165. He couldn't get stable at stock... turned out to be the CPU itself. Try testing the CPU in another rig.

A bit hard to do unfortunately, since none of my friends own a 939 mobo :(

I am not sure if hardware shops offer a testing service, or should I just RMA it and hope for the best, which is a replacement? Otherwise they will test it, say it is working correctly and send it back.
 

DrMrLordX

Lifer
Apr 27, 2000
22,126
11,814
136
Originally posted by: Deathcharge

just took that stick out and swapped it with the other one and started priming an hour ago no probs so far usually this test would have failed after 45 mins

i am thinking that either there is a mobo problem (ie the memory slot on the mother board that is used when running dual channel) or perhaps a memory problem when running dual channel (not sure if the latter is a possibility but you never know)

Based on those results, it appears that both your DIMMs are still good. It would kind of help if you had a third DIMM for testing, but if not, I understand.

It seems to me that your CPU's on-die memory controller may have become damaged or simply over-sensitive. That's just a guess. There's so little logic on s939 motherboards pertaining to memory access that I highly doubt the board could become damaged enough to make dual-channel operation unstable during overclocking.
 

Bobthelost

Diamond Member
Dec 1, 2005
4,360
0
0
Originally posted by: Deathcharge
Originally posted by: seanp789
My friend had the same problem after OC'ing an Opteron 165. He couldn't get stable at stock... turned out to be the CPU itself. Try testing the CPU in another rig.

A bit hard to do unfortunately, since none of my friends own a 939 mobo :(

I am not sure if hardware shops offer a testing service, or should I just RMA it and hope for the best, which is a replacement? Otherwise they will test it, say it is working correctly and send it back.

Wander down to your local shop and ask nicely. They are hardly going to beat you to death for it. (This advice does not count if your local computer store is run by an odd looking bloke with a tic and a bloodstained baseball bat)

Since you OC'd the chip you voided the warranty, as such trying to RMA it would be fraud. Annoying but true.
 

Deathcharge

Member
May 6, 2005
76
0
0
OK i am making progress here, thanks for all the responses. i memtested it overnight on stock everything and found 9 errors, all in test no. 8

http://img69.imageshack.us/my.php?image=img21168wk.jpg

Does that mean it is my RAM? I tested in dual channel configuration, so i guess the next step is to test the DIMMs separately, or should I just RMA them? the memory was never OCed, as the sticks are rated for DDR500 operation and they are failing at DDR400. or could the memtest error indicate some other failure?
 

loafbred

Senior member
May 7, 2000
836
58
91
Originally posted by: Deathcharge

thanks for the detailed advice, i am getting a multimeter on monday to measure the 12v and 5 v but how do i identify them? and do i do that while the pc is under load or idling? does it matter?

The yellow will be +12v and the red is +5v. You can use either black wire as ground. Test it at idle as well as under load. Readings should only move a little under load.
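To judge what "only move a little" means, here is a rough sketch that checks idle and load readings against a tolerance band. The +/-5% figure is an assumption (it is the tolerance commonly quoted for the +12 V and +5 V rails of ATX supplies), and the sample readings are invented for illustration.

```python
# Assumption: +/-5% tolerance on the +12 V and +5 V rails.
TOLERANCE = 0.05

# Molex wire colours as described above: yellow = +12 V, red = +5 V, black = ground.
RAILS = {"yellow": 12.0, "red": 5.0}


def within_tolerance(nominal, reading, tolerance=TOLERANCE):
    """True if the reading is within the allowed fraction of nominal."""
    return abs(reading - nominal) <= nominal * tolerance


def check_readings(readings):
    """readings maps wire colour -> (idle_volts, load_volts)."""
    report = {}
    for colour, (idle, load) in readings.items():
        nominal = RAILS[colour]
        report[colour] = {
            "idle_ok": within_tolerance(nominal, idle),
            "load_ok": within_tolerance(nominal, load),
            "droop": round(idle - load, 3),  # how far the rail sags under load
        }
    return report


if __name__ == "__main__":
    # Hypothetical multimeter readings, not measurements from this thread.
    sample = {"yellow": (12.10, 11.95), "red": (5.05, 4.60)}
    for colour, result in check_readings(sample).items():
        print(colour, result)
```

In the made-up sample the 12 V rail is fine at idle and under load, while the 5 V rail sags out of range under load, which is the kind of result that would point at the power supply.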


 

DrMrLordX

Lifer
Apr 27, 2000
22,126
11,814
136
Originally posted by: Deathcharge
OK i am making progress here, thanks for all the responses. i memtested it overnight on stock everything and found 9 errors, all in test no. 8

http://img69.imageshack.us/my.php?image=img21168wk.jpg

Does that mean it is my RAM? I tested in dual channel configuration, so i guess the next step is to test the DIMMs separately, or should I just RMA them? the memory was never OCed, as the sticks are rated for DDR500 operation and they are failing at DDR400. or could the memtest error indicate some other failure?

Memtesting them individually would be a good idea. If they fail in dual-channel config but do not fail in single-channel config, it's safe to assume it's not the DIMMs but the on-die memory controller.

If the DIMMS do not pass in single-channel mode, then whether or not you RMA the DIMMS is between you and the memory manufacturer.

However, you should not RMA the CPU under any circumstance.
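The memory reasoning above can be written as a small decision function. This is only a sketch of the rule of thumb from this post, with made-up names; it is not a complete diagnostic and the verdicts are suspicions, not proof.

```python
def diagnose(dimm_a_alone_ok, dimm_b_alone_ok, dual_channel_ok):
    """Map memtest pass/fail results to the most likely suspect.

    Each argument is True if that configuration ran without errors.
    """
    if not (dimm_a_alone_ok and dimm_b_alone_ok):
        # At least one stick fails on its own: the DIMM itself is suspect.
        return "suspect DIMM(s)"
    if not dual_channel_ok:
        # Both sticks pass alone but fail together: look past the DIMMs,
        # e.g. at the on-die memory controller.
        return "suspect memory controller"
    return "no memory fault found by these tests"


if __name__ == "__main__":
    # The situation reported so far: each stick primes fine alone,
    # but the dual-channel memtest run produced errors.
    print(diagnose(True, True, False))
```

Running memtest on each DIMM individually supplies the first two inputs; the overnight dual-channel run already supplied the third.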

 

Slammy1

Platinum Member
Apr 8, 2003
2,112
0
76
Originally posted by: DrMrLordX

However, you should not RMA the CPU under any circumstance.

Sage advice, it's like people who think stealing from insurance companies is a victimless crime.

You might try bumping the VDIMM and rerunning Memtest if you pass on both sticks individually. I know on my old 3500 HyperX to run their stated stock I have to up the voltages to get stable in DC. Still, stable to unstable implies component failure; so that's probably not going to solve the issue.

Please don't take it badly if I say it brusquely, I've done worse things to my PCs, but you were o/c'ing and running Prime95 loops without proper temperature monitoring; you put the PC at risk, specifically the CPU. I've fried cards running OOS PCI frequencies, so don't feel bad.