Hard lockups on a new AMD setup

lament

Senior member
Feb 17, 2004
345
0
76
Hey guys,

I come to the experts to help diagnose a problem.

Sorry if this is long: I want to be thorough.

I'm an Intel guy who just switched to AMD. I bought a 3200+ Venice, an Asus A8N-SLI Premium, and an XFX 7800 GT. I also had to buy a new PSU: an Enermax EG565P-VE.

Also in the system, from my previous Intel setup, are a Plextor DVDR, 2x512MB Kingston HyperX, 2x256MB Corsair XMS, my Creative Soundblaster Audigy 2 ZS Gamer, my SiG ATA/133 add-in controller, a new 300GB Seagate HD (system drive), a 160GB WD HD, and a refurbished 160GB Seagate HD. I also have an APC Back-UPS 1000 XS UPS.

The problems started from the beginning: I have a gold version of XP that I slipstreamed SP2 into. I got to the end of the install and it asked for the XP SP2 CD, which I don't have and which it shouldn't have needed. After I skipped the files, it booted, but the Quick Launch wouldn't show up and there were other issues. Argh..

So I figure I'll just install XP as normal again. I pop in the XP CD, and it won't start the setup. It gives me an error saying a file is missing. I google the error on my fiancé's laptop, and MS suggests it's a memory issue. So I remove the 2x256MB Corsair, and suddenly I can boot and start the setup.

Setup goes great, I do all the updates, and everything is fine. That was on September 1.

In the last couple of days, a couple things have been happening.

First, my BIOS is reporting that my CHA1_FAN (my front intake fan) has a low RPM or isn't functioning. Looking at it, it's running, so it must just be slow. Second, I've been getting a warning popup from the APC software about not being able to communicate with the UPS. It pops up in the systray for about 5 seconds, then goes away. It happens 2-3 times an hour.

I go into the APC software, run the tests, everything checks out OK. I move the USB cable from one spot in the back to the other, yet I'm still getting that popup.

Now, 2 mornings ago, I wake up and go to the computer. I leave it on all the time, but have the monitor set to shut off after 20 mins (everything else stays on). I move the mouse, and nothing happens. The computer is on - I can hear the fans. It's not in sleep or hibernate mode, and the monitor is not detecting a signal. Power light is on but no HD LED activity. So I restart.

After about 20 minutes, I get a hard lockup. HD LED light is off, can't CTRL-ALT-DEL, nothing. I reboot. I fire up Asus Probe and nvidia's nTune to check temps - everything is normal. Twenty minutes later, I'm on the 'net, I'm playing Winamp and bam - again, a hard lockup.

I get pissed and I need to go to lunch, so I decide to let Winamp run on repeat, go take a shower and head out to lunch.

I come back and it's still up and running (I was thinking maybe I hit a bad sector on the HD and was getting those lockups, since I've read reports of the new .8 Seagates having failure problems).

All throughout the day, it's fine, with the occasional UPS warning. I disabled Q-Fan control to avoid that CHA1_FAN warning in the BIOS.

Then I'm playing Battlefield 2 tonight, and I get the hard lockup. I check the temps, and they're fine.

So.. wtf is wrong with my system? The obvious answer is the UPS, but I did not have these APC error message popups with my previous system. I am about to ditch the UPS for a while and see if I get these lockups.

In the meantime, are there known compatibility issues with my setup that I should be aware of?

Here's CPU-Z's report on my stuff, and I've included a couple screenshots of the current temps.

http://s95408209.onlinehome.us/cpuz.htm

I was considering a BIOS flash to the latest, but it doesn't look like much was fixed from the previous BIOS - just added some more processors and updated the nvidia LAN that I'm not using (I'm using the Marvell one instead).

Any help would be appreciated and sorry for the long-winded post.
 

stevty2889

Diamond Member
Dec 13, 2003
7,036
8
81
1. CHA1_FAN is probably a low RPM fan. There is probably a setting in your bios to disable the fan monitoring, or lower the warning threshold.

2. When using 4 sticks of RAM, you'll need to run at 2T. It's better to just use 2 sticks of RAM with an A64; if you need more than 1 gig, it's best to get 2x1GB sticks. Even though you were able to install Windows using the 2x512MB sticks, you should probably still run memtest overnight, as motherboards can sometimes be picky about RAM. It could be perfectly fine in one computer but have compatibility issues in others.

3. Try using the onboard audio for a while, Creative has been known to have some of the worst drivers that can cause all kinds of strange problems.

4. Make sure you have the most up-to-date chipset drivers. Also, did you ever put Service Pack 2 on? If not, that could cause some of your USB communication issues, but I'm not entirely sure on that one.

5. If you are using the Nvidia-based LAN: I had a lot of problems with mine, and I would get lockups every time I accessed the internet. I just disabled the Nvidia LAN, since my motherboard also has a Marvell gigabit LAN.

You might want to check to see if there is a newer BIOS available for your motherboard, it looks like you are using 1.06, and up to 1.09 is available.
 

lament

Senior member
Feb 17, 2004
345
0
76
Originally posted by: stevty2889
1. CHA1_FAN is probably a low RPM fan. There is probably a setting in your bios to disable the fan monitoring, or lower the warning threshold.

As I posted, I disabled Q-Fan control and that stopped the warning.

2. When using 4 sticks of RAM, you'll need to run at 2T. It's better to just use 2 sticks of RAM with an A64; if you need more than 1 gig, it's best to get 2x1GB sticks. Even though you were able to install Windows using the 2x512MB sticks, you should probably still run memtest overnight, as motherboards can sometimes be picky about RAM. It could be perfectly fine in one computer but have compatibility issues in others.

ack don't tell me that. I have another 2x512MB of Kingston on the way. :(

3. Try using the onboard audio for a while, Creative has been known to have some of the worst drivers that can cause all kinds of strange problems.

I will try that if removing the UPS from my system entirely, as I have just done, doesn't solve the problem.

4. Make sure you have the most up-to-date chipset drivers. Also, did you ever put Service Pack 2 on? If not, that could cause some of your USB communication issues, but I'm not entirely sure on that one.

Actually, that was another problem I forgot to mention: after installing the 6.66 nForce drivers, USB 2.0 wasn't detected. I had to manually update and point to the 6.66 folder. I just checked the 2 back USB 2.0 ports that the UPS was plugged into, and they're fine (plugged a flash drive in there just now).

5. If you are using the Nvidia-based LAN: I had a lot of problems with mine, and I would get lockups every time I accessed the internet. I just disabled the Nvidia LAN, since my motherboard also has a Marvell gigabit LAN.

no I'm using the Marvell LAN and I've disabled the nvarmor thing.

You might want to check to see if there is a newer BIOS available for your motherboard, it looks like you are using 1.06, and up to 1.09 is available.

yeah I'm using 1006, but here's what's new in 1007 (the latest one):

1. Support new CPUs. Please refer to our website at: http://support.asus.com/cpusupport/cpusupport.aspx

2. Update nVidia onboard Lan PXE ROM to V215.0503

but yeah maybe there are some minor fixes that they haven't documented.

Thanks for the quick response (and up so late like me.. unless you're not in the US). :)
 

BeakerChem

Senior member
May 11, 2005
219
0
0
ACK! You have the same problem I do, and with the same motherboard, sound card and a similar GPU!

My story and what I have tried and not succeeded with:

Fresh install of the following system: AMD X2 4800 retail, 2 GB on 2x1GB OCZ pc3200 ks (rated 2-3-2-5-T1, running at 2.5-3-3-6-T2 at the moment; it runs fine at the tighter timings, but I'm troubleshooting and it defaults to the looser ones), SoundBlaster 2ZS gamer ed, BFG 7800GTX GPU, Maxtor Maxline III 300GB HD, ASUS A8N-SLI Premium, Enermax 600W Noisetaker PSU.

Symptoms:

After a fresh install of Windows XP OEM (which went perfectly for me, at least), I updated the BIOS to 1006 and started installing drivers for the sound card, monitor (Viewsonic VP191b, 8ms), and vid card (7.7.4.0 at the time, I believe), then installed Norton Firewall and Antivirus.

Right away I started getting irregular errors about files not being found, and then about files being corrupted. I traced this down to the NVIDIA SW IDE drivers: with them installed, errors; without them, no file issues. <shrug> My HD is on a SATA plug anyway, so I am not sure why this seemed to matter.

The second issue began when I started running Battlefield 2. I would get video lockups during play, which would usually go to a black screen. I would occasionally hear the game still running, but with no video; I couldn't Ctrl-Alt-Del, couldn't Alt-Tab, and had to hard reset to fix it. Temps didn't seem high (47 CPU, 38 MB, 75 GPU).

So I updated the heck out of it:

MB drivers to 6.66
GPU to 7.7.7.7
SB ZS 2 to May 2005 drivers with Nov 2004 EAX 4.0 (The latest non-betas I could find)
Latest monitor drivers
Flashed the BIOS to 1007

Still locked in BF2 in about 10-30 minutes.

So I pulled down some testing software.

SiSoft Sandra found no errors in benching the CPU, cache, memory, file system
memchecker ran to 1000% several times (at both CAS settings) with no errors
Prime95 ran 3 hours+ (and off and on for 1Hr here and there) with no errors

And then! 3DMark05 reset the system on the third pass.

Ran again while monitoring temps (through the display properties tab and a thermocouple) and got another reset while the temps were not high enough to justify it (75 GPU, 46 CPU, 38 MB, which seem to be my typical load temps). Ran again and got a video lock - JUST LIKE BF2!

So now I am thinking it might be the video card. I tried the Doom3 demo. With high settings and every feature turned on (and native 1280x1024 res), I was again getting periodic locks. But the interesting thing here is that my temps were down (GPU 68, CPU 42, MB 35), so I don't think it's directly a heat problem. It doesn't seem to be a heat issue on the CPU/MB/memory either, because I got to those temps (minus the GPU side) running Prime95 with no locks.

I do not have a UPS, but I run an APC pure voltage regulator to even out my voltages. Asus probe has not seen any voltage deviations during the lockups.


So, thinking video card and having the latest drivers for everything I could think of, I called BFG and their 24/7/365 customer service. They had me try using two molex connectors instead of one of the two SLI plugs from the power supply. I also tried disabling Nvidia Display Services through the Services tab and setting the sound card drivers in DirectX (9.0c) to basic; both have been used to help with BF2 freeze-up issues. (The SLI plug is rated at 18A available, so I didn't think it was a power issue.) None of it helped. So now I am RMAing the vid card back to them (I am sending it today), and we'll see if that fixes the issue.

One difference I am seeing compared to the OP is that I haven't had problems outside of graphically intensive programs (3DMark05 and games), but I also haven't been online through a LAN connection. I am stuck on dial-up for the next couple of months, unfortunately.

If anyone has any ideas for me as well, I'll be glad to try them. It sounds like the OP and I have the same problem though, so perhaps comparing hardware setups will help.
 

aGreenAgent

Senior member
Apr 25, 2005
274
0
0
Take out the UPS and soundcard.

Make sure you have the latest BIOS and drivers.

Make sure only the 1GB kingston is in the computer, memory-wise.

Try it.

If that doesn't work, try it with only 1 stick of RAM.
 

BeakerChem

Senior member
May 11, 2005
219
0
0
Would it be the same thing to disable the sound card in Hardware settings? Or do you have to physically remove it?
 

lament

Senior member
Feb 17, 2004
345
0
76
good morning...

Well, the system didn't lock up while I was sleeping, so that's good. I think I'll download Prime95 and run that. I've never used it before, but I guess it couldn't hurt to stress the system a little.

the prime95 site is down at the moment..
 

BeakerChem

Senior member
May 11, 2005
219
0
0
You may have to run 2 instances if you aren't getting high CPU usage. And make sure to put the program on high (10) priority so that it actually gives the system a workout. It bogs everything else down, but it does the trick of testing.
 

lament

Senior member
Feb 17, 2004
345
0
76
well Prime95's servers are down and it won't let me run it.

I'm running SiSoftware Sandra right now, 10 runs on low priority while I'm at work. we'll see how it does.
 

BeakerChem

Senior member
May 11, 2005
219
0
0
Hope for the best!

I have 8-10 days until the RMA comes home. Hopefully that fixes things for me, but I have a bad feeling about it.
 

lament

Senior member
Feb 17, 2004
345
0
76
Originally posted by: BeakerChem
Hope for the best!

I have 8-10 days until the RMA comes home. Hopefully that fixes things for me, but I have a bad feeling about it.

thanks!

and BFG has an advanced replacement program - you didn't take them up on it? or do they not do that with GTXs?

My former BFG 6800GT had some problems when I first got it last year and I had to RMA it. At first he offered the advanced replacement, but then he checked and he said since the GTs were in such short supply, he couldn't offer that.
 

BeakerChem

Senior member
May 11, 2005
219
0
0
Originally posted by: lament
Originally posted by: BeakerChem
Hope for the best!

I have 8-10 days until the RMA comes home. Hopefully that fixes things for me, but I have a bad feeling about it.

thanks!

and BFG has an advanced replacement program - you didn't take them up on it? or do they not do that with GTXs?

My former BFG 6800GT had some problems when I first got it last year and I had to RMA it. At first he offered the advanced replacement, but then he checked and he said since the GTs were in such short supply, he couldn't offer that.


They tried the same thing on 7800GTX, no replacement units ready to cross ship. <shrug> I can live without the computer for a couple of weeks. I have my PIII 700MHz with 8Mb shared onboard video up and running internet connections in the meantime. Quite the upgrade, especially seeing how it is to go back now. Lol.

My fear is that the problem has something to do with the sound card or sound conflicts. I seemed to _maybe_ notice a correspondence between freezes and certain sounds (artillery strikes heading out if I am near the cannon, or striking me if I am away from it, seemed to have a high incidence of lockups). My hope, though, is that it is a video card not properly handling temperature. BFG overclocks their cards at stock, and some cards don't take overclocking from the reference design as well as others, so perhaps the issue is on that side of the equation.
 

stevty2889

Diamond Member
Dec 13, 2003
7,036
8
81
Originally posted by: lament
well Prime95's servers are down and it won't let me run it.

I'm running SiSoftware Sandra right now, 10 runs on low priority while I'm at work. we'll see how it does.

Umm, you should be running the torture test; the servers don't matter for that. Click on Options, then Torture Test, and run the small FFT test.
 

lament

Senior member
Feb 17, 2004
345
0
76
tested for a couple of hours without problems, and played Battlefield 2 for a couple of hours without problems. thought everything was fine.. and then as I start up a torrent, it locks - but Azureus had already displayed the error. it was a socket layer error. after some more research, I found this:

http://forums.anandtech.com/messageview...atid=29&threadid=1665983&enterthread=y

which references this thread:

http://eqforums.station.sony.com/eq/boa...d=tech&message.id=101021&no_redir=true

I've rolled back the driver to 7.21.1.3 (8/19/04) so we'll see what happens.
 

BeakerChem

Senior member
May 11, 2005
219
0
0
Some interesting information from BFG technical support.

The BFG 7800GTX OC requires a minimum of 24A on the 12V rail. I have a 600W PSU with split 12V rails giving 18A each. I am not sure that this will work. I have asked that of tech support and will see what comes back this time, but the PSU may be the issue. Odd, since the 600W Noisetaker Enermax is listed on the nVidia site as recommended for SLI rigs along with the recommendation for the BFG 7800GTX OC... Hmmm. Perhaps I am misunderstanding the power usage of the thing.
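As a rough sanity check on those numbers, here is a minimal Python sketch of the rail arithmetic. The 24A and 18A figures come from the post above; the assumption that the requirement has to be met by a single rail is only an illustration, and the function names are made up for this example.

```python
# Rough 12V rail check using the figures quoted above.
# Assumption (not confirmed by BFG or Enermax): the 24A minimum
# must be satisfied by one rail on its own.
VOLTS = 12.0


def rail_watts(amps: float, volts: float = VOLTS) -> float:
    """Power corresponding to a current on the rail: P = V * I."""
    return volts * amps


def single_rail_ok(required_amps: float, rail_amps: float) -> bool:
    """True if one rail alone meets the stated minimum current."""
    return rail_amps >= required_amps


required = 24.0  # BFG's stated minimum on the 12V rail
per_rail = 18.0  # each of the two 12V rails on the 600W PSU

print(rail_watts(required))                # 288.0 W
print(rail_watts(per_rail))                # 216.0 W
print(single_rail_ok(required, per_rail))  # False
```

If the 24A figure is really a whole-system recommendation rather than what the card pulls through one connector, the two rails together may well cover it. The combined 12V limit printed on a PSU label is usually lower than the sum of the individual rails, so that label is the number to compare against.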

I wrote to Enermax to ask about this as well.

The max temp on the BFG 7800GTX is 100C, and more likely up to 110C. (Also from tech service.) They are so completely freaking helpful at BFG, I love it.
 

lament

Senior member
Feb 17, 2004
345
0
76
also the new 78.01 drivers fix a lot of GTX issues. you should use Driver Cleaner to clean out the old drivers and see how the new ones do.
 

BeakerChem

Senior member
May 11, 2005
219
0
0
Lol. You tell me this after I already have the card in RMA... <Shrug> Oh well. 9 days until it comes home.
 

lament

Senior member
Feb 17, 2004
345
0
76
oh wait.. I lied.. it was the 77.77 bump that had a bunch of fixes for the GTX. so there is still hope! :)
 

BeakerChem

Senior member
May 11, 2005
219
0
0
Lol. I wondered about that. The 78.01/03 didn't seem to indicate anything like that in the release notes. Glad to see we are starting to get some recognition from the driver community.

What drivers did you update, btw?

GPU can go to 77.77
MB can go to 6.66
SB ZS 2 can go to May 2005/Nov2004-EAX

Did you update the Sil chip? It seems to have its own set of drivers.
 

lament

Senior member
Feb 17, 2004
345
0
76
GPU = 77.77
MB = 6.66

I don't remember what the SB's are.. I'm at work now and can't look. And my SiL PCI add-in card is up to date.
 

interchange

Diamond Member
Oct 10, 1999
8,026
2,879
136
I would try running memtest overnight to make sure there are no memory issues. It can run from a bootable CD, so it will bypass your OS altogether.
 

lament

Senior member
Feb 17, 2004
345
0
76
Added another 2x512MB Kingston HyperX last night and Battlefield 2 is screaming, load times are cut in half, and so far no lockups. I think it was the LAN driver.

Think I can get tighter than 2.5-3-3-8-2T timings? I'd like to run at the rated 2-3-2-6-1T..
 

BeakerChem

Senior member
May 11, 2005
219
0
0
You can't drop to T1 with 4 sticks with an AMD chip at the moment. I don't know enough about mem timings to comment on the other CAS values. I do know that memory tends to default at looser timings initially to make sure you can boot. My OCZ defaults to 2-3-3-8-T2, but runs great at the advertised 2-3-2-5-T1 after I adjust it.
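To put rough numbers on those timing strings, here is a small Python sketch that converts CAS latency from clock cycles to nanoseconds. It assumes PC3200 (DDR400) memory on a 200 MHz memory clock and the usual CL-tRCD-tRP-tRAS-command rate field order; the helper names are made up for this example.

```python
# Convert memory timings from clock cycles to nanoseconds.
# Assumptions: PC3200 / DDR400, i.e. a 200 MHz memory clock, and
# timing strings ordered CL-tRCD-tRP-tRAS-command rate.


def cycles_to_ns(cycles: float, clock_mhz: float = 200.0) -> float:
    """One cycle at clock_mhz lasts 1000 / clock_mhz nanoseconds."""
    return cycles * 1000.0 / clock_mhz


def parse_timings(spec: str) -> dict:
    """Split a string like '2-3-2-5-T1' or '2.5-3-3-8-2T' into fields."""
    cl, trcd, trp, tras, cmd = spec.split("-")
    return {
        "CL": float(cl),
        "tRCD": int(trcd),
        "tRP": int(trp),
        "tRAS": int(tras),
        "command_rate": int(cmd.strip("Tt")),  # accepts 'T1' and '1T'
    }


for spec in ("2-3-2-5-T1", "2.5-3-3-8-2T"):
    timings = parse_timings(spec)
    print(spec, "CAS =", cycles_to_ns(timings["CL"]), "ns")
```

Under those assumptions, CL2 works out to 10 ns and CL2.5 to 12.5 ns, so the gap between the default and rated CAS settings is about a quarter of a clock-doubled access. It says nothing about whether a given set of sticks is stable at the tighter setting.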

Glad to hear you aren't locking up anymore. How long have you run with no lock-ups? Any other differences besides the extra memory (I already have 2 GB on 2x1GB OCZ sticks) to get you running?

I have been playing with the idea of putting my video card (the 6-pin PCI-Express plug) on another PSU that I have lying around, to leave some more amperage for the rest of the system and see if that helps. That is, if RMAing the card doesn't fix the problem. They got it yesterday, so I should get the new one back by next week.
 

lament

Senior member
Feb 17, 2004
345
0
76
Originally posted by: BeakerChem
You can't drop to T1 with 4 sticks with an AMD chip at the moment. I don't know enough about mem timings to comment on the other CAS values. I do know that memory tends to default at looser timings initially to make sure you can boot. My OCZ defaults to 2-3-3-8-T2, but runs great at the advertised 2-3-2-5-T1 after I adjust it.

Glad to hear you aren't locking up anymore. How long have you run with no lock-ups? Any other differences besides the extra memory (I already have 2 GB on 2x1GB OCZ sticks) to get you running?

thanks!

So far a couple of days without lockups (been running smooth since I last posted). Didn't do anything else except remove the UPS from before, but now I'm going to hook it back up.

what's the technical reason that the AMD can't do 1T with 4 sticks? that's silly.

And can you point me to an AMD/RAM overclocking guide? I know AT has one somewhere in the forums.. I'll have to do a search..