Help with random reboots/non-boots

Gimbal

Junior Member
Jan 20, 2005
4
0
0
It has been over two weeks now and I have googled high and low, and searched many a forum, but I'm no closer to finding any resolution. I'm sorry for being so wordy, but maybe a question well asked is half-answered as they say.

I'm currently not having random reboot problems, but instead I have trouble getting the machine to boot up properly. Here are my specs, the symptoms, and what I've done so far to troubleshoot the issue:

AMD Athlon64 3200+ with retail heatsink fan (I could have sworn it was one with a larger L2 cache)
Chaintech ZNF3-150 motherboard
1 GB RAM - twin Corsair DDR PC3200
ATI Radeon 9800Pro 128M
80GB WesternDigital HD
Enermax PSU EG475P (470W)
WinXP SP2

I bought all these components a year ago (2/17/05 to be exact. Glad I got an extended warranty). It has been working beautifully up to this point.

One day while it was running it decided to reboot for no reason. Since then I have closely monitored the temp, but it was always 48C or less under any circumstances. The machine would run fine for an hour or so, then reboot again. The reboots kept occurring more frequently, up to the point where it would reboot before it finished POSTing.

I could find no indication it had anything to do with Windows. In fact it would sometimes reboot while I was in the BIOS. I tried changing the automatic shutoff temperature to 65C or 70C but that didn't seem to change anything. At the time I couldn't check the memory because it would reboot partway through the test at different levels of progress.

I took apart the machine, unplugged all the connectors and re-inserted all the cards. Random reboots still occurred. I unhooked everything but the floppy and put in an ancient GeForce 256 video card and only 1 stick of RAM. Random reboots. I tried using the other stick of RAM by itself. Random reboots.

One night I decided I'd be brave and take the heatsink off the CPU and look for something there; maybe the temps weren't being reported accurately. I had some extra thermal grease that came with my mobo, so after I inspected the CPU and put it back I cleaned off the gray goo that came with the retail unit and applied the new thermal grease. That worked...for about a week. During that time I ran the mem test fully through 5 cycles with no errors. Temps were always below 45C.

Sometimes the random reboots would cause the machine to freeze during bootup and it wouldn't respond to the power or reset switch. Again, I removed the heatsink and reapplied thermal grease. Again everything worked fine...for a little while anyway.

The random reboots have not come back thus far, but now I will go to turn my machine on and I will hear the CD-ROM drive whir and click as normal, as does the hard drive like it always does just as it boots, but then they just keep doing that. The monitor's light keeps flashing since the machine never gets to a point where it sends a signal.

Last night when I begrudgingly took the heatsink fan off again I noticed a sheen on the CPU board edge. I also noticed it around the edge of the socket in the same place. Almost like a liquid had gotten in there. It may have been thermal grease that spooged over the edge (though I was careful to use a tiny drop in the center of the heat spreader). Anyway, I carefully mopped up whatever it was with a Q-tip, cleaned and re-applied thermal grease and was back in business.

Tonight I went to turn it on and again the CD-ROM and HD did their endless dance and the machine never really booted. I cycled the PSU's power switch and tried again and everything started up, although the CD-ROM did cycle more times than I think was normal before going silent.

I just ordered some Arctic Silver 5 just so that the next time I have to touch the stupid thing I'll at least know I'm putting down some quality stuff instead of whatever generic thermal grease came with the mobo.

But if this was really a CPU overheating problem, why would it not take any time at all to overheat from a cold start and not even get to the normal boot sequence? The random reboots would have made sense for an overheating problem. Perhaps CPU damage from the previous overheating problem? Then why is it working now?

I don't have a spare PSU to try and eliminate that from the equation, but the voltages shown in the BIOS seem to look okay. I also tried plugging the machine into a different outlet (one that was on a different breaker in the house) to no avail. The voltages do fluctuate a little, but I think this is normal, no?

I also don't have a spare motherboard or CPU to switch out to see if those might be the culprit. Maybe the movement from taking the heatsink fan on and off is just enough to temporarily fix a fickle connection between the CPU and socket. No idea at this point.

Any ideas would be greatly appreciated. Again, sorry for being so wordy. I should never post a message after 3 cups of coffee.



 

mechBgon

Super Moderator, Elite Member
Oct 31, 1999
30,699
1
0
"I should never post a message after 3 cups of coffee."
Wow, if I applied that rule, I'd probably never be seen at the Forums again! :Q

What model of Corsair PC3200 do you have, precisely, and what voltage are you using on it? Also, the Western Digital drive... is that alone on its own data cable, and if so, is it jumpered for Single Drive? Not that likely, if it's begun acting up only now, but slow/no-boot is sometimes caused by a loner WD that's jumpered for Master or Slave when it ought to be Single Drive.
 

Gimbal

Junior Member
Jan 20, 2005
4
0
0
Oops. Forgot to mention I do have another HD. The WD is jumpered as Master w/slave present and the slave is an older IBM 18gig (or something like that). It was working fine for a year so I don't imagine that is the problem. Also the problem still occurred when I only had the floppy drive connected.

As far as the mem goes (copied and pasted from the website I bought it from):

Corsair XMS Extreme Memory Speed Series, (Twin Pack) 184 Pin 1GB(512MBx2) DDR PC-3200 - Retail
Model# TWINX1024-3200C2PT
Item # N82E16820145450
Specifications:
Manufacturer: Corsair
Speed: DDR400(PC3200)
Type: 184 Pin DDR SDRAM
Error Checking: Non-ECC
Registered/Unbuffered: Unbuffered
Cas Latency: 2-3-3-6 T1
Bandwidth: 3.2GB/s
Organization: two 64M x 64 -Bit
Special Features: With Platinum Heat Spreader
Warranty: Lifetime

As far as what voltage I'm using on it, whatever the defaults were. I never overclocked anything. Let me go into bios and see what the settings are. If I don't return to post it is because my computer didn't boot again and my head exploded, taking everything out in a 5 mile radius.

 

Gimbal

Junior Member
Jan 20, 2005
4
0
0
Everyone in a 5 mile radius can relax now...

Voltage tweaking is set to disabled in the BIOS, so everything is at its default setting. I looked into the PC health status and wrote down what the readings are there:

CPU core 1.52V
Memory 2.59V
+3.3V 3.29 to 3.44V
+5.0V 5.10 to 5.16V
+12V 12.16 to 12.22V
-12V -12.28V
Chip 1.64V
AGP 1.50V
Battery 2.76 to 2.78V
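
As a rough sanity check on those rails, here is a minimal Python sketch that compares the readings above against the commonly quoted ATX tolerances (plus or minus 5% on +3.3V, +5V and +12V, plus or minus 10% on -12V). The tolerance table and the function name `rail_report` are assumptions made for illustration, not something the BIOS reports, and BIOS sensor readings are only approximate compared with a multimeter under load.

```python
# Readings copied from the PC health screen above: (lowest, highest) in volts.
readings = {
    "+3.3V": (3.29, 3.44),
    "+5V": (5.10, 5.16),
    "+12V": (12.16, 12.22),
    "-12V": (-12.28, -12.28),
}

# Assumed nominal values and tolerances (commonly quoted ATX figures).
nominal = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0, "-12V": -12.0}
tolerance = {"+3.3V": 0.05, "+5V": 0.05, "+12V": 0.05, "-12V": 0.10}


def rail_report(readings, nominal, tolerance):
    """Return {rail: (worst deviation in percent, within tolerance?)}."""
    report = {}
    for rail, (low, high) in readings.items():
        nom = nominal[rail]
        # Largest distance from nominal seen on this rail.
        worst = max(abs(low - nom), abs(high - nom))
        allowed = abs(nom) * tolerance[rail]
        report[rail] = (round(100 * worst / abs(nom), 1), worst <= allowed)
    return report


for rail, (percent, ok) in rail_report(readings, nominal, tolerance).items():
    status = "within" if ok else "OUTSIDE"
    print(f"{rail}: worst deviation {percent}% ({status} assumed tolerance)")
```

Under these assumed limits every rail passes, with +3.3V the closest to its limit. That only says the idle readings look plausible; it doesn't rule out a PSU that sags at power-on.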

My current hypothesis: I originally had an overheating problem (random reboots) due to the thermal crap that came with the retail unit losing efficiency after a year. The crappy new stuff I applied only worked for a very short time because, perhaps, it is crap. In the process, some leakage got into the CPU socket area, causing shorts and preventing the machine from booting properly at times.

If everything works for months after I use the Arctic Silver stuff then I can assume that I was correct. If not...everyone in a 5 mile radius is doomed!

 

mechBgon

Super Moderator, Elite Member
Oct 31, 1999
30,699
1
0
That sounds like good news :) but don't be shy about kicking the memory voltage up to 2.7 volts if the issue comes back and you have reason to suspect the memory is unstable. I have the same modules myself.

The thermal grease that comes with the retail AMD heatsink is actually some of that very expensive ShinEtsu 751 stuff, if I have my info correct. It should last for practically forever. But whatever works, eh? :D Welcome to the Forums, btw :)
 

cryptonomicon

Senior member
Oct 20, 2004
467
0
0
the on-chip mem controller puts extra stress on the A64, which is rumored to increase the voltage needs depending on how many channels of ram you are running and at what speeds. and there is no harm in putting 2.7v into that ram, it might stabilize it. but if you are still having problems, you might even bump the vcore.
 

Gimbal

Junior Member
Jan 20, 2005
4
0
0
I would just like to update this thread with new info in case someone with similar issues finds it while searching the forums.

As I said previously, I now no longer have random reboots. I still haven't received the new thermal grease but the stuff I have seems to be doing ok.

My problem now is with a cold boot, when the machine has been off for hours. It will sit there cycling the power to the CD-ROM and HD and never quite start the whole POST routine. If I leave it going for a minute it sometimes will boot. Other times I have to switch off the PSU in order to turn it off and try again.

One thing I failed to notice before is an error message whenever this happened: "Warning! Now system is in safe mode. Please resetting CPU frequency in the CMOS setup." Then I can press F1 to continue or Del to enter setup. I noticed that the CPU freq is being reported as 1000MHz when this happens. If I press F1 to continue it boots normally, but even Windows reports 1GHz CPU speed when it should be 2GHz.

If I restart, the machine reboots just fine and it is running at 2GHz again. The problem only seems to happen if the machine was off for a while.
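
One way to read the 1GHz figure is as plain clock arithmetic. This is a small Python sketch, assuming the usual Athlon 64 relation of core clock = reference clock x multiplier, with a 200MHz reference and a 10x multiplier giving the 2GHz it should report. The two fallback combinations are hypothetical; the error message doesn't say which setting the BIOS actually changed.

```python
def core_clock_mhz(reference_mhz, multiplier):
    """Core clock in MHz, assuming clock = reference clock x multiplier."""
    return reference_mhz * multiplier


# Assumed normal settings for this chip: 200 MHz x 10 = 2000 MHz.
expected = core_clock_mhz(200, 10)

# Hypothetical fail-safe settings that would both show up as 1000 MHz.
fallbacks = {
    "reference halved (100 x 10)": core_clock_mhz(100, 10),
    "multiplier halved (200 x 5)": core_clock_mhz(200, 5),
}

print(f"expected: {expected} MHz")
for label, mhz in fallbacks.items():
    print(f"{label}: {mhz} MHz")
```

Either combination gives exactly half the normal speed, which matches what both the BIOS and Windows report after pressing F1.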

I've located several other threads pertaining to this issue. For the most part there doesn't seem to be a clear culprit or solution. I've read about bad capacitor problems on mobos recently and I looked over my mobo to see if any were visibly leaking or bulging, but didn't see any. The capacitors in the pictures I saw of the bad ones do match the ones I have, though. It makes me suspicious about them, but not enough to be sure that is the problem. Although that might explain why it usually only happens during a cold start. And if I do have one going bad it might also explain the random reboots. Well, a bad capacitor might explain any number of odd things, come to think of it.

I plan on switching some of the power cables around, as one person said this fixed his problem. I do have one of the auxiliary power plugs going to my Radeon 9800Pro. Maybe finding an isolated plug just for it might help. If this doesn't change anything I'm going to suspect the motherboard more heavily and RMA the offending being.