Problem with the home fleet

Robor

Elite Member
My numbers will be slightly down for a little bit. The box that I use as my fileserver, CD/DVD ripper, and SetiQ box started experiencing BSODs and reboots. After some troubleshooting, hardware swaps, and a format/reload, I determined that the CPU had died. This was my 2nd best box, so I moved the CPU from my 3rd rig (1.6A) into the box that failed (2.4B). I should be nearly back to normal now. It will only cost me about 9 WUs per day plus what I lost in production yesterday because the Q was down.

In all the years of building computers this is the first time I've ever had a CPU fail. Even my trusty old Celeron 300A is still running in a computer I sold a coworker years ago. Guess it had to happen sometime though... 🙁
 
Worst of the worst ... on your main machine. 🙁

You know, I'm just asking for it. I don't have a backup of the machine running my Q. If it goes down, I'll have to start from scratch. I'd back it up but it would take quite a while and I don't want to shut it down ... too many people using it.

Over 140 clients and now over 800 WUs a day. :Q

And it does its share of Seti too:

Smoke #08
 
Now that you said it you've gone and jinxed it. 😉 You should get yourself a spare HD and a copy of Ghost or something similar and make yourself an image of that machine. Going HD to HD shouldn't take that long to dump an image and those clients will keep looking until you get it back up. You're going to be kicking yourself if it fails and you don't have a backup image.
 
... or you could be serious and do some kind of data protection with RAID...

Nowadays you can get a Promise IDE RAID card & a couple large drives for pretty cheap.

Mirror the drives & have a "backup" all the time.
 
Sad to see one die. I work on a lot of systems and have replaced several dead procs. Intel just doesn't make 'em like they used to. (Never had an AMD proc die yet... knock on wood.)
 
I just checked and my Q is almost up to 5 GB. :Q

I'm sure the HISTORY and WUs are taking up the most room. What folders and files would I need to copy to set up the SetiQueue with just the same users/queues/clients on another machine? It wouldn't take but a few days to accumulate new WUs if an emergency ever hit.
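One way to confirm which folders account for most of the 5 GB before deciding what to copy is to total up each top-level folder. This is only a rough sketch: the install path in the usage comment is hypothetical, and nothing here assumes anything about how SetiQueue actually lays out its files.

```python
import os

def folder_sizes(root):
    """Return {top-level entry name: total bytes} for everything directly under root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            # Walk the whole subtree and add up every file in it.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes

def report(root):
    """Print entries under root, largest first, in megabytes."""
    ranked = sorted(folder_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ranked:
        print(f"{size / 2**20:10.1f} MB  {name}")

# Usage (hypothetical path, adjust to wherever the queue is installed):
# report(r"C:\SetiQueue")
```

Whatever turns out to be small (settings, user and queue definitions) is the part worth copying regularly; the large, regenerable folders can be left out of a quick backup.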

BTW, that computer has its own UPS and is on one of my critical A/C circuits ... backed up with a 30KW natural gas generator.

I just noticed that ole Smoke #08 has completed 4,254 WUs on its own. I'm pretty sure that is not the original installation of Seti on that machine but it has obviously been running for some time now ... approximately 554 days.
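For a sense of scale, those two figures work out to a rough daily rate. This is a back-of-the-envelope sketch that assumes the 4,254 WUs were all completed during the roughly 554 days and that the machine crunched continuously, neither of which is certain.

```python
wus_completed = 4254   # total reported for Smoke #08
days_running = 554     # approximate, per the post

wus_per_day = wus_completed / days_running
hours_per_wu = 24 / wus_per_day  # assumes round-the-clock crunching

print(f"{wus_per_day:.1f} WUs/day, about {hours_per_wu:.1f} hours per WU")
# prints "7.7 WUs/day, about 3.1 hours per WU"
```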

Wiz, I like your idea the best but I keep thinking we are almost to the end of S@H-1 and the Queue will not be needed much longer. I think I may be trying to rationalize not doing anything. 😛 😀

 
Originally posted by: Robor
My numbers will be slightly down for a little bit. The box that I use as my fileserver, CD/DVD ripper, and SetiQ box started experiencing BSODs and reboots. After some troubleshooting, hardware swaps, and a format/reload, I determined that the CPU had died. This was my 2nd best box, so I moved the CPU from my 3rd rig (1.6A) into the box that failed (2.4B). I should be nearly back to normal now. It will only cost me about 9 WUs per day plus what I lost in production yesterday because the Q was down.

In all the years of building computers this is the first time I've ever had a CPU fail. Even my trusty old Celeron 300A is still running in a computer I sold a coworker years ago. Guess it had to happen sometime though... 🙁

That is amazing. I have yet to see a CPU fail either. I thought I had a cooked Intel CPU last year when the CPU fan died, but it turned out that the motherboard material and traces around the CPU had fried and become so brittle that the traces broke up.

Were you overclocking it? It might not run at higher timings anymore (natural silicon decay), so you could try backing down a tad.

 
Originally posted by: dmcowen674
That is amazing. I have yet to see a CPU fail either. I thought I had a cooked Intel CPU last year when the CPU fan died, but it turned out that the motherboard material and traces around the CPU had fried and become so brittle that the traces broke up.

Were you overclocking it? It might not run at higher timings anymore (natural silicon decay), so you could try backing down a tad.

This box only does file storage, DVD/CD ripping, and SetiQ so I must confess I wasn't paying much attention to it. I only noticed that it had a problem because I tried to check my client status with SetiQ and it didn't respond. I flipped the KVM over to it and found it in the process of a reboot. Then I noticed that it had been rebooting on its own for a few days.

All at default - CPU, FSB, RAM, PCI, AGP, voltage, etc. Power supply and CPU fans are both working fine and the intake/exhaust 120mm fans are operational, so I don't think it was heat. It's got 5 HDs, a DVD, and a CD drive along with a Promise RAID card, so I thought maybe the PS (True 380?) was overloaded. I disconnected the RAID card and 4 drives and still got BSODs. Tried using a spare 300W PS and got the same thing. Even reset the BIOS to default and tried a fresh load of Win2K Server and Win2K3 Server. Both resulted in BSODs or other errors. I hooked the RAID card and drives back up and put the 1.6A out of my 3rd system into this one and it's been solid ever since.

I looked closely at the CPU and there are no burned or dark spots anywhere. All pins look fine. I called Intel and after about 15 minutes of talking to them about the situation they agreed to set up an RMA. I'm still waiting on the number though - something about confirming the CPU was a boxed unit originally. I told him I know it's boxed because I'm holding the box in my hand, but he had to verify it on his end. Funny, but I keep all of my 'puter hardware boxes - in fact I recently cleaned out the top shelf of my closet because it was full left to right and stacked to the top. I threw away things like Diamond Monster, Diamond Monster II, Soundblaster 16, and P1 motherboard boxes.

<== Packrat! 😛
 
Pretty weird that a newer cpu would die, especially a P4. I fried a Duron 1.0 GHz when the heatsink tabs broke off of my cheap ECS motherboard. The cpu fried, and took the motherboard with it.
 
Well, in hindsight this CPU was in the system that was giving me problems before. I only had it together for a short while before doing a video card upgrade. I had some issues moving from a GF4 4200 to a Radeon 9800. I reloaded and I thought it was stable after that but I didn't have it running together that long before moving the 9800 to a new motherboard, CPU, RAM, and HD's (yet another system upgrade). It may have been bad all along.
 
Originally posted by: Robor
Well, in hindsight this CPU was in the system that was giving me problems before. I only had it together for a short while before doing a video card upgrade. I had some issues moving from a GF4 4200 to a Radeon 9800. I reloaded and I thought it was stable after that but I didn't have it running together that long before moving the 9800 to a new motherboard, CPU, RAM, and HD's (yet another system upgrade). It may have been bad all along.

It could be a marginal one that got by the QC check. I would sooner believe it was a defective chip from the start than one that failed in the field. Thank goodness this is a very rare exception rather than the norm. Nothing worse than intermittent electronics.


 