BOINC: Status of the Hordes (mondobyte)

mondobyte

Senior member
Jun 28, 2004
918
0
71
Hi TeAMmates,

I expect to gain some "traction" in SIMAP and Rosetta in February. Both projects have resisted the attentions of my hordes to this point.

The local horde is reeling from a spate of recent failures! November, December, and January have seen a record number of apparent failures in the local horde.

[*]Guardian (Tyan S2460 Dual Processor Motherboard) - This computer ran pretty much flawlessly for about 42 months. It began with a random crash. On reboot, a processor failed to be recognized. Testing has shown that the Antec TruPower 480 PSU has a failed power ready output which shouldn't be that big of a deal since all other outputs are within normal tolerances. Further testing showed one of the MP2000 processors is dead. The 512MB PC2100 ECC REG RAM refuses to be recognized by any motherboard that I have so I presume it is dead too! I replaced the suspect PSU and I pulled the Shuttle AN35N 400 Ultra Motherboard and Athlon Mobile XP3000+ from my library computer and now Guardian lives again. I upgraded the RAM to 768MB from 512MB. The Household File server lives again!

[*]Tyan S2462 Dual Processor Motherboard - a replacement for the S2460 in Guardian that was DOA.

[*]Bikini (Shuttle AN35N 400 Ultra) just crashed and never POSTed again. Bad Caps in a cheap PSU! A lesson learned. Bikini may yet live again, but that is a down-the-road thing as this computer is a pure cruncher.

[*]TheBrain (Shuttle MN31N) I built three identical micro ATX based computers in 2002: Trinity, Pinky, and TheBrain, to replace 3 aging dual Pentium 200 computers. All had Shuttle MN31N motherboards, Athlon XP2400+, and 2 x 256MB Golden Dragon Value PC3200. TheBrain has experienced random lock-ups (usually about every other month) for about the last year. I finally decided it was time to upgrade it to Windows 2003 Server. I saved everything off ... installed the new OS and TheBrain turned up lame. Memory tests confirm that something is amiss with the motherboard or processor or onboard video memory. There are very apparent anomalies in the video during the memory testing. I'll pick the motherboard as the failed component. The PSU tests OK but I have never been pleased with the voltages from it so it will be history too. At this point, a replacement PSU, motherboard, and processor are on the way. This will be the first of the 3 to be upgraded to an Athlon 64 3200+ Venice. I'll test the XP2400+ and, if OK, will use it to replace an XP2000+ in another computer ... a fairly nice upgrade for both systems. Alas, an Opteron or Dual Core was not in the budget this time :(

[*]Smoke (Shuttle MN31N) Yet another micro ATX based computer built in 2002. This has a transparent smoke gray plexiglass case. Apparently, the video RAM has failed in the onboard video. I added an AGP card and all seems well for the moment.

[*]Ghost (ancient ASUS Duron system) Another motherboard has apparently failed. Again, this computer may live again but not in the short term as it was only a cruncher.

[*]At the strong urging of the wife, I have divested several other sub-1GHz computers. The local horde is now fewer than a dozen processors/cores.

I am beginning to formulate a few theories from these failures:
[*]Processors and/or memory may tolerate long term overclocking but, perhaps, motherboard chipsets may not be as durable or tolerant. I don't believe DC projects bear on this one way or another.
[*]Cheap and/or generic power supplies are a waste. I will bite the bullet and purchase only Active PFC power supplies in the future. If for no other reason, the high efficiency of these PSUs should have the effect of reducing electricity consumption. Because of all the advanced features of these PSUs and the high cost, these power supplies seem to be built with better components to go the distance. MTBF/MTTF for these PSUs is almost double that of the low cost generics. A bonus is that most of these advanced PSUs are genuinely QUIET.
[*]I keep my systems running cool. I routinely clean out the dust that seems to accumulate. I absolutely believe that if I did not keep my systems cool and clean, the failure rate would be unacceptable.
[*]Bad Caps in PSUs remain a problem even for some presumed high end PSU manufacturers like Antec. I don't find any visible evidence of Bad Caps on these failed motherboards but I have repeatedly seen cheap and generic PSUs with Bad Caps.

So much for the ramblings of a deranged cruncher.

mondo
 

petrusbroder

Elite Member
Nov 28, 2004
13,347
1,153
126
Sorry :( for all those failures. But the theories are sound - especially the "clean out dust" - "run them cool" advice. The PSU problems surprise me though. I have had a dozen comps running 24/7 since 2001 and have had two PSU-failures. Considering the computers I have had over time (I built my first one in 1976 [that is 5 years BPC]) I have had IIRC three PSU-failures in all. Could this be a difference between running them @ 50 Hz (in Europe) or 60 Hz (in the USA)? I doubt that it is the climate ...
OK back to topic! Thanks for the info. The life of a "Hordes"-ruler is certainly most interesting ... :D



(Oh: BPC = Before PC)
 

winr

Diamond Member
Feb 17, 2001
6,081
56
91
..........

A moment of silence for the fallen warriors..........:(








:)
 

Wolfsraider

Diamond Member
Jan 27, 2002
8,305
0
76
Mondo, I have had similar failures, mine are caused from power spikes and bad wiring here though.

Good luck Steven, these issues really hurt the pocketbook, but at least it's for a good cause :)
 

Coquito

Diamond Member
Nov 30, 2003
8,559
1
0
A lot of trouble lately for everyone. I remind everyone to run a solid UPS for their PCs. Direct from the wall socket; no extension cords!

Does the ram have a lifetime warranty?
 

mondobyte

Senior member
Jun 28, 2004
918
0
71
Originally posted by: Wolfsraider
Mondo, I have had similar failures, mine are caused from power spikes and bad wiring here though.

Good luck Steven, these issues really hurt the pocketbook, but at least it's for a good cause :)

All these computers are on APC SmartUPS so there should be no power anomalies contributing to the failures.

I only have the support of my spouse to "repair" systems that have one or more critical functions -- BOINC is not one of them

mondo

 

mondobyte

Senior member
Jun 28, 2004
918
0
71
Originally posted by: Coquito
A lot of trouble lately for everyone. I remind everyone to run a solid UPS for their PCs. Direct from the wall socket; no extension cords!

Does the ram have a lifetime warranty?
Yes

I have APC SmartUPS or BackUPS Pro on all these fallen computers and I have two dedicated circuits with lots of outlets so I don't have an extension cord jungle. I also have a whole house surge protector.

mondo



 

Spacehead

Lifer
Jun 2, 2002
13,067
9,858
136
Originally posted by: mondobyte
I only have the support of my spouse to "repair" systems that have one or more critical functions -- BOINC is not one of them

mondo

Sounds like someone needs an "attitude adjustment" :Q




:p


 

BadThad

Lifer
Feb 22, 2000
12,099
47
91
Bad caps are the bane of computing! In the past 2 years, I've "repaired" a lot of fallen mobos that had bad caps. I was cleaning out my son's PC last week and during the visual inspection I saw he has 3 bad caps, but he has reported no problems with the system YET. I told him the time is coming: he will need a new mobo as soon as stability becomes an issue. The old P4S533 mobo has actually been an amazing workhorse for quite a few years.

Not sure if any of you have filled out the Abit Survey, but my chief comment was about capacitors. I made it a point to complain about the industry's use of inferior components on mobos. I would GLADLY pay an extra $10-20 for super high quality, aerospace grade capacitors. I would also like to see the top-tier manufacturers produce a bare-bones (no integrated anything) mainboard that uses ultra-high quality components. I would be happy to pay "extra" for quality over "quantity" of features.
 

Freewolf

Diamond Member
Feb 15, 2001
9,673
1
81
I had an Asus NF2 MB with bad caps a couple of weeks ago that had finally started freezing up at POST. I've thrown out a lot of MBs over the years with bad caps. I did actually have a successful RMA of an ECS P4 MB last December that had caps start leaking after 10 1/2 months.
 

mondobyte

Senior member
Jun 28, 2004
918
0
71
The situation goes from bad to worse ... or maybe from the sublime to the ridiculous :disgust:

This morning my mail server was offline. I checked on it. The fans on the mobo would spin but there were no POST beeps. I changed out the RAM ... no help. I unplugged the PSU from the extension cable that this case requires between the PSU and the mobo to power the front LCD displays. The PSU checked OK. I plugged it back in. YEEEHAWWW ... it POSTed and booted.

Standing at the rear of the computer I smelled the unmistakable odor of something very hot and burning. I powered it down.

On further examination, I looked at the case extension that plugs into the motherboard. Hmmm ... brown areas. When I unplugged it, all the 5V connections were charred. The motherboard's plastic connector is more or less intact except for the charred pins sticking up from the mobo. The extension that plugs into the motherboard connector is charred, with pieces missing at all 5V pins. One of the 5V wires is charred and blackened, and the insulation is all melty.

This is a Tyan S2460 with dual MP2000+ processors ... these motherboards are famous for melting 12V pins, and this mobo actually has the additional 12V feed from a 4 Pin HD connector to augment the main motherboard connector. I have not seen or heard of the 5V pins melting or charring, although I've seen many pictures of 12V pins in that condition. All pins except the 5V ones are, apparently, OK.

Obviously, a motherboard with a charred main power connector can no longer be depended on. I suspect that one 5V pin failed, which put more load on the remaining ones, and so on. I have not yet removed the mobo from the case, but I suspect I will find that several of the charred 5V pins cannot be repaired because the motherboard may be damaged around them. Best-case scenario: I can unsolder a power connector from a dead mobo, repair the damaged connections, and it "MAY" live for an uncertain period of time as a dedicated cruncher. I also see the 5V line is at a lower than expected voltage ... Could it be that the PSU failed in a way that reduced the voltage on the 5V rail, thereby causing higher current draw and excessive heating through the connector? Alternatively, could it have hung on a reboot in such a way that it drew an amazing amount of power through the 5V rail and caused the meltdown? I go with the theory that the PSU is the culprit. I see a few bad caps in the PSU through the fan opening. I'll post an in-depth post mortem late Friday.
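
For what it's worth, the "one pin fails, the rest carry more" idea is easy to put rough numbers on. The sketch below is only an illustration, not a measurement: it assumes the 5V load behaves like a constant-power load shared equally across the good pins, it assumes four 5V pins in the connector, and the 100W load and 6A per-pin rating are placeholder figures I made up for the example.

```python
def per_pin_current(load_watts, rail_volts, good_pins):
    """Amps through each remaining pin, assuming a constant-power load
    that is shared equally across the pins still making good contact."""
    if good_pins <= 0:
        raise ValueError("no pins left to carry current")
    total_amps = load_watts / rail_volts  # I = P / V
    return total_amps / good_pins


LOAD_WATTS = 100.0      # hypothetical load on the 5V rail
PIN_RATING_AMPS = 6.0   # assumed per-pin rating, not taken from a datasheet

# Compare a healthy 5.0V rail with a sagging 4.7V rail as pins drop out.
for volts in (5.0, 4.7):
    for pins in (4, 3, 2, 1):
        amps = per_pin_current(LOAD_WATTS, volts, pins)
        status = "OVER" if amps > PIN_RATING_AMPS else "ok"
        print(f"{volts:.1f}V rail, {pins} good pin(s): {amps:5.2f}A per pin [{status}]")
```

Under those assumptions both effects push the same way: a sagging rail raises the total current, and every lost pin raises the share carried by the survivors, so the failure feeds on itself.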

Again, very obviously, I need to get the mail server operational again ASAP ... no point in merely replacing the motherboard and PSU with already obsolete and/or repaired or pulled components. So ... a new Socket 939 motherboard, 600W EPS PSU, and Opteron 165 (Dual Core) will arrive Monday? (I was hoping for Friday :( ). This forced upgrade should reduce power consumption and radically improve reliability. Too bad I can't make use of the 64-bit version of Server 2003, but I have no 64-bit drivers for the Adaptec RAID card. I have no doubt that the 32-bit drivers would fail. Besides, AFAIK, there is no supported ASPI64.

The silver lining in this is that the memory, SCSI RAID, and dual NIC are all OK, and apparently, so are the 2 MP2000+ processors.

Anyone know of a place that repairs power connectors on mobos???

mondo

 

mondobyte

Senior member
Jun 28, 2004
918
0
71
Originally posted by: ken008
I think Monarch still sells new Tyan dual boards.

Yes they do ... but not the AMD Socket 462 dual (Athlon MP) boards such as the Tiger MP or Tiger MPX. They haven't been manufactured in several years!

There are server "pulls"/used available on eBay but I would still spend about 75% of what I will spend on an Opteron solution and I would have no warranty and no confidence that I would not face a similar problem in the near future.

The Opteron solution is most definitely the least expensive in the long run even though it is not the cheapest up front. I get a 3 year warranty on the mobo and a 3 year warranty on the processor (yes, I bought the retail box for the warranty, and it cost less too; it may not be the best stepping for overclocking, but on my mail server ... who cares).

My existing memory has a lifetime warranty and tests good. The new PSU comes with a 1 year warranty. So ... at least for 1-3 years, there seems to be a cap on potential expenditures! The new PSU also posts a > 85% efficiency rating which should permit me to recover the cost of the PSU in electricity savings over a 1-2 year span! All these are good things -- particularly with my spouse!
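
A rough way to sanity-check that kind of payback estimate is sketched below. Every number in it is a placeholder assumption (PSU price, DC load, the old PSU's efficiency, and the electricity rate are all made up for illustration), and it assumes the machine runs 24/7 at a steady load.

```python
HOURS_PER_YEAR = 24 * 365


def payback_years(psu_cost, dc_load_watts, old_eff, new_eff, price_per_kwh):
    """Years for a more efficient PSU to pay for itself in electricity,
    assuming a constant DC load running around the clock."""
    old_wall_watts = dc_load_watts / old_eff   # draw at the wall, old PSU
    new_wall_watts = dc_load_watts / new_eff   # draw at the wall, new PSU
    saved_kwh = (old_wall_watts - new_wall_watts) * HOURS_PER_YEAR / 1000.0
    if saved_kwh <= 0:
        raise ValueError("new PSU must be more efficient than the old one")
    return psu_cost / (saved_kwh * price_per_kwh)


# Hypothetical inputs: $100 PSU, 200W DC load, 70% -> 85% efficiency, $0.10/kWh.
years = payback_years(100.0, 200.0, 0.70, 0.85, 0.10)
print(f"Estimated payback: {years:.1f} years")
```

The result swings a lot with the load and the local electricity rate, so plug in your own figures before trusting any payback number.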

Thanks for the suggestion.


 

Wolfsraider

Diamond Member
Jan 27, 2002
8,305
0
76
I just lost an Epox 4G4A+ mobo due to bulging caps. I feel your loss, mondo.

I wish I could replace it with another Opteron 165 lol

Drool.
 

mondobyte

Senior member
Jun 28, 2004
918
0
71
... and yet another motherboard in distress

I opened up a cruncher to blow it out ... bad caps! It was still running. I just finished pulling good (different-brand) caps from several dead motherboards that did not seem to have failed because of bad caps, and recapping the cruncher ... an older Gigabyte board.

The surgery was a success and it seems the computer will continue to live!

Maybe things are turning around for me ...

mondo


 

mondobyte

Senior member
Jun 28, 2004
918
0
71
Nothing is as simple as it should be ...

The new motherboards (Socket 939) seem unable to communicate with the SCSI RAID BIOS. Bummer.

I managed to do some swapping around and have my email front-end and back-end servers operational.

Later today, I hope to get my web server back online ... the last of the socket 939 replacements.

If I can replace 4 pins in the motherboard power connector, there is a chance that my dual MP2000+ motherboard will live again. A born again cruncher. My current thinking is to retire my dual Pentium III 875 and replace it with the repaired dual MP2000+.

That would make my slowest computer(s) the ones with Athlon MP/XP2000+ processors. All the rest of the computers are running at least 2GHz ....

mondo

Wow ... what an adventure ... I keep learning more than I ever wanted to know.

mondo
 

The Reaper

Senior member
Oct 25, 1999
426
0
0
I've had my fun with bad caps too.

mondobyte, I have an S2460 mobo with dual MP 1900+. Is there any way to overclock these Tyan mobos? 1.6GHz is too slow for me.
 

mondobyte

Senior member
Jun 28, 2004
918
0
71
A very significant factor in my decision to go with the Foxconn 6100K8MA and Foxconn 6150K8MA motherboards for my recent Socket 939 upgrades is that Foxconn is one of the world's largest manufacturers of HIGH QUALITY capacitors!

The failure rate of capacitors on Foxconn branded motherboards is very low.

mondo