
F@H keeps killing my video cards!

VirtualLarry

No Lifer
I think I'm going to be dropping out of F@H on a more permanent basis. I just can't afford to keep feeding it video cards.

Background: I built a F@H box, with four 9600GSO video cards and an MSI K9A2 Plat. mobo. Had it running for a couple of weeks, and the top video card died. Presumably due to temps, as it was running at 96C.

Well, a couple of days ago, I received my second PNY 768MB 9600GSO single-slot card, and I re-did my F@H box. Instead of an empty top slot, an Asus "top" 9600GSO, and two EVGA single-slot 9600GSOs, I now have two PNY 768MB single-slot 9600GSOs and two EVGA single-slot 9600GSOs.

Temps with the side of my case off were 82C for the top GPU and 80C for the next-highest GPU. I thought that 82C was reasonable for this card.

Well, I left the computer running F@H for a day, and I went to revive the machine, and it wouldn't produce any display. I had to force-power-off, and then upon attempting to boot, I also got no display. Same symptoms as last time.

WTH is killing my cards? It's getting pretty hard to find 96SP 9600GSOs these days.
 
What's the PSU wattage running the 4 cards? It could be that there's not enough amperage to run them. Aside from the thermal issue, the only other causes I can think of would be poor air circulation within the case, or the cards being defective.
 
Heat. All my 22 cards are dual-slot solutions, and I keep the fans at 50%-70%. The hottest any card gets is 75C, and even that worries me.

Change over to 2 GTX 260-216 cards. They are under $200 now.
 
I've got an EarthWatts 650W, which has 300W available on the rail that the PCI-E plugs are on. So I should be good with two splitters. Single 6-pin PCI-E is rated at 75W, so 4x75=300W.
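That back-of-envelope math can be sanity-checked in a couple of lines (the 75W-per-connector and 300W-rail figures below are the numbers assumed above, not measured draws):

```python
# Rough PCIe power-budget check -- spec/quoted figures, not measured draws
PCIE_6PIN_W = 75        # PCIe spec limit for a single 6-pin connector
RAIL_BUDGET_W = 300     # capacity quoted for the EarthWatts 650's PCI-E rail
NUM_CARDS = 4

connector_draw = NUM_CARDS * PCIE_6PIN_W
headroom = RAIL_BUDGET_W - connector_draw
print(f"{connector_draw}W of {RAIL_BUDGET_W}W budget, {headroom}W headroom")
```

Note this lands exactly at the rail limit with zero headroom, and it ignores the up-to-75W each card can also pull through the slot itself, so the real margin is tighter than it looks.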
 
Update. This time, it wasn't the cards. It was the RAM.

Interestingly enough, I was having a chat with a computer-guru friend of mine, and he recalled some experiences with Corsair XMS2 DHX RAM. He said that he could put the RAM into a system, it would POST, and then the system would never boot again with that RAM. Well, that's approximately what happened to me.

I put in 2x2GB of GSkill DDR2-800, and things are working much better. With the four video cards, I only get 2.25GB of RAM that's usable. Oh well.

With the side panel on (w/fan), on my Antec 300, the temps have dropped to approx 60C for the top card. (I have the AC on and set to 72F as well.)

Fold on!

Edit: Oh, I tested the old card that "died". It's still dead. It doesn't prevent the system from POSTing, but it displays no output.
 
With 4 cards running at full load, I would buy a 1000-1500W PSU (Tagan is my personal favorite; in any case I would buy a name brand). I have a 530W PSU powering a Phenom 9500 with an 8800 Ultra at full load and I worry about it... (I even notice that it's pretty stressed, because its fan, which adjusts itself according to load, runs at full speed.)
 
So... F@H is killing your card? Or is it just your RAM? Either way, if you've got the time, take your build apart and test everything one by one. Consider raising your fan speeds to drop your GPU temperatures. Also look into getting a new PSU. It's just not right that F@H is killing your GPUs, unless you're just very unlucky.

P.S. 2GB RAM should be enough since I have my RAM limited to around 300MB and it still folds at top speed on my 9600GT.
 
Power supply: I have 2 boxes with 3 9800 GT's@660 and a PhII 940@3.6 on a 700 watt Fortron, no problems for months. The same PSU drives 2 260 GTX-216 cards and an E6400@3.2 with no problems, and a Q6600@3.3 with 2 9800 GT's superclocked@756, no problem. 4 9600 GSO's should be fine on a good 650.
 
Originally posted by: Markfw900
Power supply: I have 2 boxes with 3 9800 GT's@660 and a PHII 940@3.6 on a 700 watt Fortron, no problems for months. The same PSU drives 2 260 GTX-216 cards and a E6400@3.2, no problems, and a Q6600@3.3 and 2 9800 GT GT's superclocked@756 no problem. 4 9600 GSO's should be fine on a good 650.

I think, had you not turned off several of your boxes, you would be seeing some issues with 'unstable machine' EUEs on some of the latest WUs. At least I was on my 8800GTS cards.

-Sid
 
Heat kills, I won't fold on a single slot GPU. My 3870 has been folding at close to 90C for almost a year with no hitch. The fan on it is usually at about 90%. I run the adaptive fan control.

I lost an entire HTPC that I had folding due to heat. OK, that is an exaggeration; there were some usable parts after I tore it down, but the motherboard and PSU were toast.

There should definitely be a control in the F@H app that slows down work progress when some specified heat threshold is reached, possibly even stopping it. The other thing that would help is a GPU % slider in the options, like they have for the CPU.
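A heat-threshold throttle like that wouldn't be hard in principle. Here's a toy sketch (Python, with simulated temperature readings standing in for a real sensor query such as nvidia-smi; the 85C limit is just a placeholder):

```python
TEMP_LIMIT_C = 85   # hypothetical threshold; pick one for your own card

def throttle(temps):
    """Given a sequence of sampled GPU temps, return the indices of the
    samples where work would actually run; over-limit samples are skipped
    (i.e. the client pauses until the card cools)."""
    done = []
    for step, temp_c in enumerate(temps):
        if temp_c >= TEMP_LIMIT_C:
            continue        # too hot: skip this work step
        done.append(step)
    return done

# Simulated readings: work pauses during the 90C+ spike
print(throttle([70, 80, 90, 92, 75]))   # -> [0, 1, 4]
```

In a real client this check would wrap the inner work loop, with the temperature read from the driver each iteration; the point is just that a threshold check is cheap.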
 
So it was dead RAM in the end then? bad luck!🙁

I think you ought to edit your title now though 😉.

Sid
What temps were your cards hitting when F@H crapped out?
 
Originally posted by: Insidious
Originally posted by: Markfw900
Power supply: I have 2 boxes with 3 9800 GT's@660 and a PHII 940@3.6 on a 700 watt Fortron, no problems for months. The same PSU drives 2 260 GTX-216 cards and a E6400@3.2, no problems, and a Q6600@3.3 and 2 9800 GT GT's superclocked@756 no problem. 4 9600 GSO's should be fine on a good 650.

I think, had you not turned off several of your boxes, you would be seeing some issues with 'unstable machine' EUEs on some of the latest WUs. At least I was on my 8800GTS cards.

-Sid

I still have 10 cards folding ! And no EUE problems.
 
I shut mine down due to the many "unstable machine" errors. I was tired of trying to repair the problems every single day. I have them all back in order but I will be watching from the sidelines for a while. Besides the errors, I am worried about the heat that they exhaust in my computer room during the warm season...
 
Sounds like I may be the "lucky dog" here. My computers are in the basement which is large in area and relatively stable temp-wise. Although you can tell the difference as the weather warms up. If they were in my living area I probably would have to tone everything down also. I hate it for you.

I have been getting some Unstable_Machine errors on my 9800GT cards. The main culprit seems to be 5767 projects.

My second biggest headache is power outages, especially short ones during the night. Having to go around and log in and/or restart all the folders is a pain.

other than that, I'm trying to get a life ................. 😉
 
Originally posted by: Insidious
Originally posted by: Assimilator1
So it was dead RAM in the end then? bad luck!🙁

I think you ought to edit your title now though 😉.

Sid
What temps were your cards hitting when F@H crapped out?

Originally posted by: Insidious
You might be interested in this thread at the f@h forum <--- LOOK HERE

🙁

-Sid
I have read the whole lot, so either I forget what temps you said or I missed it, and I don't have time to read the whole thread again 😛😉. You could just give me a ballpark figure.
 
As far as I can tell, when I get the 8800GTS up to 80C or just a little higher, it starts throwing sporadic "unstable machine" errors.
 
Originally posted by: Assimilator1
Weird, not that hot then 😕
Did you try John Weatherman's suggestion? (2nd post in your thread).

No, I didn't jump through the "give us more info" dodge. I told them I quit because it is too hot... that really was the only information they needed from me.

I have modified my cases with additional fans, upped the PSU wattage in each PC, and learned how to take direct control of the video card fans...

That is enough effort from me..... if they want me to let them use my GPU, then THEY are going to have to make some effort to protect my stuff. I truly believe I have met them MUCH more than half-way.

Core14 work units kept my GPU temps below 65C at all times. When Core14 was released it was announced it produced "more science".

BUT they have been too lazy or inconsiderate to port that method to all WUs we crunch because they don't deem it justified just to cool off my stuff. They'd rather tell me to put a box fan beside my case with the door removed... they'd rather tell me to put in water cooling.

Well Stanford.... I'd rather just take you off my GPUS! :|

-Sid

edit: I still crunch on the CPU with SMP clients. My 'I Quit' is only for the video card client
 
Oh ok, btw just because F@H GPU crashes doesn't mean it's going to cause any h/w damage, sounds like you've got good cooling & decent temps so don't worry about it 🙂 (of course it sucks that F@H is crashing but otherwise no harm's done).

I don't think asking for logs or more info is a dodge myself, sounded to me more like he was trying to help you get to the bottom of it. However I totally understand that you can get to a point where enough is enough on trying to get some s/w going, I quit F@H GPU myself for a couple of weeks about a month back, until I had time to try a more stable core, driver dlls & variables.

Thought about running SETI GPU on your Nvidia cards as an alternative?
 
I think I found my compromise....

Pulled the 8800GTS card from each PC and now just running a single card (9800GTX+)

Looks like this will work for me during the hot months. We'll see.

Plus, I see some new core14 WUs are being released. Thank God!

-Sid
 