GPU keeps stopping

ksherman

Senior member
Jul 9, 2000
619
0
0
www.kshermphoto.com
So I got home from work and the GPU client had stopped crunching. Managed to get it restarted, woke up in the morning and it stopped again. Error I'm seeing in the log file is "EUE limit exceeded. Pausing 24 hours."

Any idea what this means?
 

Golgatha

Lifer
Jul 18, 2003
12,396
1,068
126
It's pausing because too many errors are being returned (or at least erroneous data). Could be bad drivers, too aggressive an overclock, overheating (100% load is actually a lot higher load than games and needs to be taken into consideration), or just a bad WU.

The overheating can be tested by letting it run for 15 minutes or so and checking temps. If you're seeing 100°C+ temps then it very well could be overheating and producing errors. At 110°C+, I would either upgrade the stock cooler, add fans, or not run F@H due to potential hardware damage. My GTX 275s run 95-100°C in a cool house and that's borderline comfortable for me. I turned them off in the late fall when we weren't running the air and it was slightly humid. For me, I'm not as concerned about the cards (eVGA lifetime warranty) as I am about the CPU, since it is also crunching WUs.

Overclocking can be pushed back or eliminated when running Folding@Home. I'd use WHQL drivers also and try reinstalling them if you've eliminated heat or overclocking issues as a potential cause. Sometimes you just get a bad WU also, so don't be afraid to expunge it and get a new one once you've eliminated all other possibilities.
 

ksherman

Senior member
Jul 9, 2000
619
0
0
www.kshermphoto.com
Yeah, I thought some of those things could be possible. Temps stay under 70C at load and my fan is set to run at 45-50% when I'm gone (SUPER loud, but the card stays pretty cool; 100% competes with most vacuum cleaners in the noise department). I'm not overclocking it at all and it's just a standard reference board. So I guess that leaves either the drivers (I didn't use beta drivers, just the most recent drivers from AMD) or the fact that it is an AMD card and F@H hates them... I might have another go at the drivers, I suppose.

As far as a bad WU, after the first time it happened last night, I deleted the Work folder to clear out any wandering WUs... It recreated the folder and got back to work, but then same problem this morning. wooo!

Fortunately, the CPU client is chugging away! Operating somewhere just under 4k PPD. The Q9550 is quite incredible. Just put it in, OCed straight up to 3.8GHz and has been running at about 100% load on all 4 cores for 3 days now without flinching (this morning I brought it down to 3.63GHz for my own peace of mind). Runs a little hot, mid 60C-70C, but that's not too bad.
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Check out the F@H forums and find out which driver version(s) is(are) working well on ATI GPUs.

Video card manufacturers tend to modify drivers to make their own brand boast some super-d-dooper feature no one else has. And breaks them!!

I'd try using drivers from ATI rather than from AMD

-Sid
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
That's the question I'm suggesting you try experimenting to answer.

But the most important part of my entire post is the part that suggests looking at the F@H forums, in the section on ATI-specific issues, to see what they have to say about drivers.

-Sid
 

biodoc

Diamond Member
Dec 29, 2005
6,334
2,243
136
As Sid suggested, it looks as though there's a lot of useful info over there on ATI cards (no experience myself).

Here's the link to the ATI specific issues.

Also, it looks like you can get a performance gain by tweaking the environmental variables. link
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
It's doubtful that ATI video cards will work very well. :p

(only kidding... don't waste flames on me please)

-Sid
 

ksherman

Senior member
Jul 9, 2000
619
0
0
www.kshermphoto.com
No, you have a point ;-)

Well, I cleaned out any trace of nVidia drivers from my system. I also brought in the 64bit dlls (amdcalc something) instead of the 32bit ones it pulls during installation. Cleaned out all my config stuff and I'll see if it works.
 

ksherman

Senior member
Jul 9, 2000
619
0
0
www.kshermphoto.com
Gah! No luck.. still.

log says:
[04:07:09] Folding@home Core Shutdown: UNSTABLE_MACHINE
[04:07:13] CoreStatus = 7A (122)

And some searching reveals that no one knows the problem... I looked through the F@H forums briefly, but didn't see much. I'll have to look closer tomorrow I guess. Bah!
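
For anyone who wants to watch for these messages without reading the log by hand, here is a rough Python sketch. It only assumes log lines shaped like the two quoted above (a `[HH:MM:SS]` stamp followed by a message) and that the "EUE limit exceeded" message is stamped the same way; the function name and the marker list are made up for illustration and are not part of the F@H client.

```python
import re

# Matches lines shaped like the ones quoted in this thread, e.g.
# "[04:07:09] Folding@home Core Shutdown: UNSTABLE_MACHINE"
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(.*)$")

# Substrings taken from the messages quoted in this thread (not a complete list).
MARKERS = ("EUE limit exceeded", "UNSTABLE_MACHINE", "CoreStatus")


def find_error_lines(log_text):
    """Return (timestamp, message) pairs for log lines containing a marker."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(2) for marker in MARKERS):
            hits.append((match.group(1), match.group(2)))
    return hits


sample = """[04:07:09] Folding@home Core Shutdown: UNSTABLE_MACHINE
[04:07:13] CoreStatus = 7A (122)"""

for stamp, message in find_error_lines(sample):
    print(stamp, message)
```

To run it against a real log, read the file into a string and pass it to `find_error_lines`; the log's file name and location depend on the client install, so none is assumed here.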
 

salvorhardin

Senior member
Jan 30, 2003
390
38
91
Are you running with the -forcegpu ati_r700 flag or with any environmental variables? You are trying to run it on the 5850 in your signature, right?
 

ksherman

Senior member
Jul 9, 2000
619
0
0
www.kshermphoto.com
First, Mayor Hardin, you are a political genius.

And you are correct on each point. After some searching, the EUE errors seem related to setting the FLUSH_INTERVAL variable too high. I had read about it a while back and many suggested some pretty high intervals, like 512 or so. On the F@H forums, many were then saying that values over 256 tended to cause EUE errors... So I put it down to 128 or so. So far so good...

It's funny, the reason for setting those variables was to bring down the CPU usage, but it still maxed out one core. Bringing the value back down brought CPU usage back down to nil, so I was able to get my VM CPU client back to 4 cores (I had set it to use only 3 before).. So maybe my PPD should be able to go up. *crossin' ma fingers*
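
If you launch the client from a script, something like the following Python sketch shows the idea of setting the variable and refusing values in the range people reported trouble with. The 256 ceiling is only the forum rule of thumb mentioned above, not an official limit, and `build_client_env` and the client path in the comment are hypothetical names for illustration.

```python
import os

# Rule of thumb from forum reports in this thread, not an official limit.
MAX_REPORTED_SAFE = 256


def build_client_env(flush_interval=128, base_env=None):
    """Return a copy of the environment with FLUSH_INTERVAL set."""
    if not 1 <= flush_interval <= MAX_REPORTED_SAFE:
        raise ValueError(
            f"FLUSH_INTERVAL={flush_interval} is outside 1..{MAX_REPORTED_SAFE}"
        )
    env = dict(os.environ if base_env is None else base_env)
    env["FLUSH_INTERVAL"] = str(flush_interval)
    return env


env = build_client_env(128)
print(env["FLUSH_INTERVAL"])

# To start a client with this environment (the path is a placeholder):
# import subprocess
# subprocess.run([r"C:\path\to\gpu_client.exe"], env=env)
```

Setting the variable through the normal Windows environment variable dialog does the same job; the script just makes it easy to try different values.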
 

biodoc

Diamond Member
Dec 29, 2005
6,334
2,243
136
:cool: Good advice salvorhardin!!! :)
 

salvorhardin

Senior member
Jan 30, 2003
390
38
91
Thanks, glad I could help. You might have to keep an eye on the client. I'm still trying to tune my flush interval to an optimum value. I'm using 126 right now and just went 9 days without an EUE error. I'm going to let it run some more at that setting to see if there is a pattern (either time or core).

To fine-tune the flush interval, check your GPU activity percentage in Catalyst Control Center. If it's consistently at 99%, it will probably error out on you. You want it to be at 99% most of the time and to drop lower occasionally. You will need to check at various times, because each core uses a different amount of processing power and you want to set it so that it's never consistently at 99% for any WU.
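
As a rough illustration of that rule of thumb, here is a small Python sketch. It assumes you jot down the activity readings by hand (no way of reading them programmatically is assumed), and the `looks_pinned` name and the `min_dips` cutoff are invented for the example.

```python
def looks_pinned(samples, ceiling=99, min_dips=1):
    """Return True if the readings almost never drop below the ceiling.

    samples:  GPU activity percentages noted at different times.
    ceiling:  the reading that counts as "maxed out" (99% in this thread).
    min_dips: how many readings below the ceiling count as healthy.
    """
    if not samples:
        raise ValueError("need at least one reading")
    dips = sum(1 for value in samples if value < ceiling)
    return dips < min_dips


print(looks_pinned([99, 99, 99, 99]))      # True: never drops, may error out
print(looks_pinned([99, 97, 99, 90, 99]))  # False: drops now and then
```

Collect a separate list of readings for each WU type, since the load differs from one core to the next.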

About how much PPD are you getting by running the VM CPU client? I still have about 170 Leiden WUs to finish and I'm going to try running F@H through a VM client. Was wondering if there is a noticeable difference in PPD. I will be running on a Q9400 at 3.4GHz.
 

ksherman

Senior member
Jul 9, 2000
619
0
0
www.kshermphoto.com
My GPU % seems to go between 99% and 90%, so that's good, right?

My PPD for the CPU client is ~3600 (using the notfred one), about 4800 for the GPU. Not too shabby :) I have only bothered to run it in a VM; pretty much everyone said it was more efficient so I just ran with it. So no comparison to a non-VM SMP client..
 

salvorhardin

Senior member
Jan 30, 2003
390
38
91
That seems to be good. I would check other WUs to see how they affect the GPU %. Going by your PPD, I should be doing better on my CPU than on my 4830 (2400 PPD).