
4870 & 4870X2 using 80% of 4 cores?

Peter Trend

Senior member
Hi,

I'm using Vista Ultimate x64 and running 3 GPU2 clients and one uniprocessor (classic) client. I have the following environment variables set up: BROOK_YIELD=2, CAL_NO_FLUSH=1, CAL_PRE_FLUSH=1, FLUSH_INTERVAL=768
Flags: -advmethods -verbosity 9 -gpu x on all clients, and -forcegpu ati_r700 on the second and third clients only.
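For reference, the setup above can be sketched as a small launcher. This is an illustrative Python sketch only, not how the clients are actually installed or started: the executable name is a placeholder, and the assumption that the `x` in `-gpu x` is a zero-based GPU index is mine. The environment values are the ones from this post.

```python
import os
import subprocess

# Environment variables as set in the post (the poster's values, not tuned defaults).
fah_env = dict(os.environ)
fah_env.update({
    "BROOK_YIELD": "2",       # yield the CPU while waiting on the GPU
    "CAL_NO_FLUSH": "1",      # batch commands up to FLUSH_INTERVAL before submitting
    "CAL_PRE_FLUSH": "1",     # prepare the next batch while the GPU works
    "FLUSH_INTERVAL": "768",  # commands sent to the GPU per batch
})

def client_command(slot):
    """Build the command line for one GPU2 client (executable name is hypothetical)."""
    cmd = ["fah-gpu2.exe", "-advmethods", "-verbosity", "9", "-gpu", str(slot)]
    if slot > 0:  # -forcegpu on the second and third clients only, as in the post
        cmd += ["-forcegpu", "ati_r700"]
    return cmd

for slot in range(3):
    print(client_command(slot))
    # subprocess.Popen(client_command(slot), env=fah_env)  # would need the real client
```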

CPU use varies between 17% and 23% and is about the same for each GPU2 client; the uniprocessor client usually uses between 17% and 20%.

I want to get an SMP2 client on here, but I'm not sure it would be able to meet deadlines, and it seemed to slow the GPUs down. Does anybody have any tips to reduce CPU use by the ATI GPU2 clients?
 
Thanks theAnimal. I have reduced FLUSH_INTERVAL to 256, which has reduced my CPU usage to 6~8% of all four cores per client. However, GPU-Z now shows slightly more frequent drops in load from the usual 99~100% down to 75%; presumably this is when the GDDR5 synchronizes with the DDR2, as I'm fairly sure that is what FLUSH_INTERVAL dictates. It strikes me as odd that simply waiting longer before synchronization would increase CPU use as dramatically as it has; intuitively one would expect the opposite.

Possibly due to ported code from the Nvidia client, I suppose. Anyway, will see what sort of effect this has on PPD. Turning on the SMP2 now 🙂
 
Not much of a drop in PPD without the SMP2 running. However, when I start the SMP2, even with its CPU usage limited to 64% and the -smp 3 flag used, GPU-Z reports that the load on all three GPUs drops to 60~91%. When I close the SMP2 this shoots back up to ~99% GPU load. Any ideas? Should I change core affinities, maybe dedicate one core to the GPUs and the remaining three to the SMP?

[Edit: I am also not sure that I necessarily have CAL_PRE_FLUSH, BROOK_YIELD, and CAL_NO_FLUSH optimized. Nor do I really understand what these do. Hmm...just want to squeeze as much production out of this machine as possible. I hope someone can help me 🙂]
 
I run an SMP2 client on my QX6700 and a GPU3 client on my GTX 470. I was running 42 s per frame on the GPU before running the CPU client, and with the CPU client running I still get 42 s frames.

I notice the GPU client only uses 3-6% of the CPU, so my CPU client still gets 94-97% usage.
 
Yeah, you will certainly do much better in terms of CPU use with Nvidia, as that client is better optimized by Stanford; F@H on ATI always seems to do worse than on Nvidia at the moment. My experience is that increasing FLUSH_INTERVAL increases the CPU load but keeps the GPU loads up, so my PPD from the GPUs increases, but I end up with only 20% of my CPU left to run the SMP. Conversely, reducing FLUSH_INTERVAL makes the GPU clients use less CPU time but struggle to keep the GPU load high, so my PPD per GPU client drops from around 2600 to around 2200, although I'm then able to run the SMP about three times as fast.

I'm struggling to achieve the best of both worlds and maximize my PPD overall...
 
Not much of a drop in PPD without the SMP2 running. However, when I start the SMP2, even with its CPU usage limited to 64% and the -smp 3 flag used, GPU-Z reports that the load on all three GPUs drops to 60~91%. When I close the SMP2 this shoots back up to ~99% GPU load. Any ideas? Should I change core affinities, maybe dedicate one core to the GPUs and the remaining three to the SMP?
Change core priorities as well. Download something like WinAFC or Bill's process manager and set the GPU clients to high priority and the SMP client to lowest. That way you still get the GPUs running at full load, but you also get reasonable system responsiveness, because anything else you do has a higher priority than the SMP client, which takes up the rest of your cycles.
 
Not much of a drop in PPD without the SMP2 running. However, when I start the SMP2, even with its CPU usage limited to 64% and the -smp 3 flag used, GPU-Z reports that the load on all three GPUs drops to 60~91%. When I close the SMP2 this shoots back up to ~99% GPU load. Any ideas? Should I change core affinities, maybe dedicate one core to the GPUs and the remaining three to the SMP?
All you need to do is change the priority of the GPU clients in the config from idle to low priority.

[Edit: I am also not sure that I necessarily have CAL_PRE_FLUSH, BROOK_YIELD, and CAL_NO_FLUSH optimized. Nor do I really understand what these do. Hmm...just want to squeeze as much production out of this machine as possible. I hope someone can help me 🙂]
AFAIK the only variable that you can really play around with is FLUSH_INTERVAL, the rest are basically on or off. There is a detailed explanation over at foldingforum.
 
All you need to do is change the priority of the GPU clients in the config from idle to low priority.


AFAIK the only variable that you can really play around with is FLUSH_INTERVAL, the rest are basically on or off. There is a detailed explanation over at foldingforum.

I have always had the GPU clients set to low priority and the SMP to idle, although Task Manager shows them both as low and there is no idle option there.

I think I found the thread you meant: http://foldingforum.org/viewtopic.php?f=51&t=9162&start=15

mhouston said:
Re: ATI v1.24 Core available

Postby mhouston » Tue Mar 24, 2009 6:06 am
Start increasing FLUSH_INTERVAL *slowly*, but you shouldn't need a value larger than 128. You can disable the BROOK_YIELD by setting to 0. The BROOK_YIELD setting causes the core to yield CPU time instead of spinning. On small proteins, the GPU finishes fast enough that this can slow things down. So, try BROOK_YIELD=0 for a little while and your PPD should go back up. It will still be lower than previous cores because this core has the addition of another heavy math section per iteration. CAL_NO_FLUSH changes the batching behavior, i.e. how the CPU talks to the GPU, to build larger "packets" of work up to the FLUSH_INTERVAL setting. Be careful with large FLUSH_INTERVAL settings combined with CAL_NO_FLUSH=1 since you can almost guarantee a VPU recover. The setting will need to be smaller with larger proteins. We are working on automatically adjusting FLUSH_INTERVAL for a later core.

I also found this: http://en.fah-addict.net/articles/articles-1-3+gpu-environment-variables.php

For ATI

These variables require Core 11 v1.24 or later and Catalyst 9.3 or later to work.

FLUSH_INTERVAL is what affects graphics performance (the 2D lag phenomenon, for example). It is the number of functions sent to the GPU in one go; the GPU will not do anything else, including refreshing the screen, until processing of the batch of commands ends. A low value reduces the time F@H monopolises the GPU, so the interface becomes more responsive. However, the lower the value, the higher the CPU load from the OS and the driver, so there is a trade-off between F@H performance and interface fluidity. If the batch is too large, it can cause a VPU Recover, the driver thinking that the GPU has hung (when it is just taking too long to respond).
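The trade-off described above can be put into a toy model: a frame's worth of GPU commands is split into batches of FLUSH_INTERVAL commands, each flush costs some fixed CPU/driver overhead, and the GPU is tied up for roughly one batch at a time. All the per-flush and per-command costs below are made-up illustrative numbers, not measured values.

```python
import math

def flush_count(commands, flush_interval):
    """Number of batches (flushes) needed to submit `commands` GPU commands."""
    return math.ceil(commands / flush_interval)

# Illustrative, assumed costs (not measurements):
COMMANDS_PER_FRAME = 10_000
CPU_US_PER_FLUSH = 50     # assumed CPU/driver overhead per flush, in microseconds
GPU_US_PER_COMMAND = 2    # assumed GPU time per command, in microseconds

for interval in (64, 256, 768):
    flushes = flush_count(COMMANDS_PER_FRAME, interval)
    cpu_us = flushes * CPU_US_PER_FLUSH       # CPU overhead: falls as the interval grows
    stall_us = interval * GPU_US_PER_COMMAND  # GPU monopolised per batch: grows with it
    print(f"FLUSH_INTERVAL={interval}: {flushes} flushes, "
          f"~{cpu_us} us CPU overhead, ~{stall_us} us GPU busy per batch")
```

Small intervals mean many flushes (high CPU load, responsive screen); large intervals mean few flushes (low CPU load) but long stretches where the GPU cannot refresh the display, and eventually a VPU Recover.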

CAL_NO_FLUSH and CAL_PRE_FLUSH change the method of submitting batches of functions to the hardware. CAL_NO_FLUSH changes how the batches of functions are built. CAL_PRE_FLUSH allows caching the batch, in order to prepare the next batch in advance while the GPU handles the current batch.

BROOK_YIELD has three modes: 0/1/2. 0 will monopolise the CPU to get the lowest latency in responding to requests from the GPU. 1 will release the CPU, while waiting for a response from the GPU, to processes of the same or lower priority than the GPU core. 2 will release the CPU to every process, regardless of its priority. Now, for very small values of FLUSH_INTERVAL and small proteins, the GPU is likely to be almost finished by the time the CPU is released; the GPU must then wait for the core to regain the CPU, which may take up to a millisecond. A high-end GPU will complete most such batches in less than 100 microseconds, so the wait for CPU access can have a big impact on performance. With a high value of FLUSH_INTERVAL it is easy to build up several milliseconds of work, making the wait period much less of an issue.
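The spin-versus-yield effect above can be sketched numerically. This toy model assumes a fixed ~1 ms scheduler wake-up latency when yielding, and the batch times are the illustrative figures from the paragraph, not measurements.

```python
def batch_cycle_us(gpu_batch_us, brook_yield, wake_latency_us=1000):
    """Toy model of one CPU -> GPU -> CPU round trip, in microseconds.

    BROOK_YIELD=0: the CPU spins, reacting as soon as the GPU finishes
    (but one core stays pegged at 100%).
    BROOK_YIELD=1/2: the CPU yields and may wait up to the assumed
    scheduler wake-up latency (~1 ms) before regaining the core.
    """
    if brook_yield == 0:
        return gpu_batch_us                    # spin: no wake-up delay
    return gpu_batch_us + wake_latency_us      # yield: core freed, latency added

# Small batch (~100 us of GPU work): the 1 ms wake-up dominates the cycle.
small_spin = batch_cycle_us(100, brook_yield=0)    # 100 us
small_yield = batch_cycle_us(100, brook_yield=2)   # 1100 us, ~11x slower
# Large batch (~5 ms of GPU work): the wake-up latency barely matters.
large_spin = batch_cycle_us(5000, brook_yield=0)   # 5000 us
large_yield = batch_cycle_us(5000, brook_yield=2)  # 6000 us, only ~1.2x slower
print(small_spin, small_yield, large_spin, large_yield)
```

This is why BROOK_YIELD=0 can raise PPD on small proteins with small FLUSH_INTERVAL values, at the cost of a fully loaded core.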

Here is a sample configuration:

FLUSH_INTERVAL = 128-256 for a 48x0, 64-96 for a 38x0 (the optimum setting so that the GPU remains at 100% regardless of the WU, without causing too much lag should be within these ranges).
BROOK_YIELD = 2 (to stop utilising 100% of the CPU and therefore allow an additional CPU client to be started)
CAL_PRE_FLUSH = 1
CAL_NO_FLUSH = 1 (but should be reverted to 0 if it causes too many VPU Recovers).
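The sample configuration above could be applied like this. A Python sketch for illustration: the family-to-range table simply encodes the article's recommendations, and starting at the low end of each range (then raising slowly) follows mhouston's advice; on Windows you would set these with `set VAR=value` in a batch file before launching the client.

```python
import os

# Recommended starting ranges from the article; the true optimum varies per WU.
FLUSH_RANGES = {"48x0": (128, 256), "38x0": (64, 96)}

def sample_config(gpu_family):
    """Build the sample environment for one GPU family (48x0 or 38x0)."""
    lo, hi = FLUSH_RANGES[gpu_family]
    return {
        "FLUSH_INTERVAL": str(lo),  # start at the low end, raise *slowly*
        "BROOK_YIELD": "2",         # free the CPU so a CPU client can run too
        "CAL_PRE_FLUSH": "1",
        "CAL_NO_FLUSH": "1",        # revert to "0" on repeated VPU Recovers
    }

cfg = sample_config("48x0")
os.environ.update(cfg)
print(cfg)
```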

I am going to experiment with BROOK_YIELD=0/1. I did download Affinity Changer, but I got halfway through creating a profile at 3 AM and decided to go to bed. It's a bit confusing for a newbie; it could do with a glossary of the codes so I know what I'm setting instead of just those sample profiles.
 