ok, so if you'll recall from one of my previous posts, i was running 5 simultaneous POEM tasks on the HD 5870, but i wasn't leaving a CPU core free for each POEM task. instead, i was running them alongside 4 CPU tasks (2 Einstein@Home CPU tasks and a double-threaded Test4Theory@Home CPU task) and 2 simultaneous Milkyway@Home tasks on the other GPU in the machine, an HD 6950. with those 4 CPU cores preoccupied, i was essentially leaving only 2 cores free for 5 POEM GPU tasks (i have a hex-core 1090T CPU). to be honest, CPU task run times didn't suffer much at all in that situation, and POEM GPU task run times didn't seem unreasonably slow for my level of hardware. but i wanted to see how much i could improve efficiency, so i finished off my cache of Einstein@Home CPU tasks and configured the machine for testing. the test configuration kept the double-threaded Test4Theory@Home CPU task and the 2 simultaneous Milkyway@Home tasks on the HD 6950 GPU as constants b/c they always have and always will be running 24/7 (so fair warning - the test is really based around my project and application selections, and does
not isolate the POEM GPU tasks for pure baseline purposes). the variable in the test is of course the number of simultaneous POEM GPU tasks. so without further ado, here is the processed data:
there are a few things of interest to note:
1) with the Test4Theory@Home double-threaded CPU task consuming just under 2 full CPU cores, and the 2 Milkyway@Home GPU tasks consuming a negligible amount of CPU resources (approx. 0.75% of a CPU core each), there were essentially just over 4 full CPU cores available to POEM GPU tasks throughout the test. nevertheless, i did push beyond 4 simultaneous POEM tasks to see how much the CPU core deficiency would affect GPU task efficiency and run times.
2) setting the <avg_ncpus> and <max_ncpus> parameter values in the app_info.xml both to 1 hardly did anything for efficiency, CPU/GPU utilization, or run times, so i left them the way they were in the app_info.xml i originally copied to make my own, with <avg_ncpus> set to 0.25 and <max_ncpus> set to 1.
3) the law of diminishing returns is apparent from the very beginning. as you can see, the more POEM tasks i ran in parallel, the more efficiently they ran, though each additional task brought a smaller gain than the last...well, at least that much is true for up to 7 simultaneous POEM tasks. i'm sure at some point, say n simultaneous POEM tasks, efficiency would max out, and going to n+1 simultaneous tasks would actually cause efficiency to decrease. i didn't bother going that far with the test b/c by the time i got to 7 simultaneous POEM tasks, efficiency & run times were starting to plateau enough to warrant concluding the test.
4) despite not having a free CPU core per POEM task once i got beyond 4 simultaneous POEM tasks, the drop-off in performance was hardly significant, and fell right in line w/ expectations as far as the law of diminishing returns is concerned.
5) i will say that i was able to obtain better results while running 7 simultaneous POEM GPU tasks than the chart shows, and i did that by suspending Test4Theory@Home and making those CPU resources available to POEM@Home...though i never actually ran a full-length test in that configuration. i cut it short for 2 reasons...1) i was simply getting lazy, and 2) it doesn't matter to me if efficiency can be improved while running 7 POEM tasks - not running Test4Theory@Home is not an option.
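for reference, the knobs i'm talking about live in the <app_version> section of app_info.xml. here's a minimal sketch of what that section might look like - the app name and version number below are placeholders, not necessarily what the current POEM@Home OpenCL app uses, so check your own client_state.xml/app_info.xml for the real values:

```xml
<app_version>
  <app_name>poemcl</app_name>       <!-- placeholder app name; verify against your install -->
  <version_num>100</version_num>    <!-- placeholder version number -->
  <avg_ncpus>0.25</avg_ncpus>       <!-- BOINC budgets 1/4 of a CPU core per GPU task -->
  <max_ncpus>1</max_ncpus>          <!-- but a task may use up to a full core -->
  <coproc>
    <type>ATI</type>                <!-- the HD 5870 is an ATI/AMD GPU -->
    <count>0.2</count>              <!-- 0.2 GPUs per task = 5 simultaneous tasks per GPU -->
  </coproc>
</app_version>
```

changing <count> is how you vary the number of simultaneous tasks (0.2 → 5 tasks, 0.25 → 4 tasks, and so on), and the client has to be restarted for app_info.xml edits to take effect.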
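and in case anyone's curious how i'm thinking about "efficiency" here, it's basically throughput: tasks completed per hour across all simultaneous tasks. a quick python sketch with made-up run times (NOT my measured chart data) shows the diminishing-returns shape - per-task run times climb as you add parallel tasks, but total throughput still rises, with each step gaining less than the last:

```python
def tasks_per_hour(n_parallel, avg_runtime_s):
    """Throughput when n_parallel tasks each finish in avg_runtime_s seconds."""
    return n_parallel * 3600.0 / avg_runtime_s

# hypothetical per-task run times (seconds) for n simultaneous tasks --
# illustrative numbers only, not the measured chart data
runtimes = {1: 900, 2: 1000, 3: 1150, 4: 1350, 5: 1600, 6: 1900, 7: 2200}

for n, rt in sorted(runtimes.items()):
    print(n, "tasks:", round(tasks_per_hour(n, rt), 2), "tasks/hour")
```

the printout climbs steeply at first and then flattens out, which is exactly the plateau that made me call the test at 7 simultaneous tasks.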