New Einstein@Home Application for ATI/AMD GPUs!

petrusbroder · May 18, 2012

Very interesting info, Sunny129! Thanks!

Sunny129 · May 18, 2012

yet another update...

so i've been testing the "3 GPU tasks + 3 CPU tasks" configuration all day, and found that compute efficiency has improved yet again, albeit very slightly this time...and GPU tasks were taking almost exactly 2 hours to complete. under this specific configuration, 3 simultaneous GPU tasks only take ~62.5% of the amount of time it took for three GPU tasks running in series to finish. this marginal improvement in compute efficiency makes sense - recall that by going from "3 GPU tasks + 4 CPU tasks" to "3 GPU tasks + 3 CPU tasks," my GPU utilization only improved by a few percentage points (from ~92% to ~95%). this is in stark contrast to the significantly more substantial improvement in compute efficiency i experienced when switching from "2 GPU tasks + 5 CPU tasks" to "2 GPU tasks + 4 CPU tasks," and GPU utilization jumped from ~77% to ~90%. i should note that CPU task run times are still right around 270 minutes (4.5 hours).

i'm currently testing the "4 GPU tasks + 4 CPU tasks" configuration (which is now consuming 1409MB of my 2048MB of VRAM), and will report back w/ results when i have them. also, i'm eventually going to try to plug this data into a spreadsheet to make it all easily accessible and so you don't have to dig through my paragraphs...i'll get to that when i can.
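
for anyone wondering how many tasks might fit in VRAM, here is a very rough back-of-the-envelope sketch. it assumes memory use scales linearly with the number of GPU tasks and that there is no fixed overhead, neither of which was measured here, so treat the result as a ballpark only.

```python
import math

VRAM_TOTAL_MB = 2048   # card memory, as reported
VRAM_USED_MB = 1409    # reported usage with 4 GPU tasks running
TASKS_RUNNING = 4

# Assumption: usage scales linearly with task count, with no fixed overhead.
mb_per_task = VRAM_USED_MB / TASKS_RUNNING
max_tasks = math.floor(VRAM_TOTAL_MB / mb_per_task)

print(f"~{mb_per_task:.0f} MB per task, room for about {max_tasks} tasks")
```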

Sunny129 · May 19, 2012

ok, i've tested the "4 GPU tasks + 4 CPU tasks" configuration, and it appears that my compute efficiency has finally plateaued. it turns out that under this configuration, GPU tasks are taking 160 minutes on average. so running them in parallel again takes only 62.5% of the 256 minutes it would have taken to run them in series, which really isn't any better or worse with respect to compute efficiency than the "3 GPU tasks + 3 CPU tasks" configuration. CPU task run times have yet again not suffered, as they're still taking ~270 minutes (4.5 hours) to complete.
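
the 62.5% figure for both configurations can be checked with a few lines of Python. this is just a sketch: the 64-minute single-task run time is inferred from the 256-minute serial total for 4 tasks, and the function name is made up for illustration.

```python
def parallel_ratio(n_tasks, parallel_minutes, single_task_minutes):
    """Fraction of the serial run time needed when n_tasks run side by side."""
    serial_minutes = n_tasks * single_task_minutes
    return parallel_minutes / serial_minutes

# Assumed: 64 min per task when run alone (256 min / 4 tasks in series).
SINGLE_TASK_MINUTES = 256 / 4

# (simultaneous GPU tasks, reported run time in minutes)
for n_tasks, minutes in [(3, 120), (4, 160)]:
    ratio = parallel_ratio(n_tasks, minutes, SINGLE_TASK_MINUTES)
    print(f"{n_tasks} GPU tasks: {ratio:.1%} of serial time, "
          f"{1 - ratio:.1%} improvement")
```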

i went ahead and switched to a "4 GPU tasks + 3 CPU tasks" configuration, but noticed that my GPU utilization only went from 95% to 96%, so any decrease in run times, as well as any improvement in compute efficiency, is going to be marginal at this point. i understand that from this point forward, the law of diminishing returns is going to have a stranglehold on my results. that being said, i'd like to find the sweet spot (the perfect combination of CPU and GPU tasks) and see just how much performance i can squeeze out of my GPU...even if that means running 5-6 GPU tasks simultaneously alongside 0, 1, or 2 CPU tasks. i'll keep testing and posting results, and hopefully i'll get a chance to start on that excel spreadsheet i mentioned earlier...

Sunny129 · May 20, 2012

*update*

well my machine spent most of yesterday and part of today testing 5 simultaneous GPU tasks in multiple configurations. i started w/ 5 GPU tasks + 4 CPU tasks, and took note that my GPU was seeing 95% utilization, and GPU task run times were ~201 minutes. i then tried 5 GPU tasks + 3 CPU tasks, and took note that my GPU was now seeing 96% utilization, and GPU task run times were now only ~200 minutes. naturally i tried 5 GPU tasks + 2 CPU tasks next, and took note that my GPU was seeing 97% utilization, and GPU task run times were now down to ~198 minutes. finally, i tried the "5 GPU tasks + 1 CPU task" & "5 GPU tasks + 0 CPU tasks" configurations, but my GPU utilization never went beyond 97%, and my GPU task run times never dipped below 198 minutes.

at this point, i took all my data and plugged it into the following excel spreadsheet:

i then highlighted the most GPU-compute-efficient configurations and began to contemplate them. these included 3, 4, and 5 simultaneous GPU task configurations, the improvements in compute efficiency of which all centered around the 37.5% mark, give or take no more than 0.6%. knowing that the ever-so-slight differences in compute efficiency and PPD between some of these configurations would be negligible in the short run, and would hardly amount to anything significant in the long run, i decided to focus on which configuration would give me the best combination of CPU and GPU compute efficiency. the most CPU tasks running simultaneously in any of these configurations is 4, and the least 2. if you note the CPU task run time column, you'll note that CPU task run times remained the same across the testing of multiple configurations. given that CPU task run times are the same whether i run 2 simultaneously or 4 simultaneously, i obviously want to choose one of the configurations that involves running 4 simultaneous CPU tasks. that eliminated quite a few configurations, and left only two - the "4 GPU tasks + 4 CPU tasks" configuration and the "5 GPU tasks + 4 CPU tasks" configuration. if the same amount of CPU work gets done in the same amount of time in either of these configurations, then it comes back down to the GPU tasks as the deciding factor. if running 5 simultaneous GPU tasks doesn't do any more work in any less time than running 4 simultaneous GPU tasks, then it would seem to me that the "4 GPU tasks + 4 CPU tasks" configuration is the logical choice. even though GPU utilization is the same in both scenarios, the "5 GPU tasks + 4 CPU tasks" configuration uses more VRAM, and theoretically should cause the GPU to draw more power.
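
the selection logic above can be sketched in Python. only configurations whose GPU run times appear in these posts are listed (the full spreadsheet is not reproduced), and rounding to whole minutes per task is an assumption standing in for "negligible difference."

```python
# (GPU tasks, CPU tasks, GPU run time in minutes) as reported in the posts.
configs = [
    (3, 3, 120),
    (4, 4, 160),
    (5, 4, 201),
    (5, 3, 200),
    (5, 2, 198),
]

def minutes_per_gpu_task(config):
    gpu_tasks, _cpu_tasks, run_minutes = config
    return run_minutes / gpu_tasks

# Step 1: keep the configurations with the most simultaneous CPU tasks,
# since CPU run times (~270 min) were the same in every configuration.
most_cpu = max(cpu for _gpu, cpu, _minutes in configs)
candidates = [c for c in configs if c[1] == most_cpu]

# Step 2: compare GPU minutes per task (rounded to the minute, an assumption),
# then break ties in favor of fewer GPU tasks, which use less VRAM.
best = min(candidates, key=lambda c: (round(minutes_per_gpu_task(c)), c[0]))

print(f"{best[0]} GPU tasks + {best[1]} CPU tasks")
```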

at any rate, i know that everyone's hardware/software setup is a little different, and i don't expect anyone to be able to duplicate these exact results...of course this info was just meant for reference anyways. hopefully it'll give interested folks an idea of what to expect w/ their ATI/AMD GPUs...and if not for the Einstein@Home project itself, i hope folks can at least use this model as a template for testing DC platforms w/ both CPU and GPU clients.

Eric

Alyx · May 21, 2012

Can you post your app_info.xml file? I'd like to mirror your setup on my machine too.

Sunny129 · May 21, 2012

Alyx said:
Can you post your app_info.xml file? I'd like to mirror your setup on my machine too.

an app_info.xml file is no longer necessary for the Einstein@Home BRP4 application. check your Einstein@Home web preferences, and you'll see that the developers have added a "GPU utilization factor for BRP apps" parameter. this does the same thing that the <count>n</count> parameter in the <coproc> section of the app_info.xml file did back when we had to employ one. if you've ever used an app_info.xml file to change the number of tasks running simultaneously on your GPU before, then you'll understand right away how to use the parameter in the web preferences, as it works the same way. if you're not familiar w/ app_info.xml files, it works like this: a GPU utilization factor of n = 1 corresponds to 1 GPU task running by itself, a GPU utilization factor of n = 0.5 corresponds to 2 GPU tasks running in parallel, a GPU utilization factor of n = 0.33 corresponds to 3 GPU tasks running in parallel, and so on and so forth...just enter the reciprocal of the number of tasks you'd like to run simultaneously, and that's it.

and don't worry if the change doesn't occur instantly - remember that it's a web preference now...so at the very earliest, the change will take effect the next time your host contacts the Einstein@Home server. and sometimes your host must finish any BRP4 GPU tasks that might have been in your queue before you made the change. for instance, the other day i changed my GPU utilization factor from 0.2 to 0.25, yet my host continued to run 5 simultaneous GPU tasks several hours and scheduler requests later. it was then that i realized that i already had a handful of BRP4 GPU tasks in the queue when i made the change to my web preferences. i then looked at the tasks that were running in BOINC at the time, and noticed that 4 of the 5 GPU tasks that were running said "0.5 CPUs + 0.2 GPUs," while the 5th one said "0.5 CPUs + 0.25 GPUs." sure enough, the "0.5 CPUs + 0.2 GPUs" tasks finished one by one until there were none left. at that point, there were only "0.5 CPUs + 0.25 GPUs" tasks left running, and there were only 4 of them. so sometimes it takes a while for a change to this parameter to kick in now that it is controlled server side.
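
if the reciprocal rule is easier to see in code, here is a small Python sketch. the function names are made up for illustration, and it assumes the client runs as many tasks as fit while their GPU fractions add up to no more than 1, which is why the factor is rounded down to two decimals rather than rounded to nearest.

```python
import math

def utilization_factor(n_tasks):
    """Value to enter in the web preference to run n_tasks at once.

    Rounded down to two decimals so n_tasks copies never add up to
    more than 1 GPU (assumption about how the client counts).
    """
    return math.floor(100 / n_tasks) / 100

def tasks_for_factor(factor):
    """How many tasks with this factor fit on one GPU (assumed: sum <= 1)."""
    return math.floor(1 / factor + 1e-9)

for n in range(1, 6):
    print(f"{n} simultaneous tasks -> factor {utilization_factor(n)}")
```

so switching from 0.2 to 0.25 means going from 5 tasks to 4, which matches what happened once the old tasks drained from the queue.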

blckgrffn · May 21, 2012

Sunny129 said:
an app_info.xml file is no longer necessary for the Einstein@Home BRP4 application. *snip*

Thank goodness! That seems a little arcane in this day and age

(officially spoiled by GUIs)

Alyx · May 21, 2012

Super! This makes the world a bright and happy place.

Alyx · May 21, 2012

Sunny129 said:
and don't worry if the change doesn't occur instantly - remember that it's a web preference now...so at the very earliest, the change will take effect the next time your host contacts the Einstein@Home server. and sometimes your host must finish any BRP4 GPU tasks that might have been in your queue before you made the change.

This totally answered my question before I could ask it. I was scratching my head at why this wasn't working after I did a driver and boinc update and was about to ask for your advice.

Edit:
Also I think resetting the project will cause it to resend you your current work with the new GPU settings, rather than canceling tasks.

petrusbroder · May 22, 2012

Thanks for the very well performed research task!

This would make a great academic paper if you would pour it into academic language ...

Anyhow: your data is very much appreciated and will help me to set this up!

Sunny129 · May 22, 2012

thanks Peter. that means a lot coming from one of the TeAm's most successful crunchers!

btw, now that i've documented some BRP4 ATI task run times, i'd like to compare them to my BRP4 CUDA task run times. specifically, i'm comparing my HD 6950 to both my GTX 460 and my GTX 560 Ti. using the data from the optimal configuration in the spreadsheet a few posts up, my HD 6950 takes 160 minutes to complete 4 BRP4 ATI tasks (40 min/task). my GTX 460 takes 96 minutes to complete 3 BRP4 CUDA tasks (32 min/task), and my GTX 560 Ti takes 87 minutes to complete 3 BRP4 CUDA tasks (29 min/task). so CUDA tasks on the GTX 460 take approx. 20% less time to complete than ATI tasks take on the HD 6950, while CUDA tasks on the GTX 560 Ti take approx. 27.5% less time to complete than ATI tasks take on the HD 6950. i know comparing AMD and nVidia GPUs is like comparing apples to oranges, but now that there are BRP4 apps for both devices, Einstein@Home participants are eventually going to start comparing them. so i figured comparing the GTX 560 Ti to the HD 6950 is ideal since they perform quite similarly in other arenas (gaming, encoding, etc.). also, to level the playing field even further, you can see in my sig that GPUs aside, my nVidia rig is almost identical to my AMD/ATI rig.
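
the per-task times and percentages above can be reproduced with a short Python sketch, using only the run times quoted in this post:

```python
# (total minutes, simultaneous tasks) as reported above
gpus = {
    "HD 6950": (160, 4),
    "GTX 460": (96, 3),
    "GTX 560 Ti": (87, 3),
}

# Effective minutes per task when running that many tasks at once.
per_task = {name: minutes / tasks for name, (minutes, tasks) in gpus.items()}
baseline = per_task["HD 6950"]

for name, minutes in per_task.items():
    saving = 1 - minutes / baseline
    print(f"{name}: {minutes:.0f} min/task, "
          f"{saving:.1%} less time than the HD 6950")
```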

salvorhardin · May 25, 2012

Thanks for the data Sunny, I hadn't even thought about being memory limited.

VirtualLarry · May 25, 2012

I can't seem to get any ATI OpenCL WUs with Einstein@home.

Sunny129 · May 25, 2012

VirtualLarry said:
I can't seem to get any ATI OpenCL WUs with Einstein@home.

Larry, check your decommissioning main rig thread - i posted more information regarding your problem there.

Sunny129 · May 29, 2012

guys, i'm currently accumulating more data...but this time i've added an HD 5870 2GB GPU to the mix and using it for testing while the HD 6950 2GB crunches Milkyway@Home Separation 1.02 tasks in the background. i used to run this card by itself until i bought the HD 6950 2GB to go with it. the idea was to either make both discrete GPUs dedicated crunchers and run the display w/ the mobo's onboard graphics, or dedicate at least one of them entirely to DC while the other would split its resources between DC and running the display...but i soon found out that either option gave me troubles in WinXP x32. i've since upgraded to Win7 x64, and now i'm able to dedicate the discrete GPUs entirely to crunching, and run the display solely on the mobo's IGP.

so this test is going to be slightly different from the first test b/c i didn't strictly remove the HD 6950 2GB from the system and install the HD 5870 2GB in its place - rather i left the 6950 in place and installed the HD 5870 2GB to test Einstein@Home BRP4 ATI tasks this time around. the HD 6950 2GB will be crunching 2 simultaneous tasks in the background for this test. obviously this test was not designed to be an "apples-to-apples" comparison to the previous test, or i would have removed the HD 6950 2GB card from the machine...rather i'm testing based on the hardware mix i'd ultimately like to use with this machine.

so far i've measured the run time of a single E@H BRP4 ATI task on the HD 5870 2GB GPU, as well as the total run times of 2 and 3 simultaneous BRP4 ATI tasks. all measurements were taken w/ 2 MW@H tasks running on the other GPU (the HD 6950 2GB) and a varying number of CPU tasks depending on what my 6-core CPU would allow w/ all the GPU activity going on. i'm hoping to test 4 and 5 simultaneous BRP4 ATI tasks tomorrow and have a new spreadsheet of data in 2 or 3 days...

Sunny129 · Jun 4, 2012

as promised, here is the spreadsheet of data concerning Einstein@Home BRP4 ATI tasks on my HD 5870 2GB card:
