
New Einstein@Home Application for ATI/AMD GPUs!

Sunny129

Diamond Member
After having had CUDA (nVidia GPU) support for Einstein@Home's Binary Radio Pulsar Search (BRP4) for some time now, the developers have finally added support for AMD/ATI GPUs!

After more than a year of work by Oliver Bock, Bernd Machenschalk, Heinz-Bernd Eggenstein and other developers, we are pleased to announce the release of the first Einstein@Home application for ATI/AMD Graphics Cards.

This OpenCL application, which searches Arecibo data for new radio pulsars, is about a factor of ten faster than the same search running on a typical CPU. The application is currently available for Windows and Linux computers with Radeon HD 5000 or better graphics cards. We hope to have a version for Macintosh (Apple OS X 10.8, Mountain Lion) sometime this summer, but there are still some problems that need to be fixed or worked around.

Volunteers who wish to run this application will need to install version 7.0.27 or later of the BOINC client. Please see this thread for more information, or if you want to ask questions.

Many thanks to the AMD/ATI team for their support in the OpenCL software development effort.

Bruce Allen
Director, Einstein@Home
it appears the important points are as follows:

  • performance increase of a factor of 10 over a BRP4 task running on a CPU
  • requires Windows or Linux (for now)
  • requires Catalyst 12.1 or later (Linux users - don't install the APP SDK!)
  • requires BOINC v7.0.27 or later
  • requires an OpenCL 1.1 compliant (equivalent to Radeon HD 5xxx) AMD/ATI GPU or APU
  • requires at least 512MB of video memory
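since the BOINC version requirement trips people up (see below - 7.0.26 is one release too old), here's a minimal sketch of the version comparison involved. the function names are made up for illustration; BOINC itself just reports a "major.minor.release" string:

```python
# Minimal sketch: check whether an installed BOINC client meets the
# v7.0.27 minimum required by the BRP4 ATI app. BOINC versions are
# plain "major.minor.release" strings, so a tuple compare works.

def parse_version(v):
    """Turn '7.0.27' into a comparable tuple (7, 0, 27)."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed, minimum="7.0.27"):
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("7.0.26"))  # False - one release too old
print(meets_minimum("7.0.28"))  # True
```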

for a more in depth discussion that is sure to develop in the near future, follow the link in the above quote...

*IMPORTANT NOTE* - the reason Linux users should omit the APP SDK during the driver installation process is b/c both the main driver installer and the APP SDK installer have the potential to install different versions of the same 32-bit library, causing problems. this doesn't appear to be a problem w/ Windows-based platforms, so Windows users should first try installing the Catalyst 12.x driver package in its entirety, including the APP SDK.
 
Very nice! I am wondering whether my 4770 won't work at all, or just isn't supported due to its "beta"-only support of OpenCL.

I'll give it a shot after I hit my POEM goals 🙂

Thank you for sharing...

BTW, OT - I finally got around to unpacking my Corsair 400r... Awesome. In reviews and on the box, I thought it was a bit ugly. Now, having worked with it and basked in the light of its front fans (which has an on/off switch!), I appreciate its brand of handsomeness. Not silent, but quiet enough...

Also, love the internal layout. Easily better than the big Lian-Li's I've been using lately, as it should be, since those are $50 shipped when I buy them...
 
nice...i migrated one of my machines into a Corsair Carbide 500R a few weeks ago and i absolutely love it. with the case's fan controller set to low, and combined with all of my other near-silent hardware, the machine is almost inaudible (quiet enough for me to not notice it when i go to bed right next to it...and this is with both the CPU and GPU crunching away under 100% load). i just took delivery of a Corsair Graphite 600T today, and i'm hoping to migrate my other machine into it tonight...
 
Any idea what the points per day will be like?
i don't know quite yet...it was mentioned on the E@H forums that this new app might not be on par w/ the CUDA app - yet. but it was also mentioned that we should expect an average 10-fold increase in performance over our CPU counterparts. it's tough to say where that might put specific AMD/ATI GPUs w/ respect to PPD...especially when possible PPD values will be spread across a broad range (a 6970 will far outperform a 5830, for instance).

that being said, i typically get ~50K PPD crunching Einstein BRP4 tasks on a GTX 460 and a GTX 560 Ti (both in the same box). whether we can extrapolate from that info or not at this point remains to be seen...
 
i don't know quite yet...it was mentioned on the E@H forums that this new app might not be on par w/ the CUDA app - yet. but it was also mentioned that we should expect an average 10-fold increase in performance over our CPU counterparts. it's tough to say where that might put specific AMD/ATI GPUs w/ respect to PPD...especially when possible PPD values will be spread across a broad range (a 6970 will far outperform a 5830, for instance).

that being said, i typically get ~50K PPD crunching Einstein BRP4 tasks on a GTX 460 and a GTX 560 Ti (both in the same box). whether we can extrapolate from that info or not at this point remains to be seen...

I think ~50K for a GTX 460 is a pretty fair number for GPUs. So if the ATI app lands anywhere near that, it should be pretty competitive.
 
well that's for a GTX 460 and a GTX 560 Ti combined...and i'm just going off of my RAC on BOINCstats, where it fluctuates between 50K and 60K PPD. so i would imagine that the GTX 460 is netting ~20-25K PPD, while the GTX 560 Ti is probably netting ~25-30K PPD.
 
In continuation of the OT - interested to hear what you think of that beastly (I am not one to normally spend more than $60 on a case) 600T 🙂 After using the 400R (which I got @ MC for $70) I might be tempted to go all out if it is really that much nicer!
 
In continuation of the OT - interested to hear what you think of that beastly (I am not one to normally spend more than $60 on a case) 600T 🙂 After using the 400R (which I got @ MC for $70) I might be tempted to go all out if it is really that much nicer!

That 600T looks so nice. My case is like 6 years old so I really should get one too.
 
*UPDATE*

ok guys - i'm running a BRP4 ATI task on my shader-unlocked HD 6950 right now. i don't want to make any runtime or PPD estimates based on my first WU b/c it didn't start off very efficiently...that is to say, it took me a few minutes to notice that each of these tasks consumes 0.5 CPUs, and i already had Einstein@Home CPU tasks running on all 6 of my CPU cores. consequently, the GPU task wasn't crunching at its full potential, and was actually showing damn near 0% GPU utilization in MSI Afterburner, with a few spikes here and there. i changed the BOINC manager settings to only use 5 of the 6 CPU cores for CPU tasks, and sure enough my GPU utilization instantly jumped up to and oscillated around the 65% mark. i'm going to run another task or two individually before i try two at a time. that may take some time, as my first BRP4 ATI task is just passing the 30 minute mark and is only 45% done.
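the core-budgeting arithmetic here is simple enough to sketch - each BRP4 ATI task reserves 0.5 CPUs, so you round the total GPU reservation up to whole cores and subtract. the function name is just for illustration:

```python
import math

# Sketch of the CPU-core budgeting described above: each BRP4 ATI task
# reserves 0.5 CPUs, so the cores to leave free for GPU feeding is
# ceil(0.5 * concurrent GPU tasks); the rest can run CPU tasks.

def cpu_tasks_to_run(total_cores, gpu_tasks, cpus_per_gpu_task=0.5):
    reserved = math.ceil(cpus_per_gpu_task * gpu_tasks)
    return total_cores - reserved

# On a 6-core 1090T, one GPU task means dropping to 5 CPU tasks,
# and a second GPU task fits in the same reserved core:
print(cpu_tasks_to_run(6, 1))  # 5
print(cpu_tasks_to_run(6, 2))  # 5
```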


OT - i'll have to get back to you on the 600T - i forgot my PC building tools at my sister's place, and i may or may not get them back tonight. at any rate, as i mentioned before, i already have a Carbide 500R and love it. it shares quite a few features w/ the 600T, so i expect it to be very similar, but more spacious. btw, for $70 the 400R is A LOT of case! you did well to snag it for that price. keep in mind that when it's not on sale, it's really competing in the $100 range, so it's not a cheap case by any means, in terms of either price or build quality. i've had great luck w/ CoolerMaster and Corsair cases.
 
Already switched my 5850 to Einstein. Ran it under Albert and had no problems; we'll see how it goes on regular Einstein. I don't know if you have seen this, but under Einstein@Home preferences you can now select how many WUs to do at once on your GPU. The "GPU utilization factor of BRP apps" setting is the same as the count setting in an app_info file. I'm currently using 0.5; my utilization went from around 70% to 94%.
 
Already switched my 5850 to Einstein. Ran it under Albert and had no problems; we'll see how it goes on regular Einstein. I don't know if you have seen this, but under Einstein@Home preferences you can now select how many WUs to do at once on your GPU. The "GPU utilization factor of BRP apps" setting is the same as the count setting in an app_info file. I'm currently using 0.5; my utilization went from around 70% to 94%.
yeah, it's nice not to have to use an app_info for once...that being said, i just finished testing a few WU's one at a time, and after changing my GPU utilization factor to 0.5, not even a BOINC restart got 2 WU's going at once. i imagine i'll have to wait for the current result to finish and upload before a 2nd concurrent task starts running. i'll report back if the parameter doesn't eventually start working...i've had no problems w/ it so far on my CUDA BRP4 crunching box.
 
I had the same problem, it took a while to kick in. By the look of my tasks, once the task is finished the next ones are started with the updated settings.
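for anyone who still prefers the app_info route mentioned above, the equivalent anonymous-platform entry would look roughly like the sketch below. the app name and surrounding elements are assumptions from memory rather than copied from a working file - the point is only that a coproc count of 0.5 GPUs per task yields 2 tasks per GPU, matching a GPU utilization factor of 0.5:

```xml
<app_info>
    <app>
        <name>einsteinbinary_BRP4</name>  <!-- app name is an assumption -->
    </app>
    <app_version>
        <app_name>einsteinbinary_BRP4</app_name>
        <coproc>
            <type>ATI</type>
            <count>0.5</count>  <!-- 0.5 GPUs per task = 2 tasks per GPU -->
        </coproc>
    </app_version>
</app_info>
```

a real app_info.xml would also need the file_info/file_ref entries for the executable; this fragment only shows the scheduling-relevant part.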
 
ok, so for a reference point, my shader-unlocked HD 6950 is completing BRP4 ATI tasks in ~64 min. apiece when run individually. i'm now running 2 at a time and will report back w/ some run times and quantify the improvement in efficiency if there is one.

given that each BRP4 ATI task consumes ~0.5 CPUs, it is interesting to note the following behaviors:

  • it is common sense that i can only run CPU tasks on 5 cores in order to leave the 6th and final core available for a single or 2 concurrent BRP4 ATI tasks. while running a single BRP4 ATI task and 5 CPU tasks concurrently, my GPU utilization is ~67%...yet interestingly enough, if i decrease CPU work to 4 cores or less, GPU utilization slowly creeps up a few percentage points to ~70%.
  • when i run 2 BRP4 ATI tasks and 5 CPU tasks concurrently, my GPU utilization is ~77%. and yet, if i decrease CPU work to only 4 cores, my GPU utilization rises substantially to ~90%. i don't really see why this is happening, especially considering that leaving a single CPU core free should be more than enough to service 2 concurrent BRP4 ATI tasks, b/c 1) they only consume 0.5 CPUs each, and 2) in reality they consume a bit less than that, according to the Windows Task Manager. it'll take some more time to see how altering these variables affects both CPU task and GPU task run times...


back OT, i got everything put into the 600T and it's back up and running now. however i won't get to test the built-in fan controller until tomorrow. when i opened it, there was one little plastic tab that was broken off - it's part of a touch latch that opens and closes the front mesh fan cover. i believe super glue will do the trick. otherwise the case is a gem so far...
 
I'd like to try this out but I'm running BOINC version 7.0.26 (one version too old) and the Berkeley site has been inaccessible for the last 24 hours. 🙁

Anybody have an alternate download site for the BOINC client? If not, I guess I'll just have to wait...
 
ok, so for the last 24 hours i've been running two E@H BRP4 ATI GPU tasks and five E@H LineVeto tasks on my 1090T. a pair of GPU tasks are taking approx. 105 minutes to complete, while the CPU tasks are taking approx. 270 minutes to complete. as soon as the next two GPU tasks complete, i'll take away a CPU core from the CPU app, allowing GPU utilization to go from ~77% to ~90%. then after some time passes, i'll measure the CPU and GPU task run times again.

for the time being however, under the current configuration of running 2 concurrent GPU tasks (along with 5 CPU tasks of course), there appears to be a substantial increase in compute efficiency in comparison to running only 1 GPU task at a time. to be precise, two GPU tasks running in parallel only take ~82% of the amount of time it took for two GPU tasks run in series to finish. i'm hoping for more improvements in efficiency when i start running only 4 CPU tasks and my GPU utilization goes up. i'll post up when i know...
 
While running 2 tasks at a time my runtimes have averaged 103m; when I tried 3 at a time, my runtimes went up to 366m. My GPU utilization is now sitting around 90%. All my workunits so far have been done with all my CPU cores free to feed the GPU. You can also get 7.0.28 directly from Einstein until SETI is back up, but I don't know what was fixed in the newest version.
 
ok, so under the "2 GPU tasks + 4 CPU tasks" configuration, there appears to be an even greater increase in compute efficiency in comparison to running only 1 GPU task at a time than there was w/ the "2 GPU tasks + 5 CPU tasks" configuration. BRP4 ATI task run time went from ~105 minutes down to ~90 minutes. two GPU tasks running in parallel now only take ~70% of the amount of time it took for two GPU tasks run in series to finish.

i'm hoping to test 3 simultaneous BRP4 ATI tasks shortly...
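the "% of series time" figures quoted in these posts are easy to reproduce - N tasks run back-to-back would take N × 64 min (the single-task time from earlier), so the ratio is just parallel wall time over that. a quick sketch of the arithmetic, with the numbers reported so far:

```python
# Reproducing the efficiency arithmetic from the posts above: a single
# BRP4 ATI task takes ~64 min on this HD 6950, so N tasks run "in
# series" take N * 64 min. Running N in parallel with per-task wall
# time T gives an efficiency ratio of T / (N * 64).

SINGLE_TASK_MIN = 64.0

def parallel_vs_series(n_tasks, parallel_minutes):
    return parallel_minutes / (n_tasks * SINGLE_TASK_MIN)

# 2 GPU tasks + 5 CPU tasks: ~105 min apiece -> ~82% of series time
print(round(parallel_vs_series(2, 105), 2))  # 0.82
# 2 GPU tasks + 4 CPU tasks: ~90 min apiece -> ~70% of series time
print(round(parallel_vs_series(2, 90), 2))   # 0.7
```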
 
While running 2 tasks at a time my runtimes have averaged 103m; when I tried 3 at a time, my runtimes went up to 366m. My GPU utilization is now sitting around 90%. All my workunits so far have been done with all my CPU cores free to feed the GPU.
you're obviously better off running 2 tasks simultaneously (an effective 51.5 minutes each) than 3 tasks (an effective 122 minutes each). out of curiosity though, what was your GPU utilization while running 2 GPU tasks and 0 CPU tasks? i'm assuming you're doing this on the HD 5850 in your sig? is this a 1GB card? if so, that may explain why 2 simultaneous tasks crunch more efficiently for you than 3 tasks - you're probably running into a VRAM bottleneck. each BRP4 ATI task consumes approx. 355MB of VRAM...and i say approximately b/c it doesn't scale perfectly. for instance, on my system, MSI Afterburner shows 707MB of VRAM consumed for 2 simultaneous tasks, and 1058MB for 3 simultaneous tasks (which is well over the 1024MB of VRAM found on a 1GB card).

fortunately my HD 6950 is a 2GB card, so it could probably handle 5 simultaneous tasks provided i 1) don't max out GPU utilization first, and 2) have the CPU resources to run all 5 GPU tasks. realistically, i think GPU utilization will max out before i enable a 5th simultaneous task (i think it'll max out w/ only 4 simultaneous tasks). i took the liberty to play w/ CPU utilization once my client started running 3 simultaneous GPU tasks. under a "3 GPU tasks + 5 CPU tasks" configuration, GPU utilization was well down near 65%. after limiting CPU tasks to run on only 4 cores, GPU utilization jumped up to ~92%. after further limiting CPU tasks to run on only 3 cores, GPU utilization jumped up to a solid 95%. for now, i'm testing the "3 GPU tasks + 4 CPU tasks" configuration, where GPU utilization is at ~92%. after i pin down the run times and compute efficiency of this configuration, i'll test out the "3 GPU tasks + 3 CPU tasks" configuration...
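the VRAM numbers above suggest a simple model - roughly 355MB for the first task plus ~351MB for each additional one (which lands within 1MB of the observed 707MB and 1058MB figures). a hedged sketch of that estimate, with made-up function names:

```python
# Sketch of the VRAM arithmetic above: per-task consumption doesn't
# scale perfectly linearly (707 MB observed for 2 tasks, 1058 MB for
# 3), so model it as ~355 MB for the first task plus ~351 MB for each
# additional task, and see how many tasks fit in a card's memory.

def estimated_vram_mb(n_tasks, first=355, additional=351):
    if n_tasks == 0:
        return 0
    return first + additional * (n_tasks - 1)

def max_concurrent_tasks(vram_mb):
    n = 0
    while estimated_vram_mb(n + 1) <= vram_mb:
        n += 1
    return n

print(estimated_vram_mb(3))        # 1057 - just over a 1 GB card's 1024 MB
print(max_concurrent_tasks(1024))  # 2 tasks on a 1 GB card
print(max_concurrent_tasks(2048))  # 5 tasks on a 2 GB HD 6950
```

this matches the observation that a 1GB 5850 chokes on a 3rd task while a 2GB 6950 could in principle hold 5 - though as noted, GPU utilization will likely max out before VRAM does.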
 
another update...

well i've been testing the "3 GPU tasks + 4 CPU tasks" configuration since dinnertime now (approx. 7 hours), and i'm about to go to bed. so far, it appears the GPU tasks are now taking 125 minutes to complete, at no detriment to the 4 CPU tasks (they still appear to be consistently finishing in approx. 4.5 hours). i plan to continue this test throughout the night to make sure the run times are consistent. but if the range of run times i've seen so far doesn't broaden significantly over the next 6 hours or so, then it'll be safe to say that 3 GPU tasks running in parallel only take ~65% of the amount of time it would take for three GPU tasks run in series to finish...that's quite the improvement in efficiency! *EDIT* - numbers confirmed (as of 12:00 UTC)

in the morning, i'll decrease the number of simultaneous CPU tasks to only 3 and see if the GPU crunching efficiency improves even more...after that, it'll be on to testing 4 simultaneous GPU tasks. i should note that the trade-off between CPU tasks and GPU tasks is worth it. i don't think i mentioned it before, but i'm running mostly Einstein@Home S6LV1 (Gravitational Wave LineVeto search) tasks and occasionally FGRP1 (Gamma Ray Pulsar search) on the CPU, both of which run far longer and earn significantly less credit than the BRP4 tasks running on my GPU. not that i'm in it for the points (i couldn't care less about that), but more credit generally means more work done/more data crunched - provided you're comparing tasks within the same project rather than across projects.
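another way to look at the same run-time numbers is raw throughput (tasks completed per hour) for each configuration tested so far. a quick sketch using the figures reported in this thread:

```python
# Throughput view of the run times reported above: N concurrent tasks
# each finishing in M minutes complete 60 * N / M tasks per hour.

def tasks_per_hour(n_concurrent, minutes_each):
    return 60.0 * n_concurrent / minutes_each

print(round(tasks_per_hour(1, 64), 2))   # 0.94  (1 task alone)
print(round(tasks_per_hour(2, 105), 2))  # 1.14  (2 tasks + 5 CPU tasks)
print(round(tasks_per_hour(2, 90), 2))   # 1.33  (2 tasks + 4 CPU tasks)
print(round(tasks_per_hour(3, 125), 2))  # 1.44  (3 tasks + 4 CPU tasks)
```

so each step up in concurrency has increased overall throughput, even though individual tasks take longer.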
 