GPU "Crunching" Performance

ShintaiDK

Lifer
Apr 22, 2012
AMD cards support BIT_ALIGN_INT natively, while NVIDIA GPUs have to emulate it with three separate hardware instructions (two shifts + one add). That's why AMD cards do so well in Bitcoin.

I wonder if WCG uses the same instruction, or if the work unit simply fits AMD better.
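
For reference, here's a minimal OpenCL C sketch of what that difference looks like: the 32-bit rotate that SHA-256 (Bitcoin's hash) leans on, once via the rotate() builtin, which AMD's compiler turns into a single BIT_ALIGN_INT, and once emulated with shifts, as hardware without an align/funnel-shift instruction has to do. Illustrative only, not WCG's or any miner's actual code:

```c
/* OpenCL C sketch (illustrative, not actual miner/WCG code). */

/* Native path: the rotate() builtin rotates left. On AMD hardware
   the compiler emits a single BIT_ALIGN_INT for this; a right-rotate
   by n is rotate(x, 32 - n). */
uint rotl_native(uint x, uint n)
{
    return rotate(x, n);
}

/* Emulated path: two shifts plus an OR (the "add" in practice,
   since the shifted halves never overlap). Valid for n in 1..31. */
uint rotl_emulated(uint x, uint n)
{
    return (x << n) | (x >> (32u - n));
}
```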
 

RussianSensation

Elite Member
Sep 5, 2003
My guess on this one is OpenCL optimizations.

Here it says the project takes advantage of OpenCL with HD5000 or higher.

If we look at the HD7770 outperforming the HD6900 series, then BIT_ALIGN_INT, theoretical ALU performance, and single- or double-precision performance cannot explain it, because the 6900 creams the HD7770 in all of those. Based on completion times in this thread, even the HD7770 has no trouble keeping up with an HD5870. Therefore, something else must be the answer, most likely specific OpenCL optimizations for the HD7000 architecture.

There is also a mention of OpenCL conversion for multiple platforms here.

If this program takes advantage of OpenCL, it would explain why the HD7700 series beats the HD6900 series and why the GTX580 has no trouble beating the GTX680, since OpenCL performance is weaker on Kepler consumer parts.
 

ShintaiDK

Lifer
Apr 22, 2012
RussianSensation said:
> My guess on this one is OpenCL optimizations. [...] If this program takes advantage of OpenCL, it would explain why the HD7700 series beats the HD6900 series and why the GTX580 has no trouble beating the GTX680, since OpenCL performance is weaker on Kepler consumer parts.

HD5xxx/HD6xxx also support BIT_ALIGN_INT, so I doubt OpenCL is the cause at all.

The reason is most likely that the old cards couldn't sustain a solid load. It's no secret that they power throttled:
http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950/8
 

RussianSensation

Elite Member
Sep 5, 2003
Thermal throttling on HD58xx/69xx as an explanation? Now you're just grasping at straws.

The performance delta between the HD5870/HD6970 and the HD7770 in arithmetic/ALU compute under traditional workloads is vast.

The HD6970 is 2.11x faster than the HD7770 in single-precision/ALU performance, meaning it could have run at 440 MHz and still tied an HD7770. The HD7770's double-precision performance is 8.44x slower than the HD6970's. No amount of real-world thermal throttling can explain how an HD7770 would outperform an HD6970 in any compute app; something else is the answer. Not only that, but the thermal throttling theory itself is bogus, since cards like the HD5870/6970 don't thermal throttle if you adjust PowerTune in CCC.
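
For anyone checking the math, here's the spec-sheet arithmetic behind those two ratios (assuming stock 880 MHz / 1536 shaders for the 6970, 1000 MHz / 640 shaders for the 7770, and DP rates of 1/4 and 1/16 of SP respectively):

```c
/* Spec-sheet sanity check of the ratios above (not a benchmark).
   SP GFLOPS = shaders x clock(GHz) x 2 (one FMA = 2 FLOPs per clock). */
#include <stdio.h>

int main(void)
{
    double sp_6970 = 1536 * 0.880 * 2.0;  /* HD6970: ~2703 GFLOPS SP */
    double sp_7770 =  640 * 1.000 * 2.0;  /* HD7770: ~1280 GFLOPS SP */

    double dp_6970 = sp_6970 / 4.0;       /* Cayman DP rate: 1/4 SP    */
    double dp_7770 = sp_7770 / 16.0;      /* Cape Verde DP: 1/16 SP    */

    printf("SP ratio 6970/7770: %.2fx\n", sp_6970 / sp_7770);  /* 2.11x */
    printf("DP ratio 6970/7770: %.2fx\n", dp_6970 / dp_7770);  /* 8.44x */
    return 0;
}
```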

I had an HD6950 unlocked to a 6970 @ 880 MHz and ran MilkyWay@Home 24/7 for more than half a year, and it didn't throttle once, despite the app loading the GPU to 99%. Stop spreading false information. Even your link shows throttling in Metro 2033 only from 880 MHz to 850 MHz, and that kind of drop cannot possibly explain why an HD7770 would outperform an HD6970 in this DC app. The FurMark throttling was implemented as early as the HD5800 and Fermi series by both NV and AMD on purpose, to stop that power virus from overloading the VRMs and killing GPUs. This was a well-known and investigated scenario, so it's no wonder FurMark throttled to 600 MHz in that link.

If you increase PowerTune to +20%, there is no throttling whatsoever on AMD cards, and there never has been on the HD6970, even at 99% load in games or distributed computing projects. Of course, you seem keen on always finding negatives in AMD cards rather than objectively trying to arrive at a root cause. As someone who has run countless DC projects on my unlocked 6950, I can say a 6970 won't throttle at all.

Also, your BIT_ALIGN_INT logic holds little water, since the HD5800/6900 destroy the HD7770 in Bitcoin mining, the very app that benefits from BIT_ALIGN_INT. So that can't be the answer either. Not to mention BIT_ALIGN_INT is a purely ALU-driven calculation, and the HD6970 has 2.11x the ALU performance of the HD7770. Anything related to ALUs, shader speed, integer calculations, or clock speeds cannot be the answer, since the HD7770 is inferior to the HD5870/6970 in all of those areas. What's left? Architecture-specific optimizations that this DC app uses.

The only way an HD7770 can beat an HD5870/6970 in a non-traditional compute task (i.e., one not bound by pure ALU, single-precision, or double-precision throughput) is if the task uses specific instructions or code paths that favor an architecture that performs faster in OpenCL or DirectCompute, and GCN happens to run faster than Cypress/Cayman in a lot of these kinds of apps. If we look at compute performance under OpenCL, there are situations where the HD7770 outperforms the HD6970.
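
As a rough illustration of what "architecture-specific optimization" can mean in practice, here's a hypothetical host-side sketch that tunes work-group size by device name. The name strings and sizes are made-up placeholders for illustration, not anything from the actual WCG client:

```c
/* Hypothetical sketch: per-architecture tuning in an OpenCL host app.
   Device-name strings and work-group sizes are illustrative guesses. */
#include <string.h>
#include <CL/cl.h>

static size_t pick_workgroup_size(cl_device_id dev)
{
    char name[256] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name) - 1, name, NULL);

    /* GCN (HD7000) schedules 64-wide wavefronts and often prefers
       larger groups; VLIW4/5 (HD5000/6000) may want different shapes. */
    if (strstr(name, "Tahiti") || strstr(name, "Capeverde"))
        return 256;   /* assumed GCN-friendly size */
    return 128;       /* conservative default for older parts */
}
```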
 

Plimogz

Senior member
Oct 3, 2009
Take that Cayman time with a grain of salt, given that my 6950 (unlocked and @ 880 MHz) is completing WUs a full minute faster than that list would lead you to believe (i.e., 150 s instead of 210 s).