GPU "Crunching" Performance

ShintaiDK

Lifer
Apr 22, 2012
AMD cards support BIT_ALIGN_INT natively, while NVIDIA GPUs have to emulate it with three separate hardware instructions (two shifts + one add). That's why AMD cards do so well in Bitcoin.

I wonder if WCG uses the same instruction, or if the work unit simply fits AMD better.
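
For reference, here's a minimal OpenCL C sketch of what that difference looks like: the 32-bit rotate that SHA-256 (Bitcoin's hash) leans on, once via the rotate() builtin, which AMD's compiler turns into a single BIT_ALIGN_INT, and once emulated with shifts, as hardware without an align/funnel-shift instruction has to do. Illustrative only, not WCG's or any miner's actual code:

```c
/* OpenCL C sketch (illustrative, not actual miner/WCG code). */

/* Native path: the rotate() builtin rotates left. On AMD hardware
   the compiler emits a single BIT_ALIGN_INT for this; a right-rotate
   by n is rotate(x, 32 - n). */
uint rotl_native(uint x, uint n)
{
    return rotate(x, n);
}

/* Emulated path: two shifts plus an OR (the "add" in practice,
   since the shifted halves never overlap). Valid for n in 1..31. */
uint rotl_emulated(uint x, uint n)
{
    return (x << n) | (x >> (32u - n));
}
```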
 

RussianSensation

Elite Member
Sep 5, 2003
My guess on this one is OpenCL optimizations.

Here it says the project takes advantage of OpenCL with HD5000 or higher.

If we look at the HD7770 outperforming the HD6900 series, then BIT_ALIGN_INT, theoretical ALU performance, and single- or double-precision performance cannot explain it, because the 6900 creams the HD7770 in all of those. Based on completion times in this thread, even the HD7770 has no trouble keeping up with an HD5870. Therefore, something else must be the answer, most likely specific OpenCL optimizations for the HD7000 architecture.

There is also a mention of OpenCL conversion for multiple platforms here.

If this program takes advantage of OpenCL, it would explain why the HD7700 series beats the HD6900 series and why the GTX580 has no trouble beating the GTX680, since OpenCL performance is weaker on Kepler consumer parts.
 

ShintaiDK

Lifer
Apr 22, 2012
RussianSensation said:
> My guess on this one is OpenCL optimizations. [...] If this program takes advantage of OpenCL, it would explain why the HD7700 series beats the HD6900 series and why the GTX580 has no trouble beating the GTX680, since OpenCL performance is weaker on Kepler consumer parts.

HD5xxx/HD6xxx also support BIT_ALIGN_INT, so I doubt OpenCL is the cause at all.

The reason is most likely that the old cards couldn't sustain a solid load. It's no secret that they power throttled:
http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950/8
 

RussianSensation

Elite Member
Sep 5, 2003
Thermal throttling on HD58xx/69xx as an explanation? Now you're just grasping at straws.

The performance delta between the HD5870/HD6970 and the HD7770 in arithmetic/ALU compute under traditional workloads is vast.

The HD6970 is 2.11x faster than the HD7770 in single-precision/ALU performance, meaning it could have run at 440 MHz and still tied an HD7770. The HD7770's double-precision performance is 8.44x slower than the HD6970's. No amount of real-world thermal throttling can explain how an HD7770 would outperform an HD6970 in any compute app; something else is the answer. Not only that, but the thermal throttling theory itself is bogus, since cards like the HD5870/6970 don't thermal throttle if you adjust PowerTune in CCC.
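
For anyone checking the math, here's the spec-sheet arithmetic behind those two ratios (assuming stock 880 MHz / 1536 shaders for the 6970, 1000 MHz / 640 shaders for the 7770, and DP rates of 1/4 and 1/16 of SP respectively):

```c
/* Spec-sheet sanity check of the ratios above (not a benchmark).
   SP GFLOPS = shaders x clock(GHz) x 2 (one FMA = 2 FLOPs per clock). */
#include <stdio.h>

int main(void)
{
    double sp_6970 = 1536 * 0.880 * 2.0;  /* HD6970: ~2703 GFLOPS SP */
    double sp_7770 =  640 * 1.000 * 2.0;  /* HD7770: ~1280 GFLOPS SP */

    double dp_6970 = sp_6970 / 4.0;       /* Cayman DP rate: 1/4 SP    */
    double dp_7770 = sp_7770 / 16.0;      /* Cape Verde DP: 1/16 SP    */

    printf("SP ratio 6970/7770: %.2fx\n", sp_6970 / sp_7770);  /* 2.11x */
    printf("DP ratio 6970/7770: %.2fx\n", dp_6970 / dp_7770);  /* 8.44x */
    return 0;
}
```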

I had an HD6950 unlocked to a 6970 @ 880 MHz and ran MilkyWay@Home 24/7 for more than half a year, and it didn't throttle once, despite the app loading the GPU to 99%. Stop spreading false information. Even your link shows throttling in Metro 2033 only from 880 MHz to 850 MHz, and that kind of drop cannot possibly explain why an HD7770 would outperform an HD6970 in this DC app. The FurMark throttling was implemented as early as the HD5800 and Fermi series by both NV and AMD on purpose, to stop that power virus from overloading the VRMs and killing GPUs. This was a well-known and investigated scenario, so it's no wonder FurMark throttled to 600 MHz in that link.

If you increase PowerTune to +20%, there is no throttling whatsoever on AMD cards, and there never has been on the HD6970, even at 99% load in games or distributed computing projects. Of course, you seem keen on always finding negatives in AMD cards rather than objectively trying to arrive at a root cause. As someone who has run countless DC projects on my unlocked 6950, I can say a 6970 won't throttle at all.

Also, your BIT_ALIGN_INT logic holds little water, since the HD5800/6900 destroy the HD7770 in Bitcoin mining, the very app that benefits from BIT_ALIGN_INT. So that can't be the answer either. Not to mention BIT_ALIGN_INT is a purely ALU-driven calculation, and the HD6970 has 2.11x the ALU performance of the HD7770. Anything related to ALUs, shader speed, integer calculations, or clock speeds cannot be the answer, since the HD7770 is inferior to the HD5870/6970 in all of those areas. What's left? Architecture-specific optimizations that this DC app uses.

The only way an HD7770 can beat an HD5870/6970 in a non-traditional compute task (i.e., one not bound by pure ALU, single-precision, or double-precision throughput) is if the task uses specific instructions or code paths that favor an architecture that performs faster in OpenCL or DirectCompute, and GCN happens to run faster than Cypress/Cayman in a lot of these kinds of apps. If we look at compute performance under OpenCL, there are situations where the HD7770 outperforms the HD6970.
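
As a rough illustration of what "architecture-specific optimization" can mean in practice, here's a hypothetical host-side sketch that tunes work-group size by device name. The name strings and sizes are made-up placeholders for illustration, not anything from the actual WCG client:

```c
/* Hypothetical sketch: per-architecture tuning in an OpenCL host app.
   Device-name strings and work-group sizes are illustrative guesses. */
#include <string.h>
#include <CL/cl.h>

static size_t pick_workgroup_size(cl_device_id dev)
{
    char name[256] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name) - 1, name, NULL);

    /* GCN (HD7000) schedules 64-wide wavefronts and often prefers
       larger groups; VLIW4/5 (HD5000/6000) may want different shapes. */
    if (strstr(name, "Tahiti") || strstr(name, "Capeverde"))
        return 256;   /* assumed GCN-friendly size */
    return 128;       /* conservative default for older parts */
}
```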
 

Plimogz

Senior member
Oct 3, 2009
Take that Cayman time with a grain of salt, given that my 6950 (unlocked and @ 880 MHz) is completing WUs a full minute faster than that list would lead you to believe (i.e., 150 s instead of 210 s).