AMD A10-7850K GPGPU : Memory and GPU Scaling in OpenCL

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
A few days ago we had a nice debate here in the AT forums about APU memory bandwidth and how memory starved the A10-7850K is. With that in mind, and with a new AMD A10-7850K on hand, I started running a few benchmarks to understand how memory and GPU speed affect OpenCL performance.

I’ve used two free benchmarks that anyone can download to do their own research. The first is Luxmark 2.0. It is one of my favorites for this kind of testing because it lets you bench the CPU and the GPU separately, or combine both in CPU + GPU mode. I used the GPU Only and CPU + GPU modes.

The second benchmark is CL Benchmark 1.1.3. It has a lot of sub tests, but I only used the Fluid Simulation. This benchmark only uses the iGPU of the APU.

For memory I used a pair of 4GB modules (8GB total) of Kingston HyperX Genesis DDR3 2133MHz. For the memory scaling, the RAM was tested at 1333MHz with 9-9-9 timings, 1600MHz with 9-9-9 timings, 1866MHz with 10-11-10 timings, 2133MHz with 11-12-11 timings and 2400MHz with 11-13-12 timings.

Memory bandwidth at those settings is:

1333MHz = 21.3 GB/s Memory Bandwidth
1600MHz = 25.6 GB/s Memory Bandwidth
1866MHz = 29.9 GB/s Memory Bandwidth
2133MHz = 34.1 GB/s Memory Bandwidth
2400MHz = 38.4 GB/s Memory Bandwidth
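
For anyone who wants to check the bandwidth figures, here is a minimal sketch of the arithmetic. It assumes a dual-channel setup with a 64-bit bus per channel and treats the quoted "MHz" values as effective transfer rates in MT/s. The function name and its parameters are just illustrative, not from any tool used in the review.

```python
def ddr3_peak_bandwidth_gbs(transfer_rate_mts, channels=2, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumption: dual-channel, 64-bit per channel, and the rate passed in
    is the effective transfer rate in MT/s.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# The five memory settings tested in the review.
for rate in (1333, 1600, 1866, 2133, 2400):
    print(f"{rate}MHz = {ddr3_peak_bandwidth_gbs(rate):.1f} GB/s")
```

These are theoretical peaks only. Real measured bandwidth will be lower and also depends on timings and the memory controller.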

To keep CPU performance constant, CPU Turbo was disabled and the CPU frequency was raised to 4.0GHz. I can also confirm that in Luxmark 2.0 CPU + GPU mode, the CPU frequency was measured at only 3.0GHz while the benchmark was running. That confirms that the Kaveri CPU throttles when the iGPU is at 100% load. So for that benchmark mode, keep in mind that the CPU was only running at 3.0GHz.

32zhve0.jpg


For the GPU scaling I ran the integrated GPU of the A10-7850K at 720MHz, 800MHz and 960MHz. It would be nice to also compare the A10-7700K's iGPU against the 512 shaders of the A10-7850K, but since I don’t have that APU right now, I only have the Luxmark results from my clock-for-clock review.

The hardware used for the review is,

APU : AMD A10-7850K @ 4.0GHz, NB at 2000MHz
Motherboard : ASUS A88XM-Plus (BIOS 11.2)
RAM : 2x 4GB Kingston HyperX Genesis DDR-3 2133MHz
HDD : Seagate 1TB SATA-6
PSU : ThermalTake TR2 380W
Catalyst 14.4
Win 8.1 64bit

6fmczk.jpg
 

AtenRa

Luxmark 2.0
We start with Luxmark 2.0 in GPU only mode. In this mode the benchmark uses only the integrated GPU to render the image. So let’s see what happens.


Memory Scaling.

As we can clearly see from the graphs, the gains diminish as we raise the memory frequency from 1600MHz up to 2400MHz. At the default 720MHz iGPU clock, performance appears to peak with 2133MHz memory. That makes it clear that the A10-7850K with 2133MHz memory is not memory starved but compute limited. The 960MHz data confirms this: the iGPU at 960MHz with even 1333MHz memory is faster than at 720MHz paired with 2133MHz memory.

But we can also see that memory scaling improves as compute power goes up. With the iGPU at 960MHz, total performance scaling from 1333MHz to 2400MHz reaches 16.22%. That clearly shows that memory bandwidth plays a bigger role the more compute performance we have.



svknrp.jpg


6oocr7.jpg

* Total mem scaling = From 1333MHz to 2400MHz
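
If you want to reproduce the "total scaling" percentages from your own runs, this is roughly how the number can be derived from two benchmark scores. The scores below are made-up placeholders, not results from this review, and the function name is only illustrative.

```python
def scaling_percent(baseline_score, new_score):
    """Percent gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical scores for illustration only (NOT measured results):
score_at_1333 = 200.0  # lowest memory setting
score_at_2400 = 230.0  # highest memory setting

total_mem_scaling = scaling_percent(score_at_1333, score_at_2400)
print(f"Total mem scaling = {total_mem_scaling:.2f}%")
```

The same function works for the GPU scaling numbers by passing the 720MHz score as the baseline and the 960MHz score as the new score.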


GPU scaling.

Raising the compute capability of the iGPU by raising its frequency scales well even with low bandwidth memory like 1333MHz. Clearly we get more scaling from increasing GPU performance than from higher memory bandwidth. And again, raising both the GPU clock and the memory bandwidth gives even higher scaling, so the higher the GPU compute, the more memory bandwidth is needed.

10nw4n6.jpg

* Total GPU scaling = From 720MHz to 960MHz



 

AtenRa

CPU + GPU Mode.

This is the most important mode because it is what these APUs are supposed to be made for: using both the CPU and iGPU cores for GPGPU tasks. In this mode the CPU and the iGPU work simultaneously, which raises the memory bandwidth requirements. So let's see what happens.

Memory Scaling.

With both the CPU and the iGPU in use, the system needs more bandwidth to feed the 4 CPU cores and the 512 GPU cores. So it shouldn’t be a big surprise that in this mode raising the memory bandwidth has a large impact on performance.

The memory requirements are even higher when the compute capability is increased substantially with the iGPU at 960MHz. That’s why we get almost a 24% increase in performance going from 1333MHz to 2400MHz.

2wh11tx.jpg


200spl5.jpg



GPU scaling

Increasing the GPU frequency doesn’t scale as well as we saw in GPU only mode, but it does scale nonetheless. And again we get the highest scaling with the highest GPU and memory frequencies. So again, we can confirm that memory bandwidth requirements increase with higher compute performance.

25kkcog.jpg


A10-7850K vs A10-7700K (Both at 4.0GHz and 2133MHz Memory)

2upvasx.jpg



And again performance scales higher with more compute capability, so 2133MHz memory is not the limiting factor.
 

AtenRa

CL Benchmark 1.1.3 : SPH Fluid Simulation.

In the Fluid Simulation the iGPU behaved completely differently than in Luxmark 2.0.

As you can see, although we get a little scaling from increasing the memory bandwidth, it is the increase in iGPU frequency that provides the highest performance scaling in this benchmark. Even with 1333MHz memory, just raising the iGPU from 720MHz to 960MHz easily gives a 27.54% increase in performance. So for this type of benchmark the APU is not memory limited but compute starved.
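
One rough way to judge how "compute starved" a result is: compare the performance gain to the clock increase that produced it. The sketch below uses the 720MHz to 960MHz step and the 27.54% gain quoted above. The "scaling efficiency" ratio and the function names are my own illustration, not something the benchmark reports.

```python
def relative_increase_percent(low, high):
    """Percent increase going from low to high."""
    return (high - low) / low * 100.0


def scaling_efficiency(perf_gain_percent, resource_gain_percent):
    """Fraction of the resource increase that shows up as performance.

    1.0 would mean perfectly linear scaling with the resource.
    """
    return perf_gain_percent / resource_gain_percent


# iGPU clock step used in the review: 720MHz -> 960MHz.
gpu_clock_gain = relative_increase_percent(720, 960)

# 27.54% is the Fluid Simulation gain quoted for 1333MHz memory.
gpu_efficiency = scaling_efficiency(27.54, gpu_clock_gain)

print(f"GPU clock increase: {gpu_clock_gain:.2f}%")
print(f"Scaling efficiency: {gpu_efficiency:.2f}")
```

A ratio close to 1.0 suggests the workload is mostly limited by GPU compute at that memory speed, while a ratio well below 1.0 would point to some other bottleneck such as memory bandwidth.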


2rc8y6s.jpg


Memory Scaling

2hxvjb5.jpg


GPU Scaling

2dr8m5f.jpg



Hope you enjoyed the Review ;)


 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
You can disable the throttling:
1. You need to have AMD OverDrive software:
http://www.amd.com/en-us/markets/game/downloads
2. You need to enable "turbo" in BIOS.
3. After system boots, run AMD OverDrive.
4. Go to Clock/Voltage control
5. Click "Turbo Core Control" button
6. Uncheck "enable turbo core" checkbox.
7. Enjoy unlimited CPU speed under 100% iGPU load. (until next system startup)
8....???
9. Profit
 

AtenRa


Yeah, thx, I'll try it and report back with graphs ;)
 

AtenRa

Houston we have a problem,

When I start AOD, the system reboots :p

Any recommendations ??
 

DrMrLordX

Lifer
Apr 27, 2000
22,518
12,387
136
Thanks for the benchmarks! It is interesting that there is divergence in behavior between Luxmark 2.0 and SPH Fluid Simulation. However, I think the CPU + iGPU Luxmark 2.0 scores are the most relevant to "real world" OpenCL and HSA applications.

If you were to disable core throttling (sorry, can't help you with AOD causing system reboots, though there is another method to disable throttling involving the use of msr tweaker to adjust p-states p5 or p6, I forget which), the memory bandwidth requirements would probably be higher.

So, even for compute, Kaveri can benefit from additional memory bandwidth. Perhaps the effects of memory bandwidth are not as dramatic as they are in games, but the effects are still there. It is clear that AMD needs to move to DDR3-2133 as their minimum memory standard for APUs.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0

or build better memory controllers.