
[WCCF] AMD Carrizo APU on the 28nm Node Will Have Stacked DRAM On Package

While those results are interesting, they're hardly conclusive (and really, if we're going to be sticklers here, we should see OpenCL AND HSA benches, since they are not exactly the same). There are uncontrolled elements here, such as the host CPU in the second benchmark being an i7-3960X. All you've shown is that Kaveri alone matches the LuxMark performance of a 3960X paired with a 7750 (which, admittedly, is pretty sweet). We would get much better data by isolating a single test platform (namely a 7850K) and running HSA and OpenCL benches at different memory speeds and/or iGPU speeds.

If OpenCL/HSA performance scaled upwards linearly with increased iGPU clockspeed at the same RAM clockspeed (say, DDR3-1600), we could reasonably conclude that you are correct.

If OpenCL/HSA performance did not change even with increased memory clockspeed AT THE SAME TIMINGS (say, CAS 12, just to make sure it would actually work) and the same iGPU speed, then we could also conclude that you are correct.

But comparing a 3960X + 7750 to a 7850K doesn't necessarily give us that conclusion. The problem is that the x86 cores are contributing to the score somewhat, so it may be that the 7850K is turning in stronger compute performance to offset what is probably better x86 performance from the 3960X.
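
The two checks proposed above could be scored roughly as follows. This is only a minimal sketch: the function names, the 20% tolerance and every number in it are illustrative assumptions, not measurements from any real Kaveri run.

```python
def scaling_efficiency(base_setting, base_score, new_setting, new_score):
    """Return the share of a clock increase that shows up in the score.

    1.0 means the score rose in proportion to the clock (linear scaling);
    0.0 means the score did not move at all.
    """
    setting_gain = new_setting / base_setting - 1.0
    score_gain = new_score / base_score - 1.0
    if setting_gain == 0:
        raise ValueError("the two runs must use different clock settings")
    return score_gain / setting_gain


def classify(gpu_eff, mem_eff, tol=0.2):
    """Label a workload from its iGPU-clock and memory-clock scaling."""
    if gpu_eff >= 1.0 - tol and mem_eff <= tol:
        return "compute-bound"
    if mem_eff >= 1.0 - tol and gpu_eff <= tol:
        return "bandwidth-bound"
    return "mixed"


# Placeholder numbers for illustration only, NOT real benchmark results.
gpu_eff = scaling_efficiency(720, 1000, 900, 1250)    # iGPU MHz, RAM fixed
mem_eff = scaling_efficiency(1600, 1000, 2133, 1010)  # RAM MT/s, iGPU fixed
print(classify(gpu_eff, mem_eff))
```

Each call compares two runs on the same platform that differ in exactly one setting, which is the point of isolating a single test system.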
 
Have a look at the HD7770 and see how much faster it is than the HD7750; both of them have the same memory at the same frequency.

luxmark.png


Also, running LuxMark 2.0 in GPU mode doesn't use the CPU at all. So it doesn't matter what CPU you use in the system, because the benchmark will only use the GPU. Only in CPU + GPU mode does the final score depend on the CPU as well.
 
LuxMark is a single GPGPU workload. Some workloads will love compute performance, others will love memory bandwidth. It depends on what your problem is.
 
The APU needs it only for gaming; for everything else, including GPGPU, DDR3 is just fine.

But isn't gaming also the primary reason for increasing the iGPU performance further on APUs? I.e. if they want to do so, they also need to increase the memory bandwidth for it to be of any use.
 
LuxMark is a single GPGPU workload. Some workloads will love compute performance, others will love memory bandwidth. It depends on what your problem is.

I don't disagree, but that is the one benchmark I have found to compare the A10-7850K to the HD7750. I'm sure there are applications that need more memory bandwidth, but I strongly believe that at the Kaveri hardware level it is more important to have more compute units than more memory bandwidth in the majority of GPGPU applications.
 
But isn't gaming also the primary reason for increasing the iGPU performance further on APUs? I.e. if they want to do so, they also need to increase the memory bandwidth for it to be of any use.

Fusion's primary focus is on compute, not gaming. Gaming is an added bonus; AMD seeks the highest compute per cost possible. Adding stacked memory at 28nm will raise the platform cost, and that is not what AMD or the OEMs would want.
 
But only a subset of workloads will ever be able to benefit from compute (via OpenCL and similar); the rest must use the general-purpose CPU instead. In addition, it requires software support at the application layer to make use of it.

Anyway, since AMD is intending to add HBM / stacked DRAM to the APUs at some point, I guess they must have gaming in focus too. Otherwise there would not be much use in adding it, if I understand you correctly (since compute does not need HBM / stacked DRAM)?
 
Have a look at the HD7770 and see how much faster it is than the HD7750; both of them have the same memory at the same frequency.

luxmark.png


Also, running LuxMark 2.0 in GPU mode doesn't use the CPU at all. So it doesn't matter what CPU you use in the system, because the benchmark will only use the GPU. Only in CPU + GPU mode does the final score depend on the CPU as well.

With the 384-core 7730, there is a small difference in LuxMark between GDDR5 and DDR3. 512 cores would be held back even more.

Luxmark.png
 
Fusion's primary focus is on compute, not gaming. Gaming is an added bonus; AMD seeks the highest compute per cost possible. Adding stacked memory at 28nm will raise the platform cost, and that is not what AMD or the OEMs would want.

You keep saying this, and yet when I go to AMD's website, all they talk about on the APU product pages is gaming.
 
Do you want me to post all the links for Fusion, HSA, hUMA, OpenCL, etc.?

Also, the Kaveri slides were heavy with OpenCL and HSA material as well as gaming.

I haven't said that gaming is not important to AMD APUs, but it sure isn't the first focus.

edit: Do you believe Intel is investing money, resources and die space in its iGPUs for gaming or for compute?
 
If AMD is intending to add stacked DRAM to APUs, gaming must be in focus, right? Since according to what you mentioned earlier, it's not needed for compute. Or did I miss something?
 
Do you want me to post all the links for Fusion, HSA, hUMA, OpenCL, etc.?

Also, the Kaveri slides were heavy with OpenCL and HSA material as well as gaming.

I haven't said that gaming is not important to AMD APUs, but it sure isn't the first focus.

edit: Do you believe Intel is investing money, resources and die space in its iGPUs for gaming or for compute?

No need. As I said, the AMD product page disagrees with you. Go argue with them, but stop claiming you know more about AMD's product positioning than they do.

Should I link to your posts where you explicitly said you were right and AMD was wrong?
 
If HBM is used, it will be the L3 cache. While the stacked memory will have latency similar to system memory, it will not have the issues of previous L3 caches from AMD.

Orochi (AMD), for example, had pathetic bandwidth and a very small size, which made its latency awful. HBM as the L3 will provide up to 128 GB/s and 1 gigabyte of memory.

AIDA64Memory.png

Haswell in comparison: http://i1365.photobucket.com/albums...C10_AIDA64_25133Copy_zps9802a583.png~original

SRAM @ 8 MB - 128 GB/s Read - 40 GB/s Write - 50 GB/s Copy / 50% of the die (Interface + SRAM)
vs
Stacked DRAM @ 1 GB - 128 GB/s peak throughput / ~5?% of the die (Interface)
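
For scale, peak theoretical bandwidth is just transfer rate times bus width. The sketch below works that out for two configurations that are assumptions chosen for illustration: dual-channel DDR3-2133 as a typical APU setup, and a hypothetical 1024-bit stacked interface at 1 GT/s, picked because it yields the 128 GB/s quoted above. Neither is a confirmed Carrizo specification, and the function name is made up for this example.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bits, channels=1):
    """Peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


# Assumed configurations, not confirmed product specifications.
ddr3_dual = peak_bandwidth_gbs(2133, 64, channels=2)  # dual-channel DDR3-2133
stacked = peak_bandwidth_gbs(1000, 1024)              # 1024-bit bus at 1 GT/s
print(f"DDR3-2133 x2: {ddr3_dual:.1f} GB/s, stacked: {stacked:.1f} GB/s")
```

These are peak figures only; sustained read, write and copy rates, like the ones AIDA64 reports, come in lower.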
 
I haven't said that gaming is not important to AMD APUs, but it sure isn't the first focus.
But the reality is that, for any advanced GPU use, gaming (or more precisely, real-time raster graphics) is always the first, the second, the third and so on, everything except the last use of an APU. Yes, it was said to be compute-friendly, but the sad truth is that the compute features are mostly irrelevant so far. HSA and OpenCL 2.0 may make them relevant, but that is still uncertain.

Not to mention that the beloved Kaveri in fact only partially supports the full profile of HSA, which may cause problems if AMD cannot come up with a solution to those reliability issues.

P.S. Gaming is essentially compute. Hmm.
 
If HBM is used, it will be the L3 cache.
Then I will drop an option here for others to pick: HBM needn't be used as an LLC from day one. In fact, stacked DRAM as a cache is something that exists only in academic papers so far, and the research centers on CPU workloads, as far as I've read. Do not confuse it with eDRAM, though.
 
AMD's plan for stacked DRAM is L3, no ifs or buts. I just wanted it to sound as if there was a choice in the matter.

People tend to reply when I type "if". That is one of the reasons I spread so many ifs around.

If you want to try to disprove my claim that AMD will use HBM only as L3, try it. I am pretty sure I am as solid as an unobtainium wall when stating that Advanced Micro Devices will only use HBM as an L3 cache.
 
It is not true. But I guess you will insist as always.
It is 100% true that AMD will only use HBM for L3.

The HyperTransport cache-coherent interconnect has two memory interfaces: GDDR5 for GPUs or DDR3/DDR4 for APUs/SoCs, and then HBM for L3.

There is nothing from AMD implying or stating that Stacked DRAM will replace System RAM. Thus, HBM is stuck as an L3 cache, no ifs or buts.
 
How does the GT3e's eDRAM work with system memory to allocate resources for graphics considering it's an "L4 cache"? Do drivers only use the cache as a framebuffer?
 
It's a hardware cache. Memory pages (any memory pages, CPU and GPU) are stored in there as deemed necessary, just like the other cache layers.
 
You keep saying this, and yet when I go to AMD's website, all they talk about on the APU product pages is gaming.

No need. As I said, the AMD product page disagrees with you. Go argue with them, but stop claiming you know more about AMD's product positioning than they do.

I don't know what product page you are looking at, but the APU page has the following.

http://www.amd.com/en-us/products/processors/desktop/a-series-apu#

Meet the new APU

AMD’s most advanced APU ever. Welcome to the revolution.

Introducing the AMD A10-7850K, AMD’s most advanced APU technology. So revolutionary, it challenges the very definition of a processor. With 12 Compute Cores (4 CPU + 8 GPU)* with AMD Radeon™ R7 graphics and exclusive features like AMD TrueAudio technology for immersive audio, it can take on Battlefield 4™ or just about anything else you throw at it.

Features

HSA

Unlock your system’s full potential with revolutionary HSA architecture – the new standard in processor design – enabling the CPU and GPU to work in perfect harmony and blaze through compute tasks in Ultra HD resolution.

Graphics Core Next (GCN) Architecture

Get extreme performance with Graphics Core Next (GCN) architecture, featuring supercharged AMD Radeon™ R7 graphics.

Mantle Technology

AMD Mantle technology raises your game to unprecedented levels with hyper-efficient performance.

AMD TrueAudio Technology

Designed for breathtaking immersive audio, AMD TrueAudio technology sets a new level of immersion. Hear your enemy’s every move and anticipate their next.

AMD Eyefinity Technology

Connect up to four displays to see every angle of the battlefield with AMD Eyefinity technology in its full glory.

AMD Radeon™ Memory

AMD Radeon™ Memory enables top performance and maximum value for a boost in entertainment and gaming experiences.
AMD A-Series APU

AMD A10 APU features:


From Computex, June 2014:

2czpq4h.jpg


And from the mobile Kaveri release: the first pages are all about HSA and compute performance. Gaming comes later in every AMD PDF.


2h3d2e1.jpg


10qh3sp.jpg


mmd4rb.jpg


257its2.jpg


Nobody said gaming is not important, but it is not the first focus.

Also, nobody said that memory bandwidth is not important; you don't increase the compute units without also increasing the memory bandwidth to some degree, or decreasing memory/cache latency, etc.
 
If AMD is intending to add stacked DRAM to APUs, gaming must be in focus, right? Since according to what you mentioned earlier, it's not needed for compute. Or did I miss something?

I didn't say memory bandwidth is not needed for compute. I said that Kaveri is not memory-bandwidth constrained in compute, only in gaming, and that gaming is not the main focus of the APUs.

As the APUs gain more compute units, memory bandwidth needs for compute, and obviously for gaming, will also grow. So at some point they will need to add stacked memory to the APUs, not only for memory bandwidth but also for economic and technical reasons such as smaller die sizes, shorter interconnects, etc.
 