[WCCF] AMD Carrizo APU on the 28nm Node Will Have Stacked DRAM On Package

DrMrLordX · Jul 30, 2014

While those results are interesting, they're hardly conclusive (and really, if we're going to be sticklers here, we should see OpenCL AND HSA benches since they are not exactly the same). There are uncontrolled elements here, such as the host CPU in the second benchmark being an i7-3960x. All you've shown is that Kaveri alone matches the Luxmark performance of a 3960x paired with a 7750 (which, admittedly, is pretty sweet). We would get much better data by isolating a single test platform (namely a 7850k) and running HSA and OpenCL benches at different memory speeds and/or iGPU speeds.

If OpenCL/HSA performance scales linearly with increased iGPU clockspeed at the same RAM clockspeed (say, DDR3-1600), we can reasonably conclude that you are correct.

If OpenCL/HSA performance does not change even with increased memory clockspeed AT THE SAME TIMINGS (say, CAS 12, just to make sure it would actually work) and the same iGPU speed, then we can also conclude that you are correct.

But comparing a 3960x + 7750 to a 7850k doesn't necessarily lead to that conclusion. The problem is that the x86 cores contribute somewhat to the score, so the 7850k may be turning in stronger compute performance to offset what is probably better x86 performance from the 3960x.
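The two sweeps proposed above (iGPU clock at fixed RAM speed, then RAM speed at fixed timings and iGPU clock) can both be scored the same way: how much of each clock increase actually shows up in the benchmark score. A minimal sketch, assuming hypothetical helpers `scaling_efficiency` and `classify`, with 0.8 / 0.2 cut-offs that are arbitrary illustration values rather than any standard; the scores in the example are made up, not measured results:

```python
def scaling_efficiency(base_score, new_score, base_clock, new_clock):
    """Fraction of a relative clock increase that shows up as a relative score increase.

    1.0 means the score scaled linearly with the clock; 0.0 means no gain at all.
    """
    clock_gain = new_clock / base_clock - 1.0
    if clock_gain <= 0:
        raise ValueError("new_clock must be higher than base_clock")
    score_gain = new_score / base_score - 1.0
    return score_gain / clock_gain


def classify(igpu_eff, mem_eff, high=0.8, low=0.2):
    """Label a workload from the two sweeps.

    igpu_eff: efficiency of an iGPU clock sweep at a fixed RAM speed.
    mem_eff:  efficiency of a RAM speed sweep at fixed timings and iGPU clock.
    The high/low thresholds are illustrative assumptions.
    """
    if igpu_eff >= high and mem_eff <= low:
        return "compute-bound"
    if mem_eff >= high and igpu_eff <= low:
        return "bandwidth-bound"
    return "mixed"


# Hypothetical numbers, for illustration only (not measured results):
igpu = scaling_efficiency(base_score=1000, new_score=1100, base_clock=720, new_clock=800)
mem = scaling_efficiency(base_score=1000, new_score=1010, base_clock=1600, new_clock=2133)
print(classify(igpu, mem))  # compute-bound
```

Keeping the timings fixed in the memory sweep matters because otherwise a latency change and a bandwidth change would be folded into the same number.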

AtenRa · Jul 30, 2014

DrMrLordX said:
While those results are interesting, they're hardly conclusive (and really, if we're going to be sticklers here, we should see OpenCL AND HSA benches since they are not exactly the same). There are uncontrolled elements here, such as the host CPU in the second benchmark being an i7-3960x. All you've shown is that Kaveri alone matches the Luxmark performance of a 3960x paired with a 7750 (which, admittedly, is pretty sweet). We would get much better data by isolating a single test platform (namely a 7850k) and running HSA and OpenCL benches at different memory speeds and/or iGPU speeds.

If OpenCL/HSA performance scales linearly with increased iGPU clockspeed at the same RAM clockspeed (say, DDR3-1600), we can reasonably conclude that you are correct.

If OpenCL/HSA performance does not change even with increased memory clockspeed AT THE SAME TIMINGS (say, CAS 12, just to make sure it would actually work) and the same iGPU speed, then we can also conclude that you are correct.

But comparing a 3960x + 7750 to a 7850k doesn't necessarily lead to that conclusion. The problem is that the x86 cores contribute somewhat to the score, so the 7850k may be turning in stronger compute performance to offset what is probably better x86 performance from the 3960x.

Have a look at the HD7770 and see how much faster it is than the HD7750; both have the same memory at the same frequency.

Also, LuxMark 2.0 in GPU mode doesn't use the CPU at all, so it doesn't matter which CPU is in the system; the benchmark only uses the GPU. The final score depends on the CPU as well only when you run in CPU + GPU mode.

NTMBK · Jul 30, 2014

LuxMark is a single GPGPU workload. Some workloads will love compute performance, others will love memory bandwidth. It depends on what your problem is.

Fjodor2001 · Jul 30, 2014

AtenRa said:
The APU needs it only for gaming; for everything else, including GPGPU, DDR3 is just fine.

But isn't gaming also the primary reason for increasing iGPU performance further on APUs? I.e., if they want to do that, they also need to increase the memory bandwidth for it to be of any use.

AtenRa · Jul 30, 2014

NTMBK said:
LuxMark is a single GPGPU workload. Some workloads will love compute performance, others will love memory bandwidth. It depends on what your problem is.

I don't disagree, but that is the one benchmark I have found that compares the A10-7850K to the HD7750. I'm sure there are applications that need more memory bandwidth, but I strongly believe that, at Kaveri's hardware level, more compute units matter more than more memory bandwidth in the majority of GPGPU applications.

AtenRa · Jul 30, 2014

Fjodor2001 said:
But isn't gaming also the primary reason for increasing iGPU performance further on APUs? I.e., if they want to do that, they also need to increase the memory bandwidth for it to be of any use.

Fusion's primary focus is compute, not gaming. Gaming is an added bonus; AMD seeks the highest possible compute per cost. Adding stacked memory at 28nm would raise the platform cost, and that is not what AMD or the OEMs want.

Fjodor2001 · Jul 30, 2014

AtenRa said:
Fusion's primary focus is compute, not gaming. Gaming is an added bonus; AMD seeks the highest possible compute per cost. Adding stacked memory at 28nm would raise the platform cost, and that is not what AMD or the OEMs want.

But only a subset of workload types can ever benefit from compute (via OpenCL and similar); the rest must run on the general-purpose CPU instead. In addition, it requires software support at the application layer to make use of it.

Anyway, since AMD intends to add HBM / stacked DRAM to the APUs at some point, I guess they must have gaming in focus too. Otherwise there would not be much use in adding it, if I understand you correctly (since compute does not need HBM / stacked DRAM)?

Enigmoid · Jul 30, 2014

AtenRa said:
Have a look at the HD7770 and see how much faster it is than the HD7750; both have the same memory at the same frequency.

Also, LuxMark 2.0 in GPU mode doesn't use the CPU at all, so it doesn't matter which CPU is in the system; the benchmark only uses the GPU. The final score depends on the CPU as well only when you run in CPU + GPU mode.

With the 384-core HD7730, there is a small difference in LuxMark between the GDDR5 and DDR3 versions. 512 cores would be held back even more.

AtenRa · Jul 30, 2014

Enigmoid said:
With the 384-core HD7730, there is a small difference in LuxMark between the GDDR5 and DDR3 versions. 512 cores would be held back even more.

There is a small performance gain from higher memory bandwidth, but the gain from more compute units is far higher. The 512-core A10-7850K scores almost 100 points more than the HD7730 GDDR5 and more than 100 points more than the HD7730 DDR3.

http://www.extremetech.com/computin...-wait-for-the-first-true-heterogeneous-chip/5
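NTMBK's point that some workloads want compute and others want bandwidth can be made concrete with a roofline-style estimate. The sketch below is illustrative only: it takes the 512 shaders from the thread, but assumes a 720 MHz iGPU clock, one fused multiply-add (2 FLOPs) per shader per clock, and dual-channel DDR3-1600 (2 x 64-bit x 1600 MT/s = 25.6 GB/s peak). The helper names and the kernel intensities are hypothetical, and nothing here is a LuxMark measurement.

```python
def peak_gflops(shaders, clock_mhz, flops_per_clock=2):
    # Assumes one fused multiply-add (2 single-precision FLOPs) per shader per clock.
    return shaders * flops_per_clock * clock_mhz / 1000.0


def attainable_gflops(intensity, peak, bandwidth_gbs):
    # Roofline model: a kernel doing `intensity` FLOPs per byte moved from memory
    # cannot exceed either the compute roof or bandwidth * intensity.
    return min(peak, intensity * bandwidth_gbs)


def ridge_point(peak, bandwidth_gbs):
    # Arithmetic intensity (FLOPs/byte) at which the two roofs meet.
    return peak / bandwidth_gbs


# Illustrative inputs, assumed rather than measured.
kaveri_peak = peak_gflops(512, 720)  # 512 shaders from the thread, assumed 720 MHz
ddr3_bw = 25.6                       # assumed dual-channel DDR3-1600, GB/s

for intensity in (2, 10, 50):  # hypothetical kernels, FLOPs per byte
    limit = "compute" if intensity >= ridge_point(kaveri_peak, ddr3_bw) else "bandwidth"
    print(intensity, round(attainable_gflops(intensity, kaveri_peak, ddr3_bw), 1), limit)
```

Under these assumptions the ridge point is 737.28 / 25.6 = 28.8 FLOPs per byte: a kernel below that is held back by DDR3 no matter how many shaders are added, while one above it gains from more compute units. Where a given GPGPU workload falls is exactly what the thread is arguing about.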

Phynaz · Jul 30, 2014

AtenRa said:
Fusion's primary focus is compute, not gaming. Gaming is an added bonus; AMD seeks the highest possible compute per cost. Adding stacked memory at 28nm would raise the platform cost, and that is not what AMD or the OEMs want.

You keep saying this, and yet when I go to AMD's website, all they talk about is gaming on the APU product pages.

AtenRa · Jul 30, 2014

Phynaz said:
You keep saying this, and yet when I go to AMD's website, all they talk about is gaming on the APU product pages.

Do you want me to post all the links for Fusion, HSA, hUMA, OpenCL, etc.?

Also, the Kaveri slides were heavy with OpenCL and HSA material as well as gaming.

I haven't said that gaming is not important to AMD APUs, but it's surely not the first focus.

edit: Do you believe Intel is investing money, resources and die space in its iGPUs for gaming or for compute?

Fjodor2001 · Jul 30, 2014

If AMD is intending to add stacked DRAM to APUs, gaming must be in focus, right? Since according to what you mentioned earlier, it's not needed for compute. Or did I miss something?

Phynaz · Jul 30, 2014

AtenRa said:
Do you want me to post all the links for Fusion, HSA, hUMA, OpenCL, etc.?

Also, the Kaveri slides were heavy with OpenCL and HSA material as well as gaming.

I haven't said that gaming is not important to AMD APUs, but it's surely not the first focus.

edit: Do you believe Intel is investing money, resources and die space in its iGPUs for gaming or for compute?

No need. As I said, the AMD product page disagrees with you. Go argue with them, but stop claiming you know more about AMD's product positioning than they do.

Should I link to your posts where you explicitly said you were right and AMD was wrong?

Phynaz · Jul 30, 2014

Fjodor2001 said:
If AMD is intending to add stacked DRAM to APUs, gaming must be in focus, right? Since according to what you mentioned earlier, it's not needed for compute. Or did I miss something?

Lol. You didn't miss anything.

NostaSeronx · Jul 30, 2014

If HBM is used, it will be the L3 cache. While the stacked memory will have latency similar to system memory, it will not have the issues of AMD's previous L3 caches.

Orochi (AMD), for example, had pathetic bandwidth and a very small size, which made its latency awful. As the L3, HBM will provide up to 128 GB/s and 1 gigabyte of memory.

Haswell in comparison: http://i1365.photobucket.com/albums...C10_AIDA64_25133Copy_zps9802a583.png~original

SRAM @ 8 MB: 128 GB/s read, 40 GB/s write, 50 GB/s copy; 50% of the die (interface + SRAM)
vs
Stacked DRAM @ 1 GB: 128 GB/s peak throughput; ~5?% of the die (interface)
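The 128 GB/s figure corresponds to one first-generation HBM stack: a 1024-bit interface at 1 Gb/s per pin. A minimal sketch of the peak-bandwidth arithmetic, with dual-channel DDR3-1600 added as an assumed point of comparison (it is not in the post); `peak_bandwidth_gbs` is a hypothetical helper, and these are theoretical peaks that say nothing about latency.

```python
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    # Theoretical peak = bytes per transfer * millions of transfers per second, in GB/s.
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


# Assumed configurations for illustration:
hbm_stack = peak_bandwidth_gbs(1024, 1000)    # one first-generation HBM stack, 1 Gb/s per pin
ddr3_dual = peak_bandwidth_gbs(2 * 64, 1600)  # two 64-bit channels of DDR3-1600

print(hbm_stack)              # 128.0
print(ddr3_dual)              # 25.6
print(hbm_stack / ddr3_dual)  # ratio of the two peak bandwidths

# Capacity side of the comparison in the post: 1 GB of stacked DRAM vs 8 MB of SRAM.
print((1 * 1024) // 8)        # 128 (times more capacity)
```

Note that the post's SRAM and stacked DRAM figures share the same 128 GB/s peak, so in this comparison the case for HBM rests on capacity and die area rather than on bandwidth.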

pTmdfx · Jul 30, 2014

AtenRa said:
I haven't said that gaming is not important to AMD APUs, but it's surely not the first focus.

But the reality is that, for any advanced GPU use, gaming, or more precisely real-time raster graphics, is always the first, second, third and every further use of an APU except the last one. Yes, it was said to be compute-friendly, but the sad truth is that the compute features are mostly irrelevant so far. HSA and OpenCL 2.0 may make them relevant, but that is still uncertain.

Not to mention that the beloved Kaveri in fact supports only part of the full HSA profile, which may cause problems if AMD cannot come up with a solution to those reliability issues.

P.S. Gaming is essentially compute. Hmm.

pTmdfx · Jul 30, 2014

NostaSeronx said:
If HBM is used, it will be the L3 cache.

Then I'll put another option out here for others to consider: HBM needn't be used as an LLC from day one. In fact, stacked DRAM as a cache exists only in academic papers so far, and, as far as I've read, that research centers on CPU workloads. Don't confuse it with eDRAM, though.

NostaSeronx · Jul 30, 2014

AMD's plan for stacked DRAM is L3, no ifs or buts. I just wanted it to sound as if there was a choice in the matter.

People tend to reply when I type "if". That is one of the reasons I scatter so many ifs around.

If you want to try to disprove my claim that AMD will use HBM only as L3, go ahead. I am pretty sure I am as solid as an unobtainium wall when stating that Advanced Micro Devices will only use HBM as an L3 cache.

pTmdfx · Jul 30, 2014

NostaSeronx said:
AMD's plan for stacked DRAM is L3, no ifs or buts. I just wanted it to sound as if there was a choice in the matter.

It is not true. But I guess you will insist as always.

Homeles · Jul 30, 2014

pTmdfx said:
P.S. Gaming is essentially compute. Hmm.

Compute + "special function units," like TMUs and ROPs.

NostaSeronx · Jul 30, 2014

pTmdfx said:
It is not true. But I guess you will insist as always.

It is 100% true that AMD will only use HBM for L3.

The HyperTransport cache-coherent interconnect has two memory interfaces: GDDR5 for GPUs or DDR3/DDR4 for APUs/SoCs, plus HBM for L3.

There is nothing from AMD implying or stating that stacked DRAM will replace system RAM. Thus, HBM is stuck as an L3 cache, no ifs or buts.

Blitzvogel · Jul 30, 2014

How does the GT3e's eDRAM work with system memory to allocate resources for graphics considering it's an "L4 cache"? Do drivers only use the cache as a framebuffer?

ViRGE · Jul 31, 2014

NUSNA_Moebius said:
How does the GT3e's eDRAM work with system memory to allocate resources for graphics considering it's an "L4 cache"? Do drivers only use the cache as a framebuffer?

It's a hardware cache. Memory pages (any memory pages, CPU and GPU) are stored in there as deemed necessary, just like the other cache layers.
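A toy model may help picture what "hardware cache" means here: software never places data in it explicitly; every CPU or GPU access simply checks it, and a miss pulls the data in while evicting something older. This sketch assumes 64-byte lines and least-recently-used replacement purely for illustration; the thread does not say what granularity or replacement policy Intel's eDRAM actually uses, and `TransparentCache` is a hypothetical name.

```python
from collections import OrderedDict


class TransparentCache:
    """Toy model of a hardware-managed cache in front of DRAM (assumed LRU policy)."""

    def __init__(self, capacity_lines, line_bytes=64):
        self.capacity = capacity_lines
        self.line_bytes = line_bytes      # assumed line size, for illustration
        self.lines = OrderedDict()        # line number -> None, oldest use first
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up one byte address; returns True on a hit, False on a miss."""
        line = addr // self.line_bytes
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = None           # fill from DRAM
        return False


# CPU and GPU accesses go through the same cache; nothing is "allocated" into it.
cache = TransparentCache(capacity_lines=4)
for addr in [0, 64, 0, 4096, 8192, 12288, 64]:
    cache.access(addr)
print((cache.hits, cache.misses))  # (1, 6)
```

Because the cache is transparent, a driver does not need to treat it as a dedicated framebuffer; whatever is touched most recently, framebuffer or otherwise, tends to stay resident.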

AtenRa · Jul 31, 2014

Phynaz said:
You keep saying this, and yet when I go to AMD's website, all they talk about is gaming on the APU product pages.

Phynaz said:
No need. As I said, the AMD product page disagrees with you. Go argue with them, but stop claiming you know more about AMD's product positioning than they do.

I don't know what product page you are looking at, but the APU page has the following.

http://www.amd.com/en-us/products/processors/desktop/a-series-apu#

Meet the new APU

AMD’s most advanced APU ever. Welcome to the revolution.

Introducing the AMD A10-7850K, AMD’s most advanced APU technology. So revolutionary, it challenges the very definition of a processor. With 12 Compute Cores (4 CPU + 8 GPU)* with AMD Radeon™ R7 graphics and exclusive features like AMD TrueAudio technology for immersive audio, it can take on Battlefield 4™ or just about anything else you throw at it.
Features

HSA

Unlock your system’s full potential with revolutionary HSA architecture – the new standard in processor design – enabling the CPU and GPU to work in perfect harmony and blaze through compute tasks in Ultra HD resolution.
Graphics Core Next (GCN) Architecture

Get extreme performance with Graphics Core Next (GCN) architecture, featuring supercharged AMD Radeon™ R7 graphics.
Mantle Technology

AMD Mantle technology raises your game to unprecedented levels with hyper-efficient performance.
AMD TrueAudio Technology

Designed for breathtaking immersive audio, AMD TrueAudio technology sets a new level of immersion. Hear your enemy’s every move and anticipate their next.
AMD Eyefinity Technology

Connect up to four displays to see every angle of the battlefield with AMD Eyefinity technology in its full glory.
AMD Radeon™ Memory

AMD Radeon™ Memory enables top performance and maximum value for a boost in entertainment and gaming experiences.

AMD A-Series APU

AMD A10 APU features:

Unlock your system’s full potential with the A10-7850K APU’s revolutionary Heterogeneous System Architecture (HSA)

Get extreme performance with the A10-7850K APU’s Graphics Core Next (GCN) architecture, featuring supercharged AMD Radeon™ R7 graphics

Experience breathtaking immersive audio with the A10-7850K APU’s AMD TrueAudio technology

Tear through everyday applications with AMD Turbo Core 3.0 technology

From Computex June 2014,

And from the mobile Kaveri release: the first pages are all about HSA and compute performance, and gaming comes later in every AMD PDF.

Nobody said gaming is not important, but it is not the first focus.

Also, nobody said that memory bandwidth is not important; you don't increase the compute units without also increasing memory bandwidth to some degree, or decreasing memory/cache latency, etc.

AtenRa · Jul 31, 2014

Fjodor2001 said:
If AMD is intending to add stacked DRAM to APUs, gaming must be in focus, right? Since according to what you mentioned earlier, it's not needed for compute. Or did I miss something?

I didn't say memory bandwidth is not needed for compute; I said that Kaveri is not memory-bandwidth constrained in compute, only in gaming, and that gaming is not the main focus of the APUs.

As the APUs gain more compute units, memory bandwidth needs for compute, and obviously for gaming, will also grow. So at some point they will need to add stacked memory to the APUs, not only for memory bandwidth but also for economic and technical reasons such as smaller die sizes, shorter interconnects, etc.
