
WCCFAMD Carrizo APU on the 28nm Node Will Have Stacked DRAM On Package

csbin

Senior member
[RUMOR] Notice the shiny tag at the start. Continue. We have received some information regarding AMD’s upcoming Carrizo APU; since it is unverified, I will be treating it as a rumor. The report comes from the Italian site bitsandchips.it and states that AMD’s upcoming flagship APUs will have stacked DRAM while staying on the 28nm node.




Now it goes without saying that you need to keep that pinch of salt handy throughout this post. However, this news, if true, is very interesting. We know for a fact that APUs benefit a lot from fast memory, and if these APUs truly support HBM then we can expect some very substantial per-clock performance gains jumping from Kaveri to Carrizo, even while staying on the same node. Another important point to note is that with the Carrizo APU the implementation of HSA will be perfected, probably resulting in significant gains in compute as well as gaming.
Now, we already know that AMD is working with Hynix to create stacked DRAM. We also know that this memory will come in two types, namely 3DS and HBM (don’t be fooled by the lack of “3D” in the latter name; both are stacked). The memory in question here is the HBM variant, which features the highest bandwidth, and I know for a fact that two types are already in production: the 2-Hi and 4-Hi variants. You can find a detailed analysis in my Pascal Architecture Analysis. Now, the max bandwidth of a single HBM stack is 128-256 GB/s (compare this to the 28 GB/s of a single GDDR5 chip), so we are looking at an enormous jump in bandwidth, albeit at reduced clocks (most probably around 1000 MHz).
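Those bandwidth figures fall straight out of the interface math. A quick sketch (the bus widths and per-pin rates are commonly cited gen-1 HBM and GDDR5 figures, used here as assumptions):

```python
# Peak-bandwidth math for a single HBM stack vs. a single GDDR5 chip.
# Assumed figures: 1024-bit HBM interface at 1-2 Gbps per pin (gen-1 HBM),
# and a 32-bit GDDR5 chip at 7 Gbps per pin.

def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gb/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8

print(peak_bandwidth_gbs(1024, 1.0))  # 128.0 GB/s per stack
print(peak_bandwidth_gbs(1024, 2.0))  # 256.0 GB/s per stack
print(peak_bandwidth_gbs(32, 7.0))    # 28.0 GB/s for one GDDR5 chip
```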




Now, we previously received a much more authentic report that the APU would feature DDR4 support, but this is obviously better. Carrizo’s die size will be smaller than Kaveri’s according to the same source, though I am not sure how they aim to accomplish this if the HBM is truly “on-package”. The stacked DRAM will be manufactured on the 20nm node, but the APU will stay at 28nm. Previous leaks had suggested that the upcoming APU will be compatible with the FM2+ socket and have a TDP no greater than 65W. However, the last authentic leak was quite a while back, and AMD’s plans could have changed in the meantime. We will be waiting for more information on this front; in the meantime, this should serve as good food for thought, if nothing else.

Read more: http://wccftech.com/amd-carrizo-apu-28nm-stacked-dram-alleges-italian-leak/#ixzz37LwoX9Pc
 
Just wondering what Carrizo would do with all that memory bandwidth. Will the iGPU really be able to make use of it all?
 
HBM can't come soon enough. My only question for the more knowledgeable among us: how close does this get us to Crystalwell bandwidth?

I can't wait for next-gen GPUs with this tech and cheap laptops with even better entry-level gaming performance. Not to mention that being able to pick up the cheapest memory for an APU build could bring even more price-conscious people into the PC gamer fold.
 
Gen 1 Crystalwell runs at 1.6 GHz and gives 50 GB/s. Gen 2 Crystalwell (Broadwell, possibly Skylake) runs at 2 GHz, which would give up to 62.5 GB/s (timings are relaxed though, so the final number will be interesting to see).

Compare that to the rumored 64 GB/s and 128 GB/s of Carrizo.
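For what it's worth, the gen-2 figure follows from simply scaling the gen-1 number with the eDRAM clock. A quick sketch (assuming bandwidth scales linearly with clock and the link width is unchanged):

```python
# Scaling the quoted gen-1 Crystalwell figure linearly with the eDRAM clock.
# Assumes bandwidth is purely clock-limited and the link width is unchanged.

GEN1_CLOCK_GHZ = 1.6
GEN1_BW_GBS = 50.0  # GB/s per direction, as quoted above

def crystalwell_bw(clock_ghz: float) -> float:
    """Estimated bandwidth in GB/s at a given eDRAM clock."""
    return GEN1_BW_GBS * clock_ghz / GEN1_CLOCK_GHZ

print(round(crystalwell_bw(2.0), 1))  # 62.5 GB/s at the gen-2 clock
```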

If this rumor is true, Carrizo probably looks a lot like this:
95dd2b6d.jpg
 
It would triple the bandwidth of current APUs.
To give it some scale: it is like going from an HD 7770 to an HD 7950, a massive +200% bandwidth increase!

I don't think it will happen with Carrizo, but it will happen sooner rather than later.
 
Does latency come into play, though? I assume Intel's method ends up with considerably lower latency?

Unrelated question: did the APU in the PS4 end up using an interposer like this? I thought it just ended up with GDDR5 attached to the APU in typical fashion, like on a graphics card.

 
Does latency come into play, though? I assume Intel's method ends up with considerably lower latency?
It doesn't matter for the GPU, but for the CPU it does... a little.

I have no idea what the latency difference would be, but it's probably not that huge.
Unrelated question: did the APU in the PS4 end up using an interposer like this? I thought it just ended up with GDDR5 attached to the APU in typical fashion, like on a graphics card.
No, it didn't.
 
Well, they have to do something like this fairly soon, because these things don't really make sense right now 🙂

It will be fascinating to see how well it does if it does arrive, though.
 
Interesting. It is sort of disappointing that the general memory interface isn't what's improving here (DDR4, or something significantly better than DDR4), but at the same time, that kind of bandwidth increase is far beyond anything you can expect from the usual improvements in JEDEC memory specs.

Unfortunately, this almost guarantees that max clock speeds for Carrizo will be lower than even Kaveri's. Or does it?
 
Definitely. Look at some Kaveri overclocking results: even with DDR3-2400, a GPU overclock from 720 MHz to 1 GHz+ has almost no impact. http://www.eteknix.com/amd-kaveri-a10-7850k-overclocking-unleashing-gcns-potential/6/ Seems like a pretty massive memory bottleneck.

Yes, I agree that a fast cache would be beneficial. But the question is how fast does it have to be? At some point there will be diminishing returns.

And note that this is Carrizo we're talking about, not some top end AMD discrete GFX card. So the number of GPU cores and hence required memory bandwidth will be much lower in Carrizo.
 
The 2.5D High Bandwidth Memory tooling at GlobalFoundries and TSMC only supports 20nm planar and FinFET processes.
====
Here is a comparison to DDR3 and GDDR5:
kzgFBb4.jpg


Same latency as GDDR5 but 4.6x more bandwidth.
 
A big fat 1GB L3 HBM cache should help a lot.

How big does it have to be? In the AnandTech article on Crystalwell, Intel said that:

"There’s only a single size of eDRAM offered this generation: 128MB. Since it’s a cache and not a buffer (and a giant one at that), Intel found that hit rate rarely dropped below 95%. It turns out that for current workloads, Intel didn’t see much benefit beyond a 32MB eDRAM however it wanted the design to be future proof. Intel doubled the size to deal with any increases in game complexity, and doubled it again just to be sure."

So basically Intel is saying that there is little point in having more than 32 MB.
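A small average-memory-access-time (AMAT) sketch shows why those last few points of hit rate still matter; the latencies here are illustrative placeholders, not measured numbers:

```python
# Average memory access time (AMAT) sketch: misses pay the full DRAM penalty,
# so even a small hit-rate drop shows up directly in the average.
# The latency values are illustrative placeholders, not measured numbers.

def amat_ns(hit_rate: float, cache_ns: float = 30.0, dram_ns: float = 100.0) -> float:
    """AMAT = hit_time + miss_rate * miss_penalty, in nanoseconds."""
    return round(cache_ns + (1.0 - hit_rate) * dram_ns, 2)

print(amat_ns(0.95))  # 35.0 ns at the 95% hit rate Intel quotes
print(amat_ns(0.93))  # 37.0 ns -> a 2-point hit-rate drop costs ~2 ns per access
```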
 
The implementation of HBM in APUs and GPUs is a double L3.

Installation Cache which is SRAM and low latency. (The traditional L3 cache)
HBM Stack which is DRAM and high latency. (The novel L3 cache)

As a non-limiting example, for a die-stacked DRAM over a multicore processor, the installation cache may be placed on the same chip as the multicore processor's memory controller. In some embodiments, the main cache may contain a logic layer upon which the installation cache may be placed. Regardless of how the installation cache is implemented in any particular system, the installation cache provides the advantage of low latency cache returns allowing the use of higher latency die-stacked DRAM L3 cache memory with reduced risk of increasing cache misses.
 
Even with a bigger L3, hit rate is what matters: a 2% difference is big if you need to go out to main memory. I would assume this cache would be shared by the CPU and GPU, so an even split becomes 512 MB each (big assumption).

I remember S/A had a die shot of an interposer for SI cards that never came to fruition, I believe due to costs. That was a few years ago, so I would assume costs have come down somewhat and yields have increased. I do think AMD needs this to compete with Intel's high-end mobile graphics offerings.

I could see them doing this for the top bin; if anything fails, fuse off the HBM and use a normal interface. I do hope they offer this. I also hope they focus more on CPU improvements this round, as Kaveri seemed to focus mostly on the graphics side.
 
So if this is true, how soon will we get affordable APU-based laptops and tablets with gaming potential similar to the PlayStation 4's? Of course we will need a die shrink or two, but how many: one, two, three, four?

45w tdp
35w tdp
15w tdp
5w tdp

(I am using teraflops to make the comparison easier, since these are all the same architecture and thus scale similarly. Comparing against other architectures, such as Nvidia's, by teraflops does not work.)
The PlayStation 4 is 1.84 teraflops.
The 7970M is 2.176 teraflops (100W TDP) (the 8970M and R9 M290X are the same chip, but with boost clocks that can go 50 MHz faster).
The 7950M is 1.792 teraflops (75W TDP).
The R9 M270X is 1.382 teraflops (50W TDP).
The R9 M265X is 1.155 teraflops (35W TDP).
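For anyone wanting to re-derive those numbers: GCN's single-precision rating is simply shaders × 2 FLOPs per cycle × clock. A quick sketch (the shader counts and clocks are the commonly published specs, taken as assumptions):

```python
# GCN single-precision throughput: TFLOPS = shaders * 2 FLOPs/cycle * clock.
# Shader counts and clocks are the commonly published specs (assumptions here).

def gcn_tflops(shaders: int, clock_mhz: float) -> float:
    """Single-precision TFLOPS for a GCN part (1 FMA = 2 FLOPs per ALU)."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12

print(gcn_tflops(1152, 800))  # PS4:   1.8432
print(gcn_tflops(1280, 850))  # 7970M: 2.176
```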

Die shrinks usually get you a 30% reduction in power consumption if you keep everything else the same, right? Thus we are talking about a 7970M becoming a ~49W chip after two die shrinks, and a 7950M becoming a ~37W chip after two die shrinks.

----

Can someone check my math? Also how much power consumption is due to the memory? And how much power consumption would stack memory take?
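Happy to sanity-check it. Treating each shrink as a flat 30% power cut (a rule of thumb, not a guarantee), the math works out like this:

```python
# Rule-of-thumb check of the die-shrink math: each shrink cuts power ~30%,
# i.e. a 0.7x factor per node (an assumption, not a guarantee).

def power_after_shrinks(tdp_watts: float, shrinks: int, factor: float = 0.7) -> float:
    """TDP estimate after applying the shrink factor n times."""
    return tdp_watts * factor ** shrinks

print(round(power_after_shrinks(100, 2)))  # 7970M after two shrinks: 49 W
print(round(power_after_shrinks(75, 2)))   # 7950M after two shrinks: 37 W
```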
 
One HBM stack's TDP is less than 4 watts.

If placed in an 82-watt TDP part (Amethyst XT), it would come out to around 90 watts with HBM, and 100+ watts with HBM + GDDR5 + VRMs.
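The arithmetic behind the ~90 W figure, assuming two stacks (the stack count is my assumption):

```python
# Power-budget sum: an ~82 W GPU part plus HBM stacks at <4 W each.
# The two-stack count is an assumption for this sketch.

GPU_TDP_W = 82
HBM_STACK_W = 4   # upper bound per stack, per the claim above
STACKS = 2        # assumed stack count

total_w = GPU_TDP_W + STACKS * HBM_STACK_W
print(total_w)  # 90
```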
 