[AMD] Die stacking for high end GPUs and mainstream computing

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
http://www.microarch.org/micro46/files/keynote1.pdf

Page 52 takeaways
" Die stacking is happening in the mainstream
It is happening now because we need it &
It is going to change who and how we build sockets in the future "

Is it possible that AMD is transitioning to HBM for all their products, both GPUs and APUs? AMD's APUs are bandwidth starved and there is no better solution than stacked DRAM.

The generation after Kaveri could use a silicon interposer solution with a single HBM stack for a massive 128 GB/s. Imagine a 768 sp Radeon GPU with 128 GB/s of bandwidth. That would mean Xbox One GPU performance in a mainstream laptop/desktop.

Again, page 46 compares a 512-bit GDDR5 memory bus at 8 GHz effective (8 Gbps per pin) against four HBM stacks at 1 GHz effective (1 Gbps per pin). Both deliver the same 512 GB/s, but HBM does it at 1/3 the power of GDDR5.

A 512 GB/s HBM solution for AMD's 20nm flagship GPU is very likely to happen.
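
For anyone who wants to check the numbers, here is a rough sketch of the arithmetic behind the GDDR5 vs HBM comparison. It assumes each HBM stack exposes a 1024-bit interface; that width is not stated in this thread, but it is what makes one stack at 1 Gbps per pin come out to 128 GB/s. The function and constant names are made up for illustration.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: number of data pins times the per-pin
    rate in Gbit/s, divided by 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumption (not from the thread): one HBM stack has a 1024-bit interface.
HBM_STACK_WIDTH_BITS = 1024

# 512-bit GDDR5 bus at 8 Gbps per pin.
gddr5_total = peak_bandwidth_gbytes(512, 8.0)

# Four HBM stacks, each at 1 Gbps per pin.
hbm_per_stack = peak_bandwidth_gbytes(HBM_STACK_WIDTH_BITS, 1.0)
hbm_total = 4 * hbm_per_stack

print(f"GDDR5, 512-bit @ 8 Gbps: {gddr5_total} GB/s")
print(f"HBM, one stack @ 1 Gbps: {hbm_per_stack} GB/s")
print(f"HBM, four stacks:        {hbm_total} GB/s")
```

The point is that HBM gets to the same peak figure with a very wide, slow interface instead of a narrow, fast one, which is where the power saving on the slide would come from.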
 

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
VERY interesting slides. So AMD and Hynix were sampling HBM at the latest by APU. :) Can't wait to read about the interposer and TSVs and stacked memory.
 

VulgarDisplay

Diamond Member
Apr 3, 2009
6,188
2
76
It would make sense, considering they didn't slap together some eDRAM or some other hack job for their latest APUs, because they knew they were close to HBM. It would be awesome, and it makes me happy that I waited to buy an APU for an HTPC until memory bandwidth is solved.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
It would have to be second generation 20nm GPUs if any. But again, 16FF is 20nm too ;)
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
VulgarDisplay said:
It would make sense, considering they didn't slap together some eDRAM or some other hack job for their latest APUs, because they knew they were close to HBM. It would be awesome, and it makes me happy that I waited to buy an APU for an HTPC until memory bandwidth is solved.

Stacked DRAM will be for dGPUs only for quite some years, most likely starting in 2016. AMD simply couldn't afford an even bigger die with Kaveri. They even had to drastically raise the price given their ever-declining volume and market share. And unless you replace all of the memory, eDRAM, eSRAM, stacked DRAM, etc. are all the same. That will first happen around 2019-2020. At that point we may buy (non-enthusiast) CPUs with predefined memory amounts.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
VulgarDisplay said:
It would make sense, considering they didn't slap together some eDRAM or some other hack job for their latest APUs, because they knew they were close to HBM. It would be awesome, and it makes me happy that I waited to buy an APU for an HTPC until memory bandwidth is solved.

Exactly. AMD is horribly starved for bandwidth even at 512 sp in Kaveri. Here is an HD 7750 GDDR5 vs HD 7750 DDR3 comparison, with the same 512 sp on both chips.

http://www.hardware.fr/focus/76/amd-radeon-hd-7750-ddr3-test-cape-verde-etouffe.html

The HD 7750 GDDR5 is 75-100% faster than the HD 7750 DDR3 in demanding games like BF3 and Crysis 2. On average it's 65% faster. AMD cannot add any more sp unless they solve the bandwidth problem. At 20nm AMD can easily go to 768 sp, but it's only worth doing if they get bandwidth up to 100 GB/s. Otherwise they would be wasting silicon area and power. It's like the hourglass example: the memory is the slowest part of the system, so it hinders performance and creates huge bottlenecks. HBM clocks are 800-1200 MHz effective (0.8-1.2 Gbps per pin). So assuming AMD can get an 800 MHz HBM stack for their next-gen 768 sp APU, that's 128 x 0.8 = 102.4 GB/s. Armed with that bandwidth, AMD can easily double the A10-7850K's GPU performance or go even higher across the board.
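
The single-stack estimate can be sketched the same way across the quoted 800-1200 MHz range. This again assumes a 1024-bit interface per stack (not stated in the thread, but implied by the 128 x 0.8 calculation) and treats the 100 GB/s figure as the target from the post; all names are illustrative.

```python
# Assumption (not from the thread): one HBM stack has a 1024-bit interface,
# so 1 Gbps per pin corresponds to 128 GB/s.
STACK_WIDTH_BITS = 1024
TARGET_GBYTES = 100.0  # bandwidth the post says a 768 sp APU would need


def stack_bandwidth_gbytes(gbps_per_pin: float) -> float:
    """Peak GB/s for one stack at the given per-pin data rate (Gbit/s)."""
    return STACK_WIDTH_BITS * gbps_per_pin / 8


# Per-pin rates spanning the quoted 800-1200 MHz effective range.
rates = [0.8, 1.0, 1.2]
bandwidths = {rate: stack_bandwidth_gbytes(rate) for rate in rates}

for rate, bw in bandwidths.items():
    verdict = "meets" if bw >= TARGET_GBYTES else "misses"
    print(f"{rate:.1f} Gbps per pin -> {bw:.1f} GB/s ({verdict} the target)")
```

Even the slowest bin in that range would clear the 100 GB/s mark with one stack, which is the basis for the 768 sp estimate.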

Once APUs with HBM are available the entry level dGPU market below USD 100 will fade away in time.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,809
1,289
136
HBM for APUs will probably happen when AMD goes to 2000+ pin LGAs/BGAs.

Oh nevermind, 2.5D.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
NostaSeronx said:
HBM for APUs will probably happen when AMD goes to 2000+ pin LGAs/BGAs.

Oh nevermind, 2.5D.

Yeah, with interposer stacking (2.5D) AMD can address all their notebook/desktop APUs. 3D stacking is for phones and other space/power constrained devices.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,809
1,289
136
raghu78 said:
Yeah, with interposer stacking (2.5D) AMD can address all their notebook/desktop APUs. 3D stacking is for phones and other space/power constrained devices.

http://sites.amd.com/la/Documents/TFE2011_001AMC.pdf

The 2.5D interposer starts at 23 mm by 23 mm and can go up to 45 mm by 45 mm. That seems very roomy.

////
Also, I find it funny that it shows Orochi instead of an APU in this: http://www.microarch.org/micro46/files/keynote1.pdf
Took me a while to see Bobcat on page 38.

Be really interesting if they replaced L3 with HBM.
 

theeedude

Lifer
Feb 5, 2006
35,787
6,197
126
I think all the companies need to agree on a common interposer interface and communications protocol. Once that happens it will allow for a lot of specialization, which will drive innovation. You will have companies that build the specific chips they are good at, without having to bear the burden of building the whole SoC. Kind of like the old days, when there were many companies building all sorts of chips that were integrated in a system. It could be the next golden age of chip design.
The big question is who is going to be the integrator. Is it going to be present-day CPU/SoC vendors like Intel and AMD, acting as gatekeepers? Or is it going to be third parties who pick the best CPU, best GPU, best RAM, best modem, etc. for their needs in open competition? I am actually hopeful it will be the latter, because that's where the money is: the likes of Apple, Google, MS, Sony, etc. They have the leverage to demand the best components for their mass-volume products, regardless of chip vendor. It may actually not be that good for AMD in the long run, because they put a lot of eggs in the basket of being the one SoC vendor that owns both an x86 CPU and a GPU and can put them together on one SoC. If the CPU and GPU are separate components sitting on an interposer controlled by a third party, that party can pick the best CPU and best GPU independently, and it's not a given they'll pick AMD for either.
 

NTMBK

Lifer
Nov 14, 2011
10,411
5,677
136
Fascinating find, thanks! Very interesting that AMD are talking about breaking their APU up into multiple dies on an interposer so that they can use a more focused process node for each component. Maybe we'll go back to separate CPU and GPU dies, but tightly integrated on an interposer?