[WCCF] AMD Carrizo APU on the 28nm Node Will Have Stacked DRAM On Package

Soulkeeper · Jul 17, 2014

Exophase said:
Interestingly, Llano had 1MB L2 caches per core (and 16-way set associative), and maintained a 15 cycle latency like its predecessors. But there are still some key differences. Being shared by two cores/interfacing two L1 dcaches probably increases latency, at the very least due to physical requirements. BD/PD were likely penalized further by their higher clock speed requirements. Maybe with XV AMD will give up even more clock headroom so they can tighten down latencies.

Incidentally, Llano didn't have a huge performance improvement over Athlon II, and this is including several minor improvements beyond the doubling of L2 cache. So I'm skeptical that 2MB/module is really much of a performance requirement for BD (1MB/module should be more flexible than 512KB/core anyway)

The performance change over Propus was somewhere around 8% on average, I believe (I remember one reviewer claiming 12%).
Compared to Thuban, however, the smaller amount of cache made it a wash in most cases.
All per-clock comparisons, of course. It's been a while since I've looked over the benchmarks.
Llano "Stars" cores with large Thuban caches would have been a potential 10%+ improvement for FX, but max clock speeds also seem to have suffered, which is likely one reason why they never released one. Guess we'll never know how much of that was due to the design changes and how much to the 45nm-to-32nm move.

Arachnotronic · Jul 17, 2014

So, Carrizo will integrate the FCH on die while BDW/SKL will still have external PCHs. Interesting leg up that AMD has here.

Abwx · Jul 17, 2014

Intel17 said:
So, Carrizo will integrate the FCH on die while BDW/SKL will still have external PCHs. Interesting leg up that AMD has here.

FCH is already integrated on Kaveri, but it's enabled only in mobile variants.

Since it has fewer ports and capabilities than the discrete one used on FM2+, it is disabled for DT variants.

monstercameron · Jul 17, 2014

Abwx said:
FCH is already integrated on Kaveri, but it's enabled only in mobile variants.

Since it has fewer ports and capabilities than the discrete one used on FM2+, it is disabled for DT variants.

Do you have a source for that? On-die FCH was scheduled for Carrizo...

NostaSeronx · Jul 17, 2014

Tier1 expectation is;
-> 20-nm LPM with High Performance Libraries @ <120 mm²
-> Higher IPC (10%)
-> Higher Clocks (20%)

Tier2 expectation is;
-> 28-nm SHP with High Density Libraries @ <200 mm²
-> Higher IPC (30%)
-> Same Clocks

Tier3 expectation is;
-> 28-nm SHP with High Performance Libraries @ <300 mm²
-> Higher IPC (20%)
-> Higher Clocks (10%)

inf64 · Jul 17, 2014

edit
Oops, wrong topic :/.

NTMBK · Jul 17, 2014

Intel17 said:
So, Carrizo will integrate the FCH on die while BDW/SKL will still have external PCHs. Interesting leg up that AMD has here.

Yeah, good to see AMD innovating with areas other than the core. Hopefully these kinds of improvements will make their way into K12/Zen. A little less board space, a little less energy consumed and a slightly lower BoM, sounds like a solid win for laptops.

NTMBK · Jul 17, 2014

A relevant slide from SKHynix:

A 1GB stack used as near memory. Sound familiar?

Shivansps · Jul 17, 2014

What's near memory? Some sort of memory cache?

NostaSeronx · Jul 17, 2014

Shivansps said:
What's near memory? Some sort of memory cache?

In-package memory is near memory and off-package memory is far memory.
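To make the near/far split concrete, here is a minimal sketch of a two-tier placement policy. It is only an illustration: the tier sizes, the `Tier` class and the first-fit `place` function are hypothetical, and a real part could just as well manage the near memory as a hardware cache instead of placing allocations explicitly.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One memory tier; capacities here are made-up example values."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def place(size_mb: int, tiers: list) -> str:
    """First-fit placement: try tiers in order (near first, then far)."""
    for tier in tiers:
        if tier.free_mb() >= size_mb:
            tier.used_mb += size_mb
            return tier.name
    raise MemoryError(f"no tier can hold {size_mb} MB")


# Assumed configuration: 1GB in-package stack plus 8GB of off-package DIMMs.
tiers = [Tier("near", 1024), Tier("far", 8192)]
for request_mb in (768, 512, 256):
    print(request_mb, "MB ->", place(request_mb, tiers))
```

With these example sizes, the 512 MB request spills to far memory because only 256 MB of near memory is left after the first allocation.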

SilverlightWPF · Jul 18, 2014

Sorry if the link's been posted before, but seeing as there's all this HBM talk, I thought this recent AMD research paper might be interesting to some.

TOP-PIM: Throughput-Oriented Programmable Processing in Memory
-Dong Ping Zhang, Nuwan Jayasena, Alex Lyashevsky, Joe Greathouse, Lifan Xu, Mike Ignatowski
-AMD Research 25/06/14
http://www.google.com.au/url?sa=t&r...ramuJBXz-NAwds_vg&sig2=71k8yJUUACaTlTvyiD8pxw

Fjodor2001 · Jul 18, 2014

Anybody got an idea what the price of stacked DRAM will be? Is it reasonable to pair it with e.g. a Carrizo APU? In what amounts?

NostaSeronx · Jul 18, 2014

Fjodor2001 said:
Anybody got an idea what the price of stacked DRAM will be? Is it reasonable to pair it with e.g. a Carrizo APU? In what amounts?

HBM from SK Hynix has a cost within the realm of this;
http://www.newegg.com/Product/Produc...82E16820148817

Crucial 1GB 204-Pin DDR3 SO-DIMM DDR3 1333 (PC3 10600) Laptop Memory Model CT12864BF1339
$25.99

The HBM is cheap, the Si Interposer is cheap, the OSAT structure isn't so cheap.

If AMD wants to increase, or even just maintain, average selling prices, it would need to use HBM.

Erenhardt · Jul 18, 2014

A 384-core GCN APU + HBM would mop the floor with the 512-GCN-core A10-7850K... There is potential for that kind of APU.

Remember the mighty Iris Pro reviews, where the conclusion was that the NV GT 640 is superior in comparison:

R7-250 has 384 GCN cores and plenty of memory bandwidth:

50% faster than gt640 - how is that for an APU?

But AMD will probably not sacrifice their own entry level dGPU sales.
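The bandwidth side of that argument can be put in rough numbers. This is a back-of-the-envelope sketch, not a benchmark: the function name is made up, and the two configurations (dual-channel DDR3-2133 for the APU, one first-generation HBM stack at 1024 bits and 1 Gbps per pin) are assumptions chosen for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * gigatransfers_per_s


# Assumed configurations, for illustration only.
configs = {
    "dual-channel DDR3-2133 (2 x 64-bit)": (128, 2.133),
    "one HBM stack (1024-bit @ 1 Gbps/pin)": (1024, 1.0),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(bits, rate):.1f} GB/s")
```

These are peak figures only; sustained bandwidth and how much of it the GPU can actually use will be lower.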

NostaSeronx · Jul 18, 2014

Erenhardt said:
But AMD will probably not sacrifice their own entry level dGPU sales.

The XDMA unit supports pairing GPUs of different sizes and with different memory interfaces, provided the GPU in question has an XDMA unit.

Carrizo(35W(QC)) + Amethyst XT(<100W) = FX-4100(95W) + Hawaii Pro(220-275W)

or

Carrizo(35w(QC)) + Topaz XT (<40W) = FX-4100(95W) + Bonaire XTX(110W)

or

etc.

The only entry level GPU that is new is Iceland. The next GPU is Tonga.

Iceland/Topaz -> Tonga/Amethyst -> Maui

Ajay · Jul 18, 2014

Erenhardt said:
50% faster than gt640 - how is that for an APU?

But AMD will probably not sacrifice their own entry level dGPU sales.

50% faster than a GT 640 would be a nice start for actually playable games on a laptop. AMD doesn't need to sacrifice their dGPU sales if HBM is only available on, typically, more expensive laptop CPUs for the first year or two. But desktop will have to move there once the cost of 4GB of HBM declines. By the time 8GB of HBM is available, why even have system RAM? Big savings on DRAM and motherboard BOM costs.

iGPU w/HBM really is the near term future for non-enthusiast class GPUs.

Homeles · Jul 18, 2014

Ajay said:
50% faster than a GT 640 would be a nice start for actually playable games on a laptop. AMD doesn't need to sacrifice their dGPU sales if HBM is only available on, typically, more expensive laptop CPUs for the first year or two. But desktop will have to move there once the cost of 4GB of HBM declines. By the time 8GB of HBM is available, why even have system RAM? Big savings on DRAM and motherboard BOM costs.

iGPU w/HBM really is the near term future for non-enthusiast class GPUs.

Well, you'd still need RAM for both the CPU and the GPU. Doing memory accesses over the PCIe bus would kill performance. This is one place where APUs will have a big advantage, as you'd only have to pay for high performance memory once.

TrulyUncouth · Jul 18, 2014

I normally write off everything you say as utter insanity, but this actually sounds pretty good. If they could pull something like that off, it could be a huge boost for them. The only big downside would still be the CPU performance.

While we are in fantasy land, I would wish for 6-8 cores of their next small-core design boosting up to 3GHz and an x-firable GPU, with power at Kaveri mobile levels and dual channel. If they can pull something like that off, I can easily see myself picking one up.

Even further into fantasy: why AMD hasn't talked Valve into making a Steam Machine with an architecture like the PS4's, I'll never know.

NostaSeronx said:
The XDMA unit supports pairing GPUs of different sizes and with different memory interfaces, provided the GPU in question has an XDMA unit.

Carrizo(35W(QC)) + Amethyst XT(<100W) = FX-4100(95W) + Hawaii Pro(220-275W)

or

Carrizo(35w(QC)) + Topaz XT (<40W) = FX-4100(95W) + Bonaire XTX(110W)

or

etc.

The only entry level GPU that is new is Iceland. The next GPU is Tonga.

Iceland/Topaz -> Tonga/Amethyst -> Maui

TrulyUncouth · Jul 18, 2014

Homeles said:
Well, you'd still need RAM for both the CPU and the GPU. Doing memory accesses over the PCIe bus would kill performance. This is one place where APUs will have a big advantage, as you'd only have to pay for high performance memory once.

Is there any reason the HBM can't just replace our standard RAM? Once they get up to 8GB, is there any reason we need to have RAM slots?

Imagine the reduced cost of a board that doesn't need all the traces to memory. If I am not mistaken, they could make the board with fewer layers, making it considerably cheaper.

Homeles · Jul 18, 2014

TrulyUncouth said:
Is there any reason the HBM can't just replace our standard RAM? Once they get up to 8GB, is there any reason we need to have RAM slots?

Imagine the reduced cost of a board that doesn't need all the traces to memory. If I am not mistaken, they could make the board with fewer layers, making it considerably cheaper.

Nope, it definitely could replace it.

NostaSeronx · Jul 18, 2014

HBM will not replace system RAM, or RAM in DIMMs. These will stay and go for increased capacities, with only a slow progression in speed.

In a single DIMM;
128 GB DDR4 @ 2.4 GHz (Q2 2015) -> 128 GB DDR4 @ 2.933 GHz (Q4 2015) -> 256 GB DDRx -> 512 GB DDRx -> so on.

TrulyUncouth said:
The only big downside would still be the CPU performance.

I'm currently trying to find data indicating that the iGPU can hand draw calls to the dGPU through HSAIL signaling. While it might be limited to just Mantle or Mantle2, it would be a significant boost for AMD.

pTmdfx · Jul 18, 2014

NostaSeronx said:
I'm currently trying to find data indicating that the iGPU can hand draw calls to the dGPU through HSAIL signaling. While it might be limited to just Mantle or Mantle2, it would be a significant boost for AMD.

This is out of the scope of HSA, but of course you can always implement your own software renderer on any kind of platform. HSA might do it better than a software CPU renderer, as it exposes some texturing features, like OCL does. In Mantle, this can be implemented by simply using an Indirect Draw Call, as the API exposes multi-GPU explicitly, together with the video memories of the GPUs.

pTmdfx · Jul 18, 2014

TrulyUncouth said:
Is there any reason the HBM can't just replace our standard RAM? Once they get up to 8GB, is there any reason we need to have RAM slots?

Imagine the reduced cost of a board that doesn't need all the traces to memory. If I am not mistaken, they could make the board with fewer layers, making it considerably cheaper.

Hopefully yes, but it is a similar case to GDDR5/GDDR5M versus DDR3 in APUs, where both GDDR5 variants definitely have the capacity needed today. Let's say a consumer APU can ship with just 1-2GB of HBM and shows nearly no difference in its class of visual experience compared with using 8GB of HBM: why bother with the probably far more expensive 8GB of HBM, except in the higher-margin SKUs? Not to mention that the higher-capacity HBM from SK Hynix also scales up the bandwidth, and I think it is safe to assume that higher bandwidth means charging more. For a higher-than-high-end APU and generic GPUs, this would be nice, as you need only one memory pool instead of two. But it is not so nice for the lower-end APUs that focus more on cost structure, as the excess bandwidth and excess cost for a just so-so capacity may not be worth the price. Extensibility from the OEM/customer standpoint is an interesting question too.

Moreover, there are some performance benefits to having both in parallel that I can think of. For example, bringing back the lower latency of system memory, since it no longer needs to be flooded by massively data-parallel graphics traffic, and dedicating the full high-bandwidth access of the HBM to the GPU without having to care about latency-critical requests from the CPU, perhaps?

NostaSeronx · Jul 19, 2014

I've checked previous launches of AMD products. Since "Carrizo" is a mobile leak, it has a chance of launching early for desktops, simply because of the date on which the image was leaked.

My catch of the Kaveri 2.0 slide in the Russian PDF was in October.
http://www.xbitlabs.com/images/news/2013-11/amd_kaveri_unofficial_specs.jpg
(It doesn't have the date on which it was presented because I edited it out. To find out who would simply copy and paste my search.)
www.ospcon.ru/files/media/Perminov.pdf

If we assume the same gap between this leak and Carrizo's desktop launch:
http://i.imgur.com/j4uv5oT.jpg

October(Mobile leak) -> November -> December -> January(Desktop launch) -> 6 months(mobile launch)
July(Mobile leak) -> August -> September -> October(Desktop launch) -> 6 months(mobile launch)
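The same extrapolation as a small sketch. The helper names are made up, and it reads "6 months" as six months after the desktop launch, which is my interpretation of the timeline above rather than anything confirmed.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, normalised to the first of the month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def project_launches(leak: date, leak_to_desktop: int = 3, desktop_to_mobile: int = 6):
    """Apply the Kaveri-style gaps (assumed) to a mobile leak date."""
    desktop = add_months(leak, leak_to_desktop)
    mobile = add_months(desktop, desktop_to_mobile)
    return desktop, mobile


for label, leak in (("Kaveri", date(2013, 10, 1)), ("Carrizo", date(2014, 7, 1))):
    desktop, mobile = project_launches(leak)
    print(f"{label}: desktop {desktop:%B %Y}, mobile {mobile:%B %Y}")
```

This only restates the pattern; it says nothing about whether Carrizo will actually follow the Kaveri schedule.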

===
The issue I am having currently is the conflict of Carrizo being GF28A and Excavator being GF20LPM.
http://archive.today/DvJlR

Lead designer for DFT logic on 28 nm (GF28A) Carrizo SoC. RTL owner for the DFT and common (non-lane i.e. not involving Rx-Tx pair) logic which included CSR interface, PLLs and PLL management and some other glue logic. Enabled a synthesis and preliminary timing analysis flow for RTL team's use.

http://www.linkedin.com/in/kvnagesh

Manage India Test Plan and Infrastructure team of Steamroller(28nm) and Excavator (20nm) x86 CPU core processor.

If we look at Mr. Nagesh Vishnumurthy's teammates in the Bengaluru area, their LinkedIn resumes show a move from GF28A to GF20LPM/AN, i.e. GlobalFoundries 28nm to GlobalFoundries 20nm.

Also, the earliest mention of 20-nm is in 2011.

Thinking to myself the progression appeared as;
Kaveri - 28-nm SHP
to
Carrizo - 28-nm SLP

Sometime in late 2011 or early 2012, AMD cancelled GF28A Carrizo. Kaveri was then delayed to a later point, where the aggressive GF20AN/GF20LPM Carrizo launch would be more suitable.

This led to the changed focus of;
Kaveri - 28-nm SHP
to
Carrizo - 20-nm LPM

Unfortunate events happened to Kaveri, causing a minor delay to 2014, while Carrizo was not delayed, for reasons unknown.
(If you look back to early 2013, Kaveri was expected to launch in late Q3 2013 or early Q4 2013.)

Parallel universe where this didn't happen.
Kaveri -> Early 2013. (Q1 2013)
Carrizo -> Very late 2013. (Q4 2013)

===
Checking even deeper, aka searching SemiAccurate's Charlie's comments letter by letter.

The original Kaveri was going to have a new uncore;
http://semiaccurate.com/2012/11/06/amds-kaveri-apu-slips-again-2014-now/

The new uncore happens to be a ring-based interconnect using the HyperTransport 4.0 specification. (Exclusive to AMD for now; the HT4.0 spec, not the uncore, will be released to the HyperTransport Consortium.)
http://forums.anandtech.com/showpost.php?p=36542593&postcount=52

2010 - 28SHP and GF28A optimizations started.
2011 - 20LPM optimizations started.
2013 - 14XM/16FF optimizations started but I think these will be killed off. For the more perf from LPP or FF+, AMD might even wait for SOI FinFETs.

Those who have searched have probably seen these;

Screen-Shot-2014-04-29-at-1.08.08-AM.jpg

My stretch;
Integrated Voltage Regulator = Carrizo / Per Part Adaptive Voltage and DDR4 (and Southbridge for FMx) = FM2C/FM2r3/FM3, FP4r2, SP2r2 / Inter-frame Power Gating = Freesync Drivers

- Bearsharktopus -
http://i.imgur.com/21eZTlT.jpg

NTMBK · Jul 19, 2014

On die fabric and off die Hypertransport are very different things... And HT is point to point.
