
[WCCF] AMD Carrizo APU on the 28nm Node Will Have Stacked DRAM On Package

Page 6 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
Saving space for better/wider cores? Besides, Haswell and co. (since Nehalem, actually) have just 256KB of L2, and that's good enough... just reduce the latency and everything's OK.

Yeah, but Intel 'Core' tech has L3$ (and LLC on some SKUs). Make cores wider and cut cache - seems like an odd choice.
 
IF the cache reduction lets them cut latency by a significant amount (like say, 4-5 cycles?) it'll almost definitely be worth it. That's a big if, since the configuration of Bulldozer with half cache only reduced L2 latency by one cycle. But it's possible they could realize greater gains with a proper redesign.
 
L3 being the key.

Actually, cutting it completely wasn't so bad for the APUs, but it makes one think when you see a Piledriver die shot and imagine it with 50-100% more cores... Was it really necessary for server applications? I guess today (back then it was a dream) a nice solution would be HBM as an L3 equivalent to reduce main memory accesses.
 

It didn't help the APUs. And now the L2 is cut in half, while the memory speed is the same.

And there is no indication of HBM anywhere for APUs.
 
I just hadn't heard anything about hardware changes... all the coverage was on the software side, which honestly is by far the most important bit.
That's because MS hasn't disclosed anything about the new hardware features yet, apart from a sneak peek at ordered UAVs and conservative rasterization.
 
IF the cache reduction lets them cut latency by a significant amount (like say, 4-5 cycles?) it'll almost definitely be worth it. That's a big if, since the configuration of Bulldozer with half cache only reduced L2 latency by one cycle. But it's possible they could realize greater gains with a proper redesign.

Which CPU was that?
 
Checking the parts of the Carrizo slide related to the GPU rather than the core:
VCE1/2 = 3x 1080p30
VCE3 = 9x 1080p30

UVD4 = 4x-8x 1080p30
-skip to 6-
UVD6 = 9x-18x 1080p30
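
To put the multipliers above in absolute terms, here is a minimal sketch that converts "N x 1080p30" into pixels per second. It assumes 1080p means 1920x1080 and that the slide's figures are simple stream counts; the function and dictionary names are made up for illustration and do not come from the slide.

```python
# Convert the "N x 1080p30" figures quoted from the slide into megapixels/s.
# Assumption: one 1080p30 stream = 1920 x 1080 pixels at 30 frames per second.
PIXELS_PER_FRAME = 1920 * 1080
FRAMES_PER_SECOND = 30


def streams_to_mpixels_per_s(streams: int) -> float:
    """Pixel rate, in megapixels per second, of `streams` 1080p30 streams."""
    return streams * PIXELS_PER_FRAME * FRAMES_PER_SECOND / 1e6


# (low, high) stream counts as listed in the post; names are illustrative.
slide_figures = {
    "VCE1/2": (3, 3),
    "VCE3": (9, 9),
    "UVD4": (4, 8),
    "UVD6": (9, 18),
}

for block, (low, high) in slide_figures.items():
    low_rate = streams_to_mpixels_per_s(low)
    high_rate = streams_to_mpixels_per_s(high)
    if low == high:
        print(f"{block}: {low_rate:.1f} MP/s")
    else:
        print(f"{block}: {low_rate:.1f} to {high_rate:.1f} MP/s")
```

Taken at face value, the slide claims a 3x jump in encode throughput from VCE1/2 to VCE3, and a bit more than 2x in decode from UVD4 to UVD6.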

ACP1; HiFi EP Audio DSP
ACP2?; HiFi 3 Audio DSP

There are already some errors as well. The FP4 socket is mostly used for the 16h family now. Anything in that socket would be single channel.

Project Discovery (FT3b);
http://browser.primatelabs.com/geekbench3/574913
http://browser.primatelabs.com/geekbench3/574933

Project Gardenia (FP4);
http://browser.primatelabs.com/geekbench3/626430
http://browser.primatelabs.com/geekbench3/626420

There is also the issue of the Carrizo slide using the old slide style, even though they have since switched to a newer one.
 

yep seems fake or old...
 
Yea thx,

Well, that could be for a fused-off part, not a real half-L2 module.
Undoubtedly this is the case. Steamroller's L2 latency is down to 19 clocks now (Piledriver, 20; Bulldozer, 21). Phenom II's was as low as 14 on Thuban... so it should be somewhere in between for 1MB, and probably closer to the Phenom side of things.
 
j4uv5oT.jpg


Here is the image for those not wanting to click the VR-Zone link.
 

Interestingly, Llano had 1MB L2 caches per core (and 16-way set associative), and maintained a 15 cycle latency like its predecessors. But there are still some key differences. Being shared by two cores/interfacing two L1 dcaches probably increases latency, at the very least due to physical requirements. BD/PD were likely penalized further by their higher clock speed requirements. Maybe with XV AMD will give up even more clock headroom so they can tighten down latencies.

Incidentally, Llano didn't have a huge performance improvement over Athlon II, and this is including several minor improvements beyond the doubling of L2 cache. So I'm skeptical that 2MB/module is really much of a performance requirement for BD (1MB/module should be more flexible than 512KB/core anyway)
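
Since the point about clock speed requirements is really about absolute latency, here is a rough sketch converting the L2 cycle counts quoted above into nanoseconds. The cycle counts are the ones given in this thread; the two clock speeds are purely hypothetical round numbers chosen for illustration, not the shipping clocks of any of these parts.

```python
# L2 load-to-use latencies in core clocks, as quoted in the posts above.
l2_latency_cycles = {
    "Bulldozer": 21,
    "Piledriver": 20,
    "Steamroller": 19,
    "Llano": 15,
    "Phenom II (Thuban)": 14,
}

# Hypothetical clocks for illustration only; not real product frequencies.
HYPOTHETICAL_CLOCKS_GHZ = (3.0, 4.0)


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Latency in nanoseconds for `cycles` core clocks at `clock_ghz` GHz."""
    return cycles / clock_ghz


for core, cycles in l2_latency_cycles.items():
    times = ", ".join(
        f"{cycles_to_ns(cycles, ghz):.2f} ns @ {ghz:.1f} GHz"
        for ghz in HYPOTHETICAL_CLOCKS_GHZ
    )
    print(f"{core:>20}: {cycles} cycles -> {times}")
```

With these made-up clocks, a 19-cycle L2 at 4.0 GHz (4.75 ns) lands between a 14-cycle L2 (about 4.67 ns) and a 15-cycle L2 (5.00 ns) at 3.0 GHz, which is why trading clock headroom for fewer cycles is not automatically a win or a loss in wall-clock terms.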
 
It just seems weird to me that Excavator is essentially Puma+. 4 cores, 2MB L2 total cache, 3rd Gen GCN, VCE3, UVD6, Integrated FCH, PSP. Just happens to be on the same platform as the revised Puma(Mullins) platform for Tablets/Convertibles.

There is no Kaveri APU at the 15 watts designation, either. So, it is clearly pointing at the Beema SKUs not the Kaveri SKUs which are only 19 watts and up.

30% more performance than Beema at the same TDP is not bad. It does seem to fall in line with the Trinity to Richland setup, happening to be just a respin of the Kaveri design on 28-nm.
 

It's still a Bulldozer derived CMT architecture.
 
MLRNjn7.png


^ Essentially, this is what the die would look like after the respin with the changes, ± iFCH. I was also considering cutting out half the GPU, since it is 16 compute units. I'm pretty sure you guys can cut that out for yourselves.

It will be really weird if Kaveri gets a second stepping and becomes an FX APU.
 
The Excavator core is not an SR respin and Carrizo will not be a Kaveri variant. It is unlikely to be noticeably faster (via clock, IPC or a combination of the two), but calling it a Kaveri respin is just wrong.
 
I'm just thinking how aggressive AMD can be with this split.

40h-4Fh(FX CPU) : 16 Excavator Cores, No GPU, 8MB L2, 8 MB L3, 256-bit DDR3/DDR4. >65W
50h-5Fh(FX APU) : 6 Excavator Cores, 16 3rd gen GCN CUs, 3 MB L2, 6 MB L2+L3, 256-bit DDR3/DDR4. >45W
60h-6Fh(A(x) APU) : 4 Excavator Cores, 8 3rd gen GCN CUs, 2 MB L2, 128-bit DDR3/DDR4. <35W
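
As a quick sanity check on the speculative split above, this sketch works out the L2 per core and per module for each line. The core counts and L2 totals are taken from the list; two cores per module follows the CMT layout discussed earlier in the thread, and the variable names are just for illustration.

```python
# (cores, total L2 in MB) for each line of the speculated split above.
speculated_parts = {
    "40h-4Fh (FX CPU)": (16, 8),
    "50h-5Fh (FX APU)": (6, 3),
    "60h-6Fh (A(x) APU)": (4, 2),
}

# CMT modules pair two cores behind one shared L2, as discussed in the thread.
CORES_PER_MODULE = 2


def l2_per_core_kb(cores: int, l2_total_mb: float) -> float:
    """Total L2 divided evenly across cores, in KB."""
    return l2_total_mb * 1024 / cores


for name, (cores, l2_mb) in speculated_parts.items():
    per_core = l2_per_core_kb(cores, l2_mb)
    per_module = per_core * CORES_PER_MODULE
    print(f"{name}: {per_core:.0f} KB/core, {per_module:.0f} KB/module")
```

All three lines come out to 512 KB per core, i.e. 1 MB per module, so the guess is at least internally consistent with the halved-L2 configuration being discussed.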

This implies that these are all launching very late this year, with 20-nm node drop-downs planned for later next year.

http://www.linkedin.com/pub/ramya-gandamaneni/16/485/aa8

Fast macro porting designs like GCN and 16h can jump to the 20-nm node first. Then the designs with large macro counts and slow porting speed, like 15h, port later. A small L2 and L3 make better use of the space on 28-nm, and the large L2 and large L3 come back with the shrink.

This would technically also allow AMD to do what Intel did with Sandy Bridge/-E and Ivy Bridge/-E, while doing it with PCIe 3.0 (/HyperTransport 8 Gb/s) and PCIe 4.0 (/HyperTransport 16 Gb/s).
 

or maybe it is more like a Haswell-Y...
 
The second section of the slide says:

"Full HSA:Hi Perf Bus for GFX & DRAM, Fine-grain Preemption for Context Switches"

Doesn't that mean DRAM can be added to the package? No DRAM on mobile, a DRAM module added on desktop?

Does fine-grained preemption for context switches mean a dGPU is additive to the iGPU, depending on load?

If both are true, desktop Carrizo would be a badass chip, no?
 
Can someone find if Puma has this?
http://i.imgur.com/OAbL0kn.png

I've looked around and this feature is only planned for 20-nm devices. This comes from the Excavator patch though.

Also, it should be noted that these slides never actually said which product is on which node:
http://hardware.tecnogaming.com/images/novedades/2011/trinity-filtrado.jpg
http://www.ixbt.com/short/images/2011/Aug/amd2012deccan1_dh_fx57.jpg
http://cdn.pcper.com/files/imagecache/article_max_width/review/2013-05-21/temash05_0.jpg
http://cdn2.wccftech.com/wp-content/uploads/2013/11/AMD-Kaveri-APU-Platform-Details.png

It could be that Carrizo is cancelled and somehow the old documents were leaked. In turn, we could be getting Basilisk two quarters earlier than expected, like what AMD did with "Wichita"/"Krishna" and "Kabini"/"Temash".

===
Also, as for the 28nm bulk standard process: that was done with Kaveri;
http://i.imgur.com/yYwMlw6.png

===
http://semiaccurate.com/2011/04/11/guess-what-taped-out-at-amd/
http://www.donanimhaber.com/islemci...-28nm-islemcileri-hakkinda-resmi-detaylar.htm
http://semiaccurate.com/2011/11/15/exclusive-amd-kills-wichita-and-krishna/

---
So, what is the possibility of AMD doing a Kaveri refresh, calling it Carrizo, and then skipping nodes from 28-nm SHP to 14-nm SOI?

===
http://www.linkedin.com/pub/chakradhar-tallury/23/103/aa3
http://www.linkedin.com/pub/lloyd-bachand/20/7b4/88
http://www.linkedin.com/pub/hoang-dao/10/43/22a
http://www.linkedin.com/in/lokealvin
http://www.linkedin.com/pub/rich-schultz/73/459/29b
http://www.linkedin.com/pub/wade-xiong/15/a96/b00
/notAMD http://www.linkedin.com/pub/byron-scott/5/45a/20b /notAMD
/notAMD http://www.linkedin.com/pub/amit-banik/6/a04/315 /notAMD
/notAMD http://www.linkedin.com/pub/qi-zhang/19/379/310 /notAMD
/notAMD http://de.linkedin.com/pub/ahmed-said-abdou/67/8b2/b2b /notAMD
 