Fudzilla: New AMD Zen APU boasts up to 16 cores (plus Greenland GPU with HBM)

Novacius · Apr 28, 2015

New information:

insider2015 said:
There is more

krumme · Apr 28, 2015

Tuna-Fish said:
IPC is hard because clock targets are high. The lower you set your clock targets, the more you can do per clock, without heroic effort. If you go and read that post again you can note the caveat of one-MHz clock target.

The point of my statement is that talking solely of IPC as if it's a primary measure of performance and capability of a CPU is exactly as wrong as doing the same for clock speed. The two can be traded against each other in design, almost without limit.

Generally ignore everything Fruehe said. He knew very little of anything he talked about. Looking at the cache, the decreased throughput for muls, etc., the expectation could not have been greater IPC.

IPC is a design target you (with some caveats) get to set. If you want 30% greater IPC, you can just decide to do that. If you want 100% greater IPC, you can just decide to do that.

The hard part is how fast, in clock speed, that high-ipc design is going to be. IPC increases come after heroic effort and in small increments because they want to get the gains without backsliding on the clocks.

AMD is not trying to eke out a few percent out of their old design. They are doing a completely new core. Will that core be as fast as Intel's best? I don't think so. Will it have X ipc? Regardless of X, the answer is "if they want to", or "if they think the best tradeoff between clocks and IPC they can get is there".

I perfectly follow that.
But what process is amd then designing for?

To me, the optimal approach seems to be designing for the process the rest of the ecosystem uses, e.g. very low frequency mobile designs. IMO, if AMD designs like in the old days, when they could tailor the process, they are damn wrong. They don't shape the process any more. They have to follow it and design for that.
Under those assumptions I would expect high-IPC but very low frequency HDL designs. Very much synthesized. Lower cost than Intel (and so lower perf), and fit for consoles, embedded and mobile x86 solutions.

AtenRa · Apr 28, 2015

krumme said:
I perfectly follow that.
But what process is amd then designing for?

To me, the optimal approach seems to be designing for the process the rest of the ecosystem uses, e.g. very low frequency mobile designs. IMO, if AMD designs like in the old days, when they could tailor the process, they are damn wrong. They don't shape the process any more. They have to follow it and design for that.
Under those assumptions I would expect high-IPC but very low frequency HDL designs. Very much synthesized. Lower cost than Intel (and so lower perf), and fit for consoles, embedded and mobile x86 solutions.

They are not going to use 14nm FF LPE; they will use 14nm FF LPP, which is suited for high performance devices. And I'm betting they are not going to use HDL for Desktop/Server ZEN.

Arachnotronic · Apr 28, 2015

AtenRa said:
They are not going to use 14nm FF LPE; they will use 14nm FF LPP, which is suited for high performance devices. And I'm betting they are not going to use HDL for Desktop/Server ZEN.

The "LP" in "LPP" is still "low power."

14LPP is just an enhanced 14LPE, just like 16FF+ is an enhanced 16FF.

AtenRa · Apr 28, 2015

Arachnotronic said:
The "LP" in "LPP" is still "low power."

14LPP is just an enhanced 14LPE, just like 16FF+ is an enhanced 16FF.

Low Power Performance, it has way different characteristics than LPE.

It is like the old Low Power vs High Performance. They just call it Low Power because of the FF design.

inf64 · Apr 28, 2015

Hmm, if the core itself is around the size of Broadwell core (~7mm^2) then one unit with L2+L3 cache might be around 40mm^2? 4 units then becomes a very viable choice for high end part, depending on the possibility that Zen could have iGPU. If there is iGPU then it will need to have HBM so those two would blow up the size of the die to something much larger than Kaveri or Carrizo.

Zen looks rather nice on paper; let's hope it performs well and is competitive when it comes to market (Intel is a constantly moving target).

MrTeal · Apr 28, 2015

sm625 said:
AMD can't win on CPU performance, and they shouldn't be trying. What they need to do is deliver a core that can process gaming-related instructions an order of magnitude faster than an Intel design. That is where the biggest bottleneck is in terms of gaming performance. It's all about connecting the CPU and GPU in ways that require serious out-of-the-box thinking. Major graphics breakthroughs have occurred regularly throughout history. The next one simply needs to come from the CPU. A texture compression module, a command processor, vertex processor, texture samplers, a transform core, rasterization core, tessellator, all of it needs to be connected to the CPU at a very low level. They all need to be components inside the GPU just as INT and FPU are today. Their future depends almost entirely on how far they have come in designing a true APU that can process all of those functions alongside x86 instructions, AND how far they have come in writing drivers that can run today's games on such a core to deliver an order of magnitude more FPS. They had almost 10 years to come up with something and they have so far failed to do so. What they need is basically something akin to a miracle, but that doesn't mean it is impossible. It's only impossible if they haven't at least been trying for the last almost 10 years. If indeed all they have done in the last 10 years is move the GPU closer to the CPU without actually integrating any of the functionality, then they should just declare bankruptcy today because they are done.

What kind of market share do you think AMD could wrest away from Intel with an absolutely awesome gaming chip, that they don't already control (IE, low cost gaming PCs and the consoles)?

Arachnotronic · Apr 28, 2015

AtenRa said:
Low Power Performance, it has way different characteristics than LPE.

It is like the old Low Power vs High Performance. They just call it Low Power because of the FF design.

There was 20nm LPE and 20nm LPP...neither of those was FinFET.

Novacius · Apr 28, 2015

inf64 said:
Hmm, if the core itself is around the size of Broadwell core (~7mm^2) then one unit with L2+L3 cache might be around 40mm^2?

512KB of cache is only ~1mm² on Samsung's 14nm. A core with caches could fit in 12mm².

ShintaiDK · Apr 28, 2015

I wonder if Zen will be called Phenom III

Snafuh · Apr 28, 2015

Novacius said:
512KB of cache is only ~1mm² on Samsung's 14nm. A core with caches could fit in 12mm².

One core is apparently <10mm²
Edit: Missed "with caches"

insider2015 said:
Excavator is 14.48 sqmm w/o cache on 28nm; Jaguar's L2 is ~2 sqmm.

Just because I said <10 sqmm for Zen doesn't mean it's exactly 10. It's smaller.

http://www.planet3dnow.de/vbulletin...95W-TDP-DDR4?p=5004103&viewfull=1#post5004103

Novacius · Apr 28, 2015

I know. And inf64 said that one core could only be 7mm². In that case core+L2+L3(for that core) could be 12mm².

Is nobody interested in that new slide?
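
The core-plus-cache estimate in this post can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the ~1mm² per 512KB density and the ~7mm² core are the figures quoted in the thread, while the per-core cache sizes (512KB of L2 and a 2MB slice of L3) are assumptions picked for illustration and are not stated in the posts. The function name is hypothetical.

```python
# Density quoted in the thread: roughly 1 mm^2 per 512 KB of cache on Samsung 14nm.
MM2_PER_512KB = 1.0


def core_with_cache_area(core_mm2: float, l2_kb: float, l3_slice_kb: float) -> float:
    """Estimate the area of one core plus its L2 and its share of the L3, in mm^2."""
    cache_mm2 = (l2_kb + l3_slice_kb) / 512.0 * MM2_PER_512KB
    return core_mm2 + cache_mm2


# Assumed cache sizes (hypothetical): 512 KB L2 and a 2 MB L3 slice per core.
estimate = core_with_cache_area(core_mm2=7.0, l2_kb=512, l3_slice_kb=2048)
print(f"core + L2 + L3 slice: {estimate:.1f} mm^2")
```

With those assumed sizes the result lands on the 12mm² figure; a larger L3 slice or extra glue logic would push it higher.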

sefsefsefsef · Apr 28, 2015

Arachnotronic said:
If AMD is truly "starting from scratch," then that only makes their job more difficult; it's far easier to take what is already great and to make it better than to start with nothing and produce greatness, IMO.

I would argue that playing catch up is far easier than blazing the initial trail. How do you think Apple was able to catch up so quickly?

Exophase · Apr 28, 2015

Nothingness said:
You're both correct and wrong at the same time. Yes achievable IPC is impacted by the clock. But the theoretical IPC isn't infinite and the amount of required hardware grows extremely fast. As an example, register file area grows quadratically with number of write ports (and you basically need one write port per instruction executed).

So getting high IPC is certainly difficult and increasing it by 100% isn't just a matter of deciding it.

If you intrinsically link IPC to ILP this is the case, but if you allow for some internal clock multiplier or asynchronous execution with low external clocks this changes things. Although at some point the memory interface would have to be a limiter, even if you allow a huge number of memory transfers in-flight over a very slow clock.
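
To put a number on the register file point quoted above, here is a minimal sketch of the quadratic scaling Nothingness describes. It assumes area scales with the square of the write port count and ignores everything else (entry count, read ports, wiring overhead); the function name is hypothetical.

```python
def relative_regfile_area(write_ports: int, baseline_ports: int) -> float:
    """Area relative to a baseline design, assuming area ~ (write ports)^2."""
    if write_ports <= 0 or baseline_ports <= 0:
        raise ValueError("port counts must be positive")
    return (write_ports / baseline_ports) ** 2


# One write port per instruction executed per clock: doubling the width
# from 4 to 8 quadruples the register file area under this model.
for ports in (4, 6, 8):
    print(ports, relative_regfile_area(ports, baseline_ports=4))
```

Under this model, 100% more IPC through width alone costs about four times the register file area, which is why "just deciding" to double IPC is not free.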

Arachnotronic · Apr 28, 2015

sefsefsefsef said:
I would argue that playing catch up is far easier than blazing the initial trail. How do you think Apple was able to catch up so quickly?

They bought PA Semi, threw lots and lots of money at it to hire top notch talent from the industry, and basically develop a single chip for a product that Apple itself builds.

Oh, and they bought PA Semi in 2008; Apple didn't debut a custom CPU core until 2012.

NTMBK · Apr 28, 2015

Novacius said:
New information:

Looks about right. Smaller, lower latency private L2s, and connected through a shared L3. Back to the Phenom architecture:

Which Intel also used for Nehalem:

sefsefsefsef · Apr 28, 2015

Arachnotronic said:
They bought PA Semi, threw lots and lots of money at it to hire top notch talent from the industry, and basically develop a single chip for a product that Apple itself builds.

Oh, and they bought PA Semi in 2008; Apple didn't debut a custom CPU core until 2012.

Do any of those points argue against catching up being easier than blazing the trail? I would reckon that AMD's design capability is still far greater than Apple's, and look what Apple was able to accomplish in a few short years, starting from relatively little.

AtenRa · Apr 28, 2015

Phenom/BD caches are exclusive; ZEN caches, according to the new slide above, are inclusive, like Intel's from Nehalem onward, if I'm not mistaken.

ZEN cache will be way faster than Phenom/BD designs.

Arachnotronic · Apr 28, 2015

sefsefsefsef said:
Do any of those points argue against catching up being easier than blazing the trail? I would reckon that AMD's design capability is still far greater than Apple's, and look what Apple was able to accomplish in a few short years, starting from relatively little.

Apple has been at the CPU design thing for like 7 years now -- hardly a "few short years." Also, have you not noticed that AMD has lost a lot of talent to the likes of Apple, Samsung, and Qualcomm?

Don't underestimate the importance of money to getting competitive projects done in a reasonable time frame.

AtenRa · Apr 28, 2015

Arachnotronic said:
Apple has been at the CPU design thing for like 7 years now -- hardly a "few short years." Also, have you not noticed that AMD has lost a lot of talent to the likes of Apple, Samsung, and Qualcomm?

Don't underestimate the importance of money to getting competitive projects done in a reasonable time frame.

Apple could only dream of having the CPU IP AMD has. Also, as was mentioned before, Keller explained they used the best things from the Bulldozer and Jaguar microarchitectures to create ZEN. Jaguar is a high IPC, low power design, while Bulldozer is a high clock, high throughput design.
Add a 14nm FF process and you can do miracles in 4 years. As Keller said, they already know how to make a high clock design; now they will have the process node to fully exploit the IP they have.

scannall · Apr 28, 2015

sefsefsefsef said:
Do any of those points argue against catching up being easier than blazing the trail? I would reckon that AMD's design capability is still far greater than Apple's, and look what Apple was able to accomplish in a few short years, starting from relatively little.

Rumor has it that Apple's CPU teams are roughly 3 times the size of AMD's.

Arachnotronic · Apr 28, 2015

AtenRa said:
Apple could only dream of having the CPU IP AMD has.

I don't think so:

Apple's CPU IP is helping the company achieve ungodly amounts of profit. Apple's operating profit last quarter was more than even Intel generated in operating profit last year! Apple's R&D budget this year will be higher than AMD's revenue in its best revenue year.

Apple spent >50% more than AMD's current market cap on a headphone company. Believe me, if Apple wants something, it doesn't sit there "dreaming" or "wishing" that it had it -- it would go and buy it.

Apple bought PA Semi and used that as a foundation to build one of the finest low power chip design houses on the planet. Apple also has deep software expertise (it controls iOS), which no doubt informs their CPU designs.

R0H1T · Apr 28, 2015

Arachnotronic said:
I don't think so:

Apple's CPU IP is helping the company achieve ungodly amounts of profit. Apple's operating profit last quarter was more than even Intel generated in operating profit last year! Apple's R&D budget this year will be higher than AMD's revenue in its best revenue year.

Apple spent >50% more than AMD's current market cap on a headphone company. Believe me, if Apple wants something, it doesn't sit there "dreaming" or "wishing" that it had it -- it would go and buy it.

Apple bought PA Semi and used that as a foundation to build one of the finest low power chip design houses on the planet. Apple also has deep software expertise (it controls iOS), which no doubt informs their CPU designs.

You do realize that Apple sells their brand & not the A8X; even if they replaced the Ax SoCs with something tier 2, people would still buy it in droves. They have a cult-like following, but in terms of tech leadership they've never been a pioneer, except arguably with the original iPhone, & have always followed the leading trends in the (market) areas where they operate.

As for your other points, you can't buy 1.x IPC gains or high clock even with the money Apple has. There is a physical limit to everything & Apple is slowly but surely reaching a saturation point where the (CPU) gains will come from more cores & high clock rates, they're not getting the same IPC gains they've enjoyed recently.

Also, if they're so great, why haven't they added a custom GPU core to their SoC, or why does Intel still need to add more EUs to prove they've gotten better with their IGPs? Granted, they've vastly improved, but most of the gains have come from dedicating more transistors & die area to the GPU compared to the previous gen.

raghu78 · Apr 28, 2015

AtenRa said:
Apple could only dream of having the CPU IP AMD has. Also, as was mentioned before, Keller explained they used the best things from the Bulldozer and Jaguar microarchitectures to create ZEN. Jaguar is a high IPC, low power design, while Bulldozer is a high clock, high throughput design.
Add a 14nm FF process and you can do miracles in 4 years. As Keller said, they already know how to make a high clock design; now they will have the process node to fully exploit the IP they have.

The products where I think Zen will have the most impact are servers and APUs. High Density Libraries bring a 30% power reduction in exchange for a max frequency tradeoff. In tablets, ultrathins, notebooks and high end servers this tradeoff would be reasonable. Servers are primarily designed for maximum throughput within a target TDP. In high core count server chips (16 cores / 32 threads) each CPU core is likely running at <= 2.5 GHz. Zen with HDL could be very impressive. Intel's process lead might be tough to overcome when competing for the highest single thread performance (Single thread Perf = Frequency x IPC). IMO a Zen APU with HBM will be a disruptive product. The APU will fulfill its true potential with HBM.
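
The "Single thread Perf = Frequency x IPC" relation in this post is easy to play with in code. The sketch below is illustrative only: the design points are made-up numbers, not measurements of any AMD or Intel part, and the function names are hypothetical.

```python
def single_thread_perf(freq_ghz: float, ipc: float) -> float:
    """Billions of instructions retired per second: frequency x IPC."""
    return freq_ghz * ipc


def ipc_needed(target_perf: float, freq_ghz: float) -> float:
    """IPC a design must reach to hit target_perf at a given clock."""
    return target_perf / freq_ghz


# Two hypothetical designs with the same single thread performance:
high_clock = single_thread_perf(freq_ghz=4.0, ipc=1.5)
high_ipc = single_thread_perf(freq_ghz=2.0, ipc=3.0)
print(high_clock, high_ipc)

# IPC a 2.5 GHz server core would need to match the 4.0 GHz design:
print(ipc_needed(high_clock, freq_ghz=2.5))
```

The same product can be reached from either direction, which is the tradeoff between clock targets and IPC argued over earlier in the thread.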

AtenRa · Apr 28, 2015

Let's take AMD as a CPU designer from the day they started their own designs, that is from the K5, introduced in 1996. That is nearly 20 years.
Not to mention the ATI IP they got back in 2006.

Now, how many years has Apple been designing its own CPUs? Unlike AMD, Apple hasn't produced a high frequency, high IPC, high throughput design so far. It is one thing to go for a high IPC, low power, low frequency mobile design and another thing to go for a high performance design for servers, desktops and laptops.

Combine the AMD CPU and GPU IP and Apple is nowhere near.
