Info: TOP 10 CPU Cores in PPA - Performance per Area

Status
Not open for further replies.

Richie Rich

Senior member
Jul 28, 2019
Server-grade table: PPA score per GHz (iso-clock).
High-core-count server systems operate around 3 GHz due to TDP limitations.
Everything is on TSMC 7nm (the A13 is on N7P, though; the A78 and X1 are projections based on ARM's official data).
SPEC data is based on AnandTech's article: https://www.anandtech.com/show/15813/arm-cortex-a78-cortex-x1-cpu-ip-diverging/4




| Pos | Vendor | CPU | Core | Core area (mm²) | Year | ISA | SPEC PPA/GHz | Relative |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 9.41 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 8.36 | 88.8% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 7.82 | 83.1% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 7.24 | 76.9% |
| 5 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 4.44 | 47.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 4.40 | 46.7% |
| 7 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 3.02 | 32.1% |





Desktop/workstation-grade table: PPA score at maximum clock.

| Pos | Vendor | CPU | Core | Core area (mm²) | Year | ISA | SPEC PPA | Relative |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 28.24 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 23.73 | 84.0% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 22.21 | 78.7% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 21.73 | 76.9% |
| 5 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 13.89 | 49.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 11.66 | 41.3% |
| 7 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 11.25 | 39.8% |



 

Richie Rich

Senior member
Jul 28, 2019
The 64-core EPYC Milan, based on Zen 3, is coming this year. Amazon has the 64-core monolithic Graviton2, based on the fairly old A76/N1, and this G2 is beating Zen 2 Rome in performance per thread. The 80-core and 128-core Ampere Altra, also based on the A76, is coming soon. That's not good for x86.

Imagine a Graviton3 based on the A78/N2 and manufactured on TSMC 5nm. It will have at least 128 cores. And if Ampere can manufacture 128 cores on 7nm, a 256-core monolith is also possible on 5nm. It's like A Nightmare on Elm Street for x86, and ARM is the Freddy Krueger here :)
 

Thala

Golden Member
Nov 12, 2014
So these are area efficiency values? Just asking because PPA typically means Power-Performance-Area, but this is just Performance-Per-Area?
 

Richie Rich

Senior member
Jul 28, 2019
So these are area efficiency values? Just asking because PPA typically means Power-Performance-Area, but this is just Performance-Per-Area?
Mathematically it's as simple as that: SPECint2006 score divided by area.

When I checked ARM's PPA definition, it looks like a pretty complex tool for multi-criteria optimization used during core development.

My performance-per-area table is just a simple comparison of cores, probably useful for server CPU prediction. I was impressed by the new Cortex-X1 and its 30% IPC uplift over the A77. But the A78 looks like a real server killer. When @soresu mentioned that G3 will contain the A78, I wondered why not the X1. But from a many-core design and area-savings standpoint, the A78 looks like the ultimate choice.
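As a quick sketch, the metric behind both tables is just score over area, optionally divided by clock. The 3950X inputs below (a SPECint2006 score of about 50.0 at a 4.6 GHz boost clock) are back-derived from the table entries, not measured figures:

```python
# PPA and iso-clock PPA as used in the tables above:
#   PPA     = SPECint2006 score / core area   (pts/mm^2)
#   PPA/GHz = PPA / clock                     (pts/mm^2/GHz)
# The 3950X inputs (score ~50.0 at a 4.6 GHz boost, 3.60 mm^2 per
# Zen 2 core) are assumptions back-derived from the table entries.

def ppa(score: float, area_mm2: float) -> float:
    """Performance per area in SPEC pts/mm^2."""
    return score / area_mm2

def ppa_per_ghz(score: float, area_mm2: float, clock_ghz: float) -> float:
    """Iso-clock variant in SPEC pts/mm^2/GHz."""
    return ppa(score, area_mm2) / clock_ghz

print(round(ppa(50.0, 3.60), 2))               # 13.89, the table 2 entry
print(round(ppa_per_ghz(50.0, 3.60, 4.6), 2))  # 3.02, the table 1 entry
```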
 

Richie Rich

Senior member
Jul 28, 2019
The 3950X isn't a server CPU if you're making a server-grade table. If you want a Rome server CPU, AT has the 7742 benchmarked in SPECint2006.
It scores 39.2459 in SPECint2006, so 11.5429/GHz, or 3.21 for your iso-clock metric.
Good job, it makes sense. I'm going to update the table with your numbers.

Those 3.21 pts/mm²/GHz don't change the tragic last place, though.
Those 11.54 pts/GHz work out to 10.90 pts/mm², which is even worse: EPYC falls behind the Apple A13.
In a PPC/IPC table there would be a small benefit after all.
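The 7742 figures quoted above can be recomputed in a few lines. Note the clock speed is not stated anywhere in the thread; it is inferred from the quoted 39.2459 / 11.5429 ratio, so treat it as an assumption:

```python
# Recomputing the quoted EPYC 7742 figures. The ~3.4 GHz clock is
# inferred from 39.2459 / 11.5429, and the 3.60 mm^2 Zen 2 core area
# comes from the tables in the first post.
score = 39.2459              # SPECint2006, per the quoted post
area_mm2 = 3.60              # Zen 2 core area
clock_ghz = score / 11.5429  # ~3.40 GHz, inferred

iso_ppa = score / clock_ghz / area_mm2  # pts/mm^2/GHz
plain_ppa = score / area_mm2            # pts/mm^2
print(round(iso_ppa, 2), round(plain_ppa, 2))  # 3.21 10.9
```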
 

Topweasel

Diamond Member
Oct 19, 2000
I don't know, the whole table seems screwed up and useless if you're tracking everything to clock speed instead of TDP. But honestly, at this point it's probably closer to perf per mm², and that and only that: it's performance density that the server solutions are trying to hit. You can't use one and ignore the value of the other.
 

Richie Rich

Senior member
Jul 28, 2019
It's performance density that the server solutions are trying to hit. You can't use one and ignore the value of the other.
I agree with you that performance density is the key factor for server solutions. But how do you calculate performance density?

I think performance density is equal to performance per area; however, feel free to provide your formula. I can recalculate and create a chart. It would be interesting to compare.
 

zir_blazer

Golden Member
Jun 6, 2013
I'm almost positive that at some point, back when everyone was stuck at 90nm, AnandTech did an article comparing core die sizes across architectures, and it included the AMD K8, Intel P4 and Itanium. I think it was between the 1 MB L2 cache 90nm A64 San Diego (or whatever the Opteron counterpart was named), Prescott, and Montecito, as all of those were 90nm. I can't find it, but the point is that after removing the big L3 cache, Itanium was on paper stupidly more performant per core die size than anything x86 could do, but in practice that wasn't the case. And the cache was pretty much mandatory, so even if the core itself wasn't big, you still needed a lot of die area for the core to perform properly, and at that point x86 was actually kicking Itanium's butt.

Basically, these entirely synthetic analyses are as good as nothing, because there is no correlation between them and real code found in the wild, plus there are all the complementary auxiliary die functions that take a lot of die space but that you need anyway. Thus, you end up comparing whole processor vs. whole processor. There is a reason why you don't see ARM completely pwning x86 in Linux benchmarks like those done by Phoronix or ServeTheHome, and that's because in the real world these paper advantages simply don't materialize.
 

Richie Rich

Senior member
Jul 28, 2019
I'm almost positive that at some point, back when everyone was stuck at 90nm, AnandTech did an article comparing core die sizes across architectures, and it included the AMD K8, Intel P4 and Itanium. I think it was between the 1 MB L2 cache 90nm A64 San Diego (or whatever the Opteron counterpart was named), Prescott, and Montecito, as all of those were 90nm. I can't find it, but the point is that after removing the big L3 cache, Itanium was on paper stupidly more performant per core die size than anything x86 could do, but in practice that wasn't the case. And the cache was pretty much mandatory, so even if the core itself wasn't big, you still needed a lot of die area for the core to perform properly, and at that point x86 was actually kicking Itanium's butt.

Basically, these entirely synthetic analyses are as good as nothing, because there is no correlation between them and real code found in the wild, plus there are all the complementary auxiliary die functions that take a lot of die space but that you need anyway. Thus, you end up comparing whole processor vs. whole processor. There is a reason why you don't see ARM completely pwning x86 in Linux benchmarks like those done by Phoronix or ServeTheHome, and that's because in the real world these paper advantages simply don't materialize.
There are two types of people:
  1. the ones who look for ways to do things
  2. the ones who look for excuses to do nothing

BTW: Graviton2 is the only monolithic 64-core server CPU, thanks to the superb performance per area of the A76. That's the reason why AMD uses only 8-core chiplets. And yet the ARM Graviton2 has higher performance per thread than Zen 2 EPYC.

And do you know what the best thing is? Money. A 64-core EPYC costs 7,500 USD, while a 64-core ARM Graviton2 costs about 500 USD. In the x86 monopoly world you are forced to buy an overpriced Intel or AMD. In the ARM world you can manufacture your own CPU at a fraction of the cost. x86 is dead for economic reasons as well as technical ones.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
There are two types of people:
  1. the ones who look for ways to do things
  2. the ones who look for excuses to do nothing

BTW: Graviton2 is the only monolithic 64-core server CPU, thanks to the superb performance per area of the A76. That's the reason why AMD uses only 8-core chiplets. And yet the ARM Graviton2 has higher performance per thread than Zen 2 EPYC.

And do you know what the best thing is? Money. A 64-core EPYC costs 7,500 USD, while a 64-core ARM Graviton2 costs about 500 USD. In the x86 monopoly world you are forced to buy an overpriced Intel or AMD. In the ARM world you can manufacture your own CPU at a fraction of the cost. x86 is dead for economic reasons as well as technical ones.
Where did you come up with $500 for the Graviton? If that's manufacturing cost, then EPYC's is probably less than that. I bought a 64-core EPYC for $1,000 (but it was an ES); retail versions are going for $3,500-4,500 on eBay. The $7,500 is the retail MSRP; companies do not pay anything close to that.
 

Richie Rich

Senior member
Jul 28, 2019
Where did you come up with $500 for the Graviton? If that's manufacturing cost, then EPYC's is probably less than that. I bought a 64-core EPYC for $1,000 (but it was an ES); retail versions are going for $3,500-4,500 on eBay. The $7,500 is the retail MSRP; companies do not pay anything close to that.
EPYC is 1005 mm² of total area; Graviton2 is an estimated 350 mm². EPYC cannot be cheaper than Graviton2.

500 USD is my rough cost estimate, including silicon and Annapurna Labs' development costs. If AWS runs high volume, the price could be significantly lower than that. Welcome to the ARM world, where everybody can buy an ARM license :D
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
EPYC is 1005 mm² of total area; Graviton2 is an estimated 350 mm². EPYC cannot be cheaper than Graviton2.

500 USD is my rough cost estimate, including silicon and Annapurna Labs' development costs. If AWS runs high volume, the price could be significantly lower than that. Welcome to the ARM world, where everybody can buy an ARM license :D
In other words, it's a wild guess. Stop making statements that sound like facts when they are actually wild guesses.

And a monolithic die costs a lot more than chiplets. THAT IS A FACT.

So it is very possible that EPYC is cheaper than Graviton solely on manufacturing cost.
 

Vixis Rei

Junior Member
Jul 4, 2020
EPYC is 1005 mm² of total area; Graviton2 is an estimated 350 mm². EPYC cannot be cheaper than Graviton2.

500 USD is my rough cost estimate, including silicon and Annapurna Labs' development costs. If AWS runs high volume, the price could be significantly lower than that. Welcome to the ARM world, where everybody can buy an ARM license :D

Oh.
Have you ever used a die-per-wafer calculator?
You have to consider the size of the die, the wafer cost, the margin the company wants, and the price the market can actually bear relative to the competition before stating final prices.

No one is giving out CPUs for free if they can help it.

As Markfw has said, EPYC uses chiplets and a cheap I/O die, letting AMD get the vast majority of use out of a single wafer, allowing them to be much more nimble on pricing if need be. A defective chiplet can easily be sold as a lesser product.
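A minimal die-per-wafer and yield sketch along these lines. The wafer price, defect density, and die sizes are illustrative assumptions, not real TSMC figures; the point is only how yield loss scales with die area:

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Classic gross-die-per-wafer estimate with an edge-loss term."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

def good_dies(die_area_mm2: float, defects_per_cm2: float = 0.1) -> float:
    """Apply a simple Poisson yield model: Y = exp(-A * D0), A in cm^2."""
    y = math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)
    return dies_per_wafer(die_area_mm2) * y

WAFER_COST = 10_000.0  # assumed 7nm wafer price in USD, illustrative only

# Same total silicon: one 592 mm^2 monolithic die vs 8 x 74 mm^2 chiplets.
monolithic = WAFER_COST / good_dies(592.0)
chiplets = 8 * WAFER_COST / good_dies(74.0)
print(f"monolithic ~${monolithic:.0f} vs chiplets ~${chiplets:.0f} per CPU")
```

With these assumed inputs the monolithic die comes out roughly twice as expensive as the eight chiplets, which is the yield argument Markfw is making above.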
 

KompuKare

Golden Member
Jul 28, 2009
Basically, these entirely synthetic analyses are as good as nothing, because there is no correlation between them and real code found in the wild, plus there are all the complementary auxiliary die functions that take a lot of die space but that you need anyway. Thus, you end up comparing whole processor vs. whole processor. There is a reason why you don't see ARM completely pwning x86 in Linux benchmarks like those done by Phoronix or ServeTheHome, and that's because in the real world these paper advantages simply don't materialize.
Well, it takes a lot more than a good ISA to get a fast design.
ISA advantages are real, but no longer as important as they were in the 8086/80286/80386 days (remember, a lot of the issues with DOS and early Windows and all that nonsense came from IBM choosing Intel instead of more rational designs like the Motorola 68000; if they had, PCs would have been 32-bit from the very beginning).

However, not having enough registers, still potentially having to emulate an 8086, etc., all costs transistors. But the most important advantage ARM has over x86 is not that, though.

It's the fact that almost everyone (after what happened to HiSilicon, it definitely is almost) can license the design or the ISA and adapt it to what they require.

For mobile that means what we currently see: SoCs which excel at power efficiency, are designed for power efficiency above all else, dedicate large areas of the die to GPU, NPU, camera sensors, etc., and have memory systems built for low power, not max performance.

The few server ARM designs are starting to differentiate themselves from that, but desktop and server loads have similar requirements, and designing good interconnects, memory controllers, etc. takes time.

Think of ARM as being like open source and x86 as being closed source.
While open source isn't the answer to everything, once it reaches major momentum it is hard to stand against. Smartphone sales have been that major momentum, not only bringing ARM revenue but TSMC etc. as well, plus giving plenty of silicon design teams a lot of experience.

I am certainly not in the "ARM will take over the world" camp; however, the future of x86 everywhere is by no means guaranteed, and certain parts of the server market seem ripe for adopting ARM.
 