Info: TOP 10 CPU Cores in PPA - Performance per Area

Status
Not open for further replies.

Richie Rich

Senior member
Jul 28, 2019
Server-grade table: PPA score per GHz (iso-clock).
High-core-count server systems operate around 3 GHz due to TDP limitations.
Everything is on TSMC 7nm (the A13 is on N7P, though; the A78 and X1 are projections based on ARM's official data).
SPEC data is based on AnandTech's article: https://www.anandtech.com/show/15813/arm-cortex-a78-cortex-x1-cpu-ip-diverging/4




| Pos | Vendor | CPU | Core | Core area (mm²) | Year | ISA | SPEC PPA/GHz | Relative |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 9.41 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 8.36 | 88.8% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 7.82 | 83.1% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 7.24 | 76.9% |
| 5 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 4.44 | 47.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 4.40 | 46.7% |
| 7 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 3.02 | 32.1% |





Desktop/workstation-grade table: PPA score at maximum clock.

| Pos | Vendor | CPU | Core | Core area (mm²) | Year | ISA | SPEC PPA | Relative |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 28.24 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 23.73 | 84.0% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 22.21 | 78.7% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 21.73 | 76.9% |
| 5 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 13.89 | 49.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 11.66 | 41.3% |
| 7 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 11.25 | 39.8% |



 

Richie Rich

Senior member
Jul 28, 2019
The 64-core EPYC Milan, based on Zen 3, is coming this year. Amazon has the 64-core monolithic Graviton2, based on the fairly old A76/N1, and this G2 is beating Zen 2 Rome in performance per thread. The 80-core and 128-core Ampere Altra, also based on the A76, is coming soon. That's not good for x86.

Imagine a Graviton3 based on the A78/N2 and manufactured on TSMC 5nm. It will have at least 128 cores. And if Ampere can manufacture 128 cores on 7nm, a 256-core monolith is also possible on 5nm. It's like A Nightmare on Elm Street for x86, and ARM is the Freddy Krueger here :)
 

Thala

Golden Member
Nov 12, 2014
So these are area efficiency values? Just asking because PPA typically means Power-Performance-Area, but this is just Performance-Per-Area?
 

Richie Rich

Senior member
Jul 28, 2019
So these are area efficiency values? Just asking because PPA typically means Power-Performance-Area, but this is just Performance-Per-Area?
Mathematically it's as simple as that: SPECint2006 score divided by area.

When I checked ARM's PPA definition, it looks like a pretty complex tool for multi-criteria optimization used during core development.

My performance-per-area table is just a simple comparison of cores, probably useful for server CPU prediction. I was impressed by the new Cortex-X1 and its 30% IPC uplift over the A77. But the A78 looks like a real server killer. When @soresu mentioned that G3 will contain the A78, I wondered why not the X1. But from a many-core design and area-savings standpoint, the A78 looks like the ultimate choice.
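As a quick sketch, the metric behind both tables is just score over area, optionally divided by clock. The 3950X inputs below (a SPECint2006 score of about 50.0 at a 4.6 GHz boost clock) are back-derived from the table entries, not measured figures:

```python
# PPA and iso-clock PPA as used in the tables above:
#   PPA     = SPECint2006 score / core area   (pts/mm^2)
#   PPA/GHz = PPA / clock                     (pts/mm^2/GHz)
# The 3950X inputs (score ~50.0 at a 4.6 GHz boost, 3.60 mm^2 per
# Zen 2 core) are assumptions back-derived from the table entries.

def ppa(score: float, area_mm2: float) -> float:
    """Performance per area in SPEC pts/mm^2."""
    return score / area_mm2

def ppa_per_ghz(score: float, area_mm2: float, clock_ghz: float) -> float:
    """Iso-clock variant in SPEC pts/mm^2/GHz."""
    return ppa(score, area_mm2) / clock_ghz

print(round(ppa(50.0, 3.60), 2))               # 13.89, the table 2 entry
print(round(ppa_per_ghz(50.0, 3.60, 4.6), 2))  # 3.02, the table 1 entry
```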
 

Richie Rich

Senior member
Jul 28, 2019
The 3950X isn't a server CPU if you're making a server-grade table. If you want a Rome server CPU, AT has the 7742 benchmarked in SPECint2006.
It scores 39.2459 in SPECint2006, so 11.5429/GHz, or 3.21 for your iso-clock metric.
Good job, it makes sense. I'm going to update the table with your numbers.

Those 3.21 pts/mm²/GHz don't change the tragic last place, though.
Those 11.54 pts/GHz work out to 10.90 pts/mm², which is even worse: EPYC falls behind the Apple A13.
In a PPC/IPC table there would be a small benefit after all.
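The 7742 figures quoted above can be recomputed in a few lines. Note the clock speed is not stated anywhere in the thread; it is inferred from the quoted 39.2459 / 11.5429 ratio, so treat it as an assumption:

```python
# Recomputing the quoted EPYC 7742 figures. The ~3.4 GHz clock is
# inferred from 39.2459 / 11.5429, and the 3.60 mm^2 Zen 2 core area
# comes from the tables in the first post.
score = 39.2459              # SPECint2006, per the quoted post
area_mm2 = 3.60              # Zen 2 core area
clock_ghz = score / 11.5429  # ~3.40 GHz, inferred

iso_ppa = score / clock_ghz / area_mm2  # pts/mm^2/GHz
plain_ppa = score / area_mm2            # pts/mm^2
print(round(iso_ppa, 2), round(plain_ppa, 2))  # 3.21 10.9
```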
 

Topweasel

Diamond Member
Oct 19, 2000
I don't know, the whole table seems screwed up and useless if you're tracking everything to clock speed instead of TDP. But honestly, at this point it's probably closer to perf per mm², and that and only that: it's performance density that the server solutions are trying to hit. You can't use one and ignore the value of the other.
 

Richie Rich

Senior member
Jul 28, 2019
It's performance density that the server solutions are trying to hit. You can't use one and ignore the value of the other.
I agree with you that performance density is the key factor for server solutions. But how do you calculate performance density?

I think performance density is equal to performance per area; however, feel free to provide your formula. I can recalculate and create a chart. It would be interesting to compare.
 

zir_blazer

Golden Member
Jun 6, 2013
I'm almost positive that at some point, back when everyone was stuck at 90nm, AnandTech did an article comparing core die sizes across architectures, and it included the AMD K8, Intel P4 and Itanium. I think it was between the 1 MB L2 cache 90nm A64 San Diego (or whatever the Opteron counterpart was named), Prescott, and Montecito, as all of those were 90nm. I can't find it, but the point is that after removing the big L3 cache, Itanium was on paper stupidly more performant per core die size than anything x86 could do, but in practice that wasn't the case. And the cache was pretty much mandatory, so even if the core itself wasn't big, you still needed a lot of die area for the core to perform properly, and at that point x86 was actually kicking Itanium's butt.

Basically, these entirely synthetic analyses are as good as nothing, because there is no correlation between them and real code found in the wild, plus there are all the complementary auxiliary die functions that take a lot of die space but that you need anyway. Thus, you end up comparing whole processor vs. whole processor. There is a reason why you don't see ARM completely pwning x86 in Linux benchmarks like those done by Phoronix or ServeTheHome, and that's because in the real world these paper advantages simply don't materialize.
 

Richie Rich

Senior member
Jul 28, 2019
I'm almost positive that at some point, back when everyone was stuck at 90nm, AnandTech did an article comparing core die sizes across architectures, and it included the AMD K8, Intel P4 and Itanium. I think it was between the 1 MB L2 cache 90nm A64 San Diego (or whatever the Opteron counterpart was named), Prescott, and Montecito, as all of those were 90nm. I can't find it, but the point is that after removing the big L3 cache, Itanium was on paper stupidly more performant per core die size than anything x86 could do, but in practice that wasn't the case. And the cache was pretty much mandatory, so even if the core itself wasn't big, you still needed a lot of die area for the core to perform properly, and at that point x86 was actually kicking Itanium's butt.

Basically, these entirely synthetic analyses are as good as nothing, because there is no correlation between them and real code found in the wild, plus there are all the complementary auxiliary die functions that take a lot of die space but that you need anyway. Thus, you end up comparing whole processor vs. whole processor. There is a reason why you don't see ARM completely pwning x86 in Linux benchmarks like those done by Phoronix or ServeTheHome, and that's because in the real world these paper advantages simply don't materialize.
There are two types of people:
  1. the ones who look for ways to do things
  2. the ones who look for excuses to do nothing

BTW: Graviton2 is the only monolithic 64-core server CPU, thanks to the superb performance per area of the A76. That's the reason why AMD uses only 8-core chiplets. And yet the ARM Graviton2 has higher performance per thread than Zen 2 EPYC.

And do you know what the best thing is? Money. A 64-core EPYC costs 7,500 USD, while a 64-core ARM Graviton2 costs about 500 USD. In the x86 monopoly world you are forced to buy an overpriced Intel or AMD. In the ARM world you can manufacture your own CPU at a fraction of the cost. x86 is dead for economic reasons as well as technical ones.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
There are two types of people:
  1. the ones who look for ways to do things
  2. the ones who look for excuses to do nothing

BTW: Graviton2 is the only monolithic 64-core server CPU, thanks to the superb performance per area of the A76. That's the reason why AMD uses only 8-core chiplets. And yet the ARM Graviton2 has higher performance per thread than Zen 2 EPYC.

And do you know what the best thing is? Money. A 64-core EPYC costs 7,500 USD, while a 64-core ARM Graviton2 costs about 500 USD. In the x86 monopoly world you are forced to buy an overpriced Intel or AMD. In the ARM world you can manufacture your own CPU at a fraction of the cost. x86 is dead for economic reasons as well as technical ones.
Where did you come up with $500 for the Graviton? If that's manufacturing cost, then EPYC's is probably less than that. I bought a 64-core EPYC for $1,000 (but it was an ES); retail versions are going for $3,500-4,500 on eBay. The $7,500 is the retail MSRP; companies do not pay anything close to that.
 

Richie Rich

Senior member
Jul 28, 2019
Where did you come up with $500 for the Graviton? If that's manufacturing cost, then EPYC's is probably less than that. I bought a 64-core EPYC for $1,000 (but it was an ES); retail versions are going for $3,500-4,500 on eBay. The $7,500 is the retail MSRP; companies do not pay anything close to that.
EPYC is 1005 mm² of total area; Graviton2 is an estimated 350 mm². EPYC cannot be cheaper than Graviton2.

500 USD is my rough cost estimate, including silicon and Annapurna Labs' development costs. If AWS runs high volume, the price could be significantly lower than that. Welcome to the ARM world, where everybody can buy an ARM license :D
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
EPYC is 1005 mm² of total area; Graviton2 is an estimated 350 mm². EPYC cannot be cheaper than Graviton2.

500 USD is my rough cost estimate, including silicon and Annapurna Labs' development costs. If AWS runs high volume, the price could be significantly lower than that. Welcome to the ARM world, where everybody can buy an ARM license :D
In other words, it's a wild guess. Stop making statements that sound like facts when they are actually wild guesses.

And a monolithic die costs a lot more than chiplets. THAT IS A FACT.

So it is very possible that EPYC is cheaper than Graviton solely on manufacturing cost.
 

Vixis Rei

Junior Member
Jul 4, 2020
EPYC is 1005 mm² of total area; Graviton2 is an estimated 350 mm². EPYC cannot be cheaper than Graviton2.

500 USD is my rough cost estimate, including silicon and Annapurna Labs' development costs. If AWS runs high volume, the price could be significantly lower than that. Welcome to the ARM world, where everybody can buy an ARM license :D

Oh.
Have you ever used a die-per-wafer calculator?
You have to consider the size of the die, the wafer cost, the margin the company wants, and the price the market can actually bear relative to the competition before stating final prices.

No one is giving out CPUs for free if they can help it.

As Markfw has said, EPYC uses chiplets and a cheap I/O die, letting AMD get the vast majority of use out of a single wafer, allowing them to be much more nimble on pricing if need be. A defective chiplet can easily be sold as a lesser product.
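A minimal die-per-wafer and yield sketch along these lines. The wafer price, defect density, and die sizes are illustrative assumptions, not real TSMC figures; the point is only how yield loss scales with die area:

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Classic gross-die-per-wafer estimate with an edge-loss term."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

def good_dies(die_area_mm2: float, defects_per_cm2: float = 0.1) -> float:
    """Apply a simple Poisson yield model: Y = exp(-A * D0), A in cm^2."""
    y = math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)
    return dies_per_wafer(die_area_mm2) * y

WAFER_COST = 10_000.0  # assumed 7nm wafer price in USD, illustrative only

# Same total silicon: one 592 mm^2 monolithic die vs 8 x 74 mm^2 chiplets.
monolithic = WAFER_COST / good_dies(592.0)
chiplets = 8 * WAFER_COST / good_dies(74.0)
print(f"monolithic ~${monolithic:.0f} vs chiplets ~${chiplets:.0f} per CPU")
```

With these assumed inputs the monolithic die comes out roughly twice as expensive as the eight chiplets, which is the yield argument Markfw is making above.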
 

KompuKare

Golden Member
Jul 28, 2009
Basically, these entirely synthetic analyses are as good as nothing, because there is no correlation between them and real code found in the wild, plus there are all the complementary auxiliary die functions that take a lot of die space but that you need anyway. Thus, you end up comparing whole processor vs. whole processor. There is a reason why you don't see ARM completely pwning x86 in Linux benchmarks like those done by Phoronix or ServeTheHome, and that's because in the real world these paper advantages simply don't materialize.
Well, it takes a lot more than a good ISA to get a fast design.
ISA advantages are real, but no longer as important as they were in the 8086/80286/80386 days (remember, a lot of the issues with DOS and early Windows and all that nonsense came from IBM choosing Intel instead of more rational designs like the Motorola 68000; if they had, PCs would have been 32-bit from the very beginning).

However, not having enough registers, still potentially having to emulate an 8086, etc., all costs transistors. But the most important advantage ARM has over x86 is not that, though.

It's the fact that almost everyone (after what happened to HiSilicon, it definitely is almost) can license the design or the ISA and adapt it to what they require.

For mobile that means what we currently see: SoCs which excel at power efficiency, are designed for power efficiency above all else, dedicate large areas of the die to GPU, NPU, camera sensors, etc., and have memory systems built for low power, not max performance.

The few server ARM designs are starting to differentiate themselves from that, but desktop and server loads have similar requirements, and designing good interconnects, memory controllers, etc. takes time.

Think of ARM as being like open source and x86 as being closed source.
While open source isn't the answer to everything, once it reaches major momentum it is hard to stand against. Smartphone sales have been that major momentum, not only bringing ARM revenue but TSMC etc. as well, plus giving plenty of silicon design teams a lot of experience.

I am certainly not in the "ARM will take over the world" camp; however, the future of x86 everywhere is by no means guaranteed, and certain parts of the server market seem ripe for adopting ARM.
 