Info TOP 20 of the World's Most Powerful CPU Cores - IPC/PPC comparison


Richie Rich

Senior member
Jul 28, 2019
Added cores:
  • A53 - little core used in some low-end smartphones in an 8-core configuration (Snapdragon 450)
  • A55 - used as the little core in every modern Android SoC
  • A72 - "high" end Cortex core used in the Snapdragon 650 or Raspberry Pi 4
  • A73 - "high" end Cortex core
  • A75 - "high" end Cortex core
  • Bulldozer - infamous AMD core
Geekbench 5.1 PPC chart 6/23/2020:

| Pos | Manufacturer | CPU | Core | Year | ISA | GB5 score | GHz | PPC (score/GHz) | Relative to 9900K | Relative to Zen 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Nuvia | (Est.) | Phoenix (Est.) | 2021 | ARMv9.0 | 2001 | 3.00 | 667.00 | 241.0% | 194.1% |
| 2 | Apple | A15 (est.) | (Est.) | 2021 | ARMv9.0 | 1925 | 3.00 | 641.70 | 231.8% | 186.8% |
| 3 | Apple | A14 (est.) | Firestorm | 2020 | ARMv8.6 | 1562 | 2.80 | 558.00 | 201.6% | 162.4% |
| 4 | Apple | A13 | Lightning | 2019 | ARMv8.4 | 1332 | 2.65 | 502.64 | 181.6% | 146.3% |
| 5 | Apple | A12 | Vortex | 2018 | ARMv8.3 | 1116 | 2.53 | 441.11 | 159.4% | 128.4% |
| 6 | ARM Cortex | V1 (est.) | Zeus | 2020 | ARMv8.6 | 1287 | 3.00 | 428.87 | 154.9% | 124.8% |
| 7 | ARM Cortex | N2 (est.) | Perseus | 2021 | ARMv9.0 | 1201 | 3.00 | 400.28 | 144.6% | 116.5% |
| 8 | Apple | A11 | Monsoon | 2017 | ARMv8.2 | 933 | 2.39 | 390.38 | 141.0% | 113.6% |
| 9 | Intel | (Est.) | Golden Cove (Est.) | 2021 | x86-64 | 1780 | 4.60 | 386.98 | 139.8% | 112.6% |
| 10 | ARM Cortex | X1 | Hera | 2020 | ARMv8.2 | 1115 | 3.00 | 371.69 | 134.3% | 108.2% |
| 11 | AMD | 5900X (Est.) | Zen 3 (Est.) | 2020 | x86-64 | 1683 | 4.90 | 343.57 | 124.1% | 100.0% |
| 12 | Apple | A10 | Hurricane | 2016 | ARMv8.1 | 770 | 2.34 | 329.06 | 118.9% | 95.8% |
| 13 | Intel | 1065G7 | Ice Lake | 2019 | x86-64 | 1252 | 3.90 | 321.03 | 116.0% | 93.4% |
| 14 | ARM Cortex | A78 | Hercules | 2020 | ARMv8.2 | 918 | 3.00 | 305.93 | 110.5% | 89.0% |
| 15 | Apple | A9 | Twister | 2015 | ARMv8.0 | 564 | 1.85 | 304.86 | 110.1% | 88.7% |
| 16 | AMD | 3950X | Zen 2 | 2019 | x86-64 | 1317 | 4.60 | 286.30 | 103.4% | 83.3% |
| 17 | ARM Cortex | A77 | Deimos | 2019 | ARMv8.2 | 812 | 2.84 | 285.92 | 103.3% | 83.2% |
| 18 | Intel | 9900K | Coffee Lake-R | 2018 | x86-64 | 1384 | 5.00 | 276.80 | 100.0% | 80.6% |
| 19 | Intel | 10900K | Comet Lake | 2020 | x86-64 | 1465 | 5.30 | 276.42 | 99.9% | 80.5% |
| 20 | Intel | 6700K | Skylake | 2015 | x86-64 | 1032 | 4.00 | 258.00 | 93.2% | 75.1% |
| 21 | ARM Cortex | A76 | Enyo | 2018 | ARMv8.2 | 720 | 2.84 | 253.52 | 91.6% | 73.8% |
| 22 | Intel | 4770K | Haswell | 2013 | x86-64 | 966 | 3.90 | 247.69 | 89.5% | 72.1% |
| 23 | AMD | 1800X | Zen 1 | 2017 | x86-64 | 935 | 3.90 | 239.74 | 86.6% | 69.8% |
| 24 | Apple | A13 | Thunder | 2019 | ARMv8.4 | 400 | 1.73 | 231.25 | 83.5% | 67.3% |
| 25 | Apple | A8 | Typhoon | 2014 | ARMv8.0 | 323 | 1.40 | 230.71 | 83.4% | 67.2% |
| 26 | Intel | 3770K | Ivy Bridge | 2012 | x86-64 | 764 | 3.50 | 218.29 | 78.9% | 63.5% |
| 27 | Apple | A7 | Cyclone | 2013 | ARMv8.0 | 270 | 1.30 | 207.69 | 75.0% | 60.5% |
| 28 | Intel | 2700K | Sandy Bridge | 2011 | x86-64 | 723 | 3.50 | 206.57 | 74.6% | 60.1% |
| 29 | ARM Cortex | A75 | Prometheus | 2017 | ARMv8.2 | 505 | 2.80 | 180.36 | 65.2% | 52.5% |
| 30 | ARM Cortex | A73 | Artemis | 2016 | ARMv8.0 | 380 | 2.45 | 155.10 | 56.0% | 45.1% |
| 31 | ARM Cortex | A72 | Maya | 2015 | ARMv8.0 | 259 | 1.80 | 143.89 | 52.0% | 41.9% |
| 32 | Intel | E6600 | Core 2 | 2006 | x86-64 | 338 | 2.40 | 140.83 | 50.9% | 41.0% |
| 33 | AMD | FX-8350 | Bulldozer (Piledriver) | 2011 | x86-64 | 566 | 4.20 | 134.76 | 48.7% | 39.2% |
| 34 | AMD | Phenom 965 BE | K10.5 | 2006 | x86-64 | 496 | 3.70 | 134.05 | 48.4% | 39.0% |
| 35 | ARM Cortex | A57 (est.) | Atlas | n/a | ARMv8.0 | 222 | 1.80 | 123.33 | 44.6% | 35.9% |
| 36 | ARM Cortex | A15 (est.) | Eagle | n/a | ARMv7 32-bit | 188 | 1.80 | 104.65 | 37.8% | 30.5% |
| 37 | AMD | Athlon 64 X2 3800+ | K8 | 2005 | x86-64 | 207 | 2.00 | 103.50 | 37.4% | 30.1% |
| 38 | ARM Cortex | A17 (est.) | | n/a | ARMv7 32-bit | 182 | 1.80 | 100.91 | 36.5% | 29.4% |
| 39 | ARM Cortex | A55 | Ananke | 2017 | ARMv8.2 | 155 | 1.60 | 96.88 | 35.0% | 28.2% |
| 40 | ARM Cortex | A53 | Apollo | 2012 | ARMv8.0 | 148 | 1.80 | 82.22 | 29.7% | 23.9% |
| 41 | Intel | Pentium D | P4 | 2005 | x86-64 | 228 | 3.40 | 67.06 | 24.2% | 19.5% |
| 42 | ARM Cortex | A7 (est.) | Kingfisher | n/a | ARMv7 32-bit | 101 | 1.80 | 56.06 | 20.3% | 16.3% |
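The PPC column is the GB5 single-thread score divided by the clock in GHz, and "Relative to 9900K" divides that PPC by the 9900K's PPC. A minimal Python sketch that recomputes a few rows (scores and clocks copied from the chart; the helper name `ppc` is just illustrative):

```python
# Recompute the PPC and "Relative to 9900K" columns for a few rows of the chart.
# Scores and clocks are copied from the table; the helper names are illustrative.

rows = {
    # name: (GB5 single-thread score, clock in GHz)
    "Apple A13 Lightning": (1332, 2.65),
    "Apple A9 Twister": (564, 1.85),
    "Intel 1065G7 Ice Lake": (1252, 3.90),
    "AMD 3950X Zen 2": (1317, 4.60),
    "Intel 9900K Coffee Lake-R": (1384, 5.00),
}


def ppc(score: float, ghz: float) -> float:
    """Performance per clock: Geekbench 5 points per GHz."""
    return score / ghz


# Everything is normalized to the 9900K, as in the "Relative to 9900K" column.
baseline = ppc(*rows["Intel 9900K Coffee Lake-R"])

for name, (score, ghz) in rows.items():
    p = ppc(score, ghz)
    print(f"{name:26s} PPC {p:7.2f}   vs 9900K {p / baseline:6.1%}")
```

For these five rows the printed values match the table's PPC and "Relative to 9900K" columns.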

[Image: GB5 PPC evolution chart]

[Image: GB5 single-thread performance evolution chart]

[Image: TOP 10 PPC CPUs, clock frequency evolution chart]



TOP 10 - Performance Per Area comparison at ISO-clock (PPA/GHz)

Copied from the locked thread. They are trying to keep people from seeing how badly x86 does in this comparison.

| Pos | Manufacturer | CPU | Core | Core Area (mm2) | Year | ISA | SPEC PPA/GHz | Relative |
|---|---|---|---|---|---|---|---|---|
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 9.41 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 8.36 | 88.8% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 7.82 | 83.1% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 7.24 | 76.9% |
| 5 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 4.44 | 47.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 4.40 | 46.7% |
| 7 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 3.02 | 32.1% |
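The "Relative" column divides each core's SPEC PPA/GHz by the A78's 9.41. The same numbers give the two ratios argued later in the thread: performance per mm2 (A78 vs. Zen 2) and how many more cores fit in the same area. A rough sketch using only the table's values; it counts core area alone, with no L3 or uncore, which is an assumption:

```python
# Recompute the "Relative" column of the performance-per-area table, plus the
# two ratios used in the thread (performance per mm2 and cores per mm2).
# All numbers are copied from the table above.

ppa_table = {
    # core: (core area in mm2, SPEC PPA/GHz)
    "A78": (1.33, 9.41),
    "A77": (1.40, 8.36),
    "A76": (1.20, 7.82),
    "X1": (2.11, 7.24),
    "Apple A12": (4.03, 4.44),
    "Apple A13": (4.53, 4.40),
    "Zen 2": (3.60, 3.02),
}

leader_ppa = ppa_table["A78"][1]  # the table normalizes everything to A78

for core, (area, ppa) in ppa_table.items():
    print(f"{core:10s} {area:4.2f} mm2  PPA/GHz {ppa:4.2f}  relative {ppa / leader_ppa:6.1%}")

# "A78 can extract 3x higher performance per mm2 than Zen 2"
perf_per_area_ratio = ppa_table["A78"][1] / ppa_table["Zen 2"][1]
# Cores that fit in the same area (core area only, no caches or uncore)
cores_per_area_ratio = ppa_table["Zen 2"][0] / ppa_table["A78"][0]

print(f"A78 vs Zen 2: {perf_per_area_ratio:.2f}x performance per mm2, "
      f"{cores_per_area_ratio:.2f}x cores per mm2")
```

The printed relative values agree with the table to within rounding of the PPA inputs.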



It's impressive how fast the generic Cortex cores are evolving:
  • A72 (2015), which can be found in most SBCs, has less than 40% of the IPC of the new Cortex X1: IPC went up about 2.6x in just 5 years.
  • A73 (2016) and A75 (2017), which are inside the majority of Android smartphones today, have about half the IPC of the new Cortex X1: IPC doubled in 3 years.

How x86 compares to Cortex cores:
  • A75 (2017) is about 25% behind Zen 1 (2017) in PPC. As expected.
  • A77 (2019) closed the gap to Zen 2 (2019) and is equal in PPC. Surprising: Cortex cores have caught up with x86 cores.
  • X1 (2020) adds another +30% IPC over A77. Zen 3 needs a 30% IPC jump to stay on par with X1.
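The percentages in the two lists above are plain ratios of the PPC column. A short sketch with the values copied from the chart (the `ratio` helper is hypothetical):

```python
# PPC values (GB5 points/GHz) copied from the chart above.
ppc = {
    "A72": 143.89, "A73": 155.10, "A75": 180.36, "A77": 285.92,
    "X1": 371.69, "Zen 1": 239.74, "Zen 2": 286.30,
}


def ratio(a: str, b: str) -> float:
    """How many times the PPC of core `a` is the PPC of core `b`."""
    return ppc[a] / ppc[b]


print(f"X1 vs A72 (2015): {ratio('X1', 'A72'):.2f}x")
print(f"X1 vs A75 (2017): {ratio('X1', 'A75'):.2f}x")
print(f"A75 vs Zen 1:     {ratio('A75', 'Zen 1') - 1:+.1%}")
print(f"A77 vs Zen 2:     {ratio('A77', 'Zen 2') - 1:+.1%}")
print(f"X1 vs A77:        {ratio('X1', 'A77') - 1:+.1%}")
```

Note that these are GB5 PPC ratios, so they inherit whatever bias Geekbench 5 has toward either ISA.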

Comparison to Apple cores:
  • AMD's Zen 2 core has lower PPC than Apple's A9 from 2015, so AMD is 4 years behind Apple.
  • Intel's Sunny Cove core in Ice Lake has lower PPC than Apple's A10 from 2016, so Intel is 3 years behind Apple.
  • The Cortex A77 core has lower PPC than Apple's A9 from 2015 (4 years behind), but
  • the new Cortex X1 core falls short only of Apple's A11 from 2017, so ARM is 3 years behind Apple and getting closer.
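The "years behind" reasoning can be written as a lookup: find the oldest Apple big core whose PPC is already higher, then subtract launch years. A minimal sketch using the chart's Apple big cores (the little Thunder core is excluded); the function name is hypothetical:

```python
from typing import List, Tuple

# Apple big cores from the chart: (name, year, PPC in GB5 points/GHz),
# sorted from oldest to newest.
APPLE_BIG_CORES: List[Tuple[str, int, float]] = [
    ("A7 Cyclone", 2013, 207.69),
    ("A8 Typhoon", 2014, 230.71),
    ("A9 Twister", 2015, 304.86),
    ("A10 Hurricane", 2016, 329.06),
    ("A11 Monsoon", 2017, 390.38),
    ("A12 Vortex", 2018, 441.11),
    ("A13 Lightning", 2019, 502.64),
]


def years_behind_apple(year: int, ppc: float) -> Tuple[str, int]:
    """Find the oldest Apple big core with higher PPC and the year gap to it.

    A core counts as "N years behind" when an Apple core released N years
    earlier already had higher PPC.
    """
    for name, apple_year, apple_ppc in APPLE_BIG_CORES:
        if apple_ppc > ppc:
            return name, year - apple_year
    raise ValueError("no Apple core in the chart has higher PPC")


for core, year, ppc in [
    ("Zen 2", 2019, 286.30),
    ("Ice Lake", 2019, 321.03),
    ("Cortex A77", 2019, 285.92),
    ("Cortex X1", 2020, 371.69),
]:
    apple, gap = years_behind_apple(year, ppc)
    print(f"{core:10s} already beaten by {apple} -> {gap} years behind")
```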



Geekbench 5.1 comparison from 6/22/2020:
  • added Cortex X1 and A78 performance projections from Andrei here
  • still awaiting the 2020 Apple A14 Firestorm core and the Zen 3 core



EDIT:
Please note, to stop the endless discussion about PPC frequency scaling: for a fair and clean comparison, I use only the top (highest-clocked) version of each core to represent its top performance.
 

Richie Rich

Senior member
Jul 28, 2019
Transistors are basically free. Intel SHOULD be using more of them. Apple certainly is for every core. And yes, it's a brute force approach, in the sense that doubling the number of transistors only gets you maybe 20% higher IPC. But WHO CARES? Transistors are free! Design for that fact.
I usually agree with most of your posts, but not with this one. Transistors are free only in the inefficient x86 desktop world. Why is the new Cortex A78 5% smaller and 7% higher in IPC than the previous A77? Because in the world of efficient mobile devices it matters how many transistors you have to spend. The most important rule is HOW smartly those transistors are spent.

The Apple A13 core is big: it has 2.5 mm2 of area without its huge L2 cache (which acts more or less like an L3). In a traditional setup with 0.5 MB of L2 it would be 2.9 mm2, and relative to Cortex X1 that area scales pretty linearly with its IPC, even though IPC shouldn't scale linearly with area. That means Apple has incredibly great stuff under the hood, much better than ARM's already pretty good Cortex cores.

The x86 vendors are totally useless here, and both Intel and AMD have a huge problem with Graviton2. The A76/N1 in G2 has about 90% of Zen 2's IPC while using roughly 40% of the area (1.4 vs. 3.6 mm2). Because of this, no x86 vendor can manufacture a monolithic 64-core chip. ARM can, and the 80-core Altra and the 128-core monolithic Altra Max are coming soon on 7nm. If G2 is a problem for x86 servers, then imagine an A78-based G3 (+30% IPC, -5% area compared to A76/N1). On 5nm they could easily build a 128-core chip, and possibly a 160-core or 256-core monolith. That's a performance disaster for the x86 world, and it wouldn't be possible without spending transistors very efficiently. Transistors are not free in servers.


This is, of course, the same argument we have about Apple's cores. Right now, today, if I pumped more voltage into an A12Z, could it run at 4 GHz (i.e. the transistors are capable of that switching speed; it would just run very, very hot)? Or would it run very, very hot but with negligible frequency improvement? No one outside Apple and TSMC knows.
There was an experimental A72 running at 4.2 GHz when manufactured on a 7nm HP process. But I do not think Apple will use this approach for laptops and iMacs. The A14 will dominate x86 in ST performance even at 2.9 GHz. And for the Mac Pro and workstations? IMHO Apple will rather keep high density and add more cores than chase high frequencies. They don't need to. Of course, I might be wrong here. We'll see.

5.5% is more than the difference between the 1800X and the 3950X in your table. That alone should call into question the usefulness of your table and of Geekbench PPC as a metric.
Am I forcing you to post in this useless thread? Do not hesitate to create your own thread with a BETTER table. Go ahead, I will appreciate it if somebody comes up with a better comparison. We can compare these as soon as the ARM MacBook is released.


Being real here, I'd consider these two things, Richie:

1) Your charts go too far into the past for the x86 vendors. Both suffered failures, such as Intel being stuck on 14 nm for way too long and AMD being stuck on a bad architecture with extremely low IPC. Looking only at the past two years and the next two would change things considerably; in particular, the x86 curves would get about as steep as ARM's if you consider Zen 1 to Zen 4 for AMD and Ice Lake to Ocean Cove for Intel.
The charts compare 64-bit systems, so of course x86 has a much longer history, going back to when the AMD64 extension appeared. The first 64-bit ARM (ARMv8) chip was released in 2013: Apple's A7 Cyclone, which was also the first ARM core with 4 ALUs, released just two months after the first 4-ALU x86 core, Intel's Haswell. A7 had lower IPC than Haswell, but look at how hard they pushed development and how much IPC they added every single year. I really doubt that AMD or Intel will deliver 15% more IPC every year.


2) While I do believe in the advantage ARM cores have for the time being, especially Apple's big lead in IPC, there isn't really any reason other teams can't catch up and use the same tricks, especially now that it has been proven possible, and with the competitive push that will grow as desktops/laptops run ARM silicon in the coming years.
Yes, ARM's new Matterhorn microarchitecture, based on ARMv9 with SVE2 (vectors up to 2048 bits), should have higher IPC than Apple's A12, probably with 6 ALUs like Apple's cores. Cortex cores are getting closer to Apple. Pretty awesome stuff is coming next year.


With that said, is 100% more IPC than Skylake a thing with A14? Cool: then I can only wait for Intel's answer after 5 years of slumber (actually close to 6), not to mention whatever AMD might come up with if they keep up the pace with Zen and their engineering renaissance.

As for the clock argument, I don't buy that x86 is stuck: stock speeds have been slowly growing since the move from P4 to Core, with the few exceptions caused by initial process issues, never by the architecture.
It looks like there's a wall at 5 GHz... except it has already been passed, even by the then "fat" Skylake core. Tiger Lake samples on 10 nm have leaked with 4.7 GHz clocks (and possibly +20-25% IPC), not bad for a dead node whose initial speeds were 3 GHz on Cannon Lake...

What I'm saying is that I can see future 3 GHz Apple cores with 200% of Skylake's IPC (so old it has become a metric now xD), but also 5 GHz Golden Coves with 150% and Zen 3 about the same. You do the math and tell me who still leads in absolute performance then, around 2022.
Yeah, A14 will probably have slightly more than double Skylake's IPC. That's very cool, I agree. However, Golden Cove with 150% of Skylake's IPC in 2021/2022 will need to fight against Apple's A15/A16:
  • A14, +15% IPC: 182% of Skylake IPC * 1.15 = 209%
  • A15, +15% IPC: 182% of Skylake IPC * 1.15 * 1.15 = 241%
  • A16, +15% IPC: 182% of Skylake IPC * 1.15 * 1.15 * 1.15 = 277%
I'm afraid that AMD and Intel are screwed despite Golden Cove and Zen 3... Maybe not in desktops, where they can use high frequencies, but certainly in servers, where 64-core AMD EPYCs have a base clock of 2.25 GHz and a boost of 3.2 GHz. That's where the performance-per-area metric comes in: A78 can extract 3x higher performance per mm2 than Zen 2. x86 is dead IMHO.
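Both sides of this exchange rest on two pieces of arithmetic: compounding a per-generation IPC gain, and treating single-thread performance as roughly IPC x clock. A sketch of both, where the 182% starting point, the +15% per year and the 3 GHz / 5 GHz clocks are the posters' assumptions, not measurements:

```python
# Compounding an assumed +15% IPC per Apple generation, starting from the
# 182% of Skylake-class IPC used in the post (A13 vs. the 9900K in the chart).
BASE = 1.82    # A13 IPC relative to Skylake (poster's figure)
GROWTH = 1.15  # assumed +15% IPC per generation (poster's assumption)


def projected_ipc(generations: int) -> float:
    """IPC relative to Skylake after `generations` further Apple releases."""
    return BASE * GROWTH ** generations


for gen, chip in enumerate(["A14", "A15", "A16"], start=1):
    print(f"{chip}: {projected_ipc(gen):.0%} of Skylake IPC")


# Single-thread performance taken as roughly IPC x clock, in units of
# "Skylake IPC at 1 GHz". Clocks are the hypothetical ones from the quotes.
def st_perf(ipc_vs_skylake: float, ghz: float) -> float:
    return ipc_vs_skylake * ghz


print(f"Apple @ 3 GHz, 200% IPC:       {st_perf(2.00, 3.0):.2f}")
print(f"Golden Cove @ 5 GHz, 150% IPC: {st_perf(1.50, 5.0):.2f}")
print(f"A16 @ 3 GHz, projected IPC:    {st_perf(projected_ipc(3), 3.0):.2f}")
```

The IPC x clock model ignores memory-bound workloads and clock scaling limits, so it is only a first-order comparison.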
 

name99

Senior member
Sep 11, 2010
Indeed, I seem to remember hearing about an observation in microarchitecture design.

On average if you increase the number of transistors a designer has available, the IPC gain that can be obtained from using these transistors is normally equal to the square root of the transistor increase.

So for example:

2x transistors (1 node shrink) = ~41% IPC increase
4x transistors (2 node shrinks) = ~100% IPC increase (2x)
16x transistors (4 node shrinks) = ~300% IPC Increase (4x)

There are clearly some caveats to this:

- It is not a hard and fast rule. Some designers are more competent than others, and may be able to extract more or less IPC than their competitors.
- Heat is still an issue, particularly on smaller and smaller nodes. If the transistor density doubles but heat is only reduced by 30%, then utilising all of the extra transistors isn’t really viable (Unless you are prepared to reduce clock speeds)
- This observation may not hold in the future, the process of going wider and deeper in core design may hit diminishing returns and become more inefficient moving forward.

Not sure if this is something others have heard of (Or even if this square root observation is true) but it seems to be correct looking at the history of microarchitectures, and it would certainly explain why a 38% increase in transistors for Sunny Cove resulted in only an 18% IPC uplift.
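The rule of thumb quoted above (performance growing with the square root of the transistor budget is often called Pollack's Rule) is easy to check numerically. This is a sketch of the rule itself, not a model of any specific core; name99's reply below explains why the underlying behavior is more subtle:

```python
import math


def ipc_gain_from_transistors(transistor_ratio: float) -> float:
    """Rule of thumb: IPC scales with the square root of the transistor budget.

    Returns the fractional IPC increase, e.g. about 0.414 for a 2x budget.
    """
    return math.sqrt(transistor_ratio) - 1.0


for ratio in (2, 4, 16):
    print(f"{ratio:>2}x transistors -> ~{ipc_gain_from_transistors(ratio):.1%} IPC")

# Sunny Cove: +38% transistors, +18% IPC observed (per the quoted post).
print(f"Sunny Cove prediction: ~{ipc_gain_from_transistors(1.38):.1%} IPC")
```

The Sunny Cove prediction comes out at about 17.5%, close to the 18% uplift mentioned above.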

The square root observation is a rough claim about the behavior of caches.
It's rather more subtle than what you've said --- it's about miss rates, which don't *directly and linearly* translate into performance; and while that scaling seems valid in a rough sense, the constant in front of the scaling can be improved by throwing more transistors (of course!) at the problem to make your prefetch smarter and your cache placement/replacement algorithms smarter. (These carry on the tradition of how a 2-way associative cache will be better than a non-associative cache of the same size, and 4-way even better; or how pseudo-LRU will be better than random replacement. Both of these are, of course, 80s techniques and well-known, but there are much more modern, more performant, variants on these ideas.)

Here's one quick overview of the issue:
https://www.jilp.org/vol10/v10paper2.pdf
 

Richie Rich

Senior member
Jul 28, 2019
The Top 10 Performance Per Area thread was locked because of Graviton2 again (obviously this comes from EPYC owners). They think that hiding this information will stop evolution. Obviously, this isn't going to work.

Markfw claims that the estimated 350 mm2 Graviton2 is more expensive than EPYC Rome (Zen 2) due to the yield penalty for monolithic chips. Let's calculate the yields (Die yield calculator):
  • AMD Zen 2 CPU chiplet (8.6 by 8.6 mm = 74 mm2): yield 92.91%
  • AMD Rome IO die (20.4 by 20.4 mm = 416 mm2): yield 66.91%
  • Graviton2 monolith (18.8 by 18.8 mm = 353 mm2): yield 70.96%
Graviton2's yields are not bad at all. It's not going to be more expensive than Rome, especially when you include Rome's huge 416 mm2 IO die. Ampere Altra offers its 80-core die with cores disabled down to 32/48/64/72 cores to further increase yields. IMHO the manufacturing cost of Graviton2 and Altra is similar to that of a GPU, so pretty low. It's amazing that these 64-core ARM monsters could enter the desktop/workstation market for the price of a 16-core Ryzen 3950X. Great for customers: we can choose between a 16-core speed demon and a 64-core ARM monster. Last year there was an announcement of a 64-core HiSilicon Kunpeng 920 on an ATX board, but sadly it never showed up.
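The three yield figures above can be reproduced with Murphy's yield model at a defect density of 0.1 defects/cm2. The model and the density are assumptions on my part (the post does not state the calculator's settings), but they match all three quoted numbers. A sketch, with the simpler Poisson model alongside for comparison:

```python
import math


def murphy_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Murphy's yield model: Y = ((1 - exp(-A*D)) / (A*D))**2."""
    ad = (area_mm2 / 100.0) * defects_per_cm2  # convert mm2 -> cm2
    return ((1.0 - math.exp(-ad)) / ad) ** 2


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A*D), more pessimistic for large dies."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


D0 = 0.1  # assumed defect density in defects/cm2

dies = {
    "Zen 2 CPU chiplet": 8.6 * 8.6,
    "Rome IO die": 20.4 * 20.4,
    "Graviton2 monolith": 18.8 * 18.8,
}
for name, area in dies.items():
    print(f"{name:20s} {area:6.1f} mm2 -> Murphy {murphy_yield(area, D0):.2%}, "
          f"Poisson {poisson_yield(area, D0):.2%}")
```

The gap between the two models grows with die size, which is why the choice of model and defect density matters most for the large monolithic dies.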

And don't forget that the new ARM Cortex A78 core has +30% IPC with 5% less area compared to the A76/N1 used in G2 and Altra. ARM CPUs can fit almost 3x more cores than x86 Zen 2 at 7nm. So instead of 8-core x86 chiplets, it's possible to build 24-core ARM chiplets, which would deliver much higher performance. A78 also has slightly higher IPC/PPC than Zen 2 (8% in GB5, 15% in SPECint2006).

And if the x86 vendors continue wasting transistors with Zen 3 and Golden Cove, the gap will increase further. To challenge A78 in servers, AMD's Zen 3 needs to deliver more than 30% higher IPC while also being more than 5% smaller. Rumors point to only a 17% IPC increase and probably a massive transistor increase.


Table of Performance Per Area (iso-clock, for servers):
| Pos | Manufacturer | CPU | Core | Core Area (mm2) | Year | ISA | SPEC PPA/GHz | Relative |
|---|---|---|---|---|---|---|---|---|
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 9.41 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 8.36 | 88.8% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 7.82 | 83.1% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 7.24 | 76.9% |
| 5 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 4.44 | 47.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 4.40 | 46.7% |
| 7 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 3.02 | 32.1% |







You continue to call out the moderation on these forums.
Your time here is running towards its end.


esquared
Anandtech Forum Director
 
Last edited by a moderator:

Hitman928

Diamond Member
Apr 15, 2012
The Top 10 Performance Per Area thread was locked because of Graviton2 again (obviously this comes from EPYC owners). They think that hiding this information will stop evolution. Obviously, this isn't going to work.

No it was locked to consolidate multiple threads on the same topic.

Markfw claims that the estimated 350 mm2 Graviton2 is more expensive than EPYC Rome (Zen 2) due to the yield penalty for monolithic chips. Let's calculate the yields (Die yield calculator):
  • AMD Zen 2 CPU chiplet (8.6 by 8.6 mm = 74 mm2): yield 92.91%
  • AMD Rome IO die (20.4 by 20.4 mm = 416 mm2): yield 66.91%
  • Graviton2 monolith (18.8 by 18.8 mm = 353 mm2): yield 70.96%
Graviton2's yields are not bad at all. It's not going to be more expensive than Rome, especially when you include Rome's huge 416 mm2 IO die. Ampere Altra offers its 80-core die with cores disabled down to 32/48/64/72 cores to further increase yields. IMHO the manufacturing cost of Graviton2 and Altra is similar to that of a GPU, so pretty low. It's amazing that these 64-core ARM monsters could enter the desktop/workstation market for the price of a 16-core Ryzen 3950X. Great for customers: we can choose between a 16-core speed demon and a 64-core ARM monster. Last year there was an announcement of a 64-core HiSilicon Kunpeng 920 on an ATX board, but sadly it never showed up.

First, you're probably giving a pessimistic number for all three examples as I doubt the defect density is that bad for TSMC or GF at this point. Second, Mark never claimed Graviton2 was more expensive than Rome that I saw, he said that it was possible. Also, is Amazon offering lower than 64 core solutions for die salvaging? Not that I'm aware of. So you haven't considered that in comparing costs between Rome and Graviton2. You mention it for Ampere, but Ampere doesn't make Graviton2 so I don't see how that's relevant.

You also ignore that Rome's IO die is made on an older and much cheaper process from GF, as well as the fact that the IOD will have a different defect rate than the compute cores because it is made up mostly of completely different types of circuits. Producing a chip on 7nm is going to be at least 50% more expensive than on 12/14nm due to the more advanced node costs and the further maturity of the 12/14nm node. Additionally, the compute cores for Rome are used across server, desktop, and HEDT, so there is all kinds of die salvaging and product binning that can happen for maximum profit.

You also should be looking beyond yield due to defect density (your quoted number) and take into account yield due to partial dies. Basically, the smaller your chip, the better you can place them around the edge and you will get more usable dies as a percentage of the wafer. If you also consider this angle along with the ability to salvage dies on Rome across multiple SKUs and markets, it only looks worse for Graviton2. Yes it will look worse for the IO die as well, but again, you're taking that hit on a node that is much cheaper in the first place than the 50+% more expensive 7 nm so ultimately your costs will be much lower for the IO die than the Graviton2 die. Do I know that if you do the full calculation for Rome that it will be cheaper than Graviton2? No, I don't and I don't expect that when comparing a full Rome chip to Graviton2 that it will be cheaper 1:1. But bringing in the full scope of the situation with AMD's strategy of chiplets being common across the entire desktop/HEDT/server space and using a cheaper process for the large IOD puts it in the realm of possibility when you look at the total cost structure.

To be honest, this whole thing is so silly. Graviton2 and Rome have very different designs and purposes. How many sockets does Graviton2 support, how many PCIe lanes? Maximum memory capacity? All of these things add significant power and area. Yes, they obviously compete because Amazon uses them in its own cloud servers, but Amazon isn't letting anyone else use Graviton2, so their points of competition are very limited. The Ampere line is a much more direct competitor to Rome, but those seem to be largely MIA from what I can tell. Have they announced any design wins? As I've said before, I'm all for better CPUs; I don't care which ISA they use. But the reality is that right now x86 is dominant, and it will take more than a competitive product to displace it. Maybe the next round of ARM CPUs will dominate x86 in every way and take over the world. Great, I'm all for it. But let's let it happen and discuss when we have real independent review numbers to use, rather than endless speculation and fuzzy math to try and prove a point that everyone will ignore anyway until independently verified hard numbers are produced.
 

Thunder 57

Platinum Member
Aug 19, 2007
The Top 10 Performance Per Area thread was locked because of Graviton2 again (obviously this comes from EPYC owners). They think that hiding this information will stop evolution. Obviously, this isn't going to work.

Mark didn't even lock the thread. And EPYC owners want to suppress information? That is foolish to suggest. And you keep on coming up with fake news. If what you said were true, AnandTech would run their servers on an iPhone.
 

Richie Rich

Senior member
Jul 28, 2019
You also should be looking beyond yield due to defect density (your quoted number) and take into account yield due to partial dies.
Fair point. When I include partial dies, we get:
  • Rome 8-core chiplet: 672 good out of 788 total, total yield 85%
  • Graviton2: 103 good out of 156 total, total yield 66%, so good silicon costs 1.29x as much (29% more)

So the total cost:
  • Rome, 8 chiplets: 8 * 74 mm2 = 592 mm2
  • Graviton2: 1 * 350 mm2 * 1.29 yield penalty = 451 mm2 equivalent, which is 76% of the cost of Rome, so 24% cheaper (plus Rome has that huge IO die)
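The cost comparison above follows directly from the quoted good/total die counts: cost per good mm2 scales as 1/yield, so the yield penalty is the ratio of the two yields. A sketch of that arithmetic (die counts and areas are the figures from the post, which are themselves estimates):

```python
# Die counts per wafer and areas as quoted in the post.
rome_good, rome_total = 672, 788  # Zen 2 chiplets per wafer
g2_good, g2_total = 103, 156      # Graviton2 dies per wafer

rome_yield = rome_good / rome_total
g2_yield = g2_good / g2_total

# Cost per good mm2 scales as 1 / yield, so the relative penalty is:
penalty = rome_yield / g2_yield

rome_area = 8 * 74           # eight 74 mm2 chiplets (IO die excluded)
g2_equiv = 350 * penalty     # Graviton2 area, normalized to Rome's yield

print(f"yields: Rome {rome_yield:.0%}, Graviton2 {g2_yield:.0%}")
print(f"Graviton2 yield penalty: {penalty:.2f}x")
print(f"Graviton2 {g2_equiv:.0f} mm2-equivalent vs Rome {rome_area} mm2 "
      f"-> {g2_equiv / rome_area:.0%}")
```

Using the unrounded penalty gives about 452 mm2-equivalent rather than 451; the ratio to Rome stays at 76%. This arithmetic covers wafer area only and ignores the wafer-price difference between nodes and die salvaging.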

So, speaking about price: the 350 mm2 Graviton2 die costs about as much as a mainstream GPU. The silicon might cost 150 USD, plus license payments etc. That's a big difference compared to the AMD EPYC's official price of 7500 USD. No more x86 monopoly, no more people being forced to buy overpriced and technologically outdated x86 CPUs, no more being forced to choose between the dumb and the dumber vendor. That's ARM's economic victory.


Another ARM strength is performance per area. Graviton2's A76/N1 core is only 1.4 mm2, so they can fit almost triple the core count in the same silicon area. Deal with it baby :D

And A78, with +30% higher IPC and 5% smaller area, is the true x86 killer. Will Zen 3 deliver more than a 30% IPC increase over Zen 2 while having a smaller die? Without Keller, highly unlikely. Surprisingly, Intel is in a much better situation, because Intel has the Atom cores Tremont and the upcoming Gracemont, and it is working on the Snow Ridge server platform based on these small Atom cores. AMD has nothing except its chiplet architecture (which is just the old north-bridge setup that anybody can use).

To be honest, this whole thing is so silly. Graviton2 and Rome have very different designs and purposes. How many sockets does Graviton2 support, how many PCIe lanes? Maximum memory capacity? All of these things add significant power and area.
Graviton2 and Ampere Altra have:
  • 64 PCIe 4.0 lanes (Rome has 128)
  • 8 DDR4-3200 memory channels
But let's let it happen and discuss when we have real independent review numbers to use, rather than endless speculation and fuzzy math to try and prove a point that everyone will ignore anyway until independently verified hard numbers are produced.
It's not fuzzy math. The Phoronix tests of Graviton2 show that Zen 1 EPYC is destroyed by G2 in every way, and Zen 2 Rome is slower in performance per thread, and most probably also worse in power consumption and price. Based on the A78 IPC improvements, we can estimate that the next ARM server CPUs will start to dominate in every way.

And next year ARM's new Matterhorn core lineup is coming: ARMv9 with SVE2 (vectors up to 2048 bits) and 60-70% higher IPC/PPC than Zen 2. Japan's Fugaku supercomputer, the fastest supercomputer in the world, which outperforms supercomputers built on huge Nvidia GPUs, is based purely on ARM CPUs with SVE vectors. You have no clue what a storm is coming to servers (including Nuvia's server CPUs).
 

Richie Rich

Senior member
Jul 28, 2019
Rich, I think I missed it. Where are you getting this 350mm2 figure from?
  • 64-core A76/N1 including 1 MB L2: 64 * 1.4 mm2 = 90 mm2 (Zen 2 has 3.6 mm2)
  • 32 MB L3 cache: 35 mm2, so cores + L3 = 125 mm2
  • 8-channel memory + 128 PCIe lanes: Rome's IO die is 416 mm2 @ 14nm
  • 8-channel memory + 64 PCIe lanes: Graviton2 @ 7nm, estimated 225 mm2 (might be even smaller than that)
------------------------------------------------------------------------
A total of 350 mm2 seems reasonable.


It's just my rough estimate, but I won't be far from reality (plus or minus 15%).


You can also see that the 80-core Ampere Altra will be just 23 mm2 larger than G2 (about 373 mm2).
You can also see that the 128-core Ampere Altra Max will be 90 mm2 larger than G2 (about 440 mm2).
A 128-core Ampere Altra Max with A78 cores (+30% IPC, -5% area) would be 80 mm2 larger (about 430 mm2).

That's the power of ARM's high performance-per-area ratio. x86 is doomed.
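The die-size estimates above add up per-core area, a fixed 32 MB L3 and a fixed uncore. A sketch of that sum, where every figure is the poster's estimate rather than a measured value, and where keeping L3 and uncore constant across core counts is exactly the implicit assumption behind the 373/440/430 mm2 numbers (one reason the replies dispute them):

```python
# All areas are the poster's estimates, in mm2.
CORE_AREA = {
    "A76/N1": 1.4,        # per core, including 1 MB L2
    "A78": 1.4 * 0.95,    # assumed -5% area vs. A76/N1
}
L3_32MB = 35.0            # 32 MB L3, estimated
UNCORE_G2 = 225.0         # 8-channel memory + 64 PCIe lanes at 7nm, estimated


def die_area(cores: int, core: str = "A76/N1") -> float:
    """Cores + 32 MB L3 + uncore; only the core count and core type change."""
    return cores * CORE_AREA[core] + L3_32MB + UNCORE_G2


g2 = die_area(64)
for label, cores, core in [
    ("Graviton2", 64, "A76/N1"),
    ("Altra", 80, "A76/N1"),
    ("Altra Max", 128, "A76/N1"),
    ("Altra Max w/ A78", 128, "A78"),
]:
    area = die_area(cores, core)
    print(f"{label:17s} {area:6.1f} mm2 ({area - g2:+5.1f} vs Graviton2)")
```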
 

Hitman928

Diamond Member
Apr 15, 2012
Fair point. When I include partial dies, we get:
  • Rome 8-core chiplet: 672 good out of 788 total, total yield 85%
  • Graviton2: 103 good out of 156 total, total yield 66%, so good silicon costs 1.29x as much (29% more)

So the total cost:
  • Rome, 8 chiplets: 8 * 74 mm2 = 592 mm2
  • Graviton2: 1 * 350 mm2 * 1.29 yield penalty = 451 mm2 equivalent, which is 76% of the cost of Rome, so 24% cheaper (plus Rome has that huge IO die)
Another ARM strength is performance per area. Graviton2's A76/N1 core is only 1.4 mm2, so they can fit almost triple the core count in the same silicon area. Deal with it baby :D

Again, you're ignoring a very large aspect of AMD being able to die salvage defective dies where Amazon cannot. So all those defective dies for Graviton2 have to be thrown away whereas AMD gets to sell most of their defective dies for hundreds of dollars each. If you are looking at AMD's cost per 64 core Rome and including this factor, it's very realistic that it's cheaper than Graviton2 because they essentially get to subsidize wafer costs by selling defective dies for large amounts of money.


It's not fuzzy math. The Phoronix tests of Graviton2 show that Zen 1 EPYC is destroyed by G2 in every way, and Zen 2 Rome is slower in performance per thread.

It is fuzzy math because you make so many assumptions when doing the math that it's not worth the napkin you're writing it on. As for the comparisons, first, you're comparing a 3 year old 32 core CPU in Zen1 against a brand new 64 core CPU in Graviton2. Second, performance per thread is your own metric that you are using to try and make Graviton2 seem competitive against Rome by conflating per core and per thread performance. No one does this. You compare core to core and SMT counts as bonus throughput when applicable. If SMT isn't beneficial, it gets turned off and Rome still outperforms Graviton2 significantly across a wide range of benchmarks.


  • 64-core A76/N1 including 1 MB L2: 64 * 1.4 mm2 = 90 mm2 (Zen 2 has 3.6 mm2)
  • 32 MB L3 cache: 35 mm2, so cores + L3 = 125 mm2
  • 8-channel memory + 128 PCIe lanes: Rome's IO die is 416 mm2 @ 14nm
  • 8-channel memory + 64 PCIe lanes: Graviton2 @ 7nm, estimated 225 mm2 (might be even smaller than that)
------------------------------------------------------------------------
A total of 350 mm2 seems reasonable.


It's just my rough estimate, but I won't be far from reality (plus or minus 15%).


You can also see that 80-core Ampere Altra will be just 23 mm2 larger than G2 (about 373 mm2).
You can also see that 128-core Ampere Altra Max will be 90 mm2 larger than G2 (about 440 mm2).
128-core Ampere Altra Max with A78 cores (+30% IPC, -5% area) .. 80mm2 larger (about 430 mm2).

That's the power of ARM's high performance-per-area ratio. x86 is doomed.

Now, I'll admit that I was taking your die sizes on good faith because the Rome numbers were correct, but obviously that was a mistake. Since we don't know the actual size of Graviton2 this whole discussion is pointless and you are grossly underestimating the size of the ARM CPUs (another example of your fuzzy math). Not sure by how much, but try and convince people that this chip is only 373 mm2, good luck.

[Images: 80-core Ampere Altra package]


As a comparison, here is Rome

[Image: AMD EPYC Rome package]

However, the Ampere chip is monolithic versus Rome being chiplet based and having tons of empty package space between dies.

[Image]

ServeTheHome even commented on how big the Ampere chip is:

One thing is for certain, this is a much larger package than 2nd Gen Intel Xeon Scalable Refresh chips and larger than the package (minus the carrier) of the AMD EPYC 7002 Series “Rome” CPUs. Our test lab is about 8 minutes away from where we have both x86 chips so I drove there just after my visit to validate that perception. I have handled a few hundred Xeon Scalable and EPYC parts since 2017 so it seemed bigger when I was at Ampere’s office and that was confirmed by checking the Xeon and EPYC parts just after.
One will also notice that this is a monolithic 7nm die.

In conclusion, there's a high probability that Rome is cheaper than all of its Arm competitors, even without counting things like die salvaging and common chiplet business model. It definitely appears to be cheaper than the 80 core Ampere chip and by a significant margin.
 

Gideon

Golden Member
Nov 27, 2007
You can also see that the 80-core Ampere Altra will be just 23 mm2 larger than G2 (about 373 mm2).
You can also see that the 128-core Ampere Altra Max will be 90 mm2 larger than G2 (about 440 mm2).
A 128-core Ampere Altra Max with A78 cores (+30% IPC, -5% area) would be 80 mm2 larger (about 430 mm2).

LOL, here's a prime example why no-one should take your 'estimates' seriously:

1. Ian Cutress said in a tweet that the 128-core Ampere is a single die but 'approaching reticle limits', info coming straight from Ampere.
2. According to WikiChip, TSMC 7nm uses 193i lithography steppers, which have a reticle limit of 858 mm².

So your estimate is almost 2x off.

While the 80-core Altra will obviously be smaller, and the 64-core Graviton even more so (particularly as it has half the L3), it will still be larger than half of the 128-core.
Why? Because the I/O and memory controllers will be the same size for all of them, and IO doesn't scale well. Just look at the annotated Zen1 die shot and how much of it is I/O.

Now let's extrapolate from your own guesswork:
The ratio between the 64-core Graviton and the 128-core Ampere by your calculation is (350 / 430) ≈ 0.814.
Let's be generous with the Ampere die size and set it at 800 mm². That makes the Graviton die size: 800 * (350 / 430) = 651 mm²
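Gideon's two steps can be sketched as follows. The 26 mm x 33 mm field is the standard maximum exposure area for 193 nm immersion scanners; the 800 mm² anchor is his assumed size, not a published figure. Proportional scaling keeps the original estimate's 350:430 ratio, so it still inherits whatever that estimate got wrong about the fixed IO area. `scale_by_ratio` is a hypothetical helper.

```python
# Maximum exposure field of a 193i scanner: 26 mm x 33 mm.
RETICLE_MM2 = 26.0 * 33.0   # 858 mm^2, the figure cited from WikiChip

estimate_128_core = 440.0   # Richie Rich's 128-core Altra Max estimate
print(f"reticle limit / estimate = {RETICLE_MM2 / estimate_128_core:.2f}x")


def scale_by_ratio(anchor_large_mm2: float, small_est: float, large_est: float) -> float:
    """Hypothetical helper: keep the estimate's small:large ratio, pin the large die to an assumed size."""
    return anchor_large_mm2 * small_est / large_est


# 800 mm^2 is an assumed "approaching the reticle limit" size, not a published figure.
graviton2_mm2 = scale_by_ratio(800.0, 350.0, 430.0)
print(f"implied Graviton2 die: {graviton2_mm2:.0f} mm2")
```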

The reality will probably be in between, but way closer to my number than yours.
 

Richie Rich

Senior member
Jul 28, 2019
As for the comparisons, first, you're comparing a 3 year old 32 core CPU in Zen1 against a brand new 64 core CPU in Graviton2.
  • Zen1: released 2017.
  • ARM Cortex A76: released 2018 (used in Graviton2 and Altra).

The A76 is an old, tiny, slow smartphone core! And yet it's destroying the beefy Zen1 core designed for servers, and it poses a lot of problems for 2019's Zen2 Rome. No wonder you and other x86 fans are pretty desperate.


Second, performance per thread is your own metric that you are using to try and make Graviton2 seem competitive against Rome by conflating per core and per thread performance. No one does this. You compare core to core and SMT counts as bonus throughput when applicable. If SMT isn't beneficial, it gets turned off and Rome still outperforms Graviton2 significantly across a wide range of benchmarks.
With SMT off, a Zen2 core loses 25% of its performance (sometimes 40%, depending on the code). This would significantly lower Rome's performance. Go ahead and use it this way. Even better for ARM :D

Regarding performance per thread with SMT off: you are comparing Zen2 at a 3.2 GHz boost clock vs. the small A76 at a fixed 2.5 GHz, with IPC/PPC about 85% of Zen2's. No wonder the A76 is slower. And yet, with SMT on, it beats Zen2 in performance per thread.
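The per-thread argument is just IPC times clock, with SMT throughput split across two threads. A minimal sketch using the poster's figures as assumptions (85% IPC, 3.2 GHz vs. 2.5 GHz). The "SMT off loses 25%" figure can be read two ways, so both are shown; the helper names are hypothetical.

```python
def per_core(ipc_rel: float, ghz: float) -> float:
    """Relative single-thread throughput of one core: IPC (Zen2 = 1.0) times clock in GHz."""
    return ipc_rel * ghz


def per_thread_smt(ipc_rel: float, ghz: float, smt_uplift: float, threads: int = 2) -> float:
    """Core throughput with SMT on (single-thread throughput x uplift), split evenly across threads."""
    return per_core(ipc_rel, ghz) * smt_uplift / threads


# Poster's figures (assumptions): Zen2 at 3.2 GHz, A76/N1 at 2.5 GHz with 85% of Zen2's IPC.
a76 = per_core(0.85, 2.5)  # one thread per core, no SMT
zen2_smt_off = per_core(1.0, 3.2)

# Reading 1: "SMT off loses 25%" -> SMT-on core = SMT-off core / 0.75
zen2_thread_r1 = per_thread_smt(1.0, 3.2, 1 / 0.75)
# Reading 2: SMT adds 25% on top of a single thread
zen2_thread_r2 = per_thread_smt(1.0, 3.2, 1.25)

print(f"A76 {a76:.3f} | Zen2 SMT off {zen2_smt_off:.3f} | "
      f"Zen2 per thread {zen2_thread_r1:.3f} (reading 1), {zen2_thread_r2:.3f} (reading 2)")
```

Under the first reading the A76 thread lands just below a Zen2 thread; under the second it lands just above, so the per-thread comparison hinges on how the SMT figure is interpreted.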

And what about the new Cortex A78? +30% IPC over A76 (+15% IPC over Zen2 in SPECint2006) while using 5% less area (1.3 mm2). This will destroy Zen2 Rome and match Zen3, while allowing 4x more cores than Zen3. Eat this!

Or would you like Cortex X1 cores? +60% IPC over A76 (+40% IPC over Zen2) with an area of about 2.1 mm2 (Zen2 is 3.6 mm2). Do Rome or Milan still look so great? ARM cores have 3x higher performance from the same area. Just wait until those chips enter servers. It's going to be fun (for me). People in denial will have a hard time.
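The "3x from the same area" claim is an iso-clock performance-per-area ratio. A minimal sketch, assuming the poster's IPC and area figures, equal clocks, and core-only area (no L3 or IO):

```python
# Poster's figures (assumptions): IPC relative to Zen2 and core area in mm^2.
A76_IPC = 0.88  # the thread quotes both 0.85 and 0.88 for A76 vs. Zen2
cores = {
    "Zen2": (1.00, 3.6),
    "A78": (A76_IPC * 1.30, 1.3),  # +30% IPC over A76
    "X1": (A76_IPC * 1.60, 2.1),   # +60% IPC over A76
}


def ipc_per_mm2(ipc: float, area_mm2: float) -> float:
    """Iso-clock performance density: IPC divided by core area."""
    return ipc / area_mm2


zen2_density = ipc_per_mm2(*cores["Zen2"])
relative = {name: ipc_per_mm2(ipc, area) / zen2_density for name, (ipc, area) in cores.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x Zen2 IPC per mm2")
```

With these inputs the A78 comes out near 3.2x and the X1 near 2.4x; clock differences and the cache each core needs would shift both numbers.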


Not sure by how much, but try and convince people that this chip is only 373 mm2, good luck.
Package size is not the same as die size, especially for a server part, where you need a large package to fit all the pins for 64 PCIe lanes and 8-channel memory.


In conclusion, there's a high probability that Rome is cheaper than all of its Arm competitors, even without counting things like die salvaging and common chiplet business model. It definitely appears to be cheaper than the 80 core Ampere chip and by a significant margin.
You are lying to yourself if you think the 1005 mm2 Rome is cheaper than the 350 mm2 Graviton2. Why would Amazon price it 40% lower than x86 Xeon and Zen systems? The obvious answer is: BECAUSE IT'S WAY CHEAPER THAN X86.


@Gideon Nice estimate, but it's a blind shot. Nobody knows how close the 128-core Altra is to the limit. Nobody knows the density they chose, either (Altra should boost up to 3.3 GHz, which suggests lower density than the 2.5 GHz G2). 650 mm2 for G2 is too big. Rome's IO die is 416 mm2 on GF's 14 nm process. Even with some redundant cores, let's say 72 cores, that means 72 * 1.4 mm2 = 101 mm2 for cores. Add 32 MB of L3 cache at about 35 mm2, and that's about 136 mm2 for the core side, even with redundant cores.



136 mm2 + 416 mm2 for Rome's 14nm IO die with 128 PCIe lanes = 552 mm2. That's way lower than your 650 mm2 suggestion. And using Rome's IO die here is absurd, as Graviton2 has only 64 PCIe 4.0 lanes and is on 7nm. OK, IO doesn't scale linearly, but G2 has fully encrypted memory channels, so there's a bunch of logic that will scale linearly.

Altra does use partly defective dies, as they offer 32-, 48-, 64-, 72-, and 80-core variants.

I'm not underestimating size of ARM cores ;-)
Take a look here:
[Image: Arm Infra Tech Day 2019 slide, Neoverse N1]


 

Hitman928

Diamond Member
Apr 15, 2012
  • Zen1: released 2017.
  • ARM Cortex A76: released 2018 (used in Graviton2 and Altra).

The A76 is an old, tiny, slow smartphone core! And yet it's destroying the beefy Zen1 core designed for servers, and it poses a lot of problems for 2019's Zen2 Rome. No wonder you and other x86 fans are pretty desperate.

I am not an x86 fan; stop with the pathetic ad hominem attempts to discredit other posters. Graviton2 is based on the Neoverse N1 architecture, whose core is a customized A76. The Neoverse platform was announced a while ago, but the first silicon product available was Graviton2, a 7 nm CPU, which was made available in May 2020, very nearly 3 years after Zen1, which was built on a 14 nm node. You are comparing a mid-2020 product with a mid-2017 product on a (now) outdated process and saying the 2020 product wins. Yet you avoid a direct comparison to the AMD 2019 product; I wonder why. Edit: To keep things a little fairer, you could say Graviton2 was official in Dec 2019, which is closer to the release dates I gave for the AMD products, but the points still remain.

Outside of AWS using it in their own cloud instances with unknown usage rates, no Neoverse CPUs have had any major design wins or contracts that I am aware of, so how exactly is it a problem for Rome? Cloudflare did announce they would be switching from Intel-based servers to ARM-based ones, but then they were sampled some Rome systems to test, subsequently changed their mind, and said they would be going with Rome. Maybe you should contact them and tell them they're doing it wrong.

We looked very seriously at ARM-based CPUs and continue to keep our software up to date for the ARM architecture so that we can use ARM-based CPUs when the requests per watt is interesting to us. In the meantime, we've deployed AMD's EPYC processors.


With SMT off, a Zen2 core loses 25% of its performance (sometimes 40%, depending on the code). This would significantly lower Rome's performance. Go ahead and use it this way. Even better for ARM :D

Regarding performance per thread with SMT off: you are comparing Zen2 at a 3.2 GHz boost clock vs. the small A76 at a fixed 2.5 GHz, with IPC/PPC about 85% of Zen2's. No wonder the A76 is slower. And yet, with SMT on, it beats Zen2 in performance per thread.

No. Again, you're just making up numbers you think make your argument sound good.


[Images: Phoronix benchmark charts, geometric mean and per-test "halo" graph]

If a server use case can benefit from SMT, you leave it on; if not, you turn it off. Either way, you get better performance than Graviton2, except in a small percentage of cases where Graviton2 wins.


And what about the new Cortex A78? +30% IPC over A76 (+15% IPC over Zen2 in SPECint2006) while using 5% less area (1.3 mm2). This will destroy Zen2 Rome and match Zen3, while allowing 4x more cores than Zen3. Eat this!

Or would you like Cortex X1 cores? +60% IPC over A76 (+40% IPC over Zen2) with an area of about 2.1 mm2 (Zen2 is 3.6 mm2). Do Rome or Milan still look so great? ARM cores have 3x higher performance from the same area. Just wait until those chips enter servers. It's going to be fun (for me). People in denial will have a hard time.

Let's see where we are when these actually come out and not put the cart before the horse, shall we? As I've said before, if ARM takes over the compute world, I'm fine with that, but I'm also not gonna sit here, ignore reality, and pretend x86 is already defeated and should just pack up shop. I've said before that ARM is making impressive gains in competitiveness in the server world, but this gen isn't it yet. Next gen looks much stronger to me, but we'll have to wait and see what the competitive landscape looks like at that time.

Package size is not the same as die size, especially for a server part, where you need a large package to fit all the pins for 64 PCIe lanes and 8-channel memory. You are lying to yourself if you think the 1005 mm2 Rome is cheaper than the 350 mm2 Graviton2. Why would Amazon price it 40% lower than x86 Xeon and Zen systems? The obvious answer is: BECAUSE IT'S WAY CHEAPER THAN X86.

You continue to make up whatever numbers you want and ignore all evidence that doesn't fit your preconceived narrative. You've done this over and over and over. Even when it's pointed out to you, you just ignore those posts and keep posting the same wrong information. I largely ignore everything you post for this reason, but putting words in other people's mouths and using your made-up numbers to try to prove other people wrong needs to stop.

BTW, Rome compute instances are priced only 13% higher than Graviton2 ones, most likely due to Graviton2's lower power consumption and not at all because it is cheaper to make. The price of the CPU is a tiny part of the TCO of running a compute server. But remember, Amazon is running 48-core EPYC CPUs, so they can offer a 16x and an 8x instance on the same CPU, compared to a single 16x instance on Graviton2. If AWS is using dual-socket Rome systems, then they can offer 2 of each, or 3 of the 16x Rome instances per node, compared to Graviton2's single 16x instance. Because of this, Amazon is likely subsidizing their Graviton2 instances to try to get people to switch over to ARM; it's something most large businesses do when trying to gain market presence. I doubt we'll ever find out for sure, because Amazon has been very tight-lipped about Graviton2 specifics like power draw and platform costs.


Instance | vCPU | ECU | Memory | Storage | Price
c6g.16xlarge | 64 | N/A | 128 GiB | EBS only | $2.176 per hour
c5a.16xlarge | 64 | N/A | 128 GiB | EBS only | $2.464 per hour
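The price gap and the instance packing in the post can be checked directly. A minimal sketch, assuming the two on-demand prices listed and a 48-core, SMT-enabled Rome part per socket (one vCPU per hardware thread); `pack_instances` is a hypothetical greedy helper, not an AWS API.

```python
# On-demand prices from the listing (USD per hour, 64 vCPUs each).
C6G_16XL = 2.176  # Graviton2
C5A_16XL = 2.464  # Rome

rome_premium = C5A_16XL / C6G_16XL - 1       # how much more the Rome instance costs
graviton_discount = 1 - C6G_16XL / C5A_16XL  # how much less the Graviton2 instance costs


def pack_instances(vcpus: int, sizes: list[int]) -> list[int]:
    """Hypothetical helper: greedily fill a host's vCPUs with the largest instance sizes first."""
    packed = []
    for size in sorted(sizes, reverse=True):
        while vcpus >= size:
            packed.append(size)
            vcpus -= size
    return packed


# 16xlarge = 64 vCPUs, 8xlarge = 32 vCPUs; one vCPU = one hardware thread.
print(f"Rome premium {rome_premium:.1%}, Graviton2 discount {graviton_discount:.1%}")
print("48-core Rome, 1 socket:", pack_instances(48 * 2, [64, 32]))
print("48-core Rome, 2 sockets:", pack_instances(2 * 48 * 2, [64, 32]))
print("64-core Graviton2:", pack_instances(64, [64, 32]))
```

The listed prices put the Graviton2 instance roughly 12% below the Rome one, and the packing reproduces the 16x + 8x per socket and three 16x per dual-socket node described in the post.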
 

lobz

Platinum Member
Feb 10, 2017
You are lying to yourself if you think the 1005 mm2 Rome is cheaper than the 350 mm2 Graviton2. Why would Amazon price it 40% lower than x86 Xeon and Zen systems? The obvious answer is: BECAUSE IT'S WAY CHEAPER THAN X86.
Did you know that TSMC sells wafers priced according to the process used, and does not take into consideration whether the transistors operate using the ARM or x86 ISA?
Your answer is not the obvious one, but rather the oblivious one.
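Since wafers are priced by process, die cost comes down to how many good dies each wafer yields. A rough sketch using the common dies-per-wafer approximation and a Poisson yield model; the wafer price and defect density are hypothetical placeholders, and 74 mm² is roughly the size of a Zen2 compute chiplet. Salvaging partially defective dies (which raises effective yield for both designs) is ignored.

```python
import math

WAFER_DIAMETER_MM = 300.0


def dies_per_wafer(die_mm2: float, diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Common approximation: gross wafer area / die area minus an edge-loss term."""
    radius = diameter_mm / 2
    gross = math.pi * radius**2 / die_mm2
    edge_loss = math.pi * diameter_mm / math.sqrt(2 * die_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects, Poisson model (die area converted to cm^2)."""
    return math.exp(-(die_mm2 / 100.0) * d0_per_cm2)


def cost_per_good_die(die_mm2: float, wafer_cost: float, d0_per_cm2: float) -> float:
    good = dies_per_wafer(die_mm2) * poisson_yield(die_mm2, d0_per_cm2)
    return wafer_cost / good


# Hypothetical inputs: the same 7nm wafer price for either ISA, D0 = 0.1 defects/cm^2.
WAFER_COST = 10_000.0
D0 = 0.1
for label, area in [("74 mm2 chiplet", 74.0), ("350 mm2 monolithic", 350.0)]:
    cost = cost_per_good_die(area, WAFER_COST, D0)
    print(f"{label}: {dies_per_wafer(area)} dies, yield {poisson_yield(area, D0):.1%}, "
          f"${cost:.2f} per good die, ${cost / area:.3f} per good mm2")
```

A full Rome comparison would also need eight chiplets plus the 14nm IO die and packaging, which this sketch leaves out.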
 

Vixis Rei

Junior Member
Jul 4, 2020
Hitman928
It's interesting. I never bothered to look at the actual die shots of Graviton2 or Ampere myself. I didn't expect Ampere to be that huge either; I just took his word for it. Ampere is around the size of, or slightly bigger than, that Rome CPU. It's totally clear that at this point chiplets are the future going forward.


  • You are lying to yourself if you think the 1005 mm2 Rome is cheaper than the 350 mm2 Graviton2. Why would Amazon price it 40% lower than x86 Xeon and Zen systems? The obvious answer is: BECAUSE IT'S WAY CHEAPER THAN X86.

The reason they would sell Graviton2 cheaper is that Amazon is trying to bulldoze its way into the market. They're willing to take a loss on these CPUs, or even give them away, just to get into the market. It's how Amazon works. But given how skittish and slow server buyers are to adopt new things, Amazon would have to sell these at a loss for years, while keeping a pretty good track record on roadmaps etc., for them to be adopted en masse, and still manage to keep up with AMD as they move on to 3D stacking and fun CPU x GPU stuff. And that's not something I expect to see from ARM for a few more years, at least not before Intel/AMD.

Also, there are 17 Zen 2-based desktop models and 25 Zen 2-based server models. All 42 of these models are highly binned so that AMD can maximize the profit from a single wafer. AMD loses very little money here. This is why chiplets are so beneficial. This is not an x86 vs. ARM thing; it's a technology thing, and currently AMD is the one with such an advantage over everyone else, including Intel.


Edit: Corrections made as pointed out by hitman928.
 

name99

Senior member
Sep 11, 2010
Again, you're ignoring a very large aspect of AMD being able to die salvage defective dies where Amazon cannot. So all those defective dies for Graviton2 have to be thrown away whereas AMD gets to sell most of their defective dies for hundreds of dollars each.

Your claim makes no sense to me.
If AMD is able to provide enough redundancy and work-around logic to salvage value from dies on which some cores are faulty, why can't Amazon do the same?
They sell smaller virtual instances that could still fit on such dies, even apart from using them for Amazon internal services.
 

Doug S

Platinum Member
Feb 8, 2020
Your claim makes no sense to me.
If AMD is able to provide enough redundancy and work-around logic to salvage value from dies on which some cores are faulty, why can't Amazon do the same?
They sell smaller virtual instances that could still fit on such dies, even apart from using them for Amazon internal services.


I assumed he was talking about selling 32 core dies as 28 cores or something like that as a slightly lower end product since AMD & Intel both sell a lot of SKUs with varying performance levels, numbers of cores and other product segmentation strategies.

There's no reason Ampere can't do the same and offer a slightly cheaper version with fewer cores, or simply put 132 cores on the "128 core" product so that if a few are bad it can still be sold as 128 cores.
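The spare-core idea can be quantified with a binomial model. A minimal sketch, assuming defects land independently per core at a hypothetical defect density; it covers only the cores, since a defect in shared logic (IO, memory controllers, interconnect) would still scrap the die.

```python
import math


def core_defect_prob(core_mm2: float, d0_per_cm2: float) -> float:
    """Poisson probability that one core contains at least one killer defect."""
    return 1.0 - math.exp(-(core_mm2 / 100.0) * d0_per_cm2)


def prob_enough_good(n_physical: int, n_required: int, p_bad: float) -> float:
    """Binomial P(at least n_required of n_physical cores are defect-free), cores independent."""
    max_bad = n_physical - n_required
    return sum(
        math.comb(n_physical, k) * p_bad**k * (1 - p_bad) ** (n_physical - k)
        for k in range(max_bad + 1)
    )


# Hypothetical inputs: 1.4 mm^2 per core (the thread's A76/N1 figure), D0 = 0.1 defects/cm^2.
p = core_defect_prob(1.4, 0.1)
no_spares = prob_enough_good(128, 128, p)    # all 128 cores must work
with_spares = prob_enough_good(132, 128, p)  # up to 4 of 132 cores may fail
print(f"per-core defect prob {p:.5f}: no spares {no_spares:.3f}, 4 spares {with_spares:.5f}")
```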
 

Hitman928

Diamond Member
Apr 15, 2012
I assumed he was talking about selling 32 core dies as 28 cores or something like that as a slightly lower end product since AMD & Intel both sell a lot of SKUs with varying performance levels, numbers of cores and other product segmentation strategies.

There's no reason Ampere can't do the same and offer a slightly cheaper version with fewer cores, or simply put 132 cores on the "128 core" product so that if a few are bad it can still be sold as 128 cores.

Yes, because of AMD's chiplet model they can use chiplets in various ways and in various markets with reduced core counts all the way down to 4 cores per module if needed. Ampere can and already said they are doing the same but will have less flexibility due to the monolithic nature of the CPU. For Graviton2, I was just pointing out that Amazon hasn't given any indication that they are doing this and most likely they are not. I find it highly suspect that they would waste the rack space by putting reduced core count CPUs in servers to fulfill reduced core count instances, this would cause your server footprint to increase and efficiency to decrease. It's really a terribly inefficient way of building out your cloud infrastructure and they don't do this with, for instance, their Rome offerings as far as I'm aware, they have a single, high core count SKU that they use across all of their Rome instances.
 

DrMrLordX

Lifer
Apr 27, 2000
@Hitman928

I just want to point out (as I have in other threads) that you - and others - should be careful about citing the geometric mean from Phoronix's test suite. In the case of server workloads, their selection of benchmarks is a little closer to the "norm" you would find on other sites, but it still bears repeating that one should carefully consider the implication of each score from the test suite.

Graviton2 was found to be horrendous in their kernel compile benchmark, but superior in other areas (GROMACS, MariaDB, maybe a few others). It's really lopsided. And anyone who wants to run PostgreSQL on Graviton2 must be completely out of their mind.
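This caution is easy to see with a toy example: a single lopsided result can drag the geometric mean below 1.0 even when one chip wins most tests. A minimal sketch with made-up ratios (not Phoronix results):

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-test ratios (Graviton2 performance / Rome performance, >1 means Graviton2 wins).
ratios = {
    "test A": 1.3,
    "test B": 1.2,
    "test C": 1.1,
    "test D": 0.9,
    "test E": 0.3,  # one lopsided result, e.g. a very weak compile benchmark
}

wins = sum(r > 1 for r in ratios.values())
overall = geomean(list(ratios.values()))
without_e = geomean([r for name, r in ratios.items() if name != "test E"])
print(f"wins {wins}/{len(ratios)}, geomean {overall:.3f}, geomean without test E {without_e:.3f}")
```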
 

Hitman928

Diamond Member
Apr 15, 2012
Hitman928
It's interesting. I never bothered to look at the actual die shot of Graviton2 myself. I didn't expect it to be that huge. I just took his word for it, but it's certainly around the size of, or slightly bigger than, that Rome CPU. It's totally clear that at this point chiplets are the future going forward.




If you look at the pictures Hitman928 posted, look at the CPU relative to the RAM sticks. While both appear to be the same in terms of vertical length, the Ampere chip is a tiny bit longer horizontally. Then look at the Rome CPU and realize it's made of chiplets, with a ton of empty space between each one. In fact, you have made me curious as to why there is so much empty space.

There is no way that Graviton2 chip is anywhere close to 350 mm2. If anything, it's slightly larger than EPYC's total package. The reason they would sell Graviton2 cheaper is that Amazon is trying to bulldoze its way into the market. They're willing to take a loss on these CPUs just to get into the market. It's how Amazon works. But given how skittish and slow server buyers are to adopt new things, Amazon would have to sell these at a loss for years while keeping a pretty good track record for them to be adopted en masse, and still manage to keep up with AMD as they move on to 3D stacking and fun CPU x GPU stuff. And that's not something I expect to see from ARM for a few more years, at least not before Intel/AMD.

Also, there are 17 Zen 2-based desktop models and 25 Zen 2-based server models. All 42 of these models are highly binned so that AMD can maximize the profit from a single wafer. AMD loses very little money here. This is why chiplets are so beneficial. This is not an x86 vs. ARM thing; it's a technology thing, and currently AMD is the one with such an advantage over everyone else, including Intel.



I would certainly trust Ian Cutress's and WikiChip's numbers and estimates over these numbers. There is a possibility that they have seen it, know someone who has, or have gotten decent information, and I'd rather take my chances on their numbers first.

Just to point out the pictures I provided are for the 80 core Ampere chip, not Graviton2. Graviton2 should be a decent bit smaller than that but still a good bit bigger than 350 mm2.
 

Hitman928

Diamond Member
Apr 15, 2012
@Hitman928

I just want to point out (as I have in other threads) that you - and others - should be careful about citing the geometric mean from Phoronix's test suite. In the case of server workloads, their selection of benchmarks is a little closer to the "norm" you would find on other sites, but it still bears repeating that one should carefully consider the implication of each score from the test suite.

Graviton2 was found to be horrendous in their kernel compile benchmark, but superior in other areas (GROMACS, MariaDB, maybe a few others). It's really lopsided. And anyone who wants to run PostgreSQL on Graviton2 must be completely out of their mind.

Yes, that's why I put the halo (or donut, or whatever you want to call it) graph as well so you can see the relative performance in each individual test.
 

lobz

Platinum Member
Feb 10, 2017
With SMT off, a Zen2 core loses 25% of its performance (sometimes 40%, depending on the code). This would significantly lower Rome's performance. Go ahead and use it this way. Even better for ARM :D
That's either a big fat lie, or just a childish brag based on an uneducated wild guess.
 

Richie Rich

Senior member
Jul 28, 2019
Nice graph. But you have to take into account this:

  • TDP: Rome 225 W | Graviton2 95-105 W (estimated) | less than half
  • Clocks: Rome 3.2 GHz boost | Graviton2 2.5 GHz fixed | G2 has 78% of the clock
  • IPC/PPC: Rome 286 pts/GHz in GB5 | Graviton2 253 pts/GHz | G2 has about 88% of Zen2's IPC
  • Area: Rome 1005 mm2 | Graviton2 est. 350 mm2 | G2 is 1/3 the size
  • Price: Rome 7500 USD | Graviton2 est. 500 USD | way cheaper
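The clock and IPC ratios in this list, and how much the per-core comparison depends on which Rome clock is assumed, can be sketched as follows. The PPC and clock values are the poster's figures; 2.25 GHz is the under-load figure quoted later in the thread. This compares single cores only, ignoring SMT, core counts, and power.

```python
# Poster's figures (assumptions): Geekbench 5 points per GHz and clocks.
ROME_PPC = 286.0
G2_PPC = 253.0
G2_GHZ = 2.5  # fixed clock


def g2_vs_rome_per_core(rome_ghz: float) -> float:
    """Graviton2 single-core score as a fraction of Rome's: PPC x clock for each."""
    return (G2_PPC * G2_GHZ) / (ROME_PPC * rome_ghz)


clock_ratio = G2_GHZ / 3.2     # the "78% clock" line
ipc_ratio = G2_PPC / ROME_PPC  # the "about 88% IPC" line

# 3.2 GHz is the boost clock in the list; 2.25 GHz is a sustained all-core figure.
for rome_ghz in (3.2, 2.25):
    print(f"Rome at {rome_ghz} GHz: Graviton2 per core = {g2_vs_rome_per_core(rome_ghz):.0%} of Rome")
```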


The 128-core Altra at 3.0 GHz will beat Rome easily, and most probably Zen3 Milan too.
A 256-core A78 at 3.0 GHz on TSMC's 5nm process... that's gonna be fun :D


 

DrMrLordX

Lifer
Apr 27, 2000
Yes, that's why I put the halo (or donut, or whatever you want to call it) graph as well so you can see the relative performance in each individual test.

You'll notice that one of the usual suspects has no interest in that particular graph.

@Richie Rich

You have no idea what Graviton2's actual power draw is. Some estimates put it as low as 80W, while others put it at 110W or more. Amazon isn't telling anyone, either.
 
  • Like
Reactions: Tlh97

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Nice graph. But you have to take into account this:

  • TDP: Rome 225 W | Graviton2 95-105 W (estimated) | less than half
  • Clocks: Rome 3.2 GHz boost | Graviton2 2.5 GHz fixed | G2 has 78% of the clock
  • IPC/PPC: Rome 286 pts/GHz in GB5 | Graviton2 253 pts/GHz | G2 has about 88% of Zen2's IPC
  • Area: Rome 1005 mm2 | Graviton2 est. 350 mm2 | G2 is 1/3 the size
  • Price: Rome 7500 USD | Graviton2 est. 500 USD | way cheaper


The 128-core Altra at 3.0 GHz will beat Rome easily, and most probably Zen3 Milan too.
A 256-core A78 at 3.0 GHz on TSMC's 5nm process... that's gonna be fun :D


You have no real idea of the TDP of Graviton2.
Under load, the 7742 is stuck at 2.25 GHz, lower than Graviton2.
You have already been proven wrong on the area/size.
You have already been proven wrong on price.

And while you admit the IPC is lower, I even doubt that metric.
As for the 128- and 256-core chips, do you have any benchmarks at all to support this, or even proof that those chips exist?

Why do you keep posting this BS? If you don't know what you are talking about, you should stop posting.
 