TOP 20 of the World's Most Powerful CPU Cores - IPC/PPC comparison


Richie Rich

Senior member
Jul 28, 2019
Added cores:
  • A53 - little core used in some low-end smartphones in an 8-core config (Snapdragon 450)
  • A55 - used as the little core in every modern Android SoC
  • A72 - "high-end" Cortex core of its era, used in the Snapdragon 650 and the Raspberry Pi 4
  • A73 - "high-end" Cortex core
  • A75 - "high-end" Cortex core
  • Bulldozer - infamous AMD core
Geekbench 5.1 PPC chart 6/23/2020:

| Pos | Man | CPU | Core | Year | ISA | GB5 Score | GHz | PPC (score/GHz) | Relative to 9900K | Relative to Zen3 |
|-----|-----|-----|------|------|-----|-----------|-----|-----------------|-------------------|------------------|
| 1 | Nuvia | (Est.) | Phoenix (Est.) | 2021 | ARMv9.0 | 2001 | 3.00 | 667.00 | 241.0% | 194.1% |
| 2 | Apple | A15 (est.) | (Est.) | 2021 | ARMv9.0 | 1925 | 3.00 | 641.70 | 231.8% | 186.8% |
| 3 | Apple | A14 (est.) | Firestorm | 2020 | ARMv8.6 | 1562 | 2.80 | 558.00 | 201.6% | 162.4% |
| 4 | Apple | A13 | Lightning | 2019 | ARMv8.4 | 1332 | 2.65 | 502.64 | 181.6% | 146.3% |
| 5 | Apple | A12 | Vortex | 2018 | ARMv8.3 | 1116 | 2.53 | 441.11 | 159.4% | 128.4% |
| 6 | ARM Cortex | V1 (est.) | Zeus | 2020 | ARMv8.6 | 1287 | 3.00 | 428.87 | 154.9% | 124.8% |
| 7 | ARM Cortex | N2 (est.) | Perseus | 2021 | ARMv9.0 | 1201 | 3.00 | 400.28 | 144.6% | 116.5% |
| 8 | Apple | A11 | Monsoon | 2017 | ARMv8.2 | 933 | 2.39 | 390.38 | 141.0% | 113.6% |
| 9 | Intel | (Est.) | Golden Cove (Est.) | 2021 | x86-64 | 1780 | 4.60 | 386.98 | 139.8% | 112.6% |
| 10 | ARM Cortex | X1 | Hera | 2020 | ARMv8.2 | 1115 | 3.00 | 371.69 | 134.3% | 108.2% |
| 11 | AMD | 5900X (Est.) | Zen 3 (Est.) | 2020 | x86-64 | 1683 | 4.90 | 343.57 | 124.1% | 100.0% |
| 12 | Apple | A10 | Hurricane | 2016 | ARMv8.1 | 770 | 2.34 | 329.06 | 118.9% | 95.8% |
| 13 | Intel | 1065G7 | Icelake | 2019 | x86-64 | 1252 | 3.90 | 321.03 | 116.0% | 93.4% |
| 14 | ARM Cortex | A78 | Hercules | 2020 | ARMv8.2 | 918 | 3.00 | 305.93 | 110.5% | 89.0% |
| 15 | Apple | A9 | Twister | 2015 | ARMv8.0 | 564 | 1.85 | 304.86 | 110.1% | 88.7% |
| 16 | AMD | 3950X | Zen 2 | 2019 | x86-64 | 1317 | 4.60 | 286.30 | 103.4% | 83.3% |
| 17 | ARM Cortex | A77 | Deimos | 2019 | ARMv8.2 | 812 | 2.84 | 285.92 | 103.3% | 83.2% |
| 18 | Intel | 9900K | Coffee Lake R | 2018 | x86-64 | 1384 | 5.00 | 276.80 | 100.0% | 80.6% |
| 19 | Intel | 10900K | Comet Lake | 2020 | x86-64 | 1465 | 5.30 | 276.42 | 99.9% | 80.5% |
| 20 | Intel | 6700K | Skylake | 2015 | x86-64 | 1032 | 4.00 | 258.00 | 93.2% | 75.1% |
| 21 | ARM Cortex | A76 | Enyo | 2018 | ARMv8.2 | 720 | 2.84 | 253.52 | 91.6% | 73.8% |
| 22 | Intel | 4770K | Haswell | 2013 | x86-64 | 966 | 3.90 | 247.69 | 89.5% | 72.1% |
| 23 | AMD | 1800X | Zen 1 | 2017 | x86-64 | 935 | 3.90 | 239.74 | 86.6% | 69.8% |
| 24 | Apple | A13 | Thunder | 2019 | ARMv8.4 | 400 | 1.73 | 231.25 | 83.5% | 67.3% |
| 25 | Apple | A8 | Typhoon | 2014 | ARMv8.0 | 323 | 1.40 | 230.71 | 83.4% | 67.2% |
| 26 | Intel | 3770K | Ivy Bridge | 2012 | x86-64 | 764 | 3.50 | 218.29 | 78.9% | 63.5% |
| 27 | Apple | A7 | Cyclone | 2013 | ARMv8.0 | 270 | 1.30 | 207.69 | 75.0% | 60.5% |
| 28 | Intel | 2700K | Sandy Bridge | 2011 | x86-64 | 723 | 3.50 | 206.57 | 74.6% | 60.1% |
| 29 | ARM Cortex | A75 | Prometheus | 2017 | ARMv8.2 | 505 | 2.80 | 180.36 | 65.2% | 52.5% |
| 30 | ARM Cortex | A73 | Artemis | 2016 | ARMv8.0 | 380 | 2.45 | 155.10 | 56.0% | 45.1% |
| 31 | ARM Cortex | A72 | Maya | 2015 | ARMv8.0 | 259 | 1.80 | 143.89 | 52.0% | 41.9% |
| 32 | Intel | E6600 | Core2 | 2006 | x86-64 | 338 | 2.40 | 140.83 | 50.9% | 41.0% |
| 33 | AMD | FX-8350 | BD | 2011 | x86-64 | 566 | 4.20 | 134.76 | 48.7% | 39.2% |
| 34 | AMD | Phenom 965 BE | K10.5 | 2006 | x86-64 | 496 | 3.70 | 134.05 | 48.4% | 39.0% |
| 35 | ARM Cortex | A57 (est.) | Atlas | n/a | ARMv8.0 | 222 | 1.80 | 123.33 | 44.6% | 35.9% |
| 36 | ARM Cortex | A15 (est.) | Eagle | n/a | ARMv7 32-bit | 188 | 1.80 | 104.65 | 37.8% | 30.5% |
| 37 | AMD | Athlon 64 X2 3800+ | K8 | 2005 | x86-64 | 207 | 2.00 | 103.50 | 37.4% | 30.1% |
| 38 | ARM Cortex | A17 (est.) | n/a | n/a | ARMv7 32-bit | 182 | 1.80 | 100.91 | 36.5% | 29.4% |
| 39 | ARM Cortex | A55 | Ananke | 2017 | ARMv8.2 | 155 | 1.60 | 96.88 | 35.0% | 28.2% |
| 40 | ARM Cortex | A53 | Apollo | 2012 | ARMv8.0 | 148 | 1.80 | 82.22 | 29.7% | 23.9% |
| 41 | Intel | Pentium D | P4 | 2005 | x86-64 | 228 | 3.40 | 67.06 | 24.2% | 19.5% |
| 42 | ARM Cortex | A7 (est.) | Kingfisher | n/a | ARMv7 32-bit | 101 | 1.80 | 56.06 | 20.3% | 16.3% |
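
A quick note on the derived columns, since they're just arithmetic: PPC is the GB5 score divided by clock, and the two "Relative" columns divide each row's PPC by the 9900K and Zen 3 baselines. A minimal sketch in Java, using values from the rows above (tiny rounding differences against the table are expected):

```java
public class PpcColumns {
    public static void main(String[] args) {
        // Values copied from the chart rows above.
        double score9900K = 1384, ghz9900K = 5.00; // Coffee Lake R baseline
        double scoreZen3  = 1683, ghzZen3  = 4.90; // Zen 3 (est.) baseline
        double scoreX1    = 1115, ghzX1    = 3.00; // Cortex-X1 row

        double ppc9900K = score9900K / ghz9900K;   // 276.80
        double ppcZen3  = scoreZen3  / ghzZen3;    // ~343.5
        double ppcX1    = scoreX1    / ghzX1;      // ~371.7

        // The "Relative" columns are each row's PPC against the two baselines.
        System.out.printf("X1 relative to 9900K: %.1f%%%n", 100 * ppcX1 / ppc9900K); // ~134.3%
        System.out.printf("X1 relative to Zen3:  %.1f%%%n", 100 * ppcX1 / ppcZen3);  // ~108.2%
    }
}
```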

[Attached charts: GB5-PPC-evolution.png, GB5-STperf-evolution.png, TOP10PPC_CPU_frequency_evolution_graph.png]



TOP 10 - Performance Per Area comparison at ISO-clock (PPA/GHz)

Copied from a locked thread. They try to keep people from seeing this comparison of how bad x86 is.

| Pos | Man | CPU | Core | Core Area (mm²) | Year | ISA | SPEC PPA/GHz | Relative |
|-----|-----|-----|------|-----------------|------|-----|--------------|----------|
| 1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 9.41 | 100.0% |
| 2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 8.36 | 88.8% |
| 3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 7.82 | 83.1% |
| 4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 7.24 | 76.9% |
| 5 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 4.44 | 47.2% |
| 6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 4.40 | 46.7% |
| 7 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 3.02 | 32.1% |
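
The PPA column is performance per mm² of core area per GHz. The underlying SPEC scores aren't reproduced in the post, so the sketch below plugs in an assumed score purely to show the arithmetic; only the 1.33 mm² area comes from the table above:

```java
public class PpaPerGhz {
    public static void main(String[] args) {
        // PPA/GHz = SPEC score / (core area in mm^2 * clock in GHz).
        double specScore = 37.5; // assumed, illustrative only; not from the table
        double areaMm2   = 1.33; // Cortex-A78 core area from the table above
        double ghz       = 3.0;  // assumed clock

        double ppaPerGhz = specScore / (areaMm2 * ghz);
        System.out.printf("PPA/GHz = %.2f%n", ppaPerGhz); // ~9.4 with these inputs
    }
}
```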



It's impressive how fast the generic Cortex cores are evolving:
  • The A72 (2015), found in most SBCs, has 1/3 the IPC of the new Cortex-X1 - they tripled IPC in just 5 years.
  • The A73 and A75 (2017), inside the majority of Android smartphones today, have 1/2 the IPC of the new Cortex-X1 - they doubled IPC in 3 years.

Comparing x86 and Cortex cores:
  • The A75 (2017) compared to Zen 1 (2017) loses a massive 34% PPC to x86. As expected.
  • The A77 (2019) compared to Zen 2 (2019) closed the gap and is equal in PPC. Surprising. Cortex cores caught the x86 cores.
  • The X1 (2020) is another +30% IPC over the A77. Zen 3 needs to bring a 30% IPC jump to stay on par with the X1.

Comparison to Apple cores:
  • AMD's Zen 2 core is slower than Apple's A9 from 2015... so AMD is 4 years behind Apple.
  • Intel's Sunny Cove core in Ice Lake is slower than Apple's A10 from 2016... so Intel is 3 years behind Apple.
  • The Cortex-A77 core is slower than Apple's A9 from 2015... but
  • the new Cortex-X1 core is slower than Apple's A11 from 2017, so ARM Ltd is 3 years behind Apple and getting closer.



GeekBench 5.1 comparison from 6/22/2020:
  • added Cortex-X1 and A78 performance projections from Andrei here
  • 2020: awaiting the new Apple A14 Firestorm core and the Zen 3 core
Updated:



EDIT:
Please note, to stop the endless discussion about PPC frequency scaling: to keep the comparison fair and clean, I will use only the top (highest-clocked) version of each core as representative of its top performance.
 

Vixis Rei

Junior Member
Jul 4, 2020
Given that Richie Rich started this thread, and started it with an explicit "Here's how ARM is doing well" agenda, it seems a bit strange to complain about people "hijacking it" and inserting ARM into the discussion...
If that's happening in an essentially x86 thread (I mostly don't read those) complain in that thread. But complaining "ARM thread is being used to talk about ARM. OMG!!!" makes very little sense.


The name of the topic is "TOP 20 of the World's Most Powerful CPU Cores - IPC/PPC comparison". So it's a top-20 list of the world's most powerful CPU cores, but also an IPC/PPC comparison between those cores. When I look at the list in the first post, I don't see any of the world's most powerful CPU cores listed... maybe one that people would consider pretty powerful, but none I would consider most powerful. It's also a list built only on Geekbench 5.1. Some CPUs on that list are from 2012, and some go as far back as 2005, so it's not a list of recent CPUs at all. I am rather confused; nothing about this makes sense. Not to mention the edit at the bottom of the post clarifying that, because people found easy-to-see flaws in the chart regarding frequency, none of that data will be used. I don't get any kind of "Here's how ARM is doing well" vibe from any part of this thread, and it's definitely not in the topic title where it should be. It looks a lot like Intel marketing.

I'm not sure if you read any of the other threads when he posted basically the same information in 5 different threads, 3 ARM-related and 2 x86-related, and when 1 ARM thread died it leaked onto 3 x86 threads and 2 ARM threads. I like reading about any kind of technology, but if we're discussing an ARM CPU and its inner workings/specs, I'm not interested in hearing about how x86 is suddenly doomed (and vice versa) with countless meaningless charts of inaccurate/irrelevant information. I just want meaningful discussions. It's why I'm here.
 

name99

Senior member
Sep 11, 2010
Not as much when Summit is 2 years old now and built around 14/16 nm technology compared to a just deployed system built entirely on 7 nm. When the x86/GPGPU systems come out that are built on 7 nm starting later this year, the Fujitsu system is not going to look so hot, at least from a peak performance and perf/w perspective.


Yes this is true, but did you look at the #2 in the list? That's an almost 40% increase in peak perf/w over the full Fujitsu system when the x86/GPGPU solution moves to 7 nm. There's more of that coming later this year and next.

Well, there will always be a mismatch in the dates and processes being compared, especially for these very large systems.
You seem to think the argument is "look, GPUs are still best for supercomputers". That's not the argument; the argument is "look, SVE does extremely well for a general purpose solution" -- and soon it will be available to a range of ARM cores.
The point is not "SVE does slightly better than GPU"; it's that (for whatever reason) SVE seems to do rather better than AVX-512.
 

name99

Senior member
Sep 11, 2010
The name of the topic is "TOP 20 of the World's Most Powerful CPU Cores - IPC/PPC comparison". So it's a top-20 list of the world's most powerful CPU cores, but also an IPC/PPC comparison between those cores. When I look at the list in the first post, I don't see any of the world's most powerful CPU cores listed... maybe one that people would consider pretty powerful, but none I would consider most powerful. It's also a list built only on Geekbench 5.1. Some CPUs on that list are from 2012, and some go as far back as 2005, so it's not a list of recent CPUs at all. I am rather confused; nothing about this makes sense. Not to mention the edit at the bottom of the post clarifying that, because people found easy-to-see flaws in the chart regarding frequency, none of that data will be used. I don't get any kind of "Here's how ARM is doing well" vibe from any part of this thread, and it's definitely not in the topic title where it should be. It looks a lot like Intel marketing.

I'm not sure if you read any of the other threads when he posted basically the same information in 5 different threads, 3 ARM-related and 2 x86-related, and when 1 ARM thread died it leaked onto 3 x86 threads and 2 ARM threads. I like reading about any kind of technology, but if we're discussing an ARM CPU and its inner workings/specs, I'm not interested in hearing about how x86 is suddenly doomed (and vice versa) with countless meaningless charts of inaccurate/irrelevant information. I just want meaningful discussions. It's why I'm here.

OK, so let's recap.
You complained that Richie Rich was dumping data in multiple threads.
So he collects all the data and dumps it in one thread.
And you're STILL complaining. Now about the name of the thread. Now that he's using the thread for precisely what he was told to use it for.

Seems an awful lot like the issue is not how much he is posting, but that you don't want to see what he is posting; and you don't want anyone else to see it either, you just don't want those facts in the public record.

This thread is what it is. You can perfectly well avoid it, like I avoid the ten thousand x86 threads that I'm not interested in. If he's spamming those, complain in those threads.
But don't complain that he (or anyone else) is doing exactly what you told him to do -- extolling various evidence about the performance of ARM in a thread created for precisely that purpose.
 

Hitman928

Diamond Member
Apr 15, 2012
Well, there will always be a mismatch in the dates and processes being compared, especially for these very large systems.
You seem to think the argument is "look, GPUs are still best for supercomputers". That's not the argument; the argument is "look, SVE does extremely well for a general purpose solution" -- and soon it will be available to a range of ARM cores.
The point is not "SVE does slightly better than GPU"; it's that (for whatever reason) SVE seems to do rather better than AVX-512.

It seems pretty disingenuous to not take date and process into consideration. I mean, if we just throw that out the window, can we look back in two years at the A64FX system and say how crappy it was because, when we compare it against the brand new 5 nm systems, it has horrible performance and perf/w? What's the point of that?

AVX-512 has been a mess from the start. If your argument is simply SVE is better than AVX, then ok I guess. I just don't see that there's much to discuss here. A huge problem that AVX-512 and wide SVE solutions have/will have is that when you start to support these super wide instructions, you cross over into what GPUs do really well and so the benefit of supporting such on a CPU becomes questionable. Sure, custom designing a CPU like the A64FX for HPC and including these wide instructions will help, but even then does it get you close enough to what you can achieve with a GPU to make people change course? We're talking supercomputers here, not home users. That's the question of future compute trends and so far from what I can tell, the answer is by and large, GPU is still the way to go.

Equating A64FX HPC performance to general purpose ARM cores because they will adopt SVE in the future is quite a stretch though. Just because they share an instruction set doesn't mean that they aren't very different designs with very different purposes. I do agree that SVE seems to be a better solution overall than AVX, but there are a whole lot of steps from solution to implementation to market dominance. Edit: I'll gladly revisit this SVE/AVX discussion in a year or two when we have real products to talk about in this regard.
 

DrMrLordX

Lifer
Apr 27, 2000
You don't think it's of some significance that a more traditional and easier programming model manages to match (and slightly exceed) the rather more difficult and specialized GPGPU model?

In the HPC world, CUDA is pretty well-entrenched. Now I will admit that using a CPU-only approach via SVE is very attractive in some respects. Don't laugh, but the OpenJDK folks have been working very hard at making the JVM autovectorize extremely well, and at least on x86 CPUs and on my Snapdragon 855+, it seems to work. It works a lot better than OpenMP (in my opinion - feel free to correct me if I'm wrong). If you want to use a JVM language like Java, Scala, Kotlin, or something else to bang out some HPC code, and if you take advantage of the heuristics (not hard to figure out), you can get some blazing-fast SIMD without having to know asm or intrinsics or... really anything. Which is both good and bad, since there are some things that are lost when relying on the JVM to handle it for you.
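
For readers wondering what those heuristics look like in practice, here is my reading of them (an illustration, not an official OpenJDK recipe): HotSpot's C2 superword pass wants a counted int loop with unit stride, plain array accesses, and a call-free, branch-free body.

```java
public class AutoVec {
    // A loop shape HotSpot's C2 superword pass can typically auto-vectorize:
    // counted int loop, unit stride, array operands, no calls or branches in the body.
    static void saxpy(float alpha, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = alpha * x[i] + y[i]; // can become SIMD mul-adds once JIT-compiled
        }
    }

    public static void main(String[] args) {
        float[] x = new float[1 << 20], y = new float[1 << 20];
        java.util.Arrays.fill(x, 1f);
        for (int i = 0; i < 10_000; i++) saxpy(2f, x, y); // warm up so C2 kicks in
        System.out.println(y[0]);
    }
}
```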

The mainstream

I'm sorry, who?

- attack the individuals, regardless of what they say.
- complain that they are posting too much

It's really just one poster. And no, that isn't you either.

It's also only a list using geekbench5.1.

And SPEC2006!!!! Yay!
 

lobz

Platinum Member
Feb 10, 2017
Perhaps (hard to understand, I know) he's more interested in reality than in tribalism?
He advocated AMD's technology as the likely shape of the future back then; now he's advocating ARM's technology as the likely shape of the future. And he's correct in both cases!

AMD's HW side (chiplets, advanced packaging) probably does define the shape of that dimension of computing going forward, certainly for larger SoCs.

ARM's ISA, for whatever reason (IMHO it's not this tiresome idiotic CISC vs RISC nonsense, it's more about memory ordering, precise semantics of inter-core interaction, ease of decode, ease of device validation/debug, and a willingness every so often to restructure the ISA to drop what doesn't work well so that it stops weighing you down) likewise probably will define a different aspect of computing going forward; obviously mobile and IoT already, but soon a reasonable fraction of desktop and server computing. (Already large by mindshare, expected by some of us to also become large and larger by various finance metrics).
You clearly don't know him, nor do you have any idea what and most importantly how he was advocating. Which makes your comment pure nonsense.

I also ask you to please be so kind and stop blindly throwing around the suggestion that people here are interested in tribalism instead of reality. To give a very simple and easy to understand example: I assure you that 99% of the people here whom you'd call tribal and unrealistic AMD promoters would say the same thing about the Ryzen 3000 series XT refresh: it's beyond greedy and shamefully pathetic.
 

Hitman928

Diamond Member
Apr 15, 2012
OK, so let's recap.
You complained that Richie Rich was dumping data in multiple threads.
So he collects all the data and dumps it in one thread.
And you're STILL complaining. Now about the name of the thread. Now that he's using the thread for precisely what he was told to use it for.

Seems an awful lot like the issue is not how much he is posting, but that you don't want to see what he is posting; and you don't want anyone else to see it either, you just don't want those facts in the public record.

This thread is what it is. You can perfectly well avoid it, like I avoid the ten thousand x86 threads that I'm not interested in. If he's spamming those, complain in those threads.
But don't complain that he (or anyone else) is doing exactly what you told him to do -- extolling various evidence about the performance of ARM in a thread created for precisely that purpose.

Again, what you are missing is that he didn't do that; the mods forced it by merging the posts from multiple threads into a single thread, which is why you are seeing people complaining about multiple threads; many of those posts were made before the mods merged the threads. He was already asked multiple times to stop posting about ARM in unrelated threads, but he continued to do so, which is why people are posting about it in this thread: they don't want to further derail the unrelated threads.

As far as "facts" go, he constantly ignores all facts or criticism that doesn't line up with the narrative he is trying to tell. I'm fine with a differing of opinion and back and forth discussions, but when the same poster literally ignores post after post of valid criticism and literally refuses to include valid data that doesn't fit into his preconceived notions in his tables and calculations, people are going to get sick of him continuing to post the same things over and over. Many posters have shown that he is working from false premises and bad data in many instances but he just pretends like they never posted these things at all and will continue to reiterate the same bad points over and over again. He is quickly wearing out his welcome here from the forum community because of it.
 

Vixis Rei

Junior Member
Jul 4, 2020
OK, so let's recap.
You complained that Richie Rich was dumping data in multiple threads.
So he collects all the data and dumps it in one thread.
And you're STILL complaining. Now about the name of the thread. Now that he's using the thread for precisely what he was told to use it for.

Seems an awful lot like the issue is not how much he is posting, but that you don't want to see what he is posting; and you don't want anyone else to see it either, you just don't want those facts in the public record.

This thread is what it is. You can perfectly well avoid it, like I avoid the ten thousand x86 threads that I'm not interested in. If he's spamming those, complain in those threads.
But don't complain that he (or anyone else) is doing exactly what you told him to do -- extolling various evidence about the performance of ARM in a thread created for precisely that purpose.

I don't understand why you say I'm complaining. I explained the events as they happened. My strongest personal opinion in that paragraph was "I'm not interested".

But more like:
He dumped the same data in multiple threads. Since you say you don't read most x86 threads, this should be relevant information on this point.

He was told to stop posting the same data in multiple threads and to post it in only one. And when he made another thread just recently, that thread was deleted/merged because it was already too similar to this thread.

I'm not complaining about the title of this thread; I'm stating that it's inaccurate in relation to the contents of the first post and the thread as a whole, and that the data presented the way it is isn't helpful in any meaningful way. The stated edit prevents any interesting discussion further examining anything from anyone else. This puts a negative light on ARM from anyone's perspective.

He can post as much as he likes; I read every post that any poster posts, no matter what.

If he posts the same info in multiple threads, even when it's relevant in one thread and irrelevant in four others, I'm going to notice. How can I not? I read the vast majority of threads on this forum. Just because you seem to avoid a large portion of x86 threads doesn't mean I'd do the same. I'm a technology fan; I care not whether it's ARM, x86, SPARC, Itanium, POWER, or RISC-V. I just don't like being misled with inaccurate information or agendas.

He's doing very little of what anyone has told him. As you can see, I'm relatively new here, though I have been reading here since 2017 and learning a lot thanks to the kind posters on this forum. If something doesn't make sense or is inaccurate, I'd say something so that some other new person who comes here can get the information, discussions, and accuracy they need, like I did. This is an enthusiast technology forum where we share, discuss, and learn about the more complex sides of things. One becomes more knowledgeable by doing all three of those things, but if you don't read what others have to say and don't learn or take any valid criticism, then there is no point. Especially if you think you're always right.
 
Systems analyst

Apr 30, 2015
On initial and ongoing costs, and cloud costs, see:

ARM also explicitly refer to up to 128 N1 cores in a SoC:
 

name99

Senior member
Sep 11, 2010
It seems pretty disingenuous to not take date and process into consideration. I mean, if we just throw that out the window can we look back in two years at the A64X system and say how crappy it was because when we compare it against the brand new 5 nm systems it has horrible performance and perf/w? What's the point of that?

AVX-512 has been a mess from the start. If your argument is simply SVE is better than AVX, then ok I guess. I just don't see that there's much to discuss here. A huge problem that AVX-512 and wide SVE solutions have/will have is that when you start to support these super wide instructions, you cross over into what GPUs do really well and so the benefit of supporting such on a CPU becomes questionable. Sure, custom designing a CPU like the A64FX for HPC and including these wide instructions will help, but even then does it get you close enough to what you can achieve with a GPU to make people change course? We're talking supercomputers here, not home users. That's the question of future compute trends and so far from what I can tell, the answer is by and large, GPU is still the way to go.

Equating A64X HPC performance to general purpose ARM cores because they will adopt SVE in the future is quite a stretch though. Just because they share an instruction set doesn't mean that they aren't very different designs with very different purposes in those designs. I do agree that SVE seems to be a better solution overall than AVX, but there's a whole lot of steps from solution to implementation to market dominance. Edit: I'll gladly revisit this SVE/AVX discussion in a year or two when we have real products to talk about in this regard.

"you cross over into what GPUs do really well and so the benefit of supporting such on a CPU becomes questionable"

Not exactly. The issue is what sort of regularity you find in your code.
A GPU is optimized for situations where there is instruction regularity (same code acting on many many items) but very little expected regularity in data access patterns, so everything is optimized for preserving huge amounts of temporary state while DRAM is being accessed and letting other computations happen during that wait.
A CPU (even one with wide vectors) is still much more a latency engine, able to do much better on smaller data volumes, with much less of a requirement that the long delays of a GPU be amortized over huge amounts of data.

"That's the question of future compute trends and so far from what I can tell, the answer is by and large, GPU is still the way to go"
Well that's precisely the issue! No, this isn't about super computers, it's about everyday computers. Some of us like Richie Rich or juanrga or myself think it is indeed a big deal that this combination of very performant ISA and wide vectors will soon be available to phones and desktops (perhaps Apple even this year? perhaps generic ARM next year?)

You are falling prey to what I call monomania, a never-ending temptation in computing, even though the precise form keeps changing -- you seem to imagine that the end state of the world must be a *single* solution. Once that was we'd all use Windows. Then it was we'd all use x86. Then it was we'd all use phones. Now your version is we'll all use GPUs for numeric computation. But monomania is an aesthetic claim, it's not a technical claim!

A GPU is great for certain tasks, if those tasks have a certain type of regular structure, and are large enough. But if they are not large enough, the latency to get from CPU to GPU and back kills you. On-CPU computation is great if you have less structure, or smaller units of work. Why not provide both?

Sure, A64FX is not designed for phones. This claim, that something supposedly designed for one space will work poorly in another, has a long (and not very glorious) history when it comes to ARM. We first heard, of course, that ARM would never be able to scale up to what was necessary for the desktop and to match Intel performance. Then we heard that ARM could never seriously scale up to match data warehouse chips. Now you're trying the reverse argument. Rather than arguing from "clearly these are different", how about you argue based on technology? In what ways is A64FX different from what Apple or ARM will do, and how is that relevant?

OK, Fujitsu uses 2x512-bit pipes. This seems way larger than Apple and ARM will aim for. I'd expect Apple to go with 2x256-bit pipes, ARM maybe even just one 2x256-bit pipe? So sure, half the peak throughput. No surprise there. BUT STILL
- that gives Apple 512 vs their current 384 wide total FPU capability. So 33% boost there. Nice. Not SVE-dependent (they'll still decompose that down to 4x128-wide NEON pipes) but nevertheless a 33% boost over what they have today.

In addition they get the various ISA + lower power boosts that others have seen from SVE -- less control overhead, no cleanup loops, masking, scatter/gather. This can be as high as 3x, though more realistically it's more like 20 to 30% on most code. (Though, on the other hand, it will also probably gain quite a few percentage points over the next few years as people and compilers figure out new ways to use what is a fairly new and not yet well-explored type of instruction set.) Beyond that, what Apple and ARM will implement is likely SVE2, filling in some holes in SVE and allowing it to be used in more places.
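
To make the "no cleanup loops, masking" point concrete: SVE's predicate registers let one vector loop handle any array length, with lanes past the end simply masked off rather than handled by a scalar tail loop. The same style can be sketched on the JVM with the incubating Vector API (JEP 338; requires --add-modules jdk.incubator.vector). This is an illustration of the predication idea, not SVE itself:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class PredicatedAdd {
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    // One loop, no scalar cleanup: the mask disables lanes past a.length,
    // playing the role of the predicate an SVE "whilelt" would generate.
    static void add(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i += S.length()) {
            VectorMask<Float> m = S.indexInRange(i, a.length);
            FloatVector va = FloatVector.fromArray(S, a, i, m);
            FloatVector vb = FloatVector.fromArray(S, b, i, m);
            va.add(vb).intoArray(c, i, m);
        }
    }
}
```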

So basically we have
- ARM and Apple already have performant FPU/NEON.
- they both care about improving these (even if you don't understand why). Apple moved from two to three NEON units (at the A10, I think), and has reduced the latency of FPU ops at least once, maybe twice. ARM has likewise gone through at least one round of reducing their FPU op latency, and has kept adding FPU/ASIMD functionality (FP16, dot product, better load-store performance).
- so why couldn't they just add the SVE/SVE2 ISA functionality to their cores and get an ISA boost that's usually around 20% or so, but sometimes quite a bit higher?

The exciting thing about SVE is not that at some point someone will be using 2048-bit vectors in their supercomputer; it's that even phone-level cores can get a substantial boost in performance for certain types of dense computational code, without requiring any more FMAC units, just by having a richer more flexible instruction set.

I'm not interested in "market dominance", and even if I were, I wouldn't be interested in such a conversation unless the term is defined. Because I know how this game is played. Party A will say "there are a billion phones with SVE, and 20 million computers using AVX-512". Party B will say "but there is zero use of SVE in servers". Party A will say "you never said servers". Party B will start talking about dollar volumes saying that matters more than chip counts. And so it goes...
 

name99

Senior member
Sep 11, 2010
Again, what you are missing is that he didn't do that; the mods forced it by merging the posts from multiple threads into a single thread, which is why you are seeing people complaining about multiple threads; many of those posts were made before the mods merged the threads. He was already asked multiple times to stop posting about ARM in unrelated threads, but he continued to do so, which is why people are posting about it in this thread: they don't want to further derail the unrelated threads.

As far as "facts" go, he constantly ignores all facts or criticism that doesn't line up with the narrative he is trying to tell. I'm fine with a differing of opinion and back and forth discussions, but when the same poster literally ignores post after post of valid criticism and literally refuses to include valid data that doesn't fit into his preconceived notions in his tables and calculations, people are going to get sick of him continuing to post the same things over and over. Many posters have shown that he is working from false premises and bad data in many instances but he just pretends like they never posted these things at all and will continue to reiterate the same bad points over and over again. He is quickly wearing out his welcome here from the forum community because of it.

I've read a bunch of Richie Rich's stuff, and the criticisms, and I would not, on average, call them "valid criticisms". They tend to be of the form "we don't like the numbers you've produced, so go find other [impossible or irrelevant] numbers".

For example
- complaints that GeekBench doesn't represent something or other about CPUs. Show us SPEC so we can see memory performance.
- well SPEC doesn't represent large I-footprint and lots of dynamic linking. Show us browsers.
- well browsers don't represent large system performance. Show us data warehouse numbers.
- well you can't generalize from Graviton to Apple. And anyway I don't care about any of that, how fast does it play my favorite game.

He has been told repeatedly that IPC doesn't scale across a reasonable frequency range (say 2 to 5GHz) and responded by showing that, for a wide range of use cases, it pretty much does.

I have not seen ANY technically informed and intelligent criticisms of the work he has done, just a whole lot of "I don't want to believe it, therefore [silly objection]".
You can disagree with him (and me) on certain "beliefs" that are difficult to prove, things like whether it makes more sense to concentrate on brainiac design than on frequency. But that's not what I am seeing; what I am seeing is just an unwillingness to accept even the basic starting points of his arguments.
If he says "Apple has IPC twice Intel's, and therefore can match Intel at half the frequency", that's essentially a statement of fact. When someone responds to that with random nonsense about how Apple is "cheating" by using large caches, or will never be able to get to 5GHz like Intel, or how IPC is a meaningless metric, and by the way, different ISAs use different instruction counts -- those are not valid criticisms, those are petulant whinings by people who clearly don't understand technology. Debating-society attacks, arguing about the supposed exact meanings of words or insisting that some 5% difference is significant -- those are indications that the person making the argument has the mind of a lawyer, not the mind of a scientist or engineer.
And fine, attack with a lawyer mind on legalistic grounds.
But don't expect the scientists and engineers to be impressed by legalistic arguments...
 

Hitman928

Diamond Member
Apr 15, 2012
I've read a bunch of Richie Rich's stuff, and the criticisms, and I would not, on average, call them "valid criticisms". They tend to be of the form "we don't like the numbers you've produced, so go find other [impossible or irrelevant] numbers".

For example
- complaints that GeekBench doesn't represent something or other about CPUs. Show us SPEC so we can see memory performance.
- well SPEC doesn't represent large I-footprint and lots of dynamic linking. Show us browsers.
- well browsers don't represent large system performance. Show us data warehouse numbers.
- well you can't generalize from Graviton to Apple. And anyway I don't care about any of that, how fast does it play my favorite game.

He has been told repeatedly that IPC doesn't scale across a reasonable frequency range (say 2 to 5GHz) and responded by showing that, for a wide range of use cases, it pretty much does.

I have not seen ANY technically informed and intelligent criticisms of the work he has done, just a whole lot of "I don't want to believe it, therefore [silly objection]".
You can disagree with him (and me) on certain "beliefs" that are difficult to prove, things like whether it makes more sense to concentrate on brainiac design than on frequency. But that's not what I am seeing; what I am seeing is just an unwillingness to accept even the basic starting points of his arguments.
If he says "Apple has IPC twice Intel's, and therefore can match Intel at half the frequency", that's essentially a statement of fact. When someone responds to that with random nonsense about how Apple is "cheating" by using large caches, or will never be able to get to 5GHz like Intel, or how IPC is a meaningless metric, and by the way, different ISAs use different instruction counts -- those are not valid criticisms, those are petulant whinings by people who clearly don't understand technology. Debating-society attacks, arguing about the supposed exact meanings of words or insisting that some 5% difference is significant -- those are indications that the person making the argument has the mind of a lawyer, not the mind of a scientist or engineer.
And fine, attack with a lawyer mind on legalistic grounds.
But don't expect the scientists and engineers to be impressed by legalistic arguments...

How about him ignoring dozens of valid Geekbench scores from x86 CPUs that have significantly higher "IPC" than what is in his chart, simply because he doesn't find them valid for some unstated reason? How about him "calculating" the cost of Graviton2 and Ampere based on his estimated sizes of Graviton2 and Ampere from a mix of core sizes and made-up numbers, and refusing to change it even when presented with info from Altra that he is grossly underestimating the die sizes? How about ignoring all benchmarks for any ARM-based server CPU that are not SPEC or GB? How about him trying to prove Blender results show the same relative performance of his RPi4 against Ryzen as GB did, only to be proven wrong and then later lying about it?

Is this what you are defending and anyone who brings up these points is just whining?
 

Hitman928

Diamond Member
Apr 15, 2012
"you cross over into what GPUs do really well and so the benefit of supporting such on a CPU becomes questionable"

Not exactly. The issue is what sort of regularity you find in your code.
A GPU is optimized for situations where there is instruction regularity (same code acting on many many items) but very little expected regularity in data access patterns, so everything is optimized for preserving huge amounts of temporary state while DRAM is being accessed and letting other computations happen during that wait.
A CPU (even one with wide vectors) is still much more a latency engine, able to do much better on smaller data volumes, with much less of a requirement that the long delays of a GPU be amortized over huge amounts of data.

"That's the question of future compute trends and so far from what I can tell, the answer is by and large, GPU is still the way to go"
Well that's precisely the issue! No, this isn't about super computers, it's about everyday computers. Some of us like Richie Rich or juanrga or myself think it is indeed a big deal that this combination of very performant ISA and wide vectors will soon be available to phones and desktops (perhaps Apple even this year? perhaps generic ARM next year?)

You are falling prey to what I call monomania, a never-ending temptation in computing, even though the precise form keeps changing -- you seem to imagine that the end state of the world must be a *single* solution. Once that was we'd all use Windows. Then it was we'd all use x86. Then it was we'd all use phones. Now your version is we'll all use GPUs for numeric computation. But monomania is an aesthetic claim, it's not a technical claim!

A GPU is great for certain tasks, if those tasks have a certain type of regular structure, and are large enough. But if they are not large enough, the latency to get from CPU to GPU and back kills you. On-CPU computation is great if you have less structure, or smaller units of work. Why not provide both?

Sure, A64FX is not designed for phones. This claim, that something supposedly designed for one space will work poorly in another, has a long (and not very glorious) history when it comes to ARM. We first heard, of course, that ARM would never be able to scale up to what was necessary for the desktop and to match Intel performance. Then we heard that ARM could never seriously scale up to match data warehouse chips. Now you're trying the reverse argument. Rather than arguing from "clearly these are different", how about you argue based on technology? In what ways is A64FX different from what Apple or ARM will do, and how is that relevant?

OK, Fujitsu uses 2x512-bit pipes. This seems way larger than Apple and ARM will aim for. I'd expect Apple to go with 2x256-bit pipes, ARM maybe even just one 2x256-bit pipe? So sure, half the peak throughput. No surprise there. BUT STILL
- that gives Apple 512 vs their current 384 wide total FPU capability. So 33% boost there. Nice. Not SVE-dependent (they'll still decompose that down to 4x128-wide NEON pipes) but nevertheless a 33% boost over what they have today.

In addition they get the various ISA + lower power boosts that others have seen from SVE -- less control overhead, no cleanup loops, masking, scatter/gather. This can be as high as 3x, though more realistically it's more like 20 to 30% on most code. (Though, on the other hand, it will also probably gain quite a few percentage points over the next few years as people and compilers figure out new ways to use what is a fairly new and not yet well-explored type of instruction set.) Beyond that, what Apple and ARM will implement is likely SVE2, filling in some holes in SVE and allowing it to be used in more places.

So basically we have
- ARM and Apple already have performant FPU/NEON.
- they both care about improving these (even if you don't understand why). Apple moved from two to three NEON units (at the A10, I think), and has reduced the latency of FPU ops at least once, maybe twice. ARM has likewise gone through at least one round of reducing their FPU op latency, and has kept adding FPU/ASIMD functionality (FP16, dot product, better load-store performance).
- so why couldn't they just add the SVE/SVE2 ISA functionality to their cores and get an ISA boost that's usually around 20% or so, but sometimes quite a bit higher?

The exciting thing about SVE is not that at some point someone will be using 2048-bit vectors in their supercomputer; it's that even phone-level cores can get a substantial boost in performance for certain types of dense computational code, without requiring any more FMAC units, just by having a richer more flexible instruction set.

I'm not interested in "market dominance", and even if I were, I wouldn't be interested in such a conversation unless the term is defined. Because I know how this game is played. Party A will say "there are a billion phones with SVE, and 20 million computers using AVX-512". Party B will say "but there is zero use of SVE in servers". Party A will say "you never said servers". Party B will start talking about dollar volumes saying that matters more than chip counts. And so it goes...

You keep changing the subject. The post you initially replied to was about A64FX and its supercomputer performance. You replied by showing its perf/w ranking from 8 months ago in the top 500 supercomputer list. That is what I was discussing the whole time, but now you say it's not about supercomputers, it's about everyday computers. Why? That's not what I was discussing, and I think that's pretty clear in my posts. If you want to talk about SVE in consumer devices, that's fine, let's do that, but let's separate that discussion from what's happening in supercomputers, because it's a different market and a different topic.

As far as what makes A64FX different from consumer CPUs, it's pretty simple: the amount of power/transistors spent on HPC-type calculations. The opportunity was there for Apple and others to add wide SVE instructions to their 7 nm consumer CPUs, but they didn't, because the cost wasn't worth the benefit. As they progress to future nodes they may start to go wider, but they'll be doing so at smaller nodes where the cost-to-benefit analysis will be different. Even then, who knows how wide they will go versus more specialized instructions/hardware. If you have insight on ARM licensees' plans for this, I'd love to hear it.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
I've read a bunch of Richie Rich's stuff, and the criticisms, and I would not, on average, call them "valid criticisms". They tend to be of the form "we don't like the numbers you've produced, so go find other [impossible or irrelevant] numbers".

For example
- complaints that GeekBench doesn't represent something or other about CPUs. Show us SPEC so we can see memory performance.
- well SPEC doesn't represent large I-footprint and lots of dynamic linking. Show us browsers.
- well browsers don't represent large system performance. Show us data warehouse numbers.
- well you can't generalize from Graviton to Apple. And anyway I don't care about any of that, how fast does it play my favorite game.

He has been told repeatedly that IPC doesn't scale across a reasonable frequency range (say 2 to 5GHz) and responded by showing that, for a wide range of use cases, it pretty much does.

I have not seen ANY technically informed and intelligent criticisms of the work he has done, just a whole lot of "I don't want to believe it, therefore [silly objection]".
You can disagree with him (and me) on certain "beliefs" that are difficult to prove, things like whether it makes more sense to concentrate on brainiac design than on frequency. But that's not what I am seeing; what I am seeing is just an unwillingness to accept even the basic starting points of his arguments.
If he says "Apple has IPC twice Intel's, and therefore can match Intel at half the frequency", that's essentially a statement of fact. When someone responds to that with random nonsense about how Apple is "cheating" by using large caches, or will never be able to get to 5GHz like Intel, or how IPC is a meaningless metric, and by the way, different ISAs use different instruction counts -- those are not valid criticisms, those are petulant whinings by people who clearly don't understand technology. Debating-society attacks, arguing about the supposed exact meanings of words or insisting that some 5% difference is significant -- those are indications that the person making the argument has the mind of a lawyer, not the mind of a scientist or engineer.
And fine, attack with a lawyer mind on legalistic grounds.
But don't expect the scientists and engineers to be impressed by legalistic arguments...
How about the MANY times he has estimated things with no valid basis for his estimations? For one, the price of Graviton2? Nobody knows, but there's the post that shows every reason in the world why $500 is an insanely low estimate. And the size? And not accounting for monolithic vs. chiplets in the size estimate? I could go on. He keeps ignoring anything that does not fit his agenda. My opinions have nothing to do with x86 vs ARM, but everything to do with logic and reasonableness.

A reasonable discussion of ARM would be welcome.
 

Gideon

Golden Member
Nov 27, 2007
I have not seen ANY technically informed and intelligent criticisms of the work he has done, just a whole lot of "I don't want to believe it, therefore [silly objection]".
You can disagree with him (and me) on certain "beliefs" that are difficult to prove, things like whether it makes more sense to concentrate on brainiac design than on frequency. But that's not what I am seeing; what I am seeing is just an unwillingness to accept even the basic starting points of his arguments.
If he says "Apple has IPC twice Intel's, and therefore can match Intel at half the frequency", that's essentially a statement of fact. When someone responds to that with random nonsense about how Apple is "cheating" by using large caches, or will never be able to get to 5GHz like Intel, or how IPC is a meaningless metric, and by the way, different ISAs use different instruction counts -- those are not valid criticisms, those are petulant whinings by people who clearly don't understand technology. Debating-society attacks, arguing about the supposed exact meanings of words or insisting that some 5% difference is significant -- those are indications that the person making the argument has the mind of a lawyer, not the mind of a scientist or engineer.
And fine, attack with a lawyer mind on legalistic grounds.
But don't expect the scientists and engineers to be impressed by legalistic arguments...

I do agree that there is quite a lot of denial about ARM on these forums, and that Richie sometimes gets unnecessary hate and invalid counter-arguments regarding the IPC metrics. But he does make estimates that are up to 2x inaccurate, and refuses to fix them even when this is pointed out (continuing to copy-paste charts and text from his earlier posts).

For example, he low-balled the 128-core Ampere die size by nearly 2x. Even when linked to Ian Cutress citing Ampere itself that it's near the reticle limit, he totally ignored it and kept going on about $500 apples versus $7,500 oranges. The same goes for multiple Geekbench results.

It's not like I even disagree with some of the broad points he's making; it's rather his communication style and the "hard number claims" he copy-pastes over and over (some of which are in the ballpark, while others are way off, and he refuses to reassess).

It's quite clear that x86 will be in a world of hurt if ARM can execute its Neoverse roadmap of 30% yearly gains (which seems very likely, at least for the short term). Not to mention if Nuvia tops it by multiples doing something radically different with specialized hardware for hyperscalers (which Jon Masters has hinted at pretty heavily in interviews).

It's also clear that ARM Macs should be very, very good, especially ones done on N5P in 2021.

In an age of no clock-speed increases in the near future, it's also obvious that the only way forward is fatter cores, significantly so if you want to extract any real benefit from future nodes (other than more cores). It's not like Richie is the only one on the planet who discovered this. It's absolute common sense, and even Intel is trying to go that way (though their execution is lacking).
 
Systems analyst

Apr 30, 2015
How about the MANY times he has estimated things with no valid basis for his estimations. And for one the price of graviton2 ? Nobody knows, but the post that shows every reason in the world why $500 is an insanely low estimate ? And the size ? and not accounting for monolithic vs chiplets in the size estimate ? I could go on. He keeps ignoring anything that does not fit his agenda. My opinions have nothing to do with x86 vs ARM, but everything to do with logic and reasonability.

A reasonable discussion of ARM would be welcome.
ARM have stated explicitly that they expect the unit production cost (UPC) for ARM-based server SoCs to be in the range of $50-$200.
Do you have any explicit statement from AMD or Intel on UPCs for their server SoCs?
All that is presented above is a convoluted argument to substantiate someone's personal belief; they do not even distinguish between UPC and selling price, which is why their argument is weak. That is not evidence.
Furthermore, AMD and Intel selling prices for their server SoCs cannot be as low as their UPCs, because of their overheads.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
ARM have stated explicitly that they expect the unit production cost (UPC) for ARM-based server SoCs to be in the range of $50-$200.
Do you have any explicit statement from AMD or Intel on UPCs for their server SoCs?
All that is presented above is a convoluted argument to substantiate someone's personal belief; they do not even distinguish between UPC and selling price, which is why their argument is weak. That is not evidence.
Furthermore, AMD and Intel selling prices for their server SoCs cannot be as low as their UPCs, because of their overheads.
So you maintain, with NO specifics, just a blanket statement, that a 64-core ARM chip which is the same size as or bigger in silicon than a 64-core EPYC (see the post I referenced) costs $500? I can see them being less, but not that cheap. That's the same price as my 3900X, a 12-core.
 

Hitman928

Diamond Member
Apr 15, 2012
ARM have stated explicitly that they expect the unit production cost (UPC) for ARM-based server SoCs to be in the range of $50-$200.
Do you have any explicit statement from AMD or Intel on UPCs for their server SoCs?
All that is presented above is a convoluted argument to substantiate someone's personal belief; they do not even distinguish between UPC and selling price, which is why their argument is weak. That is not evidence.
Furthermore, AMD and Intel selling prices for their server SoCs cannot be as low as their UPCs, because of their overheads.
So you maintain, with NO specifics, just a blanket statement, that a 64-core ARM chip which is the same size as or bigger in silicon than a 64-core EPYC (see the post I referenced) costs $500? I can see them being less, but not that cheap. That's the same price as my 3900X, a 12-core.

Just to hopefully clear up some confusion, I think you are talking about 2 different things.

@Systems analyst is discussing how much the chips actually cost to produce (die cost, packaging, etc.)
@Markfw is discussing how much the chips would cost given more or less retail pricing because this discussion started with the comparison of a 64 core Epyc retail price.

Obviously the "retail" price would be much higher than the production cost and we know that AMD/Intel have very high margins on these chips, so I suggest rather than debating either of these numbers, just go back to the core of the original argument on this topic which was if the ARM server CPUs we've seen so far would be cheaper to make than their x86 competitors. While we still won't get hard numbers on this because Amazon and Ampere don't share actual die sizes, there's plenty of evidence to use to discuss this versus hypothetical retail pricing or exact product costs.
 

Doug S

Platinum Member
Feb 8, 2020
AVX-512 has been a mess from the start. If your argument is simply SVE is better than AVX, then ok I guess. I just don't see that there's much to discuss here. A huge problem that AVX-512 and wide SVE solutions have/will have is that when you start to support these super wide instructions, you cross over into what GPUs do really well and so the benefit of supporting such on a CPU becomes questionable. Sure, custom designing a CPU like the A64FX for HPC and including these wide instructions will help, but even then does it get you close enough to what you can achieve with a GPU to make people change course? We're talking supercomputers here, not home users. That's the question of future compute trends and so far from what I can tell, the answer is by and large, GPU is still the way to go.

Using GPUs for HPC has always been a stopgap solution - they had number-crunching power and fast local memory, so they were faster than off-the-shelf CPUs for certain classes of FP number crunching.

You seem to be starting with the belief "GPUs are the best solution for HPC" and think attempts to make CPUs do better at it are pointless because you should just use GPUs. That ignores the whole history of HPC, in which GPUs are a phenomenon of only the past few years.

If you build CPUs designed for HPC (as used to be done a few decades ago, before clusters of hundreds and then thousands of off-the-shelf RISC or x86 CPUs took over - i.e., Seymour Cray's stuff at CDC and then Cray) by including very wide SIMD instructions and some fast local memory, then they are going to be better than GPUs, because they are designed for computation, not graphics with a side of "oh hey, we'll support HPC too because we found out we can charge way more for HPC cards than GPUs".
 

Hitman928

Diamond Member
Apr 15, 2012
Using GPUs for HPC has always been a stopgap solution - they had number-crunching power and fast local memory, so they were faster than off-the-shelf CPUs for certain classes of FP number crunching.

You seem to be starting with the belief "GPUs are the best solution for HPC" and think attempts to make CPUs do better at it are pointless because you should just use GPUs. That ignores the whole history of HPC, in which GPUs are a phenomenon of only the past few years.

If you build CPUs designed for HPC (as used to be done a few decades ago, before clusters of hundreds and then thousands of off-the-shelf RISC or x86 CPUs took over - i.e., Seymour Cray's stuff at CDC and then Cray) by including very wide SIMD instructions and some fast local memory, then they are going to be better than GPUs, because they are designed for computation, not graphics with a side of "oh hey, we'll support HPC too because we found out we can charge way more for HPC cards than GPUs".

No, my argument is that the current and near-term best solution for HPC lies in GPUs, at least for the majority of use cases for these supercomputers. For the medium- to long-term future, there's a real debate to be had. It seems to me that Intel and AMD are trying to reshape that debate with their unified programming languages, and they may just do that, but that is also a big unknown in terms of realistic effectiveness.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Just to hopefully clear up some confusion, I think you are talking about 2 different things.

@Systems analyst is discussing how much the chips actually cost to produce (die cost, packaging, etc.)
@Markfw is discussing how much the chips would cost given more or less retail pricing because this discussion started with the comparison of a 64 core Epyc retail price.

Obviously the "retail" price would be much higher than the production cost and we know that AMD/Intel have very high margins on these chips, so I suggest rather than debating either of these numbers, just go back to the core of the original argument on this topic which was if the ARM server CPUs we've seen so far would be cheaper to make than their x86 competitors. While we still won't get hard numbers on this because Amazon and Ampere don't share actual die sizes, there's plenty of evidence to use to discuss this versus hypothetical retail pricing or exact product costs.
To quote YOU

"In conclusion, there's a high probability that Rome is cheaper than all of its Arm competitors, even without counting things like die salvaging and common chiplet business model. It definitely appears to be cheaper than the 80 core Ampere chip and by a significant margin. "

THIS is what I am talking about. COST. So I can't believe that ROME's actual cost is like $300
 

Hitman928

Diamond Member
Apr 15, 2012
To quote YOU

"In conclusion, there's a high probability that Rome is cheaper than all of its Arm competitors, even without counting things like die salvaging and common chiplet business model. It definitely appears to be cheaper than the 80 core Ampere chip and by a significant margin. "

THIS is what I am talking about. COST. So I can't believe that ROME's actual cost is like $300

That's fine, I just wanted to make sure because the original claim of $500 came from Richie Rich saying that Graviton2 only cost $500 when 64 core Rome cost $7500. I'm not sure what the wafer cost of TSMC's 7 nm process is so I'm not comfortable putting an exact dollar amount on any of them.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
That's fine, I just wanted to make sure because the original claim of $500 came from Richie Rich saying that Graviton2 only cost $500 when 64 core Rome cost $7500. I'm not sure what the wafer cost of TSMC's 7 nm process is so I'm not comfortable putting an exact dollar amount on any of them.
It's just that everything he posts is twisted to make ARM look good. I still say, with the ARM chip being monolithic, that EPYC costs less. A LOT less, as you said.

Again, if we could have realistic debates about ARM, it would be nice. For instance, I believe the benchmarks where the Graviton got killed by EPYC in a couple of tests were real. But in others it was close. This is nowhere NEAR Richie's take on the situation.
 

Doug S

Platinum Member
Feb 8, 2020
That's fine, I just wanted to make sure because the original claim of $500 came from Richie Rich saying that Graviton2 only cost $500 when 64 core Rome cost $7500. I'm not sure what the wafer cost of TSMC's 7 nm process is so I'm not comfortable putting an exact dollar amount on any of them.

64 core Rome costs $7500 because that's what AMD is able to get people to pay for it. Surely you aren't so naive as to think the prices Intel and AMD charge for their top-end CPUs have anything whatsoever to do with the cost to make them?

Teardowns/BOMs of iPhones have consistently pegged the cost of Apple's SoCs in the $30 range, and they are generally around 100 mm^2. Let's say the firms that do these things for a living don't know what they are doing, and the price is actually double that, so $60. What's the total silicon area of a 64C Rome? Seems pretty difficult to make a compelling case the foundry cost for Rome is more than $500 and even that may be high.

Now, that figure doesn't include amortization of design cost, all the testing/validation that servers require, and other fixed costs that are divided up across all units (including SKUs with fewer cores); this is just how much you have to figure they are paying TSMC for their part.

As Markfw says, the math for a monolithic chip is different because the bigger a chip the worse the yield. They can get around that by having redundancy (i.e. extra cores) but people here are claiming that Graviton isn't doing that.
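
The area-versus-cost reasoning above can be made explicit. Every number below is an assumption for illustration (nobody in this thread has TSMC's actual wafer price or defect density); the point is only that good-die cost grows faster than linearly with monolithic die area, which is exactly why yield and redundancy matter:

```java
public class DieCostSketch {
    public static void main(String[] args) {
        double waferPrice = 10_000; // assumed price of a 7 nm 300 mm wafer, USD
        double d0 = 0.0010;         // assumed defect density, defects per mm^2

        double waferArea = Math.PI * 150 * 150; // ~70,686 mm^2; ignores edge loss
        for (double area : new double[] {100, 200, 400, 800}) { // die sizes, mm^2
            double grossDies = waferArea / area;
            double yield = Math.exp(-d0 * area); // simple Poisson yield model
            double costPerGoodDie = waferPrice / (grossDies * yield);
            System.out.printf("%4.0f mm^2 -> $%.0f per good die%n", area, costPerGoodDie);
        }
    }
}
```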
 

name99

Senior member
Sep 11, 2010
64 core Rome costs $7500 because that's what AMD is able to get people to pay for it. Surely you aren't so naive as to think the prices Intel and AMD charge for their top-end CPUs have anything whatsoever to do with the cost to make them?

Teardowns/BOMs of iPhones have consistently pegged the cost of Apple's SoCs in the $30 range, and they are generally around 100 mm^2. Let's say the firms that do these things for a living don't know what they are doing, and the price is actually double that, so $60. What's the total silicon area of a 64C Rome? Seems pretty difficult to make a compelling case the foundry cost for Rome is more than $500 and even that may be high.

Now, that figure doesn't include amortization of design cost, all the testing/validation that servers require, and other fixed costs that are divided up across all units (including SKUs with fewer cores); this is just how much you have to figure they are paying TSMC for their part.

As Markfw says, the math for a monolithic chip is different because the bigger a chip the worse the yield. They can get around that by having redundancy (i.e. extra cores) but people here are claiming that Graviton isn't doing that.

That claim (Graviton lack of redundancy) is one of the reasons I find it hard to take certain criticisms seriously.
Options:
1. Yields (at the size of Graviton2) are just not bad enough to be worth worrying about.
OR
2. Amazon has redundancy (so what's the big deal?).
OR
3. Amazon is too stupid to understand how to use redundancy to grossly improve effective yield, even though the entire world knows how to do this.

Is (3) really a sensible option? If you're insisting that it must be the case because clearly 1 and 2 are false, are you surprised that other people are skeptical of your opinions and analyses?
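
For what it's worth, option 2 is easy to quantify. Assuming (hypothetically) that each core survives fabrication independently with some probability, a part that ships 64 cores out of 66 physical ones only needs 64 good cores, and the binomial tail shows how much two spares lift the core-array yield:

```java
public class SpareCoreYield {
    // P(at least `need` good cores out of `total`), each good with probability p.
    static double yieldWithSpares(int total, int need, double p) {
        double sum = 0;
        for (int k = need; k <= total; k++) {
            sum += binom(total, k) * Math.pow(p, k) * Math.pow(1 - p, total - k);
        }
        return sum;
    }

    static double binom(int n, int k) { // n choose k, computed as a double
        double r = 1;
        for (int i = 1; i <= k; i++) r = r * (n - k + i) / i;
        return r;
    }

    public static void main(String[] args) {
        double p = 0.99; // assumed per-core survival probability, illustrative only
        System.out.printf("64 of 64 (no spares): %.1f%%%n", 100 * yieldWithSpares(64, 64, p)); // ~52.6%
        System.out.printf("64 of 66 (2 spares):  %.1f%%%n", 100 * yieldWithSpares(66, 64, p)); // ~97.1%
    }
}
```

With those (made-up) inputs, two spare cores take the core-array yield from roughly half to nearly all dies, which is why option 3 is hard to take seriously.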