Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

exquisitechar · Dec 3, 2019

https://www.servethehome.com/aws-graviton2-64-core-arm-cpu-heightens-war-of-intel-betrayal/

Pretty big deal for ARM in servers. Interested in seeing a comparison between this and Rome.

DrMrLordX · Mar 18, 2020

Richie Rich said:
@everybody: bla bla bla .... Apple will never move into servers .... bla bla bla...
Reality: ex-Apple architects started up NUVIA

Nuvia isn't Apple. Plus lawsuits. Apple is busy trying to bury Nuvia, not support them. But hey, go right on with that tangent. Actually, don't.

Reality: Graviton2

Then why waste your time with Apple's cores in a thread about Graviton2? Honestly. And it (Graviton2) still hasn't been benched against Rome. Nor has anything from Ampere. Or ThunderX3. I haven't seen Kunpeng benched against it either. Or A64FX!

There will be much more crying, sobbing and whining from guys

Again there is no crying, whining, or sobbing. People are tired of repetitive posting. The only one crying here is you.

lobz · Mar 18, 2020

Richie Rich said:
There will be much more crying, sobbing and whining from guys like Thunder57, lobs, coercitive etc. in 2021 when Ampere Mystique and Graviton3 based on A77 arrive. Yeah, prepare your tissues, boys, because I'm gonna pull out some of your older posts. This is gonna be fun.
Just because the majority agrees on a certain opinion does not mean that they are right. It could just mean they are sheep...

I think you should seek psychiatric help.

amrnuke · Mar 25, 2020

Nothingness said:
Can you (or someone else who doesn't have me on ignore) double-check your computation? Where did the 40% speed come from?

Sorry for not responding.
I just used the AT Graviton2 review, and the AT 7742 2P analysis divided by two.
However, I definitely made a huge error in Excel - somehow the formula didn't copy and it didn't divide by two. Should have double checked.
Graviton2 is 80% the speed of 1/2 of a 2P 7742 by my calculations.

MT (multi-threaded)
Test             Graviton2   EPYC 7742 2P   EPYC 7742     G2 as % of 7742
                 (64 vCPU)                  (1/2 of 2P)
400.perlbench    1613        4820           2410          66.93%
401.bzip2        924         3250           1625          56.86%
403.gcc          701         3540           1770          39.60%
429.mcf          597         1540           770           77.53%
445.gobmk        1692        4170           2085          81.15%
456.hmmer        2904        6480           3240          89.63%
458.sjeng        1605        3900           1950          82.31%
462.libquantum   725         1180           590           122.88%
464.h264ref      2821        6400           3200          88.16%
471.omnetpp      574         1510           755           76.03%
473.astar        806         1550           775           104.00%
483.xalancbmk    1048        2870           1435          73.03%
Average                                                   79.84%

ST (single-threaded)
Test             Graviton2   EPYC 7742   G2 as % of 7742
400.perlbench    30.05       43.7        68.76%
401.bzip2        19.21       27.2        70.63%
403.gcc          34.49       42.6        80.96%
429.mcf          29.4        39.6        74.24%
445.gobmk        27.73       32.7        84.80%
456.hmmer        48.5        60.5        80.17%
458.sjeng        25.93       27.6        93.95%
462.libquantum   93.79       72.3        129.72%
464.h264ref      46.05       60.4        76.24%
471.omnetpp      23.19       23          100.83%
473.astar        19.84       25.4        78.11%
483.xalancbmk    32.21       47.8        67.38%
Average                                  83.82%
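The percentages in the tables can be reproduced in a few lines. This is a minimal Python sketch, assuming the only transformation is halving the 2P score for MT (as described above); `relative_pct` and `summarize` are illustrative names, not anything from the AnandTech articles. It also prints a geometric mean, which is how SPEC combines per-test ratios and which, unlike the arithmetic mean, equals the ratio of the two machines' geometric-mean composite scores.

```python
from statistics import fmean, geometric_mean

# (Graviton2 64 vCPU score, EPYC 7742 score) per SPECint2006 test, copied from the tables above.
# For MT the 7742 figure is the 2P result, which gets halved; for ST it is used as-is.
MT_SCORES = {
    "400.perlbench": (1613, 4820), "401.bzip2": (924, 3250),
    "403.gcc": (701, 3540), "429.mcf": (597, 1540),
    "445.gobmk": (1692, 4170), "456.hmmer": (2904, 6480),
    "458.sjeng": (1605, 3900), "462.libquantum": (725, 1180),
    "464.h264ref": (2821, 6400), "471.omnetpp": (574, 1510),
    "473.astar": (806, 1550), "483.xalancbmk": (1048, 2870),
}
ST_SCORES = {
    "400.perlbench": (30.05, 43.7), "401.bzip2": (19.21, 27.2),
    "403.gcc": (34.49, 42.6), "429.mcf": (29.4, 39.6),
    "445.gobmk": (27.73, 32.7), "456.hmmer": (48.5, 60.5),
    "458.sjeng": (25.93, 27.6), "462.libquantum": (93.79, 72.3),
    "464.h264ref": (46.05, 60.4), "471.omnetpp": (23.19, 23),
    "473.astar": (19.84, 25.4), "483.xalancbmk": (32.21, 47.8),
}


def relative_pct(scores, divisor=1.0):
    """Graviton2 score as a percentage of (EPYC score / divisor), per test."""
    return {test: 100.0 * g2 / (epyc / divisor) for test, (g2, epyc) in scores.items()}


def summarize(label, scores, divisor=1.0):
    """Print per-test percentages plus arithmetic and geometric means."""
    pct = relative_pct(scores, divisor)
    for test, value in pct.items():
        print(f"{label} {test:15s} {value:7.2f}%")
    # Arithmetic mean, as in the post, plus the geometric mean SPEC uses for composites.
    print(f"{label} arithmetic mean {fmean(pct.values()):7.2f}%")
    print(f"{label} geometric mean  {geometric_mean(pct.values()):7.2f}%")
    return pct


if __name__ == "__main__":
    summarize("MT", MT_SCORES, divisor=2)  # halve the 2P score: the step the first spreadsheet missed
    summarize("ST", ST_SCORES)
```

Because a geometric mean is never larger than the arithmetic mean of the same values, expect it to land below the 79.84% and 83.82% averages, with the 462.libquantum outlier pulling less.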

Richie Rich · Apr 2, 2020

amrnuke said:
SMT on vs. SMT off on the 3900X results in a 23.44% increase in performance. Normalizing for the fact that these are not well-scaling tests overall, 23.44% / 52.9% = 44.3% benefit of SMT.


The last column, with a 23.4% SMT benefit, finally looks reasonable. However, I still don't understand why you try to rape this measured number with some crazy 52.9%. If you want to eliminate Amdahl's scaling penalty to get the pure SMT benefit, then the cleanest way is to measure it by running multiple ST instances. That's why @Nothingness mentioned the 28% SPECrate SMT benefit.

To sum up: I was right from the beginning with my 25%, wasn't I?

Richie Rich · Apr 3, 2020

coercitiv said:
Because he compared 12c/12t versus 6c/12t and showed the obvious: workloads don't scale with more cores as expected.

Assume SMT scaling is 25% and core scaling is 100%
1c/2t - resulting performance is 125%
2c/2t - resulting performance is 200%
--> relative performance between 2c/2t and 1c/2t: 1.6X

However, in the real world the relative performance of 12c/12t versus 6c/12t is only 1.24X, meaning either:
Option A - SMT gains are actually 60%

Aha, you take the real measured SMT benefit of 1.25X, then instantly rape it to 1.6X and say the SMT benefit is 1.6X. That's an insane and wrong conclusion.

The only thing you are right about is that 1.6X is the "relative performance between 2c/2t and 1c/2t". Yes: a different number of physical cores and a different SMT activation status.

If you want to calculate the SMT benefit, then you have to normalize to the same number of physical cores:

the relative performance of 1.6X needs to be divided by 2 cores... 1.6 / 2 = 0.8X
this 0.8X is the performance hit when SMT is OFF (now that's a correct comparison of 1c/1t vs. 1c/2t)
take the inverse, 1 / 0.8 = 1.25X... voilà, you get the same number as at the beginning (only the percentage base differs)
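Both posts use the same toy model, so writing it down makes the algebra easy to check. This Python sketch assumes, as coercitiv's example does, a fixed SMT gain s and an explicit core-scaling factor (1.0 meaning perfect scaling); the function names are hypothetical. Under those assumptions 1c/2t = 1 + s and 2c/2t = 2, so the ratio is r = 2 / (1 + s), and the "divide by 2, then invert" steps compute 1 / (r / 2) = 1 + s.

```python
def rel_2c2t_vs_1c2t(smt_gain, core_scaling=1.0):
    """Throughput of 2c/2t relative to 1c/2t in the toy model.

    1c/2t = 1 + smt_gain       (one core, SMT on)
    2c/2t = 1 + core_scaling   (two cores, SMT off; 1.0 = perfect core scaling)
    """
    return (1.0 + core_scaling) / (1.0 + smt_gain)


def implied_smt_gain(ratio, core_scaling=1.0):
    """Invert the model: the SMT gain implied by an observed 2c/2t vs 1c/2t ratio."""
    return (1.0 + core_scaling) / ratio - 1.0


if __name__ == "__main__":
    r = rel_2c2t_vs_1c2t(0.25)
    print(f"s = 25% -> 2c/2t vs 1c/2t = {r:.2f}X")       # the 1.6X from coercitiv's example
    print(f"divide by 2, invert: {1 / (r / 2):.2f}X")     # recovers 1 + s = 1.25X
    print(f"observed 1.24X -> implied s = {implied_smt_gain(1.24):.1%}")  # Option A
```

Feeding the measured 1.24X into the inversion with perfect core scaling gives s of about 61%, which is where Option A comes from; passing a `core_scaling` below 1.0 shows how much of that gap imperfect core scaling could absorb instead.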

You and Armnuke redated cannot do even basic school math. That's pretty sad. I wonder how you can do basic university stuff like differential equations when you have trouble with these basics. However, I hope it's clear now.

I think you can find a better word to use in tech.

esquared
Anandtech Forum Director

Andrei. · Apr 6, 2020

I'm not even going to read your whole post because it's the same old stupid IPC story, which is just wrong and has absolutely no tether to reality. Bringing up the whole CISC vs RISC thing shows a lack of technical knowledge on the topic.

(Those who understand the topic know the value of the table I'm posting above and what I just did in a forum thread.)

The above are actual IPC figures for the A12, along with the retired instruction count. The whole argument is void because the instruction count for the workload isn't wildly different than on x86. I had done the x86 vs AArch64 instruction comparison before, as I've said repeatedly over the last year whenever this is brought up: it doesn't differ much beyond a ~10% divergence depending on the test.

I'll run fresh IPC figures on desktops in a few months, but you can refer to many other x86 resources for actual IPC figures, for example https://dl.acm.org/doi/fullHtml/10.1145/3369383 / https://dl.acm.org/cms/attachment/3cb26a5a-f323-4a19-ba4b-d7f3cdd23fb7/taco1604-46-f05.jpg

The TL;DR is that yes, Apple has 80%+ higher IPC than Intel. Get over it and stop trying to deny reality.

amrnuke · Apr 7, 2020

Nothingness said:
Sorry but I don't believe your data, for the many reasons I've been repeating ad nauseam.

But you have no evidence! You pick data from different sources and play with it.

It's no more unbelievable than what you've done.

I guess you wanted to say that results scale with frequency, right? The answer is obviously no, it won't scale, but it also won't bring the 83% advantage down to the levels you mention.

I'm speechless and I'm done discussing this any further.

Here's the problem. Y'all are making claims without backing them up. I'm trying to use the sources you, Andrei, and Richie Rich provided to back up your claims, and I haven't found anything that backs up those claims. Remember, I am not the one making claims. I am taking your sources and trying to verify your claims and I'm unable to do it. You shouldn't have to believe my data. My data is the same data that has been given in this conversation. I'm just trying to verify people's arguments. And I'm not finding such verification.

In reality, it is Andrei, markfw, Richie Rich, and you who should be providing the verification, but you are not.

Here are the assertions and what I have found based on the data provided (if any):

Richie Rich: A13 has +83% IPC over 9900K and 3900X
- problem - no IPC data comparing 9900K and 3900X and A13. The A13 has a +83% clock-normalized SPECint2006 score, but SPECint2006 hasn't been proven to correlate very well to IPC
- to correct this - change his signature to be accurate, or provide data to back up his claim, or provide data showing the SPECint2006 correlates nicely to IPC

Andrei: "Apple" has +80% or more IPC over "Intel"
- problem - "Apple" is generic and "Intel" is generic, does he mean that all Apple chips averaged have +80% IPC over all Intel chips? A13 over Intel 3930K? A6 over 9900K? Who knows?!?!? The only IPC data he provided showed A12 has +60% IPC over 6700K. Hence the statement, given the data provided, is wrong. What he should have said was that the A12 has a 60.9% IPC lead over the 6700K, because that's all the data that was presented.
- to correct this - 1) define "Apple" and define "Intel", 2) provide the IPC data showing +80% or more IPC of Apple over Intel

markfw: "Apple does not get 83% more IPC"
- problem - he provides no data to back it up at all, notwithstanding again that "Apple" is generic and he provides no comparator to the generic "Apple". More IPC than what?
- to correct this - 1) define "Apple" and define the comparator, 2) give us the data disproving that it has an 83% IPC benefit over... whatever it is that he wants to compare it to

You: "FWIW I made some x86-64 vs AArch64 instruction measurements some years ago. AArch64 is very competitive both in terms of number of instruction and in terms of total instruction size (that was to assess instruction density)."
- problem - you provided no data to back it up. Not wrong, but incomplete. Well, maybe wrong. We just don't know because you haven't provided any data to verify your claim! To quote Christopher Hitchens, "That which is asserted without evidence can be dismissed without evidence." You can tell me that the lizard king is the one true ruler of the world. I don't care. Show some data, some work, some proof.
- to correct this - provide the data, or at least be more granular about the results. "Very competitive" doesn't mean much.

You: "Anyway if the number of executed instructions is within 10% between x86-64 and AArch64"
- problem - we cannot take this as fact, because you have not provided the data.
You: "that 83% figure would only be wrong by as much."
- problem - this is reasonable speculation, but since it is founded on 1) no data and 2) further speculation we therefore cannot claim it as true
- to correct this - 1) Turn your speculation in the first sentence into an evidence-based statement by providing the data, or at least a granular result that we can verify, and 2) do the work to prove the speculation in your second sentence, because a 10% difference between x86-64 and AArch64 on the chips you ran the numbers on may not be the same as it is on an A13 vs 9900K.

You: When confronted with incomplete data, you told me: "Why do you want 6700K? 9900K is the better uarch no?"
- problem - As I mentioned, I would like to compare apples to apples. We only have IPC data on A12 and 6700K. We only have SPECint2006 on 9900K and A12. To make this comparison we would need either SPECint2006 run by AT on a 6700K, or IPC numbers for the 9900K. We have neither of those.
How does doing what you propose help us with our data quality?
- to correct this: don't tell me to throw an orange into the apple bin and try to compare the sweetness
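The 10% argument and the "+83% clock-normalized SPECint2006" claim are tied together by the iron law of performance (run time = instructions / (IPC × frequency)). Here is a minimal Python sketch of that relation, assuming each score is inversely proportional to run time and that the frequencies used for normalization are the clocks the cores actually sustained during the run; neither assumption is verified in the thread, and the function names are hypothetical.

```python
def per_clock_ratio(score_a, ghz_a, score_b, ghz_b):
    """Clock-normalized score ratio: (score_a / ghz_a) / (score_b / ghz_b)."""
    return (score_a / ghz_a) / (score_b / ghz_b)


def implied_ipc_ratio(per_clock, insn_count_a_over_b):
    """IPC_A / IPC_B implied by a per-clock score ratio and the instruction-count ratio N_A / N_B.

    Iron law: run time = N / (IPC * f), and a SPEC-style score is proportional to 1 / run time,
    so score / f is proportional to IPC / N. Therefore
        per_clock = (IPC_A / IPC_B) * (N_B / N_A)  =>  IPC_A / IPC_B = per_clock * (N_A / N_B).
    """
    return per_clock * insn_count_a_over_b


if __name__ == "__main__":
    # 1.83 is the "+83% clock-normalized SPECint2006" figure quoted in the thread.
    for n_ratio in (0.9, 1.0, 1.1):
        print(f"N_A/N_B = {n_ratio:.1f} -> IPC_A/IPC_B = {implied_ipc_ratio(1.83, n_ratio):.3f}")
```

With the thread's 1.83 as input, an instruction-count gap of ±10% moves the implied IPC ratio to roughly 1.65 to 2.01. That is the "wrong by as much" range in the quote above; the instruction-count ratio itself still has to be measured on the specific chips being compared.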

And as for me, I KNOW I have made poor conclusions based on my lack of knowledge of the subject, and limited amount of data to work with. But I have tried to limit making claims out of thin air without at least providing a rationale and the information I used to arrive at that claim. To the extent I haven't provided such rationale and information, I am of course no better or worse than the people I am criticizing above.

What I feel we must all do is realize that we have some data, and none of that data backs up any of the assertions made above by you, Andrei, markfw, or Richie Rich. What I would have hoped for is that when people made the claims above, they would have at least had some evidence to back them up, or shown some work to back them up. But no one has provided such data. And until the data is provided, all of those statements are presented without evidence and I don't see why we should take any of them as factual/true.

name99 · Apr 7, 2020

Schmide said:
I have a question. Is the A13 provisioning operations to the AMX blocks or is it strict ARM instructions?

I've heard of erroneous 462.libquantum results where the compiler pushed operations to all cores rather than do an appropriate peak.

https://www.realworldtech.com/forum/?threadid=80010&curpostid=80013

and

https://clang.llvm.org/docs/CommandGuide/clang.html

No publicly available compiler sends instructions to AMX. That includes Xcode.
It is possible that internal Apple libraries (not used by SPEC code) like Accelerate or the ML libraries use AMX but there's no good evidence for that (performance or people looking at the binaries).
So far the way to bet is that AMX is not hooked up to anything outside Apple, and that there will be a big reveal (both compiler-side and in libraries) at WWDC.

SPEC allows auto-parallelization of any code as long as the compiler does it without developer help. This is reasonable insofar as you want a compiler to do whatever it can automatically (e.g. autovectorization). What is problematic are the vendor compilers that put massive effort into ways to auto-parallelize code that looks exactly like SPEC, and not a damn thing else.
THAT is why people are unimpressed with vendor compiler results for SPEC; it's not that the results are "wrong" so much as that they are utterly uninformative. They don't tell you much about the capabilities of a core (if some of the code has been run across multiple cores) and they don't tell you much about the compiler (if the only code that will get such auto treatment is code that looks EXACTLY like SPEC).

lobz · Apr 7, 2020

name99 said:
But be aware that the world is watching... The kids who understand the real issues (and who have the most interesting conversations) don't have time for people who think the goal of these discussions is to score points; their goal is to understand the CPUs.

But they do have time for name-calling? Bullcrap. Talk about self-justifying, arrogant elitism... I don't think that was meant to be this place's standard. Although, at a time when one of Tom's top editors can call the owner of a well-known and actually 100 times more credible and objective tech review channel (Steve from HardwareUnboxed) a liar and a troll, with no public apology whatsoever, I guess I shouldn't be surprised about anything here either.

DrMrLordX · Dec 3, 2019

It is a big deal, but unless Amazon starts selling Graviton2 to other parties, it's more a story of Intel losing ODM market share than anything else. Intel has bent over backwards to make Amazon, Google, and (presumably) Microsoft happy with custom hardware and early releases. Apparently Amazon isn't happy with IceLake-SP or Cooper Lake.

JasonLD · Dec 3, 2019

DrMrLordX said:
It is a big deal, but unless Amazon starts selling Graviton2 to other parties, it's more a story of Intel losing ODM market share than anything else. Intel has bent over backwards to make Amazon, Google, and (presumably) Microsoft happy with custom hardware and early releases. Apparently Amazon isn't happy with IceLake-SP or Cooper Lake.

Nah, Amazon was going to do it even if Intel weren't behind on its Xeon schedules. Well, I am sure Intel would rather lose market share to AMD than to ARM-based servers, so Intel might have to sweeten the deals.

uzzi38 · Dec 3, 2019

DrMrLordX said:
Apparently Amazon isn't happy with IceLake-SP or Cooper Lake.

Nobody is, for good reason too.

DrMrLordX · Dec 3, 2019

JasonLD said:
Nah, Amazon was going to do it even if Intel weren't behind on its Xeon schedules. Well, I am sure Intel would rather lose market share to AMD than to ARM-based servers, so Intel might have to sweeten the deals.

It's one thing for Amazon to design the CPU in its labs. It's another for them to deploy them en masse instead of deploying Cooper Lake (because let's face it, even Amazon can't get IceLake-SP in quantity).

uzzi38 · Dec 3, 2019

DrMrLordX said:
(because let's face it, even Amazon can't get IceLake-SP in quantity).

>ICL-SP
>Quantity

Thanks for the laugh, needed that.

soresu · Dec 3, 2019

DrMrLordX said:
It is a big deal, but unless Amazon starts selling Graviton2 to other parties, it's more a story of Intel losing ODM market share than anything else. Intel has bent over backwards to make Amazon, Google, and (presumably) Microsoft happy with custom hardware and early releases. Apparently Amazon isn't happy with IceLake-SP or Cooper Lake.

It may be that they are diversifying to offer ARM instances for AWS users; when decent dev boards are often expensive and anaemic, that could have its uses.

JasonLD · Dec 3, 2019

DrMrLordX said:
It's one thing for Amazon to design the CPU in its labs. It's another for them to deploy them en masse instead of deploying Cooper Lake (because let's face it, even Amazon can't get IceLake-SP in quantity).

I think it will be beneficial for Amazon in the long run if they can provide their CPUs in-house instead of relying on Intel or AMD. Even if Ice Lake-SP were available in sufficient quantity, it wouldn't have stopped Amazon from deploying their own Arm-based instances.

Ajay · Dec 3, 2019

Well, Intel's performance and core-count complacency, combined with the execution failure on 10 nm, has really bitten them in the butt. AWS now has great leverage to exert over Intel and even AMD in terms of pricing (much less for AMD due to $/core costs). I wonder if, for the short term, this is AWS's goal. Where is Amazon doing the design work for Graviton?

liahos1 · Dec 3, 2019

Isn't the +40% perf per vcore? I feel like people are missing the finer points here. A vcore is half an Intel core: the Intel core has Hyper-Threading, and each thread is a vCPU.

1.4 / 2 = 0.7
1 / 0.7 ≈ 1.43, favoring Intel
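liahos1's conversion generalizes to any vCPU-to-core mapping. A minimal Python sketch, assuming (as the arithmetic above implicitly does) that the +40% per-vCPU figure was measured with all vCPUs busy; if it came from a lightly loaded run, simply doubling the Intel per-vCPU number would not hold. `per_core_ratio` is a hypothetical helper.

```python
def per_core_ratio(per_vcpu_ratio, vcpus_per_core_a, vcpus_per_core_b):
    """Turn a per-vCPU throughput ratio A/B into a per-physical-core ratio A/B.

    Assumes the per-vCPU figures were measured with every vCPU loaded, so a core's
    throughput is simply (vCPUs per core) x (per-vCPU throughput).
    """
    return per_vcpu_ratio * vcpus_per_core_a / vcpus_per_core_b


if __name__ == "__main__":
    # Graviton2: 1 vCPU per core; Hyper-Threaded Xeon: 2 vCPUs per core.
    g2_vs_intel = per_core_ratio(1.4, 1, 2)
    print(f"Graviton2 core vs Intel core: {g2_vs_intel:.2f}X")
    print(f"Intel core vs Graviton2 core: {1 / g2_vs_intel:.2f}X")
```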

soresu · Dec 3, 2019

liahos1 said:
Isn't the +40% perf per vcore? I feel like people are missing the finer points here. A vcore is half an Intel core: the Intel core has Hyper-Threading, and each thread is a vCPU.

1.4 / 2 = 0.7
1 / 0.7 ≈ 1.43, favoring Intel

The vcore/vCPU terminology is used because it's meant for VMs only.

soresu · Dec 3, 2019

Ajay said:
Well, Intel's performance and core-count complacency, combined with the execution failure on 10 nm, has really bitten them in the butt. AWS now has great leverage to exert over Intel and even AMD in terms of pricing (much less for AMD due to $/core costs). I wonder if, for the short term, this is AWS's goal. Where is Amazon doing the design work for Graviton?

I believe it's Annapurna Labs, which they acquired some time ago; that's an assumption, not a fact, as far as I am aware.

Though I think ARM did a lot of the work for them with the N1 design, it sounds like much more than a mere core, more like a licensable whole server chip design.

liahos1 · Dec 3, 2019

This is also comping against the 8175. Shouldn't the better comp be the 8276?

uzzi38 · Dec 3, 2019

liahos1 said:
This is also comping against the 8175. Shouldn't the better comp be the 8276?

I'd rather see a comparison vs 7742. All these ARM companies keep comparing to 7601P or CL-SP. We all know where leadership perf at least for the next year and a half lies.

Markfw · Dec 3, 2019

uzzi38 said:
I'd rather see a comparison vs 7742. All these ARM companies keep comparing to 7601P or CL-SP. We all know where leadership perf at least for the next year and a half lies.

I can say the 7601 (the 2P version that I have) is slower than the 2990WX, and the 7742 kicks it into oblivion, so yes, that is what I would like to see.

liahos1 · Dec 3, 2019

uzzi38 said:
I'd rather see a comparison vs 7742. All these ARM companies keep comparing to 7601P or CL-SP. We all know where leadership perf at least for the next year and a half lies.

Why comp against the 7742? Its single-core boost is 3.4 GHz. It would be better to comp vCPU against the 8175 (3.1 GHz boost) if the point of the comp was to make you look as good as possible (hence no 8276).
