Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

exquisitechar · Dec 3, 2019

https://www.servethehome.com/aws-graviton2-64-core-arm-cpu-heightens-war-of-intel-betrayal/

Pretty big deal for ARM in servers. Interested in seeing a comparison between this and Rome.

amrnuke · Apr 6, 2020

Nothingness said:
@amrnuke The IPC term use here is overloaded.

FWIW I made some x86-64 vs AArch64 instruction measurements some years ago. AArch64 is very competitive both in terms of number of instructions and in terms of total instruction size (that was to assess instruction density). People who think AArch64 lags behind x86 in terms of ISA just didn't study it.

EDIT: @Andrei. beat me to it. Anyway our measurements seem to give similar results.

Do you still have that data? I would love to see it.

As I responded to Andrei:
- my complaint is not that IPC is vastly different between the two (it could be, but that seems not to be the case), though we obviously have to keep our thinking caps on when comparing two ISAs. My RISC vs CISC instruction comparison was part of illustrating that we need a definition of what we mean by IPC.
- my complaint is that Richie Rich is labeling a test that compares relative work done per cycle as IPC, which it is not. Even though his result may be similar, he has not done any calculations on instructions per cycle.

Schmide · Apr 6, 2020

I have a question. Is the A13 provisioning operations to the AMX blocks or is it strict ARM instructions?

I've heard of erroneous 462.libquantum results where the compiler pushed operations to all cores rather than do an appropriate peak.

https://www.realworldtech.com/forum/?threadid=80010&curpostid=80013

and

https://clang.llvm.org/docs/CommandGuide/clang.html

-Ofast Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards

Andrei. · Apr 6, 2020

amrnuke said:
I wish you had read my post instead of just assuming it was a "stupid" old argument.

The problem is that you're presenting a theory based on the RISC vs CISC argument that you keep bringing into the IPC argument, which hasn't been valid for over a decade and is certainly not valid for AArch64. It *is* the same stupid old argument that just needs to outright die instead of being brought up as a theory again, because people will read it and they will continue to be convinced of it and repeat it elsewhere.

amrnuke said:
Long story short, he's claiming 83% IPC lead based on SPECint2006 data, which is false equivalence.

But it's not false equivalence. It's perfectly sound, that's the point.

amrnuke said:
Do you mind providing a chart on retired instructions for x86 that you have done previously?

I hadn't saved it anywhere (I was just quickly checking things). I'll have to set up a box again with Linux and rerun it.

amrnuke said:
Also, finally, can you speak on their statement: "Obviously, IPC cannot be used as a performance metric since two different ISAs are being evaluated." Is this pertinent since ARM/x86 are different ISAs? The authors state that in such a case of different ISAs, FLOPC is a more accurate way of comparing two ISAs. Does that apply here, and if not, why not?

They're talking about low-level analysis, which is of course correct. But for us in the general public the whole argument is moot, especially in the face of the small differences in retired instructions. Performance per clock ~= instructions per clock for me as it's good enough between x86 and AArch64.

Schmide said:
I have a question. Is the A13 provisioning operations to the AMX blocks or is it strict ARM instructions?

AMX is still decoded by the core - however Apple's public compiler has no support for the custom instructions. Yes, they are custom instructions, and yes, Apple is allowed to do that. Because they're not exposing them, it's not creating any ISA fragmentation.

Schmide · Apr 6, 2020

Andrei. said:
AMX is still decoded by the core - however Apple's public compiler has no support for the custom instructions. Yes, they are custom instructions, and yes, Apple is allowed to do that. Because they're not exposing them, it's not creating any ISA fragmentation.

but we shouldn't use it as a metric for ARM cores since it is akin to an HSA operation.

EDIT: moreover it gives a false representation of the actual scaling any additional core would produce.

amrnuke · Apr 6, 2020

Andrei. said:
The problem is that you're presenting a theory based on the RISC vs CISC argument that you keep bringing into the IPC argument, which hasn't been valid for over a decade and is certainly not valid for AArch64. It *is* the same stupid old argument that just needs to outright die instead of being brought up as a theory again, because people will read it and they will continue to be convinced of it and repeat it elsewhere.

But my problem with his signature, and everything I'm talking about, is making sure that he actually knows what he's talking about, because he's calling scaled performance per clock "IPC" when it isn't. Case in point, IPC ranges largely in the single digits. His scores range in the 50s. Why don't I just go back and delete the brief RISC vs CISC discussion so it doesn't distract you? I would be making the same points and asking the same questions about whether we actually have the data that A13 really has 83% IPC gain over 3900X and 9900K.

Andrei. said:
But it's not false equivalence. It's perfectly sound, that's the point.

IPC
Based on the A12 vs SKL data on IPC you provided, the A12 benefit over SKL is 60.9%:
(2.928 average IPC for A12, and in the paper you linked to, 1.819 average IPC for 6700K)
SPECint2006
With SPECint2006 scores scaled to score per GHz, the A12 scores 48.965* average (source), divided by 2.5 GHz, which is 19.586 SPEC per GHz. The 6700K scored 70.492 average (source), divided by 4.2 GHz, which is 16.784 SPEC per GHz. That gives a 16.7% lead to the A12. Even if we exclude libquantum the difference is only 26%.
* - I said 48.798 before but included an erroneous entry for 400; also, Anandtech got a SPECint2006 Speed score of 45.32, which is not the same as the average, but it makes A12 look worse so I'll leave it higher.
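
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The scores, clocks, and average IPC values are the ones quoted in this post; the helper names `per_ghz` and `lead_pct` are made up for illustration. It only reproduces the division and says nothing about whether the two sources are comparable.

```python
def per_ghz(score, clock_ghz):
    """Clock-normalized score: points per GHz."""
    return score / clock_ghz


def lead_pct(value, baseline):
    """Percentage by which `value` exceeds `baseline`."""
    return (value / baseline - 1.0) * 100.0


# SPECint2006 averages and clocks as quoted in the post.
a12_per_ghz = per_ghz(48.965, 2.5)
skl_per_ghz = per_ghz(70.492, 4.2)
spec_lead = lead_pct(a12_per_ghz, skl_per_ghz)

# Average IPC figures as quoted in the post (A12 vs 6700K).
ipc_lead = lead_pct(2.928, 1.819)

print(f"A12:   {a12_per_ghz:.3f} SPEC/GHz")
print(f"6700K: {skl_per_ghz:.3f} SPEC/GHz")
print(f"score/GHz lead: {spec_lead:.1f}%")
print(f"IPC lead:       {ipc_lead:.1f}%")
```

Printing both leads side by side makes the gap between the clock-normalized score and the measured IPC easy to see.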

That seems fairly disparate to me. Can you check my numbers to make sure I'm looking at this right? It seems like SPECint2006 gives very different results when normalized for clock speed, than the IPC data from your calculations and the paper you linked to.

Andrei. said:
I hadn't saved it anywhere (I was just quickly checking things), I'll have to setup a box again with Linux and rerun it.

I don't want you to have to waste your time if we can already conclude from the above information that SPECint2006 =/= IPC. But if you need to square off an A13 vs 9900K to verify, by all means, it would be fun!

Andrei. said:
They're talking about low-level analysis, which is of course correct. But for us in the general public the whole argument is moot, especially in the face of the small differences in retired instructions. Performance per clock ~= instructions per clock for me as it's good enough between x86 and AArch64.

See above. It seems that it's a little different.

Nothingness · Apr 7, 2020

amrnuke said:
Do you still have that data? I would love to see it.

Sorry I'm not allowed to share it. You'll have to wait for @Andrei. data.

Someone posted some similar comparison but only for a single input for SPEC 2000 176.gcc here:

RWT Forums - Real World Tech

www.realworldtech.com

Code:

i386___ instructions=6896517827 size=20761288034 3.01 bytes/instr
x86-64_ instructions=7445258067 size=25713146061 3.45 bytes/instr
aarch64 instructions=6691899327 size=26767597308 4.00 bytes/instr
arm____ instructions=7721825824 size=30796498138 3.99 bytes/instr
thumb__ instructions=7877291651 size=23736719388 3.01 bytes/instr
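
The bytes/instr column is just size divided by instruction count, and the same counts also give the relative number of executed instructions. A minimal Python sketch using the figures from the listing above (the dictionary layout and variable names are illustrative, not from the original post):

```python
# (executed instructions, total instruction bytes) copied from the listing.
counts = {
    "i386": (6896517827, 20761288034),
    "x86-64": (7445258067, 25713146061),
    "aarch64": (6691899327, 26767597308),
    "arm": (7721825824, 30796498138),
    "thumb": (7877291651, 23736719388),
}

# Average instruction size in bytes for each ISA.
bytes_per_instr = {isa: size / n for isa, (n, size) in counts.items()}

# Instruction count and total bytes relative to the x86-64 build.
base_n, base_size = counts["x86-64"]
rel_instr = {isa: n / base_n for isa, (n, _) in counts.items()}
rel_bytes = {isa: size / base_size for isa, (_, size) in counts.items()}

for isa in counts:
    print(f"{isa:8s} {bytes_per_instr[isa]:.2f} bytes/instr  "
          f"{rel_instr[isa]:.3f}x instructions  {rel_bytes[isa]:.3f}x bytes")
```

On this single 176.gcc input, the AArch64 build executes roughly 10% fewer instructions than the x86-64 build, with about 4% more total instruction bytes.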

It's so sad SPEC doesn't put in public domain their benchmarks once they are retired. I guess it would be a time consuming task to ensure the copyright holders agree with that.

- my complaint is that Richie Rich is labeling a test that compares relative work done per cycle as IPC, which it is not. Even though his result may be similar, he has not done any calculations on instructions per cycle.

Many conflate IPC with score/frequency, I sometimes do that myself even though I agree it should be avoided.

Anyway, if the number of executed instructions is within 10% between x86-64 and AArch64, that 83% figure would only be wrong by as much.
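
One way to make that last statement concrete, as a sketch under stated assumptions: a SPEC score is inversely proportional to run time, and run time is instructions executed divided by (IPC times clock). Under that model, score/GHz is proportional to IPC divided by instruction count, so the IPC ratio equals the score/GHz ratio times the instruction-count ratio. The 1.83 figure is the disputed +83%; the 10% spread in either direction is the hypothetical bound from the sentence above, not measured data.

```python
def ipc_ratio(score_per_ghz_ratio, instr_count_ratio):
    """IPC ratio A/B implied by a score/GHz ratio A/B.

    Model (an assumption of this sketch):
        time = instructions / (IPC * clock), score ~ 1 / time
    so  score/GHz ~ IPC / instructions
    and IPC_A / IPC_B = (score/GHz)_A / (score/GHz)_B * N_A / N_B.
    """
    return score_per_ghz_ratio * instr_count_ratio


score_ratio = 1.83  # the disputed +83% score/GHz figure

# Hypothetical bounds: AArch64 executes 10% fewer or 10% more
# instructions than x86-64 for the same work.
low = ipc_ratio(score_ratio, 0.90)
high = ipc_ratio(score_ratio, 1.10)

print(f"implied IPC advantage: +{(low - 1) * 100:.0f}% to +{(high - 1) * 100:.0f}%")
```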

Nothingness · Apr 7, 2020

amrnuke said:
SPECint2006
With SPECint2006 scores scaled to score per GHz, the A12 scores 48.965* average (source), divided by 2.5 GHz, which is 19.586 SPEC per GHz. The 6700K scored 70.492 average (source), divided by 4.2 GHz, which is 16.784 SPEC per GHz. That gives a 16.7% lead to the A12. Even if we exclude libquantum the difference is only 26%.
* - I said 48.798 before but included an erroneous entry for 400; also, Anandtech got a SPECint2006 Speed score of 45.32, which is not the same as the average, but it makes A12 look worse so I'll leave it higher.

That seems fairly disparate to me. Can you check my numbers to make sure I'm looking at this right? It seems like SPECint2006 gives very different results when normalized for clock speed, than the IPC data from your calculations and the paper you linked to.

Any comparison with icc (or AOCC) on SPEC is pointless. Don't do that please.

In general avoid getting results from different sources when you can get results from a single one. It is a waste of time, really.

PS: You talk about "average" above. You know that SPEC uses geometric mean, right? Just checking
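
To illustrate why the distinction matters, here is a small Python sketch comparing the two means. The sub-scores are invented for the example (one deliberately large outlier, loosely in the spirit of the libquantum results mentioned earlier) and are not real SPEC results. `statistics.geometric_mean` needs Python 3.8 or later.

```python
import statistics

# Hypothetical per-benchmark ratios; the last one is a deliberate outlier.
ratios = [20.0, 25.0, 30.0, 40.0, 300.0]

arith = statistics.mean(ratios)
geo = statistics.geometric_mean(ratios)

print(f"arithmetic mean: {arith:.1f}")
print(f"geometric mean:  {geo:.1f}")
```

The geometric mean is pulled up far less by a single outlier, so an arithmetic average of sub-scores will not match a published overall SPEC score.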

EDIT: There was an article by David Kanter in Microprocessor Report about icc. I'm afraid it's behind a paywall but I give the link anyway: https://www.linleygroup.com/mpr/article.php?id=11708

amrnuke · Apr 7, 2020

Nothingness said:
Sorry I'm not allowed to share it. You'll have to wait for @Andrei. data.

Someone posted some similar comparison but only for a single input for SPEC 2000 176.gcc here:

RWT Forums - Real World Tech

www.realworldtech.com

Code:

i386___ instructions=6896517827 size=20761288034 3.01 bytes/instr
x86-64_ instructions=7445258067 size=25713146061 3.45 bytes/instr
aarch64 instructions=6691899327 size=26767597308 4.00 bytes/instr
arm____ instructions=7721825824 size=30796498138 3.99 bytes/instr
thumb__ instructions=7877291651 size=23736719388 3.01 bytes/instr

It's so sad SPEC doesn't put in public domain their benchmarks once they are retired. I guess it would be a time consuming task to ensure the copyright holders agree with that.

Many conflate IPC with score/frequency, I sometimes do that myself even though I agree it should be avoided.

Anyway, if the number of executed instructions is within 10% between x86-64 and AArch64, that 83% figure would only be wrong by as much.

And yet markfw said 83% is wrong, not that it wasn't close. I think he, Richie Rich and Andrei are all talking about something about which we do NOT have the data to make a conclusion. That's my overall point. No data, no good surrogates as best I can tell, and here we are chastising people over it.

Nothingness said:
Any comparison with icc (or AOCC) on SPEC is pointless. Don't do that please.

It was the easiest data I found

I didn't realize it was completely pointless, but I see the fault since optimized code will artificially inflate scores.

Nothingness said:
In general avoid getting results from different sources when you can get results from a single one. It is a waste of time, really.

Sadly I do not see where anyone has compared a 6700K to an A12 from the same source. The problem when we can't get results from a single source is that we are left with poorer quality data, though I don't think it's completely meaningless.

So if someone wants to produce data showing that SPECint2006 scores are a reasonable surrogate for IPC, rather than giving substantially different results, then absolutely, let's do it!

Nothingness said:
PS: You talk about "average" above. You know that SPEC uses geometric mean, right? Just checking

I have heard that before, and completely forgot about it when making my calculation.

Thank you!

Nothingness said:
EDIT: There was an article by David Kanter in Microprocessor Report about icc. I'm afraid it's behind a paywall but I give the link anyway: https://www.linleygroup.com/mpr/article.php?id=11708

I know Intel and AMD's optimized code is absurd. So we should stick with gcc/llvm, which I will try to ensure in the future.

Also, I actually went back and looked at Richie's signature... all I could think is "what the fork"? Despite experts saying we shouldn't use SPECint to compare ISAs, that's what he did.

1. Intel Core i9 9900K @5GHz ......... SPECint2006 score: 54.28 ...... 10.86 pts/GHz
2. Apple A13 @2.65 GHz .................. SPECint2006 score: 52.82 ...... 19.93 pts/GHz ...... +83 % IPC over 9900K
3. AMD Ryzen 3950X @4.6 GHz ...... SPECint2006 score: 50.02 ...... 10.87 pts/GHz ...... + 0% IPC over 9900K .... fastest clocked Ryzen beaten by iPhone CPU
4. ARM Cortex A77@2.84 GHz ......... SPECint2006 score: 33.32 ...... 11.73 pts/GHz ...... + 8% IPC over 9900K

But the Anandtech article shows the following, and took the advice of the paper Andrei linked to (since Andrei of course wrote the article), and compared only SPECfp scores for A13 vs 9900K and 3900X, since one cannot, per those authors, use SPECint to compare different ISAs:
9900K @ 5.0 GHz gets 75.15, which is 15.03 pts/GHz
3900X @ 4.6 GHz gets 73.66, which is 16.01 pts/GHz
A13 @ 2.66 GHz gets 52.82, which is 19.86 pts/GHz
Meaning A13 has a 32% lead over 9900K and a 24% lead over 3900X.
Unless I'm reading this wrong.
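
The per-GHz division above, and the one in Richie Rich's signature, can be reproduced with a short Python sketch. All scores and clocks are copied as posted; the function and variable names are illustrative. Note that 52.82 appears as the A13 score in both lists as quoted, and the sketch does not check any figure against the article.

```python
def per_ghz(score, clock_ghz):
    """Clock-normalized score: points per GHz."""
    return score / clock_ghz


def lead_pct(value, baseline):
    """Percentage by which `value` exceeds `baseline`."""
    return (value / baseline - 1.0) * 100.0


# (score, clock in GHz) as listed in the signature (SPECint2006).
signature = {
    "9900K": (54.28, 5.0),
    "A13": (52.82, 2.65),
    "3950X": (50.02, 4.6),
    "A77": (33.32, 2.84),
}

# (score, clock in GHz) as listed in the post for SPECfp.
fp_post = {
    "9900K": (75.15, 5.0),
    "3900X": (73.66, 4.6),
    "A13": (52.82, 2.66),
}

sig_pts = {name: per_ghz(*row) for name, row in signature.items()}
fp_pts = {name: per_ghz(*row) for name, row in fp_post.items()}

for label, table in (("signature", sig_pts), ("SPECfp post", fp_pts)):
    for name, pts in table.items():
        lead = lead_pct(pts, table["9900K"])
        print(f"{label:12s} {name:6s} {pts:6.2f} pts/GHz  {lead:+6.1f}% vs 9900K")
```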

Nothingness · Apr 7, 2020

amrnuke said:
And yet markfw said 83% is wrong, not that it wasn't close. I think he, Richie Rich

These 2 guys are at two opposed extremes. So I take what they say with a huge grain of salt. In fact I take with a grain of salt what anyone writes, including myself

It was the easiest data I found I didn't realize it was completely pointless, but I see the fault since optimized code will artificially inflate scores.

The data for 9900K has been in @Andrei. results for months. Why do you want 6700K? 9900K is the better uarch no?

Sadly I do not see where anyone has compared a 6700K to an A12 from the same source. The problem when we can't get results from a single source is that we are left with poorer quality data, though I don't think it's completely meaningless. But if we assume (as many people have stated) that IPC has not changed from 6700K to 9900K, then we can use the 6700K IPC results as a surrogate for 9900K, and use Anandtech's own SPECint2006 score:
A12 geomean - 45.32 / 2.5 GHz = 18.128
9900K geomean - 75.15 / 5 GHz = 15.03
18.128 / 15.03 = 20.6% lead for A12.

Why do you pick FP? Why A12?
9900K score for SPECint is 54.28 so 10.86/GHz and for A13 it is 52.82 so 19.93/GHz. So you get +83% score/GHz.

Now there is a big gotcha here: the frequency difference is way too large to be ignored, and the 9900K is at a big disadvantage here.
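
A toy model shows why clock speed matters for score/GHz. It assumes DRAM latency is fixed in nanoseconds, so each cache miss costs more core cycles at a higher clock. Every number below (base CPI, miss rate, latency) is invented for illustration and does not describe the 9900K or the A13.

```python
def effective_ipc(clock_ghz, core_cpi=0.4, misses_per_instr=0.002,
                  mem_latency_ns=80.0):
    """IPC of a hypothetical core once memory stalls are included.

    A miss costs mem_latency_ns * clock_ghz cycles, because the clock
    in GHz is the number of cycles per nanosecond.
    """
    stall_cpi = misses_per_instr * mem_latency_ns * clock_ghz
    return 1.0 / (core_cpi + stall_cpi)


# Same hypothetical core evaluated at two clock speeds.
for clock in (2.65, 5.0):
    print(f"{clock:.2f} GHz -> effective IPC {effective_ipc(clock):.2f}")
```

In this model the same core shows lower IPC at the higher clock purely because memory stalls take more cycles, which is why dividing a score by frequency penalizes the faster-clocked part.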

But the Anandtech article shows the following, and took the advice of the paper Andrei linked to, and compared only SPECfp scores, since one cannot, per those authors, use SPECint to compare different ISAs:

What paper are you talking about? As you present it, it is an utterly stupid statement. Both int and fp give interesting data, but using fp can create a distortion where wider vectors and more numerous FP units can give a disproportionate advantage that doesn't translate to many apps. And it's obviously not only a matter of ISA, but a matter of a particular microarch.

9900K @ 5.0 GHz gets 75.15, which is 15.03 pts/GHz
3900X @ 4.6 GHz gets 73.66, which is 16.01 pts/GHz
A13 @ 2.66 GHz gets 52.82, which is 19.86 pts/GHz
Meaning A13 has only a 32% lead over 9900K and a 24% lead over 3900X.
Unless I'm reading this wrong.

Your computation is correct.

amrnuke · Apr 7, 2020

Nothingness said:
These 2 guys are at two opposed extremes. So I take what they say with a huge grain of salt. In fact I take with a grain of salt what anyone writes, including myself

Fair points!

Nothingness said:
The data for 9900K has been in @Andrei. results for months. Why do you want 6700K? 9900K is the better uarch no?

Well, I want 6700K because the IPC data is only for 6700K. I have no true IPC data available for 9900K, so to compare:
IPC: A12 vs 6700K
SPECint2006: A12 vs 9900K
Seems silly
Especially since the IPC data was with iso-clock at 3GHz but 9900K SPECint2006 was run at 5GHz.

Nothingness said:
Why do you pick FP? Why A12?
9900K score for SPECint is 54.28 so 10.86/GHz and for A13 it is 52.82 so 19.93/GHz. So you get +83% score/GHz.

Now there is a big gotcha here: the frequency difference is way too large to be ignored, and the 9900K is at a big disadvantage here.

What paper are you talking about? As you present it, it is an utterly stupid statement. Both int and fp give interesting data, but using fp can create a distortion where wider vectors and more numerous FP units can give a disproportionate advantage that doesn't translate to many apps. And it's obviously not only a matter of ISA, but a matter of a particular microarch.

* I modified my post after posting, because I didn't include the data I intended to.

The authors of this paper linked by Andrei state in section 5.3: "Obviously, IPC cannot be used as a performance metric since two different ISAs are being evaluated. Instead, FLOPC was accompanied to B in step ① of Figure 4. The number of FP operations is an inherent attribute of the application." So that is why I compared SPECfp instead of SPECint, and I assume that's why Andrei only compared SPECfp scores in his article here on Anandtech, rather than comparing SPECint scores. But I'll let him speak for himself as to why he made that decision.

As for why A12, I chose it because we have no IPC data for A13. I wanted to compare apples to apples.
A12 IPC
6700K IPC
A12 SPECint
6700K SPECint

However, if I want to compare A12 to 6700K we lack SPECint done by same source as A12.

If I want to compare A12 to 9900K to see whether SPECint is a good surrogate, we are missing raw IPC data on 9900K.

If I want to compare A13 to 9900K we are missing IPC data on both.

So we are just missing a lot of information here, to be honest.

There are major problems with using SPECint as a surrogate, major problems using SPECfp as a surrogate, but at least we have some smarter people than me saying FLOPC is a better metric to compare different ISAs, which leads me to suspect that SPECfp is possibly a better way to compare two different ISAs.

IPC comparison of A12 to 6700K shows IPC difference is 60%.
SPECfp comparison, which could be more valid surrogate when comparing ISAs, shows difference between A13 and 9900K and 3900X is much smaller than 83%.
In any case, we cannot say anything about IPC differences between A13 and 9900K/3900X because we lack the data.

If we sub in 9900K as a surrogate for 6700K in the SPECint tests (since some claim IPC hasn't changed over that timeframe, though I don't know what data this is based on), we get:
IPC A12 over 6700K = 60.9%
SPECint2006/GHz A12 over 9900K = 67.0%
SPECfp2006/GHz A12 over 9900K = 45.9%
But again, the caveat is that the 6700K was running at 3GHz and the 9900K at 5GHz, which creates some issues. Also, the 9900K has double the L3 cache of the 6700K in ST operations, and we don't know whether the LSD microcode update was applied to the 6700K in the IPC reviews. We do know the LSD was re-enabled on the 9900K, which has an unknown benefit or drawback.

I just don't know how much we can say based on the information we have.

Markfw · Apr 7, 2020

amrnuke said:
But the Anandtech article shows the following, and took the advice of the paper Andrei linked to (since Andrei of course wrote the article), and compared only SPECfp scores for A13 vs 9900K and 3900X, since one cannot, per those authors, use SPECint to compare different ISAs:
9900K @ 5.0 GHz gets 75.15, which is 15.03 pts/GHz
3900X @ 4.6 GHz gets 73.66, which is 16.01 pts/GHz
A13 @ 2.66 GHz gets 52.82, which is 19.86 pts/GHz
Meaning A13 has a 32% lead over 9900K and a 24% lead over 3900X.
Unless I'm reading this wrong.

The 32% and 24%? I can believe that. But also, the A13 will never scale up to the speeds and number of cores that AMD/Intel have.

The A13 was designed as a smartphone CPU, and does it quite well. The Intel/AMD CPUs are for desktops, and they do their job well (to differing degrees of course)

Trying to compare the 2 is insane IMO. Like comparing an ultralight with a jet fighter, they have completely different purposes.

Nothingness · Apr 7, 2020

Markfw said:
The 32% and 24% ? I can believe that.

Because that fits your denial?

But also, the A13 will never scale up to the speeds and number of cores that AMD/Intel have.

"Never" is a strong word. But I agree that's quite unlikely.

The A13 was designed as a smartphone CPU, and does it quite well. The Intel/AMD CPUs are for desktops, and they do their job well (to differing degrees of course)

Oh yes Intel and AMD cores only go in desktops. I will ask my company to investigate why we have the same cores from laptops to servers, that must be some alien technology.

Trying to compare the 2 is insane IMO. Like comparing an ultralight with a jet fighter, they have completely different purposes.

Joke aside, you know that most of AMD/Intel cores are almost identical from tablets up to servers? Even ARM did that with A76 going from smartphone to servers.

amrnuke · Apr 7, 2020

Markfw said:
The 32% and 24% ? I can believe that.

I believe both the SPECint score and the SPECfp score, and I believe the A12 IPC benefit over 6700K. Because I believe data.

I don't believe any statement of fact for which there is no evidence. As such I don't believe that A13 has an 83% or 80%+ IPC benefit over 9900K until the data bears it out. (Either show that SPECint2006 results scale with IPC fairly linearly across different chips in different ISAs, or just show the IPC comparison head to head!). It very well may, or perhaps it may not. But stating it as if it were a fact shows very poor judgment on everyone's part here.

Markfw said:
But also, the A13 will never scale up to the speeds and number of cores that AMD/Intel have.

This is true for now, I think that is why you won't be seeing ARM in the HEDT and gaming markets any time soon.

I am rooting for Nuvia and Graviton (though Nuvia isn't out, and Graviton has major limitations), for instance, because I think the competition it will generate will be huge. If Nuvia can scale to more than just limited applications like Graviton, it'll be really fun to see what breakthroughs we can achieve, for example on an exascale level, when so many people are racing to the top. This is especially true when considering medical research for instance, where having multiple vendors pushing each other could make it trivial to model protein folding, new drugs, analyze pandemic trends/interactions and confounders, and so on.

Markfw said:
The A13 was designed as a smartphone CPU, and does it quite well. The Intel/AMD CPUs are for desktops, and they do their job well (to differing degrees of course)

Trying to compare the 2 is insane IMO. Like comparing an ultralight with a jet fighter, they have completely different purposes.

Yep, they are two totally different chips for totally different markets.

But I do think it would be trivial for Apple to make a chip powerful enough for a laptop, for instance, with 4 x Lightning and 4 x Thunder. That has never been the problem. The problem is x86: not the chips, but the architecture and how ingrained it is in laptops, HEDT, gaming, and servers.

Nothingness · Apr 7, 2020

amrnuke said:
I believe both the SPECint score and the SPECfp score, and I believe the A12 IPC benefit over 6700K. Because I believe data.

Sorry but I don't believe your data, for the many reasons I've been repeating ad nauseam.

I don't believe any statement of fact for which there is no evidence.

But you have no evidence! You pick data from different sources and play with it.

As such I don't believe that A13 has an 83% or 80%+ IPC benefit over 9900K until the data bears it out.

It's no more unbelievable than what you've done.

(Either show that SPECint2006 results scale with IPC fairly linearly across different chips in different ISAs, or just show the IPC comparison head to head!). It very well may, or perhaps it may not. But stating it as if it were a fact shows very poor judgment on everyone's part here.

I guess you wanted to say that results scale with frequency, right? The answer is obviously no, it won't scale but the answer also is that it won't bring the 83% advantage down to the levels you mention.

This is true for now, I think that is why you won't be seeing ARM in the HEDT and gaming markets any time soon.

I am rooting for Nuvia and Graviton (though Nuvia isn't out, and Graviton has major limitations), for instance, because I think the competition it will generate will be huge. If Nuvia can scale to more than just limited applications like Graviton,

I'm speechless and I'm done discussing this any further.

Markfw · Apr 7, 2020

Nothingness said:
Because that fits your denial?

"Never" is a strong word. But I agree that's quite unlikely.

Oh yes Intel and AMD cores only go in desktops. I will ask my company to investigate why we have the same cores from laptops to servers, that must be some alien technology.

Joke aside, you know that most of AMD/Intel cores are almost identical from tablets up to servers? Even ARM did that with A76 going from smartphone to servers.

First, I believe the data Anandtech posted, this crap about denial is trolling.
Second, I said desktops; I meant desktop, laptop, server, HEDT, etc. I don't have to name every variant. Again with the trolling.

And saying AMD/Intel cores for tablets are the same as servers is, again, trolling. We all know that even a 3600 CPU is far different from a 7742 EPYC. Again with the trolling.

And lastly, I hope you are serious about not discussing it further, as I hate a troll.

amrnuke · Apr 7, 2020

Nothingness said:
Sorry but I don't believe your data, for the many reasons I've been repeating ad nauseam.

But you have no evidence! You pick data from different sources and play with it.

It's no more unbelievable than what you've done.

I guess you wanted to say that results scale with frequency, right? The answer is obviously no, it won't scale but the answer also is that it won't bring the 83% advantage down to the levels you mention.

I'm speechless and I'm done discussing this any further.

Here's the problem. Y'all are making claims without backing them up. I'm trying to use the sources you, Andrei, and Richie Rich provided to back up your claims, and I haven't found anything that backs up those claims. Remember, I am not the one making claims. I am taking your sources and trying to verify your claims and I'm unable to do it. You shouldn't have to believe my data. My data is the same data that has been given in this conversation. I'm just trying to verify people's arguments. And I'm not finding such verification.

In reality, it is Andrei, markfw, Richie Rich, and you who should be providing the verification, but you are not.

Here are the assertions and what I have found based on the data provided (if any):

Richie Rich: A13 has +83% IPC over 9900K and 3900X
- problem - no IPC data comparing 9900K and 3900X and A13. The A13 has a +83% clock-normalized SPECint2006 score, but SPECint2006 hasn't been proven to correlate very well to IPC
- to correct this - change his signature to be accurate, or provide data to back up his claim, or provide data showing the SPECint2006 correlates nicely to IPC

Andrei: "Apple" has +80% or more IPC over "Intel"
- problem - "Apple" is generic and "Intel" is generic, does he mean that all Apple chips averaged have +80% IPC over all Intel chips? A13 over Intel 3930K? A6 over 9900K? Who knows?!?!? The only IPC data he provided showed A12 has +60% IPC over 6700K. Hence the statement, given the data provided, is wrong. What he should have said was that the A12 has a 60.9% IPC lead over the 6700K, because that's all the data that was presented.
- to correct this - 1) define "Apple" and define "Intel", 2) provide the IPC data showing +80% or more IPC of Apple over Intel

markfw: "Apple does not get 83% more IPC"
- problem - he provides no data to back it up at all, notwithstanding again that "Apple" is generic and he provides no comparator to the generic "Apple". More IPC than what?
- to correct this - 1) define "Apple" and define the comparator, 2) give us the data disproving that it has an 83% IPC benefit over... whatever it is that he wants to compare it to

You: "FWIW I made some x86-64 vs AArch64 instruction measurements some years ago. AArch64 is very competitive both in terms of number of instructions and in terms of total instruction size (that was to assess instruction density)."
- problem - you provided no data to back it up. Not wrong, but incomplete. Well, maybe wrong. We just don't know because you haven't provided any data to verify your claim! To quote Christopher Hitchens, "That which is asserted without evidence can be dismissed without evidence." You can tell me that the lizard king is the one true ruler of the world. I don't care. Show some data, some work, some proof.
- to correct this - provide the data, or at least be more granular about the results. "Very competitive" doesn't mean much.

You: "Anyway if the number of executed instructions is within 10% between x86-64 and AArch64"
- problem - we cannot take this as fact, because you have not provided the data.
You: "that 83% figure would only be wrong by as much."
- problem - this is reasonable speculation, but since it is founded on 1) no data and 2) further speculation, we cannot claim it as true
- to correct this - 1) Turn your speculation in the first sentence into an evidence-based statement by providing the data, or at least a granular result that we can verify, and 2) do the work to prove the speculation in your second sentence, because a 10% difference between x86-64 and AArch64 on the chips you ran the numbers on may not be the same as it is on an A13 vs 9900K.

You: When confronted with incomplete data, you told me: "Why do you want 6700K? 9900K is the better uarch no?"
- problem - As I mentioned, I would like to compare apples to apples. We only have IPC data on A12 and 6700K. We only have SPECint2006 on 9900K and A12. To make this comparison we would need either SPECint2006 run by AT on a 6700K, or IPC numbers for the 9900K. We have neither of those.
How does doing what you propose help us with our data quality?
- to correct this: don't tell me to throw an orange into the apple bin and try to compare the sweetness

And as for me, I KNOW I have made poor conclusions based on my lack of knowledge of the subject, and limited amount of data to work with. But I have tried to limit making claims out of thin air without at least providing a rationale and the information I used to arrive at that claim. To the extent I haven't provided such rationale and information, I am of course no better or worse than the people I am criticizing above.

What I feel we must all do is realize that we have some data, and none of that data backs up any of the assertions made above by you, Andrei, markfw, or Richie Rich. What I would have hoped for is that when people made the claims above, they would have at least had some evidence to back them up, or shown some work to back them up. But no one has provided such data. And until the data is provided, all of those statements are presented without evidence and I don't see why we should take any of them as factual/true.

Markfw · Apr 7, 2020

amrnuke said:
markfw: "Apple does not get 83% more IPC"
- problem - he provides no data to back it up at all, notwithstanding again that "Apple" is generic and he provides no comparator to the generic "Apple". More IPC than what?
- to correct this - 1) define "Apple" and define the comparator, 2) give us the data disproving that it has an 83% IPC benefit over... whatever it is that he wants to compare it to

I will simply say I retract that statement, and stick by post 286. This is what I meant when I said the above.

I think that in that post I am agreeing with Anandtech and you !

Andrei. · Apr 7, 2020

Markfw said:
First, I believe the data Anandtech posted, this crap about denial is trolling.
Second, I said desktops, I meant desktop, laptop, server, hedt etc.. I don't have to name every variant, again with the trolling.

You should be ashamed to call yourself a moderator here. You're calling a troll one of the actual people designing these chips - it's utter insanity.

Edit: To moderators: Then ban me already. The fact that a moderator is openly trolling here in this forum while adorning a big yellow Super Moderator tag yet somehow claiming he's not posting as a moderator (what a super convenient rule) is absurd. Get your act together.

And also stop claiming you "warned me before" when you do stealth edits on previous posts with no notifications whatever. I don't randomly go check old posts. Wonder how long before you notice my edit here.

We noticed your edit hours ago. Just discussing the repercussions.
Administrator allisolm

Markfw said:
And saying AMD/Intel cores for tablets are the same as servers again is trolling, we all know that even a 3600 cpu is far different from a 7742 EPYC. again with the trolling.

AMD and Intel are using the exact same microarchitecture across their product lines. The 3600 CPU core to a 7742 IS IDENTICAL. IT'S EVEN THE EXACT SAME SILICON DIE. Do you realise how you sound here when you're spouting such utterly incorrect nonsense?

You're the one trolling here out of sheer idiocy. I don't even know who I would report you to at this point - just utter and complete shame on you.

amrnuke said:
Andrei: "Apple" has +80% or more IPC over "Intel"
- problem - "Apple" is generic and "Intel" is generic, does he mean that all Apple chips averaged have +80% IPC over all Intel chips? A13 over Intel 3930K? A6 over 9900K? Who knows?!?!? The only IPC data he provided showed A12 has +60% IPC over 6700K. Hence the statement, given the data provided, is wrong. What he should have said was that the A12 has a 60.9% IPC lead over the 6700K, because that's all the data that was presented.
- to correct this - 1) define "Apple" and define "Intel", 2) provide the IPC data showing +80% or more IPC of Apple over Intel

I've already stated that the number of architectural instructions retired on x86 and AArch64 is within 10%. The data I've published on the chips has been out for months* and the A13 has an 83% PPC lead over the 9900K. In the worst case of disparity in retired instruction counts between the ISAs, that 83% figure goes down to 75% at most. Your whole circus here is arguing about whether Apple is 83 or 75% ahead. It's an utterly and completely meaningless discussion with absolutely no bearing on the competitive positioning of the micro-architectures in the industry or on what this whole thread was started about.

* Please stop pulling numbers out of random places. Your 6700K SPEC figure is crap. I actually bothered to run the figures across the same compilers with the same flags on all the platforms. There's a freaking article on the homepage right now with the latest figures: https://images.anandtech.com/doci/15603/SPEC-2006.png
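
As a rough sketch of the arithmetic in the post above (not Andrei's own calculation), there are two ways to apply a 10% instruction-count disparity to an 83% per-clock lead. The function names and the 0.9 factor standing in for "within 10%" are assumptions made for illustration.

```python
def scale_lead(ppc_lead, instr_factor):
    """Apply the instruction-count factor to the lead itself.

    ppc_lead: fractional performance-per-clock lead (0.83 means +83%).
    instr_factor: assumed instruction-count factor (0.9 for a 10% disparity).
    """
    return ppc_lead * instr_factor


def scale_ratio(ppc_lead, instr_factor):
    """Apply the factor to the full performance ratio (1 + lead),
    then convert back to a fractional lead."""
    return (1.0 + ppc_lead) * instr_factor - 1.0


if __name__ == "__main__":
    lead = 0.83    # the +83% PPC figure quoted in the post
    factor = 0.9   # assumption: worst-case 10% instruction-count disparity
    print(f"scaling the lead:  {scale_lead(lead, factor):.1%}")
    print(f"scaling the ratio: {scale_ratio(lead, factor):.1%}")
```

Scaling the lead itself gives roughly 75%, which matches the figure quoted above; scaling the whole performance ratio gives roughly 65%. Either way the lead stays large.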

For the love of god stop the incessant bickering and idiotic comments and denial and trolling. All you're achieving is driving the actual people who have knowledge and are able to give some insights on the topic away from the site in sheer disgust.

Markfw · Apr 7, 2020

Andrei. said:
AMD and Intel are using the exact same microarchitecture across their product lines. The 3600 CPU core to a 7742 IS IDENTICAL. IT'S EVEN THE EXACT SAME SILICON DIE. Do you realise how you sound here when you're spouting such utterly incorrect nonsense?

The chiplets are the same, the IO die and other parts are totally different. 2 channel memory vs 8 ? ECC vs non-ecc memory ? Locked vs unlocked ? Other difference in the IO die ? The sockets are different as well.

And those facts are nonsense ?

I didn't even include binning, since that does not mean the chips are different in their silicon makeup. And I know technically all Ryzen chips support ECC, but most motherboards do not officially support it. And I didn't even get into the APU chips......

Andrei. · Apr 7, 2020

Markfw said:
The chiplets are the same, the IO die and other parts are totally different. 2 channel memory vs 8 ? ECC vs non-ecc memory ? Locked vs unlocked ? Other difference in the IO die ? The sockets are different as well.

And those facts are nonsense ?

NO JUST NO. The EPYC I/O die design is literally just a quadrupled desktop I/O die - AMD even officially stated this and touted this as a design advantage. You can literally see it in the die shot. I even covered this in articles: https://www.anandtech.com/show/1504...60x-and-3970x-review-24-and-32-cores-on-7nm/3

You can actually run EPYC in an NPS4 configuration - and AMD even recommends doing this for some situations - where the given quadrant only has access to its 2 memory channels. ECC vs non-ECC? You can run ECC on desktop and you can run your RGB memory in your server.

Empty arguments over empty arguments that keep on rolling and have utterly no substance to the topic and reality that CPU core microarchitectures from a mobile phone to a server can be - and ARE - the same. Just stop it.

Hitman928 · Apr 7, 2020

Andrei. said:
* Please stop pulling numbers out of random places. Your 6700K SPEC figure is crap. I actually bothered to run the figures across the same compilers with the same flags on all the platforms. There's a freaking article on the homepage right now with the latest figures: https://images.anandtech.com/doci/15603/SPEC-2006.png

Just for clarification (I didn't see it in the article but it was probably mentioned in a previous one), which compiler was used for the Apple CPUs and for the Android ones?

Andrei. · Apr 7, 2020

Hitman928 said:
Just for clarification (I didn't see it in the article but it was probably mentioned in a previous one), which compiler was used for the Apple CPUs and for the Android ones?

They're all using Clang/LLVM of similar versions (and please do not now tell me because the subversions aren't the same it's not valid).

Again, all documented over the various articles over time: https://images.anandtech.com/doci/15603/SPEC-April2020.png

Gideon · Apr 7, 2020

Markfw said:
The chiplets are the same, the IO die and other parts are totally different. 2 channel memory vs 8 ? ECC vs non-ecc memory ? Locked vs unlocked ? Other difference in the IO die ? The sockets are different as well.

And those facts are nonsense ?

I didn't even include binning, since that does not mean the chips are different in their silicon makeup. And I know technically all Ryzen chips support ECC, but most motherboards do not officially support it. And I didn't even get into the APU chips......

The fact of the matter is that it's an order of magnitude easier to improve Apple's cores, cache layout and I/O into a decent server chip than it is to extract 70+% more IPC out of an x86 core. Heck, Graviton proves it for A76. Damn good for a first try. And bear in mind, all of Graviton's building blocks are ARM defaults (L3, I/O stuff, other "glue"), not Amazon's secret sauce. All of it is licensable to everyone, including Apple. Now obviously Apple will not go into servers, for other reasons that are a bit out of the scope of this topic (though probably a big driver for the exodus of lead engineers to Nuvia), but speaking only of technical limits, it's not anywhere near as hard as people here are claiming.

Besides, people have had this argument before. Remember when Athlon 64 was murdering Pentium on desktop but Intel released the Pentium M (Centrino), a laptop processor that had considerably higher IPC than both but much lower clocks?

A lot of people (even here) were claiming that this could never scale to desktop. Then the successor, Core 2, happened (and it also went to servers almost immediately).

Hitman928 · Apr 7, 2020

Andrei. said:
They're all using Clang/LLVM of similar versions (and please do not now tell me because the subversions aren't the same it's not valid).

Again, all documented over the various articles over time: https://images.anandtech.com/doci/15603/SPEC-April2020.png

I assumed they were mentioned, I just didn't see it in the body of the article. So it's Apple LLVM for Apple CPUs, ARM LLVM for Android CPUs and straight LLVM for x86, thanks.

name99 · Apr 7, 2020

Schmide said:
I have a question. Is the A13 provisioning operations to the AMX blocks or is it strict ARM instructions?

I've heard of erroneous 462.libquantum results where the compiler pushed operations to all cores rather than do an appropriate peak.

https://www.realworldtech.com/forum/?threadid=80010&curpostid=80013

and

https://clang.llvm.org/docs/CommandGuide/clang.html

No publicly available compiler sends instructions to AMX. That includes Xcode.
It is possible that internal Apple libraries (not used by SPEC code) like Accelerate or the ML libraries use AMX but there's no good evidence for that (performance or people looking at the binaries).
So far the way to bet is that AMX is not hooked up to anything outside Apple, and that there will be a big reveal (both compiler-side and in libraries) at WWDC.

SPEC allows auto-parallelization of any code as long as the compiler does it without developer help. This is reasonable insofar as you want a compiler to do whatever it can automatically (eg autovectorization). What is problematic is the vendor compilers that put massive effort into ways to auto-parallelize code that looks exactly like SPEC -- and not a damn thing else.
THAT is why people are unimpressed with vendor compiler results for SPEC; it's not that the results are "wrong" so much as that they are utterly uninformative. They don't tell you much about the capabilities of a core (if some of the code has been run across multiple cores) and they don't tell you much about the compiler (if the only code that will get such auto treatment is code that looks EXACTLY like SPEC).
