Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

Status
Not open for further replies.

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Sorry, I'm not allowed to share it. You'll have to wait for @Andrei.'s data.

Someone posted some similar comparison but only for a single input for SPEC 2000 176.gcc here:
Code:
i386___ instructions=6896517827 size=20761288034 3.01 bytes/instr
x86-64_ instructions=7445258067 size=25713146061 3.45 bytes/instr
aarch64 instructions=6691899327 size=26767597308 4.00 bytes/instr
arm____ instructions=7721825824 size=30796498138 3.99 bytes/instr
thumb__ instructions=7877291651 size=23736719388 3.01 bytes/instr
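The bytes/instr column is just total size divided by instruction count; a quick sketch to reproduce it from the figures above:

```python
# Code-density figures from the 176.gcc dump above: ISA -> (instructions, bytes)
counts = {
    "i386":    (6_896_517_827, 20_761_288_034),
    "x86-64":  (7_445_258_067, 25_713_146_061),
    "aarch64": (6_691_899_327, 26_767_597_308),
    "arm":     (7_721_825_824, 30_796_498_138),
    "thumb":   (7_877_291_651, 23_736_719_388),
}
density = {isa: size / instr for isa, (instr, size) in counts.items()}
for isa, bpi in density.items():
    print(f"{isa:7s} {bpi:.2f} bytes/instr")
```

Note that Thumb and i386 tie on density even though their instruction counts differ by about 14%.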

It's a shame SPEC doesn't put its benchmarks in the public domain once they're retired. I guess it would be a time-consuming task to get the copyright holders to agree to that.


Many conflate IPC with score/frequency, I sometimes do that myself even though I agree it should be avoided.

Anyway, if the number of executed instructions is within 10% between x86-64 and AArch64, that 83% figure would only be off by as much.
And yet markfw said 83% is wrong, not that it wasn't close. I think he, Richie Rich and Andrei are all talking about something about which we do NOT have the data to make a conclusion. That's my overall point. No data, no good surrogates as best I can tell, and here we are chastising people over it.

Any comparison with icc (or AOCC) on SPEC is pointless. Don't do that please.
It was the easiest data I found :( I didn't realize it was completely pointless, but I see the fault since optimized code will artificially inflate scores.

In general avoid getting results from different sources when you can get results from a single one. It is a waste of time, really.
Sadly I do not see where anyone has compared a 6700K to an A12 from the same source. The problem when we can't get results from a single source is that we are left with poorer quality data, though I don't think it's completely meaningless.

So I guess, if someone wants to produce data suggesting we can use SPECint2006 scores as a reasonable surrogate for IPC, rather than seeing substantial differences in results, then absolutely, let's do it!

PS: You talk about "average" above. You know that SPEC uses geometric mean, right? Just checking :)
I have heard that before, and completely forgot about it when making my calculation. :oops:
Thank you!
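For anyone else who forgets: SPEC aggregates the per-benchmark ratios with a geometric mean, not an arithmetic average. A toy example with made-up ratios shows they differ:

```python
from statistics import geometric_mean  # Python 3.8+

ratios = [40.0, 50.0, 80.0]  # hypothetical per-benchmark SPEC ratios
arith = sum(ratios) / len(ratios)   # arithmetic mean: 56.67
geo = geometric_mean(ratios)        # ~54.29, what SPEC would report
print(arith, geo)
```

The geomean is never larger than the arithmetic mean, and it's less skewed by a single outlier benchmark.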

EDIT: There was an article by David Kanter in Microprocessor Report about icc. I'm afraid it's behind a paywall but I give the link anyway: https://www.linleygroup.com/mpr/article.php?id=11708
I know Intel and AMD's optimized code is absurd. So we should stick with gcc/llvm which I will try to ensure in the future :)


Also, I actually went back and looked at Richie's signature... all I could think is "what the fork"? Despite experts saying we shouldn't use SPECint to compare ISAs, that's what he did.
1. Intel Core i9 9900K @5GHz ......... SPECint2006 score: 54.28 ...... 10.86 pts/GHz
2. Apple A13 @2.65 GHz .................. SPECint2006 score: 52.82 ...... 19.93 pts/GHz ...... +83% IPC over 9900K
3. AMD Ryzen 3950X @4.6 GHz ...... SPECint2006 score: 50.02 ...... 10.87 pts/GHz ...... +0% IPC over 9900K .... fastest clocked Ryzen beaten by iPhone CPU
4. ARM Cortex A77 @2.84 GHz ......... SPECint2006 score: 33.32 ...... 11.73 pts/GHz ...... +8% IPC over 9900K
But the Anandtech article shows the following, and took the advice of the paper Andrei linked to (since Andrei of course wrote the article), and compared only SPECfp scores for A13 vs 9900K and 3900X, since one cannot, per those authors, use SPECint to compare different ISAs:
9900K @ 5.0 GHz gets 75.15, which is 15.03 pts/GHz
3900X @ 4.6 GHz gets 73.66, which is 16.01 pts/GHz
A13 @ 2.66 GHz gets 52.82, which is 19.86 pts/GHz
Meaning A13 has a 32% lead over 9900K and a 24% lead over 3900X.
Unless I'm reading this wrong.
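Showing my work on those numbers, in case anyone wants to check the arithmetic:

```python
# SPECfp2006 scores and clocks as quoted above: name -> (score, GHz)
chips = {"9900K": (75.15, 5.00), "3900X": (73.66, 4.60), "A13": (52.82, 2.66)}
per_ghz = {name: score / ghz for name, (score, ghz) in chips.items()}
lead_9900k = per_ghz["A13"] / per_ghz["9900K"] - 1   # ~32%
lead_3900x = per_ghz["A13"] / per_ghz["3900X"] - 1   # ~24%
print(f"{lead_9900k:.0%} over 9900K, {lead_3900x:.0%} over 3900X")
```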
 
Last edited:
  • Like
Reactions: lightmanek

Nothingness

Diamond Member
Jul 3, 2013
3,063
2,047
136
And yet markfw said 83% is wrong, not that it wasn't close. I think he, Richie Rich
These two guys are at opposite extremes, so I take what they say with a huge grain of salt. In fact I take what anyone writes with a grain of salt, including myself :D

It was the easiest data I found :( I didn't realize it was completely pointless, but I see the fault since optimized code will artificially inflate scores.
The data for the 9900K has been in @Andrei.'s results for months. Why do you want the 6700K? The 9900K is the better uarch, no?

Sadly I do not see where anyone has compared a 6700K to an A12 from the same source. The problem when we can't get results from a single source is that we are left with poorer quality data, though I don't think it's completely meaningless. But if we assume (as many people have stated) that IPC has not changed from 6700K to 9900K, then we can use the 6700K IPC results as a surrogate for 9900K, and use Anandtech's own SPECint2006 score:
A12 geomean - 45.32 / 2.5 GHz = 18.128
9900K geomean - 75.15 / 5 GHz = 15.03
18.128 / 15.03 = 20.6% lead for A12.
Why do you pick FP? Why A12?
The 9900K score for SPECint is 54.28, so 10.86/GHz, and for the A13 it is 52.82, so 19.93/GHz. So you get +83% score/GHz.

Now there is a big gotcha here: the frequency difference is way too large to be ignored, and the 9900K is at a big disadvantage here.

But the Anandtech article shows the following, and took the advice of the paper Andrei linked to, and compared only SPECfp scores, since one cannot, per those authors, use SPECint to compare different ISAs:
What paper are you talking about? As you present it, it is an utterly stupid statement. Both int and fp give interesting data, but using fp can create a distortion where wider vectors and more numerous FP units give a disproportionate advantage that doesn't translate to many apps. And it's obviously not only a matter of ISA, but of a particular microarch.

9900K @ 5.0 GHz gets 75.15, which is 15.03 pts/GHz
3900X @ 4.6 GHz gets 73.66, which is 16.01 pts/GHz
A13 @ 2.66 GHz gets 52.82, which is 19.86 pts/GHz
Meaning A13 has only a 32% lead over 9900K and a 24% lead over 3900X.
Unless I'm reading this wrong.
Your computation is correct.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
These two guys are at opposite extremes, so I take what they say with a huge grain of salt. In fact I take what anyone writes with a grain of salt, including myself :D
Fair points!

The data for the 9900K has been in @Andrei.'s results for months. Why do you want the 6700K? The 9900K is the better uarch, no?
Well, I want 6700K because the IPC data is only for 6700K. I have no true IPC data available for 9900K, so to compare:
IPC: A12 vs 6700K
SPECint2006: A12 vs 9900K
Seems silly
Especially since the IPC data was with iso-clock at 3GHz but 9900K SPECint2006 was run at 5GHz.

Why do you pick FP? Why A12?
The 9900K score for SPECint is 54.28, so 10.86/GHz, and for the A13 it is 52.82, so 19.93/GHz. So you get +83% score/GHz.

Now there is a big gotcha here: the frequency difference is way too large to be ignored, and the 9900K is at a big disadvantage here.

What paper are you talking about? As you present it, it is an utterly stupid statement. Both int and fp give interesting data, but using fp can create a distortion where wider vectors and more numerous FP units give a disproportionate advantage that doesn't translate to many apps. And it's obviously not only a matter of ISA, but of a particular microarch.

* I modified my post after posting, because I didn't include the data I intended to.

The authors of this paper linked by Andrei state in section 5.3: "Obviously, IPC cannot be used as a performance metric since two different ISAs are being evaluated. Instead, FLOPC was accompanied to B in step ① of Figure 4. The number of FP operations is an inherent attribute of the application." So that is why I compared SPECfp instead of SPECint, and I assume that's why Andrei only compared SPECfp scores in his article here on Anandtech, rather than comparing SPECint scores. But I'll let him speak for himself as to why he made that decision.

As for why A12, I chose it because we have no IPC data for A13. I wanted to compare apples to apples.
A12 IPC
6700K IPC
A12 SPECint
6700K SPECint

However, if I want to compare A12 to 6700K we lack SPECint done by same source as A12.

If I want to compare A12 to 9900K to see whether SPECint is a good surrogate, we are missing raw IPC data on 9900K.

If I want to compare A13 to 9900K we are missing IPC data on both.

So we are just missing a lot of information here, to be honest.

There are major problems with using SPECint as a surrogate, major problems using SPECfp as a surrogate, but at least we have some smarter people than me saying FLOPC is a better metric to compare different ISAs, which leads me to suspect that SPECfp is possibly a better way to compare two different ISAs.



IPC comparison of A12 to 6700K shows IPC difference is 60%.
SPECfp comparison, which could be a more valid surrogate when comparing ISAs, shows the difference between A13 and 9900K/3900X is much smaller than 83%.
In any case, we cannot say anything about IPC differences between A13 and 9900K/3900X because we lack the data.

If we sub in 9900K as a surrogate for 6700K in the SPECint tests (since some claim IPC hasn't changed over that timeframe, though I don't know what data this is based on), we get:
IPC A12 over 6700K = 60.9%
SPECint2006/GHz A12 over 9900K = 67.0%
SPECfp2006/GHz A12 over 9900K = 45.9%
But again, the caveat is that the 6700K was running at 3 GHz while the 9900K runs at 5 GHz, which creates some issues. Also, the 9900K has double the L3 cache available in ST operation compared to the 6700K, and we don't know whether the LSD microcode update was applied to the 6700K in the IPC reviews; we do know the LSD was re-enabled on the 9900K, which has an unknown benefit or drawback.
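For transparency, the 67.0% figure falls straight out of the scores quoted earlier (A12 SPECint2006 45.32 at 2.5 GHz, 9900K 54.28 at 5 GHz); the 60.9% IPC number is Andrei's measurement, taken as given:

```python
# SPECint2006 per-GHz lead of A12 over 9900K, from the quoted scores
int_lead = (45.32 / 2.5) / (54.28 / 5.0) - 1
print(f"{int_lead:.1%}")  # 67.0%
```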

I just don't know how much we can say based on the information we have.
 

Nothingness

Diamond Member
Jul 3, 2013
3,063
2,047
136
The 32% and 24% ? I can believe that.
Because that fits your denial?

But also, the A13 will never scale up to the speeds and number of cores that AMD/Intel have.
"Never" is a strong word. But I agree that's quite unlikely.

The A13 was designed as a smartphone CPU, and does it quite well. The Intel/AMD CPUs are for desktops, and they do their job well (to differing degrees of course)
Oh yes Intel and AMD cores only go in desktops. I will ask my company to investigate why we have the same cores from laptops to servers, that must be some alien technology.

Trying to compare the two is insane IMO. It's like comparing an ultralight with a jet fighter; they have completely different purposes.
Joke aside, you know that most AMD/Intel cores are almost identical from tablets up to servers? Even ARM did that with the A76, going from smartphones to servers.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
The 32% and 24% ? I can believe that.
I believe both the SPECint score and the SPECfp score, and I believe the A12 IPC benefit over 6700K. Because I believe data.

I don't believe any statement of fact for which there is no evidence. As such I don't believe that A13 has an 83% or 80%+ IPC benefit over 9900K until the data bears it out. (Either show that SPECint2006 results scale with IPC fairly linearly across different chips in different ISAs, or just show the IPC comparison head to head!). It very well may, or perhaps it may not. But making that statement as if it's a fact is done so in very poor judgment on everyone's part here.

But also, the A13 will never scale up to the speeds and number of cores that AMD/Intel have.
This is true for now, I think that is why you won't be seeing ARM in the HEDT and gaming markets any time soon.

I am rooting for Nuvia and Graviton (though Nuvia isn't out, and Graviton has major limitations), for instance, because I think the competition it will generate will be huge. If Nuvia can scale to more than just limited applications like Graviton, it'll be really fun to see what breakthroughs we can achieve, for example on an exascale level, when so many people are racing to the top. This is especially true when considering medical research for instance, where having multiple vendors pushing each other could make it trivial to model protein folding, new drugs, analyze pandemic trends/interactions and confounders, and so on.

The A13 was designed as a smartphone CPU, and does it quite well. The Intel/AMD CPUs are for desktops, and they do their job well (to differing degrees of course)

Trying to compare the two is insane IMO. It's like comparing an ultralight with a jet fighter; they have completely different purposes.
Yep, they are two totally different chips for totally different markets.

But I do think it would be trivial for Apple to make a chip powerful enough for a laptop, for instance, with 4 x Lightning and 4 x Thunder. That has never been the problem. The problem is x86: not the chips, but the architecture, and how ingrained it is in laptops, HEDT, gaming, and servers.
 

Nothingness

Diamond Member
Jul 3, 2013
3,063
2,047
136
I believe both the SPECint score and the SPECfp score, and I believe the A12 IPC benefit over 6700K. Because I believe data.
Sorry but I don't believe your data, for the many reasons I've been repeating ad nauseam.

I don't believe any statement of fact for which there is no evidence.
But you have no evidence! You pick data from different sources and play with it.

As such I don't believe that A13 has an 83% or 80%+ IPC benefit over 9900K until the data bears it out.
It's no more unbelievable than what you've done.

(Either show that SPECint2006 results scale with IPC fairly linearly across different chips in different ISAs, or just show the IPC comparison head to head!). It very well may, or perhaps it may not. But making that statement as if it's a fact is done so in very poor judgment on everyone's part here.
I guess you meant that results scale with frequency, right? The answer is obviously no, it won't scale linearly; but it also won't bring the 83% advantage down to the levels you mention.

This is true for now, I think that is why you won't be seeing ARM in the HEDT and gaming markets any time soon.

I am rooting for Nuvia and Graviton (though Nuvia isn't out, and Graviton has major limitations), for instance, because I think the competition it will generate will be huge. If Nuvia can scale to more than just limited applications like Graviton,
I'm speechless and I'm done discussing this any further.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,129
15,274
136
markfw: "Apple does not get 83% more IPC"
- problem - he provides no data to back it up at all, notwithstanding again that "Apple" is generic and he provides no comparator to the generic "Apple". More IPC than what?
- to correct this - 1) define "Apple" and define the comparator, 2) give us the data disproving that it has an 83% IPC benefit over... whatever it is that he wants to compare it to
I will simply say I retract that statement, and stick by post 286. This is what I meant when I said the above.

I think that in that post I am agreeing with AnandTech and you!
 
Last edited:

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,129
15,274
136
AMD and Intel are using the exact same microarchitecture across their product lines. The 3600 CPU core to a 7742 IS IDENTICAL. IT'S EVEN THE EXACT SAME SILICON DIE. Do you realise how you sound here when you're spouting such utterly incorrect nonsense?
The chiplets are the same, but the IO die and other parts are totally different. 2-channel memory vs 8? ECC vs non-ECC memory? Locked vs unlocked? Other differences in the IO die? The sockets are different as well.

And those facts are nonsense ?

I didn't even include binning, since that does not mean the chips are different in their silicon makeup. And I know technically all Ryzen chips support ECC, but most motherboards do not officially support it. And I didn't even get into the APU chips......
 
Last edited:

Andrei.

Senior member
Jan 26, 2015
316
386
136
The chiplets are the same, but the IO die and other parts are totally different. 2-channel memory vs 8? ECC vs non-ECC memory? Locked vs unlocked? Other differences in the IO die? The sockets are different as well.

And those facts are nonsense ?
NO JUST NO. The EPYC I/O die design is literally just a quadrupled desktop I/O die - AMD even officially stated this and touted this as a design advantage. You can literally see it in the die shot. I even covered this in articles: https://www.anandtech.com/show/1504...60x-and-3970x-review-24-and-32-cores-on-7nm/3

You can actually run EPYC in an NPS4 configuration - and AMD even recommends to do this for some situations - where the given quadrant only has access to its 2 memory channels. ECC vs non-ecc? You can run ECC on desktop and you can run your RGB memory in your server.

Empty arguments over empty arguments that keep on rolling and have utterly no substance to the topic and reality that CPU core microarchitectures from a mobile phone to a server can be - and ARE - the same. Just stop it.
 

Hitman928

Diamond Member
Apr 15, 2012
6,123
10,527
136
* Please stop pulling numbers out of random places. Your 6700K SPEC figure is crap. I actually bothered to run the figures across the same compilers with the same flags on all the platforms. There's a freaking article on the homepage right now with the latest figures: https://images.anandtech.com/doci/15603/SPEC-2006.png

Just for clarification (I didn't see it in the article but it was probably mentioned in a previous one), which compiler was used for the Apple CPUs and for the Android ones?
 

Andrei.

Senior member
Jan 26, 2015
316
386
136
Just for clarification (I didn't see it in the article but it was probably mentioned in a previous one), which compiler was used for the Apple CPUs and for the Android ones?
They're all using Clang/LLVM of similar versions (and please do not now tell me because the subversions aren't the same it's not valid).

Again, all documented over the various articles over time: https://images.anandtech.com/doci/15603/SPEC-April2020.png
 
  • Like
Reactions: Etain05

Gideon

Golden Member
Nov 27, 2007
1,769
4,126
136
The chiplets are the same, but the IO die and other parts are totally different. 2-channel memory vs 8? ECC vs non-ECC memory? Locked vs unlocked? Other differences in the IO die? The sockets are different as well.

And those facts are nonsense ?

I didn't even include binning, since that does not mean the chips are different in their silicon makeup. And I know technically all Ryzen chips support ECC, but most motherboards do not officially support it. And I didn't even get into the APU chips......
The fact of the matter is that it's an order of magnitude easier to improve Apple's cores, cache layout and I/O into a decent server chip than it is to extract 70+% more IPC out of an x86 core. Heck, Graviton proves it for the A76. Damn good for a first try. And bear in mind, all of Graviton's building blocks are ARM defaults (L3, I/O, other "glue"), not Amazon's secret sauce. All of it is licensable to everyone, including Apple. Now obviously Apple will not go into servers for other reasons that are a bit out of the scope of this topic (though probably a big driver for the exodus of lead engineers to Nuvia), but speaking only of technical limits, it's not anywhere near as hard as people here are claiming.

Besides, people have had this argument before. Remember when the Athlon 64 was murdering the Pentium 4 on desktop, but Intel released the Pentium M (of Centrino fame), a laptop processor that had considerably higher IPC than both but much lower clocks?

A lot of people (even here) were claiming that it could never scale to desktop. Then its successor, Core 2, happened (and it also went to servers almost immediately).
 

Hitman928

Diamond Member
Apr 15, 2012
6,123
10,527
136
They're all using Clang/LLVM of similar versions (and please do not now tell me because the subversions aren't the same it's not valid).

Again, all documented over the various articles over time: https://images.anandtech.com/doci/15603/SPEC-April2020.png

I assumed they were mentioned; I just didn't see it in the body of the article. So it's Apple LLVM for Apple CPUs, ARM LLVM for Android CPUs, and straight LLVM for x86. Thanks.
 
  • Like
Reactions: Tlh97

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
I've already stated that the architectural instruction count retired between x86 and AArch64 is within 10%.
You've stated it. But you have not shown any evidence or data to prove it. You are not above having to provide evidence to back up claims.

The data I've published on the chips has been out for months and the A13 has a 83% PPC lead over the 9900K. That 83% figure at most in the worst case disparity between retired instruction count between the ISAs goes down to 75%.
You have not proven that x86 and AArch64 retired instruction counts are within 10%, as above. If you were to show that it is 8%, then we could adjust IPC results accordingly. But not the SPEC scores: why would you adjust SPEC scores for differences in retired instructions if we don't even know whether SPEC scores scale with IPC?

But, we cannot conflate IPC with SPECint2006/GHz scores without first proving that IPC correlates to SPECint2006 scores, and even then, intellectually, I am not sure it's an honest thing to do without first verifying that it produces a valid result.
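To make the missing step explicit: score/GHz is at best a proxy for work per cycle, and turning it into an IPC ratio requires scaling by the ratio of retired instruction counts, which is exactly the number nobody has shown. A sketch assuming the ±10% spread discussed earlier:

```python
score_per_ghz_ratio = 1.83  # claimed A13 vs 9900K SPECint2006/GHz ratio
# IPC = instructions / cycles, and score/GHz ~ 1/cycles for a fixed workload,
# so the IPC ratio is the score/GHz ratio scaled by the instruction-count ratio.
for instr_ratio in (0.90, 1.00, 1.10):  # AArch64 retired / x86-64 retired
    ipc_ratio = score_per_ghz_ratio * instr_ratio
    print(f"instr ratio {instr_ratio:.2f} -> IPC lead {ipc_ratio - 1:+.0%}")
```

So a ±10% uncertainty in retired instructions alone moves the claimed +83% into roughly the +65% to +101% range; it can't be pinned down without the actual counts.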

Your whole circus here is arguing about whether Apple is 83 or 75% ahead.
No it's not. It's about the lack of intellectual honesty at this point. People, including you, are throwing claims around without backing them up and it's driving me crazy trying to figure out how people are coming to the conclusions they are. I don't really care whether "Apple" (whatever that means), is ahead by 75%, 0%, or 500%. I just want some proof of what people are claiming.

It's an utterly and completely meaningless discussion with absolutely no point to the competitive positioning of the micro-architectures in the industry and what this whole thread was started about.
This thread was started to talk about Graviton2 and how it competes with x86, hence the article containing an entire section talking about how this is an x86 bloodbath. Not my words. Now we are left to talk about ISAs and uarchs. And a key component of that is trying to sort out exactly what benefits each architecture might have, and here we are with claims about "Apple" having an 80%+ IPC benefit over "Intel", the A13 having +83% IPC over the 9900K, and so on, with no evidence to back them up. I think that information is very pertinent to the future discussion of Graviton2 as it stands in competition with Zen2 and Zen3 chiplets and Xeon Platinum, as well as the upcoming Nuvia release.

For the love of god stop the incessant bickering and idiotic comments and denial and trolling.
This is not trolling. I am asking you to provide your proof that SPEC = IPC, that Apple has an 80% IPC lead over Intel. That is all.

All of what you're all achieving is driving the actual people who have knowledge and able to give some insights on the topic away from the site in sheer disgust.
You have a lot of knowledge. I want to know how you know that Apple has an 80% IPC advantage. You made a claim, I am asking for the proof. That is all. I thanked you for the IPC data on A12. I thanked you for the link to the IPC data on the 6700K. And asked further questions so we could have a good discussion. And you have responded by calling me a troll, calling my argument "stupid", and you still haven't even provided the proof of your claim.

It sucks to see that kind of behavior coming from someone with clear knowledge on the subject, someone who writes articles for this very site that I value so much.

I appreciate your contributions, but at the same time, I really feel taken aback at how you have responded to this very easy question to answer.
 

name99

Senior member
Sep 11, 2010
496
382
136
but we shouldn't use it as a metric for ARM cores since it is akin to an HSA operation.

EDIT: moreover it gives a false representation of the actual scaling any additional core would produce.

You do understand that ARMv8.6 will include matrix math operations, right? For all we know (it's all very unclear) the AMX ops are EXACTLY the ARMv8.6 ops. Or similar enough that apps submitted as bitcode to the App Store will correctly be translated to both.


Claiming that AMX is cheating is as ridiculous as claiming that large caches are cheating.
In the first place, AMX contributes nothing to today's SPEC A13 numbers; in the second place, when/if it does contribute to the SPEC numbers, how is that different from AVX512?
 

name99

Senior member
Sep 11, 2010
496
382
136
But my problem with his signature, and everything I'm talking about, is making sure that he actually knows what he's talking about, because he's calling scaled performance per clock "IPC" when it isn't. Case in point, IPC ranges largely in the single digits. His scores range in the 50s. Why don't I just go back and delete the brief RISC vs CISC discussion so it doesn't distract you? I would be making the same points and asking the same questions about whether we actually have the data that A13 really has 83% IPC gain over 3900X and 9900K.

It is a well-known term of the art (among people who know WTF they are talking about) to treat IPC and "score [whether SPEC or GB]/GHz" as synonymous. They both inform you of the same thing, namely whether a core achieves its performance by reaching for frequency or by reaching for smarts of various sorts. The most simple-minded dimensional analysis would make this clear.
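The dimensional analysis in miniature (all numbers invented): for two cores running the same binary, score/GHz differs by exactly the IPC ratio.

```python
# Toy SPEC-style score: reference time divided by measured time.
def score(instr, ipc, freq_ghz, ref_time_s=1.0):
    time_s = instr / (ipc * freq_ghz * 1e9)
    return ref_time_s / time_s

# Same binary (same instruction count), opposite design philosophies:
speed_demon = score(instr=1e10, ipc=2.0, freq_ghz=5.0)
brainiac    = score(instr=1e10, ipc=4.0, freq_ghz=2.5)
print(speed_demon, brainiac)                    # identical scores
print((brainiac / 2.5) / (speed_demon / 5.0))   # score/GHz ratio = IPC ratio = 2.0
```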

If you want to play the childish game of point scoring, assuming you've stated something devastating by claiming that these two are not *technically* the same thing, go right ahead.

But be aware that the world is watching... The kids who understand the real issues (and who have the most interesting conversations) don't have time for people who think the goal of these discussions is to score points; their goal is to understand the CPUs.
Look back at your voluminous output over the past few days. Is there ANYTHING in there that contributes to useful understanding as opposed to point scoring?
 

name99

Senior member
Sep 11, 2010
496
382
136
Why do you pick FP? Why A12?
The 9900K score for SPECint is 54.28, so 10.86/GHz, and for the A13 it is 52.82, so 19.93/GHz. So you get +83% score/GHz.

Now there is a big gotcha here: the frequency difference is way too large to be ignored, and the 9900K is at a big disadvantage here.

Once again the issue is why do you even CARE about these numbers? Is the goal understanding or redacted?

Why do people say that IPC (and IPC equivalents) can't be compared across frequencies? Because the comparison is misleading IF you are trying to use IPC to gauge some aspect of the micro-architecture.
If I want to compare two branch predictors, I want to keep *everything* else identical to see which predictor delivers higher IPC. If I run one core at twice the frequency, now I can't tell if the lower IPC of the faster core is because the branch predictor is not as good, or if it's because the faster core is simply spending more cycles waiting on RAM.
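A toy model of that effect (every number here is invented): DRAM latency is fixed in nanoseconds, so the identical core loses more cycles per miss at a higher clock, and its measured IPC drops with no microarchitectural change at all.

```python
def effective_ipc(core_ipc, freq_ghz, misses_per_instr, mem_latency_ns):
    # Cycles per instruction = pipeline CPI + stall cycles from memory misses;
    # a fixed latency in nanoseconds costs more cycles at a higher clock.
    compute_cpi = 1.0 / core_ipc
    stall_cpi = misses_per_instr * mem_latency_ns * freq_ghz
    return 1.0 / (compute_cpi + stall_cpi)

same_core_fast = effective_ipc(4.0, 5.0, 0.001, 70.0)   # ~1.67
same_core_slow = effective_ipc(4.0, 2.6, 0.001, 70.0)   # ~2.31
```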

BUT
- that's not what we are doing here AND
- the comparison doesn't go the way you want.

The comparison here is ultimately: what is a better design direction? Speed demon or brainiac? Of course "better" is a flexible word, but we're treating it as some combination of
- smaller core
- lower power
- higher performance (on GB, SPEC, browser, ...)

So what we ACTUALLY have is two cores that get more or less equal results across a wide range of code, one achieving that by
- 5GHz
- much higher power
- core ~twice as large (subject to quibbling about uncore, process, ...),
one achieving that at
- 2.6GHz.

Arguments about "exact" IPC are moronic in this context, demonstrating an utter inability to pick up on what is important, namely that core A achieves essentially the same results as core I through very different means.
So what do you do with that info?

At a business level, it suggests that core A has a bright future ahead of it.
At the DESIGN level, it is interesting to consider the various mechanisms by which core A manages to achieve such a spectacular degree of "work done per cycle".

Saying that core I is hampered by running faster is completely missing the point. Well, duh, OF COURSE core I is hampered by running faster! That's why team A put all their effort into a brainiac design, not a speed demon design. Team I is welcome to go back to the drawing board and run their core at 2.6 or 3 or 3.5GHz.
But there's something insane about simultaneously saying
- of course A can do well because they only have to run at low frequencies; everyone knows that at higher frequencies you spend ever more time waiting on DRAM AND
- therefore what team I should do is reach for ever higher frequencies...

The discussion the adults here are having is not about rah rah team A vs team I. It is about given the realities of power, transistor size (high frequencies means larger transistors and cells), frequency scaling (both transistors and metal) and likely smaller reticles going forward, how much more should future CPUs push on the speed side vs the brainiac side?
You're not helping if your contribution to that is tribal double-speak along the lines of "sure A does really well --- but they're cheating by using large caches [or smarter design or lower frequency or whatever]".
There's no such thing as "cheating". There is design that is more or less fit for the purpose and the future of technology. You're not helping team I by convincing their marketing team to double-down on even higher frequencies in spite of how those have proved a dead end over the past five years!


Profanity is not allowed in the tech forums.

AT Mod Usandthem
 
Last edited by a moderator:

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
It is a well-known term of the art (among people who know WTF they are talking about) to treat IPC and "score [whether SPEC or GB]/GHz" as synonymous. They both inform you of the same thing, namely whether a core achieves its performance by reaching for frequency or by reaching for smarts of various sorts. The most simple-minded dimensional analysis would make this clear.

If you want to play the childish game of point scoring, assuming you've stated something devastating by claiming that these two are not *technically* the same thing, go right ahead.

But be aware that the world is watching... The kids who understand the real issues (and who have the most interesting conversations) don't have time for people who think the goal of these discussions is to score points; their goal is to understand the CPUs.
Look back at your voluminous output over the past few days. Is there ANYTHING in there that contributes to useful understanding as opposed to point scoring?
I want to understand CPUs as well. And I really don't care who comes out ahead. I just want honesty about how we're getting this information.

There is a big difference between SAYING score/GHz correlates to IPC, and PROVING it. Unless there is proof, it is nothing better than anecdote.

And for those of us trying to learn it, just telling us to accept it as truth without showing the proof, that's absolutely absurd. As a physician, I have seen it on rounds, I've seen it in lectures, and it drives me insane. You have to have intellectual honesty.

You don't get to decide what the truth is. You get to prove it.

And come on, why call this childish? I and many others want to learn, and so many are being obstructionist. Just show the data (if you even have it) and move on. If it's so trivial, my goodness. Just show it.
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
Walls and walls of text, and some of you guys explain nothing.

I already said this here; I'll say it again, and LOUD:
YOU CAN NO LONGER PROPERLY MEASURE IPC ON NEW CPUS IN THE NEW MULTICORE ERA!!!!!

These days, even in single-core, single-thread applications, multicore CPUs share resources like L2/L3 caches in order to improve performance. Some do it better than others, some share more than others, and this affects single-thread performance tremendously.
With multiple cores and multithreading, the CPU that delivers the best throughput and balances resources best across each core will also be the most efficient in multithreaded apps.
Finally, trying to measure multicore performance based on the result of one single-core/single-thread test is totally flawed because of all that.

So stop trying to guess what an Apple A1x (or anyone else's CPU) would do with 64 cores; it's next to impossible.
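The flaw can be sketched with a toy contention model (every parameter below is invented, nothing is measured): once cores contend for a shared last-level cache, multiplying a single-core score by the core count systematically over-predicts multicore throughput.

```python
# Toy model (all parameters invented): naive extrapolation from one core
# ignores contention for a shared LLC.

def percore_throughput(single_core_score, active_cores,
                       llc_mb=32.0, working_set_mb=8.0, miss_penalty=0.5):
    # Assume each active core gets an equal slice of the shared LLC.
    slice_mb = llc_mb / active_cores
    # Fraction of the working set that no longer fits in this core's slice.
    spill = max(0.0, 1.0 - slice_mb / working_set_mb)
    # Spilled accesses pay a penalty, shrinking effective per-core throughput.
    return single_core_score / (1.0 + miss_penalty * spill)

naive_8 = 8 * percore_throughput(100.0, active_cores=1)    # 800.0: one core alone
modeled_8 = 8 * percore_throughput(100.0, active_cores=8)  # 640.0: slices shrink

# The naive extrapolation is already 25% too optimistic at 8 cores, and the
# gap widens with more cores. The real curve depends on sharing details that
# a single-thread run simply cannot reveal.
```

How much each real design loses to sharing depends on how it partitions and balances those resources, which is exactly why a single-core result says so little about a 64-core part.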
 

Schmide

Diamond Member
Mar 7, 2002
5,595
730
126
You do understand that ARMv8.6 will include matrix math operations, right? For all we know (it's all very unclear) the AMX ops are EXACTLY the ARMv8.6 ops. Or similar enough that apps submitted as bitcode to the App Store will correctly be translated to both.


Claiming that AMX is cheating is as ridiculous as claiming that large caches are cheating.
In the first place AMX is contributing nothing to today's SPEC A13 numbers; in the second place when/if it does contribute to the SPEC numbers, how is that different from AVX512?

The logic is sound. If you have a single accelerator and many cores, sending data off to a compute device is not measuring the core. Would you allow an AMD APU to send data off to the GPU?

The whole point of the peak values is to measure the core, not what the core can do with a co-processor.

A cache is part of the memory system. These peak values are equally artificially inflated when you measure it on a server class system with 8-channel memory.

SIMD executes on core. In fact, on most architectures, if you execute a narrower instruction (SSE), the wider lanes (AVX) still execute.

Bulldozer is the great example of why this should be so: its cores share an FPU. Would the peak value be representative under parallel execution?
 
  • Like
Reactions: lobz

soresu

Diamond Member
Dec 19, 2014
3,208
2,480
136
Shifting goals again: web applications and SQL services because they scale better with core count? Well, surprise surprise, they also scale better with SMT.

View attachment 19209

SMT4 brings SQL gains up to 80%, web apps up to 40%. And that's according to people who actually build ARM server chips.
Triton (TX3) seems like a pretty impressive core; it would be interesting to see its scores on the regular run-of-the-mill benchmarks like GB4, Blender, and such.
 

name99

Senior member
Sep 11, 2010
496
382
136
The logic is sound. If you have a single accelerator and many cores, sending data off to a compute device is not measuring the core. Would you allow an AMD APU to send data off to the GPU?

The whole point of the peak values is to measure the core, not what the core can do with a co-processor.

A cache is part of the memory system. These peak values are equally artificially inflated when you measure it on a server class system with 8-channel memory.

SIMD executes on core. In fact, on most architectures, if you execute a narrower instruction (SSE), the wider lanes (AVX) still execute.

Bulldozer is the great example of why this should be so: its cores share an FPU. Would the peak value be representative under parallel execution?

Everything that we know about AMX suggests that it is as much a part of the core as AVX512. That's precisely the point of the ARMv8.6A reference.
What makes you believe that AMX is EITHER an off-core accelerator or shared between cores? Apple specifically said they were "on the CPU".

Go to 2:13PM
 

Schmide

Diamond Member
Mar 7, 2002
5,595
730
126
Everything that we know about AMX suggests that it is as much a part of the core as AVX512. That's precisely the point of the ARMv8.6A reference.
What makes you believe that AMX is EITHER an off-core accelerator or shared between cores? Apple specifically said they were "on the CPU".

Go to 2:13PM

https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/2

It shows the A13's AMX as a separate unit. An on-board memory controller or cache can be shared by more than one type of compute unit. A core has one bus that loads in and out through the cache hierarchy. Now, if the AMX system can transfer data directly register to register, AMX is in the CPU. I'll even accept it if it's on the same L1, maybe even L2, but then we're returning to Bulldozer land, where two quasi-cores with independent L1s share an FPU and L2.

Not the best source, but... (Bloomberg)

All of the new iPhones will have faster A13 processors. There’s a new component in the chip, known internally as the “AMX” or “matrix” co-processor, to handle some math-heavy tasks, so the main chip doesn’t have to. That may help with computer vision and augmented reality, which Apple is pushing as a core feature of its mobile devices.

What is or isn't a co-processor? I'd say: if it shares an instruction pointer and register set, it's a processor. If it dispatches out of the L1, yes. The point where we should probably draw the line is when more than a fixed set of data (SIMD) is iterated outside the clock domain.

There is a lot of delineation even on the front end of the CPU. Decoders and SMT cores certainly have their own domains to some extent.

I'm not here to distill metrics into a single point that gives an all encompassing number. I know from experience, that metric is fluid, especially in a multi-processing environment.

Attempting to do this from a few peak points across multiple domains is peak crazy.
 
  • Like
Reactions: Tlh97 and lobz

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Everything that we know about AMX suggests that it is as much a part of the core as AVX512. That's precisely the point of the ARMv8.6A reference.
What makes you believe that AMX is EITHER an off-core accelerator or shared between cores? Apple specifically said they were "on the CPU".

Go to 2:13PM
How can you talk about 'kids who understand the real issues', only to go ahead the next minute and equate 'in the core' and 'on the CPU'?
 
Status
Not open for further replies.