Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

exquisitechar · Dec 3, 2019

https://www.servethehome.com/aws-graviton2-64-core-arm-cpu-heightens-war-of-intel-betrayal/

Pretty big deal for ARM in servers. Interested in seeing a comparison between this and Rome.

amrnuke · Apr 7, 2020

Andrei. said:
I've already stated that architectural instructions retired between x86 and AArch64 is within 10%.

You've stated it. But you have not shown any evidence or data to prove it. You are not above having to provide evidence to back up claims.

Andrei. said:
The data I've published on the chips has been out for months and the A13 has a 83% PPC lead over the 9900K. That 83% figure at most in the worst case disparity between retired instruction count between the ISAs goes down to 75%.

You have not proven that x86 and AArch64 retired instructions is within 10%, as above. If you were to show that it is 8%, then we can adjust IPC results for that. But not the SPEC scores, why would you adjust spec scores for differences in retired instructions if we don't even know if SPEC scores scale with IPC?

But, we cannot conflate IPC with SPECint2006/GHz scores without first proving that IPC correlates to SPECint2006 scores, and even then, intellectually, I am not sure it's an honest thing to do without first verifying that it produces a valid result.

Andrei. said:
Your whole circus here is arguing about whether Apple is 83 or 75% ahead.

No it's not. It's about the lack of intellectual honesty at this point. People, including you, are throwing claims around without backing them up and it's driving me crazy trying to figure out how people are coming to the conclusions they are. I don't really care whether "Apple" (whatever that means), is ahead by 75%, 0%, or 500%. I just want some proof of what people are claiming.

Andrei. said:
It's an utterly and completely meaningless discussion with absolutely no point to the competitive positioning of the micro-architectures in the industry and what this whole thread was started about.

This thread was started to talk about Graviton2 and how it competes with x86, hence the article containing an entire section talking about how how this is an x86 bloodbath. Not my words. Now we are left to talk about ISAs and uarchs. And a key component of that is trying to sort out exactly what benefits each architecture might have, and here we are with claims about "Apple" having +80% IPC benefit over "Intel", A13 having +83% IPC over 9900K, and so on, with no evidence to back it up. I think that information is very pertinent to the future discussion of Graviton2 as it stands in competition with Zen2, Zen3 chiplets and Xeon Platinum, as well as the upcoming Nuvia release.

Andrei. said:
For the love of god stop the incessant bickering and idiotic comments and denial and trolling.

This is not trolling. I am asking you to provide your proof that SPEC = IPC, that Apple has an 80% IPC lead over Intel. That is all.

Andrei. said:
All of what you're all achieving is driving the actual people who have knowledge and able to give some insights on the topic away from the site in sheer disgust.

You have a lot of knowledge. I want to know how you know that Apple has an 80% IPC advantage. You made a claim, I am asking for the proof. That is all. I thanked you for the IPC data on A12. I thanked you for the link to the IPC data on the 6700K. And asked further questions so we could have a good discussion. And you have responded by calling me a troll, calling my argument "stupid", and you still haven't even provided the proof of your claim.

It sucks to see that kind of behavior coming from someone with clear knowledge on the subject, someone who writes articles for this very site that I value so much.

I appreciate your contributions, but at the same time, I really feel taken aback at how you have responded to this very easy question to answer.

name99 · Apr 7, 2020

Schmide said:
but we shouldn't use it as a metric for ARM cores since it is akin to an HSA operation.

EDIT: moreover it gives a false representation of the actual scaling any additional core would produce.

You do understand that ARMv8.6 will include matrix math operations, right? For all we know (it's all very unclear) the AMX ops are EXACTLY the ARMv8.6 ops. Or similar enough that apps submitted as bitcode to the App Store will correctly be translated to both.

ARM architecture - Wikipedia

en.wikipedia.org

Claiming that AMX is cheating is as ridiculous as claiming that large caches is cheating.
In the first place AMX is contributing nothing to today's SPEC A13 numbers; in the second place when/if it does contribute to the SPEC numbers, how is that different from AVX512?

name99 · Apr 7, 2020

amrnuke said:
But my problem with his signature, and everything I'm talking about, is making sure that he actually knows what he's talking about, because he's calling scaled performance per clock "IPC" when it isn't. Case in point, IPC ranges largely in the single digits. His scores range in the 50s. Why don't I just go back and delete the brief RISC vs CISC discussion so it doesn't distract you? I would be making the same points and asking the same questions about whether we actually have the data that A13 really has 83% IPC gain over 3900X and 9900K.

It is a well-known term of the art (among people who know WTF they are talking about) to treat IPC and "score[whether SPEC or GB]/GHz" as synonymous. They both inform you of the same thing, namely whether a core achieves its performance by reach for frequency or by reaching for smarts of various sorts. The most simple-minded dimensional analysis would make this clear.

If you want to play the childish game of point scoring, assuming you've stated something devastating by claiming that these two are not *technically* the same thing, go right ahead.

But be aware that the world is watching... The kids who understand the real issues (and who have the most interesting conversations) don't have time for people who think the goal of these discussions is to score points; their goal is to understand the CPUs.
Look back at your voluminous output over the past few days. Is there ANYTHING in there that contributes to useful understanding as opposed to point scoring?

name99 · Apr 7, 2020

Nothingness said:
Why do you pick FP? Why A12?
9900K score for SPECint is 54.28 so 10.86/GHz and for A13 it is 52.82 so 19.93/Ghz. So you get +83% score/GHz.

Now there is a big gotcha here: the frequency difference is way too large to be ignored, and the 9900K is at a big disadvantage here.

Once again the issue is why do you even CARE about these numbers? Is the goal understanding or redacted?

Why do people say that IPC (and IPC equivalents) can't be compared across frequencies? Because the comparison is misleading IF you are trying to use IPC to gauge some aspect of the micro-architecture.
If I want to compare two branch predictors, I want to keep *everything* else identical to see which predictor delivers higher IPC. If I run one core at twice the frequency, now I can't tell if the lower IPC of the faster core is because the branch predictor is not as good, or if it's because the faster core is simply spending more cycles waiting on RAM.

BUT
- that's not what we are doing here AND
- the comparison doesn't go the way you want.

The comparison here is ultimately: what is a better design direction? Speed demon or brainiac? Of course "better" is a flexible word, but we're treating it as some combination of
- smaller core
- lower power
- higher performance (on GB, SPEC, browser, ...)

So what we ACTUALLY have is two cores that get more or less equal results across a wide range of code, one achieving that by
- 5GHz
- much higher power
- core ~twice as large (subject to quibbling about uncore, process, ...),
one achieving that at
- 2.6GHz.

Arguments about "exact" IPC are moronic in this context, demonstrating an utter inability to pick up on what is important, namely that core A achieves essentially the same results as core I through very different means.
So what do you do with that info?

At a business level, it suggests that core A has a bright future ahead of it.
At the DESIGN level, it is interesting to consider the various mechanisms by which core A manages to achieve such a spectacular degree of "work done per cycle".

Saying that core I is hampered by running faster is completely missing the point. Well, duh, OF COURSE core I is hampered by running faster! That's why team A put all their effort into a brainiac design, not a speed demon design. Team I is welcome to go back to the drawing board and run their core at 2.6 or 3 or 3.5GHz.
But there's something insane about simultaneously saying
- of course A can do well because they only have to run at low frequencies; everyone knows that at higher frequencies you spend ever more time waiting on DRAM AND
- therefore what team I should do is reach for ever higher frequencies...

The discussion the adults here are having is not about rah rah team A vs team I. It is about given the realities of power, transistor size (high frequencies means larger transistors and cells), frequency scaling (both transistors and metal) and likely smaller reticles going forward, how much more should future CPUs push on the speed side vs the brainiac side?
You're not helping if your contribution to that is tribal double-speak along the lines of "sure A does really well --- but they're cheating by using large caches [or smarter design or lower frequency or whatever]".
There's no such thing as "cheating". There is design that is more or less fit for the purpose and the future of technology. You're not helping team I by convincing their marketing team to double-down on even higher frequencies in spite of how those have proved a dead end over the past five years!

Profanity is not allowed in the tech forums.

AT Mod Usandthem

amrnuke · Apr 7, 2020

name99 said:
It is a well-known term of the art (among people who know WTF they are talking about) to treat IPC and "score[whether SPEC or GB]/GHz" as synonymous. They both inform you of the same thing, namely whether a core achieves its performance by reach for frequency or by reaching for smarts of various sorts. The most simple-minded dimensional analysis would make this clear.

If you want to play the childish game of point scoring, assuming you've stated something devastating by claiming that these two are not *technically* the same thing, go right ahead.

But be aware that the world is watching... The kids who understand the real issues (and who have the most interesting conversations) don't have time for people who think the goal of these discussions is to score points; their goal is to understand the CPUs.
Look back at your voluminous output over the past few days. Is there ANYTHING in there that contributes to useful understanding as opposed to point scoring?

I want to understand CPUs as well. And I really don't care who comes out ahead. I just want honesty about how we're getting this information.

There is a big difference between SAYING score/GHz correlates to IPC, and PROVING it. Unless there is proof, it is nothing better than anecdote.

And for those of us trying to learn it, just telling us to accept it as truth without showing the proof, that's absolutely absurd. As a physician, I have seen it on rounds, I've seen it in lectures, and it drives me insane. You have to have intellectual honesty.

You don't get to decide what the truth is. You get to prove it.

And come on, why call this childish? I and many others want to learn, and so many are being obstructionist. Just show the data (if you even have it) and move on. If it's so trivial, my goodness. Just show it.

RetroZombie · Apr 7, 2020

Wall and wall of text and some of you guys explain nothing.

I already said this here i say it again and LOUD:
YOU CAN NO LONGER PROPERLY MEASURE IPC ON NEW CPUS IN THE NEW MULTICORE ERA!!!!!

These days even in single core single thread applications multicore cpus share resources like L2/L3 caches in order to improve performance, some do it better than others, some share more than others, this affecting single thread performance tremendously.
Now multi core with multi threading the best cpu given the best throughout and balance better the resources for each cpu core will also be the more efficient in multithreading apps.
Finally trying to measure multicore performance, basing it from the result of one single core/thread test is totally flawed because of all that.

So stop trying to guess what if apple A1x would do if their cpu had 64 cores (or anyone else’s) is next to impossible.

Schmide · Apr 7, 2020

name99 said:
You do understand that ARMv8.6 will include matrix math operations, right? For all we know (it's all very unclear) the AMX ops are EXACTLY the ARMv8.6 ops. Or similar enough that apps submitted as bitcode to the App Store will correctly be translated to both.

ARM architecture - Wikipedia

en.wikipedia.org

Claiming that AMX is cheating is as ridiculous as claiming that large caches is cheating.
In the first place AMX is contributing nothing to today's SPEC A13 numbers; in the second place when/if it does contribute to the SPEC numbers, how is that different from AVX512?

The logic is sound. If you have a single accelerator and many cores. Sending data off to a compute device is not measuring the core. Would you allow an AMD APU to send data off to the GPU?

The whole point of the peak values is to measure the core, not what the core can do with a co-processor.

A cache is part of the memory system. These peak values are equally artificially inflated when you measure it on a server class system with 8-channel memory.

SIMD executes on core. In fact on most architectures if you execute a less wide instruction (SSE) the wider lanes still execute (AVX)

The great example of why this should be so. Bulldozer. Cores share a FPU. Would the peak value be relative to parallel execution?

soresu · Apr 7, 2020

coercitiv said:
Shifting goals again: web applications and SQL services because they scale better with core count? Well, surprise surprise, they also scale better with SMT.

View attachment 19209

SMT4 brings SQL gains up to 80%, web apps up to 40%. And that's according to people who actually build ARM server chips.

Triton (TX3) seems like a pretty impressive core - it would be interesting to see its scores on the regular run of the mill benchmarks like GB4, Blender and such.

soresu · Apr 7, 2020

Nothingness said:
Even ARM did that with A76 going from smartphone to servers.

Wut?

Neoverse N1/Ares is BASED on Cortex A76/Enyo, but it is not exactly the same.

The used the same execution logic but the uncore elements in N1 are architected for server/datacenter use.

lobz · Apr 7, 2020

name99 said:
But be aware that the world is watching... The kids who understand the real issues (and who have the most interesting conversations) don't have time for people who think the goal of these discussions is to score points; their goal is to understand the CPUs.

But they do have time for namecalling? Bullcrap. Talk about self-justifying arrogant elitism... I don't think that was meant to be this place's standard. Although, in a time where one of Tom's top editors can call the owner of a well known and actually 100 times more credible and objective tech reviewer (Steve from HardwareUnboxed) a liar and a troll and absolutely no apology or such is made publicly whatsoever, I guess I should't be surprised about anything here either.

name99 · Apr 7, 2020

Schmide said:
The logic is sound. If you have a single accelerator and many cores. Sending data off to a compute device is not measuring the core. Would you allow an AMD APU to send data off to the GPU?

The whole point of the peak values is to measure the core, not what the core can do with a co-processor.

A cache is part of the memory system. These peak values are equally artificially inflated when you measure it on a server class system with 8-channel memory.

SIMD executes on core. In fact on most architectures if you execute a less wide instruction (SSE) the wider lanes still execute (AVX)

The great example of why this should be so. Bulldozer. Cores share a FPU. Would the peak value be relative to parallel execution?

Everything that we know about AMX suggests that it is as much a part of the core as AVX512. That's precisely the point of the ARMv8.6A reference.
What makes you believe that AMX is EITHER an off-core accelerator or shared between cores? Apple specifically said they were "on the CPU".

Go to 2:13PM

The Apple 2019 iPhone Event Live Blog (10am PT)

www.anandtech.com

Schmide · Apr 7, 2020

name99 said:
Everything that we know about AMX suggests that it is as much a part of the core as AVX512. That's precisely the point of the ARMv8.6A reference.
What makes you believe that AMX is EITHER an off-core accelerator or shared between cores? Apple specifically said they were "on the CPU".

Go to 2:13PM

The Apple 2019 iPhone Event Live Blog (10am PT)

www.anandtech.com

https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/2

Shows the A13 as a separate unit. An on board memory controller or cache can be shared by more than one type of compute unit. A core has one bus that loads in and out through the cache hierarchy. Now if the AMX system can transfer data directly register to register. AMX is in the CPU. I'll even accept it if it's on the same L1, maybe even L2 but then we're returning to Bulldozer land where 2 quasi cores with independent L1s share an FPU and L2.

Not the best source but...(bloomberg)

All of the new iPhones will have faster A13 processors. There’s a new component in the chip, known internally as the “AMX” or “matrix” co-processor, to handle some math-heavy tasks, so the main chip doesn’t have to. That may help with computer vision and augmented reality, which Apple is pushing as a core feature of its mobile devices.

What is or isn't a co-processor? I'd say if it shares an instruction pointer and register set. Processor. If it dispatches out of the L1. Yes. The point at where we should probably draw the line is if more than a fixed set of data (simd) is iterated outside the clock domain.

There is a lot of delineation even on the front end of the CPU. Decoders and SMT cores certainly have their own domains to some extent.

I'm not here to distill metrics into a single point that gives an all encompassing number. I know from experience, that metric is fluid, especially in a multi-processing environment.

Attempting to do this from a few points of peak from multiple domains is peaking crazy.

lobz · Apr 7, 2020

name99 said:
Everything that we know about AMX suggests that it is as much a part of the core as AVX512. That's precisely the point of the ARMv8.6A reference.
What makes you believe that AMX is EITHER an off-core accelerator or shared between cores? Apple specifically said they were "on the CPU".

Go to 2:13PM

The Apple 2019 iPhone Event Live Blog (10am PT)

www.anandtech.com

How can you talk about 'kids who understand the real issues', only to go ahead the next minute and equate 'in the core' and 'on the CPU'?

Nothingness · Apr 8, 2020

soresu said:
Wut?

Neoverse N1/Ares is BASED on Cortex A76/Enyo, but it is not exactly the same.

The used the same execution logic but the uncore elements in N1 are architected for server/datacenter use.

That's why I explicitly used the term "core". That's very similar to what Intel does with cores that are almost identical across their product range while the interconnect, cache sizes, etc change. Same thing, really. So yes, both Intel and ARM use almost the same core from tablet to server. And I can't see why Apple could not achieve that.

Nothingness · Apr 8, 2020

name99 said:
Once again the issue is why do you even CARE about these numbers? Is the goal understanding or redacted?

Why do people say that IPC (and IPC equivalents) can't be compared across frequencies? Because the comparison is misleading IF you are trying to use IPC to gauge some aspect of the micro-architecture.
If I want to compare two branch predictors, I want to keep *everything* else identical to see which predictor delivers higher IPC. If I run one core at twice the frequency, now I can't tell if the lower IPC of the faster core is because the branch predictor is not as good, or if it's because the faster core is simply spending more cycles waiting on RAM.

BUT
- that's not what we are doing here AND
- the comparison doesn't go the way you want.

The comparison here is ultimately: what is a better design direction? Speed demon or brainiac? Of course "better" is a flexible word, but we're treating it as some combination of
- smaller core
- lower power
- higher performance (on GB, SPEC, browser, ...)

So what we ACTUALLY have is two cores that get more or less equal results across a wide range of code, one achieving that by
- 5GHz
- much higher power
- core ~twice as large (subject to quibbling about uncore, process, ...),
one achieving that at
- 2.6GHz.

Arguments about "exact" IPC are moronic in this context, demonstrating an utter inability to pick up on what is important, namely that core A achieves essentially the same results as core I through very different means.
So what do you do with that info?

At a business level, it suggests that core A has a bright future ahead of it.
At the DESIGN level, it is interesting to consider the various mechanisms by which core A manages to achieve such a spectacular degree of "work done per cycle".

Saying that core I is hampered by running faster is completely missing the point. Well, duh, OF COURSE core I is hampered by running faster! That's why team A put all their effort into a brainiac design, not a speed demon design. Team I is welcome to go back to the drawing board and run their core at 2.6 or 3 or 3.5GHz.
But there's something insane about simultaneously saying
- of course A can do well because they only have to run at low frequencies; everyone knows that at higher frequencies you spend ever more time waiting on DRAM AND
- therefore what team I should do is reach for ever higher frequencies...

The discussion the adults here are having is not about rah rah team A vs team I. It is about given the realities of power, transistor size (high frequencies means larger transistors and cells), frequency scaling (both transistors and metal) and likely smaller reticles going forward, how much more should future CPUs push on the speed side vs the brainiac side?
You're not helping if your contribution to that is tribal double-speak along the lines of "sure A does really well --- but they're cheating by using large caches [or smarter design or lower frequency or whatever]".
There's no such thing as "cheating". There is design that is more or less fit for the purpose and the future of technology. You're not helping team I by convincing their marketing team to double-down on even higher frequencies in spite of how those have proved a dead end over the past five years!

Maynard, you don't seem to be able to learn from your previous mistakes: don't point your gun at me, I try to be as factual as possible and I'm defending with data the ARM chips against some x86 fanatics. But I guess an Apple fanatics has a hard time swallowing that.

Nothingness · Apr 8, 2020

Andrei. said:
You should be ashamed to call yourself a moderator here. You're calling a troll one of the actual people designing these chips - it's utter insanity.

He really called me a troll? How ironic and funny. Isn't that a violation of forum rules to call someone a troll? Calling someone a fanboi is. I've put it on ignore, first time I had to put a moderator on ignore in more than 20 years in various forums.

For the love of god stop the incessant bickering and idiotic comments and denial and trolling. All of what you're all achieving is driving the actual people who have knowledge and able to give some insights on the topic away from the site in sheer disgust.

Though I've considered leaving the forum, at the moment I only play with the ignore button. There are still many people that have contradicting and/or interesting points of view

amrnuke · Apr 8, 2020

Nothingness said:
I'm defending with data the ARM chips against some x86 fanatics. But I guess an Apple fanatics has a hard time swallowing that.

Nothingness said:
Isn't that a violation of forum rules to call someone a troll? Calling someone a fanboi is.

@Nothingness

This is a very interesting set of quotes.

I noticed you said you're "defending with data". Did I miss one of your posts by accident that shows the data? Or can you explain what you mean?

naukkis · Apr 8, 2020

amrnuke said:
There is a big difference between SAYING score/GHz correlates to IPC, and PROVING it. Unless there is proof, it is nothing better than anecdote.

We have source code which have defined instructions to do whatever it will do. Machine-level instructions won't matter as there's multiple possible ways to translate those source code instructions even for same ISA. Whole Spec meaning is to offer source-code based benchmark which can be translated to cpu specific instructions freely.

Score/GHz is that benchmarks IPC. There's zero point of making machine instruction level comparison - in that race just adding desired amount of nops to instruction flow makes your IPC to rise - exactly to as high as wanted if cpu hardware is made to execute nops fast.

coercitiv · Apr 8, 2020

Turns out that rebuking someone's absurd views on the topic does not help the poor state of the thread after all, general desire for absurd conflict still remains, and the result is pretty much the same.

Lesson learned, again, for the nth time, until the next time.

insertcarehere · Apr 8, 2020

RetroZombie said:
Wall and wall of text and some of you guys explain nothing.

I already said this here i say it again and LOUD:
YOU CAN NO LONGER PROPERLY MEASURE IPC ON NEW CPUS IN THE NEW MULTICORE ERA!!!!!

These days even in single core single thread applications multicore cpus share resources like L2/L3 caches in order to improve performance, some do it better than others, some share more than others, this affecting single thread performance tremendously.
Now multi core with multi threading the best cpu given the best throughout and balance better the resources for each cpu core will also be the more efficient in multithreading apps.
Finally trying to measure multicore performance, basing it from the result of one single core/thread test is totally flawed because of all that.

So now we are moving the goal posts to saying that single-threaded performance/IPC doesn't matter anymore and overall multi-threaded throughput is king....

If only the ARM ISA had small, power and area-efficient cores that could be stacked wholesale into chips at low cost. Oh wait that's describes every single one of the Cortex cores they offer.

Neoverse E1 is touted to be under 0.5mm^2 on 7nm, imcluding SMT, nothing from x86 really comes close at this point in time.

amrnuke · Apr 8, 2020

naukkis said:
We have source code which have defined instructions to do whatever it will do. Machine-level instructions won't matter as there's multiple possible ways to translate those source code instructions even for same ISA. Whole Spec meaning is to offer source-code based benchmark which can be translated to cpu specific instructions freely.

Score/GHz is that benchmarks IPC. There's zero point of making machine instruction level comparison - in that race just adding desired amount of nops to instruction flow makes your IPC to rise - exactly to as high as wanted if cpu hardware is made to execute nops fast.

1) Agree, pure IPC evaluation is purely scientific and has little application to the real-world, where raw SPECint, SPECfp, and any other benchmark is all we care about, and normalizing to GHz doesn't matter one lick. However, again, I didn't make the IPC claims though, someone else did, and I'm a curious person, so I'm curious about it, and haven't been able to verify their claims, and they haven't backed up their claims with proof. That's all this is.

2) I agree that the instructions given by SPECint are to achieve a certain task, but there are many steps along the way that can cause variances in results. So when someone says SPEC/GHz is IPC, I want to see the proof. The reason I ask this is:

- Shouldn't we consider the purported (but not verified) 8 or 10% difference in instructions retired between the two ISAs?
- Shouldn't we consider differences in benchmark results depending on compiler? Do we actually know the differences between Clang/LLVM as part of Xcode, and whether it has any benefit or detriment compared to Clang/LLVM on Ubuntu on the same machine? I ask because as I understand, Xcode compiles in a hardware/device-specific fashion to optimize the application for that device. To the best of my knowledge Clang/LLVM on Ubuntu and Windows doesn't necessarily do so. Isn't this a potential source of variance?
- Specific to SPEC/GHz, shouldn't we also consider differences between reported boost score and average clock speed actually seen during testing?

This is not a comprehensive list, and again, these are just questions I'm asking to those who claim that there is no real difference between SPEC/GHz and IPC. While individually small, such differences do compound. When we introduce a bunch of areas of small (and easily dismissible, it seems) error, then we end up with the un-verified (and possibly wrong) assumption that IPC scales with SPEC score.

Please understand, this is all just me being inquisitive. I get the sense that there is no way that the IPC lead with A13 is anything but substantial. I am just curious how big. And that requires showing proof that the above factors have been controlled for, which wasn't done before, as best I can tell.

And also when people started making very specific claims ("+83%", "+80%", etc.) about how big the IPC lead is, I got excited and intrigued that they actually had the proof, but it seems that they don't actually have the data.

Nothingness · Apr 8, 2020

coercitiv said:
Turns out that rebuking someone's absurd views on the topic does not help the poor state of the thread after all, general desire for absurd conflict still remains, and the result is pretty much the same.

The problem is when some of the people who want to rebuke these absurd views also write absurd things. They were numerous and stubborn enough to make it a mess and render the discussion pointless.

Lesson learned, again, for the nth time, until the next time.

name99 · Apr 8, 2020

Schmide said:
https://www.anandtech.com/show/14892/the-apple-iphone-11-pro-and-max-review/2

Shows the A13 as a separate unit. An on board memory controller or cache can be shared by more than one type of compute unit. A core has one bus that loads in and out through the cache hierarchy. Now if the AMX system can transfer data directly register to register. AMX is in the CPU. I'll even accept it if it's on the same L1, maybe even L2 but then we're returning to Bulldozer land where 2 quasi cores with independent L1s share an FPU and L2.

Not the best source but...(bloomberg)

lobz said:
How can you talk about 'kids who understand the real issues', only to go ahead the next minute and equate 'in the core' and 'on the CPU'?

OMG!!! That's all I'll say.
You kids have fun.

RetroZombie · Apr 8, 2020

insertcarehere said:
So now we are moving the goal posts to saying that single-threaded performance/IPC doesn't matter anymore and overall multi-threaded throughput is king....

If it's 2020 and it still isn't, oh well...

insertcarehere said:
Neoverse E1 is touted to be under 0.5mm^2 on 7nm, imcluding SMT, nothing from x86 really comes close at this point in time.

I was trying to measure something with my post. Let's try to measure another:
From the vega release: Second of all, we have a formal die size and transistor count for Vega 10. The GPU is officially 486mm2, containing 12.5B transistors therein. That amounts to 3.9B more transistors than Fiji – an especially apt comparison since Fiji is also a 64 CU/64 ROP card – ...
Talking to AMD’s engineers, what especially surprised me is where the bulk of those transistors went; the single largest consumer of the additional 3.9B transistors was spent on designing the chip to clock much higher than Fiji.Vega 10 can reach 1.7GHz, whereas Fiji couldn’t do much more than 1.05GHz...

How big would the arm core grow if designed to run at 2x the clock (instead of 2.0Ghz like most arm chips run), do you know?

lobz · Apr 8, 2020

name99 said:
OMG!!! That's all I'll say.
You kids have fun.

Well that clears it up.

:facepalm for hours:

Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

Senior member

Golden Member

Senior member

Senior member

Senior member

Golden Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Golden Member

Diamond Member

Senior member

Golden Member

Diamond Member

Senior member

Senior member

Platinum Member