In other words, we still know nothing about how ARM scales up.
What do you want to know? You won't get a useful answer if you don't pose a well-formed question.
The first question is one of correctness: can ARM scale up? The answer is clearly yes. "They" (i.e. some combination of ARM Ltd and the various large SoC vendors) have a NoC that scales to at least 96 cores, and a solution (perhaps directory-based) for handling coherence across that many cores.
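To make the coherence point concrete, here's a minimal sketch of what a directory buys you at this scale; the field sizes, state encoding, and 96-core count are illustrative assumptions, not any vendor's actual design. The idea is that each line's directory entry records exactly which cores hold a copy, so a write invalidates only the sharers instead of broadcasting to the whole chip.

```c
/* Toy directory entry: why directory coherence scales where pure
 * broadcast snooping doesn't. One entry per cache line tracks exactly
 * which cores hold a copy, so an invalidate goes only to sharers.
 * Field sizes are illustrative assumptions for a ~96-core chip. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t sharers[2];   /* 128-bit presence mask: one bit per core */
    uint8_t  owner;        /* core holding the line in Modified state */
    uint8_t  state;        /* e.g. 0=Invalid, 1=Shared, 2=Modified */
} dir_entry;

/* On a write by `core`, invalidate only the cores whose bit is set,
 * instead of broadcasting to all 96. */
static int invalidations_needed(const dir_entry *e, int core) {
    int n = 0;
    for (int c = 0; c < 96; c++) {
        if (c == core) continue;
        if (e->sharers[c / 64] & (1ULL << (c % 64))) n++;
    }
    return n;
}

int main(void) {
    dir_entry e = { {0x5ULL, 0}, 0, 1 };   /* cores 0 and 2 share the line */
    printf("invalidates on write by core 0: %d\n", invalidations_needed(&e, 0));
    return 0;
}
```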
So correctness works. Then we have performance. And the issue, again, is: what do you want to know?
There are multiple issues:
- there is total bandwidth. This is going to be constrained, more than anything else, by the number of memory controllers. These will presumably be scaled to what the data-warehouse vendors *normally* require, not to the most demanding workload imaginable. Right now these cores are targeting the cheap, easily-ported segment of the market; they're not trying to be specialized (and very expensive) engines for the most demanding jobs. (A back-of-envelope sketch follows this list.)
- how well does the LLC handle the differing demands of all these clients? This includes things like arbitrating between multiple prefetchers, but also topology, how much data to replicate in different slices, along with fancier ideas: virtual write cache, LLC compression, dead-line prediction, ... (a toy slice-hashing model follows the list)
- how well are locking primitives and barriers handled, the things that need to enforce an ordering across more than one core (sketched in the last example below)
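On the bandwidth point, here's a back-of-envelope sketch of how per-core bandwidth shrinks as cores outgrow memory controllers. The channel speed (DDR4-3200, ~25.6 GB/s peak per 64-bit channel) and the 8-controller SoC are assumed numbers for illustration, not any vendor's spec.

```c
/* Back-of-envelope: aggregate DRAM bandwidth divided across cores.
 * All numbers below are illustrative assumptions, not vendor specs. */
#include <stdio.h>

int main(void) {
    const double chan_gbps = 25.6;   /* assumed: one DDR4-3200 64-bit channel, ~25.6 GB/s peak */
    const int channels = 8;          /* assumed: 8 memory controllers/channels on the SoC */
    const int core_counts[] = {16, 32, 64, 96};

    for (int i = 0; i < 4; i++) {
        int cores = core_counts[i];
        double total = chan_gbps * channels;
        printf("%3d cores: %6.1f GB/s total, %5.2f GB/s per core\n",
               cores, total, total / cores);
    }
    return 0;
}
```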
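On the LLC point, here's a toy model of the kind of address-to-slice hashing a distributed LLC might use, which is where the topology and replication questions come from: a line's home slice may be many hops away from the core that missed. The XOR-fold hash and the slice count are invented for illustration, not any real chip's scheme.

```c
/* Toy model of a sliced LLC: a physical address is hashed to one of
 * N slices, so every core's misses are spread across the whole chip.
 * The XOR-fold hash and slice count are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SLICES 32          /* assumed slice count */
#define LINE_BITS  6           /* 64-byte cache lines */

static unsigned slice_of(uint64_t paddr) {
    uint64_t line = paddr >> LINE_BITS;   /* drop offset-within-line bits */
    /* XOR-fold the line address so strided patterns still spread out */
    line ^= line >> 5;
    line ^= line >> 11;
    return (unsigned)(line % NUM_SLICES);
}

int main(void) {
    /* A streaming access pattern lands on many different slices: that's
     * the point. Bandwidth scales with slice count, but a hit may live
     * in a slice many hops away (hence the replication question). */
    for (uint64_t a = 0; a < 8 * 64; a += 64)
        printf("paddr 0x%04llx -> slice %u\n",
               (unsigned long long)a, slice_of(a));
    return 0;
}
```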
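And on locks and barriers, a minimal sketch of why cross-core ordering gets expensive: one shared atomic counter forces its cache line to bounce between every participating core, while per-thread counters stay core-local. Thread and iteration counts are arbitrary; compile with -pthread and compare the two timings.

```c
/* Contrast: one shared atomic counter vs. per-thread counters.
 * The gap between the two timings is (roughly) coherence traffic.
 * NTHREADS and ITERS are arbitrary illustrative choices. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8
#define ITERS    1000000L

static atomic_long shared_ctr;
/* one 64-byte row per thread, so the rows don't false-share */
static volatile long local_ctr[NTHREADS][8];

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void *shared_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add(&shared_ctr, 1);  /* this line ping-pongs between cores */
    return NULL;
}

static void *local_worker(void *arg) {
    long id = (long)(intptr_t)arg;
    for (long i = 0; i < ITERS; i++)
        local_ctr[id][0]++;                /* stays in this core's cache */
    return NULL;
}

static double run(void *(*fn)(void *)) {
    pthread_t t[NTHREADS];
    double t0 = now_sec();
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, fn, (void *)(intptr_t)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return now_sec() - t0;
}

int main(void) {
    printf("shared counter: %.3f s\n", run(shared_worker));
    printf("local counters: %.3f s\n", run(local_worker));
    return 0;
}
```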
The bottom line is that there's no single number or benchmark that gives you useful answers to these three questions. If your goal is to behave like a sports fan, you can latch onto something, misinterpret it, and wave a flag; but if your goal is understanding, you're somewhat stuck. The best one can honestly do is look at things like success stories on Graviton 2, or occasional similar blog posts from the usual companies that operate in this space and are reasonably transparent (like Cloudflare).
And sure, those blog posts will mainly tell you how a certain type of code runs on these many-core chips; they won't tell you how very different code runs. If Graviton 2 was not designed for massive-bandwidth HPC calculations, no-one's going to try running their QCD code there. If it's a poor fit for SAP, no-one's going to run SAP on it.
But NONE OF THAT tells you anything about "ARM's ability to scale". It tells you about the market that the ARM vendors are currently targeting. Which you should already know. A sane company doesn't decide that its first (or second) generation product is going to target not just the low-hanging fruit but every computational task in the enterprise universe, from z to HPC to SAP to AWS!