Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

There is a first rumor about Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with new gen MacBook Air in mid 2020 (considering also MacBook PRO and iMac)
massive AI accelerator

Source Coreteks:

DrMrLordX · Mar 29, 2020

Carfax83 said:
Has anyone done any comparison benchmarking with SVE against AVX512?

None of which I am aware. SVE is deprecated anyway. A64FX is probably the only production/semi-production CPU that will ever use it. SVE2 hasn't gone into production at all.

Richie Rich · Mar 29, 2020

Carfax83 said:
My point is, that many of those people have not taken into consideration the changes that would need to be made to the overall design of the A series CPUs to make it more scalable and performant in multithreaded workloads, and that these necessary changes would almost certainly definitely result in a significant reduction of single threaded performance.

What changes Intel has in core between consumer Core i9 and server Xeon? Besides stronger FPU and L2$ size, there is no change.
What changes AMD has in core between consumer Ryzen 1800X and server 32 core EPYC ? No change at all, just identical 4xCPU
What changes AMD has in core between consumer Ryzen 3700X and server 64 core EPYC ? No change at all, chiplets are identical, just different IOD.
What changes ARM has in core between consumer Cortex A76 and server 64 core Graviton2? 1MB L2$ instead of 0.5MB.

As a matter a fact Graviton2 has better ST performance than consumer A76. You should read Andrei's G2 test more carefully: Compared to a mobile Cortex-A76 such as in the Kirin 990 (which is the best A76 implementation out there), the resulting IPC is 32% better for the Graviton2 in SPECint2006, and 10% better for SPECfp2006.

Above examples prove that you are wrong and Apple A13 based server CPU would scale without any problem. Sooner you admit this fact smaller the shock from Nuvia performance you will have.

Carfax83 said:
If the A series was such a winning design for other workloads, we would have seen it in x86 land already. The fact that AMD and Intel are both pursuing similar designs tells me all I need to know about the most ideal designs for these types of workloads.

Apple cores are the winning design actually. ARM changed his strategy from maximum performance per area (PPA) cores (A72, 73, 75) to much wider Austin designs A76-78 inspired by Apple. Look at A77 how wide machine it is: 4xALU+2xJump, 2xLSU+2xstore, 2xFPU, that's wider than Zen2 and is clearly inspired by Apple's super wide monster.

x86 have the most ideal designs? That's funny. You mean that company which went from competitive 3xALU design to much slower 2xALU design and almost went bankrupt because of that? And then today saying look how good we are, we made a 40% IPC jump! Yeah, one step backward and two step forward strategy. It's an interesting alternative to tick-tock strategy. And the second company was so smart to not go backwards and kept minimal effort resulting to 4% IPC per year improvement over last decade. And let themselves outrun in IPC by Apple since A10 Hurricane 2016 and by ARM A77 since last year. Where is the ideal design when x86 is loosing positions everywhere?

DrMrLordX · Mar 29, 2020

Richie Rich said:
What changes Intel has in core between consumer Core i9 and server Xeon? Besides stronger FPU and L2$ size, there is no change.

Gahhhh do your research!

Skylake-SP, Cascade Lake-SP, and Cooper Lake have numerous differences between them and Skylake/Kabylake/Coffeelake/Comet Lake.

The server CPUs use a mesh interconnect. Desktop CPUs do not.
The server CPUs have larger L2 and smaller L3 per core compared to desktop CPUs.
All of the server cores support some AVX512 extensions. The only consumer cores to date to support any AVX512 instructions are Cannonlake and IceLake (not counting HEDT).

Richie Rich said:
Above examples prove that you are wrong and Apple A13 based server CPU would scale without any problem.

Graviton2 has problems scaling, what makes you think Apple won't? Unless they develop a better interconnect and/or include more L3 in their design?

coercitiv · Mar 29, 2020

To argue the lack of apparent optimizations going from Ryzen to EPYC is proof for low tradeoff requirements between consumer and server workspace is an outstanding display of cognitive dissonance.

Except for the APU based SKUs, Ryzen is considered a prime example of consumer product based entirely on server silicon. Ryzen's weakpoints are generally acknowledged as stemming from tradeoffs to maximize EPYC efficiency. As a simple example, this forum alone is filled with people dreaming of a true monolithic desktop Ryzen.

If you want to scale, you make sacrifices.

Nothingness · Mar 29, 2020

Hitman928 said:
Perhaps I wasn't clear enough that I was being facetious, but I was doing so to make a point. I also don't know why you keep calling other test collections outside of Spec "microbenchmarks", seemingly trying to discredit them, as if Spec isn't made up of a collection of "microbenchmarks".

Oh please come on do yourself a favor and look at C-ray or Dhrystone source code. These are microbenchmarks. And 7-zip is a very specific benchmark which as far as I know has heavy x86 tuning and much less (if any) ARM tuning.

The last one, OpenSSL, is a clear sign of two things: heavy x86 tuning, and more important in my mind, AMD chips simply are the best at big num (look at gmplib results).

Additionally, I'm not aware of any evidence of AOCC using cheats like ICC has been accused of in the past. On the contrary, AOCC is built upon current versions of LLVM and then given additional Zen optimizations (and increased compile time to try and produce the fastest code). That gives AMD data on which optimizations produce the best real world effects so that those optimizations can then be incorporated into future industry standard compilers.

And you think that AOCC being almost twice faster than gcc in ServeTheHome results is not a sign of heavy targeted tuning. Really? You think gcc is that bad while it's been tested on SPEC for years?

From my understanding ARM does the same but upstreams their optimizations into GCC so you get the optimizations earlier with that compiler compared to Zen (if product timelines were equivalent).

That's an AMD failure. A big one, don't you think? And if you think some magic generic (as opposed to "let's make SPEC fast") tuning will bring a 2x speed up to SPEC rate AMD you're being incredibly naive, or heavily biased.

Nothingness · Mar 29, 2020

Carfax83 said:
Has anyone done any comparison benchmarking with SVE against AVX512?

No because no SVE chip is publicly available. We can just try to guess performance from data sheets.

If you want to have some fun: https://github.com/fujitsu/A64FX/tree/master/doc

And it's not a question of SVE vs AVX-512, it's a question of comparing chips with each of these instruction sets

Carfax83 · Mar 29, 2020

Richie Rich said:
What changes Intel has in core between consumer Core i9 and server Xeon? Besides stronger FPU and L2$ size, there is no change.

What changes AMD has in core between consumer Ryzen 1800X and server 32 core EPYC ? No change at all, just identical 4xCPU

What changes AMD has in core between consumer Ryzen 3700X and server 64 core EPYC ? No change at all, chiplets are identical, just different IOD.

What changes ARM has in core between consumer Cortex A76 and server 64 core Graviton2? 1MB L2$ instead of 0.5MB.

Your problem is that you don't do any research before making grandiose statements you don't know anything about. Intel's modern Xeon lineup uses a mesh interconnect which is not the case for their consumer CPUs that still use the ring bus. Also, Rome has humongous L3 caches, up to 256MB which is far more than what you see in the consumer lineup.

Above examples prove that you are wrong and Apple A13 based server CPU would scale without any problem. Sooner you admit this fact smaller the shock from Nuvia performance you will have.

This is a total mischaracterization of my argument. I have never said that the Apple A13 could not scale up by adding more cores. My argument has always been that while the A13 could be scaled up, it would need to undergo significant changes to the cache hierarchy and the microarchitecture itself, which would result in reducing the very high single threaded IPC you've been raving about this entire thread.

Apple cores are the winning design actually. ARM changed his strategy from maximum performance per area (PPA) cores (A72, 73, 75) to much wider Austin designs A76-78 inspired by Apple. Look at A77 how wide machine it is: 4xALU+2xJump, 2xLSU+2xstore, 2xFPU, that's wider than Zen2 and is clearly inspired by Apple's super wide monster.

The Apple cores are a winning design only if you want strong single threaded performance. The frequency/voltage curve is too steep to be using this core (as is) as the basis for a serious multicore CPU, without significant modifications. Let me refresh your memory as you seem to keep ignoring this:

Even just 8 of these things on the same die would be too much, much less 64.

x86 have the most ideal designs? That's funny. You mean that company which went from competitive 3xALU design to much slower 2xALU design and almost went bankrupt because of that? And then today saying look how good we are, we made a 40% IPC jump! Yeah, one step backward and two step forward strategy. It's an interesting alternative to tick-tock strategy. And the second company was so smart to not go backwards and kept minimal effort resulting to 4% IPC per year improvement over last decade. And let themselves outrun in IPC by Apple since A10 Hurricane 2016 and by ARM A77 since last year. Where is the ideal design when x86 is loosing positions everywhere?

Nobody cares about Apple's high IPC, because it's not used outside of smartphones and tablets. Also, if ARM CPUs ever clock as high as x86-64 CPUs, they will run into the same problems with increasing IPC for reasons already mentioned. And the last time I checked, x86-64 is dominating HPC, servers, databases, gaming PCs, laptops and regular desktops.

ARM has lots of work to do if they want to be the dominant architecture that x86-64 has become!

Hitman928 · Mar 29, 2020

Nothingness said:
Oh please come on do yourself a favor and look at C-ray or Dhrystone source code. These are microbenchmarks.

I agree on Dhrystone, although it can still produce useful data about the microarchitecture, but not C-ray. C-ray may be small, but it actually does something useful using calculations that are done all the time in "real world apps". It's one of the most widely run benchmarks there is. It's basically taking a small part of what apps like pov-ray do and isolating those calculations for an easy to compile (thus making it widely compatible) benchmark.

And 7-zip is a very specific benchmark which as far as I know has heavy x86 tuning and much less (if any) ARM tuning.

Isn't this a big part of the issue for ARM though. If you say we have to limit benchmarks to only applications that have equal optimizations for x86 and ARM, despite decades of x86 dominance in the server and desktop market, your available test suite is going to be severely limited and also unrealistic if you are wanting to switch to actually using ARM.

The last one, OpenSSL, is a clear sign of two things: heavy x86 tuning, and more important in my mind, AMD chips simply are the best at big num (look at gmplib results).

So no tests that Epyc inherently does well in either, got it. OpenSSL is (as the name suggests), open source so ARM vendors are free to analyze and contribute code if they think the code isn't optimized well for ARM.

And you think that AOCC being almost twice faster than gcc in ServeTheHome results is not a sign of heavy targeted tuning. Really? You think gcc is that bad while it's been tested on SPEC for years?

It was 67% faster and using an older version of GCC versus the just released AOCC which was built upon the just released LLVM so I expected a very healthy uptick in performance same as if they had tested against the older version of LLVM. With that said, that's probably still too high for "real world" results which I pointed out in my reply to Mark. However, using an older version of GCC where AMD doesn't upstream optimizations is also not a fair comparison either for Rome which was my whole point from the beginning.

That's an AMD failure. A big one, don't you think? And if you think some magic generic (as opposed to "let's make SPEC fast") tuning will bring a 2x speed up to SPEC rate AMD you're being incredibly naive, or heavily biased.

AMD would probably greatly benefit by working a lot more with the upstream effort of LLVM and especially GCC. I'm guessing that as AMD continues to improve financially, you'll see more of these types of efforts, just like they are finally getting around to working on fixing their vector paths with glibC.

AMD Developers Looking At GNU C Library Platform Optimizations For Zen - Phoronix

www.phoronix.com

You are also creating a pretty large strawman with your 2x speed-up comment which I not only never said, but I had already mentioned previously that comparing the published SPEC results in comparison to the Graviton2 Anandtech results showed an unrealistic advantage to Rome, but I appreciate the digs at my character. I suppose the proof will be in the pudding as they say and we'll have to see how many contracts the Ampere and TX3 chips end up winning.

Richie Rich · Mar 29, 2020

Carfax83 said:
ARM has lots of work to do if they want to be the dominant architecture that x86-64 has become!

ARM devices has majority of revenue worldwide:

biggest market is smart phones (7x bigger than servers) and ARM has 100% and x86 0%.
gaming market: 55% of revenue goes to ARM devices
server market: ARM is expanding, several new companies like Amazon, Ampere, Marvell, Nuvia, Fujitsu, Hisilicon ... how many new companies has x86? Zero.
laptops: Qcomm has Snapdragon 8cx …. another ARM expansion
desktop: HiSilicon tries ATX boards with 64 core Kungpeng 920 …. another ARM expansion

And where is x86 expanding? Nowhere. It is loosing positions everywhere so it's hard to speak about dominancy today

Markfw · Mar 29, 2020

Richie Rich said:
ARM devices has majority of revenue worldwide:

biggest market is smart phones (7x bigger than servers) and ARM has 100% and x86 0%.

gaming market: 55% of revenue goes to ARM devices

server market: ARM is expanding, several new companies like Amazon, Ampere, Marvell, Nuvia, Fujitsu, Hisilicon ... how many new companies has x86? Zero.

laptops: Qcomm has Snapdragon 8cx …. another ARM expansion

desktop: HiSilicon tries ATX boards with 64 core Kungpeng 920 …. another ARM expansion

And where is x86 expanding? Nowhere. It is loosing positions everywhere so it's hard to speak about dominancy today

So because ARM is used in smartphones, and thats what it is designed for, you think it will expand to all other areas that its NOT designed for ?

Let me know what you are smoking, I want some.

NostaSeronx · Mar 29, 2020

Markfw said:
you think it will expand to all other areas that its NOT designed for ?

Ha, ha, Intel/IBM/AMD will never get into the HPC market. They will never surpass Cray/NEC vector architectures. Energy efficient/low cost processors will never replace true high performance processors.

Carfax83 · Mar 29, 2020

Richie Rich said:
ARM devices has majority of revenue worldwide:

biggest market is smart phones (7x bigger than servers) and ARM has 100% and x86 0%.

gaming market: 55% of revenue goes to ARM devices

server market: ARM is expanding, several new companies like Amazon, Ampere, Marvell, Nuvia, Fujitsu, Hisilicon ... how many new companies has x86? Zero.

laptops: Qcomm has Snapdragon 8cx …. another ARM expansion

desktop: HiSilicon tries ATX boards with 64 core Kungpeng 920 …. another ARM expansion

And where is x86 expanding? Nowhere. It is loosing positions everywhere so it's hard to speak about dominancy today

The mobile gaming market isn't 55% so I don't know where you got that figure from. If you combine the PC and console markets, then that would be over 50%. Mobile gaming is likely around 45% or so.

And I'm not denying that ARM won't ever gain a foothold in many of these industries. It's just going to be a uphill battle. ARM can change and evolve, but so can x86-64. There's been rumors for several years that Intel wants to make a cleaner x86 architecture with a lot of the legacy stuff removed. I guess we'll have to see.

insertcarehere · Mar 29, 2020

Carfax83 said:
And I'm not denying that ARM won't ever gain a foothold in many of these industries. It's just going to be a uphill battle. ARM can change and evolve, but so can x86-64. There's been rumors for several years that Intel wants to make a cleaner x86 architecture with a lot of the legacy stuff removed. I guess we'll have to see.

Snapdragon 835 (2017) to Snapdragon 855 (2019) literally doubled single-threaded performance in two years (I.E Zen 1 to 2), while the 865 is another 30% increase in the span of a year. It's clear which architecture has been evolving faster in this timeframe.

name99 · Mar 29, 2020

DrMrLordX said:
In phones. You still can't get a reasonably-priced A76 (or A77) sbc in the united states. Want something a76-like in a laptop? 8cx or bust. That's it! There's just no way to buy these cores in any kind of notebook or desktop form factor. Except 8cx. Where are the "serious" A77 machines this year? I don't see them.

A76 was widespread in 2018, furthermore. It took over a year to get it anywhere that wasn't a phone. A77 is new in phones this year. I would not expect much/any access to A77 outside of mobile until NEXT year.

As for Ampere and ThunderX3 . . . still waiting!

The small cores only have one NEON unit, I think, but Graviton2 has them on every core. It's notable that Amazon (and really it was ARM with the Neoverse reference design) committed all that silicon to SIMD while Apple mostly didn't. One's a phone SoC and the other isn't. Consider the previous context of this thread as to why that's relevant.

What are you talking about?
The LARGE Apple cores have 3 NEON units each. The SMALL cores have 1 NEON unit each because, duh, they are optimized for being small cores. I've no idea why you consider this an interesting point.
Do you have some bizarre belief that Apple large cores SHARE the three NEON units, and the 4 small cores are sharing a single NEON unit?

The point that seems relevant to me is that Apple (and ARM vendors generally) have a suite of SIMD technologies that they can deploy to the extent it makes sense for their targets. Apple will presumably use many of their large cores (presumably with AMX or SVE or both [we still don't know the relationship between AMX, SVE, and the ARMv8.6 spec] for both their desktop machines and any possible servers.

Graviton 2 I believe (but I am not sure) has 2 NEON units per core. ThunderX 3 has 4 NEON units per core. There's obviously flexibility here.

name99 · Mar 29, 2020

Carfax83 said:
The Apple cores are a winning design only if you want strong single threaded performance.

Uh, WOT????
OF COURSE the issue is "strong single threaded performance". That is literally the entire discussion!!!
Any monkey can replicate cores and give high multi-core performance, that is just not hard. Regardless of what you think of Amazon or Marvell, they have functioning designs with 64 or 96 cores on a chip; Apple could do the same, Cavium is doing the same. Huawei, Samsung could do the same. It's JUST NOT HARD.

The hard part is the part you're insisting we ignore: strong single threaded performance!

SarahKerrigan · Mar 29, 2020

name99 said:
Uh, WOT????
OF COURSE the issue is "strong single threaded performance". That is literally the entire discussion!!!
Any monkey can replicate cores and give high multi-core performance, that is just not hard. Regardless of what you think of Amazon or Marvell, they have functioning designs with 64 or 96 cores on a chip; Apple could do the same, Cavium is doing the same. Huawei, Samsung could do the same. It's JUST NOT HARD.

The hard part is the part you're insisting we ignore: strong single threaded performance!

While you are correct on this, I would remind you that Cavium is Marvell. You've talked about them a couple times as if they're separate entities or product lines, and they are not.

Carfax83 · Mar 29, 2020

insertcarehere said:
Snapdragon 835 (2017) to Snapdragon 855 (2019) literally doubled single-threaded performance in two years (I.E Zen 1 to 2), while the 865 is another 30% increase in the span of a year. It's clear which architecture has been evolving faster in this timeframe.

And do you expect that to continue?

insertcarehere · Mar 29, 2020

Carfax83 said:
And do you expect that to continue?

I expect ARM to continue to evolve faster than x86 at least for the next few years, it's clear looking at the Apple cores that even A77 has a fair bit of headroom with regards to improving the core for more performance.

name99 · Mar 29, 2020

SarahKerrigan said:
While you are correct on this, I would remind you that Cavium is Marvell. You've talked about them a couple times as if they're separate entities or product lines, and they are not.

My bad! Sorry, I keep mixing up Cavium and Ampere

Carfax83 · Mar 30, 2020

name99 said:
Uh, WOT????
OF COURSE the issue is "strong single threaded performance". That is literally the entire discussion!!!
Any monkey can replicate cores and give high multi-core performance, that is just not hard. Regardless of what you think of Amazon or Marvell, they have functioning designs with 64 or 96 cores on a chip; Apple could do the same, Cavium is doing the same. Huawei, Samsung could do the same. It's JUST NOT HARD.

The hard part is the part you're insisting we ignore: strong single threaded performance!

I'm not saying we should ignore strong single threaded performance from the Apple A series, just to understand how Apple has attained it and why their microarchitecture is focused on it. As myself and others have been saying, single threaded performance is very important for mobile devices, while for desktop and other markets that x86 has a strong footing in, multithreaded performance has also become critical as more workloads are being parallelized. So Intel and AMD have to make more balanced designs in terms of single and multithreaded performance compared to Apple, who's CPUs are going into smartphones and tablets.

Both Intel and AMD design CPUs with microarchitectures that must be performant and scalable in a wide variety of platforms, everything from laptops, gaming rigs to servers to HPC etcetera. So focusing on single threaded performance like what Apple is doing would be counterproductive. So if Apple decided to make CPUs to compete with Intel and AMD in other markets, I would expect that whatever CPU they would come up with would look drastically different than the A series.

Also I would disagree with your statement concerning how easy it is to replicate cores and give high multi core performance. Although I'm not an engineer or industry professional, making a good, scalable uncore is apparently really hard to do and both Intel and AMD utilize significant engineering and financial resources towards doing so.

DrMrLordX · Mar 30, 2020

name99 said:
What are you talking about?
The LARGE Apple cores have 3 NEON units each. The SMALL cores have 1 NEON unit each because, duh, they are optimized for being small cores. I've no idea why you consider this an interesting point.

It's interesting because Apple still hasn't released an A-series SoC that prioritizes NEON on every core. Amazon has done so. Fujitsu went above-and-beyond by adopting SVE (too bad for them SVE is deprecated, but not all is lost for them). SIMD performance is not an absolute priority on a mobile chip. Very good of you to make my point for me. If you hadn't noticed, someone here seems to think Apple is on the cusp of taking over the server world, despite not really taking SIMD all that seriously and not having fielded a chip with an interconnect or cache structure suitable for high core counts.

Do you have some bizarre belief that Apple large cores SHARE the three NEON units, and the 4 small cores are sharing a single NEON unit?

No. Learn to read and take things in context.

The point that seems relevant to me is that Apple (and ARM vendors generally) have a suite of SIMD technologies that they can deploy to the extent it makes sense for their targets.

Their target is not the server room. Which was my entire point. No real need to bring up ARM server performance and Apple chips. The two have no relationship with one another, and you can't extrapolate the performance of future ARM server CPUs from Apple SoCs.

name99 said:
Any monkey can replicate cores and give high multi-core performance, that is just not hard.

It appears that Amazon has failed to do just that. Care to explain why?

Nothingness · Mar 30, 2020

Hitman928 said:
I agree on Dhrystone, although it can still produce useful data about the microarchitecture, but not C-ray. C-ray may be small, but it actually does something useful using calculations that are done all the time in "real world apps". It's one of the most widely run benchmarks there is. It's basically taking a small part of what apps like pov-ray do and isolating those calculations for an easy to compile (thus making it widely compatible) benchmark.

Did I say it is useless? I said it is a microbenchmark.

Isn't this a big part of the issue for ARM though. If you say we have to limit benchmarks to only applications that have equal optimizations for x86 and ARM, despite decades of x86 dominance in the server and desktop market, your available test suite is going to be severely limited and also unrealistic if you are wanting to switch to actually using ARM.

Yes that's an issue. But we are comparing micro-architectures. You want to have an as fair comparison as possible, so you try to pick software that has the same level of optimization on both architectures, and compilers that are as close as possible. As I said it's like picking benchmarks where Intel cheated for a comparison against AMD; do that if you want but don't expect to be trusted except by fanatics who don't turn their brain on.

So no tests that Epyc inherently does well in either, got it. OpenSSL is (as the name suggests), open source so ARM vendors are free to analyze and contribute code if they think the code isn't optimized well for ARM.

I'm tired of this, really. I have not excluded these benchmarks, go re-read what I wrote instead of putting words in my mouth. I'm just characterizing the benchmarks, but if you have difficulty understanding this, it's either me being unable to convey my point of view, or you being obtuse (or biased).

It was 67% faster and using an older version of GCC versus the just released AOCC which was built upon the just released LLVM so I expected a very healthy uptick in performance same as if they had tested against the older version of LLVM. With that said, that's probably still too high for "real world" results which I pointed out in my reply to Mark. However, using an older version of GCC where AMD doesn't upstream optimizations is also not a fair comparison either for Rome which was my whole point from the beginning.

Agreed, but that's not a reason to setup an unrealistic AOCC vs gcc comparison on SPEC as you did. We all know that icc is cheating and should not be trusted, so why should we even talk about AOCC on SPEC? Because we all love AMD? Because we don't believe ARM is good at anything? Because ARM fans are a pain in a place forum rules don't allow me to mention?

I think that my point boils down to this: your initial comparison, no matter how many warnings you put around it, was meaningless and was taken as a proof of an unrealistic advantage of Rome by dumb AMD fanatics.

AMD would probably greatly benefit by working a lot more with the upstream effort of LLVM and especially GCC. I'm guessing that as AMD continues to improve financially, you'll see more of these types of efforts, just like they are finally getting around to working on fixing their vector paths with glibC.

AMD Developers Looking At GNU C Library Platform Optimizations For Zen - Phoronix

www.phoronix.com

Yeah AMD finally moving years after others did, it's a very good thing. Too bad they couldn't realize earlier that spending $200k a year for a good SW engineer was all that was needed to get low hanging fruits (and ARM are guilty of the same mistake). But if you think that their AVX/AVX2 changes or other libc tweaks will give a significant speedup on SPEC int, well I won't convince you of anything.

You are also creating a pretty large strawman with your 2x speed-up comment which I not only never said, but I had already mentioned previously that comparing the published SPEC results in comparison to the Graviton2 Anandtech results showed an unrealistic advantage to Rome,

The almost 2x speed up (well OK 1.7x, 300 for AOCC vs 180 for gcc) is not a comment you made it's data from ServeTheHome. You know like an evidence that AOCC is so much faster that they are surely playing tricks that make any comparison with such results so pointless no one should make one even with a warning.

but I appreciate the digs at my character.

My English is not good enough to get this

But if you think I'm trying to insult you or whatever, I'm not and I'd be sorry if you thought so. I'm just frustrated that we obviously are in agreement on most things except that I think your original comparison was pointless (and I miserably failed at convincing you).

I suppose the proof will be in the pudding as they say and we'll have to see how many contracts the Ampere and TX3 chips end up winning.

Except that few will get that data (and are under NDA as I am).

Nothingness · Mar 30, 2020

DrMrLordX said:
None of which I am aware. SVE is deprecated anyway. A64FX is probably the only production/semi-production CPU that will ever use it. SVE2 hasn't gone into production at all.

Not sure where you get info from, but SVE is not deprecated. SVE2 is an extension of SVE.

EDIT: https://developer.arm.com/tools-and...utorials/sve/sve-vs-sve2/introduction-to-sve2

SVE2 extends the SVE instruction set to enable more data-processing domains (beyond HPC and ML).

Richie Rich · Mar 30, 2020

Carfax83 said:
The mobile gaming market isn't 55% so I don't know where you got that figure from. If you combine the PC and console markets, then that would be over 50%. Mobile gaming is likely around 45% or so.

It's actually 60% and prediction is for growing. I'm sorry to put only 55% - this might let you think x86 has a chance but it doesn't

Mobile games sparked 60% of 2019 global game revenue

I don't say x86 will disappear from market till next Monday. But economical factors (like 80% of global revenue market share are ARM based devices) means that most money goes into development of multiple ARMs. ARM and Apple worked hard to go from zero to hero and they continue in that hard work. x86 companies not, look at the IPC jump per year (Intel 4%, AMD 7.5%, ARM 20%, Apple 18%). If that trends continues then x86 is dead in less than decade. x86 needs to wake up and start working harder than ARM.

soresu · Mar 30, 2020

Richie Rich said:
It's actually 60% and prediction is for growing. I'm sorry to put only 55% - this might let you think x86 has a chance but it doesn't
Mobile games sparked 60% of 2019 global game revenue

I don't say x86 will disappear from market till next Monday. But economical factors (like 80% of global revenue market share are ARM based devices) means that most money goes into development of multiple ARMs. ARM and Apple worked hard to go from zero to hero and they continue in that hard work. x86 companies not, look at the IPC jump per year (Intel 4%, AMD 7.5%, ARM 20%, Apple 18%). If that trends continues then x86 is dead in less than decade. x86 needs to wake up and start working harder than ARM.

It depends what you mean by gaming - most games on mobile platforms are either casual or ports of very old games.

They stopped porting GTA games at GTA3 and went no further, whether because of space issues or bandwidth, who knows - but there is certainly problems in that market.

You can't really call it competitive until a significant portion of AAA PC/console games are making it to mobile within less than a year, and right now that is not even close to true. The closest comparison is Switch with some AAA wide market releases, but even then that is on gimped hardware from 2015 that does not even match XB1 for oomph - leaving graphics often less than impressive by comparison.

What is needed is one of the big smartphone makers to make their own dedicated console with a state of the art ARM SoC - Samsung would be a good choice given their choice of RDNA IP would allow them some code parity with the coming console generation.

I'm honestly surprised that Apple never made this a true priority - perhaps afraid of the PR embarrassment that might arise from matching wills to Sony or Nintendo and getting utterly annihilated in sales.

Solved! ARM Apple High-End CPU - Intel replacement

Senior member

Lifer

Senior member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Moderator Emeritus, Elite Member

Diamond Member

Diamond Member

Senior member

Senior member

Senior member

Senior member

Diamond Member

Senior member

Senior member

Diamond Member

Lifer

Diamond Member

Diamond Member

Senior member

Diamond Member