Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
A first rumor has surfaced about an Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with the new-generation MacBook Air in mid-2020 (MacBook Pro and iMac also under consideration)
  • massive AI accelerator

Source: Coreteks
 
  • Like
Reactions: vspalanki
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, the A13 is competitive with Intel chips, but the emulation tax is about 2x. So given that the A13 ~= Intel, emulated x86 programs would run at half the speed of an equivalent x86 machine. This is one of the reasons they haven't switched yet.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

Richie Rich

Senior member
Jul 28, 2019
470
229
76
My point is that many of those people have not taken into consideration the changes that would need to be made to the overall design of the A-series CPUs to make them more scalable and performant in multithreaded workloads, and that these necessary changes would almost certainly result in a significant reduction in single-threaded performance.
  • What changes does Intel make to the core between the consumer Core i9 and the server Xeon? Besides a stronger FPU and L2$ size, there is no change.
  • What changes does AMD make to the core between the consumer Ryzen 1800X and the 32-core server EPYC? No change at all, just four identical CPU dies.
  • What changes does AMD make to the core between the consumer Ryzen 3700X and the 64-core server EPYC? No change at all; the chiplets are identical, just a different IOD.
  • What changes does ARM make to the core between the consumer Cortex A76 and the 64-core server Graviton2? 1MB of L2$ instead of 0.5MB.

As a matter of fact, Graviton2 has better ST performance than the consumer A76. You should read Andrei's G2 test more carefully: "Compared to a mobile Cortex-A76 such as in the Kirin 990 (which is the best A76 implementation out there), the resulting IPC is 32% better for the Graviton2 in SPECint2006, and 10% better in SPECfp2006."



The above examples prove that you are wrong and that an Apple A13-based server CPU would scale without any problem. The sooner you admit this fact, the smaller the shock from Nuvia's performance will be.


If the A series was such a winning design for other workloads, we would have seen it in x86 land already. The fact that AMD and Intel are both pursuing similar designs tells me all I need to know about the most ideal designs for these types of workloads.
Apple cores are actually the winning design. ARM changed its strategy from maximum performance-per-area (PPA) cores (A72, 73, 75) to the much wider Austin designs (A76-78), inspired by Apple. Look at how wide a machine the A77 is: 4x ALU + 2x branch, 2x LSU + 2x store, 2x FPU. That's wider than Zen 2 and clearly inspired by Apple's super-wide monster.

x86 has the most ideal designs? That's funny. You mean the company that went from a competitive 3x ALU design to a much slower 2x ALU design and almost went bankrupt because of it? And that today says look how good we are, we made a 40% IPC jump! Yeah, a one-step-backward, two-steps-forward strategy; an interesting alternative to tick-tock. And the second company was so smart that it didn't go backwards, but kept effort minimal, resulting in 4% IPC per year improvement over the last decade, and let itself be outrun in IPC by Apple since the A10 Hurricane in 2016 and by ARM's A77 since last year. Where is the ideal design when x86 is losing ground everywhere?
 

DrMrLordX

Lifer
Apr 27, 2000
21,640
10,857
136
  • What changes does Intel make to the core between the consumer Core i9 and the server Xeon? Besides a stronger FPU and L2$ size, there is no change.
Gahhhh do your research!

Skylake-SP, Cascade Lake-SP, and Cooper Lake have numerous differences from Skylake/Kaby Lake/Coffee Lake/Comet Lake.

  • The server CPUs use a mesh interconnect; the desktop CPUs do not.
  • The server CPUs have larger L2 and smaller L3 per core compared to the desktop CPUs.
  • All of the server cores support some AVX-512 extensions, while the only consumer cores to date to support any AVX-512 instructions are Cannon Lake and Ice Lake (not counting HEDT).
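For what it's worth, here is a minimal sketch (mine, not from the thread) of what that fragmentation forces on software: code that wants AVX-512 has to probe CPUID at runtime and fall back on consumer parts. The helper names are illustrative; the intrinsics and the __builtin_cpu_supports builtin are real GCC/Clang features.

```c
/* Runtime dispatch between an AVX-512 path and a plain fallback,
 * as code targeting both server and consumer Skylake-era parts must do. */
#include <immintrin.h>
#include <stdio.h>

/* Server / Ice Lake path: sum 16 floats using one 512-bit register. */
__attribute__((target("avx512f")))
static float sum16_avx512(const float *v) {
    __m512 acc = _mm512_loadu_ps(v);   /* one 512-bit load of 16 floats */
    return _mm512_reduce_add_ps(acc);  /* horizontal reduction */
}

/* Fallback for consumer cores without AVX-512. */
static float sum16_scalar(const float *v) {
    float s = 0.0f;
    for (int i = 0; i < 16; i++) s += v[i];
    return s;
}

int main(void) {
    float v[16];
    for (int i = 0; i < 16; i++) v[i] = (float)i;

    /* GCC/Clang builtin that tests the AVX-512 Foundation CPUID bit. */
    float s = __builtin_cpu_supports("avx512f") ? sum16_avx512(v)
                                                : sum16_scalar(v);
    printf("sum = %f\n", s);
    return 0;
}
```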

The above examples prove that you are wrong and that an Apple A13-based server CPU would scale without any problem.

Graviton2 has problems scaling, so what makes you think Apple's chip won't, unless they develop a better interconnect and/or include more L3 in their design?
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,945
136
To argue that the lack of apparent changes going from Ryzen to EPYC is proof of low tradeoff requirements between the consumer and server spaces is an outstanding display of cognitive dissonance.

Except for the APU-based SKUs, Ryzen is considered a prime example of a consumer product built entirely on server silicon. Ryzen's weak points are generally acknowledged to stem from tradeoffs made to maximize EPYC efficiency. As a simple example, this forum alone is filled with people dreaming of a true monolithic desktop Ryzen.

If you want to scale, you make sacrifices.
 
Last edited:

Nothingness

Platinum Member
Jul 3, 2013
2,422
754
136
Perhaps I wasn't clear enough that I was being facetious, but I was doing so to make a point. I also don't know why you keep calling test collections other than SPEC "microbenchmarks", seemingly trying to discredit them, as if SPEC itself isn't made up of a collection of "microbenchmarks".
Oh please, come on, do yourself a favor and look at the C-ray or Dhrystone source code. These are microbenchmarks. And 7-zip is a very specific benchmark which, as far as I know, has heavy x86 tuning and much less (if any) ARM tuning.

The last one, OpenSSL, is a clear sign of two things: heavy x86 tuning and, more importantly in my mind, that AMD chips are simply the best at bignum (look at the gmplib results).
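For readers unfamiliar with the term: "bignum" means arbitrary-precision integer arithmetic, and gmplib (GMP) is the standard library for it. A minimal sketch of the kind of operation those benchmarks time, assuming libgmp is installed (build with -lgmp); the sketch is mine, not from the thread:

```c
/* Multiply two 2048-bit integers with GMP, the core operation behind
 * the gmplib benchmark results mentioned above. */
#include <gmp.h>
#include <stdio.h>

int main(void) {
    mpz_t a, b, r;
    mpz_inits(a, b, r, NULL);

    /* Two pseudo-random 2048-bit operands. */
    gmp_randstate_t st;
    gmp_randinit_default(st);
    mpz_urandomb(a, st, 2048);
    mpz_urandomb(b, st, 2048);

    mpz_mul(r, a, b);  /* the limb-by-limb multiply being benchmarked */

    gmp_printf("result has %zu bits\n", mpz_sizeinbase(r, 2));

    mpz_clears(a, b, r, NULL);
    gmp_randclear(st);
    return 0;
}
```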

Additionally, I'm not aware of any evidence of AOCC using cheats like ICC has been accused of in the past. On the contrary, AOCC is built upon current versions of LLVM and then given additional Zen optimizations (and increased compile time to try and produce the fastest code). That gives AMD data on which optimizations produce the best real world effects so that those optimizations can then be incorporated into future industry standard compilers.
And you think that AOCC being almost twice as fast as GCC in the ServeTheHome results is not a sign of heavily targeted tuning? Really? You think GCC is that bad when it's been tested on SPEC for years?

From my understanding, ARM does the same but upstreams their optimizations into GCC, so you get the optimizations earlier with that compiler than with Zen (if product timelines were equivalent).
That's an AMD failure. A big one, don't you think? And if you think some magic generic (as opposed to "let's make SPEC fast") tuning will bring a 2x speedup to AMD's SPEC rate results, you're being incredibly naive, or heavily biased.
 

Nothingness

Platinum Member
Jul 3, 2013
2,422
754
136
Has anyone done any comparison benchmarking of SVE against AVX-512?
No, because no SVE chip is publicly available; we can only try to guess performance from data sheets.

If you want to have some fun: https://github.com/fujitsu/A64FX/tree/master/doc

And it's not a question of SVE vs AVX-512, it's a question of comparing chips with each of these instruction sets ;)
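To make that contrast concrete: AVX-512 hard-codes a 512-bit vector width, while SVE is vector-length agnostic, with hardware free to implement anything from 128 to 2048 bits (A64FX implements 512). A minimal illustrative sketch of an SVE loop using the ACLE intrinsics from arm_sve.h (my sketch, not from the thread; the function name is made up):

```c
/* Vector-length-agnostic saxpy: y[i] += a * x[i]. The same binary runs
 * unchanged on any SVE width, unlike fixed-width AVX-512 code.
 * Build (cross): gcc -march=armv8-a+sve sve_axpy.c -c */
#include <arm_sve.h>
#include <stdint.h>

void saxpy(float a, const float *x, float *y, int32_t n) {
    for (int32_t i = 0; i < n; i += svcntw()) {     /* svcntw(): 32-bit lanes */
        svbool_t pg = svwhilelt_b32_s32(i, n);       /* predicate masks tail */
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_f32_x(pg, vy, vx, svdup_f32(a));  /* vy + vx * a */
        svst1_f32(pg, y + i, vy);
    }
}
```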
 
  • Like
Reactions: Tlh97 and Carfax83

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
  • What changes does Intel make to the core between the consumer Core i9 and the server Xeon? Besides a stronger FPU and L2$ size, there is no change.
  • What changes does AMD make to the core between the consumer Ryzen 1800X and the 32-core server EPYC? No change at all, just four identical CPU dies.
  • What changes does AMD make to the core between the consumer Ryzen 3700X and the 64-core server EPYC? No change at all; the chiplets are identical, just a different IOD.
  • What changes does ARM make to the core between the consumer Cortex A76 and the 64-core server Graviton2? 1MB of L2$ instead of 0.5MB.

Your problem is that you don't do any research before making grandiose statements about things you don't know. Intel's modern Xeon lineup uses a mesh interconnect, which is not the case for their consumer CPUs, which still use a ring bus. Also, Rome has humongous L3 caches, up to 256MB, which is far more than what you see in the consumer lineup.

The above examples prove that you are wrong and that an Apple A13-based server CPU would scale without any problem. The sooner you admit this fact, the smaller the shock from Nuvia's performance will be.

This is a total mischaracterization of my argument. I have never said that the Apple A13 could not scale up by adding more cores. My argument has always been that while the A13 could be scaled up, it would need to undergo significant changes to the cache hierarchy and the microarchitecture itself, which would reduce the very high single-threaded IPC you've been raving about for this entire thread.

Apple cores are actually the winning design. ARM changed its strategy from maximum performance-per-area (PPA) cores (A72, 73, 75) to the much wider Austin designs (A76-78), inspired by Apple. Look at how wide a machine the A77 is: 4x ALU + 2x branch, 2x LSU + 2x store, 2x FPU. That's wider than Zen 2 and clearly inspired by Apple's super-wide monster.

The Apple cores are a winning design only if you want strong single-threaded performance. The frequency/voltage curve is too steep to use this core (as is) as the basis for a serious multicore CPU without significant modifications. Let me refresh your memory, since you seem to keep ignoring this:

[Attached image: frequency/voltage curve chart]

Even just 8 of these things on the same die would be too much, much less 64.


x86 has the most ideal designs? That's funny. You mean the company that went from a competitive 3x ALU design to a much slower 2x ALU design and almost went bankrupt because of it? And that today says look how good we are, we made a 40% IPC jump! Yeah, a one-step-backward, two-steps-forward strategy; an interesting alternative to tick-tock. And the second company was so smart that it didn't go backwards, but kept effort minimal, resulting in 4% IPC per year improvement over the last decade, and let itself be outrun in IPC by Apple since the A10 Hurricane in 2016 and by ARM's A77 since last year. Where is the ideal design when x86 is losing ground everywhere?

Nobody cares about Apple's high IPC because it isn't used outside of smartphones and tablets. Also, if ARM CPUs ever clock as high as x86-64 CPUs, they will run into the same problems increasing IPC, for the reasons already mentioned. And the last time I checked, x86-64 dominates HPC, servers, databases, gaming PCs, laptops, and regular desktops.

ARM has lots of work to do if they want to be the dominant architecture that x86-64 has become! :cool:
 

Hitman928

Diamond Member
Apr 15, 2012
5,321
8,005
136
Oh please, come on, do yourself a favor and look at the C-ray or Dhrystone source code. These are microbenchmarks.

I agree on Dhrystone, although it can still produce useful data about the microarchitecture, but not on C-ray. C-ray may be small, but it actually does something useful, using calculations that are done all the time in "real world apps". It's one of the most widely run benchmarks there is. It basically takes a small part of what apps like POV-Ray do and isolates those calculations into an easy-to-compile (and thus widely compatible) benchmark.

And 7-zip is a very specific benchmark which, as far as I know, has heavy x86 tuning and much less (if any) ARM tuning.

Isn't this a big part of the issue for ARM, though? If you say we have to limit benchmarks to only applications that have equal optimization for x86 and ARM, despite decades of x86 dominance in the server and desktop markets, your available test suite is going to be severely limited, and also unrealistic if you actually want to switch to ARM.

The last one, OpenSSL, is a clear sign of two things: heavy x86 tuning and, more importantly in my mind, that AMD chips are simply the best at bignum (look at the gmplib results).

So no tests that EPYC inherently does well in either, got it. OpenSSL is (as the name suggests) open source, so ARM vendors are free to analyze and contribute code if they think it isn't well optimized for ARM.

And you think that AOCC being almost twice as fast as GCC in the ServeTheHome results is not a sign of heavily targeted tuning? Really? You think GCC is that bad when it's been tested on SPEC for years?

It was 67% faster, and that was an older version of GCC versus the just-released AOCC, which was built on the just-released LLVM, so I expected a very healthy uptick in performance, the same as if they had tested against the older version of LLVM. With that said, that's probably still too high for "real world" results, which I pointed out in my reply to Mark. However, using an older version of GCC, where AMD doesn't upstream optimizations, is not a fair comparison for Rome either, which was my whole point from the beginning.

That's an AMD failure. A big one, don't you think? And if you think some magic generic (as opposed to "let's make SPEC fast") tuning will bring a 2x speedup to AMD's SPEC rate results, you're being incredibly naive, or heavily biased.

AMD would probably benefit greatly from working a lot more on the upstream efforts of LLVM and especially GCC. I'm guessing that as AMD continues to improve financially, you'll see more of these types of efforts, just like they are finally getting around to fixing their vector paths in glibc.


You are also creating a pretty large strawman with your 2x speedup comment, which I not only never said, but I had already mentioned previously that comparing the published SPEC results against the Graviton2 AnandTech results showed an unrealistic advantage for Rome, but I appreciate the digs at my character. I suppose the proof will be in the pudding, as they say, and we'll have to see how many contracts the Ampere and TX3 chips end up winning.
 
Last edited:
  • Like
Reactions: Tlh97 and Carfax83

Richie Rich

Senior member
Jul 28, 2019
470
229
76
ARM has lots of work to do if they want to be the dominant architecture that x86-64 has become! :cool:
ARM devices have the majority of revenue worldwide:
  • the biggest market is smartphones (7x bigger than servers), where ARM has 100% and x86 0%.
  • gaming market: 55% of revenue goes to ARM devices
  • server market: ARM is expanding, with several new companies like Amazon, Ampere, Marvell, Nuvia, Fujitsu, HiSilicon ... how many new companies does x86 have? Zero.
  • laptops: Qualcomm has the Snapdragon 8cx ... another ARM expansion
  • desktop: HiSilicon is trying ATX boards with the 64-core Kunpeng 920 ... another ARM expansion
And where is x86 expanding? Nowhere. It is losing ground everywhere, so it's hard to speak of dominance today ;)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,570
14,520
136
ARM devices have the majority of revenue worldwide:
  • the biggest market is smartphones (7x bigger than servers), where ARM has 100% and x86 0%.
  • gaming market: 55% of revenue goes to ARM devices
  • server market: ARM is expanding, with several new companies like Amazon, Ampere, Marvell, Nuvia, Fujitsu, HiSilicon ... how many new companies does x86 have? Zero.
  • laptops: Qualcomm has the Snapdragon 8cx ... another ARM expansion
  • desktop: HiSilicon is trying ATX boards with the 64-core Kunpeng 920 ... another ARM expansion
And where is x86 expanding? Nowhere. It is losing ground everywhere, so it's hard to speak of dominance today ;)
So because ARM is used in smartphones, and that's what it was designed for, you think it will expand to all the other areas that it's NOT designed for?

Let me know what you are smoking, I want some. :)
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
ARM devices have the majority of revenue worldwide:
  • the biggest market is smartphones (7x bigger than servers), where ARM has 100% and x86 0%.
  • gaming market: 55% of revenue goes to ARM devices
  • server market: ARM is expanding, with several new companies like Amazon, Ampere, Marvell, Nuvia, Fujitsu, HiSilicon ... how many new companies does x86 have? Zero.
  • laptops: Qualcomm has the Snapdragon 8cx ... another ARM expansion
  • desktop: HiSilicon is trying ATX boards with the 64-core Kunpeng 920 ... another ARM expansion
And where is x86 expanding? Nowhere. It is losing ground everywhere, so it's hard to speak of dominance today ;)

The mobile gaming market isn't 55%, so I don't know where you got that figure. If you combine the PC and console markets, that would be over 50%; mobile gaming is likely around 45% or so.

And I'm not denying that ARM could gain a foothold in many of these industries. It's just going to be an uphill battle. ARM can change and evolve, but so can x86-64. There have been rumors for several years that Intel wants to make a cleaner x86 architecture with a lot of the legacy stuff removed. I guess we'll have to see.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
And I'm not denying that ARM could gain a foothold in many of these industries. It's just going to be an uphill battle. ARM can change and evolve, but so can x86-64. There have been rumors for several years that Intel wants to make a cleaner x86 architecture with a lot of the legacy stuff removed. I guess we'll have to see.

Snapdragon 835 (2017) to Snapdragon 855 (2019) literally doubled single-threaded performance in two years (i.e., like Zen 1 to Zen 2), while the 865 is another 30% increase in the span of a year. It's clear which architecture has been evolving faster in this timeframe.
 

name99

Senior member
Sep 11, 2010
404
303
136
In phones. You still can't get a reasonably priced A76 (or A77) SBC in the United States. Want something A76-like in a laptop? 8cx or bust; there's just no way to buy these cores in any kind of notebook or desktop form factor except the 8cx. Where are the "serious" A77 machines this year? I don't see them.

Furthermore, the A76 was widespread in 2018, and it took over a year for it to show up anywhere that wasn't a phone. The A77 is new in phones this year, so I would not expect much (or any) access to the A77 outside of mobile until NEXT year.

As for Ampere and ThunderX3 . . . still waiting!



The small cores only have one NEON unit, I think, but Graviton2 has them on every core. It's notable that Amazon (and really it was ARM with the Neoverse reference design) committed all that silicon to SIMD while Apple mostly didn't. One's a phone SoC and the other isn't. Consider the previous context of this thread as to why that's relevant.

What are you talking about?
The LARGE Apple cores have 3 NEON units each. The SMALL cores have 1 NEON unit each because, duh, they are optimized for being small cores. I've no idea why you consider this an interesting point.
Do you have some bizarre belief that Apple large cores SHARE the three NEON units, and the 4 small cores are sharing a single NEON unit?

The point that seems relevant to me is that Apple (and ARM vendors generally) have a suite of SIMD technologies that they can deploy to the extent it makes sense for their targets. Apple will presumably use many of their large cores (presumably with AMX or SVE or both [we still don't know the relationship between AMX, SVE, and the ARMv8.6 spec]) for both their desktop machines and any possible servers.

Graviton2, I believe (but am not sure), has 2 NEON units per core. ThunderX3 has 4 NEON units per core. There's obviously flexibility here.
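For concreteness, each "NEON unit" being counted here is a pipe that executes 128-bit SIMD operations like the one in this minimal sketch (mine, for illustration only); a core with three such units can issue roughly three of these per cycle, a core with one unit only one:

```c
/* One 128-bit NEON vector add: four floats at a time.
 * On AArch64, NEON is always available; build with: gcc -O2 neon_add.c */
#include <arm_neon.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, r[4];

    float32x4_t va = vld1q_f32(a);      /* load 4 floats into a 128-bit register */
    float32x4_t vb = vld1q_f32(b);
    float32x4_t vr = vaddq_f32(va, vb); /* one op, one NEON pipe, one issue slot */
    vst1q_f32(r, vr);

    printf("%f %f %f %f\n", r[0], r[1], r[2], r[3]);
    return 0;
}
```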
 

name99

Senior member
Sep 11, 2010
404
303
136
The Apple cores are a winning design only if you want strong single threaded performance.

Uh, WOT????
OF COURSE the issue is "strong single threaded performance". That is literally the entire discussion!!!
Any monkey can replicate cores and give high multi-core performance, that is just not hard. Regardless of what you think of Amazon or Marvell, they have functioning designs with 64 or 96 cores on a chip; Apple could do the same, Cavium is doing the same. Huawei, Samsung could do the same. It's JUST NOT HARD.

The hard part is the part you're insisting we ignore: strong single threaded performance!
 

SarahKerrigan

Senior member
Oct 12, 2014
373
539
136
Uh, WOT????
OF COURSE the issue is "strong single threaded performance". That is literally the entire discussion!!!
Any monkey can replicate cores and give high multi-core performance, that is just not hard. Regardless of what you think of Amazon or Marvell, they have functioning designs with 64 or 96 cores on a chip; Apple could do the same, Cavium is doing the same. Huawei, Samsung could do the same. It's JUST NOT HARD.

The hard part is the part you're insisting we ignore: strong single threaded performance!

While you are correct on this, I would remind you that Cavium is Marvell. You've talked about them a couple of times as if they're separate entities or product lines, and they are not.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Snapdragon 835 (2017) to Snapdragon 855 (2019) literally doubled single-threaded performance in two years (i.e., like Zen 1 to Zen 2), while the 865 is another 30% increase in the span of a year. It's clear which architecture has been evolving faster in this timeframe.

And do you expect that to continue?
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Uh, WOT????
OF COURSE the issue is "strong single threaded performance". That is literally the entire discussion!!!
Any monkey can replicate cores and give high multi-core performance, that is just not hard. Regardless of what you think of Amazon or Marvell, they have functioning designs with 64 or 96 cores on a chip; Apple could do the same, Cavium is doing the same. Huawei, Samsung could do the same. It's JUST NOT HARD.

The hard part is the part you're insisting we ignore: strong single threaded performance!

I'm not saying we should ignore the Apple A series' strong single-threaded performance, just that we should understand how Apple has attained it and why their microarchitecture is focused on it. As I and others have been saying, single-threaded performance is very important for mobile devices, while for desktops and the other markets where x86 has a strong footing, multithreaded performance has also become critical as more workloads are parallelized. So Intel and AMD have to strike more of a balance between single-threaded and multithreaded performance than Apple, whose CPUs go into smartphones and tablets.

Both Intel and AMD design microarchitectures that must be performant and scalable across a wide variety of platforms, everything from laptops and gaming rigs to servers, HPC, etcetera. Focusing on single-threaded performance the way Apple does would be counterproductive. So if Apple decided to make CPUs to compete with Intel and AMD in other markets, I would expect whatever CPU they came up with to look drastically different from the A series.

I would also disagree with your statement about how easy it is to replicate cores for high multicore performance. Although I'm not an engineer or industry professional, making a good, scalable uncore is apparently really hard to do, and both Intel and AMD devote significant engineering and financial resources to it.
 
  • Like
Reactions: Thunder 57

DrMrLordX

Lifer
Apr 27, 2000
21,640
10,857
136
What are you talking about?
The LARGE Apple cores have 3 NEON units each. The SMALL cores have 1 NEON unit each because, duh, they are optimized for being small cores. I've no idea why you consider this an interesting point.

It's interesting because Apple still hasn't released an A-series SoC that prioritizes NEON on every core. Amazon has done so. Fujitsu went above and beyond by adopting SVE (too bad for them that SVE is deprecated, though not all is lost for them). SIMD performance is not an absolute priority on a mobile chip. Very good of you to make my point for me. If you hadn't noticed, someone here seems to think Apple is on the cusp of taking over the server world, despite not really taking SIMD all that seriously and not having fielded a chip with an interconnect or cache structure suitable for high core counts.

Do you have some bizarre belief that Apple large cores SHARE the three NEON units, and the 4 small cores are sharing a single NEON unit?

No. Learn to read and take things in context.

The point that seems relevant to me is that Apple (and ARM vendors generally) have a suite of SIMD technologies that they can deploy to the extent it makes sense for their targets.

Their target is not the server room, which was my entire point. There's no real need to bring up ARM server performance and Apple chips together; the two have no relationship with one another, and you can't extrapolate the performance of future ARM server CPUs from Apple SoCs.

Any monkey can replicate cores and give high multi-core performance, that is just not hard.

It appears that Amazon has failed to do just that. Care to explain why?
 

Nothingness

Platinum Member
Jul 3, 2013
2,422
754
136
I agree on Dhrystone, although it can still produce useful data about the microarchitecture, but not on C-ray. C-ray may be small, but it actually does something useful, using calculations that are done all the time in "real world apps". It's one of the most widely run benchmarks there is. It basically takes a small part of what apps like POV-Ray do and isolates those calculations into an easy-to-compile (and thus widely compatible) benchmark.
Did I say it is useless? I said it is a microbenchmark.

Isn't this a big part of the issue for ARM, though? If you say we have to limit benchmarks to only applications that have equal optimization for x86 and ARM, despite decades of x86 dominance in the server and desktop markets, your available test suite is going to be severely limited, and also unrealistic if you actually want to switch to ARM.
Yes, that's an issue. But we are comparing microarchitectures. You want as fair a comparison as possible, so you try to pick software that has the same level of optimization on both architectures, and compilers that are as close as possible. As I said, it's like picking benchmarks where Intel cheated for a comparison against AMD; do that if you want, but don't expect to be trusted except by fanatics who don't turn their brains on.

So no tests that EPYC inherently does well in either, got it. OpenSSL is (as the name suggests) open source, so ARM vendors are free to analyze and contribute code if they think it isn't well optimized for ARM.
I'm tired of this, really. I have not excluded these benchmarks; go re-read what I wrote instead of putting words in my mouth. I'm just characterizing the benchmarks, and if you have difficulty understanding that, it's either me being unable to convey my point of view or you being obtuse (or biased).

It was 67% faster, and that was an older version of GCC versus the just-released AOCC, which was built on the just-released LLVM, so I expected a very healthy uptick in performance, the same as if they had tested against the older version of LLVM. With that said, that's probably still too high for "real world" results, which I pointed out in my reply to Mark. However, using an older version of GCC, where AMD doesn't upstream optimizations, is not a fair comparison for Rome either, which was my whole point from the beginning.
Agreed, but that's not a reason to set up an unrealistic AOCC vs GCC comparison on SPEC as you did. We all know that ICC is cheating and should not be trusted, so why should we even talk about AOCC on SPEC? Because we all love AMD? Because we don't believe ARM is good at anything? Because ARM fans are a pain in a place forum rules don't allow me to mention?

I think my point boils down to this: your initial comparison, no matter how many warnings you put around it, was meaningless, and it was taken as proof of an unrealistic advantage for Rome by dumb AMD fanatics.

AMD would probably benefit greatly from working a lot more on the upstream efforts of LLVM and especially GCC. I'm guessing that as AMD continues to improve financially, you'll see more of these types of efforts, just like they are finally getting around to fixing their vector paths in glibc.

Yeah, AMD finally moving years after others did is a very good thing. Too bad they couldn't realize earlier that spending $200k a year on a good SW engineer was all that was needed to pick the low-hanging fruit (and ARM is guilty of the same mistake). But if you think their AVX/AVX2 changes or other libc tweaks will give a significant speedup on SPECint, well, I won't convince you of anything.

You are also creating a pretty large strawman with your 2x speedup comment, which I not only never said, but I had already mentioned previously that comparing the published SPEC results against the Graviton2 AnandTech results showed an unrealistic advantage for Rome,
The almost 2x speedup (well, OK, 1.7x: 300 for AOCC vs 180 for GCC) is not a comment you made; it's data from ServeTheHome. You know, evidence that AOCC is so much faster that they are surely playing tricks, which makes any comparison with such results so pointless that no one should make one, even with a warning.

but I appreciate the digs at my character.
My English is not good enough to get this :( But if you think I'm trying to insult you or whatever, I'm not, and I'd be sorry if you thought so. I'm just frustrated that we are obviously in agreement on most things, except that I think your original comparison was pointless (and I miserably failed to convince you).

I suppose the proof will be in the pudding, as they say, and we'll have to see how many contracts the Ampere and TX3 chips end up winning.
Except that few will get that data (and those who do are under NDA, as I am).
 
  • Like
Reactions: Richie Rich

Nothingness

Platinum Member
Jul 3, 2013
2,422
754
136
None of which I am aware. SVE is deprecated anyway. A64FX is probably the only production/semi-production CPU that will ever use it. SVE2 hasn't gone into production at all.
Not sure where you get your info from, but SVE is not deprecated; SVE2 is an extension of SVE.

EDIT: https://developer.arm.com/tools-and...utorials/sve/sve-vs-sve2/introduction-to-sve2
SVE2 extends the SVE instruction set to enable more data-processing domains (beyond HPC and ML).
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
The mobile gaming market isn't 55%, so I don't know where you got that figure. If you combine the PC and console markets, that would be over 50%; mobile gaming is likely around 45% or so.
It's actually 60%, and the prediction is for further growth. I'm sorry I put only 55%; that might have let you think x86 has a chance, but it doesn't :)
Mobile games sparked 60% of 2019 global game revenue

I'm not saying x86 will disappear from the market by next Monday. But economic factors (like ARM-based devices taking 80% of global revenue) mean that most money goes into the development of multiple ARM designs. ARM and Apple worked hard to go from zero to hero, and they continue that hard work; the x86 companies have not. Look at the IPC gain per year (Intel 4%, AMD 7.5%, ARM 20%, Apple 18%). If those trends continue, x86 is dead in less than a decade. x86 needs to wake up and start working harder than ARM.
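To put numbers on that extrapolation, here is a minimal sketch compounding the quoted per-year IPC gains over a decade; the rates are the poster's figures, not measured data:

```c
/* Compound the claimed annual IPC gains over 10 years.
 * Build: gcc ipc_trend.c -lm */
#include <math.h>
#include <stdio.h>

int main(void) {
    const char *name[] = {"Intel", "AMD", "ARM", "Apple"};
    double rate[]      = {0.04,    0.075, 0.20,  0.18};

    for (int i = 0; i < 4; i++)
        printf("%-6s +%.1f%%/yr -> x%.2f IPC after 10 years\n",
               name[i], rate[i] * 100.0, pow(1.0 + rate[i], 10));
    return 0;
}
/* Prints roughly: Intel x1.48, AMD x2.06, ARM x6.19, Apple x5.23.
 * Compounding is why small annual differences diverge so quickly,
 * which is the whole weight the "dead in a decade" claim rests on. */
```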
 

soresu

Platinum Member
Dec 19, 2014
2,665
1,865
136
It's actually 60%, and the prediction is for further growth. I'm sorry I put only 55%; that might have let you think x86 has a chance, but it doesn't :)
Mobile games sparked 60% of 2019 global game revenue

I'm not saying x86 will disappear from the market by next Monday. But economic factors (like ARM-based devices taking 80% of global revenue) mean that most money goes into the development of multiple ARM designs. ARM and Apple worked hard to go from zero to hero, and they continue that hard work; the x86 companies have not. Look at the IPC gain per year (Intel 4%, AMD 7.5%, ARM 20%, Apple 18%). If those trends continue, x86 is dead in less than a decade. x86 needs to wake up and start working harder than ARM.
It depends on what you mean by gaming; most games on mobile platforms are either casual titles or ports of very old games.

They stopped porting GTA games at GTA3 and went no further; whether because of space issues or bandwidth, who knows, but there are certainly problems in that market.

You can't really call it competitive until a significant portion of AAA PC/console games make it to mobile within less than a year, and right now that is not even close to true. The closest comparison is the Switch, with some AAA wide-market releases, but even that is on gimped hardware from 2015 that doesn't match the XB1 for oomph, leaving graphics often less than impressive by comparison.

What is needed is for one of the big smartphone makers to make its own dedicated console with a state-of-the-art ARM SoC. Samsung would be a good choice, given that their choice of RDNA IP would allow them some code parity with the coming console generation.

I'm honestly surprised that Apple never made this a true priority; perhaps they're afraid of the PR embarrassment that might arise from matching wills with Sony or Nintendo and getting utterly annihilated in sales.