Solved! ARM Apple High-End CPU - Intel replacement

Richie Rich · Oct 14, 2019

There is a first rumor about Intel replacement in Apple products:

ARM based high-end CPU
8 cores, no SMT
IPC +30% over Cortex A77
desktop performance (Core i7/Ryzen R7) with much lower power consumption
introduction with new gen MacBook Air in mid 2020 (considering also MacBook Pro and iMac)
massive AI accelerator

Source Coreteks:

KeithP · Jun 25, 2020

Richie Rich said:
The transition to ARM is a very conservative and late move from Apple. From a technical point of view they could have moved to ARM as early as the A11 Monsoon 6xALU core three years ago.

I don't think the transition was just about an available CPU. All the groundwork for the transition had to be done in software development. Tools had to be created and changes to the existing macOS were made to start the transition and make it easier. Apple has done this before and has probably been planning this for years. I don't think it is accurate to describe it as late or conservative.

Richie Rich said:
But what surprises me a lot is MS Word and Excel running as native ARM macOS binaries while MS's own Windows on ARM is using emulation on the Surface Pro X.

Could it not be that Microsoft is working on a native version for Windows on ARM and it is more or less at the same stage of development? My understanding was that Office currently works pretty well on ARM, emulated or not. Maybe the ARM versions are being developed at the same time and will be released at the same time?

-KeithP

Richie Rich · Jun 25, 2020

KeithP said:
I don't think the transition was just about an available CPU. All the groundwork for the transition had to be done in software development. Tools had to be created and changes to the existing macOS were made to start the transition and make it easier. Apple has done this before and has probably been planning this for years. I don't think it is accurate to describe it as late or conservative.

The chief architect of those Apple cores (Gerard Williams) left Apple and founded a startup called Nuvia to build a server ARM CPU. The reason was that Apple didn't want to expand outside the mobile market. Another proof is the TOP10 PPC table: https://forums.anandtech.com/thread...ores-ipc-ppc-comparison.2580622/post-40203309
Even an old A10 Hurricane has higher PPC/IPC than any of the latest x86 CPUs, including Intel's Ice Lake and Zen2. You can predict that neither Golden Cove nor Zen3 will beat an old A11 Monsoon 6xALU monster.

Apple engineers have known they were going to have the leading CPU in PPC since 2012, when development of A10 and A11 started. Jim Keller left in 2012, coincidence? Maybe Keller knew that Apple's management had no clue about the potential of those cores and would keep them in the iPhone forever. G. Williams left 6 years later for the same reason. Keller has always had a very fine sense for management capabilities. From this point of view Apple is the best company (8 years, 2004-2012), AMD is much worse (he left after 3 years and they canceled his K12) and let's see if he returns to Intel or not.

KeithP said:
Could it not be that Microsoft is working on a native version for Windows on ARM and it is more or less at the same stage of development? My understanding was that Office currently works pretty well on ARM, emulated or not. Maybe the ARM versions are being developed at the same time and will be released at the same time?

-KeithP

Good catch. Probably yes.

Doug S · Jun 25, 2020

LightningZ71 said:
I have no expectation that MT performance will suffer greatly on an A13 or later implementation. The Thunder low power cores on A13 are significantly faster than the Tempest cores on the A12. In a situation where their performance will count, the slowdowns that the AT review showed due to the memory system being in a low power state wouldn't be an issue.

Since Apple is designing SoCs specifically for the Mac I would not assume they even have little cores.

IntelUser2000 · Jun 25, 2020

Richie Rich said:
Even an old A10 Hurricane has higher PPC/IPC than any of the latest x86 CPUs, including Intel's Ice Lake and Zen2. You can predict that neither Golden Cove nor Zen3 will beat an old A11 Monsoon 6xALU monster.

Actually it will be better than A12 if Golden Cove gets 20% improvement.

You know Windows isn't the best score for the 1065G7 right? With 1420 GB5 score it'll outperform A12, assuming 5-7% for WC and 20% for GC.

LightningZ71 · Jun 25, 2020

Doug S said:
Since Apple is designing SoCs specifically for the Mac I would not assume they even have little cores.

They did mention having a "Family" of SoCs. I suspect that the laptop chips will be big.LITTLE to keep power numbers under control. Their higher end stuff could "easily" be just 8 big cores without having to reinvent the wheel.

People like to beat on the little cores as being a performance liability. Compare their little cores to the performance uplift that SMT gives x86 cores. Unless the task is heavily memory bound, the performance improvement is only ~30% or so between 4/4 and 4/8. I dare say that the little cores on A13 are at least 30% of the performance of the big cores, so why would they be a big liability in MT? Their clocks in mobile applications are heavily down-selected to keep in the high point of the efficiency curve for the process and design rules they work with. With an expanded power budget typical of laptops vs. mobile phones, there's not as much need to be so aggressive with keeping their clocks low. It'll be worth it to sacrifice some thermal and power efficiency to push those clocks another 20%.
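
To put rough numbers on the SMT comparison above, here is a back-of-the-envelope sketch. It assumes perfect scaling across cores, a flat 30% SMT uplift, and little cores at exactly 30% of a big core (the post says "at least"). The function names are made up for illustration, and real workloads will not behave this cleanly.

```python
def smt_throughput(big_cores: int, smt_uplift: float) -> float:
    """Aggregate throughput, in big-core units, with SMT active on every core."""
    return big_cores * (1.0 + smt_uplift)


def hybrid_throughput(big_cores: int, little_cores: int, little_ratio: float) -> float:
    """Aggregate throughput, in big-core units, of a big.LITTLE layout without SMT."""
    return big_cores + little_cores * little_ratio


# 4 cores / 8 threads with the ~30% SMT gain quoted in the post.
smt_total = smt_throughput(big_cores=4, smt_uplift=0.30)

# 4 big + 4 little cores, each little core assumed to be worth 30% of a big one.
hybrid_total = hybrid_throughput(big_cores=4, little_cores=4, little_ratio=0.30)

print(f"4C/8T with SMT:   {smt_total:.2f} big-core units")
print(f"4 big + 4 little: {hybrid_total:.2f} big-core units")
```

Under those assumptions the two layouts land on the same aggregate throughput, so any little core faster than 30% of a big core would put the hybrid layout ahead in this simplified model.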

soresu · Jun 25, 2020

An interesting question would be how well Rosetta 2 efficiency compares to MS's WoW solution for x86 -> ARM64 binary translation.

In other news MS just ported OpenJDK to Windows on ARM. Link.

I'm not sure how much Windows software relies on it, but it's one thing less for devs to worry about I guess.

Doesn't Android development rely on OpenJDK?

That would certainly open up a market for Android devs on WARM.

Thala · Jun 25, 2020

soresu said:
An interesting question would be how well Rosetta 2 efficiency compares to MS's WoW solution for x86 -> ARM64 binary translation.

Hopefully close to the same performance. If Rosetta 2 is closer to QEMU, that would be extremely bad.
In other words, if they achieve close to or slightly above 50% of native in something like Geekbench, like Microsoft does, I would be impressed.

RetroZombie · Jun 25, 2020

Richie Rich said:
AMD Zen2 .... 3.6 mm2 including 0.5MB L2$

Stupid question:
Since you are measuring the core size by mm2 and not transistors, your measure is based on matisse, right?
Since renoir is more dense than matisse, what is the size of the renoir zen2 core in mm2?

Doug S · Jun 25, 2020

soresu said:
An interesting question would be how well Rosetta 2 efficiency compares to MS's WoW solution for x86 -> ARM64 binary translation

It should be MUCH better, at least for the binaries that are able to be converted at install time. Microsoft's is run time translation which even when cached between runs will never perform as well as static translation.

The unanswered question is what percentage of binaries will Rosetta 2 be able to perform static translation on. There are some obvious cases where it wouldn't work, though perhaps that makes those early candidates for an ARM port. For instance, anything that does run time code generation is only going to work with a JIT. However, the best examples of that are browsers (and obviously Safari will be native, but we'll have to see how long it takes for Firefox & Chrome) and Java (not sure if that will be native at launch but I just heard today about a preview version of Java for Windows/ARM)
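
As a toy illustration of the difference between install-time (static) and run-time (dynamic) translation described above, the sketch below counts how many translations happen while the program is running. The opcode mapping, the block layout and the function names are all hypothetical; this is not how Rosetta 2 or Microsoft's emulation layer is actually implemented.

```python
# Hypothetical guest-to-host opcode mapping, for illustration only.
GUEST_TO_HOST = {"mov": "MOV", "add": "ADD", "jmp": "B"}


def translate_block(block):
    """Translate one basic block of guest opcodes into host opcodes."""
    return [GUEST_TO_HOST[op] for op in block]


def run_static(program, trace):
    """Translate every block up front, so nothing is translated during execution."""
    translated = [translate_block(block) for block in program]  # install-time work
    runtime_translations = 0
    executed = [translated[index] for index in trace]
    return executed, runtime_translations


def run_dynamic(program, trace):
    """Translate each block the first time it executes and cache the result."""
    cache = {}
    runtime_translations = 0
    executed = []
    for index in trace:
        if index not in cache:
            cache[index] = translate_block(program[index])
            runtime_translations += 1
        executed.append(cache[index])
    return executed, runtime_translations


program = [["mov", "add"], ["add", "jmp"], ["mov", "jmp"]]
trace = [0, 1, 0, 1, 2]  # order in which the blocks are executed

static_out, static_cost = run_static(program, trace)
dynamic_out, dynamic_cost = run_dynamic(program, trace)
print("run-time translations (static): ", static_cost)
print("run-time translations (dynamic):", dynamic_cost)
```

In this toy model both paths produce identical host code; the only difference is when the translation work is paid for. Code that generates new blocks at run time has nothing to translate at install time, which is the JIT case mentioned above.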

soresu · Jun 25, 2020

Doug S said:
It should be MUCH better, at least for the binaries that are able to be converted at install time. Microsoft's is run time translation which even when cached between runs will never perform as well as static translation.

Static translation is not currently possible for pre compiled assembly code - at least not without a far more lengthy and non automated procedure than Apple can offer customers en masse.

This is not some magic secret sauce that Apple is using, it is simply for uncompiled source code or likely more compact program 'byte code' in much the same way as Android does its AOT compilation for Java and Kotlin apps.

If they could do it at all, then everything would do this and dynamic recompilation (binary translation) would not be necessary at all.

Clearly it is still necessary and you have the wrong end of the stick.

No high end DCC app that requires complex computation and/or a very large code base will be available in an uncompiled form through any app store, no matter how secure said app store claims to be, this would open up those companies to theft of valuable codebases which would be unacceptable to any large company.

This rules out Adobe CC and Autodesk software at the very least, not to mention most games.

Doug S · Jun 26, 2020

soresu said:
Static translation is not currently possible for pre compiled assembly code - at least not without a far more lengthy and non automated procedure than Apple can offer customers en masse.

This is not some magic secret sauce that Apple is using, it is simply for uncompiled source code or likely more compact program 'byte code' in much the same way as Android does its AOT compilation for Java and Kotlin apps.

If they could do it at all, then everything would do this and dynamic recompilation (binary translation) would not be necessary at all.

Clearly it is still necessary and you have the wrong end of the stick.

No high end DCC app that requires complex computation and/or a very large code base will be available in an uncompiled form through any app store, no matter how secure said app store claims to be, this would open up those companies to theft of valuable codebases which would be unacceptable to any large company.

This rules out Adobe CC and Autodesk software at the very least, not to mention most games.

Well Apple said they are doing this at WWDC, maybe you should tell them what they are doing isn't possible.

beginner99 · Jun 26, 2020

marcUK2 said:
.personally I think that now there is huge competition between Intel and AMD...when Intel sorts out its problems, which I expect to be fairly soon...Apple have probably shot themselves in the foot

While I see where you are coming from, I think being vertically integrated, as with consoles, is also a huge advantage, but it requires you to sell a lot of hardware. You can extract more out of the hardware because you hardly have to worry about compatibility across different configurations.
Apple controls the OS and all its APIs plus the SoC. I'm sure they will load the SoC full of "accelerators", i.e. dedicated hardware blocks which the corresponding API will call transparently to developers. Dedicated hardware is just so much more efficient and faster. I mean they could do the same as the consoles will with some hardware for IO. Lots of possibilities.
The main thing still missing in the PC space is AMD's dream of HSA, which they seem to have buried. If you could just program your computations and let the OS/SoC decide which hardware they run on (the best suited for the task), that would be a tremendous win and make iGPUs much, much more usable. Currently they are a waste of space; Renoir is the only reasonable one really. Tiger Lake is for sure overshooting on the iGPU side.

marcUK2 · Jun 26, 2020

Richie Rich said:
billion
I agree with you. However, the Apple A13 is a monstrous core and a special kind. It has 128+128 kB L1$, which is 50% of Zen2's L2 cache (512 kB L2). A13's L2 is 8MB shared, which is 50% of Zen2's L3 per CCX (16MB). So it's very hard to compare such a different monster core with traditional x86 or Cortex cores because its L2 acts like an L3.

But anyway, at 4.53 mm2 the A13 core is 1.26x larger than Zen2's 3.6 mm2. However, A13 still produces 1.84x higher PPC, so 1.84/1.26 results in 1.46x higher PPA iso-clock. Don't you think that's still a massive advantage?

How can you get 2.28 PPA for A78 when the core area is more than 3x smaller? Even with identical PPC you have to get PPA > 3x. And when A78 is approx. 15% faster at iso-clock than Zen2 you have to get an even higher PPA, don't you think?

Not sure I agree with this way of measuring... I'd argue that what matters is total performance for a given number of transistors for the whole package (not just the core size) at a specific operating frequency on an equal manufacturing node.
Then I believe the ARM chips are not so much greater than x86 ones.

For example, the A12Z has 10 billion transistors, while a Zen chiplet has 3.9 billion and the IO die has 2.1 billion, and a low-end RX560 GPU has 3 billion

...Which I suspect absolutely destroys an A12Z in both CPU and GPU raw performance, while the A-series chip will obviously destroy it in efficiency

(HOPEFULLY Keller returns to AMD IMO)
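
For reference, the performance-per-area (PPA) figure in the quoted post can be reproduced with the short sketch below. The core areas and the 1.84x PPC ratio are taken from that post as given; the variable names are illustrative, and the calculation deliberately looks at core area only, which is exactly the choice being disputed here.

```python
# Figures as quoted in the post; none of them are independently verified here.
a13_core_area_mm2 = 4.53
zen2_core_area_mm2 = 3.6      # includes 0.5 MB of L2, per the quoted post
a13_ppc_vs_zen2 = 1.84        # claimed performance-per-clock ratio

# How much larger the A13 core is than the Zen 2 core.
area_ratio = a13_core_area_mm2 / zen2_core_area_mm2

# Performance per area at equal clock = PPC ratio divided by area ratio.
ppa_ratio = a13_ppc_vs_zen2 / area_ratio

print(f"area ratio: {area_ratio:.2f}x")
print(f"PPA ratio:  {ppa_ratio:.2f}x")
```

A whole-package metric like the one proposed in this reply would divide by total transistor count or die area instead, which changes the denominator but not the structure of the calculation.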

Richie Rich · Jun 26, 2020

RetroZombie said:
Stupid question:
Since you are measuring the core size by mm2 and not transistors, your measure is based on matisse, right?
Since renoir is more dense than matisse, what is the size of the renoir zen2 core in mm2?

It's a good question, not a stupid one.
But how dense is Renoir? Give me the number. Based on the pretty high clocks Renoir achieves, I doubt the density is significantly higher. Even if Renoir is twice as dense, that would shrink 3.6 -> 1.8 mm2, and this is still 35% more area than A78, while Renoir has lower PPC/IPC than A78 (and probably lower than Matisse due to its smaller L3$). Renoir is losing on both parameters anyway.

soresu · Jun 26, 2020

marcUK2 said:
For example, the A12Z has 10 billion transistors, while a Zen chiplet has 3.9 billion and the IO die has 2.1 billion, and a low-end RX560 GPU has 3 billion

...Which I suspect absolutely destroys an A12Z in both CPU and GPU raw performance, while the A-series chip will obviously destroy it in efficiency

Apples and oranges.

CPUs and GPUs are optimised for different workloads, that's why we have APUs with both instead of one 'jack of all trades' compute uArch (which is not to say that such a uArch is impossible, just out of reach at present).

While GPUs have become more capable of general compute over time, they are still massively parallel processors and do not deal well with the mostly serial compute tasks that are still constrained to single threaded, or very lightly threaded, operation on CPUs.

It would be interesting to see whether in the future ML could be used to design a unifying compute uArch that handles both CPU and GPU loads performantly and efficiently - though of course with GPUs now encompassing ML/tensor loads as well as raster and RT fixed function circuits, I would imagine that such a uArch would be cumbersome.

insertcarehere · Jun 26, 2020

marcUK2 said:
Not sure I agree with this way of measuring... I'd argue that what matters is total performance for a given number of transistors for the whole package (not just the core size) at a specific operating frequency on an equal manufacturing node.
Then I believe the ARM chips are not so much greater than x86 ones.

For example, the A12Z has 10 billion transistors, while a Zen chiplet has 3.9 billion and the IO die has 2.1 billion, and a low-end RX560 GPU has 3 billion

...Which I suspect absolutely destroys an A12Z in both CPU and GPU raw performance, while the A-series chip will obviously destroy it in efficiency

(HOPEFULLY Keller returns to AMD IMO)

A mobile SoC has to integrate a lot more fixed functions (ISP, NPU, wifi..etc) which are normally taken care of by the motherboard in any desktop CPU/GPU.

RetroZombie · Jun 26, 2020

Richie Rich said:
But how dense is Renoir? Give me the number.

I don't know, I only know it's more dense.

I only found this link with all the nodes densities, but not by chip parts:
7nm vs 10nm vs 14nm: Fabrication Process

Mopetar · Jun 26, 2020

Since Apple is moving to their own SoC it stands to reason they'd need to design their own boards as well.

I don't think it hurts them to have the SoC take care of a lot of those functions and does simplify the board design and cut down on the number of suppliers Apple would have to deal with.

I think some of it ultimately depends on how their chip designs evolve over time. It isn't too difficult to imagine the chips they have now being good enough to power a MacBook Air, but the real question is how they build for the pro segment where 8+ cores are necessary. Using multiple SoCs creates a lot of redundancy for the fixed function silicon, but it does let them harvest defective dies that have otherwise functional cores. Adopting AMD's chiplet strategy is also a possibility, but their product range seems too wide to have a single chiplet to serve them all.

Doug S · Jun 26, 2020

Mopetar said:
Since Apple is moving to their own SoC it stands to reason they'd need to design their own boards as well.

I don't think it hurts them to have the SoC take care of a lot of those functions and does simplify the board design and cut down on the number of suppliers Apple would have to deal with.

Unlike Intel and AMD, Apple doesn't have to serve a large marketplace of many OEMs with widely disparate needs and product types. They know exactly what products their chips will go in and can decide what works best as far as what goes in the SoC and what goes on the board. A lot of that probably comes down to "what do we design ourselves vs what do we buy?"

They can't integrate wifi on the SoC for example because they buy chips from Broadcom. Since they are working towards having their own cellular modem it makes sense they may do the same for wifi someday (though it sounds like they think Broadcom is a lot easier to deal with than Qualcomm so there's less financial incentive to make this move) Another possibility would be to strike a deal to license the IP from Broadcom. Every new process generation ups the transistor budget by so much that greater integration is to be expected. You can only usefully use so many cores after all, especially on the consumer level products.

marcUK2 · Jun 26, 2020

insertcarehere said:
A mobile SoC has to integrate a lot more fixed functions (ISP, NPU, wifi..etc) which are normally taken care of by the motherboard in any desktop CPU/GPU.

Which is why it's not such a great idea to be putting these things in iMacs or Power Macs. I doubt the MB logic really takes up that much space on a die relative to cache and the GPU; hell, even the cores are minute structures really.

I just looked up Renoir's transistor count and it's 10 billion as well, with 8 Zen2 cores, a Vega GPU, and SoC logic. And it runs near 4.5GHz, as opposed to 2.5GHz for the Axx on the same node size, but given the choice I wouldn't be pleased buying a desktop with a mobile chip.

Ateotd... whether we're comparing today's 2020 architectures or 5/4/3nm ones of 2023+, given the same number of transistors on the same process, you're going to get roughly the same die size, performance and efficiency, depending on what you prioritize in your specific design, regardless of ISA.

Apple may pull it off, may not, but it's guaranteed that AMD and Intel will always be pushing each other fiercely, which leaves Apple potentially in a bind if the G4/G5 scenario rears its head. I can't see much to gain really, but lots to lose.

Most consumers don't care about the ISA or even know what it is, and no doubt any underlying architecture is going to be performant enough for most users, but it would be a shame to see high-end desktop Macs performing as badly as a PC 1/4 their price for years again, while content creators are ripping ahead on 5GHz 3nm 256-thread Threadripper Zen 4 cores.

Will be nice to see what arm cores are capable of though.

teejee · Jun 26, 2020

Mopetar said:
Since Apple is moving to their own SoC it stands to reason they'd need to design their own boards as well.

I don't think it hurts them to have the SoC take care of a lot of those functions and does simplify the board design and cut down on the number of suppliers Apple would have to deal with.

I think some of it ultimately depends on how their chip designs evolve over time. It isn't too difficult to imagine the chips they have now being good enough to power a MacBook Air, but the real question is how they build for the pro segment where 8+ cores are necessary. Using multiple SoCs creates a lot of redundancy for the fixed function silicon, but it does let them harvest defective dies that have otherwise functional cores. Adopting AMD's chiplet strategy is also a possibility, but their product range seems too wide to have a single chiplet to serve them all.

I think many MacBooks will use the same SoC as the iPad Pro (A14X for the first ones), so I expect the board to be very similar to the iPad Pro's. These laptops will likely be fanless.

This is the big thing with using their own SoC: they will leave most of the PC legacy behind and use a modern iPad platform extended to a laptop.

They might have a faster SoC as well for more high-end MacBooks; those will require a fan, but otherwise use a similar platform, just with a bit more I/O and stuff.

insertcarehere · Jun 26, 2020

marcUK2 said:
Which is why it's not such a great idea to be putting these things in iMacs or Power Macs. I doubt the MB logic really takes up that much space on a die relative to cache and the GPU; hell, even the cores are minute structures really.

I just looked up Renoir's transistor count and it's 10 billion as well, with 8 Zen2 cores, a Vega GPU, and SoC logic. And it runs near 4.5GHz, as opposed to 2.5GHz for the Axx on the same node size, but given the choice I wouldn't be pleased buying a desktop with a mobile chip.

Ateotd... whether we're comparing today's 2020 architectures or 5/4/3nm ones of 2023+, given the same number of transistors on the same process, you're going to get roughly the same die size, performance and efficiency, depending on what you prioritize in your specific design, regardless of ISA.

Apple may pull it off, may not, but it's guaranteed that AMD and Intel will always be pushing each other fiercely, which leaves Apple potentially in a bind if the G4/G5 scenario rears its head. I can't see much to gain really, but lots to lose.

Will be nice to see what arm cores are capable of though.

A12Z/X on first generation 7nm was ~25% smaller in terms of die size than Renoir on a more refined node, regardless of transistor count. That doesn't mean the extra size of Renoir won't be well spent on a desktop application, but it is far from an apples to apples comparison.

OTOH it's not even funny how far ahead ARM cores are in the low-power space. An x86 analogue that matches the A13 and Snapdragon 865 in performance & power doesn't exist; indeed, if the recent benchmarks of Intel's Lakefield are any indication, x86 has a long way to go to make its ARM counterparts worry.

There is no guarantee that Intel & AMD will be able to keep up their cadence of improvements either. Zen 2 & Ice Lake are not exactly impressive architectures when compared to Apple's in-house work, and the next few years look like small refinements to those architectures. That's not a high bar for Apple's team to clear to stay ahead and be competitive for at least the next 5 years or so.

Richie Rich · Jun 27, 2020

IntelUser2000 said:
Actually it will be better than A12 if Golden Cove gets 20% improvement.

You know Windows isn't the best score for the 1065G7 right? With 1420 GB5 score it'll outperform A12, assuming 5-7% for WC and 20% for GC.

I talked about PPC/IPC, not absolute performance:

A13 has 82% higher PPC/IPC than 9900K
A12 has 59%
A11 has 41%
Ice Lake has 16%

If Golden Cove is a 20% IPC jump, then 1.16 * 1.2 = 1.39 ..... just 39% and below A11
If you include Willow Cove's 5% IPC jump, 1.39 * 1.05 = 1.46 ...... 46% is above A11 but way below A12.

And this is the 2022 Golden Cove core compared to Apple's 2017/2018 A11/A12 cores. So it's really a 4-5 year technological lag for Intel. That's really horrible. And AMD is even further behind because Zen2 has lower IPC than Ice Lake.
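
The projection above can be checked with a few lines. This is only a sketch of the post's own arithmetic: the PPC/IPC ratios relative to the 9900K are the post's claims, and the 5% (Willow Cove) and 20% (Golden Cove) gains are assumptions carried over from it, not measured values.

```python
# PPC/IPC relative to the 9900K (= 1.00), as claimed in the post.
ipc_vs_9900k = {"A13": 1.82, "A12": 1.59, "A11": 1.41, "Ice Lake": 1.16}

WILLOW_COVE_GAIN = 1.05  # assumed generational gain
GOLDEN_COVE_GAIN = 1.20  # assumed generational gain

# Golden Cove applied directly on top of Ice Lake.
golden_cove_only = ipc_vs_9900k["Ice Lake"] * GOLDEN_COVE_GAIN

# Golden Cove with the Willow Cove step included as well.
golden_cove_full = golden_cove_only * WILLOW_COVE_GAIN

print(f"Golden Cove only:          {golden_cove_only:.2f}x")
print(f"Willow Cove + Golden Cove: {golden_cove_full:.2f}x")
print("above A11:", golden_cove_full > ipc_vs_9900k["A11"])
print("above A12:", golden_cove_full > ipc_vs_9900k["A12"])
```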

marcUK2 said:
Not sure I agree with this way of measuring... I'd argue that what matters is total performance for a given number of transistors for the whole package (not just the core size) at a specific operating frequency on an equal manufacturing node.
Then I believe the ARM chips are not so much greater than x86 ones.

For example, the A12Z has 10 billion transistors, while a Zen chiplet has 3.9 billion and the IO die has 2.1 billion, and a low-end RX560 GPU has 3 billion

...Which I suspect absolutely destroys an A12Z in both CPU and GPU raw performance, while the A-series chip will obviously destroy it in efficiency

(HOPEFULLY Keller returns to AMD IMO)

That's actually a good idea. If A12X is 122 mm2 with 10 billion transistors, then Zen2+RX560 with 9 billion should be 110 mm2. Why did AMD decide to create a larger 156 mm2 Renoir with a much slower GPU? Are they insane?
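
The 110 mm2 estimate follows from scaling die area linearly with transistor count. The sketch below reproduces it under the strong assumption that every block would be built at the A12X's average transistor density; the transistor counts and the 122 mm2 figure are the ones quoted in this thread, and the variable names are illustrative.

```python
a12x_area_mm2 = 122.0
a12x_transistors_bn = 10.0

# Zen 2 chiplet + IO die + RX560, in billions of transistors, as quoted above.
combined_transistors_bn = 3.9 + 2.1 + 3.0

# Average density of the A12X in billions of transistors per mm2.
a12x_density = a12x_transistors_bn / a12x_area_mm2

# Area the combined design would need at that same average density.
estimated_area_mm2 = combined_transistors_bn / a12x_density

print(f"combined transistors: {combined_transistors_bn:.1f} billion")
print(f"estimated area:       {estimated_area_mm2:.1f} mm2")
```

Real dies mix logic, cache, analog and I/O at very different densities, and high-clock designs tend to use less dense layouts, so this is an upper bound on how far the comparison can be pushed rather than a prediction.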

insertcarehere said:
There is no guarantee that Intel & AMD will be able to keep up their cadence of improvements either. Zen 2 & Ice Lake are not exactly impressive architectures when compared to Apple's in-house work, and the next few years look like small refinements to those architectures. That's not a high bar for Apple's team to clear to stay ahead and be competitive for at least the next 5 years or so.

Cadence:

Intel - 2015 Skylake, 2019 Icelake, 2022 Golden Cove... 7 years for two IPC jumps... that's 3.5 year cadence
AMD - 2017, 2019, 2020...that's 1.5 year cadence
Apple - every year new Big core and Little core ... 0.5 year cadence
ARM Cortex cores - new big core yearly in the past (A77), since this year two big cores (A78 + X1), and next year three new cores for ARMv9 (A79?, X2 and little core A58?)...that's even more massive cadence than Apple
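
The cadence figures in this list come from dividing a time span by the number of new designs in it. A minimal sketch of that calculation follows; the helper name is made up, and the years used for Apple are placeholders for "any one-year span" since the post gives none.

```python
def cadence_years(first_year: int, last_year: int, new_designs: int) -> float:
    """Average number of years between new designs over the given span."""
    return (last_year - first_year) / new_designs


# Intel: Skylake (2015) to Golden Cove (2022), two IPC jumps.
intel = cadence_years(2015, 2022, new_designs=2)

# AMD: 2017 to 2020, two new generations after the first.
amd = cadence_years(2017, 2020, new_designs=2)

# Apple: one year, counting the big and the little core as separate designs.
apple = cadence_years(2019, 2020, new_designs=2)

for vendor, value in (("Intel", intel), ("AMD", amd), ("Apple", apple)):
    print(f"{vendor}: {value} years per new design")
```

Note that the Apple figure counts two core types per year while the Intel and AMD figures count only big-core generations, so the three numbers are not measured on the same basis.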

Anyone who still thinks x86 isn't done? Apple is abandoning the sinking x86 ship. Same way they were right that PowerPC was going to die.

And the bombshell would be if A14 supports SVE2 this year. A13 already supports AMX instructions which are part of SVE2. A powerful 2048-bit SIMD FPU would be the last nail in the x86 coffin.

witeken · Jun 27, 2020

Richie Rich said:
I talked about PPC/IPC, not absolute performance:

A13 has 82% higher PPC/IPC than 9900K

A12 has 59%

A11 has 41%

Ice Lake has 16%

If Golden Cove is a 20% IPC jump, then 1.16 * 1.2 = 1.39 ..... just 39% and below A11
If you include Willow Cove's 5% IPC jump, 1.39 * 1.05 = 1.46 ...... 46% is above A11 but way below A12.

And this is the 2022 Golden Cove core compared to Apple's 2017/2018 A11/A12 cores. So it's really a 4-5 year technological lag for Intel. That's really horrible. And AMD is even further behind because Zen2 has lower IPC than Ice Lake.

That's actually a good idea. If A12X is 122 mm2 with 10 billion transistors, then Zen2+RX560 with 9 billion should be 110 mm2. Why did AMD decide to create a larger 156 mm2 Renoir with a much slower GPU? Are they insane?

Cadence:

Intel - 2015 Skylake, 2019 Icelake, 2022 Golden Cove... 7 years for two IPC jumps... that's 3.5 year cadence

AMD - 2017, 2019, 2020...that's 1.5 year cadence

Apple - every year new Big core and Little core ... 0.5 year cadence

ARM Cortex cores - new big core yearly in the past (A77), since this year two big cores (A78 + X1), and next year three new cores for ARMv9 (A79?, X2 and little core A58?)...that's even more massive cadence than Apple

Anyone who still thinks x86 isn't done? Apple is abandoning the sinking x86 ship. Same way they were right that PowerPC was going to die.

And the bombshell would be if A14 supports SVE2 this year. A13 already supports AMX instructions which are part of SVE2. A powerful 2048-bit SIMD FPU would be the last nail in the x86 coffin.

I'll just leave a few remarks...
-I think it was readily discussed a while ago already that your 82% number is grossly exaggerated
-Intel was on a 1-year cadence with Tick-Tock
-Intel is again on a 1-year cadence now that the delays are solved: Palm, Sunny, Willow, Golden, Ocean Cove, etc. Atom is on a ~2-year cadence but with bigger ~30% IPC jumps... so does that give Intel a 0.66-year cadence by your logic? What if you include Intel's GPU team? Movidius, Habana, Mobileye, 3D NAND, 3D XPoint,...?
-iPhone also is at 1-year cadence...
-Using Intel's 10nm delays to make any general point about Intel is pointless. Those delays weren't planned. Ice Lake is 2017 IP, for example...
-What Ice Lake maybe lacks in IPC, it makes up for in SIMD, clock speed, etc.
-Jim Keller said Intel is working on something substantially bigger than Sunny Cove
-Have you ever seen a 5GHz Arm/Apple chip? I'll gladly take a 5GHz Golden Cove with ~15% or so lower IPC than a 3GHz Apple chip with negligible IPC lead (performance = IPC x frequency)
-Intel's goal is performance from 4.5W to 250W, not IPC
-Arm supporting an AVX-2048 equivalent doesn't mean vendors will implement it for their phone chips...
-Intel has patents for going beyond AVX-512
-Xeons contain two AVX-512 units
-Intel just announced AMX for SPR... Intel isn't stopping SIMD/vector/matrix performance improvements just like IPC improvements, who knew?
-More in general, none of your remarks really concern Arm vs. x86. Apple's micro-architectures could just as well be implemented on the x86 architecture...
-At best you could say Skylake is a sinking ship (agreed), but luckily Intel has a slew of new architectures in the pipeline at a yearly cadence

Richie Rich · Jun 27, 2020

witeken said:
I'll just leave a few remarks...
-I think it was readily discussed a while ago already that your 82% number is grossly exaggerated
-Intel was on a 1-year cadence with Tick-Tock

Tick-tock means a new uarch every 2 years, so the cadence is half as fast as you suggest. Apple releases new cores EVERY year, big and little. Both with massive IPC gains of over 15% every year. And with frequency gains as well.

Intel's IPC evolution is weak.
Intel's frequency evolution has stopped and is now decreasing (Sandy Bridge was capable of a 5GHz OC, Ice Lake has problems achieving 4.2 GHz). As a matter of fact Ice Lake is slower than Coffee Lake in the 9900K. Therefore Apple A13@2.6 is beating Ryzen@4.6 and IceLake@4.2. In an iPhone with a 5W TDP limit.

And TSMC 5nm offers an 18% frequency gain. So with no pipeline/stage tweaks A13/A14 can run at 3.1 GHz easily. Ampere Altra with Cortex A76/N1 runs at 3.0 GHz / 3.3 GHz turbo on 7nm. Cortex X1 on 5nm can therefore run at 3.5 GHz (3 * 1.18) in laptops and 3.9 GHz (3.3 * 1.18) in top desktop versions.

At TSMC 3nm ARM chips will reach 4.1 GHz (3.5 * 1.18) and top versions will reach 4.6 GHz (3.9 * 1.18), because they are not thermally bound like big hungry x86 cores. At 3nm x86 cores will probably need to use lower clocks around 4 GHz. Do you see the massive problem x86 is facing? Lower clocks and half the IPC/PPC is the reason why Apple is leaving the sinking x86 ship.
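
The clock projections above apply the same 18% gain once per node step and round to 0.1 GHz each time. A small sketch of that chain follows; it assumes, as the post does, that the 5nm figure also holds for 3nm and that clocks scale with the process alone, neither of which is established. The names are illustrative.

```python
NODE_FREQ_GAIN = 1.18  # claimed gain for TSMC 5nm, reused for 3nm as in the post


def next_node(freq_ghz: float) -> float:
    """Scale a clock by one node step and round to 0.1 GHz, as the post does."""
    return round(freq_ghz * NODE_FREQ_GAIN, 1)


# 7nm starting points quoted in the post.
a13_5nm = next_node(2.6)            # Apple A13 at 2.6 GHz
laptop_5nm = next_node(3.0)         # Ampere Altra base clock
desktop_5nm = next_node(3.3)        # Ampere Altra turbo clock

# Second step, from 5nm to 3nm.
laptop_3nm = next_node(laptop_5nm)
desktop_3nm = next_node(desktop_5nm)

print("5nm:", a13_5nm, laptop_5nm, desktop_5nm)
print("3nm:", laptop_3nm, desktop_3nm)
```

Because the rounding happens at every step, the 3nm figures differ slightly from compounding 1.18 twice on the unrounded 7nm clocks.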

witeken said:
-Jim Keller said Intel is working on something substantially bigger than Sunny Cove

I agree Intel's Golden Cove could be a pretty nasty surprise for AMD. Or whatever new uarch development Keller initiated.

witeken said:
-Have you ever seen a 5GHz Arm/Apple chip? I'll gladly take a 5GHz Golden Cove with ~15% or so lower IPC than a 3GHz Apple chip with negligible IPC lead (performance = IPC x frequency)
-Intel's goal is performance from 4.5W to 250W, not IPC

IMHO we'll never see Golden Cove @ 5GHz. Willow Cove in Tiger Lake reaches 4.7 GHz, so it can maybe touch 5GHz in finely binned desktop chips. However, Golden Cove with a 20% IPC jump while keeping 5GHz? I don't think so. I expect Golden Cove to reach 4.5 - 4.7 GHz max.

Golden Cove with only 15% lower IPC? Are you kidding? A 20% IPC jump above Tiger Lake is Ice Lake's GB PPC score of 321 pts/GHz * 1.05 * 1.2 = 404 pts/GHz. Apple A13 from 2019 has 502 pts/GHz, and that's a massive 24% higher PPC/IPC than Golden Cove. And this year comes A14, next year A15. So Golden Cove will have to face A15 on N5P or A16 on 3nm. Apple consistently gains 15% IPC every year. That's a massive IPC improvement of 32% for A15 in 2021 and an insane 52% for A16 in 2022. The IPC gap will probably increase. So the projected GB PPC score of the Apple A16 is 763 pts/GHz - that's 89% higher IPC than Golden Cove, almost double and a bigger gap than now.
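
The chain of numbers above can be reproduced as follows. This is a sketch of the post's own projection: the Geekbench points-per-GHz scores and the yearly 15% Apple gain are the post's figures and assumptions, and nothing here says the future cores will actually behave that way.

```python
ICE_LAKE_PPC = 321.0   # GB points per GHz, as quoted in the post
A13_PPC = 502.0        # GB points per GHz, as quoted in the post
APPLE_YEARLY_GAIN = 1.15

# Projected Golden Cove: Ice Lake plus Willow Cove (+5%) plus Golden Cove (+20%).
golden_cove_ppc = ICE_LAKE_PPC * 1.05 * 1.20

# A13's lead over that projection.
a13_lead = A13_PPC / golden_cove_ppc - 1.0

# Compounded Apple gains over A13: two years for A15, three years for A16.
a15_gain = APPLE_YEARLY_GAIN ** 2 - 1.0
a16_gain = APPLE_YEARLY_GAIN ** 3 - 1.0

a16_ppc = A13_PPC * APPLE_YEARLY_GAIN ** 3
a16_lead = a16_ppc / golden_cove_ppc - 1.0

print(f"Golden Cove (projected): {golden_cove_ppc:.0f} pts/GHz")
print(f"A13 lead:                {a13_lead:.0%}")
print(f"A15 / A16 gain over A13: {a15_gain:.0%} / {a16_gain:.0%}")
print(f"A16 (projected):         {a16_ppc:.0f} pts/GHz, lead {a16_lead:.0%}")
```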

witeken said:
-Intel has patents for going beyond AVX-512
-Xeons contain two AVX-512 units
-Intel just announced AMX for SPR... Intel isn't stopping SIMD/vector/matrix performance improvements just like IPC improvements, who knew?
-More in general, none of your remarks really concern Arm vs. x86. Apple's micro-architectures could just as well be implemented on the x86 architecture...
-At best you could say Skylake is a sinking ship (agreed), but luckily Intel has a slew of new architectures in the pipeline at a yearly cadence

I agree, SIMD units and AVX512 are a current advantage of x86 designs over 128-bit NEON. But that's over with the upcoming ARMv9 2048-bit SVE2. Even though nobody has announced 2048-bit SIMD units yet, it's a nice open highway for future ARM designs and supports competition among ARM vendors. x86 is locked to others, with poor competition and nothing outlined for the future. Intel is secretly cooking its own AMX and AMD will need 4 more years to develop FPUs supporting it (2025). Great. x86 will have broad AMX support in 2025 at best. ARM has AMX support in SVE today (Fujitsu A64FX in the Fugaku supercomputer) and for everybody, including smartphones, Matterhorn is coming next year. And Apple's A14 could maybe have SVE2 support this year.

Does it look horrible for x86 in my eyes only?
