Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
Here is the first rumor about an Intel replacement in Apple products:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with the new-gen MacBook Air in mid-2020 (MacBook Pro and iMac also under consideration)
  • massive AI accelerator

Source Coreteks:
 
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, A13 is competitive against Intel chips but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't yet switched.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

marcUK2

Member
Sep 23, 2019
74
39
61
...personally I think that now there is huge competition between Intel and AMD...when Intel sorts out its problems, which I expect to be fairly soon...Apple have probably shot themselves in the foot, and they'll be retransitioning to x86 in a few years, simply because they can't keep up with the competition.

I suspect the G4 debacle all over again, and Apple also think they can pull off GPUs as good as the AMD / Nvidia competition...haha, no, I don't think so.

I don't use any Apple products these days, but I did.....meh!....but I like ARM from a modern design standpoint, so hopefully they pull it off, just because I would like to see what is possible with Threadripper-class ARM-based silicon....but IBM Power for Macs didn't really turn out so good either.

Good luck!

Interestingly, if you compare die sizes and transistor counts in the current gen of chips from Apple, AMD and Intel, you can easily deduce that Apple's A chips are not so much better than the x86 competition.
 

Doug S

Platinum Member
Feb 8, 2020
2,268
3,519
136
Jim Keller surprise resignation from Intel
Apple dumps Intel as chip supplier
Apple needs to develop Axx class processors to compete with 2022 Intel 7nm Xeons and Gen 5 Ryzen...

Who you gonna call?

Jim Keller to Apple in 3..2..1...

The design of the processors Apple will be using in 2022 is already well underway, anyone added to the team now wouldn't affect it much since the basic design and floorplan are already complete, it is just the detail work and simulation/testing that remain. It will be taped out in not much more than a year from now.

Intel will have their hands full in 2022 competing with Apple A16 based SoCs on TSMC's 3nm process, if anything Intel needs to get Keller back.
 

marcUK2

Member
Sep 23, 2019
74
39
61
The design of the processors Apple will be using in 2022 is already well underway, anyone added to the team now wouldn't affect it much since the basic design and floorplan are already complete, it is just the detail work and simulation/testing that remain.

Intel will have their hands full in 2022 competing with Apple A16 based SoCs on TSMC's 3nm process, if anything Intel needs to get Keller back.
So you're saying that Apple wouldn't be interested in Jim Keller joining the CPU design team?
 

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,855
136
Who cares, that's a nearly two year old core

A12x/A12z is still the fastest CPU Apple sells in their iPad Pro lineup, so I think it's interesting. Plus there are people who claim it has higher IPC than some newer, more-modern CPUs so let's see if they're right, hmm?

and we already have performance information on it from Anandtech's A12X article.

Not really, but okay.

I do agree that whatever systems Apple plans on selling in Q4 of this year are far more interesting than the dev machines going on sale in Q2/Q3.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
So you're saying that Apple wouldn't be interested in Jim Keller joining the CPU design team?
Apple A14 is now in mass production, A15 is close to tape-out, and A16 and A17 are under development. If Jim Keller joined Apple today, he would have a chance to influence A18 for 2024 on a 2nm process. I'd like to see Jim Keller in some startup, creating something very wild, maybe Nuvia- or Tachyum-like.

Interestingly, if you compare die sizes and transistor count in the current gen of chips from Apple, AMD and intel, you can easily deduce that Apples A chips are not so much better than the x86 competition
This is not true. If you compare die size of those cores:
  • AMD Zen2 .... 3.6 mm2 including 0.5MB L2$
  • ARM A76 ...... 1.2 mm2 including 1MB L2$
  • ARM A77 ...... 1.2 mm2 including 0.5MB L2$ .... 108% PPC of Zen2
  • ARM A78 ...... 1.1 mm2 including 0.5MB L2$..... 115% PPC of Zen2
  • ARM X1 ........ 2.0 mm2 including 1MB L2$........ 140% PPC of Zen2
  • Apple A13 ..... 2.6 mm2 without L2$ (8MB shared L2$)..... 184% PPC of Zen2
Edit:
  • AMD Zen2 .... 3.6 mm2 including 0.5MB L2$
  • ARM A76 ...... 1.2 mm2 including 0.5MB L2$
  • ARM A77 ...... 1.4 mm2 including 0.5MB L2$ .... 108% PPC of Zen2 SPECint
  • ARM A78 ...... 1.33 mm2 including 0.5MB L2$..... 115% PPC of Zen2 SPECint
  • ARM X1 ........ 2.1-2.3 mm2 including 1MB L2$........ 140% PPC of Zen2 SPECint

A78 has 3.3 times smaller core area than Zen2 while having 15% higher performance per clock -> 3.3 * 1.15 = 3.8 times higher PPA (Performance Per Area).

Apple A13 .... about 1.4x smaller area (3.6 / 2.6 = 1.38) and 84% higher PPC -> 1.38 * 1.84 = 2.5x higher PPA than Zen2.

You can see that both Apple and Cortex cores have stellar PPA and efficiency, and the x86 core looks like garbage. Apple ARM laptops based on the new A14 later this year will outperform the majority of desktop PCs. Apple doesn't need luck here; they have the most powerful architecture in the world right now.
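
The comparison above boils down to one ratio: relative PPA at equal clocks is the PPC ratio multiplied by the inverse area ratio. A minimal Python sketch, using only the area and PPC figures from the first list in this post (the function name `relative_ppa_iso_clock` is made up for illustration, and clock speed and cache size differences are ignored):

```python
def relative_ppa_iso_clock(area_ref_mm2, area_core_mm2, ppc_vs_ref):
    """Performance per area of a core relative to a reference core at equal clocks.

    (ppc_core / area_core) / (ppc_ref / area_ref)
        = ppc_vs_ref * area_ref / area_core
    """
    return ppc_vs_ref * area_ref_mm2 / area_core_mm2


ZEN2_AREA_MM2 = 3.6  # Zen2 core including 0.5MB L2, figure taken from the post

# A78: 1.1 mm2 and 115% of Zen2 PPC (figures from the post's first list)
a78 = relative_ppa_iso_clock(ZEN2_AREA_MM2, 1.1, 1.15)
# A13: 2.6 mm2 without L2 and 184% of Zen2 PPC (figures from the post)
a13 = relative_ppa_iso_clock(ZEN2_AREA_MM2, 2.6, 1.84)

print(f"A78 vs Zen2: {a78:.2f}x")
print(f"A13 vs Zen2: {a13:.2f}x")
```

With unrounded intermediate values this gives roughly 3.76x for A78 and 2.55x for A13, in line with the rounded 3.8x and 2.5x quoted in the post.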
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
Apple A14 is now in mass production, A15 is close to tape out and A16, A17 are under development. If Jim Keller joins Apple today he will chance to influence A18 for 2024 at 2nm process. I'd like to see Jim Keller in some start up, creating something very wild, maybe Nuvia or Tachyum-like.


This is not true. If you compare die size of those cores:
  • AMD Zen2 .... 3.6 mm2 including 0.5MB L2$
  • ARM A76 ...... 1.2 mm2 including 1MB L2$
  • ARM A77 ...... 1.2 mm2 including 0.5MB L2$ .... 108% PPC of Zen2
  • ARM A78 ...... 1.1 mm2 including 0.5MB L2$..... 115% PPC of Zen2
  • ARM X1 ........ 2.0 mm2 including 1MB L2$........ 140% PPC of Zen2
  • Apple A13 ..... 2.6 mm2 without L2$ (8MB shared L2$)..... 184% PPC of Zen2

A78 has 3.3 times lower core area that Zen2 while having 15% higher performance per clock..... -> 3.2 * 1.15 = 3.8 times higher PPA (Performance Per Area).

Apple A13 .... 1.3x lower area and 84% higher PPC -> 1.3 * 1.84 = 2.5x higher PPA than Zen2.

You can see that both Apple and Cortex cores have stellar PPA and efficiency and x86 core looks like garbage. Apple ARM laptops based on new A14 later this year will outperform majority of desktop PCs. Apple doesn't need a luck here, they have the most powerful architecture on the world right now.
From the AT article you linked, they show the area of the big core complex as 9.06 mm², so a per-core size of 4.53 mm² including 4MB L2. That's obviously a much larger L2 than the others, but that increased cache size will also factor into performance.

Edit: I should also add that the A78 comparison is kind of nonsensical. You can't just multiply the area difference by a performance per clock to get PPA, you would need to divide performance by area. That would give 2.28x PPA. Even then, you're comparing a core that supports two threads per core to a single thread core, using a single threaded benchmark.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
From the AT article you linked, they show the area of the big core complex as 9.06, so a per core size of 4.53mm² including 4MB L2. That's obviously a much large L2 than the others, but that increased cache size will also factor into performance.
I agree with you. However, the Apple A13 is a monstrous core of a special kind. It has 128+128 kB of L1$, which is 50% of Zen2's L2 cache (512 kB). A13's L2 is 8MB shared, which is 50% of Zen2's L3 per CCX (16MB). So it's very hard to compare such a different monster core with traditional x86 or Cortex cores, because its L2 acts like an L3.

But anyway, at 4.53 mm2 the A13 core is 1.26x larger than Zen2's 3.6 mm2. However, A13 still delivers 1.84x higher PPC, so 1.84 / 1.26 = 1.46x higher PPA iso clock. Don't you think that's still a massive advantage?


Edit: I should also add that the A78 comparison is kind of nonsensical. You can't just multiply the area difference by a performance per clock to get PPA, you would need to divide performance by area. That would give 2.28x PPA. Even then, you're comparing a core that supports two threads per core to a single thread core, using a single threaded benchmark.
How can you get 2.28x PPA for A78 when its core area is more than 3x smaller? Even with identical PPC you would get PPA > 3x. And when A78 is approx. 15% faster at iso clock than Zen2, you have to get even higher PPA, don't you think?
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
I agree with you. However that Apple A13 is monstrous core and it's special kind. It has 128+128 kB L1$ what is 50% of Zen2 L2 cache (512kB L2). A13's L2 is 8MB shared what is 50% of Zen2's L3 for CCX (16MB). So it's very hard to compare such a different monster core with traditional x86 or Cortex cores because it's L2 acts like L3.

But anyway, 4.53mm2 A13 is 1.26x larger core size than Zen2's 3.6 mm2. However A13 is still producing 1.84x higher PPC, so 1.84/1.26 = results in 1.46x higher PPA iso clock. Don't you think it's still massive advantage?
You can't just strip Zen2's higher frequency capability and compare at iso clock when designing a high frequency core that boosts north of 4GHz costs area. You're trying to create some sort of performance per clock per area, which is always going to favor a lower clocked core. By your math A78 has 2.5x the performance of A13 per area iso clock, so unless you also think the reference ARM implementation has a massive, gargantuan, unassailable advantage over Apple's cores I'm not sure what you're getting at.

How can you receive 2.28 PPA for A78 when core area is more than 3x smaller? Even with identical PPC you have to receive PPA > 3x. And when A78 is approx 15% faster at iso clock than Zen2 you have to get even higher PPA, don't you think?
Because Zen2 is a higher performance core than A78. Take the raw Geekbench score and divide it by the core area of each. Again, performance per area doesn't have a component of clock speed in it, it's just performance (even just using geekbench) divided by area.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
You can't just strip Zen2's higher frequency capability and compare at iso clock when designing a high frequency core that boosts north of 4GHz costs area. You're trying to create some sort of performance per clock per area, which is always going to favor a lower clocked core. By your math A78 has 2.5x the performance of A13 per area iso clock, so unless you also think the reference ARM implementation has a massive, gargantuan, unassailable advantage over Apple's cores I'm not sure what you're getting at.


Because Zen2 is a higher performance core than A78. Take the raw Geekbench score and divide it by the core area of each. Again, performance per area doesn't have a component of clock speed in it, it's just performance (even just using geekbench) divided by area.
That's a fair point, to also take CPU frequency into account. Sure, the Zen2 core would be much smaller if AMD used a high-density library like Apple did. So, calculating further: ARM Cortex A78 can run at 3 GHz max at 7nm (Ampere Altra claims 3.3 GHz, but let's assume only 3 GHz). Zen2 can run all-core at 4.2 GHz, and that's 4.2/3.0 = 1.4x higher performance due to higher clocks. So the PPA of A78 would fall from 3.8x (iso-clock) down to 3.8/1.4 = 2.7x.
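
The clock adjustment above can be written out step by step. A rough sketch, assuming performance scales linearly with frequency (the post implies this, but real workloads do not guarantee it); the variable names are illustrative only:

```python
# Figures from the post; linear scaling of performance with clock is an assumption.
ppa_iso_clock = 3.8    # A78 vs Zen2 performance per area at equal clocks
zen2_clock_ghz = 4.2   # all-core clock assumed for Zen2
a78_clock_ghz = 3.0    # assumed maximum for A78 at 7nm

# How much performance Zen2 gains purely from running at a higher frequency.
clock_advantage = zen2_clock_ghz / a78_clock_ghz

# Dividing the iso-clock ratio by that advantage gives PPA at real clocks.
ppa_at_real_clocks = ppa_iso_clock / clock_advantage

print(f"clock advantage: {clock_advantage:.2f}x")
print(f"PPA at real clocks: {ppa_at_real_clocks:.2f}x")
```

The result is sensitive to the clocks chosen: a higher Zen2 boost clock in place of 4.2 GHz lowers the final ratio further.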

What do you think? Is 2.7x higher PPA for A78 no problem for Zen2?

IMHO it's still a huge problem for x86. Intel and AMD differ from each other by a few percent (5-20% at iso-process). I would say that when A78 is better than Zen2 in PPA by a massive 170%, both Intel and AMD are in big trouble. Not now, but from 2021, when A78 gets into server chips. And Zen3 with a 20% IPC improvement cannot erase the 170% advantage of A78 (assuming that Zen3 increases IPC while keeping an identical transistor count). For example, the Sunny Cove core in Ice Lake brought an 18% IPC jump at the cost of 38% more transistors (PPA decreased). So if Zen3 increases core area by more than its IPC rises, it can have even worse PPA than Zen2. So the PPA gap between ARM A78 and x86 can get even worse with new uarchs... maybe, let's see, but honestly I doubt Zen3 will have fewer transistors than Zen2.

PPA is the secret sauce that explains why ARM chips like the 64-core Graviton2 or 80-core Altra can be monolithic dies (my estimate is 350-400 mm2 die size) and have very low power consumption at the same time. From a PPA point of view, the A78 Hercules is the ultimate x86 server CPU killer. Much more dangerous than Apple's monster core would ever be in the server space, IMHO. And I do not see how x86 can stop this dangerous little ARM raptor core. Maybe Intel's Gracemont core and Snow Ridge?
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
That fair point to take into account also CPU frekquency. Sure Zen2 core would be much smaller if AMD'd use high density library like Apple did. So calculating further, ARM Cortex A78 can run at 3 GHz max (Ampere Altra claim 3.3 Ghz but let assume only 3 GHz) at 7nm. Zen2 can run all core at 4.2 GHz and that's 4.2/3.0 = 1.4x higher performance due to higher clocks. So the PPA of A78 would fall from 3.8x (iso-clock) down to 3.8/1.4 = 2.7x.

What do you think? Is it 2.7x higher PPA for A78 no problem for Zen2?
I still get 2.28, I think you're getting weird values because you're using your numbers for PPC (perf/freq), then dividing by area, then comparing the two, and then trying to factor out frequency using a different frequency (4.6GHz vs 4.2GHz), all while accumulating rounding errors.
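
For reference, one combination of figures quoted elsewhere in this thread reproduces the 2.28x value: Geekbench 5.1 points per GHz of 306 (A78) and 286 (Zen2), clocks of 3.0 GHz and 4.6 GHz, and core areas of 1.1 mm2 and 3.6 mm2. That these are the exact inputs behind the 2.28 figure is an assumption; the sketch below only illustrates "performance divided by area", and `perf_per_area` is a hypothetical helper name:

```python
def perf_per_area(points_per_ghz, clock_ghz, area_mm2):
    """Estimated benchmark score (points/GHz * GHz) divided by core area in mm2."""
    return points_per_ghz * clock_ghz / area_mm2


# Inputs are figures quoted in the thread; their pairing here is an assumption.
a78 = perf_per_area(306, 3.0, 1.1)
zen2 = perf_per_area(286, 4.6, 3.6)

print(f"A78 / Zen2 performance per area: {a78 / zen2:.2f}x")
```

Swapping in the revised A78 area of 1.33 mm2 that comes up later in the thread gives about 1.89x with the same function.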
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
I still get 2.28, I think you're getting weird values because you're using your numbers for PPC (perf/freq), then dividing by area, then comparing the two, and then trying to factor out frequency using a different frequency (4.6GHz vs 4.2GHz), all while accumulating rounding errors.
You are correct: using the GeekBench score we get 2.28x (I used SPECint data to compare A78 and Zen2).

But is 2.28x higher PPA no problem for AMD? We are not talking about an 8% advantage. Nor a 28% advantage. We are talking about a 128% advantage in PPA; 2.28x more PPA sounds like a nightmare. How do you think x86 vendors will fight that?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Oh, and A76 only has 512KB of L2 cache. And with it, it's 1.26mm2.

According to ARM, the core in the A77 is 40-45% larger than A76. That's with both having 512KB L2 cache.

Revised figures:
A76/512KB L2 - 1.26mm2
A77/512KB L2 - 1.77mm2
A78/512KB L2 - 1.68mm2
 

Doug S

Platinum Member
Feb 8, 2020
2,268
3,519
136
@Richie Rich
please can you post anything else than SPEC or GB?
I asked it several times, I tried to find something about ipad pros but every single f..review has the same
geekbench and/or spec

I think you'll just need to wait six months for the ARM Macs to appear to be able to run a wider array of stuff. There aren't a lot of benchmarks out there that aren't either very narrow (like Speedometer 2.0) or don't run on iOS.

Still, SPEC2006, if you throw out the few tests that Intel has gamed, is a much better CPU benchmark than all the PCMark and gaming benchmarks that litter typical phone and PC reviews. SPEC2017 is even better, but I don't think Anandtech managed to get that running on either iOS or Android yet.

Anyone who is expecting Apple's ARM CPUs to look bad once they are running on macOS and can run a wider array of benchmarks (including on Windows/ARM and Linux/ARM since unlike the phones/tablets it'll be able to boot other operating systems) is going to be very disappointed.
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,709
136
Anyone who is expecting Apple's ARM CPUs to look bad once they are running on macOS and can run a wider array of benchmarks (including on Windows/ARM and Linux/ARM since unlike the phones/tablets it'll be able to boot other operating systems) is going to be very disappointed.

Agreed. I can't find it, but a while back there was at least one blog post from a random software developer running his own CPU-limited project on his Kaby Lake Mac vs. his A12 iPhone, and the phone was slightly faster.

Now obviously the results will differ greatly based on workload, but people expecting these chips to run badly (especially if the upper-end A14 chips released in 2021 are built on the N5P process) are delirious. They might not win by leaps and bounds. But they will be very competitive and noticeably faster in many (particularly single-threaded) workloads.

BTW, Anand Lal Shimpi mentioned in an interview that in some MT workloads Apple's chips also use both big and little cores, so if they have enough of these, it's not like MT results would be horrible either.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Oh, and A76 only has 512KB of L2 cache. And with it, its 1.26mm2.

According to ARM, the core in the A77 is 40-45% larger than A76. That's with both having 512KB L2 cache.

Revised figures:
A76/512KB L2 - 1.26mm2
A77/512KB L2 - 1.77mm2
A78/512KB L2 - 1.68mm2
Do you have a source for A77 being 40-45% larger? Because this looks like BS. ARM LLC with their focus on PPA would never release a core that degrades PPA so much (20% more IPC for 45% more area). Andrei mentioned 17% more area, which seems reasonable and aligned with a further PPA improvement (20% more IPC for 17% more area).


A76/N1
[Image: Neoverse N1 slide, Infra Tech Day 2019]


A77:

"... the A77 is said to be only 17% bigger than the A76 – still significantly smaller than the next best microarchitecture from the competition."

A78:
[Image: A78 / X1 slide]



So you are right, I had a mistake in the A76 area, and it propagated into A77 and A78 too. The correct data should be:
  • A76 .... 1.2 mm2 0.5MB L2$
  • A77 .... 1.4 mm2 0.5MB L2$ (+17% area)
  • A78 .... 1.33 mm2 0.5MB L2$ (-5% area)


So this means the PPA advantage of A78 over Zen2 is 1.89x (instead of the wrong 2.28x). But that is still a huge difference, even taking into account the 4.2 GHz clock of desktop Ryzen.

In servers there is no clock advantage because all server CPUs run basically around 3 GHz clock. And for iso-clock you have PPA ratio calculated from PPC/IPC and that's:
A78 .... PPC/IPC GB5.1 ... 306 pts/GHz
Zen2 ... PPC/IPC GB5.1 ... 286 pts/GHz .... 1.07x PPC ratio in favor of A78

Area ratio is 3.6 mm2 / 1.33 mm2 = 2.7x.
Overall iso-clock PPA is therefore 2.7 * 1.07 = 2.89x
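
As a quick check, here is the same iso-clock calculation without rounding the intermediate ratios. A sketch using only the figures listed above; the variable names are illustrative:

```python
# Figures from the post: GB5.1 points per GHz and core area including 0.5MB L2.
a78_pts_per_ghz = 306.0
zen2_pts_per_ghz = 286.0
a78_area_mm2 = 1.33
zen2_area_mm2 = 3.6

ppc_ratio = a78_pts_per_ghz / zen2_pts_per_ghz   # per-clock advantage of A78
area_ratio = zen2_area_mm2 / a78_area_mm2        # how many times smaller A78 is
ppa_iso_clock = ppc_ratio * area_ratio           # performance per area at equal clocks

print(f"PPC ratio:     {ppc_ratio:.2f}x")
print(f"Area ratio:    {area_ratio:.2f}x")
print(f"Iso-clock PPA: {ppa_iso_clock:.2f}x")
```

Carrying full precision gives about 2.90x rather than 2.89x; the small difference comes only from rounding the two ratios before multiplying.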

That's probably the answer to why ARM is going into servers first: low clocks can show its full advantage in the performance-per-area metric. Graviton and Ampere Altra can get almost 3x more performance from the same silicon area. How do you think x86 vendors can fight this? It was no problem when old 32-bit ARMs had a laughable 1/5 of the PPC/IPC of x86 cores. But look how people stopped smiling at Graviton2, which even with an old A76 (8% lower PPC/IPC than Zen1) slaughters the Zen1-based Epyc and offers higher performance per thread than the Zen2 Epyc. Those smiles are frozen now. But today, when A78 has higher PPC/IPC than Zen2? IMHO a catastrophic situation is coming soon.

The situation is that ARM Ampere Altra Max will have 128 cores on a 7nm monolithic die, while AMD Zen2 ends up at 64 cores with big help from the chiplet architecture (remember Rome has 1005 mm2 of combined die size). Without that, AMD would probably fall back to a 32-core monolith or 4x 16-core Zen1-like dies. What if ARM vendors adopt a chiplet architecture too? That's a horror scenario for both AMD and Intel.

I think everybody expects that every new x86 uarch spends proportionally more transistors than it gains in IPC; that's normal for high-performance chips. Sunny Cove in Ice Lake was 18% IPC for a 38% transistor increase. And Zen3 will be similar (20% IPC for 30% more transistors, I expect). But this means its PPA goes down, while ARM LLC increased PPA with A78. The PPA gap keeps growing over time. x86 is using/wasting transistors and energy for max performance, while the mobile environment forced ARM designers to think twice before spending any additional transistors. It's in the genes. Only Intel is aware of this PPA situation and has a backup plan with the Snow Ridge server system and its Gracemont Atom cores. But I worry about AMD; they have no backup plan since they canceled their great ARM K12 (it seems K12 was a brilliant backup plan to unchain AMD from the sinking x86 ship, and canceling it was a horrible idea; what would we expect from the people who developed BD? No wonder Keller left, he knew what was coming from ARM).

And the new ARMv9 with its much more efficient SVE2 SIMD vectors can help increase PPA even further. Take a look at the Fugaku supercomputer and how it can beat supercomputers based on Nvidia GPUs with ARM CPUs alone. Who would have expected that a pure-CPU SC could beat GPUs in that area and become the most powerful SC in several SC charts? OK, I'm optimistic about ARM, but even the current PPA and PPC numbers show that x86 will have huge problems facing current ARM server systems.

Not to mention the economic point of view: Amazon can manufacture its 64-core Graviton2 for 500 USD, while AMD EPYC 7742 costs 7500 USD and delivers only 50% more performance in return. This is the second edge of the ARM sword, which will slaughter x86 in a bloodbath.
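
Taking the post's figures at face value (500 USD to manufacture Graviton2, 7500 USD for EPYC 7742, and EPYC delivering 50% more performance), the implied performance-per-dollar gap works out as below. Note that this sets an estimated manufacturing cost against a selling price, so it is a sketch of the argument rather than a like-for-like comparison; all names are illustrative:

```python
# Figures as claimed in the post, not independently verified.
graviton2_cost_usd = 500.0    # claimed manufacturing cost
epyc_7742_price_usd = 7500.0  # price quoted in the post
graviton2_perf = 1.0          # normalized performance
epyc_7742_perf = 1.5          # "only 50% more performance"

graviton2_perf_per_usd = graviton2_perf / graviton2_cost_usd
epyc_perf_per_usd = epyc_7742_perf / epyc_7742_price_usd

ratio = graviton2_perf_per_usd / epyc_perf_per_usd
print(f"Graviton2 performance per dollar vs EPYC 7742: {ratio:.1f}x")
```

On these inputs the gap is about 10x, but it shrinks quickly once the EPYC figure is replaced by an estimated manufacturing cost instead of a price.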
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
I have no expectation that MT performance will suffer greatly on an A13 or later implementation. The Thunder low-power cores on A13 are significantly faster than the Tempest cores on the A12. In a situation where their performance counts, the slowdowns that the AT review showed due to the memory system being in a low-power state wouldn't be an issue. I dare say that, under load, their performance will easily land in the middle of the A73-A75 range, which provides a significant amount of high-efficiency computing power. Even if Apple does nothing more than make an A13X for their MacBook offerings that uses 4 Lightning cores and 4 Thunder cores, manages to increase the peak clocks under load by 10%, and allows a 20% increase in the frequency range of the Thunder cores, you've still got a CPU that's effectively as fast as or faster than any of the 10XXXG1-U class or below processors when running native code. I can't see where that will hamstring them.
 

Eug

Lifer
Mar 11, 2000
23,587
1,001
126
I have no expectation that MT performance will suffer greatly on an A13 or later implementation. The Thunder low power cores on A13 are significantly faster than the Temptest cores on the A12. In a situation where their performance will count, the slow downs that the AT review showed due to the memory system being in a low power state wouldn't be an issue. I dare say that, under load, their performance will easily land in the middle of the A73-A75 range, which is going to be providing a significant amount of high efficiency computing power. Even if Apple does nothing more than make an A13X for their Macbook offerings that uses 4 Lightning cores and 4 Thunder cores and manages to increase the peak clocks under load by 10%, and allows a 20% increase in the frequency range of the Thunder cores, you've still got a CPU that's effectively as fast as or faster than any of the 10XXXG1-U class or below processors when running native code. I can't see where that will hamstring them.
My expectation is there will be no A13X. There will be A14 and A14X.
And of course, A14X would be even faster than a theoretical A13X.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
If you bothered to do your own calculations, you'll see its actually 40% larger for the area.
Aha, no numbers, just from the picture. OK, but did you realize that the -5% looks like 1/4 of the A77 extension? This means those bars don't start at 0 and are shifted to emphasize the gains. This means A77 is around Andrei's number of 17%. Don't you think Andrei was right?
 

soresu

Platinum Member
Dec 19, 2014
2,664
1,863
136
So you are right I have a mistake in the A76 area and this accumulated into A77 and A78 too. Correct data should be:
How many times must it be repeated before it sinks in?

N1 is based on A76 - but they are different owing to N1's focus on scaling and massively multicore performance, likely enough to put off your measurements.
 

soresu

Platinum Member
Dec 19, 2014
2,664
1,863
136
But I worry about AMD, they have no back up plan since they canceled their great ARM K12 (it seems K12 was brilliant back up plan how to unchaine AMD from x86 sinking ship and it was horrible idea for canceling it, what would we expect from people they developed BD, no wonder Keller left, he knew what's coming from ARM).
Canceled does not mean forgotten - it was clearly well past the drawing board phase when shelved for the sake of giving Zen their all, though I doubt it would ever be resurrected given alternatives are available now with superior performance.

Keller did not imply that K12 had some great advantage in perf/watt over Zen, only that it had a "bigger engine".

IMHO it's a good thing that they did shelve it, otherwise we would probably have had the same mess as we did with release Vega.

i.e. concentrating on more than one project at once (console GPUs) with a limited R&D budget, and ending up with an inferior product vs. the more refined and errata-free variant in Renoir, which is so different in raw perf from 14nm Vega as to be another GFX generation altogether.

Also Keller left for the same reason he always does, new horizons (hehehehe pun) - for the same reason he left Apple before AMD, and Tesla after that.

Either way, at this point ARM's off the shelf cores have become so performant that if AMD desires to go into that breach once more they can always take part in the Cortex X program, custom or otherwise - that way they can get an SoC running relatively quickly and go custom if they so wish.

We know ARM already has the chops to do SMT efficiently from A65/E1, so I don't expect that to be a barrier to any co-developed custom core with AMD.

Besides which there will be no sinking yet for years to come.

I can see you are very excitable about ARM's future prospects, but enterprise IT folks are not very excitable people, and often demand absolute software compatibility for as long as possible - my father works for a company that only just transitioned away from a hybrid system of Windows 7 and a DOS based database architecture. I kid you not.

These people expect DECADES in compatibility, so unlike with a media consumption driven consumer crowd you cannot simply ask them to make a huge, breaking change unless they are already making such a big change elsewhere.