Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
The first rumor about an Intel replacement in Apple products has appeared:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with a new-gen MacBook Air in mid-2020 (MacBook Pro and iMac also under consideration)
  • massive AI accelerator

Source: Coreteks
 
  • Like
Reactions: vspalanki
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, the A13 is competitive against Intel chips, but the emulation tax is about 2x. So given that A13 ~= Intel, emulated x86 programs would run at about half the speed of an equivalent x86 machine. This is one of the reasons they haven't switched yet.
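
As a back-of-the-envelope check, here is that claim in code form; the parity assumption and the ~2x tax are the rough inputs from this post, not measured data:

Code:
# Rough model of the emulation-tax argument (assumed numbers, not measurements).
native_arm_perf = 1.00   # assume A13 ~= a contemporary Intel core on native code
emulation_tax = 2.0      # assumed ~2x slowdown for emulated x86

emulated_perf = native_arm_perf / emulation_tax
print(f"Emulated x86 speed vs. an equivalent x86 machine: {emulated_perf:.0%}")
# -> 50%, i.e. emulated x86 programs run at about half native x86 speed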

Another reason is that it would prevent the use of Windows on their machines, something some buyers consider very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

defferoo

Member
Sep 28, 2015
That's the problem here: we simply can't get our priorities straight.

On one side, we praise the fact that Apple can and will deliver more performance with custom ARM silicon, but on the other side, when someone like @beginner99 points out that Apple intentionally held back on performance in its existing lineups, the immediate response is to focus on other product attributes as the main benefits and sales drivers. It's almost as if performance is important as a differentiating factor, not as an objective metric or a requirement for enjoying Apple products.
It seems like you're referring to how the base MacBook Air has two cores and the base iMac with HDD is just a terrible value. I don't disagree, the base products are a terrible value and I would never buy them. If you thought I was talking about other product attributes as a way to justify the price/existence of these base products then you're wrong. I'm simply stating that there is more to a computer than the primary specs (Processing, Memory, Storage), and those factors should be considered when looking at value. Obviously, I am discussing like for like machines here in terms of the base specs. I wouldn't compare a 2-core MacBook Air to a 4-core Dell XPS 13 and say that the MacBook Air is better because it has some differentiating factors not found in the Dell XPS. If I'm to compare value, I would start with the same base specs and then look at the price of the machine.

Now that that's out of the way, I don't see why this means I cannot be excited for what Apple Silicon might bring to the table for Macs. Things are more nuanced than you present them to be. There are products in Apple's lineup that are terrible IMO, but that doesn't invalidate the rest of their products for me. For example, the performance of the base MacBook Air is just unacceptable, and so is the thermal solution for that entire line of products (seriously, the fan doesn't even blow on the heatsink?). However, the iPad and iPhone have shown consistent improvements in performance for the past 10 years under the constraints of a fanless environment. The fact that they're bringing this consistency to the Mac, which has had to deal with Intel's 14nm+++ fiasco using the same architecture for four years running, is what's most exciting.
 
Last edited:
  • Like
Reactions: scannall

awesomedeluxe

Member
Feb 12, 2020
My guess is that tight on-package mounting (with the implication that RAM can't be altered after purchase...) will be the standard for the real Macs, but for the DTK they may well have put together a quick sub-optimal scheme that both burns extra power and runs slower.
Memory is one of the biggest questions for Apple Silicon IMO. One optimistic case I read recently was Apple really pumping up the CPU L cache and then going all in on on-package HBM for shared system memory. Absurd cache sizes allow the CPU cores to have very low latency while the GPU gets everything it wants for bandwidth. And of course, it's 0% user upgradable in all configurations, so be prepared to pay up front if you want more of it.

I think bigger cache sizes are inevitable, if only because Apple is close to a wall with what they can achieve by clocking their CPU cores up. But I am often scratching my head over what they will do with memory for pro machines with pro graphics solutions.
 

Doug S

Platinum Member
Feb 8, 2020
Memory is one of the biggest questions for Apple Silicon IMO. One optimistic case I read recently was Apple really pumping up the CPU L cache and then going all in on on-package HBM for shared system memory. Absurd cache sizes allow the CPU cores to have very low latency while the GPU gets everything it wants for bandwidth. And of course, it's 0% user upgradable in all configurations, so be prepared to pay up front if you want more of it.

I think bigger cache sizes are inevitable, if only because Apple is close to a wall with what they can achieve by clocking their CPU cores up. But I am often scratching my head over what they will do with memory for pro machines with pro graphics solutions.

Well, theoretically, they could use LPDDR/GDDR or possibly even HBM for on-package memory in the low/mid range, where stuff isn't expandable and you are basically stuck with the RAM you bought it with (i.e. already the case in some of the Mac line). Then on the higher-end MacBook Pro, iMac Pro, and Mac Pro, the in-package memory could be graphics-only, with DIMM slots for system memory.

That way you get the fast RAM where it is needed for graphics, just like discrete GPUs do, but preserve the expandability that the people buying something like a Mac Pro must have.
 

awesomedeluxe

Member
Feb 12, 2020
Well, theoretically, they could use LPDDR/GDDR or possibly even HBM for on-package memory in the low/mid range, where stuff isn't expandable and you are basically stuck with the RAM you bought it with (i.e. already the case in some of the Mac line). Then on the higher-end MacBook Pro, iMac Pro, and Mac Pro, the in-package memory could be graphics-only, with DIMM slots for system memory.

That way you get the fast RAM where it is needed for graphics, just like discrete GPUs do, but preserve the expandability that the people buying something like a Mac Pro must have.
I think only the Mac Pro has a shot at keeping upgradable RAM. I mean, currently none of the other models allow that except, I think, the 27-inch iMac.

Though that doesn't mean the system memory for the MacBook Pro has to be on the APU package either; it could be soldered down somewhere else. Frankly, were it not for the hullabaloo Apple made about unified memory, I would not really be considering on-package memory at all; I would just assume there's HBM on the package with the graphics cores, wherever they are, and (LP)DDR5 somewhere else. This may still be the case, or it may be the case with HBM acting as last-level cache instead of VRAM.

But using HBM as on-package system memory with a big CPU cache currently strikes me as the best option for performance for the MBP16, assuming that 1) they are in fact committed to unified memory and 2) there is not a completely separate dGPU.

In other news, Intel has delayed 7 nm to the end of 2022. Ouch.
Absolutely savage. And the CPUs are 2023. Very, very bad news for Intel - but good news for Apple, who can maintain a manufacturing edge by just paying TSMC enough to keep AMD six months to a year behind.
 
  • Like
Reactions: Tlh97 and Etain05

Antey

Member
Jul 4, 2019
I was measuring the Centaur SNC core area, and I thought it would be interesting to do the same with other cores. It's not a 100% accurate measurement, but it's very close. The smaller the core, the harder it is to measure; we would need a higher-res image to measure it better, I guess.

And to say something about them: the Exynos M4 is too big for what it delivers, to be honest; it's more or less 3.5 times bigger than a Cortex A75! The SNC core is 4.37mm² (with L2) on 16FF; hey, not bad! I would like to compare it with a 16FF ARM core; I think MediaTek uses that node.

Cortex A55 (N7+ TSMC) = 0.33mm²
Cortex A76 (N7+ TSMC) = 1.24mm²
Exynos M4 (8LPP SAMSUNG) = 4.02mm²
Cortex A55 (8LPP SAMSUNG) = 0.37mm²
Cortex A75 (8LPP SAMSUNG) = 1.16mm²

Exynos 9820 = 127mm² = 678px × 712px, 1mm = 61.6522px

Kirin 990 5G = 96.72mm² = 600px × 602px, 1mm = 61.1091px


https://www.techpowerup.com/img/vhaU6AhNUIZc5AgX.jpg
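
For anyone who wants to reproduce these numbers, the method reduces to a couple of lines. The scale factor below matches the 61.6522 px/mm quoted above; the core bounding box at the end is a made-up example, not one of the measurements listed:

Code:
import math

def px_per_mm(die_area_mm2, die_w_px, die_h_px):
    # Derive the die-shot scale from the known die area and its pixel dimensions.
    return math.sqrt((die_w_px * die_h_px) / die_area_mm2)

def core_area_mm2(core_w_px, core_h_px, scale):
    # Convert a measured core bounding box from pixels to mm².
    return (core_w_px * core_h_px) / scale ** 2

scale = px_per_mm(127, 678, 712)          # Exynos 9820: 127mm², 678px × 712px
print(f"{scale:.4f} px per mm")           # ~61.6522, as above

print(f"{core_area_mm2(150, 165, scale):.2f} mm²")  # hypothetical 150×165 px core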
 
Last edited:

Doug S

Platinum Member
Feb 8, 2020
Though that doesn't mean the system memory for the MacBook Pro has to be on the APU package either; it could be soldered down somewhere else. Frankly, were it not for the hullabaloo Apple made about unified memory, I would not really be considering on-package memory at all; I would just assume there's HBM on the package with the graphics cores, wherever they are, and (LP)DDR5 somewhere else. This may still be the case, or it may be the case with HBM acting as last-level cache instead of VRAM.

Having some memory in package and some memory in DIMMs doesn't mean it isn't unified. Unified memory means the CPU and GPU share an address space, not that they share the same block of physical RAM.
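
To make that distinction concrete, here is a toy model (my own illustration, not anything Apple has described): one virtual address space whose allocations can be backed by either an in-package HBM pool or DIMM-backed DDR. Real hardware does this with page tables and an IOMMU, not Python dicts.

Code:
# Toy model: unified memory = one address space, possibly several physical pools.
class UnifiedAddressSpace:
    def __init__(self):
        self.next_va = 0x1000
        self.page_table = {}                      # virtual addr -> (pool, offset)
        self.pools = {"HBM_in_package": 0, "DDR_on_DIMMs": 0}

    def alloc(self, size, pool):
        # Hand out one virtual range no matter which physical pool backs it.
        va = self.next_va
        self.next_va += size
        self.page_table[va] = (pool, self.pools[pool])
        self.pools[pool] += size
        return va

mem = UnifiedAddressSpace()
cpu_buf = mem.alloc(4096, "DDR_on_DIMMs")         # latency-sensitive CPU data
gpu_buf = mem.alloc(4096, "HBM_in_package")       # bandwidth-hungry GPU data

# Both are plain addresses in the SAME space; CPU and GPU can pass them freely.
print(hex(cpu_buf), mem.page_table[cpu_buf])
print(hex(gpu_buf), mem.page_table[gpu_buf])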
 

ksec

Senior member
Mar 5, 2010
No one knows what they will do for the Mac Pro, and the consensus is that's a problem that won't be solved until 2022.

One of the things I overlooked and assumed wrong was the required usage, or the intention of pushing high-clock-speed single-core performance: that Apple will *need* to use a 7nm/5nm HP node for its Mac Pro and iMac, and will need to push 4GHz+. How is Apple going to recoup its development cost on a small-volume desktop CPU chip on an HP node? It was an economic model that didn't fit, and it puzzled me.

But now it just occurred to me: what if Apple decides single-thread performance isn't as important? You end up with a 64-core CPU on an LP node, running SVE at 3.2GHz with 2.5W per core. Once you include quad-channel memory and PCI Express, this easily runs at a 240W TDP.
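
The arithmetic behind that 240W figure, as a sketch (the uncore share is implied by the difference, not stated explicitly):

Code:
# Back-of-envelope power budget for the hypothetical 64-core LP-node chip.
cores = 64
watts_per_core = 2.5                      # at 3.2GHz on an LP node, per the guess above

core_power = cores * watts_per_core       # 160 W for the cores
uncore_power = 240 - core_power           # quad-channel memory + PCIe, implied ~80 W
print(f"cores: {core_power:.0f} W, uncore: ~{uncore_power:.0f} W, total: 240 W TDP")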

The same goes for the GPU: Apple will be trading die area for energy efficiency. And everything will be done on the LP node; no new development tools, design rules, or additional masks. If there is a product with higher TDP needs, go parallel and add more cores, whether that is CPU, GPU, or NPU.

I think this hypothesis fixes my previous model, in which making a Mac CPU on an HP node didn't offer much ROI.
 

Doug S

Platinum Member
Feb 8, 2020
One of the things I overlooked and assumed wrong was the required usage, or the intention of pushing high-clock-speed single-core performance: that Apple will *need* to use a 7nm/5nm HP node for its Mac Pro and iMac, and will need to push 4GHz+. How is Apple going to recoup its development cost on a small-volume desktop CPU chip on an HP node? It was an economic model that didn't fit, and it puzzled me.

Why would they need to push 4 GHz+ when a 2.6 GHz A13 is competitive with an Intel core running at around 4.5 GHz?

I'm sure they will use TSMC's HPC cells on the high end for the iMac Pro / Mac Pro. That buys about 10%, and there are other tunables that can buy them another 10%. That, plus the bump they will get automatically from going to 5nm, is all they need to beat Intel's fastest cores in single thread. They don't need any other changes to the core like improving the cache, TLB, BTB, etc., but they probably will make them anyway, because why beat Intel by a small margin if they can beat them by a bigger margin? That doesn't require a ground-up redesign though, just some changes where appropriate, because efficiently handling a lot more cores and multiple terabytes of RAM will benefit from additional resources in some areas.

As for how you recoup development cost, look at how the hell much Intel charges for the CPUs Apple is currently using in those Macs. Multiply that by the number of those Macs sold and that's how much they can afford to spend on this. And look at the base price of a Mac Pro. You really think Apple is going to cheap out on this?
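
Putting rough numbers on the single-thread argument: the two 10% figures are from this post, while the N7-to-N5 speed gain is my assumption (TSMC has quoted roughly 15% at iso-power), so treat this as a sketch:

Code:
# Hypothetical single-thread clock headroom for a desktop Apple core.
a13_clock = 2.6           # GHz, the A13 figure used in this thread
hpc_cells = 1.10          # TSMC HPC cell libraries, ~10% (from the post)
tunables = 1.10           # other physical-design knobs, ~10% (from the post)
n5_bump = 1.15            # assumed N7 -> N5 speed gain (TSMC's quoted figure)

print(f"plausible desktop clock: {a13_clock * hpc_cells * tunables * n5_bump:.2f} GHz")
# ~3.6 GHz -- and if a 2.6 GHz A13 already matches a ~4.5 GHz Intel core,
# the implied per-clock advantage is about:
print(f"implied perf/clock ratio: {4.5 / 2.6:.2f}x")   # ~1.73x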
 

name99

Senior member
Sep 11, 2010
Why would they need to push 4 GHz+ when a 2.6 GHz A13 is competitive with an Intel core running at around 4.5 GHz?

I'm sure they will use TSMC's HPC cells on the high end for the iMac Pro / Mac Pro. That buys about 10%, and there are other tunables that can buy them another 10%. That, plus the bump they will get automatically from going to 5nm, is all they need to beat Intel's fastest cores in single thread. They don't need any other changes to the core like improving the cache, TLB, BTB, etc., but they probably will make them anyway, because why beat Intel by a small margin if they can beat them by a bigger margin? That doesn't require a ground-up redesign though, just some changes where appropriate, because efficiently handling a lot more cores and multiple terabytes of RAM will benefit from additional resources in some areas.

As for how you recoup development cost, look at how the hell much Intel charges for the CPUs Apple is currently using in those Macs. Multiply that by the number of those Macs sold and that's how much they can afford to spend on this. And look at the base price of a Mac Pro. You really think Apple is going to cheap out on this?

You can't argue with fools. And there's a particular class of fool that is utterly convinced, no matter what you say, that performance=GHz.
 
  • Like
Reactions: Lodix

awesomedeluxe

Member
Feb 12, 2020
Having some memory in package and some memory in DIMMs doesn't mean it isn't unified. Unified memory means the CPU and GPU share an address space, not that they share the same block of physical RAM.
For clarification, are you suggesting that there would be HBM (of some variety) on package and DDR (of some variety) off package that would be treated, from the developer's perspective, as a single resource? I would not assert this is not doable, but I would presume it is very difficult. I'd be interested in hearing more about how such a solution would work.

If you're just saying there could be DDR in different places so the system is upgradable, I don't disagree with that at all, I just don't think it's a priority for Apple. I am mostly trying to resolve how Apple could reconcile "unified memory" with the reality that the CPU and GPU have different memory priorities.

You can't argue with fools. And there's a particular class of fool that is utterly convinced, no matter what you say, that performance=GHz.
Yeah, I don't think it's even too hard to guess the range where GHz will wind up, given that Apple already pushes the 11 Max pretty far down the power curve. And I really struggle to see any device Apple releases next year pushing 4GHz.
 

soresu

Platinum Member
Dec 19, 2014
For clarification, are you suggesting that there would be HBM (of some variety) on package and DDR (of some variety) off package that would be treated, from the developer's perspective, as a single resource?

I would not assert this is not doable, but I would presume it is very difficult. I'd be interested in hearing more about how such a solution would work.
Technically Kaby G did this, no?

Albeit it used two separate chips to manage it.

The main bother would be having two separate IMC types on the same chip, but I don't imagine it would be that difficult to manage.

The real question is: could you get away with HBM on its own?

I know that the original HBM was definitely not designed with CPUs in mind, but the sheer length of time HBM3 has been cooking, and the faceplant from HMC, mean that the HBM people definitely have a niche to exploit if it is possible.
 

NostaSeronx

Diamond Member
Sep 18, 2011
The real question is: could you get away with HBM on its own?
Yes.
hbmram.png

(Freedom Everywhere w/ E31(1st)/Freedom Unleashed w/ U54/E51(2nd))
Freedom Revolution w/ U7/S7 (3rd)

Two Freedom Everywhere and one Freedom Unleashed. Hopefully that means a Freedom Revolution board will come out.

Samsung was first to 16GB HBM2e, with SK Hynix second; Micron will do 8GB HBM2 this year and 16GB HBM2e next year.

Any desktop-oriented machine will need >16GB in a cheap format. A 16GB 8-Hi single stack is a good start.
 
  • Like
Reactions: awesomedeluxe

awesomedeluxe

Member
Feb 12, 2020
Technically Kaby G did this, no?

Albeit it used two separate chips to manage it.

The main bother would be having two separate IMC types on the same chip, but I don't imagine it would be that difficult to manage.

The real question is: could you get away with HBM on its own?

I know that the original HBM was definitely not designed with CPUs in mind, but the sheer length of time HBM3 has been cooking, and the faceplant from HMC, mean that the HBM people definitely have a niche to exploit if it is possible.
I'm just going off memory here, but I think the CPU either treated the HBM as cache or didn't use it at all.

As for your question, it's a great one. I'm not sure. I don't presume HBM3 exists. Nor do I presume the theoretical 24GB stacks of HBM2E exist. Either one would make using HBM alone much more achievable.

As it stands, multiple 16GB stacks of HBM2E on the package seem narrowly possible as a solution for system memory on pro platforms. If the CPU has a huge cache, it wouldn't care about the extra latency vs. LPDDR5. So you can have your cake and eat it too.

The limiting device is the MBP 16. Two stacks of HBM2E use about 10W, which is starting to push it in a laptop. 32GB probably becomes the max memory for that device, which is currently configurable with 64GB of RAM plus 8GB of HBM2 on the GPU. Of course, anything smaller than the 16 is right out, but those devices never had VRAM anyway.
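
A sketch of the budget math behind that ceiling; the per-stack wattage simply splits the "two stacks ~ 10W" figure above evenly:

Code:
# Rough HBM2E capacity/power trade-off for a 16" laptop (assumed figures).
watts_per_stack = 5.0        # implied by "two stacks ... 10W" above
gb_per_stack = 16            # 16GB 8-Hi HBM2E stacks

for stacks in (1, 2, 3):
    print(f"{stacks} stack(s): {stacks * gb_per_stack} GB, "
          f"~{stacks * watts_per_stack:.0f} W of memory power")
# Two stacks (32 GB, ~10 W) already strain a laptop budget, hence 32 GB as the ceiling.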
 

soresu

Platinum Member
Dec 19, 2014
Nor do I presume the theoretical 24GB stacks of HBM2E exist.
None exist in the marketplace as far as I know, but supposedly Samsung has cracked the process needed to stack at least 12 dies, so it is definitely possible, whether or not anyone is willing to pay for it yet.

Presumably the danger of breakage increases with more dies stacked (or drilled), so who knows how well HBM yields at greater stack heights.
The limiting device is the MBP 16. Two stacks of HBM2E use about 10W, which is starting to push it in a laptop.
Isn't HBM usually more efficient, thanks to its wider IO and shorter distance to the processor?
 

LightningZ71

Golden Member
Mar 10, 2017
I just don’t see a need to use HBM on anything less than what we consider Pro-level Macs. In a year or so, DDR5 DIMMs will be out, and some high-end phones are already using soldered DDR5 chips in their designs. The equivalent of dual-channel DDR4 bus width in a DDR5 design is currently specified at around 105GB/s of throughput at max JEDEC specs. For perspective, that’s about the memory performance of the RX 550 and 560 in PCIe add-in-card form. Plenty for CPUs with a high-performance iGPU.

Once you get to the rarefied air of the Pro-spec Macs, you’ve got the budget to be substantially more exotic. Case in point: do you think that Intel was just giving away its high-core-count Xeon processors, and that the quad+ channel RAM sticks were free? Since we’re discussing products that are over a year away, I don’t find it unreasonable to think that they might have configurations with four 16GB stacks of HBM2e, or even four 24GB stacks for 96GB of RAM.

But what does all that get you? How is Apple going to provide enough CPU AND GPU performance for a pro product with one chip? I don’t think they will. That’s so much heat and power to deal with. And where is all their compute performance coming from? Their integrated GPU is good, but not THAT good.
 

awesomedeluxe

Member
Feb 12, 2020
Isn't HBM usually more efficient, thanks to its wider IO and shorter distance to the processor?
All correct. But efficiency doesn't mean less power used; it means more bang for your buck. The bandwidth is a lot higher than LPDDR, so power is higher too.

And it's dense, and it's near the APU cores. It's a heat/area problem in the MBP16, which can't cool its internals at the rate an iMac can.

I just don’t see a need to use HBM on anything less than what we consider Pro-level Macs. In a year or so, DDR5 DIMMs will be out, and some high-end phones are already using soldered DDR5 chips in their designs. The equivalent of dual-channel DDR4 bus width in a DDR5 design is currently specified at around 105GB/s of throughput at max JEDEC specs. For perspective, that’s about the memory performance of the RX 550 and 560 in PCIe add-in-card form. Plenty for CPUs with a high-performance iGPU.
Also, it's not really viable on most non-pro machines. HBM2E uses too much power to go in a phone, an iPad, or a MacBook Air.

The bigger tell, though, is this: which machines have a dedicated GPU with its own high-bandwidth memory right now? Those are the machines where something more than DDR5 might be necessary.
 

NostaSeronx

Diamond Member
Sep 18, 2011
But efficiency doesn't mean less power used; it means more bang for your buck. The bandwidth is a lot higher than LPDDR, so power is higher too.
hbmmwgb.png

HBM2/E is still the overall lower power option.

There are also multiple speed-bin options available: 0.5, 1.0, 1.5, 1.6, 1.8, 2.0, 2.4, 3.0, and 3.2 Gbps.

LPDDR5 is 6.4 Gb/s per pin => (128 * 6.4) / 8 => 102.4 GB/s
For HBM2E to do the same => (102.4 * 8) / 1024 => 0.8 Gb/s per pin
Samsung's 110mm² HBM2E 16GB achieves 5 Gbps @ 1.1V, and SK Hynix's 110mm² HBM2E 16GB achieves 4 Gbps @ 1.14V.

If Samsung/SK Hynix ever come out with a low-power 512-bit stack, it only needs 1.6 Gb/s per pin to beat LPDDR5, and such low-power, high-bandwidth memory would probably draw even less power than HBM2/HBM2E does at its higher speeds.
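
Spelling that arithmetic out as a quick check (the 512-bit configuration is the hypothetical one from the paragraph above):

Code:
# Peak bandwidth = bus width (bits) x per-pin data rate (Gb/s) / 8 bits per byte.
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

print(bandwidth_gb_s(128, 6.4))    # LPDDR5, 128-bit bus: 102.4 GB/s
print(bandwidth_gb_s(1024, 0.8))   # one HBM2E stack at just 0.8 Gb/s/pin: 102.4 GB/s
print(bandwidth_gb_s(512, 1.6))    # hypothetical low-power 512-bit stack: 102.4 GB/s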
 
Last edited:

vigilant007

Junior Member
Dec 7, 2014
You can't argue with fools. And there's a particular class of fool that is utterly convinced, no matter what you say, that performance=GHz.

Yes, unfortunately. My former uncle, an engineer at Nokia and Alcatel, was insistent that performance = GHz. The arguments I had with him growing up were cute.

It got tedious once I had a much deeper understanding of chip architecture and his line was still very clear: “performance is based on clock speed. Period.”




 
  • Wow
Reactions: Thunder 57

vigilant007

Junior Member
Dec 7, 2014
I just don’t see a need to use HBM on anything less than what we consider Pro-level Macs. In a year or so, DDR5 DIMMs will be out, and some high-end phones are already using soldered DDR5 chips in their designs. The equivalent of dual-channel DDR4 bus width in a DDR5 design is currently specified at around 105GB/s of throughput at max JEDEC specs. For perspective, that’s about the memory performance of the RX 550 and 560 in PCIe add-in-card form. Plenty for CPUs with a high-performance iGPU.

Once you get to the rarefied air of the Pro-spec Macs, you’ve got the budget to be substantially more exotic. Case in point: do you think that Intel was just giving away its high-core-count Xeon processors, and that the quad+ channel RAM sticks were free? Since we’re discussing products that are over a year away, I don’t find it unreasonable to think that they might have configurations with four 16GB stacks of HBM2e, or even four 24GB stacks for 96GB of RAM.

But what does all that get you? How is Apple going to provide enough CPU AND GPU performance for a pro product with one chip? I don’t think they will. That’s so much heat and power to deal with. And where is all their compute performance coming from? Their integrated GPU is good, but not THAT good.

I think it depends on the form factor. Design considerations for the MacBook Air are VERY different from those for the MacBook Pro.

It’s still amazing to me that the iMac Pro essentially has a 1U server, running quietly, hanging off the back of a display.

I expect there will be some deviation from everything being an SoC when we look at, say, the Mac Pro. They could theoretically develop their own version of AMD’s Infinity Fabric Link to maximize the ability of separate SoCs to communicate. The MPX connector could lead the way to a separate card with multiple purpose-built chips, such as large blocks of Apple-designed GPU cores, which could have their own dedicated memory.

