Apple adding ARM coprocessor to future Macs

IntelUser2000 · Feb 7, 2017

NTMBK said:
But Intel wants to keep the PCH on their older process to protect their profit margins, and Apple get an inferior product as a result.

Perhaps this is a sign of complacency. Some risks are involved in doing anything. Netburst, while very risky in terms of circuit design, brought very innovative concepts. Eventually I believe this led to their success with Core chips. Then they went completely opposite. I wonder if their 6-series chipset bug has to do with it as well? Imagine having to recall CPUs, rather than just chipsets. Fixing that would be a pain. Maintaining the right balance would be the key to success.

Actually they are on Pentium tier levels. The problem is that is not scalling well.

I think Apple in a few years will be able to do just that.

ElFenix · Feb 7, 2017

at sub single watt power envelopes, arm has an advantage. once you have double digit power envelopes, arm has no real advantage over intel from an ISA/architecture standpoint, and intel's access to its much superior manufacturing takes over.

now, it may be, that because a lot of software and use cases just require "good enough" performance, double digit power consumption isn't needed. so, maybe apple goes ARM for everything (as apple has other reasons to do so) anyway.

or maybe apple just gets intel to fix its PCH power consumpton as well as build in a 5th, specific, low power core just for these not-sleep operations.

scannall · Feb 7, 2017

ultimatebob said:
Yeah, but there are no existing ARM processors that are as nearly as fast as a Core i3 processor let alone a Core i7. By the time Apple jumped ship from IBM, Intel's processors were twice as fast as the best G5's available.

Unless ARM has some amazing skunkworks project for a revolutionary desktop processor, I don't think that Intel has much to worry about.

But no i3 gets more performance/per watt than Apple's A10 either. Different markets, different goals.

Lodix · Feb 8, 2017

ElFenix said:
at sub single watt power envelopes, arm has an advantage. once you have double digit power envelopes, arm has no real advantage over intel from an ISA/architecture standpoint, and intel's access to its much superior manufacturing takes over.

now, it may be, that because a lot of software and use cases just require "good enough" performance, double digit power consumption isn't needed. so, maybe apple goes ARM for everything (as apple has other reasons to do so) anyway.

or maybe apple just gets intel to fix its PCH power consumpton as well as build in a 5th, specific, low power core just for these not-sleep operations.

Have you seen any ARM CPU designed for double digit TDP to be able to talk ? I think you don't. And if there is any advantage in Intel process nodd it is very little and starting this year is over.

ElFenix · Feb 8, 2017

Lodix said:
Have you seen any ARM CPU designed for double digit TDP to be able to talk ? I think you don't. And if there is any advantage in Intel process nodd it is very little and starting this year is over.

No, but we've seen other desktop and workstation class RISC and MIPS type processors, and Intel surpassed and then demolished them.

There's no magic to processor design. It's all engineering and trade-offs. Intel can see what arm (or apple or Qualcomm) is doing and vice versa. The only place where arm or any other RISC machine has any inherent advantage over Intel's designs is the simplified decoder.

Once you're moving around desktop quantities of data and doing desktop type things with it, moving all that data becomes the main power consumer.

I'll believe that tsmc's third spin on "16 nm" is is anywhere close to Intel when I see it.

IntelUser2000 · Feb 8, 2017

ElFenix said:
No, but we've seen other desktop and workstation class RISC and MIPS type processors, and Intel surpassed and then demolished them.

There's no magic to processor design. It's all engineering and trade-offs. Intel can see what arm (or apple or Qualcomm) is doing and vice versa. The only place where arm or any other RISC machine has any inherent advantage over Intel's designs is the simplified decoder.

That's very true.

But. We forget one factor that's important as technical competence. That's managing people and having realistic expectations.

What Intel's fantastic quarterly money-making hides may be the internal problems that will surface over a longer period of time. Having the best engineers, best equipment, best testing methodologies won't matter if the people working at it is sapped of their desire to do their best.

I would assume Apple is naturally the top company to go to if you are a competent chip engineer. And you can say whatever about Apple's practices elsewhere. But chip design wise they've been improving immensely over past few years. And their design team delivers according to the product schedule launch.

Apple A6 back in 2012 is what made people pay attention. Then they did it again with A7 a year later. The A7 was a big shock to so many people. Like Anandtech said, Bay Trail came out with much fanfare, but the lead lasted merely a week. The chip Intel was promising to be a revolutionary update to Atom with a chance to reverse not only their mobile fortunes but possibly their PC-less future was beaten.

Then people said A8 had "only" a 25% improvement because they've reached the peak, like Intel did with Core in 2006 after being 2x fast as Netburst. Then A9 brings another two-fold improvement. Let's not talk about how all other vendors took a year or longer after PowerVR announce their arch to make a product out of it, but Apple does is mere months later. Or that now they have a competent GPU team that puts Intel Iris chips to shame.

Apple chips have a much brighter future than Intel chips. Engineers must see this.

Nothingness · Feb 9, 2017

IntelUser2000 said:
Apple chips have a much brighter future than Intel chips. Engineers must see this.

As an engineer, I tend to agree. What Apple has achieved is incredible. But Intel also have very bright engineers and micro-architects, and a lot of money.

The question of Apple getting to the same level of single thread performance as Intel is difficult to answer. My guess (that I already gave in another thread) is that they won't be able to achieve parity before a few more years, and that's not even sure.

lopri · Feb 11, 2017

big.LITTLE has won. Despite initial skepticism and resistance from the purists, big.LITTLE has won over the monolithic core design. For now. Just about all ARM SOCs manufactured today are big.LITTLE. Apple has moved their phone SOC to big.LITTLE design, and now is rumored to adopt a big.LITTLE for their productivity devices (ARM + X86 hybrid no less). AMD has already released a hybrid solution in its Pro line-up which debuted last year. Heck I have even heard that the PlayStation 4 Pro's south bridge is a full-fledged ARM SOC that replaced a traditional South Bridge with a programmable logic, which takes over the x86 CPU in low power mode. (This sounds very similar to what Apple is rumored to do with their next laptop design) Microsoft has been trying to run Windows on ARM, and my gut feeling is that they will also adopt an Apple-like approach at first, i.e. big.LITTLE.

A lot has been said about the alleged inferiority of big.LITTLE design: that it is not an elegant solution; that it is a temporary patchwork with inevitable overheads; that it does not provide any perf/watt benefit in comparison to some ideally-conceived monolithic core solutions; that it is a marketing-driven tactic to inflate core counts. There may be some truth in those claims. But what is also true is that an imaginary SOC is imaginary. What good does a theoretically perfect SOC do if it cannot be manufactured? With all of the leading OEMs adopting big.LITTLE, the argument against it has no leg to stand, it seems.

Eug · Feb 11, 2017

I wonder what this means for Windows compatibility in Macs. Maybe it will be like their Touch Bar implementation. The ARM chip and Touch Bar go into a default mode when it gets no instructions from the OS. Or maybe in this case it just won't work at all.

Andrei. · Feb 11, 2017

lopri said:
A lot has been said about the alleged inferiority of big.LITTLE design: that it is not an elegant solution; that it is a temporary patchwork with inevitable overheads; that it does not provide any perf/watt benefit in comparison to some ideally-conceived monolithic core solutions;

People who denied all of those things, especially denying the fact that a smaller core is more efficient than a bigger one which is a reality as simple as it gets given the laws of physics, were, and are complete idiots with zero knowledge of even basic EE and have no business in commenting or analysing anything in this industry and best to just keep their opinions for themselves. bL is an elegant solution because of how stupidly simple it is. There couldn't have been a more bigger "fuck you, you're wrong" to efficient monolithic core fanboys than Apple adopting a bL design. And no you don't even have to claim "for now", as the physics won't change in the future either, it's something that's here to stay.

bjt2 · Feb 11, 2017

lopri said:
big.LITTLE has won. Despite initial skepticism and resistance from the purists, big.LITTLE has won over the monolithic core design. For now. Just about all ARM SOCs manufactured today are big.LITTLE. Apple has moved their phone SOC to big.LITTLE design, and now is rumored to adopt a big.LITTLE for their productivity devices (ARM + X86 hybrid no less). AMD has already released a hybrid solution in its Pro line-up which debuted last year. Heck I have even heard that the PlayStation 4 Pro's south bridge is a full-fledged ARM SOC that replaced a traditional South Bridge with a programmable logic, which takes over the x86 CPU in low power mode. (This sounds very similar to what Apple is rumored to do with their next laptop design) Microsoft has been trying to run Windows on ARM, and my gut feeling is that they will also adopt an Apple-like approach at first, i.e. big.LITTLE.

A lot has been said about the alleged inferiority of big.LITTLE design: that it is not an elegant solution; that it is a temporary patchwork with inevitable overheads; that it does not provide any perf/watt benefit in comparison to some ideally-conceived monolithic core solutions; that it is a marketing-driven tactic to inflate core counts. There may be some truth in those claims. But what is also true is that an imaginary SOC is imaginary. What good does a theoretically perfect SOC do if it cannot be manufactured? With all of the leading OEMs adopting big.LITTLE, the argument against it has no leg to stand, it seems.

I have read a PDF, under paywall, on IEEExplorer that talks of Carrizo and Bristolridge AVFS (bested in Ryzen).
Well, even on those processors, the AVFS is capable of lower CPU Vcore, under the current Vdd, core by core, down to 0.5V. There are very low power state, in addition to lower clocked p-states, in which the CPU can be configured with OOO, branch prediction and speculation off: just like a very low power ARM or old atom chip; a simple in-order core. And if I remember well, minimum BR clock is 400MHz.
These low power states, in the paper, are said to be the lowest consumption on light threads, to allow other cores to clock higher. But this can also be an automatic, dynamic and integrated big.LITTLE configuration: if you need all the power, all cores will be in high power mode. If you need low power and there are few light threads, you can put all cores in "little" mode and also intermediate cases.
Think of it as a dynamic big.little configuration, instead of wasting space for more cores...

Andrei. · Feb 11, 2017

bjt2 said:
I have read a PDF, under paywall, on IEEExplorer that talks of Carrizo and Bristolridge AVFS (bested in Ryzen).
Well, even on those processors, the AVFS is capable of lower CPU Vcore, under the current Vdd, core by core, down to 0.5V. There are very low power state, in addition to lower clocked p-states, in which the CPU can be configured with OOO, branch prediction and speculation off: just like a very low power ARM or old atom chip; a simple in-order core. And if I remember well, minimum BR clock is 400MHz.
These low power states, in the paper, are said to be the lowest consumption on light threads, to allow other cores to clock higher. But this can also be an automatic, dynamic and integrated big.LITTLE configuration: if you need all the power, all cores will be in high power mode. If you need low power and there are few light threads, you can put all cores in "little" mode and also intermediate cases.
Think of it as a dynamic big.little configuration, instead of wasting space for more cores...

You're confusing low power with high efficiency. It doesn't matter how much of a core you turn off to save power as you can't make it more efficient than having a separate core with a power optimised synthetisation, lower leakage / higher Vt transistors. You simply can't have a circuit (as in a electrical circuit at the lowest possible level) be both high performance as well as low power. A microarchtectural symmetric design like the Snapdragon 820 is the best example of a purely implementation driven big.LITTLE approach where the lower power cores are outright more efficient because their synthesis is designed to be like that. Also I doubt the increased die size from a small A53 or something like Zephyr is of any major concern to any vendor.

bjt2 · Feb 11, 2017

Andrei. said:
You're confusing low power with high efficiency. It doesn't matter how much of a core you turn off to save power as you can't make it more efficient than having a separate core with a power optimised synthetisation, lower leakage / higher Vt transistors. You simply can't have a circuit (as in a electrical circuit at the lowest possible level) be both high performance as well as low power. A microarchtectural symmetric design like the Snapdragon 820 is the best example of a purely implementation driven big.LITTLE approach where the lower power cores are outright more efficient because their synthesis is designed to be like that. Also I doubt the increased die size from a small A53 or something like Zephyr is of any major concern to any vendor.

I know that the efficiency of a true in order core, with HVT transistors can't be obtained, but the 14nm FF is a very low power process, so an approximation, that does not draw much power and does not consume further area for less used cores, it's better than nothing.
AFAIK there is not another design that allows to turn off power hungry features dynamically on each core...
Speculation and branch prediction are power hungry (and a waste if your predictions are wrong) features and OOO probabily is a non linear feature: you draw X% more to gain much less in performance. Disabling all these feature let save much power. So it's better than nothing...

Andrei. · Feb 12, 2017

bjt2 said:
AFAIK there is not another design that allows to turn off power hungry features dynamically on each core...

I didn't read the article you're referring to but most likely it doesn't entail actual power gating of those feature but rather clock gating them as the former is technically unfeasible to do dynamically due to the latencies and rush currents required. Automatic clock gating of unit blocks is extremely common and all ARM cores no matter from which vendor already do it extensively. Something such as the A73 have around 250 microarchitecturally provided ACG's to various core blocks to turn off clocks. I'm sure Apple and other custom core vendors also extensively use them. Again, these features power saving features not efficiency gaining features. The efficiency difference between a big and little core is in the order of 2-3x, no matter what you do to a big core you'll never achieve such figures by "clever design".

Those power efficiency curves are just a intrinsic characteristic of the silicon that can't be changed during runtime (FD-SOI body biasing unfortunately never materialised).

bjt2 · Feb 12, 2017

Andrei. said:
I didn't read the article you're referring to but most likely it doesn't entail actual power gating of those feature but rather clock gating them as the former is technically unfeasible to do dynamically due to the latencies and rush currents required. Automatic clock gating of unit blocks is extremely common and all ARM cores no matter from which vendor already do it extensively. Something such as the A73 have around 250 microarchitecturally provided ACG's to various core blocks to turn off clocks. I'm sure Apple and other custom core vendors also extensively use them. Again, these features power saving features not efficiency gaining features. The efficiency difference between a big and little core is in the order of 2-3x, no matter what you do to a big core you'll never achieve such figures by "clever design".

Those power efficiency curves are just a intrinsic characteristic of the silicon that can't be changed during runtime (FD-SOI body biasing unfortunately never materialised).

I know that. What i was saying is that there is not a superscalar ooo design that can be run in in-order mode. All speculation, branch prediction, OoO are features that increase the IPC, but cost much more of the performance gain. I don't know the actual number, but it's probable that e.g. it's +20% performance for a +60% power. If you have light threads, you can give up on the +20% performance, saving a +60% power increase...
Clearly, even if the clock can be reduced to 550Mhz with 0,5V and very low consumption (e.g. <300mW/core), with a little core can be made better, but at ocst of area and complexity. And the gain can be of few hundreds mW...

Andrei. · Feb 12, 2017

bjt2 said:
I know that. What i was saying is that there is not a superscalar ooo design that can be run in in-order mode. All speculation, branch prediction, OoO are features that increase the IPC, but cost much more of the performance gain. I don't know the actual number, but it's probable that e.g. it's +20% performance for a +60% power. If you have light threads, you can give up on the +20% performance, saving a +60% power increase...
Clearly, even if the clock can be reduced to 550Mhz with 0,5V and very low consumption (e.g. <300mW/core), with a little core can be made better, but at ocst of area and complexity. And the gain can be of few hundreds mW...

When little cores use around 15-30mW at their lowest execution states (not idle) and total platform power is ~300mW, you sure want those "few hundreds mW". Is this the paper you're referring to? I'll see if I can get access.

bjt2 · Feb 12, 2017

Andrei. said:
When little cores use around 15-30mW at their lowest execution states (not idle) and total platform power is ~300mW, you sure want those "few hundreds mW". Is this the paper you're referring to? I'll see if I can get access.

Mmhhh. This one sounds new (by the number)... I'll download and read...

I described in italian here: http://www.hwupgrade.it/forum/showpost.php?p=44459489&postcount=15611

The second link is on carrizo, the first is on BR. THe third is on the clock stretching on vdroop, older...

EDIT: the one you linked seems to be the full paper. I read a short presentation for the ISSCC of 2 pages... This that you linked is the full fledged paper... Thanks!

THere are author photos at the end and the last seems to be the one that is hugging the AMD Ryzen system in a photo going around in these hours...

2is · Feb 12, 2017

Rngwn said:
This may or may not backfire for Apple as the owners of ARM macs will not be able to use bootcamp to run windows, at least until Microsoft adds ARM processor support for Windows. Though, Microsoft has begin working on it.

They are adding arm as a co-processor. It will still have x86 at its core. At least for now

jpiniero · Nov 18, 2017

https://9to5mac.com/2017/11/18/imac-pro-a10-fusion-chip/

New rumor... the iMac Pro will include an A10 Fusion chip. And it's very much integrated, to the point that it handles the boot process. So we'll have to see if a lot of the functionality mentioned in the earlier article is coming.

At the very least I imagine this is the beginning of the end of Hackintoshes since a lot of the OS is going to be running ARM even if Apple never fully leaves Intel.

Jan Olšan · Nov 18, 2017

jpiniero said:
https://9to5mac.com/2017/11/18/imac-pro-a10-fusion-chip/

New rumor... the iMac Pro will include an A10 Fusion chip. And it's very much integrated, to the point that it handles the boot process.

Hmm, so basically they use it as their own/controlled IME?

beginner99 · Nov 19, 2017

NTMBK said:
But Intel wants to keep the PCH on their older process to protect their profit margins, and Apple get an inferior product as a result.

And that is why EMIB was born. Best of both worlds.

AMDisTheBEST · Nov 19, 2017

Apple is just being Apple.

Ratman6161 · Nov 19, 2017

.vodka said:
Well, they've already got a kickass ARM core that considering the prices they ask for their products, it's not that much of a problem to get it in there to do some tasks, initially.

Next step, moving the entire platform to ARM (MS already has full fat Windows 10 running on ARM...) Apple did it before with Rosetta in the PPC -> x86 transition, they can do it again.

"kickass" is not really what is required here. This would be about very low power usage. Still seems like a solution in search of a problem sort of deal to me.

scannall · Nov 19, 2017

Ratman6161 said:
"kickass" is not really what is required here. This would be about very low power usage. Still seems like a solution in search of a problem sort of deal to me.

For the intended purpose anyway, driving the touch bar and listening for Hello Siri their ARM core in the Apple Watch would be better suited. Very low power.

Eug · Nov 19, 2017

jpiniero said:
https://9to5mac.com/2017/11/18/imac-pro-a10-fusion-chip/

New rumor... the iMac Pro will include an A10 Fusion chip. And it's very much integrated, to the point that it handles the boot process. So we'll have to see if a lot of the functionality mentioned in the earlier article is coming.

At the very least I imagine this is the beginning of the end of Hackintoshes since a lot of the OS is going to be running ARM even if Apple never fully leaves Intel.

This solves the issue with Intel's iMac Pro appropriate multi-core chips not having an integrated GPU. We were wondering how Apple would implement hardware 10-bit HEVC h.265 decode, and hardware 8-bit HEVC h.265 encode, considering Apple has specifically decided NOT to support these features with AMD GPUs.

A10 Fusion is a perfect fit to address this, since it supports both and is a low power part they produce themselves, and it means they don't have to resort to implementing a third platform for hardware HEVC support.

HEVC-encode-decode-capture-support-mac-and-ios-1024x880.jpg

Apple adding ARM coprocessor to future Macs

Elite Member

Elite Member

Golden Member

Senior member

Elite Member

Elite Member

Platinum Member

Elite Member

Lifer

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Diamond Member

Lifer

Senior member

Diamond Member

Senior member

Senior member

Golden Member

Lifer