Solved! ARM Apple High-End CPU - Intel replacement


Richie Rich

Senior member
Jul 28, 2019
470
229
76
The first rumor about an Intel replacement in Apple products is out:
  • ARM based high-end CPU
  • 8 cores, no SMT
  • IPC +30% over Cortex A77
  • desktop performance (Core i7/Ryzen R7) with much lower power consumption
  • introduction with the new-generation MacBook Air in mid-2020 (MacBook Pro and iMac also under consideration)
  • massive AI accelerator

Source Coreteks:
 
  • Like
Reactions: vspalanki
Solution
What an understatement :D And it looks like it doesn't want to die. Yet.


Yes, the A13 is competitive with Intel chips, but the emulation tax is about 2x. So given that A13 ~= Intel, for emulated x86 programs you'd get half the speed of an equivalent x86 machine. This is one of the reasons they haven't switched yet.

Another reason is that it would prevent the use of Windows on their machines, something some say is very important.

The level of ignorance in this thread would be shocking if it weren't depressing.
Let's state some basics:

(a) History. Apple has never let backward compatibility limit what they do. They are not Intel, they are not Windows. They don't sell perpetual compatibility as a feature. Christ, the big...

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I'm not even 100% sure that they care that much about Macs at this point at all.
They don't. In both the revenue and profit tables, Macs are far down in the also-ran section. They're just a necessity as development machines for the whole ecosystem, so they have to exist.
 

Tup3x

Senior member
Dec 31, 2016
944
925
136
Apple aside, I would first like to see how the Surface Pro X actually performs, preferably with native ARM apps versus the competition. If the results are good, things might look quite a bit different for future 2-in-1 and laptop hardware. I'd reckon it will still take two generations before ARM on Windows is truly ready (when it comes to performance), but I guess we'll just have to wait and see.
 

the2199

Junior Member
Oct 17, 2019
13
4
81
It's so funny: when I say Apple will not do it, or that even if it happens it will be a flop, and when I give valid arguments, I only get downvoted. So I am going to quote myself again.
For example, x264, the video encoder for H.264 (AVC), is fine-tuned with x86 assembly. You can run x264 on any other ISA and it will run just fine, but it will not be as fast as on x86. There is a difference between code merely running on a CPU and code running on a CPU while using it to its fullest. Take Google's new AV1 decoder, for example: it was created with ARM in mind, not x86. You can compile it for x86, but the performance is slow compared to other decoders; see how slow libgav1 is.
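To make that concrete, here is a rough sketch (a toy SAD kernel, not actual x264 code) of why the same inner loop ends up with separate per-ISA paths; the SSE2 and NEON intrinsics are real, everything else is just illustration:

```c
/* Toy 16-pixel sum-of-absolute-differences kernel -- the kind of inner
 * loop encoders like x264 hand-tune per ISA. Illustrative only. */
#include <stdint.h>
#include <stdio.h>

#if defined(__SSE2__)
#include <emmintrin.h>
/* x86 path: PSADBW does the byte-wise SAD in a single instruction. */
static unsigned sad16(const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i s  = _mm_sad_epu8(va, vb);            /* two partial sums */
    return _mm_cvtsi128_si32(s) + _mm_extract_epi16(s, 4);
}
#elif defined(__aarch64__)
#include <arm_neon.h>
/* ARM path: same work, different instructions (absolute difference
 * per byte, then a horizontal add across the vector). */
static unsigned sad16(const uint8_t *a, const uint8_t *b)
{
    uint8x16_t d = vabdq_u8(vld1q_u8(a), vld1q_u8(b));  /* |a - b| */
    return vaddlvq_u8(d);                               /* sum of bytes */
}
#else
/* Portable fallback: what an ISA nobody has tuned for ends up running. */
static unsigned sad16(const uint8_t *a, const uint8_t *b)
{
    unsigned s = 0;
    for (int i = 0; i < 16; i++)
        s += (a[i] > b[i]) ? (a[i] - b[i]) : (b[i] - a[i]);
    return s;
}
#endif

int main(void)
{
    uint8_t x[16] = {0}, y[16];
    for (int i = 0; i < 16; i++) y[i] = (uint8_t)i;
    printf("SAD = %u\n", sad16(x, y));   /* 0+1+...+15 = 120 */
    return 0;
}
```

The portable fallback at the bottom is roughly what an untuned ISA gets, which is exactly the gap I'm describing.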
And ARM support for UEFI or ACPI is laughable. ACPI is responsible for letting your OS know, for example, what the frequency of your RAM is, among other things.

ARM CPUs like to treat hardware like a black box.

The biggest issue is optimizing your program and its libraries for ARM. iOS is optimized for ARM because iOS has been using ARM since day one.

But macOS has been using x86 for ages. I know they were able to pull off the transition from PowerPC to x86, but that does not mean they will be able to pull it off again.

Don't get me wrong, Apple is able to compile and run macOS on ARM, but there is a wide difference between running something and running it well optimized. And even if they optimize the OS, that does not mean third-party apps on the store will get optimized as well.
 

naukkis

Senior member
Jun 5, 2002
701
569
136
I did say the appearance is roughly similar aside from the scaling of the frequency axis. This means that two different architectures may run at different frequencies at a given voltage, but the relative scaling stays the same. More strictly speaking, cycle time is roughly inversely proportional to (Vcc-Vth)/Vcc; the proportionality factor depends on the capacitance of the critical path of the particular architecture.

Those Apple A12 voltage tables are from a phone, so temperatures will also rise rapidly as voltage increases. The voltage/frequency curves would look quite different in a less thermally constrained environment.
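A quick numeric sketch of that relation, assuming the simple first-order delay model above; the threshold voltage and operating points are made-up illustrative numbers, not measured A12 values:

```c
/* First-order sketch of the relation described above:
 * cycle time ~ Vcc / (Vcc - Vth), hence f ~ (Vcc - Vth) / Vcc.
 * Vth and the operating points below are assumed illustrative values. */
#include <stdio.h>

static double rel_freq(double vcc, double vth)
{
    return (vcc - vth) / vcc;    /* proportional to achievable frequency */
}

int main(void)
{
    const double vth  = 0.35;    /* assumed threshold voltage, volts */
    const double v_lo = 0.75;    /* example phone-class operating point */
    const double v_hi = 1.10;    /* example pushed operating point */
    double gain = rel_freq(v_hi, vth) / rel_freq(v_lo, vth);
    printf("Raising Vcc from %.2f V to %.2f V buys roughly %.0f%% frequency,\n"
           "while dynamic power grows roughly with f*Vcc^2.\n",
           v_lo, v_hi, (gain - 1.0) * 100.0);
    return 0;
}
```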
 

naukkis

Senior member
Jun 5, 2002
701
569
136
So why hasn't it been done? Why is nobody thinking about doing it? I am so tired of armchair engineers that seemingly know better than the real engineers that produce these products.

How could a phone have a CPU as fast as the fastest desktop CPU? That's an insane situation; most people still don't understand it, because it should not be possible at all.
 
  • Like
Reactions: wintercharm

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
Easy: because Intel and AMD used to make the best CPUs at an affordable price. And legacy, of course, as others wrote.

Like it or not, x86 as an instruction set sucks. Have you done assembly language programming on it, and on another assembly language to compare? It's just an abomination; it has gotten better with 32-bit and x86-64, but its 8080 roots still show.

But Intel's and AMD's implementations of x86 are good, and this is what matters to the end user (plus legacy, obviously).

Never thought about it like that. Yes, x86 assembly sucks. That's probably why they teach MIPS or ARM assembly at most universities. I did MIPS, but they spent some time on x86 as well. So the assembly may suck, but x86 is still good.

How could a phone have a CPU as fast as the fastest desktop CPU? That's an insane situation; most people still don't understand it, because it should not be possible at all.

Because it's not. Let's see that phone transcode some h264.
 

name99

Senior member
Sep 11, 2010
404
303
136
Probably. But wouldn't that be dedicated hardware, like Quicksync? I bet it is slow as heck on the actual CPU of the SoC.

Why would you want to do it on the CPU?
THAT is the basic problem with your analysis: you're living in the past. Most of the examples of highly tuned x86 code you want to give don't have ARM counterparts because no one in ARM/Apple-land cares.
Video/audio/image encoding is done with dedicated hardware. Likewise for crypto. Likewise for some types of compression (it remains unclear just what compression Apple still does on the CPU, doubtless with tuned ARM assembly, versus what they have already moved to accelerators).

So what's left?
There are things like highly tuned interpreters. If you care about running Lua or Python on ARMv8 it MIGHT be slower. Obviously JS is just fine. Anyone have any real data?

There is also highly tuned numerical code. When I compare Mathematica on an iMac Pro to Wolfram Player on an A12X iPad Pro, for the most part they are comparable in performance, but there are clear areas where Wolfram Player substantially lags.
Most of these differences appear to be policy choices by Wolfram about how performant to make Wolfram Player so that it doesn't compete too much with Mathematica (e.g. the parallelization/vectorization support is abysmal).
But there is clearly a non-policy issue with the bignum support, which is just terrible; that's probably the one (and only) real-world case I am aware of where highly tuned x86 code has not (yet...) been rewritten as highly tuned ARMv8 and is probably just using straight C.
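For a sense of what "straight C" means here, below is a toy multi-precision add (not Wolfram's code, just an illustration): the portable version has to reconstruct the carry with comparisons, which is exactly what hand-tuned assembly (ADC on x86, ADCS on ARMv8) gets for free from the flags register.

```c
/* Toy multi-precision add: portable C reconstructs the carry each limb,
 * which hand-tuned assembly keeps in the carry flag. Illustrative only. */
#include <stdint.h>
#include <stdio.h>

/* r = a + b over n 64-bit limbs, little-endian; returns the final carry. */
static uint64_t bignum_add(uint64_t *r, const uint64_t *a,
                           const uint64_t *b, int n)
{
    uint64_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t  = a[i] + carry;
        uint64_t c1 = t < carry;        /* carry out of a[i] + carry */
        r[i] = t + b[i];
        uint64_t c2 = r[i] < t;         /* carry out of t + b[i]     */
        carry = c1 | c2;
    }
    return carry;
}

int main(void)
{
    uint64_t a[2] = { UINT64_MAX, 1 }, b[2] = { 1, 0 }, r[2];
    uint64_t c = bignum_add(r, a, b, 2);
    printf("r = {%llu, %llu}, carry = %llu\n",
           (unsigned long long)r[0], (unsigned long long)r[1],
           (unsigned long long)c);      /* expect {0, 2}, carry 0 */
    return 0;
}
```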
 
  • Like
Reactions: Nothingness

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Because it's not. Let's see that phone transcode some h264.
Relative to older desktop systems, it is.

It should be possible, even with just the CPU. I was playing with x264 on a dual-core Athlon X2 back in 2006, so a quad-core A76 should do pretty well if the encoder has enough NEON optimisations in the codebase.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Probably. But wouldn't that be dedicated hardware, like Quicksync? I bet it is slow as heck on the actual CPU of the SoC.
An ASIC is a completely different beast, designed to run at relatively low clocks (sub-GHz?) compared to CPU cores, because the circuit pretty much replicates the software codec's functions directly in silicon without extraneous overhead.

Their main benefit is in mobile systems that are highly power constrained; in desktops they just take the load off the CPU, allowing it to be used for other things at the same time (like playing games while streaming encoded video).

Their main disadvantage as a fixed circuit is a lack of tunability, unlike x264/x265/libaom/SVT-AV1/rav1e, which can be endlessly updated and improved over time: more quality, more SIMD, more cores.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Why would you want to do it on the CPU?

From what I understand, software encoding gives the highest quality per bitrate at smaller file sizes compared to hardware encoding, and this distinction isn't likely to go away as encoders become more computationally complex.

I posted a thread a few weeks ago about Intel's SVT codecs, which offer dramatic performance gains compared to other solutions. If Intel can deliver performance gains like this whilst maintaining quality, the case for hardware encoders is not going to look good in the future.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
If Intel can have performance gains like this, whilst maintaining quality, then the case for hardware encoders is not going to look good in the future.
The case is still there for the game-streaming niche, where an ASIC takes the load largely off the CPU. It's also needed for video encoding on mobiles and on video DSLRs/cine cameras, though the bitrate target tends to be much higher for the latter, so quality tuning is less necessary.

As for the SVT codecs, their main benefit is to systems with a large number of cores and plenty of memory, given how quality degrades in competing codecs as you scale thread count.

SVT seemingly has zero quality decrease as you scale thread count, though it does seem to have a performance scaling ceiling: in Phoronix's testing, 2x Epyc 7742 hardly scores better than a single Epyc 7742.

It's my hope that, now that SVT is open, we will see those scaling benefits make it into other codecs like x264 and rav1e as the proverbial magic sauce is unravelled.
 

beginner99

Diamond Member
Jun 2, 2009
5,208
1,580
136
For laptops, I fail to see why it's not possible to design one laptop chip and fuse off parts/adjust clocks to differentiate performance between different product lines. Mac Pros (and to a lesser extent iMacs) are a tougher nut to crack with this approach, but certainly not impossible; given the die sizes involved, it's almost certain they'd have lots of chips not up to snuff for top clocks/full functionality.

For laptops it can work, yes, maybe even the iMac with some compromise. But the Mac Pro is not doable. You can get Mac Pros with dual Xeons, and with the next Mac Pro they could go with Zen 2 at up to 64 cores.

When one Lightning core in an A13 SoC draws 4-5 W to get similar single-threaded performance as a Zen 2 core in a 3900X drawing ~18 W, with no process advantage, that's a heck of an incentive to ditch x86.

Proof needed. Single-threaded in what, Geekbench? How do you get the power numbers per core? All the non-core stuff, of which desktop CPUs have a lot more, uses power too. The A13 doesn't have PCIe, SATA or Infinity Fabric power use, and we have seen how much power that stuff consumes.
The only thing I can say is that Apple is known to offer subpar connectivity for the price and makes additional profit on that by selling dongles. That might work for a laptop, where fanboys still buy it even though it only has one USB-C port, but on a Mac Pro? Nope.


How could a phone have a CPU as fast as the fastest desktop CPU? That's an insane situation; most people still don't understand it, because it should not be possible at all.

Burst performance is one thing; sustained is completely different. Why can the same core that sits in a 5 W laptop also sit in a full-blown server at >200 W? Sustained performance. The jump from ULV to server CPUs in terms of wattage is far bigger than from an Axx to laptop CPUs, so your point isn't worth much. The main difference is simply power management, in other words sustained performance.

And then there's the fact that Apple designs one SoC that goes only into mobile devices, in contrast to Intel's cores, which go into everything from laptops to desktops to servers. The cores in 5 W ULV laptop CPUs are identical to the cores in a 9900K using close to 150 W. What's the difference? Cooling, or put another way, sustained performance.
Not to mention the x86 backwards compatibility which Intel/AMD have to maintain.

Once Apple actually needs to match sustained performance and not just IPC, the picture changes dramatically.
 

naukkis

Senior member
Jun 5, 2002
701
569
136
Because it's not. Let's see that phone transcode some h264.

And desktop CPUs will lose to a server farm. I thought we were comparing single-core performance, for which the best available measuring tool, also used by CPU manufacturers, is SPEC. Those AnandTech SPEC tests show that the A13 and the 9900K have equal SPECint scores. Equal SPEC scores from a phone and the best-performing desktop CPU is a pretty funny thing to happen; most people still refuse to believe it's actually happening. And it should not happen: the only explanation for such a weird result is that Apple has an extremely good CPU arch in a phone while, at the same time, the x86 manufacturers have piss-poor archs on the desktop.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
That's getting better, but still no Xcode and no runtime-generated code. So it's useless as a dev machine, at least for me.
Google seems to be on a similar track with Chrome OS; I think they ported Android Studio to work on it?

If so, I would be surprised if Apple is not pursuing similar efforts with iPadOS.