Apple CPUs "just margins off" desktop CPUs - Anandtech

IntelUser2000 · Oct 14, 2018

These comparisons don't really matter. If Apple makes an A13-A14 Macbook/iMac, they could go all integrated route and have no expansion slots. The advantages of that is lower power, lower cost.

Intel and Apple's focus is different enough that we can't compare them Apples-to-Apples. That's the real problem we have. We are trying to look at the real world through a benchmark, where everything is equal. There's no such thing. Nothing exists in a vacuum. You'll get very flawed results when you try to do that.

Intel is trying to be the best at the horizontally integrated model where it works with multiple vendors. Apple is trying to be the most vertically integrated company where they can control as much of the system as possible.

There is obviously an advantage to the total vertically integrated model Apple is trying to pursue. Often, Intel has to wait for other vendors to do their stuff before they release their chips, or to fix a bug. For example remember Speedshift? It needed a compatible OS. For Intel, they need to work with Microsoft, the open source community at various Linux distros, and Apple. If you own the operating system, and the device, you can schedule everything together.

Thala · Oct 14, 2018

Nothingness said:
iOS has some restrictions such as not allowing run time code generation. This will limit the benchmarks you can run.

So what? I was replying to "...resistant to independent testing at the hardware level" - which is clearly not the case.

Thala · Oct 14, 2018

IntelUser2000 said:
These comparisons don't really matter. If Apple makes an A13-A14 Macbook/iMac, they could go all integrated route and have no expansion slots. The advantages of that is lower power, lower cost.

It is totally irrelevant when comparing the efficiency of core architectures if they are vertically or horizontally integrated into the system - or if they have any board level expansion slots.

DrMrLordX · Oct 15, 2018

Nothingness said:
iOS has some restrictions such as not allowing run time code generation. This will limit the benchmarks you can run.

Well, it goes deeper than that . . .

Thala said:
So what? I was replying to "...resistant to independent testing at the hardware level" - which is clearly not the case.

Can you run the Phoronix test suite on iOS? If you can then great.

Right now, if I want to compare (for example) a Kirin 970 vs. a Goldmont+ or something, I put Linux on both systems, and I start picking out benches from the Phoronix suite. I'm not sure how many of them will actually run on ARM64 Linux, but from what I have seen of Phoronix's testing of server ARM CPUs, it is possible.

Otherwise I am looking at let's say, some closed-source OS like iOS running stuff like Antutu and Geekbench, and that's it.

With the A12 being out today and the A12x;Kirin 980; and Snapdragon 8180 coming in the future, it's time for x86 fanatics to start looking at what ARM brings to the table. The ARMy is coming. But in order to take them seriously, we have to establish some common ground for testing. Geekbench just doesn't cut it.

Nothingness · Oct 15, 2018

DrMrLordX said:
Well, it goes deeper than that . . .

Can you run the Phoronix test suite on iOS? If you can then great.

Sorry but that's just a non-blocking infrastructure issue. Nothing prevents you from running the Phoronix benchmarks by hand... except for cases such as the one I mentioned where the OS prevents you from doing run time code generation.

Right now, if I want to compare (for example) a Kirin 970 vs. a Goldmont+ or something, I put Linux on both systems, and I start picking out benches from the Phoronix suite. I'm not sure how many of them will actually run on ARM64 Linux, but from what I have seen of Phoronix's testing of server ARM CPUs, it is possible.

Yes, it is possible. I've run it on Jetson TX2. And, as others, found out that Michael has not properly configured the board.

Otherwise I am looking at let's say, some closed-source OS like iOS running stuff like Antutu and Geekbench, and that's it.

In some cases a properly tuned binary (such as Geekbench) is better than an open source package that has only been tuned for x86.

With the A12 being out today and the A12x;Kirin 980; and Snapdragon 8180 coming in the future, it's time for x86 fanatics to start looking at what ARM brings to the table. The ARMy is coming. But in order to take them seriously, we have to establish some common ground for testing. Geekbench just doesn't cut it.

Good luck finding a board with proper Linux (as in Linux distro support) support. Most of them are Android only with hacks to run Linux. Even HiKey 970.

No matter how much I like ARM, the situation is depressing.

Hitman928 · Oct 15, 2018

Nothingness said:
I disagree: if these IO are not used, a properly designed SoC won't consume more just because the handling of these IO are on-chip. And if you start using these IO then you're not measuring CPU efficiency alone any more.

Yes, these features I'm sure could be power gated to a great extent, but in the context of the original question it seems like a trivial answer to me to suggest adding a bunch of functionality won't change anything because we'll just turn it off anyway. Pretty sure the question was asked with the assumption that said functionality would actually be used.

As far as efficiency goes, that was never mentioned in the original discussion but I addressed it at some point in one of my posts (or rather that I don't think we yet have the proper data or scope to really discuss it in terms of a desktop environment).

Hitman928 · Oct 15, 2018

CZroe said:
It sure sounded like you were saying it right here:

hitman928 said:

These are SOCs where the functionality mentioned is integrated into the CPU itself. So yes, power consumption is effected.

Click to expand...

I read it as “Intel CPUs are SOCs where the functionality mentioned is integrated into the CPU itself.”

Specifically intel CPUs were never mentioned in the original discussion. It was about x86 (which is a broad category) versus A12 specifically. Hence the pronoun "these" referring to the original question between A12 and x86 CPUs. I can understand that it could cause confusion given the chain of comments but either way, the actual point still stands.

because, obviously, you are saying that it has to be added to an Apple chip in the next line, which I read as “So yes, power consumption is affected when you add these to an Apple CPU.”

Correct, going back to the original question, it was about adding IO that current Apple chips do not support, so yes, adding that support will increase power needs. If the question was about adding deep learning hardware into x86 CPUs, my answer would have been the same for x86 CPUs as well.

Sure sounded like you were trying to create a distinction without a difference to apply “SOC” to Intel and imply that Apple will have to add Jaycee features to become an SOC themselves. I guess you mean to say that it has to add such features to become a comparable SOC.

Correct, but only comprable in context of the original question.

I look at it the exact. . . snip

This is all well and good and a fine discussion to have, but is completely outside the scope of the original question and answer which were both very basic and specific. I never mentioned anything or try to argue anything outside of the simple question and answer of adding certain IO functionality to A12 and its effect. You're the only one who started arguing all sorts of other points and insulting other people for not agreeing with your one sided argument.

DrMrLordX · Oct 15, 2018

Nothingness said:
Sorry but that's just a non-blocking infrastructure issue. Nothing prevents you from running the Phoronix benchmarks by hand... except for cases such as the one I mentioned where the OS prevents you from doing run time code generation.

So you can actually download and compile those benches on iOS, and then run them? I'm sure it might be easier to do it on Android. See below.

Yes, it is possible. I've run it on Jetson TX2. And, as others, found out that Michael has not properly configured the board.

Bad benches are still bad. That has been, is, and will be a problem. Hopefully people can point out the mistakes he (and others) have made in the past to avoid them moving forward.

In some cases a properly tuned binary (such as Geekbench) is better than an open source package that has only been tuned for x86.

That's another stumbling block to good comparative testing between x86 systems and ARM chips. Most of the ARM binary benches are aimed at showing how well the chip will do in a mobile platform, such as a cell phone. x86 testing is completely different. The usual spate of binaries run on mobile chips does very little for me as an x86 enthusiast, and presumably benchmark suites preferred by x86 desktop nerds do very little to excite cellphone users. Who cares how well an A12 does running Cinebench R15 when practically nobody is going to be running a niche/outmoded 3D rendering suite on a cellphone? Or Blender or similar.

Additionally, if you can run Blender on an ARM system, is it going to support Neon etc.? By default, probably not. There are Blender versions with SSE and AVX support. But Neon? At least with an open source benchmark, someone can add that in, given the time and expertise. That puts a heavy burden on the reviewer, though.

Good luck finding a board with proper Linux (as in Linux distro support) support. Most of them are Android only with hacks to run Linux. Even HiKey 970.

How do the hacks affect system performance, though?

No matter how much I like ARM, the situation is depressing.

It won't change until someone - presumably ARM hardware vendors - decide that proper desktop OS support is desirable for their product. Right now, I think Qualcomm, Apple, and Huawei are content to force people to use what they can from Android and iOS and to hell with anything else. Maybe once the Snapdragon 8180 is out we'll see some more tuned binaries for Win10. Linux support? We'll see.

Thala · Oct 15, 2018

DrMrLordX said:
That's another stumbling block to good comparative testing between x86 systems and ARM chips. Most of the ARM binary benches are aimed at showing how well the chip will do in a mobile platform, such as a cell phone. x86 testing is completely different. The usual spate of binaries run on mobile chips does very little for me as an x86 enthusiast, and presumably benchmark suites preferred by x86 desktop nerds do very little to excite cellphone users. Who cares how well an A12 does running Cinebench R15 when practically nobody is going to be running a niche/outmoded 3D rendering suite on a cellphone? Or Blender or similar.

Additionally, if you can run Blender on an ARM system, is it going to support Neon etc.? By default, probably not. There are Blender versions with SSE and AVX support. But Neon? At least with an open source benchmark, someone can add that in, given the time and expertise. That puts a heavy burden on the reviewer, though.

Typically you compile the benchmark you are interested in yourself. If there is a code path for AVX but no code path for NEON available, i disable AVX to get an apples-to-apples comparison. As for example with 7-zip I disabled AVX as there was no suitable NEON code path available.

It won't change until someone - presumably ARM hardware vendors - decide that proper desktop OS support is desirable for their product. Right now, I think Qualcomm, Apple, and Huawei are content to force people to use what they can from Android and iOS and to hell with anything else. Maybe once the Snapdragon 8180 is out we'll see some more tuned binaries for Win10. Linux support? We'll see.

Qualcomm offers for their Snapdragon line of SoCs excellent Windows10 support - including all drivers for DirectX12, Audio/Video/Camera/Network etc. As a side effect, with the Windows Subsystem for Linux (WSL) you can run Aarch64 Linux binaries on Windows.

DrMrLordX · Oct 16, 2018

Thala said:
Typically you compile the benchmark you are interested in yourself. If there is a code path for AVX but no code path for NEON available, i disable AVX to get an apples-to-apples comparison. As for example with 7-zip I disabled AVX as there was no suitable NEON code path available.

Honestly, I would rather see NEON vs AVX/AVX2/AVX512 than raw fp vs. raw fp. You're going to have to gimp the x86 chip and force it to run a x87 codepath which almost never happens in real-world scenarios. Unless you're content to run SSE2/SSE3/SSE4.1 on x86 vs raw fp on ARM.

edit: it does look like there are some conversion tools to translate SSE code to NEON (and vice versa), so it might be possible to run SSE/SSE2/SSE3/SSE4.1 benchmarks versus NEON without a whole lot of work.

Then there's other oddball issues, like . . . division operations not being pipelined in certain architectures. I don't know if ARM32 or ARM64 allow you to pipeline division operations, but I'm pretty sure that's a gamebreaker on x86 CPUs. I recall code for PCs having significant speed advantages if you can convert division operations to multiplication operations using constants (assuming you know in advance what will be the denominator; you just convert to a decimal value).

Qualcomm offers for their Snapdragon line of SoCs excellent Windows10 support - including all drivers for DirectX12, Audio/Video/Camera/Network etc. As a side effect, with the Windows Subsystem for Linux (WSL) you can run Aarch64 Linux binaries on Windows.

That leaves Huawei and Apple out in the cold. But if I wanted real Snapdragon vs Intel/AMD performance metrics, it'd be tempting to use Win10 as a testing environment for all platforms, with custom-compiled binaries for the ARM system where necessary.

Nothingness · Oct 16, 2018

Thala said:
Typically you compile the benchmark you are interested in yourself. If there is a code path for AVX but no code path for NEON available, i disable AVX to get an apples-to-apples comparison. As for example with 7-zip I disabled AVX as there was no suitable NEON code path available.

I all you are interested in is pure CPU performance then I agree with your approach. Though it could be argued that doing so, you will miss the advantages of one of the ISA, and so you should concentrate on software that have been properly ported and tuned for both architectures.

And anyway if you want to get information about what the end user experience will be that's the wrong way of measuring things.

Qualcomm offers for their Snapdragon line of SoCs excellent Windows10 support - including all drivers for DirectX12, Audio/Video/Camera/Network etc. As a side effect, with the Windows Subsystem for Linux (WSL) you can run Aarch64 Linux binaries on Windows.

Yeah, that confirms what I wrote: depressing situation.

Thala · Oct 16, 2018

Nothingness said:
I all you are interested in is pure CPU performance then I agree with your approach. Though it could be argued that doing so, you will miss the advantages of one of the ISA, and so you should concentrate on software that have been properly ported and tuned for both architectures.

And anyway if you want to get information about what the end user experience will be that's the wrong way of measuring things.

I am more interested in comparing architectures than end-user experience. And technically it is not necessarily an ISA advantage but rather an issue of porting effort.

Sometimes i am just doing the porting to NEON - which i did for PovRay - the interesting result was, i did not get any speedup when only a single NEON unit is present (like on my device) because the available data level parallelism was too low. Looking forward to Cortex A76 - which is supposed to have dual NEON units.
Still the overall PovRay result was good for a low power device, because a Snapdragon 835 will utilize all 8 cores for the problem (without throttling)

Nothingness · Oct 16, 2018

Thala said:
Sometimes i am just doing the porting to NEON - which i did for PovRay - the interesting result was, i did not get any speedup when only a single NEON unit is present (like on my device) because the available data level parallelism was too low. Looking forward to Cortex A76 - which is supposed to have dual NEON units.
Still the overall PovRay result was good for a low power device, because a Snapdragon 835 will utilize all 8 cores for the problem (without throttling)

Did you just recompile with the right options? And what was your device?

Thala · Oct 16, 2018

Nothingness said:
Did you just recompile with the right options? And what was your device?

PovRay? I did compile with MSVC release build (optimize speed) - both x64 and ARM64 - same options. My device is a HP Envy X2 (Windows 10) - Snapdragon 835 (4xA73 -2.45GHz, 4xA53-1.9GHz). If you run PovRay with 8 threads, you see all 8 cores @ 100% load.

Nothingness · Oct 17, 2018

Thala said:
PovRay? I did compile with MSVC release build (optimize speed) - both x64 and ARM64 - same options. My device is a HP Envy X2 (Windows 10) - Snapdragon 835 (4xA73 -2.45GHz, 4xA53-1.9GHz). If you run PovRay with 8 threads, you see all 8 cores @ 100% load.

Thanks. And I guess you checked that the MS compiler is generating non-scalar NEON instructions where the x86 back-end generates vector instructions? I mean is MS compiler good at vectorizing, and is it as good at generating AArch64 as it is at generating x86.

Thala · Oct 17, 2018

Nothingness said:
Thanks. And I guess you checked that the MS compiler is generating non-scalar NEON instructions where the x86 back-end generates vector instructions? I mean is MS compiler good at vectorizing, and is it as good at generating AArch64 as it is at generating x86.

Vectorization is enabled by default, the backend will generate SSE2/4.2, AVX, AVX2 or NEON code depending on architecture. That having said i have yet to see a compiler producing good vectorized code for non-trivial loops. Thats the reason that for critical functions you often find explicitly vectorized code using intrinsics.
My hope is, that the introduction of SVE in actual architectures (aside from Supercomputers) will improve the situation drastically. The SVE ISA is designed in a way, which makes it much easier for the compiler to vectorize non-trivial code.

name99 · Oct 24, 2018

Gideon said:
Very interesting benchmarks. Just a tiny nitpick, and I know the difference will be minor, but imo it would be slightly more accurate to use Clang as the compiler for Intel Xeon, instead of GCC (as that's also what apple uses heavily)

You know that if he does that, people will just say that LLVM sucks, and Intel would look much better on GCC, right?
Go read Phoronix. Every time the subject of GCC vs LLVM comes up, that’s the response...
And it is essentially correct. For x86 GCC is (slightly) ahead of LLVM, primarily bcs GCC has had a lot longer to optimize and tries to do less than LLVM.

name99 · Oct 24, 2018

CatMerc said:
Food for thought: The fabric and uncore alone on Skylake cores eat about as much power as the TDP of an entire A12 SoC. They have different goals in terms of flexibility and scalability in both the architectural sense and user sense.

Combine that with the different nodes, and how Skylake is a 2015 architecture, with its successor being held up by manufacturing issues, making this an ARM Vs x86 comparison is silly.

Here's a funny thing. From what I've heard, Tiger Lake is supposed to bring idle power down to 6mW. That made Apple executives take a step back and re-evaluate their plans. Now consider that Tiger Lake was supposed to ALREADY be out, and it's quite clear to me that it's a problem of execution, not of architecture.

Oh god. My vaporware can beat up your vaporware!
We’ve already seen the (decidedly unimpressive) Ice Lake results, and that’s supposed to ship 2020. You think Apple’s going to be impressed, and changed their plans, for something that will ship in, MAYBE, 2021, and will be a slightly boosted Ice Lake (which is just a slightly boosted Skylake)?

CatMerc · Oct 25, 2018

name99 said:
Oh god. My vaporware can beat up your vaporware!
We’ve already seen the (decidedly unimpressive) Ice Lake results, and that’s supposed to ship 2020. You think Apple’s going to be impressed, and changed their plans, for something that will ship in, MAYBE, 2021, and will be a slightly boosted Ice Lake (which is just a slightly boosted Skylake)?

vOv

I didn't make any claims about directions Apple will take. I just said there was extreme surprise when Intel revealed their Tiger Lake plans to the executives.

I also completely disagree with your "unimpressive" claims for Ice Lake. Literally a leaked result of a test platform. We don't know how high it clocks, power characteristics, die sizes, anything. One thing for sure is that it proves Skylake isn't the wall for x86 IPC like some people suggest.

Nothingness · Oct 25, 2018

CatMerc said:
I also completely disagree with your "unimpressive" claims for Ice Lake. Literally a leaked result of a test platform. We don't know how high it clocks, power characteristics, die sizes, anything. One thing for sure is that it proves Skylake isn't the wall for x86 IPC like some people suggest.

So we don't know anything, but you know for sure its IPC will be better enough to disprove some claims? Isn't there some contradiction here?

I think that if we assume the Geekbench results are correct, the increase in performance per clock is nice. But for that we have to assume we know something.

CatMerc · Oct 25, 2018

Nothingness said:
So we don't know anything, but you know for sure its IPC will be better enough to disprove some claims? Isn't there some contradiction here?

I think that if we assume the Geekbench results are correct, the increase in performance per clock is nice. But for that we have to assume we know something.

We can assume the IPC in those tests is a floor of what we can expect barring catastrophic architectural failures that they couldn't fix in time.

Greyguy1948 · Oct 25, 2018

I do hope Anandtech will have a deep look at Kirin 980 soon. Latency for example...
Some tests:
https://www.notebookcheck.net/Huawei-Mate-20-Pro-Smartphone-Review.338680.0.html
Javascript like JetStream is only 40% of the last Iphone!

Arachnotronic · Oct 25, 2018

Greyguy1948 said:
I do hope Anandtech will have a deep look at Kirin 980 soon. Latency for example...
Some tests:
https://www.notebookcheck.net/Huawei-Mate-20-Pro-Smartphone-Review.338680.0.html
Javascript like JetStream is only 40% of the last Iphone!

Javascript performance depends on the browser/platform in addition to the CPU.

Nothingness · Oct 25, 2018

The Geekbench score seems to indicate that Cortex-A76 (found in Kirin 980) is slightly below Apple A10 level: https://browser.geekbench.com/v4/cpu/compare/8874900?baseline=10368871

Andrei. · Oct 25, 2018

Greyguy1948 said:
I do hope Anandtech will have a deep look at Kirin 980 soon. Latency for example...
Some tests:
https://www.notebookcheck.net/Huawei-Mate-20-Pro-Smartphone-Review.338680.0.html
Javascript like JetStream is only 40% of the last Iphone!

Yes ... Pixel 3 review is taking a ton of time because of the camera part.

Nothingness said:
The Geekbench score seems to indicate that Cortex-A76 (found in Kirin 980) is slightly below Apple A10 level: https://browser.geekbench.com/v4/cpu/compare/8874900?baseline=10368871

It's also a bit below A10 in SPEC. Efficiency is good though. The performance scores are about the same as the 3GHz results I projected here: https://www.anandtech.com/show/12785/arm-cortex-a76-cpu-unveiled-7nm-powerhouse/4

Efficiency is between those two projections.

Apple CPUs "just margins off" desktop CPUs - Anandtech

Elite Member

Golden Member

Golden Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Lifer

Golden Member

Lifer

Diamond Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Golden Member

Senior member

Senior member

Golden Member

Diamond Member

Golden Member

Member

Lifer

Diamond Member

Senior member