SPEC CPU 2006 : Skylake Core M vs Broadwell Core M vs Apple A9X

raghu78 · Jan 22, 2016

http://anandtech.com/show/9766/the-apple-ipad-pro-review/4

"Relative to the MacBook, the iPad Pro does best in 445.gobmk, the Go benchmark, while its largest deficit is with 462.libquantum. The latter is a particularly interesting case as the benchmark is very easy to vectorize, giving us perhaps our best look at the vector performance of Twister versus Broadwell, and how well their respective compilers can actually vectorize it. The end result has the Intel platforms solidly in the lead here, hinting that Intel still has better vector performance at this time.

Shifting gears to the Asus ZenBook UX305CA and its newer Skylake based Core m3-6Y30, to little surprise Skylake closes the gap with A9X in the benchmarks where Core M was losing, and pulls further ahead in the benchmarks where it was winning. Despite this the two systems split the number of wins at 5 each, but in the cases where the ZenBook is winning it’s very clearly winning. Overall Skylake sees some decent performance improvements relative to the Broadwell CPU in our MacBook – with the exact gains depending on the test – allowing it to widen the gap compared to the A9X. Overall A9X is still competitive in specific scenarios, but on average it definitely trails the Skylake Core m3.

Finally, going back to Broadwell we have the ASUS Transformer Book T300 Chi, which incorporates a high-end Core M-5Y71 processor. This is still officially a 4.5W TDP processor, and as a result this essentially measures Broadwell Core M’s best case performance. With a maximum CPU clockspeed of 2.9GHz as compared to the slower low-end Skylake and Broadwell CPUs, the T300 Chi unsurprisingly beats the iPad Pro in every single benchmark. At best the two are neck-and-neck with Apple’s best benchmark, 445.gobmk, but otherwise it’s a clear and very significant lead for Intel’s fastest Broadwell Core M processor."

A good comparison of the fastest mobile SoCs on an industry-standard benchmark, SPEC INT 2006. Skylake/Broadwell is the clear winner, and it could be due to the 256-bit AVX2 units, which support 256-bit integer and floating-point operations. Apple's A9X uses 128-bit FP units. Anyway, Apple is closing the gap rapidly, and A10X at 10nm vs Kaby Lake 14nm Core M and A11X at 7nm vs Cannon Lake 10nm Core M should be very interesting.

ShintaiDK · Jan 22, 2016

A9X looks quite bad. It also needs a 38.5 Wh battery to compete with something that can pretty much do the same with a 24 Wh battery.

Seems Skylake-Y in this case beats the A9X in performance/watt by around 3x.

It should also be noted that the A9X has twice the memory bandwidth.

Apple's focus with the A10 seems to be a custom GPU.

defferoo · Jan 22, 2016

ShintaiDK said:
A9X looks quite bad. It also needs a 38.5 Wh battery to compete with something that can pretty much do the same with a 24 Wh battery.

Seems Skylake-Y in this case beats the A9X in performance/watt by around 3x.

It should also be noted that the A9X has twice the memory bandwidth.

Apple's focus with the A10 seems to be a custom GPU.

huh? you mean 42 Wh battery right? the surface pros have had a 42 Wh battery since the first generation. In case you didn't notice, Surface Pro 4 also lasted about 25-30% less than the iPad Pro with a bigger battery. So no... Skylake isn't 3x better in performance/watt.

ShintaiDK · Jan 22, 2016

defferoo said:
huh? you mean 42 Wh battery right? the surface pros have had a 42 Wh battery since the first generation. In case you didn't notice, Surface Pro 4 also lasted about 25-30% less than the iPad Pro with a bigger battery. So no... Skylake isn't 3x better in performance/watt.

The Core M version is 24 Wh, isn't it?

Surface models also use DDR3L and not LPDDR4, NVMe SSDs, and so on.

jhu · Jan 22, 2016

It's odd they compiled SPECint for x86 using -m32 option but not for ARMv8 version. If anything, speedwise, you'd want -m32 for ARMv8 (lower memory size usage I think) and -m64 for x86 (more registers to utilize).

defferoo · Jan 22, 2016

ShintaiDK said:
The Core M version is 24 Wh, isn't it?

Surface models also use DDR3L and not LPDDR4, NVMe SSDs, and so on.

the m3 version is 38.2 Wh, so the battery capacity is about the same as the iPad Pro. Anandtech didn't compare the battery life of the m3 surface pro 4, but it's supposed to be comparable to the other models.

ShintaiDK · Jan 22, 2016

jhu said:
It's odd they compiled SPECint for x86 using -m32 option but not for ARMv8 version. If anything, speedwise, you'd want -m32 for ARMv8 (lower memory size usage I think) and -m64 for x86 (more registers to utilize).

It's very odd with all the switches for x86.
32-bit and reducing speed.

Example with -no-prec-div:
This option improves precision of floating-point divides. It has a slight impact on speed

And on the other side with Ofast:
-Ofast - Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays.

Sweepr · Jan 22, 2016

Can we call Geekbench crap for x86 vs ARM SoC comparisons now?

AnandTech said:
Ultimately I think it’s reasonable to say that Intel’s Core M processors hold a CPU performance edge over iPad Pro and the A9X SoC. Against Intel’s slowest chips A9X is competitive, but as it stands A9X can’t keep up with the faster chips.

I wish they had tested a Core m7-6Y75 device, up to 3.1GHz (vs Core m3-6Y30's 2.2GHz). For example Lenovo Miix 700, 12'' Windows 10 tablet/convertible.

ShintaiDK · Jan 22, 2016

Sweepr said:
Can we call Geekbench crap for x86 vs ARM SoC comparisons now?

That's dead for sure. 100% useless.

GB can just as well come clean with their bias

Nothingness · Jan 22, 2016

ShintaiDK said:
It's very odd with all the switches for x86.
32-bit and reducing speed.

Some of the SPEC tests are significantly faster when compiled for 32-bit on Intel. 403.gcc and 429.mcf in particular. I ran 403.gcc on my 4770K and the 64-bit version is more than 10% slower than the 32-bit one. mcf would be even worse.
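One plausible contributor to a 32-bit advantage on pointer-heavy tests is simply pointer size: halving it shrinks linked data structures, so more of them fit in each cache line. Below is a minimal sketch of that arithmetic. The node layout (three pointers plus one 32-bit integer), the 64-byte cache line, and the natural-alignment rule are all assumptions for illustration and are not taken from the 403.gcc or 429.mcf sources.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def struct_size(field_sizes):
    """Size of a C-like struct whose fields are each aligned to their own size."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        # Round the running offset up to this field's alignment, then place it.
        offset = -(-offset // size) * size
        offset += size
        max_align = max(max_align, size)
    # Tail padding so that arrays of the struct stay aligned.
    return -(-offset // max_align) * max_align


def node_size(pointer_bytes):
    # Hypothetical node: three pointers plus one 32-bit integer.
    return struct_size([pointer_bytes, pointer_bytes, pointer_bytes, 4])


for pointer_bytes in (4, 8):
    size = node_size(pointer_bytes)
    print(f"{pointer_bytes * 8}-bit pointers: {size} bytes per node, "
          f"{CACHE_LINE_BYTES // size} nodes per cache line")
```

Under these assumptions the node doubles in size when pointers go from 4 to 8 bytes, which halves the number of nodes per cache line. Whether that outweighs the extra registers available in 64-bit mode depends on the workload.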

Also Intel has been tuning so much for SPEC that results using icc are more than dubious. I'm afraid the same will happen with SPEC 2016, given that Intel has been part of the committee since the very beginning so they have had access to the source of the upcoming benchmark for years. Note that the same applies to most companies that have their own compiler.

The fairest way to compare CPUs is, as always, to use compilers that are as close as possible, which basically means using gcc on all platforms.

Exophase · Jan 22, 2016

It's good that the article mentions the role that the compiler plays in the benchmark scores but I think they didn't really get the whole picture there. There seems to be a good faith presumption that the libquantum score reflects ICC being better at vectorization in general, and not that they're specifically targeting this subtest. The fact is that Intel has a big financial incentive to make compilers that are as good at SPEC as possible, while the developers of compilers like Clang and GCC do not.

The other consideration is that ICC is nowhere close to the universal standard for compilation that Xcode is on iOS devices. Or at least I've never used it, I've never seen it used in the workplace, and among open source projects I know of it's not that heavily used.

Arachnotronic · Jan 22, 2016

Can't wait to see how A10 does next year.

Fjodor2001 · Jan 22, 2016

Perf/watt?

Die area?

Price per chip?

Just wondering...

ShintaiDK · Jan 22, 2016

It's 147mm2.

Fjodor2001 · Jan 22, 2016

ShintaiDK said:
It's 147mm2.

All of them?

Arachnotronic · Jan 22, 2016

Fjodor2001 said:
Perf/watt?

Die area?

Price per chip?

Just wondering...

Die area of SKL-Y is ~98mm^2; A9X is a 147mm^2 die. Note that the A9X includes the Southbridge on the die whereas SKL-Y requires a separate PCH on package.

jhu · Jan 22, 2016

Nothingness said:
Some of the SPEC tests are significantly faster when compiled for 32-bit on Intel. 403.gcc and 429.mcf in particular. I ran 403.gcc on my 4770K and the 64-bit version is more than 10% slower than the 32-bit one. mcf would be even worse.

Could you supply a download link? I can't seem to find one.

Thala · Jan 22, 2016

What a useless result! Different compilers with quite different options. Why do they enable IPO for x86, for example? This alone gives the x86 version quite an advantage. And then of course the use of aliasing hints is always questionable (-ansi-alias).
So not only is ICC pretty much optimized for SPEC CPU, there are aliasing hints and IPO enabled on top of that.
Finally, with these options auto-vectorization is enabled for ICC. For ARMCC you would typically have to specify --vectorize.

Summary: a more than useless comparison!

P.S. /Ofast should not be used either.

Hans de Vries · Jan 22, 2016

raghu78 said:
http://anandtech.com/show/9766/the-apple-ipad-pro-review/4

"Relative to the MacBook, the iPad Pro does best in 445.gobmk, the Go benchmark, while its largest deficit is with 462.libquantum. The latter is a particularly interesting case as the benchmark is very easy to vectorize, giving us perhaps our best look at the vector performance of Twister versus Broadwell,

Utterly worthless and extremely misleading test.
Does this ridiculous soap opera never end?

Comparing auto-parallelized code versus non-auto-parallelized code.

Libquantum runs here on a single thread on the A9X while it runs
on four threads on the Intel processors.

Furthermore: Intel did spend more than a (bizarre) $100,000,000 or
so on optimizations to break SPEC_2006.

Use the same (non Intel) compiler or don't use it at all!

Yuriman · Jan 22, 2016

Hans de Vries said:
Utterly worthless and extremely misleading test.
Does this ridiculous soap opera never end?

Comparing auto-parallelized code versus non-auto-parallelized code.
Libquantum runs here on a single thread on the A9X while it runs
on four threads on the Intel processors.

Furthermore: Intel did spend more than a (bizarre) $100,000,000 or
so on optimizations to break SPEC_2006.

Use the same (non Intel) compiler or don't use it at all!

Let's take it a step further - nobody should be allowed to use the Intel compiler for anything at all. Intel should not be allowed to have a compiler that works better.

ShintaiDK · Jan 22, 2016

And ban the Apple compiler too

Furthermore: Apple did spend more than a (bizarre) $100,000,000 or
so on optimizations to break GeekBench.

Use the same (non Apple) compiler or don't use it at all!

raghu78 · Jan 22, 2016

jhu said:
It's odd they compiled SPECint for x86 using -m32 option but not for ARMv8 version. If anything, speedwise, you'd want -m32 for ARMv8 (lower memory size usage I think) and -m64 for x86 (more registers to utilize).

I did not notice that. I would expect AnandTech to compare x86-64 to ARMv8-64 or x86-32 with ARMv8-32. If it was 32-bit x86 against 64-bit ARMv8, it's not a fair comparison. Anyway, Intel's big cores have 256-bit FP units compared to Apple's big cores, which have 128-bit FP units. That could also be a reason for better vector performance.

Apple, though, has room to add a lot more transistors at 10nm and 7nm. In terms of area scaling:

TSMC 16FF+ - 1.0
TSMC 10nm - 0.48 (2.1x logic density increase vs 16FF+. 1/2.1 = 0.48 )
TSMC 7nm - 0.26 to 0.29 (40-45% area shrink vs 10nm).

In terms of power at the same transistor performance:

TSMC 16FF+ - 1.0
TSMC 10nm - 0.6 (40% power reduction vs 16FF+)
TSMC 7nm - 0.42 to 0.45 (25-30% reduction vs 10nm)

So by early 2018 with the A11X, Apple can fit in more than twice the transistors of A9X and still be less than 100 sq mm and draw the same power as A9X. That sets them up very well against Cannon Lake 10nm Core M. Apple needs 3 things right now: a good implementation of SMT, a very well optimized compiler for SPEC CPU, and a 256-bit FPU to compete with Intel's big cores. I think they will get these done in a couple of generations. Apple should keep pushing IPC as high as possible and keep clocks around 2.5 GHz, as that's the sweet spot for mobile CPUs.
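The scaling arithmetic behind that estimate can be checked with a short sketch. It uses only the relative factors listed above and the 147 mm^2 A9X die size quoted earlier in the thread. It also assumes, optimistically, that the whole die shrinks like logic, which ignores SRAM, analog and I/O blocks that typically scale worse.

```python
# Relative factors from the post, normalized to TSMC 16FF+ = 1.0.
AREA_10NM = 1 / 2.1                      # 2.1x logic density vs 16FF+
AREA_7NM = [AREA_10NM * (1 - shrink)     # 40-45% area shrink vs 10nm
            for shrink in (0.45, 0.40)]
POWER_10NM = 0.6                         # 40% power reduction vs 16FF+
POWER_7NM = [POWER_10NM * (1 - cut)      # 25-30% power reduction vs 10nm
             for cut in (0.30, 0.25)]

A9X_DIE_MM2 = 147  # die size quoted earlier in the thread


def scaled_die_area(transistor_multiple, area_factor, base_mm2=A9X_DIE_MM2):
    """Die area if the A9X design grew by `transistor_multiple` on a new node.

    Assumes every block scales like logic, which is an optimistic simplification.
    """
    return base_mm2 * transistor_multiple * area_factor


def scaled_power(transistor_multiple, power_factor):
    """Power relative to A9X (= 1.0) at the same per-transistor performance."""
    return transistor_multiple * power_factor


for area_factor, power_factor in zip(AREA_7NM, POWER_7NM):
    area = scaled_die_area(2, area_factor)
    power = scaled_power(2, power_factor)
    print(f"2x A9X transistors at 7nm: {area:.0f} mm^2, {power:.2f}x A9X power")
```

With these inputs, doubling the transistor count at 7nm stays under 100 mm^2 and under the A9X power budget, which matches the claim in the post as long as the vendor scaling figures hold.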

Thala · Jan 22, 2016

Anyway, Intel's big cores have 256-bit FP units compared to Apple's big cores, which have 128-bit FP units. That could also be a reason for better vector performance.

Nonsense. Look at the ridiculous selection of compiler options and you know why Intel wins.
Such a test has to be performed with the same compiler using the very same options. (e.g. gcc)

raghu78 · Jan 22, 2016

Thala said:
Nonsense. Look at the ridiculous selection of compiler options and you know why Intel wins.

Apple has to take responsibility for compiler optimizations. I am thinking they will address it going forward.

Thala · Jan 22, 2016

Apple has to take responsibility for compiler optimizations. I am thinking they will address it going forward.

I am not talking about compiler optimizations Apple would have to do or not. I am saying that if you want to compare the microarchitectures, you have to use the same compiler with the same options. The options AnandTech used for the different compilers enable vastly different optimizations.

Otherwise you cannot possibly draw any conclusions and the result is useless. This is a perfect example of how the SPEC CPU benchmark should not be used. It is not even worth discussing the results.
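The "same compiler, same options" rule can be written down as a simple check. This is only a sketch: `BuildConfig`, `is_comparable` and `flag_differences` are hypothetical names, the flag sets are the ones mentioned in this thread rather than a verified copy of AnandTech's configuration, and "clang" as the compiler behind Xcode is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildConfig:
    compiler: str
    flags: frozenset


def is_comparable(a, b):
    """Two runs isolate the microarchitecture only if the toolchain matches."""
    return a.compiler == b.compiler and a.flags == b.flags


def flag_differences(a, b):
    """Flags present only in `a`, and flags present only in `b`."""
    return sorted(a.flags - b.flags), sorted(b.flags - a.flags)


# Flags as named in the thread; not a verified record of the review's setup.
x86_build = BuildConfig(
    "icc", frozenset({"-m32", "-ipo", "-ansi-alias", "-no-prec-div"}))
arm_build = BuildConfig("clang", frozenset({"-Ofast"}))  # compiler name assumed

print("comparable:", is_comparable(x86_build, arm_build))
only_x86, only_arm = flag_differences(x86_build, arm_build)
print("only on x86 build:", only_x86)
print("only on ARM build:", only_arm)
```

Any non-empty difference means the scores mix compiler effects with hardware effects, which is the objection raised above.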
