SPEC CPU 2006 : Skylake Core M vs Broadwell Core M vs Apple A9X

thunng8 · Jan 22, 2016

ShintaiDK said:
And ban the Apple compiler too

The only difference is that the Apple compiler is used in every iOS app.

ICC, meanwhile, is known as the SPEC compiler: good at SPEC runs, but very little used in commercial or open source applications.

Exophase · Jan 22, 2016

ShintaiDK said:
And ban the Apple compiler too

Even if that were really true, there's nothing stopping you from using the so-called Apple compiler to generate x86 binaries too, where you'd probably get most of the benefit of whatever optimizations they put into it, especially if those include benchmark breakers. And since that compiler is open source, we can evaluate whatever they did or didn't do to it.

Vesku · Jan 22, 2016

Considering the gap in overall CPU experience between Apple and Intel there is probably some serious concern among Intel execs every product refresh that they will see A series SoCs finally mixed into the traditional x86 Apple product stack.

ICC-based SPEC benchmarking is the best-case scenario for Intel vs. Apple CPUs, and they still lose roughly half the time. Ouch.

dark zero · Jan 22, 2016

Seems that Antutu is the only valid compiler for now...

Ryan Smith · Jan 22, 2016

Hi guys, just dropping by to respond to a couple of comments.

To be clear here, we're well aware of the pros and cons of SPEC CPU 2006, which is why there was a fairly long preamble to that section describing them. SPEC is not perfect and should not be the only benchmark you ever listen to. However, among cross-platform benchmarks it's unique in its capabilities and a powerful tool as well.

In any case, that SPEC CPU is a "system's processor, memory subsystem and compiler" benchmark is an intentional aspect of its design. The traditional way to run SPEC CPU is to pair it up with the fastest compiler with the most aggressive settings you can get away with, to give the system every possible opportunity to produce the best possible score. This is how we've chosen to use SPEC CPU as well, in accordance with how it's typically used.

The issue with compilers is that it's impossible to take them out of the equation. Even if one uses the same compiler for multiple architectures, you are now benchmarking how well a compiler optimizes for a specific architecture, and there are some big gaps there. In some ways it's definitely better, but in other ways it's worse.

Ultimately traditional wisdom is that it's better to admit that the compiler is part of the test and that it's another way to optimize execution of the benchmark, rather than trying (and failing) to remove it from the equation. It's imperfect for sure, but it's the best option available.

(BTW, if I had to use a car analogy here, SPEC CPU would be F1 racing)

As for the specific setups and flags we used, those are the settings that were recommended to us. -Ofast is essentially the fastest way to go on Xcode/LLVM, and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU. We wanted to make this as typical as possible for a SPEC CPU run, and that included using the best settings available.

Arachnotronic · Jan 22, 2016

edit: nevermind.

PPB · Jan 22, 2016

He does not, AFAIK. It is indeed funny how you try to jump on him without even doing a basic Google search before jumping to conclusions.

Arachnotronic · Jan 22, 2016

Thala said:
I am not talking about compiler optimizations Apple would have to do or not. I am saying if you want to compare the microarchitectures you would have to use the same compiler with the same options. The options used by Anandtech for different compilers enable vastly different optimizations.

Have to agree. If you're trying to get an apples to apples CPU comparison, I'd argue everything needs to be apples to apples from a compiler perspective.

The more interesting thing is what is valid for a platform comparison? As people have rightly pointed out, ICC is not that popular for client software but Apple's compiler is used, well...for pretty much everything on iOS.

Schmide · Jan 22, 2016

Could there be algorithm drift in SPEC CPU 2006? Especially with the simulations that use random numbers. (462.libquantum, 456.hmmer, etc) Do you know if these pieces carry their own random number generator or do they rely on the compiler's library? If the latter, I can certainly see some better guesses from one platform explaining the near 4x performance gap.

Unless the pseudo random numbers are identical the benchmark is useless.

Edit: https://www.spec.org/cpu2006/Docs/faq.html apparently the random number generator is built in.
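
A minimal sketch of why a built-in generator settles this: if a benchmark carries its own pseudo-random number generator and a fixed seed, every platform sees the same number sequence regardless of which compiler or runtime library is used. The generator below is a generic linear congruential generator with textbook constants, picked purely for illustration. It is not the generator SPEC CPU 2006 actually ships, and the names `BenchRng`, `next_u32` and `first_values` are made up for this example.

```python
class BenchRng:
    """Tiny linear congruential generator (illustrative, not SPEC's own)."""

    # Textbook LCG constants; an assumption for this sketch only.
    MULTIPLIER = 1664525
    INCREMENT = 1013904223
    MODULUS = 2 ** 32

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def next_u32(self):
        # Pure integer arithmetic: no dependence on the platform's rand().
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state


def first_values(seed, count):
    """Return the first `count` outputs for a given seed."""
    rng = BenchRng(seed)
    return [rng.next_u32() for _ in range(count)]


if __name__ == "__main__":
    # Two independent runs with the same seed produce the same workload input.
    print(first_values(42, 3) == first_values(42, 3))
```

Because the state update uses only integer arithmetic defined by the benchmark itself, two systems given the same seed do the same amount of work, which is the property the FAQ link above points to.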

tempestglen · Jan 22, 2016

OK guys, here is how to improve the SPEC06 score by 30% on the newest LLVM compiler for A9X.

http://llvm.org/devmtg/2015-10/slides/Gerolf-PerformanceImprovementsAndHeadroom.pdf
Conclusion: Twister has the same IPC as the latest x86.

Hans de Vries · Jan 22, 2016

Ryan Smith said:
Hi guys, just dropping by to respond to a couple of comments.

and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU.

"They" made you compare multi threaded Intel code versus single
threaded Apple code.

Especially for 462.libquantum: The score becomes 16+ times higher if
you have 16 times the cores at a higher frequency:

https://www.spec.org/cpu2006/results/res2015q4/cpu2006-20151130-38170.html

CHADBOGA · Jan 22, 2016

Ryan Smith said:
Hi guys, just dropping by to respond to a couple of comments.

To be clear here, we're well aware of the pros and cons of SPEC CPU 2006, which is why there was a fairly long preamble to that section describing them. SPEC is not perfect and should not be the only benchmark you ever listen to. However, among cross-platform benchmarks it's unique in its capabilities and a powerful tool as well.

In any case, that SPEC CPU is a "system's processor, memory subsystem and compiler" benchmark is an intentional aspect of its design. The traditional way to run SPEC CPU is to pair it up with the fastest compiler with the most aggressive settings you can get away with, to give the system every possible opportunity to produce the best possible score. This is how we've chosen to use SPEC CPU as well, in accordance with how it's typically used.

The issue with compilers is that it's impossible to take them out of the equation. Even if one uses the same compiler for multiple architectures, you are now benchmarking how well a compiler optimizes for a specific architecture, and there are some big gaps there. In some ways it's definitely better, but in other ways it's worse.

Ultimately traditional wisdom is that it's better to admit that the compiler is part of the test and that it's another way to optimize execution of the benchmark, rather than trying (and failing) to remove it from the equation. It's imperfect for sure, but it's the best option available.

(BTW, if I had to use a car analogy here, SPEC CPU would be F1 racing)

As for the specific setups and flags we used, those are the settings that were recommended to us. -Ofast is essentially the fastest way to go on Xcode/LLVM, and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU. We wanted to make this as typical as possible for a SPEC CPU run, and that included using the best settings available.

As you are the new boss of Anandtech, I wanted to ask whether you could do more reviews of midrange CPUs and GPUs.

I'm surprised by the lack of these, as surely you would get heaps of hits for this.

Arachnotronic · Jan 22, 2016

Hans de Vries said:
"They" made you compare multi threaded Intel code versus single
threaded Apple code.

Especially for 462.libquantum: The score becomes 16+ times higher if
you have 16 times the cores at a higher frequency:

https://www.spec.org/cpu2006/results/res2015q4/cpu2006-20151130-38170.html

This is a good catch. So we throw out the 462.libquantum results in this particular test.
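
To put a number on what throwing out one subtest does: SPEC CPU 2006 composite scores are geometric means of the per-benchmark ratios, so excluding 462.libquantum means recomputing that mean over the remaining subtests. The sketch below does exactly that. The ratios in `hypothetical` are invented placeholders and not figures from the article or from any published SPEC result, and the helper names are made up for this example.

```python
import math


def geometric_mean(ratios):
    """Geometric mean, the way SPEC composites combine per-test ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def composite_without(ratios_by_test, excluded):
    """Recompute the composite after dropping the named subtests."""
    kept = [r for name, r in ratios_by_test.items() if name not in excluded]
    return geometric_mean(kept)


# Placeholder ratios (assumption): one subtest inflated 8x by auto-parallelization.
hypothetical = {
    "403.gcc": 20.0,
    "456.hmmer": 20.0,
    "462.libquantum": 160.0,
}

if __name__ == "__main__":
    print(geometric_mean(list(hypothetical.values())))
    print(composite_without(hypothetical, {"462.libquantum"}))
```

With these made-up inputs, a single 8x outlier among three subtests doubles the composite (about 40 versus 20), which is why an auto-parallelized subtest has to be excluded or reported separately before comparing single-threaded performance.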

BTW, I am curious, what is your view of the best CPU benchmark today? How can we objectively get a good read on how CPUs like SKL and Twister compare? Seems like an extremely difficult problem. Geekbench is an attempt to do this, but a number of experts (e.g. David Kanter, Linus Torvalds) seem to think it is not useful in this way yet.

Thanks.

jhu · Jan 22, 2016

Arachnotronic said:
Have to agree. If you're trying to get an apples to apples CPU comparison, I'd argue everything needs to be apples to apples from a compiler perspective.

I don't agree. You really should be using the best compiler for the processor if you're trying to ascertain the highest performance. Some compilers may not optimize as well as others. For example, I've found that on FreeBSD 10.2 the included LLVM is slightly slower than the gcc in the ports collection on my FX8350. It can be even more pronounced on other architectures such as Itanium, where gcc is often 10-20% slower than icc.

Arachnotronic said:
The more interesting thing is what is valid for a platform comparison? As people have rightly pointed out, ICC is not that popular for client software but Apple's compiler is used, well...for pretty much everything on iOS.

That's a good point. icc also works with Xcode on Mac OS X. But I don't think people willingly pay for it or use it when the LLVM that comes with Xcode is free and good enough. Thus it would have been useful if they'd tested with LLVM on Mac OS X as well, since LLVM is the usual use case, not icc.

Exophase · Jan 22, 2016

Arachnotronic said:
This is a good catch. So we throw out the 462.libquantum results in this particular test.

BTW, I am curious, what is your view of the best CPU benchmark today? How can we objectively get a good read on how CPUs like SKL and Twister compare? Seems like an extremely difficult problem. Geekbench is an attempt to do this, but a number of experts (e.g. David Kanter, Linus Torvalds) seem to think it is not useful in this way yet.

Thanks.

I'm not Hans, but I think SPEC is pretty decent if you use the same compiler or at least don't use a compiler that breaks some of the subtests. At the very least the GCC test is well regarded. But if you break the bench it's all for nothing. I actually think a benchmark that's flawed in obvious ways like Geekbench is better than a broken SPEC.

I personally think console emulators would make good CPU benches, especially ones that emulate complex video subsystems in software. I am biased in this opinion, though. And I doubt I'd get sites on board with this.

ViRGE · Jan 23, 2016

Hans de Vries said:
"They" made you compare multi threaded Intel code versus single
threaded Apple code.

Especially for 462.libquantum: The score becomes 16+ times higher if
you have 16 times the cores at a higher frequency:

https://www.spec.org/cpu2006/results/res2015q4/cpu2006-20151130-38170.html

It sounds like your problem is more with libquantum than AT's settings? They ran each platform as fast as it would go, and Apple's compiler is not as good, it would seem.

Arachnotronic · Jan 23, 2016

Exophase said:
I personally think console emulators would make good CPU benches, especially ones that emulate complex video subsystems in software. I am biased in this opinion, though. And I doubt I'd get sites on board with this.

I think you are right that console emulators would be a very good measure of CPU performance.

Have you considered producing such a benchmark based on your work, particularly since you have produced arguably one of the best console emulators around for the mobile space?

videogames101 · Jan 23, 2016

Compilers account for a large percentage of the performance uptick you see every few years. Intel, Apple, AMD, and ARM all pay software engineers quite a bit of money so that the next version of ICC or GCC or whatever provides a performance boost on their CPUs. Using an optimized compiler isn't cheating; it's the way to go for the best performance on a given CPU (for benchmarks AND real-world applications!). You can argue about flags all day long, but suffice it to say it's not going to shift the picture by 50% across the board. Take the article for what it is, and nothing more.

krumme · Jan 23, 2016

The systems benchmarked are intended for mobile use.
In what way does SPEC 2006, broken or not, reflect workloads on a modern mobile SoC?

Exophase · Jan 23, 2016

Arachnotronic said:
I think you are right that console emulators would be a very good measure of CPU performance.

Have you considered producing such a benchmark based on your work, particularly since you have produced arguably one of the best console emulators around for the mobile space?

First, thank you.

DraStic does have some benchmarking facilities, and if any review site is interested we can work with them to see if they can get it set up. I'm not sure how easy they are to get working on Android, though; it'd probably at least need to be run through adb. It might be something we could do new interfaces for in a new version if there's real demand.

But really most emulators could probably be cooked up to do benchmarks, at least crudely. So long as they have savestates and unthrottled emulation, you can just load a savestate from somewhere and see how long it takes before some event happens. It's probably easier to do during something like a cutscene or attract mode, but you could even do something like waiting for music to loop.
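
A rough sketch of the savestate approach described above: load a fixed state, run unthrottled, and time how long it takes until some known event. `ToyEmulator` is a stand-in invented for this example and does not reflect DraStic's or any real emulator's interface; the per-frame loop is fake work, and the savestate is just a small dictionary.

```python
import time


class ToyEmulator:
    """Hypothetical emulator core; a real one would decode and run guest code."""

    def __init__(self, savestate):
        self.frame = savestate["frame"]
        self.checksum = savestate["checksum"]

    def run_frame(self):
        # Fake deterministic per-frame work standing in for CPU/video emulation.
        for i in range(1000):
            self.checksum = (self.checksum * 31 + i + self.frame) % 65521
        self.frame += 1


def time_until_event(savestate, event_reached, max_frames=100000):
    """Run unthrottled from a savestate until event_reached(emu) is true.

    Returns (frames_emulated, elapsed_seconds).
    """
    emu = ToyEmulator(savestate)
    frames = 0
    start = time.perf_counter()
    while not event_reached(emu):
        if frames >= max_frames:
            raise RuntimeError("event never happened")
        emu.run_frame()
        frames += 1
    return frames, time.perf_counter() - start


if __name__ == "__main__":
    state = {"frame": 0, "checksum": 1}
    frames, seconds = time_until_event(state, lambda emu: emu.frame >= 600)
    print(frames, "frames in", round(seconds, 3), "s")
```

Since the starting state and the end event are fixed, every system emulates the same number of frames and only the elapsed time differs, so frames per second can be compared directly across devices.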

PPB · Jan 23, 2016

You can use games with linear progression and no RNG, and measure the time it takes to beat a given part of the game doing the very same thing on both systems. Emulator performance usually alters game speed, unlike native games, where things always take the same time and you just get lower fps if you have lower performance.

ViRGE · Jan 23, 2016

Arachnotronic said:
I think you are right that console emulators would be a very good measure of CPU performance.

Have you considered producing such a benchmark based on your work, particularly since you have produced arguably one of the best console emulators around for the mobile space?

The problem with emulators is then you've just created a proxy test for JIT compiler performance. Which is not to say that they aren't useful as an application benchmark, just that it's probably not a good architecture benchmark.

imported_ats · Jan 23, 2016

Thala said:
Nonsense. Look at the ridiculous selection of compiler options and you know why Intel wins.
Such a test has to be performed with the same compiler using the very same options. (e.g. gcc)

So which architecture are you going to leave out? You do realize that GCC != GCC, right? GCC will end up doing different optimizations depending on the architecture it is compiling for.

imported_ats · Jan 23, 2016

thunng8 said:
The only difference is that the Apple compiler is used in every iOS app.

ICC, meanwhile, is known as the SPEC compiler: good at SPEC runs, but very little used in commercial or open source applications.

ICC is used by numerous organizations and software packages that care about performance. It has drop-in compatibility with numerous IDEs. ICC is not a SPEC compiler. It is a general-purpose, high-performance compiler.

Thala · Jan 23, 2016

imported_ats said:
So which architecture are you going to leave out? You do realize that GCC != GCC, right? GCC will end up doing different optimizations depending on the architecture it is compiling for.

The differences are mostly in the back-end. However, there are important optimizations at the front-end and IR level, such as IPO and loop transformations for auto-vectorization, which are identical. Things like aliasing hints are also handled consistently by gcc. That said, I am convinced that the gcc code generator/back-end is better optimized for x86, since AArch64 is relatively new.

Ryan Smith said:
As for the specific setups and flags we used, those are the settings that were recommended to us. -Ofast is essentially the fastest way to go on Xcode/LLVM, and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU. We wanted to make this as typical as possible for a SPEC CPU run, and that included using the best settings available.

I cannot simply go by "recommended" options. You need to understand what each option does to the generated code. This is an absolute must if your intention is to reason about the Core architecture in your article based on those benchmarks. Options like ipo, for instance, will aggressively inline functions, which is a net gain given the small working set of SPEC CPU. Also, did you check whether LLVM will auto-vectorize with these options for AArch64? ICC surely does, and you end up comparing parallel code with scalar code.
Better yet, of course, you would use the same compiler, which would level the playing field at least as far as inlining, auto-vectorization/parallelization and aliasing-related optimizations are concerned. There are still some differences in back-end/code generation, and you could argue that x86 back-ends are more mature than the code generators available for AArch64, but that's impossible to compensate for currently.
