SPEC CPU 2006 : Skylake Core M vs Broadwell Core M vs Apple A9X


thunng8

Member
Jan 8, 2013
167
72
101
And ban the Apple compiler too :p



:D

The only difference is that the Apple compiler is used in every iOS app.

ICC, by contrast, is known as the SPEC compiler: good at SPEC runs, but very little used in commercial or open-source applications.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
And ban the Apple compiler too :p

:D

Even if that were really true there's nothing stopping you from using the so-called Apple compiler to generate x86 binaries too, where you'd probably get most of the benefit of whatever optimization benefits they put into it. Especially if those include benchmark breakers. And since that compiler is open source we can evaluate whatever they did or didn't do to it.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Considering the gap in overall CPU experience between Apple and Intel there is probably some serious concern among Intel execs every product refresh that they will see A series SoCs finally mixed into the traditional x86 Apple product stack.

ICC based Spec benchmarking is best case scenario for Intel vs Apple CPU and they actually lose roughly half the time, ouch.
 

Ryan Smith

The New Boss
Staff member
Oct 22, 2005
537
117
116
www.anandtech.com
Hi guys, just dropping by to respond to a couple of comments.

To be clear here, we're well aware of the pros and cons of SPEC CPU 2006, which is why there was a fairly long preamble to that section describing them. SPEC is not perfect and should not be the only benchmark you ever listen to. However among cross-platform benchmarks it's very unique in its capabilities and a powerful tool as well.

In any case, that SPEC CPU is a "system's processor, memory subsystem and compiler" benchmark is an intentional aspect of its design. The traditional way to run SPEC CPU is to pair it with the fastest compiler and the most aggressive settings you can get away with, to give the system every possible opportunity to produce the best possible score. This is how we've chosen to use SPEC CPU as well, in accordance with how it's typically used.

The issue with compilers is that it's impossible to take them out of the equation. Even if one uses the same compiler for multiple architectures, you are now benchmarking how well that compiler optimizes for each specific architecture, and there are some big gaps there. In some ways it's definitely better, but in other ways it's worse.

Ultimately traditional wisdom is that it's better to admit that the compiler is part of the test and that it's another way to optimize execution of the benchmark, rather than trying (and failing) to remove it from the equation. It's imperfect for sure, but it's the best option available.

(BTW, if I had to use a car analogy here, SPEC CPU would be F1 racing)

As for the specific setups and flags we used, those are the settings that were recommended to us. -Ofast is essentially the fastest way to go on XCode/LLVM, and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU. We wanted to make this as typical as possible for a SPEC CPU run, and that included using the best settings available.
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
He does not, AFAIK. It is indeed funny how you try to jump on him without even doing a basic Google search before jumping to conclusions.
 
Mar 10, 2006
11,715
2,012
126
I am not talking about compiler optimizations Apple would have to do or not. I am saying if you want to compare the microarchitectures you would have to use the same compiler with the same options. The options used by Anandtech for different compilers enable vastly different optimizations.

Have to agree. If you're trying to get an apples to apples CPU comparison, I'd argue everything needs to be apples to apples from a compiler perspective.

The more interesting thing is what is valid for a platform comparison? As people have rightly pointed out, ICC is not that popular for client software but Apple's compiler is used, well...for pretty much everything on iOS.
 

Schmide

Diamond Member
Mar 7, 2002
5,712
978
126
Could there be algorithm drift in SPEC CPU 2006? Especially with the simulations that use random numbers. (462.libquantum, 456.hmmer, etc) Do you know if these pieces carry their own random number generator or do they rely on the compiler's library? If the latter, I can certainly see some better guesses from one platform explaining the near 4x performance gap.

Unless the pseudo random numbers are identical the benchmark is useless.

Edit: According to https://www.spec.org/cpu2006/Docs/faq.html, the random number generator is apparently built in.
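
To illustrate why a built-in generator matters: a self-contained generator with a fixed seed produces the same sequence on every platform and with every compiler, so the workload stays identical. The following is only a minimal sketch using a generic linear congruential generator. The function name `lcg` and its constants are illustrative assumptions, not the generator SPEC actually ships.

```python
def lcg(seed, count, a=1103515245, c=12345, m=2**31):
    """Yield `count` pseudo-random integers from a simple LCG.

    The constants are illustrative assumptions; this is NOT SPEC's generator.
    """
    state = seed
    for _ in range(count):
        state = (a * state + c) % m
        yield state


# Two "platforms" carrying the same generator and seed see identical inputs.
run_a = list(lcg(seed=42, count=5))
run_b = list(lcg(seed=42, count=5))
print(run_a == run_b)
```

If the generator instead came from each platform's own library, the two runs could process different inputs, and the timings would no longer describe the same work.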
 

Hans de Vries

Senior member
May 2, 2008
347
1,177
136
www.chip-architect.com

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
833
136
Hi guys, just dropping by to respond to a couple of comments.

To be clear here, we're well aware of the pros and cons of SPEC CPU 2006, which is why there was a fairly long preamble to that section describing them. SPEC is not perfect and should not be the only benchmark you ever listen to. However among cross-platform benchmarks it's very unique in its capabilities and a powerful tool as well.

In any case, that SPEC CPU is a "system's processor, memory subsystem and compiler" benchmark is an intentional aspect of its design. The traditional way to run SPEC CPU is to pair it with the fastest compiler and the most aggressive settings you can get away with, to give the system every possible opportunity to produce the best possible score. This is how we've chosen to use SPEC CPU as well, in accordance with how it's typically used.

The issue with compilers is that it's impossible to take them out of the equation. Even if one uses the same compiler for multiple architectures, you are now benchmarking how well that compiler optimizes for each specific architecture, and there are some big gaps there. In some ways it's definitely better, but in other ways it's worse.

Ultimately traditional wisdom is that it's better to admit that the compiler is part of the test and that it's another way to optimize execution of the benchmark, rather than trying (and failing) to remove it from the equation. It's imperfect for sure, but it's the best option available.

(BTW, if I had to use a car analogy here, SPEC CPU would be F1 racing)

As for the specific setups and flags we used, those are the settings that were recommended to us. -Ofast is essentially the fastest way to go on XCode/LLVM, and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU. We wanted to make this as typical as possible for a SPEC CPU run, and that included using the best settings available.

As you are the new boss of Anandtech, I was wanting to ask you if you could do more reviews of midrange CPU's & GPU's.

I'm surprised by the lack of these, as surely you would get heaps of hits for this.
 
Mar 10, 2006
11,715
2,012
126
"They" made you compare multi threaded Intel code versus single
threaded Apple code.

Especially for 462.libquantum: The score becomes 16+ times higher if you have 16 times the cores at a higher frequency:

https://www.spec.org/cpu2006/results/res2015q4/cpu2006-20151130-38170.html

This is a good catch. So we throw out the 462.libquantum results in this particular test.
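
For a rough idea of how much a single inflated subtest can move the composite: SPEC CPU2006 combines the per-benchmark ratios with a geometric mean, so one outlier lifts the overall number noticeably. A minimal sketch follows. The ratios are made-up placeholders, not measured results, and the `subtest_*` names are hypothetical.

```python
import math


def geomean(ratios):
    """Geometric mean of a collection of per-benchmark ratios."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-subtest ratios, NOT measured results.
ratios = {
    "subtest_a": 20.0,
    "subtest_b": 25.0,
    "subtest_c": 22.0,
    "462.libquantum": 160.0,  # stand-in for an auto-parallelized outlier
}

with_outlier = geomean(ratios.values())
without_outlier = geomean(v for k, v in ratios.items() if k != "462.libquantum")
print(f"with outlier: {with_outlier:.1f}, without: {without_outlier:.1f}")
```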

BTW, I am curious, what is your view of the best CPU benchmark today? How can we objectively get a good read on how CPUs like SKL and Twister compare? Seems like an extremely difficult problem. Geekbench is an attempt to do this, but a number of experts (e.g. David Kanter, Linus Torvalds) seem to think it is not useful in this way yet.

Thanks.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Have to agree. If you're trying to get an apples to apples CPU comparison, I'd argue everything needs to be apples to apples from a compiler perspective.

I don't agree. You really should be using the best compiler for the processor if you're trying to ascertain the highest performance. Some compilers may not have as good optimizations as others. For example, I've found that on FreeBSD 10.2 the included LLVM is slightly slower than the gcc from the ports collection on my FX8350. It can be even more pronounced on other architectures such as Itanium, where gcc is often 10-20% slower than icc.

The more interesting thing is what is valid for a platform comparison? As people have rightly pointed out, ICC is not that popular for client software but Apple's compiler is used, well...for pretty much everything on iOS.

That's a good point. icc also works with XCode for Mac OS X. But I don't think people willingly pay for or use it when the LLVM that comes with XCode is free and good enough. It would therefore have been useful if they'd tested with LLVM on Mac OS X as well, since LLVM is the usual use case, not icc.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
This is a good catch. So we throw out the 462.libquantum results in this particular test.

BTW, I am curious, what is your view of the best CPU benchmark today? How can we objectively get a good read on how CPUs like SKL and Twister compare? Seems like an extremely difficult problem. Geekbench is an attempt to do this, but a number of experts (e.g. David Kanter, Linus Torvalds) seem to think it is not useful in this way yet.

Thanks.

I'm not Hans, but I think SPEC is pretty decent if you use the same compiler or at least don't use a compiler that breaks some of the subtests. At the very least the GCC test is well regarded. But if you break the bench it's all for nothing. I actually think a benchmark that's flawed in obvious ways like Geekbench is better than a broken SPEC.

I personally think console emulators would make good CPU benches, especially ones that emulate complex video subsystems in software. I am biased in this opinion, though. And I doubt I'd get sites on board with this.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Mar 10, 2006
11,715
2,012
126
I personally think console emulators would make good CPU benches, especially ones that emulate complex video subsystems in software. I am biased in this opinion, though. And I doubt I'd get sites on board with this.

I think you are right that console emulators would be a very good measure of CPU performance.

Have you considered producing such a benchmark based on your work, particularly since you have produced arguably one of the best console emulators around for the mobile space?
 

videogames101

Diamond Member
Aug 24, 2005
6,783
27
91
Compilers comprise a large percentage of the performance uptick you see every few years. Intel, Apple, AMD, and ARM all pay software engineers quite a bit of money so that the next version of ICC or GCC or w/e provides a performance boost on their CPUs. Using an optimized compiler isn't cheating, it's the way to go for the best performance on a given CPU (for benchmarks AND real world applications!) You can argue about flags all day long, but suffice to say it's not going to shift the picture by 50% across the board. Take the article for what it is, and nothing more.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
The systems benchmarked are intended for mobile use.
In what way does SPEC 2006, broken or not, reflect the workloads of a modern mobile SoC?
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I think you are right that console emulators would be a very good measure of CPU performance.

Have you considered producing such a benchmark based on your work, particularly since you have produced arguably one of the best console emulators around for the mobile space?

First, thank you ;)

DraStic does have some benchmarking facilities, and if any review site is interested we can work with them to see if they can get it set up. I'm not sure how easy they are to get working on Android, though; it'd probably at least need to be run through adb. It might be something we could do new interfaces for in a new version if there's real demand.

But really most emulators could probably be cooked up to do benchmarks, at least crudely. So long as they have savestates and unthrottled emulation, you can just load a savestate from somewhere and see how long it takes before some event happens. It's probably easier to do during something like a cutscene or attract mode, but you could even do something like waiting for music to loop.
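
The savestate approach can be sketched roughly like this. `ToyEmulator` is a hypothetical stand-in for a real emulator core (it is not DraStic's interface), and the "event" is simply reaching a target frame count.

```python
import time


class ToyEmulator:
    """Hypothetical stand-in for an emulator core; not any real emulator's API."""

    def __init__(self):
        self.frame = 0

    def load_state(self, state):
        # Restore the frame counter from a savestate dictionary.
        self.frame = state["frame"]

    def run_frame(self):
        # Stand-in for the CPU work of emulating one video frame.
        sum(i * i for i in range(1000))
        self.frame += 1


def benchmark(emulator, savestate, target_frame):
    """Load a savestate, run unthrottled, and time until the target frame."""
    emulator.load_state(savestate)
    start = time.perf_counter()
    frames = 0
    while emulator.frame < target_frame:
        emulator.run_frame()
        frames += 1
    elapsed = time.perf_counter() - start
    return frames, elapsed


frames, elapsed = benchmark(ToyEmulator(), {"frame": 100}, target_frame=400)
print(f"{frames} frames in {elapsed:.4f} s")
```

Because every run starts from the same savestate and stops at the same event, the amount of emulated work is fixed and only the wall-clock time differs between systems.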
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
You can use games with linear development and no RNG in them, and bench the time x needed to beat part y of the game while doing the very same thing on both systems. Emulator performance usually alters game speed, unlike native games, where things always take the same time and you just get lower fps if you have lower performance.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
I think you are right that console emulators would be a very good measure of CPU performance.

Have you considered producing such a benchmark based on your work, particularly since you have produced arguably one of the best console emulators around for the mobile space?
The problem with emulators is that you've then just created a proxy test for JIT compiler performance. Which is not to say that they aren't useful as an application benchmark, just that it's probably not a good architecture benchmark.
 

imported_ats

Senior member
Mar 21, 2008
422
63
86
Nonsense. Look at the ridiculous selection of compiler options and you know why Intel wins.
Such a test has to be performed with the same compiler using the very same options. (e.g. gcc)

So which architecture are you going to leave out? You do realize that GCC != GCC, right? GCC will end up doing different optimizations depending on the architecture it is compiling for.
 

imported_ats

Senior member
Mar 21, 2008
422
63
86
The only difference is that the Apple compiler is used in every iOS app.

ICC, by contrast, is known as the SPEC compiler: good at SPEC runs, but very little used in commercial or open-source applications.

ICC is used by numerous organizations and software packages that care about performance. It has drop-in compatibility with numerous IDEs. ICC is not a SPEC compiler. It is a general-purpose, high-performance compiler.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
So which architecture are you going to leave out? You do realize that GCC != GCC, right? GCC will end up doing different optimizations depending on the architecture it is compiling for.

The differences are mostly in the back-end. However, there are important optimizations at the front-end and IR level, for instance IPO and loop transformations for auto-vectorization, which are identical. Things like aliasing hints are also handled consistently by gcc. That said, I am convinced that the gcc code generator/back-end is better optimized for x86, since AArch64 is relatively new.

As for the specific setups and flags we used, those are the settings that were recommended to us. -Ofast is essentially the fastest way to go on XCode/LLVM, and the Intel settings, though a bit more complex, are what they believe are best for SPEC CPU. We wanted to make this as typical as possible for a SPEC CPU run, and that included using the best settings available.

I cannot simply go by "recommended" options. You need to understand what each option does to the generated code. This is an absolute must if your intention is to reason about the Core architecture in your article based on those benchmarks. Options like ipo, for instance, will aggressively inline functions, which is a net gain given the small working set of SPEC CPU. Also, did you check whether LLVM auto-vectorizes with these options for AArch64? ICC surely does, and you end up comparing parallel code with scalar code.
Better yet, of course, you would use the same compiler, which would level the playing field at least as far as inlining, auto-vectorization/parallelization and aliasing-related optimizations are concerned. There are still some differences in back-end/code generation, and you could argue x86 back-ends are more mature than the code generators available for AArch64, but that's impossible to compensate for currently.
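
To put a rough number on the parallel-versus-scalar concern, here is an idealized Amdahl-style estimate of how much faster a binary gets when some fraction of its run time is vectorized. It is only a back-of-the-envelope sketch: the fractions and the lane count are assumptions picked for illustration, not measurements of any SPEC subtest.

```python
def vector_speedup(vector_fraction, lanes):
    """Idealized speedup when `vector_fraction` of run time uses `lanes` SIMD lanes.

    Ignores memory bandwidth, alignment and other real-world limits.
    """
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)


# Hypothetical fractions of run time spent in vectorizable loops.
for fraction in (0.0, 0.5, 0.9):
    print(fraction, round(vector_speedup(fraction, lanes=4), 2))
```

Even under these simplified assumptions, a loop-heavy subtest compiled with vectorization on one side and without it on the other can differ by a large factor that says nothing about the cores themselves.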
 