
Apple A8 Benchmarked

Considering how the A7 wowed everyone and the A8 has the benefit of a shrink, this is overall a pretty disappointing effort in my view.

What happened to all that ARM magic pixie dust that was supposed to leave Intel & x86 in its wake?

On the basis of the A8, I don't see ARM challenging x86 on the desktop, anytime soon.

Yup, I enjoy arguing all day with the Apple zealots on macrumors who treat the Ax series like the 2nd coming of the CPU. They all believe that in 1-2 generations Apple's cpu will blow away Intel's best mobile cpu. The moment Apple puts a craptastic ARM cpu in their Macbook is the day I stop buying their laptops.
 
The results seem to be similar for GFXBench 3.0 too, so I guess they do run at native on-screen resolution. This is from AnandTech's review:

[Chart: GFXBench 3.0 on-screen results from AnandTech's review]


Notice how much slower the 6 Plus performs. This is a factor of two difference.

I wonder what this means for the iPad Air 2, if anything.

The iPhone 6 Plus has a resolution that is lower than the iPad Air's 2048x1536 (although with the Air, there is no constant downsampling like there is with the 6 Plus). Will the A8 in the iPad Air 2 be significantly faster clocked? Or will the Air 2 actually get an A8X?

Actually, I'm thinking neither, and that the Air 2 will be only slightly faster clocked, with no A8X upgrade.
 
Woot SPECint results! That's great. Found this (and verified the results):

http://investorshub.advfn.com/boards/read_msg.aspx?message_id=106776907


It seems the tweaks Apple has made to its CPU benefit larger workloads more.

Uh, this is from AnandTech's review. You may as well link the original article. 😛

http://www.anandtech.com/show/8554/the-iphone-6-review/3

Keeping in mind that A8 is clocked 100MHz (~7.7%) higher than A7, all of the SPECint2000 benchmarks show performance gains above and beyond the clock speed increase, indicating that every benchmark has benefited in some way. Of these benchmarks MCF, GCC, PerlBmk and GAP in particular show the greatest gains, at anywhere between 20% and 55%. Roughly speaking anything that is potentially branch-heavy sees some of the smallest gains while anything that plays into the multiplication changes benefits more.

MCF, a combinatorial optimization benchmark, ends up being the outlier here by far. Given that these are all integer benchmarks, it may very well be that MCF benefits from the integer multiplication improvements the most, as its performance comes very close to tracking the 2X increase in multiplication throughput. This also bodes well for any other kind of work that is similarly bounded by integer multiplication performance, though such workloads are not particularly common in the real world of smartphone use.
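As a quick sanity check on that per-clock claim, the gain net of the clock bump can be computed directly. The clocks below are the review's A7/A8 figures; the subtest scores are illustrative placeholders, not actual SPEC submissions:

```python
def gain_beyond_clock(score_old, score_new, clk_old, clk_new):
    """Per-clock (IPC) improvement as a fraction, net of the frequency bump."""
    return (score_new / score_old) / (clk_new / clk_old) - 1

# Example: a subtest that gained 30% overall across the A7 -> A8 clock bump
# (1.3 GHz -> 1.4 GHz, the ~7.7% increase the review mentions):
ipc_gain = gain_beyond_clock(100.0, 130.0, 1.3, 1.4)
print(f"{ipc_gain:.1%}")  # -> 20.7%, i.e. most of that gain is architectural
```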
 
You already had linked the article, and the quoted message adds some information AT didn't have: the total score improvement rather than the individual scores. But I agree I should have also linked the AT page 😉

BTW AT got it wrong: the most branch heavy SPEC tests are perlbmk and gcc. perlbmk depends on indirect branch prediction correctness, while gcc depends on general branch prediction.

And while I am at it, A8 SPECint score is 1493. Close to Itanium 2 :biggrin:
 

Allegedly, the ARM Cortex A57 delivers 1250 in SpecInt 2k @ 1.7GHz.

http://www.eetimes.com/document.asp?doc_id=1262718

Assuming linear scaling with clocks, the 2GHz Cortex A57s in the S810 should do about as well.
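For what it's worth, the naive linear-scaling arithmetic looks like this (real scaling would be sub-linear, since memory latency doesn't speed up with the core clock):

```python
# Linear clock scaling of the Cortex-A57 SPECint 2000 figure from EETimes:
# 1250 @ 1.7 GHz, scaled to the 2.0 GHz A57s expected in the Snapdragon 810.
a57_score = 1250
a57_clk = 1.7   # GHz
s810_clk = 2.0  # GHz

estimate = a57_score * s810_clk / a57_clk
print(round(estimate))  # -> 1471, in the same ballpark as the A8's 1493
```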
 
Interesting, thanks! Note that a "real" SPEC result would need a lot of tuning; for instance, some of the tests suffer when compiled for 64-bit (e.g. mcf), so some tests should be compiled for 32-bit while others should be compiled for 64-bit.

BTW do we have SPECint 2000 results for some recent Intel CPUs?
 

Don't think so; Intel has to my knowledge only publicized SpecInt2k6 for Avoton.
 
SPECint rate

SPEC 2006

So these are not directly comparable 🙂

Of course, SPECint_rate is a measure of bandwidth more than anything else: double the memory channels and it will scale accordingly. SPECint 2006 measures computing throughput and is much more useful for estimating computing capability.
 

Not that I'm aware of, unless a Core 2 Duo is still considered recent. However, there's the nice fact that Intel submitted Core 2 Duo results for both SPECint 2000 and 2006 on the same setup. For example, you can take the Core 2 Duo E4300 at 1.8 GHz:

SPECint 2000 - http://www.spec.org/cpu2000/results/res2007q1/cpu2000-20070122-08386.html
SPECint 2006 - http://www.spec.org/cpu2006/results/res2007q1/cpu2006-20070122-00291.html

Then you can easily see how that 'value' Core 2 Duo compares to an i3 (using i3 due to guaranteed clock speed, no turbo skew) - http://www.spec.org/cpu2006/results/res2014q3/cpu2006-20140701-30256.html - and adjust for clock frequency to see that the i3 score is roughly twice that of the Core 2 Duo at comparable clocks.

Oh, and to tie this back in with the A8... Adjust a Core 2 Duo result for frequency and compare it to the A8: they trade blows, with the A8 coming out ahead by maybe 5%. And then, by comparing how the Core 2 Duo stands against Haswell in SPECint 2006, we can derive a rough idea of how the A8 compares to Haswell in something other than Geekbench.
 

For more accurate comparisons one should look at the subscores: the overall score is disproportionately dominated by a single subtest on recent microarchitectures, libquantum, which is the cause of most of the difference between a Core 2 and a recent i3.

[Chart: SPECint 2006 subscores, Core 2 Duo E4300]

[Chart: SPECint 2006 subscores, Core i3-4130]
 
Running a few numbers on that chart shows that the Core i3 averages about 2x the performance without libquantum.

My unofficial estimates were-
Core 2 E4xxx-->E6xxx: 5%
Core 2 65nm to Core 2 45nm: 5%
Core 2 45nm to Nehalem: 10%
Nehalem to Westmere: 0%
Westmere to Sandy Bridge: 20%
Sandy Bridge to Ivy Bridge: 5%
Ivy Bridge to Haswell: 10%

Total: 68%

So that's pretty close, since SPEC is more indicative of server use, and server gains are better than client gains.
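Those per-generation estimates compound multiplicatively, which is where the 68% total comes from:

```python
# Compound the unofficial per-generation IPC estimates multiplicatively.
gains = {
    "Core 2 E4xxx -> E6xxx": 0.05,
    "Core 2 65nm -> 45nm": 0.05,
    "Core 2 45nm -> Nehalem": 0.10,
    "Nehalem -> Westmere": 0.00,
    "Westmere -> Sandy Bridge": 0.20,
    "Sandy Bridge -> Ivy Bridge": 0.05,
    "Ivy Bridge -> Haswell": 0.10,
}

total = 1.0
for g in gains.values():
    total *= 1 + g

print(f"{total - 1:.0%}")  # -> 68%
```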
 

Fair enough, though just to provide everyone who doesn't understand geometric means an example of how removing that one outlying score affects the end results:

i3-4130 scaled down to 1.8 GHz: 25.2 with all tests, 19.4 without libquantum.
e4300 at 1.8 GHz: 11.4 with all tests, 11.3 without libquantum.

So instead of being 2.21x faster per clock it's 1.72x.
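To make the geometric-mean effect concrete, here's a synthetic example (made-up subscores, not real SPEC data) showing how a single outlier inflates the mean of a 12-test suite:

```python
from math import prod

def geomean(xs):
    """Geometric mean: nth root of the product of n values."""
    return prod(xs) ** (1 / len(xs))

# 11 tests score 10, one "libquantum-like" outlier scores 100:
scores = [10.0] * 11 + [100.0]

with_outlier = geomean(scores)          # ~12.12
without_outlier = geomean(scores[:-1])  # exactly 10
print(round(with_outlier, 2), round(without_outlier, 2))  # -> 12.12 10.0
```

A single 10x outlier lifts the overall score by ~21% even though 11 of 12 tests are unchanged.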
 
I didn't really follow the discussion too much, but do you mean that Haswell has a 1.72x higher IPC than Cyclone? I thought they were about the same?

That's a comparison between a Core 2 Duo E4300 and an i3-4130 scaled to equivalent clock speed in SPECint 2006. Which is of interest because a Core 2 Duo E4300 result in SPECint 2000 scaled down to 1.4 GHz is only around 5% slower than A8's score. Which would imply that if we had either an A8 score for SPECint 2006 or a Haswell score for SPECint 2000 we'd most likely see Haswell something around 1.64x faster than A8.
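Spelling out the arithmetic behind that ~1.64x figure (both inputs are the rough estimates from this thread, not measured scores):

```python
# Chain the two comparisons: Haswell (i3) is ~1.72x a Core 2 per clock
# (libquantum excluded), and the A8 is ~5% ahead of a clock-scaled Core 2.
haswell_vs_core2 = 1.72  # per-clock, libquantum excluded
a8_vs_core2 = 1.05       # A8 vs. Core 2 Duo scaled to 1.4 GHz

haswell_vs_a8 = haswell_vs_core2 / a8_vs_core2
print(f"{haswell_vs_a8:.2f}x")  # -> 1.64x
```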
 
Interesting to compare memory access latencies. Note that the tests differ (Sandra versus whatever AT is using).

iPhone

[Chart: A8 memory access latency]


Haswell

[Chart: Haswell memory access latency]


Note that L1 latencies are roughly equal at ~4 ns, and L2 latencies are roughly equal at ~14-17 ns.

However, Haswell's L3 is only a tad slower than its L2 and significantly faster than Apple's L3. Haswell's eDRAM, despite being located on a separate die, is faster (55 vs. 75 ns). Access to system RAM is nearly twice as fast on Haswell.

Not sure what A9 will bring.
 
Where have you heard that about Skylake?

On research it now looks to be a rumour, so I retract my point. I remember looking at it on Wikipedia and never checking the sources. The original wiki entry (since removed) said:

14 nm manufacturing process
3.5 billion transistors without the integrated graphics unit
LGA 1151 socket
Z170/H170 chipset (Sunrise Point)
Thermal Design Power (TDP) up to 95 W (LGA 1151)
Support for both DDR3 and DDR4 SDRAM in mainstream variants, with up to 64 GB of RAM on LGA 1151 variants.
Support for 20 PCI Express 3.0 lanes (LGA 1151)
Support for PCI Express 4.0 (Skylake-E/EP/EX)
Support for Thunderbolt 3.0 (Alpine Ridge)
128 KB L1 cache (64 KB 16-way set associative instruction cache + 64 KB 16-way set associative data cache; two cycles)
512 KB L2 cache, 16-way set associative (six cycles)
12 MB L3 cache, 24-way set associative (12 cycles)
64 to 128 MB L4 eDRAM cache on certain SKUs.
Up to four cores as the default mainstream configuration
Support for SATA Express
AVX-512F: Advanced Vector Extensions 3.2
Instruction decoders increased from 4 (since Core 2) to 6 in Skylake, and the pipeline widened to 6-issue.
Intel SHA Extensions: SHA-1 and SHA-256 (Secure Hash Algorithms)
Intel MPX (Memory Protection Extensions)
Intel ADX (Multi-Precision Add-Carry Instruction Extensions)
Skylake's integrated GPU supports Direct3D 12 at feature level 12.

Of course that transistor count is completely bonkers.

I was wrong.
 
It probably is time for Intel to bump up their L1 cache sizes, though. But 2 cycle latency, while simultaneously doubling the size?
 

That's why I thought so. The cache and memory controller have been largely the same since Sandy Bridge, and moving to DDR4 seems like the time to revamp the system.

Edit: It's certainly possible; the Phenom II had 128 KB of L1 (64 KB data + 64 KB instruction) at 3-cycle latency, and 512 KB of L2.
 

The IPC delta between a Haswell core and a Core 2 is 50% in Cinebench R15 and 13-14% in 7-Zip and Fritz Chess.
 
One has to be careful with SPEC results: Intel is known to tune icc for SPEC, while I doubt LLVM is, so there's some bias (other companies such as Sun did/do the same). Also, between the i3 and Core 2 Duo results there are more than 7 years of compiler tuning 🙂

OTOH I have little doubt that Haswell's IPC is higher than the A8's on bigger workloads, as I think its uncore performance is much higher. But as a reminder, the A8 gains more on SPEC than on Geekbench, so this might hint that Apple is starting to push the uncore (I wish them the best in catching up with Intel, who have a huge lead there :biggrin:).
 