
Apple A8 Benchmarked

Considering how the A7 wowed everyone and the A8 has the benefit of a shrink, this is overall a pretty disappointing effort in my view.

What happened to all that ARM magic pixie dust that was supposed to leave Intel & x86 in its wake?

On the basis of the A8, I don't see ARM challenging x86 on the desktop, anytime soon.

Yup, I enjoy arguing all day with the Apple zealots on macrumors who treat the Ax series like the 2nd coming of the CPU. They all believe that in 1-2 generations Apple's cpu will blow away Intel's best mobile cpu. The moment Apple puts a craptastic ARM cpu in their Macbook is the day I stop buying their laptops.
 
The results seem to be similar for GFXBench 3.0 too, so I guess they do run at native on-screen resolution. This is from AnandTech's review:

[Chart: GFXBench 3.0 on-screen results from AnandTech's review]


Notice how much slower the 6 Plus performs. This is a factor of two difference.

I wonder what this means for the iPad Air 2, if anything.

The iPhone 6 Plus has a resolution that is lower than the iPad Air's 2048x1536 (although with the Air, there is no constant downsampling like there is with the 6 Plus). Will the A8 in the iPad Air 2 be significantly faster clocked? Or will the Air 2 actually get an A8X?

Actually, I'm thinking neither, and that the Air 2 will be only slightly faster clocked, with no A8X upgrade.
 
Woot SPECint results! That's great. Found this (and verified the results):

http://investorshub.advfn.com/boards/read_msg.aspx?message_id=106776907


It seems the tweaks Apple has made to its CPU benefit larger workloads more.

Uh, this is from AnandTech's review. You may as well link the original article. 😛

http://www.anandtech.com/show/8554/the-iphone-6-review/3

Keeping in mind that A8 is clocked 100MHz (~7.7%) higher than A7, all of the SPECint2000 benchmarks show performance gains above and beyond the clock speed increase, indicating that every benchmark has benefited in some way. Of these benchmarks MCF, GCC, PerlBmk and GAP in particular show the greatest gains, at anywhere between 20% and 55%. Roughly speaking anything that is potentially branch-heavy sees some of the smallest gains while anything that plays into the multiplication changes benefits more.

MCF, a combinatorial optimization benchmark, ends up being the outlier here by far. Given that these are all integer benchmarks, it may very well be that MCF benefits from the integer multiplication improvements the most, as its performance comes very close to tracking the 2X increase in multiplication throughput. This also bodes well for any other kind of work that is similarly bounded by integer multiplication performance, though such workloads are not particularly common in the real world of smartphone use.
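As a quick sanity check on that per-clock claim, the gain net of the clock bump can be computed directly. The clocks below are the review's A7/A8 figures; the subtest scores are illustrative placeholders, not actual SPEC submissions:

```python
def gain_beyond_clock(score_old, score_new, clk_old, clk_new):
    """Per-clock (IPC) improvement as a fraction, net of the frequency bump."""
    return (score_new / score_old) / (clk_new / clk_old) - 1

# Example: a subtest that gained 30% overall across the A7 -> A8 clock bump
# (1.3 GHz -> 1.4 GHz, the ~7.7% increase the review mentions):
ipc_gain = gain_beyond_clock(100.0, 130.0, 1.3, 1.4)
print(f"{ipc_gain:.1%}")  # -> 20.7%, i.e. most of that gain is architectural
```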
 
You already had linked the article, and the quoted message adds some information AT didn't have: the total score improvement rather than the individual scores. But I agree I should have also linked the AT page 😉

BTW AT got it wrong: the most branch heavy SPEC tests are perlbmk and gcc. perlbmk depends on indirect branch prediction correctness, while gcc depends on general branch prediction.

And while I am at it, A8 SPECint score is 1493. Close to Itanium 2 :biggrin:
 

Allegedly, the ARM Cortex A57 delivers 1250 in SpecInt 2k @ 1.7GHz.

http://www.eetimes.com/document.asp?doc_id=1262718

Assuming linear scaling with clocks, the 2GHz Cortex A57s in the S810 should do about as well.
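For what it's worth, the naive linear-scaling arithmetic looks like this (real scaling would be sub-linear, since memory latency doesn't speed up with the core clock):

```python
# Linear clock scaling of the Cortex-A57 SPECint 2000 figure from EETimes:
# 1250 @ 1.7 GHz, scaled to the 2.0 GHz A57s expected in the Snapdragon 810.
a57_score = 1250
a57_clk = 1.7   # GHz
s810_clk = 2.0  # GHz

estimate = a57_score * s810_clk / a57_clk
print(round(estimate))  # -> 1471, in the same ballpark as the A8's 1493
```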
 
Interesting, thanks! Note that a "real" SPEC result would need a lot of tuning; for instance, some of the tests suffer when compiled for 64-bit (e.g. mcf), so some tests should be compiled for 32-bit while others should be compiled for 64-bit.

BTW do we have SPECint 2000 results for some recent Intel CPUs?
 

Don't think so; Intel has to my knowledge only publicized SpecInt2k6 for Avoton.
 
SPECint rate

SPEC 2006

So these are not directly comparable 🙂

Of course, SPECint_rate is a measure of bandwidth more than anything else: double the memory channels and it will scale accordingly. SPECint 2006 measures computing throughput and is much more useful for estimating computing capability.
 

Not that I'm aware of, unless a Core 2 Duo is still considered recent. However, there's the nice fact that Intel submitted Core 2 Duo results for both SPECint 2000 and 2006 on the same setup. For example, you can take the Core 2 Duo E4300 at 1.8 GHz:

SPECint 2000 - http://www.spec.org/cpu2000/results/res2007q1/cpu2000-20070122-08386.html
SPECint 2006 - http://www.spec.org/cpu2006/results/res2007q1/cpu2006-20070122-00291.html

Then you can easily see how that 'value' Core 2 Duo compares to an i3 (using i3 due to guaranteed clock speed, no turbo skew) - http://www.spec.org/cpu2006/results/res2014q3/cpu2006-20140701-30256.html - and adjust for clock frequency to see that the i3 score is roughly twice that of the Core 2 Duo at comparable clocks.

Oh, and to tie this back in with the A8... Adjust a Core 2 Duo result for frequency and compare it to the A8: they trade blows, with the A8 coming out ahead by maybe 5%. And then, by comparing how the Core 2 Duo stands against Haswell in SPECint 2006, we can derive a rough idea of how the A8 compares to Haswell in something other than Geekbench.
 

For more accurate comparisons one should look at the subscores: the overall score is disproportionately dominated by a single subtest on recent microarchitectures, libquantum, which is the cause of most of the difference between a Core 2 and a recent i3.

[Chart: SPECint 2006 subscores, Core 2 Duo E4300]

[Chart: SPECint 2006 subscores, Core i3-4130]
 
Running a few numbers on that chart shows that the Core i3 averages about 2x the performance without libquantum.

My unofficial estimates were-
Core 2 E4xxx-->E6xxx: 5%
Core 2 65nm to Core 2 45nm: 5%
Core 2 45nm to Nehalem: 10%
Nehalem to Westmere: 0%
Westmere to Sandy Bridge: 20%
Sandy Bridge to Ivy Bridge: 5%
Ivy Bridge to Haswell: 10%

Total: 68%

So that's pretty close, since SPEC is more indicative of server use, and server gains are better than client gains.
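Those per-generation estimates compound multiplicatively, which is where the 68% total comes from:

```python
# Compound the unofficial per-generation IPC estimates multiplicatively.
gains = {
    "Core 2 E4xxx -> E6xxx": 0.05,
    "Core 2 65nm -> 45nm": 0.05,
    "Core 2 45nm -> Nehalem": 0.10,
    "Nehalem -> Westmere": 0.00,
    "Westmere -> Sandy Bridge": 0.20,
    "Sandy Bridge -> Ivy Bridge": 0.05,
    "Ivy Bridge -> Haswell": 0.10,
}

total = 1.0
for g in gains.values():
    total *= 1 + g

print(f"{total - 1:.0%}")  # -> 68%
```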
 

Fair enough, though just to provide everyone who doesn't understand geometric means an example of how removing that one outlying score affects the end results:

i3-4130 scaled down to 1.8 GHz: 25.2 with all tests, 19.4 without libquantum.
e4300 at 1.8 GHz: 11.4 with all tests, 11.3 without libquantum.

So instead of being 2.21x faster per clock it's 1.72x.
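To make the geometric-mean effect concrete, here's a synthetic example (made-up subscores, not real SPEC data) showing how a single outlier inflates the mean of a 12-test suite:

```python
from math import prod

def geomean(xs):
    """Geometric mean: nth root of the product of n values."""
    return prod(xs) ** (1 / len(xs))

# 11 tests score 10, one "libquantum-like" outlier scores 100:
scores = [10.0] * 11 + [100.0]

with_outlier = geomean(scores)          # ~12.12
without_outlier = geomean(scores[:-1])  # exactly 10
print(round(with_outlier, 2), round(without_outlier, 2))  # -> 12.12 10.0
```

A single 10x outlier lifts the overall score by ~21% even though 11 of 12 tests are unchanged.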
 
I didn't really follow the discussion too much, but do you mean that Haswell has a 1.72x higher IPC than Cyclone? I thought they were about the same?

That's a comparison between a Core 2 Duo E4300 and an i3-4130 scaled to equivalent clock speed in SPECint 2006. Which is of interest because a Core 2 Duo E4300 result in SPECint 2000 scaled down to 1.4 GHz is only around 5% slower than A8's score. Which would imply that if we had either an A8 score for SPECint 2006 or a Haswell score for SPECint 2000 we'd most likely see Haswell something around 1.64x faster than A8.
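Spelling out the arithmetic behind that ~1.64x figure (both inputs are the rough estimates from this thread, not measured scores):

```python
# Chain the two comparisons: Haswell (i3) is ~1.72x a Core 2 per clock
# (libquantum excluded), and the A8 is ~5% ahead of a clock-scaled Core 2.
haswell_vs_core2 = 1.72  # per-clock, libquantum excluded
a8_vs_core2 = 1.05       # A8 vs. Core 2 Duo scaled to 1.4 GHz

haswell_vs_a8 = haswell_vs_core2 / a8_vs_core2
print(f"{haswell_vs_a8:.2f}x")  # -> 1.64x
```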
 
Interesting to compare memory access latencies. Note that the tests differ (Sandra versus whatever AT is using).

iPhone

[Chart: A8 memory access latency]


Haswell

[Chart: Haswell memory access latency]


Note that L1 latencies are roughly equal at ~4 ns, and L2 latencies are roughly equal at ~14-17 ns.

However, Haswell's L3 is only a tad slower than its L2 and significantly faster than Apple's L3. Haswell's eDRAM, despite being located on a separate die, is faster (55 vs. 75 ns). Access to system RAM is nearly twice as fast on Haswell.

Not sure what A9 will bring.
 
Where have you heard that about Skylake?

On research it now looks to be a rumour, so I retract my point. I remember looking at it on Wikipedia and never checking the sources. The original wiki entry (since removed) said:

14 nm manufacturing process
3.5 billion transistors without the integrated graphics unit
LGA 1151 socket
Z170/H170 chipset (Sunrise Point)
Thermal Design Power (TDP) up to 95 W (LGA 1151)
Support for both DDR3 and DDR4 SDRAM in mainstream variants, with up to 64 GB of RAM on LGA 1151 variants.
Support for 20 PCI Express 3.0 lanes (LGA 1151)
Support for PCI Express 4.0 (Skylake-E/EP/EX)
Support for Thunderbolt 3.0 (Alpine Ridge)
128 KB L1 cache (64 KB 16-way set associative instruction cache + 64 KB 16-way set associative data cache; two cycles)
512 KB L2 cache, 16-way set associative (six cycles)
12 MB L3 cache, 24-way set associative (12 cycles)
64 to 128 MB L4 eDRAM cache on certain SKUs.
Up to four cores as the default mainstream configuration
Support for SATA Express
AVX-512F: Advanced Vector Extensions 3.2
Instruction decoders increased from 4 (since Core 2) to 6 in Skylake, and the pipeline widened to 6-issue.
Intel SHA Extensions: SHA-1 and SHA-256 (Secure Hash Algorithms)
Intel MPX (Memory Protection Extensions)
Intel ADX (Multi-Precision Add-Carry Instruction Extensions)
Skylake's integrated GPU supports Direct3D 12 at feature level 12.

Of course that transistor count is completely bonkers.

I was wrong.
 
It probably is time for Intel to bump up their L1 cache sizes, though. But 2 cycle latency, while simultaneously doubling the size?
 

That's why I thought so. The cache and memory controller have been largely the same since Sandy Bridge, and moving to DDR4 seems like the time to revamp the system.

Edit: It's certainly possible; the Phenom II had 128 KB of L1 (64 KB data + 64 KB instruction) at 3-cycle latency, and 512 KB of L2.
 

The IPC delta between a Haswell core and a Core 2 is 50% in Cinebench R15 and 13-14% in 7-Zip and Fritz Chess.
 
One has to be careful with SPEC results: Intel is known to tune icc for SPEC, while I doubt LLVM is, so there's some bias (other companies such as Sun did/do the same). Also, between the i3 and Core 2 Duo results there are more than 7 years of compiler tuning 🙂

OTOH I have little doubt that Haswell's IPC is higher than the A8's on bigger workloads, as I think its uncore performance is much higher. But as a reminder, the A8 gains more on SPEC than on Geekbench, so this might hint that Apple is starting to push the uncore (I wish them the best in catching up with Intel, who have a huge lead there :biggrin:).
 