Apple A9X Geekbench


Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
Why is it silly? A9 is a wide design with a short pipeline. It could very well be a higher IPC design, but obviously Haswell/Skylake can go to much higher "C".

Because it's still a phone/tablet design with all the limitations of the ARM ISA (lack of robust SIMD instructions hurts IPC and performance in comparison to x86, Power or SPARC). Sure it may catch up in a simple integer only bench, but there is a reason Apple hasn't used their ARM chips in their Macs yet.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Maybe for servers/workstations, but even the maddest optimist would admit that this chip is way off that :)

If we're talking pragmatic, there is one definite upside to it all with the iPad Pro: it's pretty well guaranteed to demolish anything existing that runs on iOS. Core M is almost the other way round.

Hardly a fair test of course, but it does make a definite difference to the expected user experience.

Anyway hopefully back to all that surprisingly constructive technical talk.
 

Rakehellion

Lifer
Jan 15, 2013
12,181
35
91
Out of curiosity, I put my 6700K at 2.2GHz without any SpeedStep or turbo active. In other words, it's locked at 2.2GHz. Memory is 16GB DDR4 @ 2667MHz 15-15-15-36-1T. My Windows 10 installation is clean and I made sure there were no applications running in the background while Geekbench ran. Here's the comparison to the iPad:

http://browser.primatelabs.com/geekbench3/compare/4149721?baseline=4183457

If we assume that the iPad actually runs at 2.16GHz (and no higher), the performance/IPC advantage of Apple's core is pretty big in some cases. Many of the test results seem to hover in the +30% area. That's if the results are actually meaningful at all. We can speculate, but I don't see any way of determining who's right with any certainty without having much more varied test results available.
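
The per-clock comparison above can be sketched roughly as follows. This is only an illustration, assuming both chips hold a fixed clock for the whole run; the function name and the scores in the usage lines are made-up placeholders, not the Geekbench results linked above.

```python
def per_clock_advantage(score_a, ghz_a, score_b, ghz_b):
    """Return how much more work per GHz chip A does than chip B, as a fraction.

    Assumes both clocks are fixed for the whole run (no turbo, no throttling).
    """
    per_ghz_a = score_a / ghz_a
    per_ghz_b = score_b / ghz_b
    return per_ghz_a / per_ghz_b - 1.0


# Hypothetical placeholder scores, NOT taken from the linked results.
advantage = per_clock_advantage(score_a=1300, ghz_a=2.16, score_b=1000, ghz_b=2.2)
print(f"{advantage:+.1%}")
```

Strictly speaking this is performance per clock rather than IPC: ARM and x86 builds of the same benchmark execute different numbers of instructions, so the two are not interchangeable.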

For completeness, here's the bone stock Geekbench score of the system with otherwise exactly the same running conditions. Multicore enhancement disabled.

http://browser.primatelabs.com/geekbench3/4183670

Apple has better single core performance than Intel. Intel is becoming AMD. :awe:
 

Nothingness

Diamond Member
Jul 3, 2013
3,309
2,382
136
Because it's still a phone/tablet design with all the limitations of the ARM ISA (lack of robust SIMD instructions hurts IPC and performance in comparison to x86, Power or SPARC).
You should read about the AArch64 instruction set. Except for a smaller width (128-bit vs 256-bit), many consider it better than AVX.

Sure it may catch up in a simple integer only bench, but there is a reason Apple hasn't used their ARM chips in their Macs yet.
Yes there's a reason and I won't insult your intelligence by spelling the word L-E-G-A-C-Y for you. Ha I did it :D
 

Space69

Member
Aug 12, 2014
39
0
66
You should read about the AArch64 instruction set. Except for a smaller width (128-bit vs 256-bit), many consider it better than AVX.

Since you used the generic term 'AVX' - I'm curious, does that include AVX512 or are you just referencing the first implementation of AVX (Sandy Bridge)?
 

tipoo

Senior member
Oct 4, 2012
245
7
81
Apparently the A9X GPU is Iris Pro 5200-class:

iPad-Pro-charts.009-980x735.png

I think mobile devices use FP16 for that test, while standard PCs use FP32. You could argue about that two ways, you could say that's an efficiency advantage for mobile, or you could say it's not comparable to full precision FP32 tests.

It's like how Nvidia said the X1 was 1Tflop...At FP16. Under half of that at full precision.

Note that Intel claims that it would take a 100 - 130GB/s GDDR memory interface to deliver similar effective performance to Crystalwell since the latter is a cache, so a 50GB/s part trading blows with it would indicate FP16 playing a large role, particularly in combination with it being a tile-based renderer (but that part is to Imagination's credit as an efficiency benefit).
 

Nothingness

Diamond Member
Jul 3, 2013
3,309
2,382
136
Since you used the generic term 'AVX' - I'm curious, does that include AVX512 or are you just referencing the first implementation of AVX (Sandy Bridge)?
Sorry for the confusion, I meant AVX/AVX2.

AVX-512 is for HPC and I don't see it making many inroads in end-user applications. If ARM ever wants to go into (floating-point) HPC they'll have to develop something similar to AVX-512.
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
I'd like to see how A9X does in something like PovRay, or anything else concerning 3d rendering?
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,079
3,915
136
Since you used the generic term 'AVX' - I'm curious, does that include AVX512 or are you just referencing the first implementation of AVX (Sandy Bridge)?

So what consumer workload can you pack 16/8 32/64-bit operands into? Because if you can't, you're now burning lots of power all the way through your core.

What they mean is in terms of available instructions, the way they execute, etc.
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
You should read about AArch64 instruction set. Except for a smaller width (128-bit vs 256-bit) it's considered as better by many than AVX.


Yes there's a reason and I won't insult your intelligence by spelling the word L-E-G-A-C-Y for you. Ha I did it :D

Your failed attempt at insulting my intelligence is rather silly because it shows your lack of knowledge and ability to make a proper coherent argument.

Legacy applications aren't a problem for Apple, this was demonstrated by their switch to x86 from PowerPC. If their CPUs were truly better they would have switched already just like they did from PowerPC.

From what I have been able to gather, NEON in its current ARMv8 incarnation is roughly equivalent to SSE3, FMA, and AES. I would like to see a source about that AVX claim, especially considering NEON just recently gained support for double precision math (SISD VFP had it for a while). Even if NEON was "better" than AVX, how is that even possible if it doesn't support one of AVX's main features, which is 256-bit registers? The main functionality of AVX/2 is the use of VEX encoding, which replaces SSE in the lower XMM registers, and the addition of the upper YMM registers.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
From what I have been able to gather, NEON in its current ARMv8 incarnation is roughly equivalent to SSE3, FMA, and AES. I would like to see a source about that AVX claim, especially considering NEON just recently gained support for double precision math (SISD VFP had it for a while). Even if NEON was "better" than AVX, how is that even possible if it doesn't support one of AVX's main features, which is 256-bit registers? The main functionality of AVX/2 is the use of VEX encoding, which replaces SSE in the lower XMM registers, and the addition of the upper YMM registers.

You said NEON lacks robust SIMD in comparison to x86, PowerPC, or SPARC but neither Power ISA v2.07 (POWER8) nor SPARC v9 (M7) defines 256-bit or higher SIMD. Actually, Oracle's server-oriented SPARC CPUs don't have SIMD at all, AFAIK. What were you referring to exactly with this comparison?

BTW this is a nitpick but I can't really help myself: wider SIMD doesn't increase IPC, it tends to lower it slightly. What it increases is overall perf/MHz.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
BTW this is a nitpick but I can't really help myself: wider SIMD doesn't increase IPC, it tends to lower it slightly. What it increases is overall perf/MHz.

While this is a nitpick, it is true nonetheless. Some 2 years ago I ported some code to NEON for Cortex-A5. As a result, IPC dropped from about 1 to about 0.5. Still, the performance gain was a factor of 4 (for FIR-filter-like 16-bit fixed-point code).
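
A rough sketch of the arithmetic behind those figures, assuming run time at a fixed clock is simply instruction count divided by IPC. The function names are made up for illustration; only the IPC values (about 1 and about 0.5) and the 4x gain come from the post.

```python
def speedup(instr_ratio, ipc_before, ipc_after):
    """Speedup at a fixed clock when the instruction count shrinks by
    instr_ratio (before / after) and IPC changes from ipc_before to ipc_after.

    cycles = instructions / IPC, so speedup = instr_ratio * ipc_after / ipc_before.
    """
    return instr_ratio * ipc_after / ipc_before


def implied_instr_ratio(observed_speedup, ipc_before, ipc_after):
    """Invert speedup(): how much the instruction count must have shrunk."""
    return observed_speedup * ipc_before / ipc_after


# Figures from the post: IPC 1 -> 0.5, overall gain of 4x.
ratio = implied_instr_ratio(4.0, ipc_before=1.0, ipc_after=0.5)
print(ratio)                     # 8.0
print(speedup(ratio, 1.0, 0.5))  # 4.0
```

An 8x drop in instruction count would be consistent with each vector instruction handling eight 16-bit values, which is how IPC can halve while the code still runs four times faster.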
 

Nothingness

Diamond Member
Jul 3, 2013
3,309
2,382
136
Legacy applications aren't a problem for Apple, this was demonstrated by their switch to x86 from PowerPC. If their CPUs were truly better they would have switched already just like they did from PowerPC.
It's not a notion of "better", it's a notion of "faster".

These Apple chips don't have enough performance to overcome the cost of runtime translation. The PowerPC was so slow back then (and with no concrete evidence from IBM/Motorola that it would get significantly faster) that it was easy to be much faster and pay the price of DBT. Don't get me wrong, to me PowerPC was "better" than x86, everything is better than x86, but nothing is faster.

But Apple chips are getting faster at a rate that Intel can only dream of (for good reasons, their chips have been tuned for years and hence are more difficult to make faster).
 

teejee

Senior member
Jul 4, 2013
361
199
116
It's not a notion of "better", it's a notion of "faster".

These Apple chips don't have enough performance to overcome the cost of runtime translation. The PowerPC was so slow back then (and with no concrete evidence from IBM/Motorola that it would get significantly faster) that it was easy to be much faster and pay the price of DBT. Don't get me wrong, to me PowerPC was "better" than x86, everything is better than x86, but nothing is faster.

But Apple chips are getting faster at a rate that Intel can only dream of (for good reasons, their chips have been tuned for years and hence are more difficult to make faster).
Why do you assume run-time translation? I'm pretty sure Apple would choose binary translation at installation (or first execution). This will only give a moderate performance penalty.
And there are plenty of things Apple could do to ensure that most applications are recompiled by the developers (e.g. give away free developer boxes well ahead of the first ARM Mac).
 

Nothingness

Diamond Member
Jul 3, 2013
3,309
2,382
136
Why do you assume run-time translation? I'm pretty sure Apple would choose binary translation at installation (or first execution). This will only give a moderate performance penalty.
Static binary translation of an executable is a non-decidable problem without assistance (see this for instance), so I'm afraid it's not a solution.

And there are plenty of things Apple could do to ensure that most applications are recompiled by the developers (e.g. give away free developer boxes well ahead of the first ARM Mac).
What would you do for your old programs? These would need some form of translation. OTOH it can be argued that if these are old programs, their performance needs are perhaps less of a constraint than those of more modern programs, so the cost of dynamic translation might be acceptable.

I agree things can change, but it will take time at least for the higher-end Mac. I'm still hoping we'll see some ARM-based laptop in the MBA range where users typically don't run programs with a need for performance. But that'd be a nightmare to properly market...
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
You said NEON lacks robust SIMD in comparison to x86, PowerPC, or SPARC but neither Power ISA v2.07 (POWER8) nor SPARC v9 (M7) defines 256-bit or higher SIMD. Actually, Oracle's server-oriented SPARC CPUs don't have SIMD at all, AFAIK. What were you referring to exactly with this comparison?

BTW this is a nitpick but I can't really help myself: wider SIMD doesn't increase IPC, it tends to lower it slightly. What it increases is overall perf/MHz.

SPARC does indeed have SIMD if that's what you are implying (not sure you may just be referring to Oracle's chips). You don't need 256 bit wide SIMD to be "robust". Power and SPARC have robust SIMD support for what their target market is. If ARM wants to invade x86 territory they will have to reach x86 parity instruction wise.

IPC is instructions per clock, which is the same thing as perf/MHz, is it not? Unless you mean SIMD doesn't increase IPC but rather increases performance with all things considered. However, in an ideal CPU I'm sure SIMD would actually increase IPC because it means executing more instructions in parallel, but that's in an ideal scenario.
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
It's not a notion of "better", it's a notion of "faster".

These Apple chips don't have enough performance to overcome the cost of runtime translation. The PowerPC was so slow back then (and with no concrete evidence from IBM/Motorola that it would get significantly faster) that it was easy to be much faster and pay the price of DBT. Don't get me wrong, to me PowerPC was "better" than x86, everything is better than x86, but nothing is faster.

But Apple chips are getting faster at a rate that Intel can only dream of (for good reasons, their chips have been tuned for years and hence are more difficult to make faster).

A better CPU is a faster CPU in this scenario. The fact that these chips don't have enough performance is kind of my point.

The PowerPC 970 was rather competitive with NetBurst at the time; the same can't be said versus K8, though.

I wouldn't say everything is better than x86 either unless you're a RISC champion or something and if that's the case well I'm not ready to start a religious war here!
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
I don't think this 16bit vs 32bit thing is valid. A9X simply packs a ton of GFLOPS and probably a good frequency bump (I first thought it had double the clusters, because of double performance, so it's not that good).

A8X has 256 GFLOPS @ 0.5GHz with 8 clusters.
So A9X has 320 GFLOPS @ 0.5GHz with 10 clusters.
Skylake GT2 has 384 GFLOPS @ 1.0 GHz.
Skylake GT3 has 768 GFLOPS @ 1.0 GHz.

Those FLOPS are real FLOPS, so 32-bit ones.
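
For what it's worth, the numbers above follow from peak GFLOPS = FLOPs per clock x clock in GHz. A minimal sketch using only the figures quoted in this post; the helper names are made up, and the per-cluster rate is derived from the A8X line rather than taken from a spec sheet.

```python
def flops_per_clock(gflops, ghz):
    """Peak FLOPs issued per clock cycle, given peak GFLOPS and clock in GHz."""
    return gflops / ghz


def scale_by_clusters(gflops, clusters_from, clusters_to):
    """Scale peak GFLOPS linearly with cluster count at an unchanged clock."""
    return gflops * clusters_to / clusters_from


# A8X as quoted: 256 GFLOPS at 0.5 GHz with 8 clusters.
per_cluster = flops_per_clock(256, 0.5) / 8   # 64 FLOPs per clock per cluster
a9x = scale_by_clusters(256, 8, 10)           # 320 GFLOPS at the same 0.5 GHz

print(per_cluster, a9x)
print(flops_per_clock(a9x, 0.5))   # 640 FLOPs per clock
print(flops_per_clock(384, 1.0))   # Skylake GT2 as quoted: 384 FLOPs per clock
```

On these figures the A9X would issue more FLOPs per clock than GT2 and only end up with fewer GFLOPS because of the lower assumed clock.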

Source: http://www.anandtech.com/show/8716/apple-a8xs-gpu-gxa6850-even-better-than-i-thought and http://www.anandtech.com/show/9780/taking-notes-with-ipad-pro/2 and http://www.anandtech.com/show/8814/intel-releases-broadwell-u-new-skus-up-to-48-eus-and-iris-6100/2
 

Space69

Member
Aug 12, 2014
39
0
66
So what consumer workload can you pack 16/8 32/64-bit operands into? Because if you can't, you're now burning lots of power all the way through your core.

What they mean is in terms of available instructions, the way they execute, etc.

I have no idea what you're talking about - AVX512 is way more than just adding support for 512-bit operations.
 

Space69

Member
Aug 12, 2014
39
0
66
IPC is instructions per clock, which is the same thing as perf/MHz, is it not? Unless you mean SIMD doesn't increase IPC but rather increases performance with all things considered. However, in an ideal CPU I'm sure SIMD would actually increase IPC because it means executing more instructions in parallel, but that's in an ideal scenario.

I suppose you can view it that way, but it's called SIMD for a reason.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
You know what's funny? The only reason that there is any argument over Apple's cpu performance is that you can't do jack shit with it.

Run a kernel compile, do some rendering, play a few common games, churn out a few F@H WU's, the real world performance of a cpu is really not that hard to gauge. But guess what? You can't do any of that on the ipad pro!

So the A9X might well have quantum computational powers but do you even care?

Clearly some people here would for e-peen or bragging (I think a lot of it is used to justify iOS/Apple vs. Android ***boy wars or justifying your own gadget upgrades, more than anything really). Another side to it, I think, is that because AMD is not competitive with Intel, some are desperate to find proper competition for Intel's products, but are completely ignoring what it actually means to harness that power in the real world. I guess since smartphone / ARM devices are increasing performance at a rapid pace, that industry sector is to some extent more exciting to follow than the traditional desktop/laptop PC segments. But let's not get carried away -- in the end what matters is HOW you can use that power to your benefit NOT how much power is theoretically there but will never be used.

It's the same reason how discussing gaming benchmarks of Steam OS, Linux or OSX and trying to declare a winner is a waste of time. If you need a proper gaming device, you are using Windows OS, not Steam OS, not Linux not OSX. The same applies to consumer gadgets.

Your post is thus one of the best posts in this entire thread. Provided by work, I ended up using iPhone 5, 5S, 6, 6S and iPad Air but guess what, the main differences between all these devices in the real world are: (1) Screen size (2) Battery life (3) 2D/3D camera/video. GPU and CPU speed is basically 99% irrelevant for productivity or any real world tasks among these types of devices. Why? Because as actual consumer devices, they are poor products as far as productivity and functionality is concerned. Adding more horsepower to an inferior consumer good doesn't make it more productive. The 6S does browse the web better with 2GB of RAM due to less Safari reloading. However....in the real world:

- The number of professional reports (MS Word) I've written on iOS devices in the last 10 years: 0
- The number of professional presentations (MS PowerPoint) I've done on iOS devices in the last 10 years: 0
- The number of professional financial forecasting, budget, risk/outcome modeling analysis (Monte Carlo simulation) I've done on iOS devices: 0
- The number of great gaming experiences I've had on iOS devices: 0
- The number of times I connected an iOS device to my Panasonic plasma to watch movies: 0
- The number of times I watched a BluRay movie on an iOS device either natively or via a connection to the TV: 0

And then there is the completely horrendous file storage sub-system for all files and music (iTunes is pure trash).

My Canon and Olympus micro-4/3rds cameras wipe the floor with all iOS devices for image quality, which means that for evening or night-time shots, or photos needed for work (zoomed-in details/macro shots), I use a dedicated camera. The only area where the latest iOS devices are hands down better is 4K video.

In other words, are iOS devices good products? Yes, they are for basic functions such as social media, average style internet browsing, text/Multi-media text messaging, quick photo for FB/Instagram, basic YouTube videos, etc. But if we compare them to any dedicated productivity devices such as a 15.6" laptop, a proper Windows OS - an actual functional OS for multi-tasking, a proper camera, they are just toys. That means even if the iPhone 7 and iPad Air 3 had the power of Core i7 6950X and Pascal SLI, they are still very limited devices for anything useful besides pure basics. Whenever I have access to my 32" 2560x1440 desktop or my 15.6" i7 quad-core laptop, the iOS devices just collect dust. That's not a knock against iOS devices per se because the same applies for Android devices too. It's just the fact that smartphones and tablets are of very limited use when trying to get real work done. A proper 15.6" laptop + any 2015 $300 smartphone annihilates those iOS devices for productivity and functionality. I bet a lot of consumers are realizing this, which is why tablet sales are declining. Are there going to be niche use cases where the iPad Pro will be a good product? Sure, but I bet for most people a laptop + smartphone is a far better choice for productivity than an iOS tablet.

But hey, let's discuss how in an imaginary world an A9X CPU has IPC that seems comparable to Skylake in some arbitrary GeekBench that in the real world almost no one uses for productivity. :D :sneaky:

I wonder how many CATIA and AutoCAD designs will be done over that powerful iPad Pro in the automotive industry? Responsible people who are put in charge of running $30-200 million projects don't use toys -- they need the best equipment possible to ensure efficiency and accuracy. And what exactly is the average consumer going to do with a CPU+GPU 10X more powerful than a Skylake-E in an iPhone 7 or iPad Pro 2? Max out Infinity Blade 4? Crossy Road 2? Candy Crush 2? Awesome. Marketing FTW.
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
I suppose you can view it that way, but it's called SIMD for a reason.

I should probably rephrase, operating on data in parallel instead of sequentially increases IPC in a well designed CPU because you're still getting more work done per clock.
 

ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
I should probably rephrase, operating on data in parallel instead of sequentially increases IPC in a well designed CPU because you're still getting more work done per clock.
IPC stands for instructions per clock not work per cycle. But if you use one instruction to do work on four pieces of data simultaneously instead of one, then you're doing more work even if your instruction takes two cycles instead of one.