
Apple A12 & A12X [EDIT 2020-03-18] *** Now A12Z as well ***


Nothingness

Platinum Member
Jul 3, 2013
2,136
368
126
Of course, they would not just add SVE decoders on top of the existing FP units. If you want single-cycle throughput you need to match the vector length. I wonder if they will go 256- or 512-bit wide with SVE?
Why do you think single cycle throughput is mandatory? Or what if for instance your 256-bit SVE units can be split into 2 128-bit AdvSIMD units?

My point is just that SVE doesn't imply more FP performance. They can improve FP performance without going to SVE (and they didn't do it), or they can add SVE without increasing peak FP performance.

The question rather is: do they need more FP performance? In their current market I'd say no. After all they already are at 384-bit per cycle vs 512-bit for AVX2 Intel chips. OTOH if they indeed want to go to laptop, I think they will have to increase FP perf.
 

name99

Member
Sep 11, 2010
124
93
101
SVE won't make a CPU execute more FP operations. Setting aside the fact that SVE enables more vectorization, it's a win because it reduces decode/dispatch bandwidth requirements when your vector length is 256 bits or more. If Apple doesn't add more FP units or doesn't make them wider, then SVE won't show a gain on code that is already vectorized.
Apple already has three 128-bit NEON units. It can't efficiently run more MACs per clock because of load and decode limitations; it just doesn't make sense to add a fourth (or more) NEON unit when you're still limited to 128-bit wide loads.
The whole point of going to SVE is that it WILL allow Apple to add more MAC units. After all, by your logic, Intel didn’t need to add AVX, it could have just added more and more ports supporting SSE...
 

name99

Member
Sep 11, 2010
124
93
101
If A12X can be used as a laptop chip, could 2 x A12X make sense for a desktop? Or at least an 8 performance core version?

BTW, die size of A12 (non-X) is 83.27 mm2.
I’ve been giving this a lot of thought. I don’t think the fundamental Mac unit will be the A13X or whatever. I think there will be

- ONE T3 chip (like today's T2 — Secure Enclave, flash controller, ISP, media, ...; stuff that doesn't need to scale up from MacBook to Mac Pro)

- an “A13Z” which will be something like the 4 large + 4 small cores, GPU, NPU, and memory controller from the A13X. Assume that’s about 70mm^2, duplicate it, and you have the basic “tile” of a kick-ass GPU and 8+8 cores, at ~140mm^2.
MacBook Pro, Mac mini, iMac get one of these. iMac Pro gets two. Mac Pro gets 3 or 4. Only question is are they treated as separate sockets or (more likely) mounted using something like CoWoS or EMIB on a single package with higher bandwidth between the tiles.

I suspect RAM is mounted on package, not as DIMMs. This means (yeah, I know people will complain!!!) that you’ll have to choose your RAM at purchase time, no upgrades. But that’s already what we have in mobile, and in the flash storage of modern Macs; and the payoff will be higher-bandwidth RAM (MUCH higher for the 2- and 4-tile systems) at lower power, and in a smaller volume. Overall, I think it’s worth it — but yes, people will still be whining about it in 2025...
 
  • Like
Reactions: Eug and deathBOB

Thala

Senior member
Nov 12, 2014
904
255
116
Why do you think single cycle throughput is mandatory? Or what if for instance your 256-bit SVE units can be split into 2 128-bit AdvSIMD units?

My point is just that SVE doesn't imply more FP performance. They can improve FP performance without going to SVE (and they didn't do it), or they can add SVE without increasing peak FP performance.

The question rather is: do they need more FP performance? In their current market I'd say no. After all they already are at 384-bit per cycle vs 512-bit for AVX2 Intel chips. OTOH if they indeed want to go to laptop, I think they will have to increase FP perf.
I do not think single-cycle throughput is mandatory. I am just expecting that if Apple adds SVE, it will not be just a minor upgrade over the existing 128-bit NEON units - keep in mind we are speculating about laptop and desktop versions.
 

Eug

Lifer
Mar 11, 2000
22,647
256
126
I’ve been giving this a lot of thought. I don’t think the fundamental Mac unit will be the A13X or whatever. I think there will be

- ONE T3 chip (like today's T2 — Secure Enclave, flash controller, ISP, media, ...; stuff that doesn't need to scale up from MacBook to Mac Pro)

- an “A13Z” which will be something like the 4 large + 4 small cores, GPU, NPU, and memory controller from the A13X. Assume that’s about 70mm^2, duplicate it, and you have the basic “tile” of a kick-ass GPU and 8+8 cores, at ~140mm^2.
MacBook Pro, Mac mini, iMac get one of these. iMac Pro gets two. Mac Pro gets 3 or 4. Only question is are they treated as separate sockets or (more likely) mounted using something like CoWoS or EMIB on a single package with higher bandwidth between the tiles.

I suspect RAM is mounted on package, not as DIMMs. This means (yeah, I know people will complain!!!) that you’ll have to choose your RAM at purchase time, no upgrades. But that’s already what we have in mobile, and in the flash storage of modern Macs; and the payoff will be higher-bandwidth RAM (MUCH higher for the 2- and 4-tile systems) at lower power, and in a smaller volume. Overall, I think it’s worth it — but yes, people will still be whining about it in 2025...
This has been the progress of discussion about ARM in Macs.

2010: ARM is too slow for Macs. Only good for phones and tablets. 2010 is when Apple’s A4 came out.

2015: ARM might be fast enough for MacBooks but nothing else. However that would only be in native mode. If ARM is trying to emulate Intel, it would be too slow.

2018: In native software, ARM can compete well with even mid-range MacBook Pros and iMacs and might even be able to reasonably emulate Intel at MacBook speeds. Two or more such ARM chips can compete in native mode against higher end Macs or perhaps could emulate mid-range Intel MacBook Pros and iMacs.
 
Mar 10, 2006
11,719
1,999
126
It would be funny if Apple's T-series chips (which seem to be rebranded A-series processors) wound up faster than the Intel processors that they're subservient to.
 

Eug

Lifer
Mar 11, 2000
22,647
256
126
It would be funny if Apple's T-series chips (which seem to be rebranded A-series processors) wound up faster than the Intel processors that they're subservient to.
Why would you think T chips are faster? They seem to be ultra low power purpose built chips, not general computing chips.

Are they even on the latest fab process?
 
Mar 10, 2006
11,719
1,999
126
Why would you think T chips are faster? They seem to be ultra low power purpose built chips, not general computing chips.

Are they even on the latest fab process?
As I understand it, T-series are just A-series chips with different names.
 

Charlie22911

Senior member
Mar 19, 2005
579
221
116
They are ARM chips that have model numbers that begin with the letter A, and that is likely where most of the similarities end.
One way to know for sure would be if someone like iFixit did a teardown and shared the die shot/floor plan of the T-series chips.
 

Eug

Lifer
Mar 11, 2000
22,647
256
126
T1 came out in 2016 and according to Wikipedia is an ARMv7 chip (32-bit).

The Ax series chips have been ARMv8 (64 bit) since 2013.

T2 is ARMv8 though.
 
  • Like
Reactions: Charlie22911

name99

Member
Sep 11, 2010
124
93
101
T1 came out in 2016 and according to Wikipedia is an ARMv7 chip (32-bit).

The Ax series chips have been ARMv8 (64 bit) since 2013.

T2 is ARMv8 though.
The coprocessor cores (controlling the GPU, ISP, flash controller, etc) on A12 are 64-bit. Some of those cores are presumably in the T2. THAT is what “T2 is 64-bit ARM” means.

There is no evidence for, or reason to believe, that there's any heavy-duty computational power (NPU, GPU, media, even just a zephy core) beyond what Apple has told us (namely that there IS an ISP).
 
  • Like
Reactions: Eug

thunng8

Member
Jan 8, 2013
123
11
81
Regarding the T2 chip: in the keynote they mentioned that the T2 can be used for video encoding on the new MacBook Air. So it does have a video encoding block in there, and it is presumably faster than what is available on the Intel Amber Lake CPU.

If not, why would they enable that feature?
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
106
Regarding the T2 chip: in the keynote they mentioned that the T2 can be used for video encoding on the new MacBook Air. So it does have a video encoding block in there, and it is presumably faster than what is available on the Intel Amber Lake CPU.

If not, why would they enable that feature?
What? OK, they do say something about that. But I really have to wonder what is going on there. Check the wording for the Mac mini:

1 Testing conducted by Apple in October 2018 using preproduction 3.2GHz 6-core Intel Core i7-based Mac mini systems with 64GB of RAM and 2TB SSD, and shipping 3.0GHz dual-core Intel Core i7-based Mac mini systems with 16GB of RAM and 1TB SSD. Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac mini.
The fact that they are using the 6-core chip to get the new higher numbers kind of implies the CPU is still heavily involved. Maybe the T2 does one specific operation to help. I'd really love for some tech journalist to get to the bottom of that one.

Also, the T2 encoding blurb is only in the Mac mini PR, not the new MacBook Air PR...
 
Last edited:

Charlie22911

Senior member
Mar 19, 2005
579
221
116
What does Apple do with defective A-series dies that either have broken cores or don’t meet their specified clock/voltage/power characteristics? Is it possible these dies become T-series chips? Apple has used cut-down A-series SoCs in the Apple TV in the past, so there is precedent.
 

name99

Member
Sep 11, 2010
124
93
101
What does Apple do with defective A-series dies that either have broken cores or don’t meet their specified clock/voltage/power characteristics? Is it possible these dies become T-series chips? Apple has used cut-down A-series SoCs in the Apple TV in the past, so there is precedent.
I have yet to find conclusive proof that this sort of parts recycling is an integral part of MODERN semiconductor manufacturing. As far as I can tell, the primary reason this argument still exists is that it lets people justify the various crazy market segmentation undertaken by Intel...

Apple has (perhaps) binned A10Xs that ran slightly hotter for aTVs, but there’s no real proof of this, and I can’t believe they “rely” on this (as opposed to putting aside a few chips that ran hotter than acceptable for an iPad at the start of manufacturing, before the process was optimized). There are just too many variables in terms of what the expected sales and lifetimes of different products will be to make this a “strategy” rather than a minor quirk at the start of the new iPads.

Or to put it differently, are you CONFIDENT that there will be a new aTV say next year that will use up all the supposed A12X’s that currently run too hot? If not what will Apple do with them? Perhaps nothing because there simply ARE NOT many cores that test out of spec?
 

Nothingness

Platinum Member
Jul 3, 2013
2,136
368
126
The whole point of going to SVE is that it WILL allow Apple to add more MAC units
The question I already asked is: does Apple need more FP performance for iPhone / iPad? If yes, then definitely SVE is the way to go. If no, then why bother?

After all, by your logic, Intel didn’t need to add AVX, it could have just added more and more ports supporting SSE...
My logic is just that adding SVE doesn't by itself mean more performance; everything else has to be improved as well.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
106
I have yet to find conclusive proof that this sort of parts recycling is an integral part of MODERN semiconductor manufacturing. As far as I can tell, the primary reason this argument still exists is that it lets people justify the various crazy market segmentation undertaken by Intel...
:rolleyes:

Actually, Intel does less disabling than AMD or Nvidia. GPUs do extensive segmentation by disabling functional units and/or memory channels. AMD's CPU model has always been to segment by disabling cores, whereas Intel in the mainstream usually just built a new die for each core count and segmented by disabling HT, though in the high-core-count pro market it segmented by disabling cores instead.
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,440
369
126
Just out of interest, I have the Mate 20 Pro, which has a Kirin 980 SoC, and it scores 3338/10179 on Geekbench. So assuming Huawei (or Apple) haven't gamed the results, its single-core score is only about the same as an A10's, and its multi-core is around the A11's. So it's definitely slower than the A12, despite Huawei's claims.

(amazing phone though, camera is something else)
 
  • Like
Reactions: Etain05

Nothingness

Platinum Member
Jul 3, 2013
2,136
368
126
Actually, Intel does less disabling than AMD or Nvidia. GPUs do extensive segmentation by disabling functional units and/or memory channels. AMD's CPU model has always been to segment by disabling cores, whereas Intel in the mainstream usually just built a new die for each core count and segmented by disabling HT, though in the high-core-count pro market it segmented by disabling cores instead.
Intel also segments by disabling parts of the instruction set, such as AVX. I don't know if AMD ever did that.

Nvidia did something similar with their ARM chips: their first SoC did not implement NEON (the ARM equivalent of SSE/AVX). This resulted in Android not enabling NEON by default, forcing developers to implement two code paths. The impact lasted several years.
 
  • Like
Reactions: Lodix

Nothingness

Platinum Member
Jul 3, 2013
2,136
368
126
I think it makes sense to add these results here too.

Here are CINT2006 results on a Kaby Lake R i7-8650U, gcc 7.3.0 -O3 -march=native:

Code:
                8650u    A12    A12/8650u
400.perlbench   -        45.38  -
401.bzip2       30.2     28.54  0.95
403.gcc         43       44.56  1.04
429.mcf         33.3     49.92  1.50
445.gobmk       31.5     38.54  1.22
456.hmmer       37.3     43.24  1.16
458.sjeng       32.9     27.97  0.85
462.libquantum  95.5    113.40  1.19
464.h264ref     69.6     66.59  0.96
471.omnetpp     20.5     35.73  1.74
473.astar       24.4     27.25  1.12
483.xalancbmk   47.3     57.03  1.21
400.perlbench failed to compile and I have no time to investigate.
 

name99

Member
Sep 11, 2010
124
93
101
I think it makes sense to add these results here too.

Here are CINT2006 results on a Kaby Lake R i7-8650U, gcc 7.3.0 -O3 -march=native:

Code:
                8650u    A12    A12/8650u
400.perlbench   -        45.38  -
401.bzip2       30.2     28.54  0.95
403.gcc         43       44.56  1.04
429.mcf         33.3     49.92  1.50
445.gobmk       31.5     38.54  1.22
456.hmmer       37.3     43.24  1.16
458.sjeng       32.9     27.97  0.85
462.libquantum  95.5    113.40  1.19
464.h264ref     69.6     66.59  0.96
471.omnetpp     20.5     35.73  1.74
473.astar       24.4     27.25  1.12
483.xalancbmk   47.3     57.03  1.21
400.perlbench failed to compile and I have no time to investigate.
SPEC2006 is "standard" C code in the sense that it is garbage code riddled with bugs and undefined behavior, like most C code.
Of particular relevance is that 400.perlbench has at least two pointer overflow bugs...
https://blog.regehr.org/archives/1395

It's possible that LLVM is now good enough to figure out for itself that those overflows occur, and refuse to compile; or it may be some other bug in the code (code that's willing to tolerate pointer overflows is likely to tolerate a lot of other crap).

(Just so there's no misunderstanding, I think SPEC2006 does its job of pushing hard various aspects of the CPU, particularly testing code with a large data footprint, and with aggressive memory bandwidth or latency demands [though it does a poor job of testing code with a large instruction footprint].
But it's important not to valorize it. Much of the code is lousy quality, and we should be calling out crappy code WHEREVER we see it, not making excuses for it. It CERTAINLY should not be considered as exemplar code for newbies to emulate and learn from.)
 
Last edited:
