Apple A12 & A12X * Now A12Z as well * Now in a Mac mini

Nothingness · Nov 1, 2018

Thala said:
Of course, they would not just add SVE decoders on top of the existing FP units. If you want single-cycle throughput you need to match the vector length. I wonder if they will go with 256-bit or 512-bit wide SVE?

Why do you think single cycle throughput is mandatory? Or what if for instance your 256-bit SVE units can be split into 2 128-bit AdvSIMD units?

My point is just that SVE doesn't imply more FP performance. They can improve FP performance without going to SVE (and they didn't do it), or they can add SVE without increasing peak FP performance.

The question rather is: do they need more FP performance? In their current market I'd say no. After all they already are at 384-bit per cycle vs 512-bit for AVX2 Intel chips. OTOH if they indeed want to go to laptop, I think they will have to increase FP perf.
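The per-cycle figures above are just unit count times unit width. A minimal sketch of that arithmetic, assuming three 128-bit NEON units on the Apple core (as stated elsewhere in this thread) and two 256-bit vector units on an AVX2 Intel core; the function and variable names are made up for illustration:

```python
def simd_bits_per_cycle(units: int, width_bits: int) -> int:
    """Peak SIMD datapath width per cycle: number of units times unit width."""
    return units * width_bits

# Assumed configurations (illustrative, not vendor-confirmed here).
apple_bits = simd_bits_per_cycle(units=3, width_bits=128)
intel_bits = simd_bits_per_cycle(units=2, width_bits=256)

# Express the same widths as 32-bit floating-point lanes per cycle.
apple_fp32_lanes = apple_bits // 32
intel_fp32_lanes = intel_bits // 32

print(f"Apple: {apple_bits} bits/cycle, {apple_fp32_lanes} FP32 lanes")
print(f"Intel AVX2: {intel_bits} bits/cycle, {intel_fp32_lanes} FP32 lanes")
```

This only counts datapath width; it says nothing about clock speed or whether loads can keep the units fed.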

name99 · Nov 1, 2018

Nothingness said:
SVE won't make a CPU execute more FP operations. Ignoring the fact SVE enables more vectorization, it's a win because you reduce decode/dispatch bandwidth requirements when your vector length is 256-bit or more. If Apple doesn't add more FP units or doesn't make them wider then SVE won't show a gain on code already vectorized.

Apple already has 3 128-bit NEON units. They can’t efficiently run more MACs per clock because of load+decode limitations; it just doesn’t make sense to add a 4th or more NEON units when you’re still limited to 128-bit wide loads.
The whole point of going to SVE is that it WILL allow Apple to add more MAC units. After all, by your logic, Intel didn’t need to add AVX, it could have just added more and more ports supporting SSE...

name99 · Nov 1, 2018

Eug said:
If A12X can be used as a laptop chip, could 2 x A12X make sense for a desktop? Or at least an 8 performance core version?

BTW, die size of A12 (non-X) is 83.27 mm2.

I’ve been giving this a lot of thought. I don’t think the fundamental Mac unit will be the A13X or whatever. I think there will be

- ONE T3 chip (like today’s T2: Secure Enclave, flash controller, ISP, media, ...; stuff that doesn’t need to scale up from MacBook to Mac Pro)

- an “A13Z” which will be something like the 4 large+4 small cores, GPU, and NPU, and memory controller from the A13X. Assume that’s about 70mm^2, duplicate it, and you have the basic “tile” of a kick-ass GPU and 8+8 cores, at ~140mm^2.
MacBook Pro, Mac mini, iMac get one of these. iMac Pro gets two. Mac Pro gets 3 or 4. Only question is are they treated as separate sockets or (more likely) mounted using something like CoWoS or EMIB on a single package with higher bandwidth between the tiles.

I suspect RAM is mounted on package, not as DIMMs. This means (yeah, I know people will complain!!!) that you’ll have to choose your RAM at purchase time, no upgrades. But that’s already what we have in mobile, and in the flash of modern macs; and the payoff will be higher bandwidth RAM (MUCH higher for the 2 and 4 tile systems) at lower power, and in a smaller volume. Overall, I think it’s worth it — but yes, people will still be whining about it in 2025...

Thala · Nov 1, 2018

Nothingness said:
Why do you think single cycle throughput is mandatory? Or what if for instance your 256-bit SVE units can be split into 2 128-bit AdvSIMD units?

My point is just that SVE doesn't imply more FP performance. They can improve FP performance without going to SVE (and they didn't do it), or they can add SVE without increasing peak FP performance.

The question rather is: do they need more FP performance? In their current market I'd say no. After all they already are at 384-bit per cycle vs 512-bit for AVX2 Intel chips. OTOH if they indeed want to go to laptop, I think they will have to increase FP perf.

I do not think single-cycle throughput is mandatory. I am just expecting that if Apple adds SVE, it will not be just a minor upgrade over the existing 128-bit NEON units. Keep in mind we are speculating about laptop and desktop versions.

Eug · Nov 1, 2018

name99 said:
I’ve been giving this a lot of thought. I don’t think the fundamental Mac unit will be the A13X or whatever. I think there will be

- ONE T3 chip (like today’s T2: Secure Enclave, flash controller, ISP, media, ...; stuff that doesn’t need to scale up from MacBook to Mac Pro)

- an “A13Z” which will be something like the 4 large+4 small cores, GPU, and NPU, and memory controller from the A13X. Assume that’s about 70mm^2, duplicate it, and you have the basic “tile” of a kick-ass GPU and 8+8 cores, at ~140mm^2.
MacBook Pro, Mac mini, iMac get one of these. iMac Pro gets two. Mac Pro gets 3 or 4. Only question is are they treated as separate sockets or (more likely) mounted using something like CoWoS or EMIB on a single package with higher bandwidth between the tiles.

I suspect RAM is mounted on package, not as DIMMs. This means (yeah, I know people will complain!!!) that you’ll have to choose your RAM at purchase time, no upgrades. But that’s already what we have in mobile, and in the flash of modern macs; and the payoff will be higher bandwidth RAM (MUCH higher for the 2 and 4 tile systems) at lower power, and in a smaller volume. Overall, I think it’s worth it — but yes, people will still be whining about it in 2025...

This has been the progress of discussion about ARM in Macs.

2010: ARM is too slow for Macs. Only good for phones and tablets. 2010 is when Apple’s A4 came out.

2015: ARM might be fast enough for MacBooks but nothing else. However that would only be in native mode. If ARM is trying to emulate Intel, it would be too slow.

2018: In native software, ARM can compete well with even mid-range MacBook Pros and iMacs and might even be able to reasonably emulate Intel at MacBook speeds. Two or more such ARM chips can compete in native mode against higher end Macs or perhaps could emulate mid-range Intel MacBook Pros and iMacs.

Arachnotronic · Nov 1, 2018

It would be funny if Apple's T-series chips (which seem to be rebranded A-series processors) wound up faster than the Intel processors that they're subservient to.

Eug · Nov 1, 2018

Arachnotronic said:
It would be funny if Apple's T-series chips (which seem to be rebranded A-series processors) wound up faster than the Intel processors that they're subservient to.

Why would you think T chips are faster? They seem to be ultra low power purpose built chips, not general computing chips.

Are they even on the latest fab process?

Arachnotronic · Nov 1, 2018

Eug said:
Why would you think T chips are faster? They seem to be ultra low power purpose built chips, not general computing chips.

Are they even on the latest fab process?

As I understand it, T-series are just A-series chips with different names.

Eug · Nov 1, 2018

Arachnotronic said:
As I understand it, T-series are just A-series chips with different names.

They are ARM chips that have model numbers that begin with the letter A, and that is likely where most of the similarities end.

Charlie22911 · Nov 1, 2018

Eug said:
They are ARM chips that have model numbers that begin with the letter A, and that is likely where most of the similarities end.

One way to know for sure is if someone like iFixit has done a teardown and shared the die shot/floor plan of the T series chips.

Eug · Nov 1, 2018

T1 came out in 2016 and according to Wikipedia is an ARMv7 chip (32-bit).

The Ax series chips have been ARMv8 (64 bit) since 2013.

T2 is ARMv8 though.

name99 · Nov 1, 2018

Eug said:
T1 came out in 2016 and according to Wikipedia is an ARMv7 chip (32-bit).

The Ax series chips have been ARMv8 (64 bit) since 2013.

T2 is ARMv8 though.

The coprocessor cores (controlling the GPU, ISP, flash controller, etc) on A12 are 64-bit. Some of those cores are presumably in the T2. THAT is what “T2 is 64-bit ARM” means.

There is no evidence, or reason to believe, that the T2 has any of the heavy-duty computational hardware (NPU, GPU, media, even just a Zephyr core) beyond what Apple has told us (namely that there IS an ISP).

thunng8 · Nov 1, 2018

Regarding the T2 chip: it was mentioned in the keynote that the T2 chip can be used for video encoding on the new MacBook Air. So it does have the video encoding block in there, and it is presumably faster than what is available on the Intel Amber Lake CPU.

If not - why would they enable that feature?

PeterScott · Nov 1, 2018

thunng8 said:
Regarding the T2 chip: it was mentioned in the keynote that the T2 chip can be used for video encoding on the new MacBook Air. So it does have the video encoding block in there, and it is presumably faster than what is available on the Intel Amber Lake CPU.

If not - why would they enable that feature?

What? Ok they do say something about that. But I really have to wonder what is up with that. Check the wording for Mac Mini:

1 Testing conducted by Apple in October 2018 using preproduction 3.2GHz 6-core Intel Core i7-based Mac mini systems with 64GB of RAM and 2TB SSD, and shipping 3.0GHz dual-core Intel Core i7-based Mac mini systems with 16GB of RAM and 1TB SSD. Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac mini.

The fact that they are using the 6-core model to get the new higher numbers kind of implies the CPU is still heavily involved. Maybe the T2 does one specific operation to help. I'd really love for some tech journalist to get to the bottom of that one.

Also the T2 encoding blurb is only on the Mac Mini PR, but not the new Macbook Air PR...

Charlie22911 · Nov 2, 2018

What does Apple do with defective A series dice that either have broken cores or don’t meet their specified clock/voltage/power characteristics? Is it possible these dice become T series chips? Apple has used cut-down A series SoCs in their Apple TV in the past, so there is a precedent.

name99 · Nov 2, 2018

Charlie22911 said:
What does Apple do with defective A series dice that either have broken cores or don’t meet their specified clock/voltage/power characteristics? Is it possible these dice become T series chips? Apple has used cut-down A series SoCs in their Apple TV in the past, so there is a precedent.

I have yet to find conclusive proof that this sort of parts recycling is an integral part of MODERN semiconductors. As far as I can tell the primary reason this argument still exists is for people to justify various crazy market segmentation undertaken by Intel...

Apple has (perhaps) binned A10Xs that ran slightly hotter for aTVs, but there’s no real proof of this, and I can’t believe they “rely” on this (as opposed to putting aside a few chips that ran hotter than acceptable for an iPad at the start of manufacturing, before the process was optimized). There are just too many variables in terms of what the expected sales and lifetimes of different products will be to make this a “strategy” rather than a minor quirk at the start of the new iPads.

Or to put it differently, are you CONFIDENT that there will be a new aTV say next year that will use up all the supposed A12X’s that currently run too hot? If not what will Apple do with them? Perhaps nothing because there simply ARE NOT many cores that test out of spec?

dark zero · Nov 2, 2018

Geekbench results are out...

https://www.gizmochina.com/2018/11/...-prowess-of-the-most-powerful-mobile-chipset/

Near 5K ST score and 17K MT score.

Eug · Nov 2, 2018

dark zero said:
Geekbench results are out...

https://www.gizmochina.com/2018/11/...-prowess-of-the-most-powerful-mobile-chipset/

Near 5K ST score and 17K MT score.

Posted a couple days ago but thanks. Here are all of them:

https://browser.geekbench.com/v4/cpu/search?q=Ipad8

5000/18000

The scores also indicate they have 4 GB or 6 GB.

Nothingness · Nov 2, 2018

name99 said:
The whole point of going to SVE is that it WILL allow Apple to add more MAC units

The question I already asked is: does Apple need more FP performance for iPhone / iPad? If yes, then definitely SVE is the way to go. If no, then why bother?

After all, by your logic, Intel didn’t need to add AVX, it could have just added more and more ports supporting SSE...

My logic is just that adding SVE doesn't mean more performance, all the rest has to be improved.

PeterScott · Nov 2, 2018

name99 said:
I have yet to find conclusive proof that this sort of parts recycling is an integral part of MODERN semiconductors. As far as I can tell the primary reason this argument still exists is for people to justify various crazy market segmentation undertaken by Intel...

Actually Intel does less disabling than AMD or NVidia. GPUs do extensive segmentation by disabling functional units and/or memory channels. AMD's CPU model has always been to segment by disabling cores, whereas Intel in the mainstream usually just built a new die for each core count and segmented by disabling HT, though in the high core count pro market it segmented by core disabling instead.

StinkyPinky · Nov 2, 2018

Just out of interest, I have the Mate 20 Pro, which has a Kirin 980 SoC, and it scores 3338/10179 on Geekbench. So assuming Huawei (or Apple) haven't gamed the results, the single-core score is only about the same as an A10 and the multi-core score is around the A11. So definitely slower than the A12 despite Huawei's claims.

(amazing phone though, camera is something else)
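For a rough sense of the gap, the scores quoted in this thread can be divided directly. A small sketch, assuming the rounded 5000/18000 iPad Pro figures posted earlier and the 3338/10179 Kirin 980 figures above; the names are illustrative, and a tablet SoC against a phone SoC is not a like-for-like comparison:

```python
# Geekbench 4 scores as quoted in the thread (iPad Pro figures are rounded).
kirin_980 = {"single": 3338, "multi": 10179}
ipad_pro_a12x = {"single": 5000, "multi": 18000}

def score_ratio(a: dict, b: dict) -> dict:
    """Return a/b for each score category present in a."""
    return {kind: a[kind] / b[kind] for kind in a}

ratios = score_ratio(ipad_pro_a12x, kirin_980)
for kind, ratio in ratios.items():
    print(f"{kind}-core: A12X is {ratio:.2f}x the Kirin 980")
```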

Nothingness · Nov 3, 2018

PeterScott said:
Actually Intel does less disabling than AMD or NVidia. GPUs do extensive segmentation by disabling functional units and/or memory channels. AMD's CPU model has always been to segment by disabling cores, whereas Intel in the mainstream usually just built a new die for each core count and segmented by disabling HT, though in the high core count pro market it segmented by core disabling instead.

Intel also segments by disabling parts of the instruction set such as AVX. I don't know if AMD ever did that.

Nvidia did something similar with their ARM chips: their first SoC did not implement NEON (the ARM equivalent to AVX/SSE). This resulted in Android not enabling NEON by default and forcing developers to implement two code paths. This impact lasted several years.

Nothingness · Nov 5, 2018

I think it makes sense to add these results here too.

Here is CINT 2006 on a Kaby Lake R i7-8650U gcc 7.3.0 -O3 -march=native

Code:

                8650u    A12    A12/8650u
400.perlbench   -        45.38  -
401.bzip2       30.2     28.54  0.95
403.gcc         43       44.56  1.04
429.mcf         33.3     49.92  1.50
445.gobmk       31.5     38.54  1.22
456.hmmer       37.3     43.24  1.16
458.sjeng       32.9     27.97  0.85
462.libquantum  95.5    113.40  1.19
464.h264ref     69.6     66.59  0.96
471.omnetpp     20.5     35.73  1.74
473.astar       24.4     27.25  1.12
483.xalancbmk   47.3     57.03  1.21

400.perlbench failed to compile and I have no time to investigate.
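The ratio column can be recomputed from the two score columns, and a geometric mean gives a single summary figure. A minimal sketch using only the numbers in the table above (400.perlbench is left out because it has no 8650U score); the variable names are illustrative:

```python
from math import prod

# (i7-8650U score, A12 score) per benchmark, copied from the table above.
scores = {
    "401.bzip2": (30.2, 28.54),
    "403.gcc": (43.0, 44.56),
    "429.mcf": (33.3, 49.92),
    "445.gobmk": (31.5, 38.54),
    "456.hmmer": (37.3, 43.24),
    "458.sjeng": (32.9, 27.97),
    "462.libquantum": (95.5, 113.40),
    "464.h264ref": (69.6, 66.59),
    "471.omnetpp": (20.5, 35.73),
    "473.astar": (24.4, 27.25),
    "483.xalancbmk": (47.3, 57.03),
}

# A12 score divided by 8650U score; above 1.0 means the A12 is faster.
ratios = {name: a12 / intel for name, (intel, a12) in scores.items()}

# Geometric mean of the ratios, the usual way to summarise SPEC-style ratios.
geomean = prod(ratios.values()) ** (1 / len(ratios))

for name, ratio in ratios.items():
    print(f"{name:<16}{ratio:.2f}")
print(f"{'geomean':<16}{geomean:.2f}")
```

The summary covers 11 of the 12 CINT 2006 benchmarks, so it is not an official SPECint figure.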

name99 · Nov 5, 2018

Nothingness said:
I think it makes sense to add these results here too.

Here is CINT 2006 on a Kaby Lake R i7-8650U gcc 7.3.0 -O3 -march=native

Code:

                8650u    A12    A12/8650u
400.perlbench   -        45.38  -
401.bzip2       30.2     28.54  0.95
403.gcc         43       44.56  1.04
429.mcf         33.3     49.92  1.50
445.gobmk       31.5     38.54  1.22
456.hmmer       37.3     43.24  1.16
458.sjeng       32.9     27.97  0.85
462.libquantum  95.5    113.40  1.19
464.h264ref     69.6     66.59  0.96
471.omnetpp     20.5     35.73  1.74
473.astar       24.4     27.25  1.12
483.xalancbmk   47.3     57.03  1.21

400.perlbench failed to compile and I have no time to investigate.

SPEC2006 is "standard" C code in the sense that it is garbage code riddled with bugs and undefined behavior, like most C code.
Of particular relevance is that 400.perlbench has at least two pointer overflow bugs...
https://blog.regehr.org/archives/1395

It's possible that LLVM is now good enough to figure out for itself that those overflows occur, and refuse to compile; or it may be some other bug in the code (code that's willing to tolerate pointer overflows is likely to tolerate a lot of other crap).

(Just so there's no misunderstanding, I think SPEC2006 does its job of pushing hard various aspects of the CPU, particularly testing code with a large data footprint, and with aggressive memory bandwidth or latency demands [though it does a poor job of testing code with a large instruction footprint].
But it's important not to valorize it. Much of the code is lousy quality, and we should be calling out crappy code WHEREVER we see it, not making excuses for it. It CERTAINLY should not be considered as exemplar code for newbies to emulate and learn from.)

thunng8 · Nov 5, 2018

Check out this review with a few application tests.

https://www.laptopmag.com/reviews/laptops/new-ipad-pro-2018-129-inch

The Adobe Lightroom test would be very interesting for photographers on the go. It seems to indicate that Lightroom on the A12X is significantly faster than on any Intel mobile chip.
