
Apple A12 & A12X [EDIT 2020-03-18] *** Now A12Z as well ***


JoeRambo

Senior member
Jun 13, 2013
746
382
136
Maybe the iPad uses the GPU for video transcode and photo editing?
For sure, but it's more of a problem for Intel CPUs, which have obsolete technology and lack proper full acceleration for H.265 encodes at 4K. The result of using ~2014 versus 2018 tech is a slaughter, even when we factor in the usual tight Apple integration.
 

Nothingness

Platinum Member
Jul 3, 2013
2,136
368
126
SPEC2006 is "standard" C code in the sense that it is garbage code riddled with bugs and undefined behavior, like most C code.
Of particular relevance is that 400.perlbench has at least two pointer overflow bugs...
https://blog.regehr.org/archives/1395

It's possible that LLVM is now good enough to figure out for itself that those overflows occur, and refuse to compile; or it may be some other bug in the code (code that's willing to tolerate pointer overflows is likely to tolerate a lot of other crap).

(Just so there's no misunderstanding, I think SPEC2006 does its job of pushing hard various aspects of the CPU, particularly testing code with a large data footprint, and with aggressive memory bandwidth or latency demands [though it does a poor job of testing code with a large instruction footprint].
But it's important not to valorize it. Much of the code is lousy quality, and we should be calling out crappy code WHEREVER we see it, not making excuses for it. It CERTAINLY should not be considered as exemplar code for newbies to emulate and learn from.)
I have seen much worse "pro" code :p And many real programs are badly coded, so that just mimics reality.

I'd prefer to switch to CPU 2017, but I'm not sure AnandTech/Andrei has access to it.
 
  • Like
Reactions: Arachnotronic
Mar 10, 2006
11,719
1,999
126
I have seen much worse "pro" code :p And many real programs are badly coded, so that just mimics reality.

I'd prefer to switch to CPU 2017, but I'm not sure AnandTech/Andrei has access to it.
A CPU's ability to handle bad code is a good indicator of how good the machine is. If your code needs to be massaged and tuned to get the most out of it, you haven't built a very robust machine.
 
  • Like
Reactions: Nothingness

Nothingness

Platinum Member
Jul 3, 2013
2,136
368
126
A CPU's ability to handle bad code is a good indicator of how good the machine is. If your code needs to be massaged and tuned to get the most out of it, you haven't built a very robust machine.
Indeed. And here Intel for sure is really great, because they have to run what one would consider obsolete, bad code at good speed.
 
  • Like
Reactions: Arachnotronic

thunng8

Member
Jan 8, 2013
123
11
81
Obviously. It's ultimately a software issue, and I bet Apple paid Adobe to have them develop GPU acceleration and Intel didn't.
I doubt it. Adobe Lightroom is the world's most popular photo DAM manager. I would hope Adobe would spend some time optimizing it for Intel platforms.

As for it being GPU-accelerated, I'm not so sure. Raw conversion is not easy to accelerate. Countless software packages have tried and have not been successful in getting any speed increase.
 
Mar 11, 2004
19,524
1,893
126
Apple doesn't necessarily have to have paid for it in the typical sense. Rather, as part of developing the app, Adobe just took the time to build in the extra capability, or that's simply how you code for that setup: Apple put in the work to let apps use the GPU and/or specialized processing blocks for certain tasks, so if you're doing video processing, the OS defaults to offloading it to the GPU or dedicated hardware.

Maybe the iPad uses the GPU for video transcode and photo editing?
If it is, then they need further analysis to compare quality. From what I've gathered, GPU transcoding is very fast but significantly worse in quality. If Apple can make the quality equal (or even better) with the acceleration, whether it's the GPU or specialized hardware for the task, on top of being faster, then awesome. Also worth asking is whether the Intel machines were using Quick Sync, as that would probably be the better comparison, though again you'd need to compare quality. Ideally you'd compare different options: best quality of each, fastest of each, and an optimal quality/time point where you hold quality constant (perhaps at more than one level for different users) and compare the time each takes to achieve that result.
 
  • Like
Reactions: TheGiant

TheGiant

Senior member
Jun 12, 2017
639
242
86
Apple doesn't necessarily have to have paid for it in the typical sense. Rather, as part of developing the app, Adobe just took the time to build in the extra capability, or that's simply how you code for that setup: Apple put in the work to let apps use the GPU and/or specialized processing blocks for certain tasks, so if you're doing video processing, the OS defaults to offloading it to the GPU or dedicated hardware.

If it is, then they need further analysis to compare quality. From what I've gathered, GPU transcoding is very fast but significantly worse in quality. If Apple can make the quality equal (or even better) with the acceleration, whether it's the GPU or specialized hardware for the task, on top of being faster, then awesome. Also worth asking is whether the Intel machines were using Quick Sync, as that would probably be the better comparison, though again you'd need to compare quality. Ideally you'd compare different options: best quality of each, fastest of each, and an optimal quality/time point where you hold quality constant (perhaps at more than one level for different users) and compare the time each takes to achieve that result.
This.
Quality is worse on the GPU; that is why people are using CPUs now. The test wasn't controlled for quality.
As for the calculation test, that result only shows that, at least for single-threaded or lightly threaded performance, Geekbench doesn't produce results that are way off. Seriously, Apple has a 5 W CPU with the performance of a 50 W desktop-class CPU of this age.
That is a general f..k-up for Intel, but an even bigger one for AMD, as they are behind in both frequency and IPC.

I never found an Excel calculation comparison on the web. With that much RAM in the new iPhone or iPad, lighter calculations could be tested against an x86 CPU, like TechSpot does.
 

name99

Member
Sep 11, 2010
124
93
101
This.
Quality is worse on the GPU; that is why people are using CPUs now. The test wasn't controlled for quality.
As for the calculation test, that result only shows that, at least for single-threaded or lightly threaded performance, Geekbench doesn't produce results that are way off. Seriously, Apple has a 5 W CPU with the performance of a 50 W desktop-class CPU of this age.
That is a general f..k-up for Intel, but an even bigger one for AMD, as they are behind in both frequency and IPC.

I never found an Excel calculation comparison on the web. With that much RAM in the new iPhone or iPad, lighter calculations could be tested against an x86 CPU, like TechSpot does.
When the iPad Pro A10X came out, I ran tests of Mathematica (on iMac) versus Wolfram Player (the Mathematica engine with a different UI) on the iPad Pro A10X. Basically, the 2.4 GHz A10X was equivalent to the 3.6 GHz Ivy Bridge i5 in my iMac.
When I get the new iPad Pro I'll update the tests.
 

Eug

Lifer
Mar 11, 2000
22,647
256
126
Discussion with Anand Shimpi (!) and Phil Schiller about the A12X on Arstechnica:

https://arstechnica.com/gadgets/2018/11/apple-walks-ars-through-the-ipad-pros-a12x-system-on-a-chip/
Heh. It's ironic that AnandTech didn't get the Anand interview. Anyhoo, glad to see him out in public view again.

“We've got our own custom-designed performance controller that lets you use all eight at the same time,” Shimpi told Ars. “And so when you're running these heavily-threaded workloads, things that you might find in pro workflows and pro applications, that's where you see the up to 90 percent improvement over A10X.”

---

This performance is unprecedented in anything like this form factor. In addition to the ability to engage all the cores simultaneously, there's reason to believe that cache sizes in the A12, and likely therefore the A12X, are a substantial factor driving this performance.

You could also make the case that the A12X's performance in general is partly so strong because Apple's architecture is a master class in optimized heterogeneous computing—that is, smartly using well-architected, specialized types of processors for matching specialized tasks. Though the A12X is of course related to ARM's big.LITTLE architecture, Apple has done a lot of work here to get results that others haven't.


---

"You typically only see this kind of performance in bigger machines—bigger machines with fans," Shimpi claimed. "You can deliver it in this 5.9 millimeter thin iPad Pro because we've built such a good, such a very efficient architecture."

---

Shimpi offered a pitch for the GPU. "It's our first 7-core implementation of our own custom-designed GPU," he said. "Each one of these cores is both faster and more efficient than what we had in the A10X and the result is, that's how you get to the 2x improved graphics performance. It's unheard of in this form factor, this is really an Xbox One S class GPU. And again, it's in a completely fanless design."

---

Generally, this GPU has a huge lead in the mobile space, but it's not encroaching on laptop territory the same way the CPU is—at least, not in these sorts of benchmarks. The advantage over other mobile devices is significant, though. There aren't any other devices in this category that come close. As for performance gains relative to the iPhone XS and its A12, Shimpi said memory bandwidth is one part of that.

"The implementation is the same," he clarified. "But you do have much more memory bandwidth so there may be cases where it's actually faster than what you get on the phone if you do have a workload that has taken advantage of the fact that you have twice as big of a memory subsystem."

This impacts not just 3D graphics in games but a lot of the UI effects in iOS itself. Shimpi noted it's not just about peak memory bandwidth, but delivering bits efficiently. "Having that dynamic range is very important because there are times when you want to operate at a lower performance point in order to get efficiency and battery life," he said.

Mobile device comparisons aside, the laptop and desktop are more or less the ultimate target. "We’ll actually take content from the desktop, profile it, and use it to drive our GPU architectures. This is one of the things that you usually don't see in a lot of mobile GPU benchmarks," Shimpi explained.

---

Typically when you get this type of CPU and GPU performance, a combination of the two, you have a discrete memory system. So the CPU has its own set of memory and the GPU has its own set of memory, and for a lot of media workloads or pro workflows where you actually want both working on the same data set, you copy back and forth, generally over a very narrow slow bus, and so developers tend to not create their applications that way, because you don't want to copy back and forth.

We don't have any of those problems. We have the unified architecture, the CPU, the GPU, the ISP, the Neural Engine—everything sits behind the exact same memory interface, and you have one pool of memory.

On top of that, this is the only type of memory interface that iOS knows. You don't have the problem of, well, sometimes the unified pool may be a discrete pool, sometimes it may not. iOS, our frameworks, this is all it’s ever known, and so as a result developers benefit from that. By default, this is what they're optimized for, whereas in other ecosystems you might have to worry about, well, OK, sometimes I have to treat the two things as discrete; sometimes they share.
 
  • Like
Reactions: Etain05

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
106
Heh. It's ironic that AnandTech didn't get the Anand interview. Anyhoo, glad to see him out in public view again.

“We've got our own custom-designed performance controller that lets you use all eight at the same time,” Shimpi told Ars. [snip]
Great that the SoC is getting more powerful, but that hasn't been the issue for me in a long time. The SoC has been powerful enough to make a good laptop replacement for a couple of years now. But a touch UI only works in the consumption role for me.

Until there is proper cursor support for smoother input, I am out.
 

Eug

Lifer
Mar 11, 2000
22,647
256
126
Great that the SoC is getting more powerful, but that hasn't been the issue for me in a long time. The SoC has been powerful enough to make a good laptop replacement for a couple of years now. But a touch UI only works in the consumption role for me.

Until there is proper cursor support for smoother input, I am out.
That’s why I highlighted that passage from The Boss. They are profiling desktop workloads to help plan their chip designs.

We are on the cusp of a very big change in the Apple-verse: 2019 or 2020.
 
  • Like
Reactions: Etain05

IntelUser2000

Elite Member
Oct 14, 2003
6,577
1,104
126
On top of that, this is the only type of memory interface that iOS knows. You don't have the problem of, well, sometimes the unified pool may be a discrete pool, sometimes it may not. iOS, our frameworks, this is all it’s ever known, and so as a result developers benefit from that.
"At the end of the day we want to make sure that whatever vision we have set out for the thing, if it requires custom silicon, that we're there to deliver. For a given form factor, for a given industrial design in this thermal envelope, no one should be able to build a better, more performant chip."
Seems like a pretty good summary of why they can put out an architecture that eclipses everyone else's.

It's not just that they have lots of money. Intel has lots of money too, and only requires a fraction of the company to get this working; ideas can't be bought with cash.

Apple has all the advantages:
-Money is no object.
-Related to the above, they have an absolute presence selling premium devices. There's no need to artificially segment chips to maximize per-chip profits; you just care about the device as a whole.
-Top-tier talent is there because they are the darling of the tech industry.
-A single OS (and likely firmware) to work on, so it has the advantage a console has: a stable platform. That reduces TTM (time to market), because they don't need to account for different vendors. No need to wait for Microsoft to implement their hardware features in the OS, and no need to wait for notebook vendors that may or may not use a hardware feature.
 
Last edited:
  • Like
Reactions: Etain05

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
106
That’s why I highlighted that passage from The Boss. They are profiling desktop workloads to help plan their chip designs.

We are on the cusp of a very big change in the Apple-verse: 2019 or 2020.
I didn't read their comment that way at all. It's more of the iPad-as-laptop/desktop-replacement marketing: see how powerful it is, it can replace your other computers. But the issue with the iPad replacing a laptop/desktop is NOT power; it's the UI.
 

wizfactor

Junior Member
May 9, 2018
2
0
11
Just wanted to say I am so glad that Anand Shimpi could finally talk about what he's been up to since leaving the site. I think most of us always assumed that he joined the Silicon team, but it's still great to hear that that is where he's working now, and that he's doing really well.

The chip design team is probably my favorite team working at Apple right now. Anand is practically working alongside Silicon Gods!
 

Eug

Lifer
Mar 11, 2000
22,647
256
126
Just wanted to say I am so glad that Anand Shimpi could finally talk about what he's been up to since leaving the site. I think most of us always assumed that he joined the Silicon team, but it's still great to hear that that is where he's working now, and that he's doing really well.

The chip design team is probably my favorite team working at Apple right now. Anand is practically working alongside Silicon Gods!
I actually originally guessed he would be doing something else.
 

TheGiant

Senior member
Jun 12, 2017
639
242
86
I personally see it as a bunch of "nothing", to maybe be extreme.
No info about desktop/mobile replacement timing etc., just lots of technical stuff.
We have to see what it does once you add SATA, PCIe, USB and the other things that are so normal we don't even think about them to the chip, and also a desktop/laptop operating system, which isn't optimized to the max the way mobile is.
I have an iPad 4 and it doesn't run the latest iOS. Imagine that in the x86 space: sorry, the new Windows 11 doesn't run your SSE3 code, or the new version of .... won't run on your current hardware (like a Core 2 Duo from 2006, or hell, the Sandy Bridge generation).

I wonder when Intel will say everything before AVX2 runs in emulation (or something like that). Progress in all aspects of the mobile space (not only CPUs) is so fast because that's where the money is, and the desktop market's stagnation translates directly into its rate of improvement.
 

Gideon

Senior member
Nov 27, 2007
677
901
136

Mobile device comparisons aside, the laptop and desktop are more or less the ultimate target. "We’ll actually take content from the desktop, profile it, and use it to drive our GPU architectures. This is one of the things that you usually don't see in a lot of mobile GPU benchmarks," Shimpi explained.

---

Typically when you get this type of CPU and GPU performance, a combination of the two, you have a discrete memory system. So the CPU has its own set of memory and the GPU has its own set of memory, and for a lot of media workloads or pro workflows where you actually want both working on the same data set, you copy back and forth, generally over a very narrow slow bus, and so developers tend to not create their applications that way, because you don't want to copy back and forth.

We don't have any of those problems. We have the unified architecture, the CPU, the GPU, the ISP, the Neural Engine—everything sits behind the exact same memory interface, and you have one pool of memory.
To me, it looks more like, "The future of fusion is ... Apple". E.g. they do plan to tackle desktop/notebook, and instead of going stupidly wide with vector units, they'll use the GPU to compensate for the relatively weak vector performance (vs. Intel).

I mean, if any company can pull that off easily, it's Apple. They already have a pretty heterogeneous environment, a single memory pool, they own the entire vertical hardware/software stack, etc.
 

IntelUser2000

Elite Member
Oct 14, 2003
6,577
1,104
126
No info about desktop/mobile replacement timing etc., just lots of technical stuff. We have to see what it does once you add SATA, PCIe, USB and the other things that are so normal we don't even think about them to the chip, and also a desktop/laptop operating system, which isn't optimized to the max the way mobile is.
If they are going to break compatibility anyway, then they can ship Macs that are essentially scaled-up versions of their iDevices. You'd have them super-integrated, which would basically eliminate user upgrades, but they don't have to care about that. They'd be able to make the machines thinner and lighter that way, or use the saved space for a bigger battery, while drawing less power at lower cost. There is a cost to flexibility.

I was reading an article about how Apple is so focused on having the simplest, most uncluttered design that it sometimes ends up being confusing in certain segments. The Mac Pro had everything in the back: the power button, the plugs, the expansion ports. Because they wanted it clean, despite it being the "Mac Pro".

Their design paradigm works extremely well for consumers, because it can become an extension of their fashion.
 
