Discussion Apple Silicon SoC thread

Page 22

Carfax83

Diamond Member
Nov 1, 2010
6,547
1,340
136
Do you want to be part of the (not very) in crowd of Phoronix and Larabel? Or do you want to learn something? Because Phoronix is not a place to learn the future of technology; it's a place for a bunch of technology has-beens to get together to reminisce about their glory days, when x86 was all that mattered and people were impressed that you knew how to write a web page.
I guess you don't like Mr. Larabel very much! :D

That said, I'm not going to defend Phoronix, as I don't know enough about them to say whether their benchmarks are pertinent to this discussion. I was just throwing it out there to @guidryp for suggesting that Graviton was a good example of a high core count CPU implementation.
 

moinmoin

Diamond Member
Jun 1, 2017
4,057
6,105
136
Currently there is a Mac Pro based around Intel; it has PCI Express, and Apple sells an accelerator card for it, the Afterburner card. Likewise, they could always make an Apple Silicon accelerator card for a Mac.

I assume there will be an eventual Apple Silicon Mac Pro device even if it may just be something smaller like a Mac Mini or a larger version between Mac Mini and Mac Pro.

But my point is that if Apple is going to go chiplets, they could always use older silicon where it is still competitive, while the newest silicon takes the leading-edge foundry node. There is a point where having 8 or 16 fast cores on the latest process hits diminishing returns; if you need more cores, you are fine with cores that are 50% or 75% as fast individually, and you just want as many of them as possible. Thus in that "Pro situation" that is not a laptop or tablet, it could make sense to just pack in as many cores as possible and downclock them to the best performance-per-watt operating point.
And my point is that none of the Apple Silicon so far contains the necessary I/O to do what you propose. So no, Apple couldn't just use older silicon as chiplets; they'd need to be designed as such first. We don't even know if any of the Apple Silicon has the necessary SMP capability to begin with.
 

Carfax83

Diamond Member
Nov 1, 2010
6,547
1,340
136
Does it matter why the application is faster? Most people will care that their particular application is faster, and Apple makes it relatively easy for developers to tap into the extra accelerators available in the M1 SOC.
No, I guess it doesn't, at least not for end consumers like us. But for this particular technical conversation we're having, it does. For Apple's claim that the M1 has the fastest CPU core to be verified, the hardware acceleration will need to be turned off to provide a level playing field with Zen 3.

This is where the M1 will shine. Although its core is arguably tied with a desktop Zen 3 chip as the fastest core on the market, when it comes to real application performance, I suspect even the passively cooled MacBook Air will surpass the performance of the Zen 3 5950X in many applications.
You might be right. Machine learning and AI acceleration are wild cards. When Tiger Lake launched, Intel demonstrated some impressive use cases of how AI and inference acceleration can provide enormous performance increases.
 

guidryp

Platinum Member
Apr 3, 2006
2,682
3,560
136
Same for Intel. Intel's domination in manufacturing was a major part of why they were so successful over the years. Coincidentally, when they lost their process node leadership with the 10nm disaster, they lost their performance crown as well.
Intel had years of process advancement with negligible IPC gains.

Zen 3's big IPC advancement is on the same process as Zen 2.

Intel is backporting its Sunny Cove cores to 14nm for desktop, with reportedly similar boosts in IPC.

It's not the process that is providing the IPC boost. It's the designs.

Ideally everything is firing together: IPC improves from design, and there are some perf/watt gains from process.
 

thunng8

Member
Jan 8, 2013
150
53
101
No, I guess it doesn't, at least not for end consumers like us. But for this particular technical conversation we're having, it does. For Apple's claim that the M1 has the fastest CPU core to be verified, the hardware acceleration will need to be turned off to provide a level playing field with Zen 3.



You might be right. Machine learning and AI acceleration are wild cards. When Tiger Lake launched, Intel demonstrated some impressive use cases of how AI and inference acceleration can provide enormous performance increases.
I don't think it would be possible to turn off for most applications. So the mainstream press will just say the M1 is faster without really knowing how or why. It's also not just the machine learning / neural engine: Apple's SoC has a dedicated, best-in-industry imaging engine (BTW, I suspect that's why Lightroom is so fast on an iPad Pro, not the neural engine) as well as video encode/decode engines.
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
No, I guess it doesn't, at least not for end consumers like us. But for this particular technical conversation we're having, it does. For Apple's claim that the M1 has the fastest CPU core to be verified, the hardware acceleration will need to be turned off to provide a level playing field with Zen 3.
This doesn't make any sense. Would you also want to turn off MMX or AVX style SIMD instructions on x86?

I think what you are getting at is that a single-use-case benchmark on any platform is irrelevant to 99% of users. A good example of a single-use-case benchmark is Cinebench. Case in point: the 3800XT ties the 10700K in Cinebench single-core and beats it in multi-core by a decent margin (3% or so), yet the 10700K is 10% faster at Excel recalcs.

Performance is use-case dependent and all modern CPUs/SoC chips have specialized circuitry for certain use cases. A good set of tests will show where they are both strong, and weak.
 
  • Like
Reactions: Tlh97

Carfax83

Diamond Member
Nov 1, 2010
6,547
1,340
136
Intel had years of process advancement with negligible IPC gains.
That was due to lack of competition from AMD. Intel had a huge lead in performance per watt, and AMD only just caught up with the Zen series, particularly Zen 2 which brought parity more or less depending on the workload.

Zen 3's big IPC advancement is on the same process as Zen 2.
I acknowledged this earlier as correct.

Intel is backporting its Sunny Cove cores to 14nm for desktop, with reportedly similar boosts in IPC.
Rocket Lake is reportedly a bit slower per clock than Sunny Cove. Sunny Cove had an 18% average performance gain, so Rocket Lake will probably have around 10-12% or so.

I guess they couldn't do a straight 1:1 port of Sunny Cove to the 14nm+++ process.

It's not the process that is providing the IPC boost. It's the designs.
Yes, I conceded this point already, but process node shrinks make it much easier to gain IPC.
 
  • Like
Reactions: Tlh97

Carfax83

Diamond Member
Nov 1, 2010
6,547
1,340
136
I don't think it would be possible to turn off for most applications. So the mainstream press will just say the M1 is faster without really knowing how or why. It's also not just the machine learning / neural engine: Apple's SoC has a dedicated, best-in-industry imaging engine (BTW, I suspect that's why Lightroom is so fast on an iPad Pro, not the neural engine) as well as video encode/decode engines.
Quite a few applications have toggles or settings to turn off hardware acceleration. And Cinebench itself does not support hardware acceleration, if I'm not mistaken.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,163
243
106
I just feel sorry for Intel over process, and how, today and for the next two years, many of their products will be some version of 14nm.

Here is a general idea of the densities (in millions of transistors per mm²):

TSMC 16nm: 28.9 MTr/mm²
TSMC 10nm: 52.5 MTr/mm²
TSMC 7nm: 96.5 MTr/mm² (N7FF, 2018 production)
TSMC 7nm+: 114 MTr/mm² (N7FF+, 2019 production)
TSMC 6nm: 114 MTr/mm²
TSMC 5nm: 173 MTr/mm²

Intel 14nm: 37.5 MTr/mm² (these are 2014 numbers; they haven't released density numbers for the 14nm++ that 2020-and-later chips will use)
Intel 10nm: 100 MTr/mm² (this is the original 2018 run, which they have since retooled because it was so bad, so who knows the current density)

The exact numbers don't really matter, since they are not precisely comparable across foundries. But your competition's 7nm is likely denser than your 10nm, and on the desktop you will be shipping some version of 14nm while your competitor AMD is on 7nm.

And now, with the M1, your competition is on 5nm. It is a bad place for Intel to be.

Now, density is not the only thing that matters for chips. In some ways it is just your cost, and it is not the only thing that matters for performance. But it is not good to give your competition a 3x to 4.5x density advantage.
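To put the gap in concrete terms, here's a minimal sketch in Python using the figures above (the numbers are the ones quoted in this post, approximate and not precisely comparable across foundries):

```python
# Approximate logic transistor densities in MTr/mm^2 (figures quoted above;
# cross-foundry density comparisons are rough at best)
densities = {
    "TSMC 16nm": 28.9,
    "TSMC 10nm": 52.5,
    "TSMC 7nm (N7FF)": 96.5,
    "TSMC 7nm+ (N7FF+)": 114.0,
    "TSMC 6nm": 114.0,
    "TSMC 5nm": 173.0,
    "Intel 14nm": 37.5,
    "Intel 10nm": 100.0,
}

# Apple's M1 node versus the 14nm Intel still ships on desktop
print(f"{densities['TSMC 5nm'] / densities['Intel 14nm']:.1f}x")  # 4.6x
```

By the same arithmetic, AMD's 7nm versus Intel's desktop 14nm works out to roughly 2.6x, which is where the "3x to 4.5x" range comes from.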
 
  • Like
Reactions: Tlh97

Carfax83

Diamond Member
Nov 1, 2010
6,547
1,340
136
This doesn't make any sense. Would you also want to turn off MMX or AVX style SIMD instructions on x86?
That's not the same thing. Vector instructions are native to the CPU and use the CPU's innate resources. The Apple Neural Engine, on the other hand, is a separate hardware block which can process on its own.

Performance is use-case dependent and all modern CPUs/SoC chips have specialized circuitry for certain use cases. A good set of tests will show where they are both strong, and weak.
I'm not saying reviewers shouldn't include tests or benchmarks which leverage hardware/AI acceleration. That's absolutely fine, because that's what the end consumer will experience.

I'm only saying that for RAW CPU performance testing, all hardware/AI acceleration should be disabled, and I'm sure the reviewers will do this; especially Andrei.

If you don't disable it, it will be akin to doing a CPU test where you do a 4K60 FPS video playback with hardware acceleration enabled on the GPU. In that case, you're not really testing the CPU's capabilities.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,163
243
106
That's not the same thing. Vector instructions are native to the CPU and use the CPU's innate resources. The Apple Neural Engine, on the other hand, is a separate hardware block which can process on its own.
This is a "legalistic" answer, but it misses the point. Ever since Intel went turbo with its 1st-gen Core, it hasn't mattered which part of a chip/SoC does the work; what matters for the final product is the power drawn / heat created (such as TDP) versus the work performed.

Whether the chip does it via the SoC's integrated GPU, its CPU, its CPU's vector instructions, or other dedicated silicon does not matter. Does anyone complain that H.264 decoding is now done with dedicated units instead of generalized GPU cores or CPU cores?

------

Only caring about raw CPU performance and not the real-world experience is creating a simulacrum, a participation trophy: a "toy" instead of a tool to be used.
 
  • Like
Reactions: shady28

amrnuke

Golden Member
Apr 24, 2019
1,175
1,767
106
Intel had years of process advancement with negligible IPC gains.

Zen 3's big IPC advancement is on the same process as Zen 2.

Intel is backporting its Sunny Cove cores to 14nm for desktop, with reportedly similar boosts in IPC.

It's not the process that is providing the IPC boost. It's the designs.

Ideally everything is firing together: IPC improves from design, and there are some perf/watt gains from process.
Having a great process helps a lot with uarch and chip design. Process shrinks give you more transistors in the same die area, which gives you more room for logic, cache, etc. (And efficiency +/- clock improvements, as you said.)

Though, we can't leverage process shrinks forever, so while we're still relying on electrons, effort naturally must turn to other things at some point - to IPC, yes, but also to ISA improvements/extensions, packaging, accelerators, etc.

For companies designing for mobile, where power usage and package size are so important, I think that is why they have necessarily had great success leveraging greater density: it gives them the headroom to work on the logic needed to drive IPC up, to add accelerators that do specialized work more efficiently, and to focus on packaging to further improve efficiency and speed.

If you don't disable it, it will be akin to doing a CPU test where you do a 4K60 FPS video playback with hardware acceleration enabled on the GPU. In that case, you're not really testing the CPU's capabilities.
I guess the question I have there is: if the accelerator comes on the CPU package and isn't a separate item I have to install, why would I not benchmark it when doing general testing? For example, I can't buy an A14 without the NE cores and AMX blocks. For pure core IPC/PPC evaluation, not using them makes sense, but only from a scientific-method standpoint. Disabling certain features of a chip to "even the playing field" seems wrong if the features are native to the chip. It's like excluding AVX-512-capable tests from your benchmark suite. I mean, if the chip can do it, the end users are going to use it, and it makes no sense to disable it.
 

name99

Senior member
Sep 11, 2010
402
302
136
Quotes from your last three posts:

1:


Patronizing and insulting instead of talking about the technology.

2:

More patronizing.

3:

Again, patronizing, no discussion. Just attacking people and insulting them instead of discussing and teaching.




It seems like you're the one not all that interested in talking about the technology. Let's get this straight first: you're the one who called Apple "smarter" than AMD and have failed to back up and discuss why. That's not a factual statement, it's a "political" one (though I think you fail to understand the definition of political...). In response to me, you threw out some statement about better branch predictors, etc., and then went off and cited features of the Arm ISA that have nothing to do with whether Apple are smart or not. You can't just say "because their branch predictors are better" and have us all believe it. Can you explain why? Give references for why Apple's branch prediction on their uarch and Arm's ISA is better than AMD's neural branch prediction or TAGE branch prediction on their uarch on x86.

Again, I'm not the one who brought up these derailing statements about one side being better than the other, you are. I was just responding to it. I've talked plenty about the tech and how it might translate. So we can get back to that. I'm open to listening to what knowledge you have, but when you continue to just attack people (from me to other posters to other sites to entire groups of users to companies/computer engineers) instead of actually discussing the tech, you're not doing anyone, or the discussion, any service.

What statements have I made that are flat-out crazy regarding the technology? I've talked about A12's F/V curve and what that might mean for A14/M1 and I've talked plenty about the marketplace. But if there's something about the technology that I've been flat-out crazy on, please respond to that instead of ad hominem attacks.
What technical posts? Let's see

link
link

link
link
link

That's just in the past week, just on AnandTech.

But it seems to do no good. Don't you think it's understandable I get angry when I try to explain in detail exactly what's going on with these cores only to have responses at the intellectual level that we see in this thread?
At some point, as Paul Krugman says in _Arguing With Zombies_, the time arrives when you have to give up the hypothesis that your opponent is debating in good faith and assume there's a deliberate campaign in place to flood the zone with every lie imaginable.
 

trivik12

Member
Jan 26, 2006
111
67
101
I suspect the rumored 8+4 core chip will slide in those (and maybe be offered as an optional upgrade for the 13" MBP) and it will also have more GPU cores.

I would also not be surprised if that 8+4 chip was able to operate as a chiplet so 2 or 4 of them could be used to create a 16 or 32 core MCM for the high end stuff. But they said the transition would take two years (like I had previously guessed) so they have plenty of time to do a monolithic design for those should they choose to do so.

I wonder how a Mac Pro with 32 cores taken from a 3nm A16 would do against contemporary Intel and AMD based workstations?
I thought it came up on the day of the announcement and was subsequently confirmed as well.


Confirmed through this interview

 

Eug

Lifer
Mar 11, 2000
23,382
803
126
I thought it came up on the day of the announcement and was subsequently confirmed as well.


Confirmed through this interview

I already posted this in this thread. The guy didn't actually confirm 10-15 Watts. He mentioned those numbers after being prompted by the interviewer, but he was quite vague about it.

Basically, the way I interpreted it was those numbers were probably in about the right ballpark, but he wasn't going to be any more specific about it.
 

senttoschool

Golden Member
Jan 30, 2010
1,590
278
126
Because it's already less than $100. You can't shave $150 off.

They also make near NOTHING on Mac services. Macs are just a fraction of iPad/iPhone sales and always will be.

Plus they have no lock in on Macs.
First, do you have a reliable source putting the M1 SoC's BOM at $100? Second, if you read carefully, I said I expect other components to get cheaper as well, adding up to $150: 256GB of SSD, 8GB of RAM, LCD screens, etc. all get cheaper over time.

Lastly, Macs do have services. I can use all Apple One services on my Mac, including native Mac apps for iCloud backup, Apple News+, TV+, and Music. And now that the M1 is here, I can play iOS and iPadOS games on my Mac with their Arcade service.

The whole point of Apple transitioning to a service-oriented company is that they're not making a service just for Mac or just for iOS. They're making the same services across their devices: Mac, iPhone, iPad, Watch.

And the more Apple devices you use, the more likely you are to sign up for Apple Services. Fact.

Screenshot of native News, TV, Music, and iCloud apps running on my Mac now.

 
Last edited:
  • Like
Reactions: name99

senttoschool

Golden Member
Jan 30, 2010
1,590
278
126
Is this a joke? Everything on MBA is vastly more expensive than the $330 iPad. It's not a remotely valid comparison.

Here is a more realistic comparison, the 13" iPad Pro and MBA:

Model     Price  Screen  Storage  RAM
MBA       $999   13"     256GB    8GB
iPad Pro  $999   13"     128GB    6GB

For the same price you get an iPad with less RAM, less Flash Storage, no keyboard, smaller battery.

You already get more for the money with the MBA, so margins are already significantly thinner on the MBA than on a 13" iPad; plus there is no service lock-in on the Mac to make up for any lack of margin.
The three most expensive aspects of a modern mobile device are the screen, the cameras, and how small it is (assuming it costs Apple the same to make the SoCs between devices). The iPad Pro would be more expensive than the MBA on all three.

The iPad Pro's screen advantages vs. the MBA:
  • Higher brightness (nits)
  • Ultra-low reflectivity
  • 120Hz display
  • Ultra-low-latency touchscreen
And it has a significantly better camera system. And it's smaller, which means more expensive parts and more expensive assembly.

The iPad Pro is in a different class in terms of hardware.
 
Last edited:
  • Like
Reactions: Tlh97 and name99

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
That's not the same thing. Vector instructions are native to the CPU and use the CPU's innate resources. The Apple Neural Engine, on the other hand, is a separate hardware block which can process on its own.

I'm not saying reviewers shouldn't include tests or benchmarks which leverage hardware/AI acceleration. That's absolutely fine, because that's what the end consumer will experience.

I'm only saying that for RAW CPU performance testing, all hardware/AI acceleration should be disabled, and I'm sure the reviewers will do this; especially Andrei.

If you don't disable it, it will be akin to doing a CPU test where you do a 4K60 FPS video playback with hardware acceleration enabled on the GPU. In that case, you're not really testing the CPU's capabilities.
Yeah, no. I know what they are doing, and it makes absolutely no sense whatsoever. People have all kinds of bizarre misconceptions about which platform to use for encoding, for example, because these review sites do those stupid things. It's basically disinformation.

For example, in a typical 4K encode, depending on the exact media, a 10600K can beat a 3900X by a solid 40% in encode time. That would be because of Quick Sync. But that is also irrelevant for someone doing encoding, because a 1660 Ti is more than 3x faster than a 3900X and over twice as fast as the Quick Sync encodes.

And ya know what? It makes almost no difference what CPU you use for that task. But here we are, encoding is one of the biggest categories of benchmarks these "enthusiast" sites use....
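As a quick sanity check, the quoted speedups are internally consistent; a minimal sketch in Python (using this post's numbers, normalized to the 3900X, not fresh measurements):

```python
# Relative 4K encode speeds, normalized to the 3900X's software encode.
# Figures are the ones quoted above, not independent measurements.
r3900x = 1.0
quicksync_10600k = 1.4    # 10600K ~40% faster, thanks to Quick Sync
gtx_1660ti = 3.0          # 1660 Ti "more than 3x faster" than the 3900X

# 1660 Ti vs Quick Sync: ~2.14x, i.e. "over twice as fast"
print(f"{gtx_1660ti / quicksync_10600k:.2f}x")
```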


Reference :

 

senttoschool

Golden Member
Jan 30, 2010
1,590
278
126
I'm only saying that for RAW CPU performance testing, all hardware/AI acceleration should be disabled, and I'm sure the reviewers will do this; especially Andrei.
I hope Andrei does do that, but only as a small part. We want to know whether the CPU core is the fastest out of curiosity, but at the end of the day it doesn't matter, because the M1 is an SoC with accelerators for AI, storage, security, and encode/decode, and it has a GPU. It looks like the CPU cores take up 30% of the M1 die at most. It would be unfair to just compare raw CPU performance and say the 5950X is faster when it's likely that the M1 will be faster for most applications people use.
 
  • Like
Reactions: shady28

guidryp

Platinum Member
Apr 3, 2006
2,682
3,560
136
The three most expensive aspects of a modern mobile device are the screen, the cameras, and how small it is (assuming it costs Apple the same to make the SoCs between devices). The iPad Pro would be more expensive than the MBA on all three.

The iPad Pro's screen advantages vs. the MBA:
  • Higher brightness (nits)
  • Ultra-low reflectivity
  • 120Hz display
  • Ultra-low-latency touchscreen
And it has a significantly better camera system. And it's smaller, which means more expensive parts and more expensive assembly.

The iPad Pro is in a different class in terms of hardware.
It has a slightly more expensive screen and cameras, and that's it.

It's not more expensive to make it "smaller". What specifically do you think costs more because it's smaller on the iPad Pro vs MBA?

The screen and camera price increases are more than offset by the cost of the extra RAM, extra flash storage, keyboard, larger battery, trackpad, and aluminum clamshell case of the MBA. It just has more of everything, and more of everything means a higher BOM.
 

amrnuke

Golden Member
Apr 24, 2019
1,175
1,767
106
What technical posts? Let's see

link
link

link
link
link

That's just in the past week, just on AnandTech.

But it seems to do no good. Don't you think it's understandable I get angry when I try to explain in detail exactly what's going on with these cores only to have responses at the intellectual level that we see in this thread?
At some point, as Paul Krugman says in _Arguing With Zombies_, the time arrives when you have to give up the hypothesis that your opponent is debating in good faith and assume there's a deliberate campaign in place to flood the zone with every lie imaginable.
Thank you for bringing things back around to technical discussion.

One area is most interesting to me, brought up in two separate posts of yours but seem related:

I don't think there's any question that Apple are using way prediction. For any of the major players to not be using way-predicted set-associative cache would be absurd, no? AMD and Intel have been using microtag way prediction since W's presidency. In any case, how does Apple's implementation differ, if at all, and do you have any explanation for why you pinpoint that exact comment about way prediction, speculative scheduling, and replay?

Some things I was thinking about:

As far as I was aware, if flushing were the "problem" and by "flushing" you mean non-selective replay (which you may not - in which case, please explain!), then way predictors haven't been "problematic" since at least the Pentium 4, which doesn't use non-selective replay; however, Seznec and Michaud indicate that non-selective replay may be viable as long as an ROB or buffer (a la the replay queue on the P4) is available and efficient. As for needing a quality replay mechanism, isn't it better to prevent the need for a replay in the first place by designing better speculative scheduling? Seeing as AMD switched from a neural-network branch predictor to TAGE, I doubt they missed those papers... and since Apple has been using TAGE since at least 2013, it would make sense that Apple is also aware of the benefit of preventing replay in the first place with better speculative scheduling.

That being said, given the width of Apple's core and the large ROB do you imagine they're using token-based selective replay (or something similar, or, since 2013 they called their core Cyclone, maybe they're using the cyclone replay scheme!) rather than other mechanisms? Wouldn't be the first time they've used WARF research! This may explain the large ROB. Though, a large ROB would solve a lot of replay issues regardless of replay scheme, including those associated with flushing.

Edit: Andrei says the ROB size "outclasses" any other design and questions how Apple can "achieve" this design, but it's not that simple, is it? A large ROB could be a Band-Aid for a larger speculation problem or could be relieving a bottleneck, or could reflect a different speculative scheduling/replay mechanism... or, my suspicion is that it's larger by virtue of Apple having such a wide core. Indeed, the next largest ROBs are on Intel chips, and the smallest on Zen3. I don't know enough about X1 to comment on why it'd be the smallest, but if this is indeed a bottleneck and X1 uses a similar branch prediction / replay scheme, then it wouldn't make much sense to have such a small ROB.
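For anyone following along, here is a toy sketch of the microtag way prediction being discussed, in Python (purely illustrative: the set count, tag width, and update policy are invented, and the predictors in shipping cores are far more elaborate):

```python
# Toy microtag way predictor for a 4-way set-associative cache.
# Sizes are arbitrary for illustration: 64 sets, 64-byte lines, 8-bit microtags.
WAYS, SETS, LINE_BITS, MICROTAG_BITS = 4, 64, 6, 8

class WayPredictor:
    def __init__(self):
        # Per-set table holding one partial tag ("microtag") per way.
        self.microtags = [[None] * WAYS for _ in range(SETS)]

    def _index(self, addr):
        # Set index from the address bits above the line offset.
        return (addr >> LINE_BITS) % SETS

    def _microtag(self, addr):
        # Low bits of the full tag serve as the partial tag.
        return (addr >> (LINE_BITS + 6)) & ((1 << MICROTAG_BITS) - 1)

    def predict(self, addr):
        """Return the predicted way, or None (meaning: probe all ways)."""
        s, mt = self._index(addr), self._microtag(addr)
        for way, tag in enumerate(self.microtags[s]):
            if tag == mt:
                return way
        return None

    def update(self, addr, way):
        """On a fill or a verified hit, record the partial tag for that way."""
        self.microtags[self._index(addr)][way] = self._microtag(addr)
```

On a lookup, a matching microtag lets the cache fire just one way instead of all four, saving power and latency; a wrong guess is only discovered when the full tag check fails, and the dependent ops must replay - which is exactly the cost being debated above.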
 
Last edited:
  • Like
Reactions: Tlh97

wlee15

Senior member
Jan 7, 2009
313
31
91
We have our first M1 Cinebench R23 result, courtesy of Bits and Chips.

990
Yeah no. I know what they are doing and it makes absolutely no sense whatsoever. People have all kinds of bizarre misconceptions about what platform to use for encoding for example, because these review sites do those stupid things. It's basically disinformation.

For example, in a typical 4K encode, depending on the exact media, a 10600K can beat a 3900X by a solid 40% in encode time. That would be because of Quick Sync. But that is also irrelevant for someone doing encoding, because a 1660 Ti is more than 3x faster than a 3900X and over twice as fast as the Quick Sync encodes.

And ya know what? It makes almost no difference what CPU you use for that task. But here we are, encoding is one of the biggest categories of benchmarks these "enthusiast" sites use....


Reference :

CPU encoders usually produce better-quality output than hardware encoders. I occasionally do a bit of encoding, and I use x264.
 
  • Like
Reactions: Carfax83

senttoschool

Golden Member
Jan 30, 2010
1,590
278
126
It has a slightly more expensive screen and cameras, and that's it.

The screen and camera price increases are more than offset by the cost of the extra RAM, extra flash storage, keyboard, larger battery, trackpad, and aluminum clamshell case of the MBA. It just has more of everything, and more of everything means a higher BOM.
Source?
 

senttoschool

Golden Member
Jan 30, 2010
1,590
278
126
We have our first M1 Cinebench R23 result, courtesy of Bits and Chips.

990
Referring to this tweet? Commenters on Twitter have pointed out that those scores are from the A12Z dev kit running a non-native build of Cinebench.

Bits and Chips is either intentionally publishing fake numbers or they are just stupid.

So no. It'll be significantly higher than 990.


Another source (Chinese) showing 987 score for A12Z: https://www.ithome.com/0/519/193.htm
 
Last edited:
