Question x86 and ARM architectures comparison thread.

Page 20

mikegg

Platinum Member
Jan 30, 2010
Meds said:

A solid, solid chunk is CPU once again!
Which you know, and this is just low-effort bait.
2/10, you can do better.
Haha. "solid chunk". Define solid chunk. 5%?

Anyway, AMD CPUs are being pushed out of the AI stack. Example: Google ditched x86 in their latest TPU pods.

Ironwood pods — based on Axion CPUs and Ironwood TPUs — can be joined into clusters running hundreds of thousands of TPUs, which form part of Google's adequately dubbed AI Hypercomputer.
 

Nothingness

Diamond Member
Jul 3, 2013

johnsonwax

Senior member
Jun 27, 2024
Yeah, I think it's more useful to talk about the intended market and target wattage rather than what products actually came out. Jaguar was put in an Opteron SKU, but it's clearly not designed for high performance. AMD had server aspirations for the Bulldozer family, but it was so uncompetitive that they cancelled most of them.

In terms of current archs, I would say Zen and Intel's P-cores are server-first. Whereas all the major ARM cores are mobile-first (with the possible exception of Neoverse). The x86 cores are designed to operate with a higher average power draw.
I wouldn't consider Apple's M series to be mobile-first, and they're 50% higher volume than the total server market per the chart above. And I would think we would by now have internalized the lesson of the failed market analysis of the A7 going 64-bit, and why that was considered unnecessary for mobile.

Additionally, the server/desktop split is insufficient, as desktop is further divided into consumer and enterprise, and the share of enterprise desktop machines bought based on performance approaches zero. That means a trimmed-down server core in a reliability-forward package, without the margin needed to cover a separate core design, is likely quite appealing to enterprise desktop. This is another reason Apple Silicon looks the way it does: Apple's enterprise market is negligible, so the M series is designed primarily for a consumer desktop market where the user, not the CIO, makes the purchasing decision and where performance is a larger consideration (as it is for the PC consumer, especially the gaming market). So what people here want to consider a desktop core is actually competing against both the server market and the enterprise desktop market.

What the PC desktop consumer probably really wants is a consumer desktop built off the console architectures: fewer cores, unified memory so there's no memory copy to the GPU, etc. Microsoft might allow building off the Xbox architecture, but they have had that choice for some time now with their own PCs, and they went with ARM instead. So, I mean, the test case exists, and we can see the choice that was made.
 

johnsonwax

Senior member
Jun 27, 2024
I might have missed something, but I don't see Google's original announcement claiming Ironwood is tied to Axion.
If it's not currently hanging off Axion, it will be soon. Google is pretty clear that their goal is 100% in-house silicon, and they can't make x86 silicon.
 

poke01

Diamond Member
Mar 8, 2022
Here are some Cinebench R23 figures for Strix Halo (EVO X2) and M4 Max. I used R23 as it doesn’t scale as bandwidth gets better.



[Images: Cinebench R23 results for Strix Halo (EVO X2) and M4 Max]

IMO, this is actually a pretty good showing for both Apple’s and AMD’s cores.

At ~60 W, the M4 Max (12P+4E) scores a bit better than the 16-core Strix Halo does at 54 W. Yes, the M4 Max is a node ahead, but it also lacks SMT.

You can see Apple’s core design matching AMD’s here, which is no small thing, as SMT provides a good 30-40% boost in R23.
This also shows that going wide isn’t the only way to design cores, and AMD’s design is clearly better suited to nT workloads.
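Since the chart images are not reproduced here, a minimal sketch of how the SMT point can be made concrete without inventing scores: assuming the 30-40% R23 uplift quoted above, an SMT-on score can be backed out to an approximate SMT-off score and then divided by package power. The helper names `smt_off_estimate` and `perf_per_watt` are hypothetical, and the score is normalized to 100 rather than taken from the charts.

```python
def smt_off_estimate(smt_on_score: float, smt_uplift: float) -> float:
    """Back out an approximate SMT-off score from an SMT-on score.

    smt_uplift is the fractional gain SMT provides (0.30 = 30%).
    Assumption: the uplift applies uniformly to the whole nT score.
    """
    if smt_uplift < 0:
        raise ValueError("smt_uplift must be non-negative")
    return smt_on_score / (1.0 + smt_uplift)


def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark points per watt of package power."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return score / watts


# Normalize the SMT-on score to 100 so the output reads as a percentage.
for uplift in (0.30, 0.40):
    remaining = smt_off_estimate(100.0, uplift)
    print(f"{uplift:.0%} uplift -> SMT-off ~{remaining:.1f}% of the SMT-on score")
# 30% uplift -> SMT-off ~76.9% of the SMT-on score
# 40% uplift -> SMT-off ~71.4% of the SMT-on score
```

So a 16-core part running without SMT would keep roughly 71-77% of its R23 nT score; `perf_per_watt` can then be applied with the 54 W and ~60 W figures once the actual scores are read off the charts.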
 

gdansk

Diamond Member
Feb 8, 2011
It's not really a solution in the server market.
But for consumers it's many times better to have 12 fast cores than 32 threads.

M4 mops the floor with every other core on the market and yet in that MT throughput memebench it is merely competitive with a 4nm part. Probably not a good fit for the server market despite its stalwart defenders.
 

Geddagod

Golden Member
Dec 28, 2021
Lol, Notebookcheck has the M5 as 16% faster than the 9950X in GB5 ST.
The M4 roughly ties it.
The SD 8 Elite Gen 5 comes within 5%.
The 9500 within 10%.
 

Geddagod

Golden Member
Dec 28, 2021
gdansk said:
It's not really a solution in the server market.
But for consumers it's many times better to have 12 fast cores than 32 threads.

M4 mops the floor with every other core on the market and yet in that MT throughput memebench it is merely competitive with a 4nm part. Probably not a good fit for the server market despite its stalwart defenders.
I don't think there's anything intrinsically wrong with ARM that makes it impossible to use SMT.
I think at worst you lose a bit of ST, and compared to AMD maybe you won't get as much SMT gains per core. I'm waiting for Vera to come out though, to see if there's anything intrinsically bad about using SMT on a very wide, low pipeline stage core.
Though Intel never really got that much perf gain from SMT on their cores vs AMD either. But at least they had a legit trade off there as they also consumed less power from using SMT...
 

poke01

Diamond Member
Mar 8, 2022
Geddagod said:
Lol, Notebookcheck has the M5 as 16% faster than the 9950X in GB5 ST.
The M4 roughly ties it.
The SD 8 Elite Gen 5 comes within 5%.
The 9500 within 10%.
[Image: Geekbench 5 chart]

It’s surprising to me that the 8 Elite Gen 5, with the same clock speed and node as the M5, is 18% slower in GB5.

That’s like a full generation behind…
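Percent-slower and percent-faster are not symmetric, so a small sketch converting the 18% deficit into the M5's lead may help. It assumes, as the post states, equal clocks on the same node, in which case the gap reflects per-clock performance rather than frequency. `lead_from_deficit` is a hypothetical helper name.

```python
def lead_from_deficit(deficit: float) -> float:
    """Given that chip B scores `deficit` (0.18 = 18%) below chip A, return A's lead over B."""
    if not 0.0 <= deficit < 1.0:
        raise ValueError("deficit must be in [0, 1)")
    # B = A * (1 - deficit)  =>  A / B = 1 / (1 - deficit)
    return 1.0 / (1.0 - deficit) - 1.0


# 8 Elite Gen 5 is 18% slower than M5 in GB5 ST (figure from the post)
print(f"M5 lead: {lead_from_deficit(0.18):.0%}")  # M5 lead: 22%
```

Put the other way round, an 18% deficit means the M5 is about 22% faster, which is why it reads like a full generation.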
 

poke01

Diamond Member
Mar 8, 2022
Geddagod said:
I don't think there's anything intrinsically wrong with ARM that makes it impossible to use SMT.
I think at worst you lose a bit of ST, and compared to AMD maybe you won't get as much SMT gains per core. I'm waiting for Vera to come out though, to see if there's anything intrinsically bad about using SMT on a very wide, low pipeline stage core.
Though Intel never really got that much perf gain from SMT on their cores vs AMD either. But at least they had a legit trade off there as they also consumed less power from using SMT...
Doesn’t SMT require more validation? With Apple’s/Qualcomm’s strict annual cadence, it would be hard.
 

poke01

Diamond Member
Mar 8, 2022
gdansk said:
Probably not a good fit for the server market despite its stalwart defenders.
It’s not a good fit at all. The M4 P core needs to be halved, needs SVE2, etc.

Memory structure needs to be reworked.
 

Geddagod

Golden Member
Dec 28, 2021
poke01 said:
It’s surprising to me that the 8 Elite Gen 5, with the same clock speed and node as the M5, is 18% slower in GB5.

That’s like a full generation behind…
Fosholy. Though the area gap between Apple and Qcomm cores is still very large. Since they have the same type of cache hierarchy too, comparing the two cores' area shouldn't carry the caveats that comparisons with some other cores do.
poke01 said:
Doesn’t SMT require more validation? With Apple’s/Qualcomm’s strict annual cadence, it would be hard.
Good point.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Well, I can't edit that thread, but 130,883 for 128 cores. How does that rate in your chart? That's at 370 W or less (measured with a Kill A Watt, so that's total system power, with a 4080).
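A quick sketch of how that result translates into the chart's units, using only the numbers in the post. Because the Kill A Watt reads wall power for the whole system (including the 4080), dividing by it gives a conservative floor for the CPU's own points per watt, not the actual figure. The constant names are illustrative.

```python
SCORE = 130_883        # Cinebench R23 nT score from the post
CORES = 128
MAX_WALL_WATTS = 370   # Kill A Watt reading: whole system, so CPU power is lower

# Wall power overstates CPU power, so this ratio is a lower bound on CPU efficiency.
points_per_watt_floor = SCORE / MAX_WALL_WATTS
points_per_core = SCORE / CORES

print(f"~{points_per_watt_floor:.0f} points/W (system-level floor)")  # ~354 points/W (system-level floor)
print(f"~{points_per_core:.0f} points per core")                      # ~1023 points per core
```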
 

poke01

Diamond Member
Mar 8, 2022
Markfw said:
Well, I can't edit that thread, but 130,883 for 128 cores. How does that rate in your chart? That's at 370 W or less (measured with a Kill A Watt, so that's total system power, with a 4080).
That Notebookcheck chart only has desktop and laptop CPUs for CB23. But if it were included, it would be at the top for nT efficiency lol
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
poke01 said:
That Notebookcheck chart only has desktop and laptop CPUs for CB23. But if it were included, it would be at the top for nT efficiency lol
So the server cores are more efficient than any Apple or notebook CPU. That should be noted.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Yeah 128c/256 threads would win in nT efficiency especially compared to cpus with 12-16 cores
Also, these cores are AVX-512 capable, and this was done with SMT on.

AMD server is VERY efficient, and capable.

Oh, and since it does not show: this is a 9755 QS chip, essentially a retail core. It was running at 3.5-4.x GHz during this test; I could not catch the exact clock, as it was changing so fast.
 

Geddagod

Golden Member
Dec 28, 2021
1,650
1,683
136
Markfw said:
Also, these cores are AVX-512 capable.
I mean, Apple and Qcomm and ARM don't really bother.
Markfw said:
And this was done with SMT on.
I kid you not, the M4 is more power efficient than Zen 5 on N4P with SMT.
[Image: perf/watt chart]
Maybe with N3E you can get that number to be marginally better in terms of perf/watt, but again, this is Zen 5 with SMT against an Apple core that doesn't bother implementing it.
The reason I say Zen 5 on N3E may be marginally ahead is that rn the M4 has a ~10% perf/watt lead, so giving Zen 5 maybe a 20% ppw bump from N3E would put it slightly in front (and I doubt the gap between N4P and N3E is even that large).
Unfortunately I don't think anyone has Turin Dense samples that have tested perf/watt either.
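The arithmetic behind that estimate, as a small sketch: the ~10% M4 lead and the 20% node gain are the post's rough figures, not measurements, and `relative_ppw` is a hypothetical helper.

```python
def relative_ppw(m4_lead: float, node_gain: float) -> float:
    """Zen 5 perf/W after a node shrink, relative to M4 (1.0 = parity).

    m4_lead:   M4's current perf/W lead over Zen 5 on N4P (0.10 = 10%)
    node_gain: assumed perf/W gain from moving Zen 5 to N3E (0.20 = 20%)
    """
    if m4_lead <= -1.0 or node_gain <= -1.0:
        raise ValueError("fractions must be greater than -1")
    return (1.0 + node_gain) / (1.0 + m4_lead)


for gain in (0.10, 0.15, 0.20):
    print(f"{gain:.0%} node gain -> Zen 5 at {relative_ppw(0.10, gain):.2f}x M4 perf/W")
```

With the 20% figure Zen 5 lands at about 1.09x the M4's perf/W, i.e. only marginally ahead; break-even requires a node gain equal to the M4's current lead, 10%.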
 

Doug S

Diamond Member
Feb 8, 2020
poke01 said:
Doesn’t SMT require more validation? With Apple’s/Qualcomm’s strict annual cadence, it would be hard.

It isn't like the day they pronounced M5 "done" and ready for manufacturing that they said "OK well we'll start on M6 tomorrow". When they finished M5 they were already working on M6 and M7 and maybe even were putting together some early plans for M8. Adding a little extra validation time towards the end just means they'd have to start a little earlier, but wouldn't impact their ability to make yearly releases. I'm sure AMD is working on both Zen 6 and Zen 7 right now.

If anything, Apple's strict release timing (at least for iPhone SoCs) makes things harder for them than it does for AMD, who can put "H2 2026" on a roadmap and have a six-month window to release it, versus a one-week window for Apple. AMD can also push it back to H1 2027 if they want, and since it is only on a leaked roadmap rather than a public commitment, it isn't really even considered "late", especially if it still hits in Q1 2027. And if the stars align and it ends up ready early, they can benefit by shipping a quarter early. Apple surely has their iPhone SoCs ready months in advance some years, because they have to build slop into the schedule for the times they need a revision or two. But they can't say "OK, well, this year the new iPhone launches in July", because they aren't selling CPUs like AMD but a fully finished product with hundreds of components.

It appears Apple has decoupled the core designs from the release cycles of the products they go into, most likely to relieve some of the pressure the iPhone schedule imposed on them in the past. So they'd already have passed some sort of drop-dead date for deciding which P core, which E core, etc. will go into the A20. A big P-core update that wasn't ready by that deadline would just have to wait until the A21, but might make the M6, since the Mac's release schedule isn't fixed to a single week like the iPhone's.
 

johnsonwax

Senior member
Jun 27, 2024
poke01 said:
Doesn’t SMT require more validation? With Apple’s/Qualcomm’s strict annual cadence, it would be hard.
SMT has serious security challenges, and given Apple's recent willingness to take a performance hit for MIE, it's likely something they're inclined to avoid.

I seriously don't understand why so many people have such a boner for shifting resources from single-threaded to multithreaded performance on desktop. Developers are to this day still pretty bad at threading their code, and with a scheduler setup like Apple has, with E cores to soak up the low-priority stuff, yeah, you'll get some benefit in a very narrow set of cases. But that set of cases is increasingly small, as that kind of work gets shoved to the GPU and TPU, which are MUCH better at highly parallelized stuff. Again, Apple's not selling servers, where that is much more valuable.
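The point about poorly threaded code is what Amdahl's law formalizes. A hedged sketch (a textbook model swapped in here, not something from the post) comparing 12 threads against 32 for code with different parallel fractions `p`; it treats every thread as equally fast, which flatters SMT threads and E cores, so the real gain from extra threads is smaller still.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup on `threads` equal-speed threads when only
    `parallel_fraction` of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


for p in (0.50, 0.90, 0.99):
    s12 = amdahl_speedup(p, 12)
    s32 = amdahl_speedup(p, 32)
    print(f"p={p:.2f}: 12 threads -> {s12:.2f}x, 32 threads -> {s32:.2f}x")
```

For code that is only half parallel, going from 12 to 32 threads buys roughly 5%; only very highly parallel work (p near 0.99) sees a big jump, and that is exactly the kind of work that tends to move to the GPU.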
 

LightningZ71

Platinum Member
Mar 10, 2017
Given how OSes and computing devices generally work today, once you have the ability to simultaneously push 8 or so threads with reasonable throughput and a couple with maximum performance in mind, the average user isn't really going to notice the difference between any of the higher end models 99% of the time. If you can accomplish that with 2 P cores and 8 E cores, or 4 and 4, or 6 with SMT, it doesn't matter, as long as the OS knows how to handle it.

SMT in higher core count processors is for very particular circumstances and benchmark measuring contests.
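A toy check of that rule of thumb, counting hardware threads and maximum-performance cores for the three example configurations ("4 and 4" read as 4 P + 4 E). The `Config` type, the `meets_target` thresholds (8 threads, 2 fast cores), and the assumption that SMT applies only to P cores are illustrative; thread count is not throughput, since an SMT sibling or an E core is slower than a full P core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    p_cores: int
    e_cores: int
    smt: bool  # assumption: SMT (2 threads per core) applies to P cores only

    @property
    def hw_threads(self) -> int:
        threads_per_p = 2 if self.smt else 1
        return self.p_cores * threads_per_p + self.e_cores


def meets_target(cfg: Config, min_threads: int = 8, min_fast_cores: int = 2) -> bool:
    """~8 concurrent threads plus a couple of max-performance cores (illustrative thresholds)."""
    return cfg.hw_threads >= min_threads and cfg.p_cores >= min_fast_cores


configs = [
    Config("2P + 8E", p_cores=2, e_cores=8, smt=False),
    Config("4P + 4E", p_cores=4, e_cores=4, smt=False),
    Config("6 cores with SMT", p_cores=6, e_cores=0, smt=True),
]
for c in configs:
    print(f"{c.name}: {c.hw_threads} hardware threads, meets target: {meets_target(c)}")
```

All three clear the bar, which is the post's point; which thread lands on which core is then up to the OS scheduler.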