
Zen 6 Speculation Thread

Page 395 - AnandTech Forums
Also Apple has been extending the ISA and changed the uarch several times (bpred changes, number of units, etc.) so they are facing similar issues as Intel and AMD.
That is true, but instruction-wise the M1 is not far from the M5 (I am not sure how AMX/SME is handled, but since it's mostly accessed via Apple's libraries, I guess they take care of that), while the current baseline for x64 is SSE 4.1 (x86-64-v2?). I mean, even AVX2 is not yet the universal baseline target. It's of course not Apple's fault, but I would still say the relatively young age of Apple Silicon plays to their advantage😉

Or, if this could nudge compiler devs and software devs to up their game, to use more thorough performance tuning, runtime SIMD detection and selection and other performance boosting tricks by default.
Compiler devs already try as much as they can. Track clang-cl performance evolution on Windows from version 15 to 21. For more giggles, do that on high-core-count systems too😉 The problem with more thorough performance tuning is that it's hard to do for an environment like the PC. You can do PGO for consoles, since every console is the same, but PGO is of limited benefit on PCs.
 
That is true, but instruction-wise the M1 is not far from the M5 (I am not sure how AMX/SME is handled, but since it's mostly accessed via Apple's libraries, I guess they take care of that), while the current baseline for x64 is SSE 4.1 (x86-64-v2?). I mean, even AVX2 is not yet the universal baseline target. It's of course not Apple's fault, but I would still say the relatively young age of Apple Silicon plays to their advantage😉
From the SIMD point of view I agree (well, until they add SVE, if they ever do). But some other interesting features are probably unused; look at M5 FEAT_CSSC as an example. I agree it's niche, but the problems are creeping in.
 
I think the presumed advantage for Apple CPUs is that this kind of optimization is already baked in at compilation time, anything intended for MacOS is targeting a single vendor and a relatively small family of cores.

Apple only has four older generations of Apple Silicon to worry about supporting at the moment, so it is less of a problem but only because they only recently switched to it. They will have similar issues down the road as the number of generations of Apple Silicon developers want to support continues to grow - especially if Apple makes some big changes in their cores so optimal scheduling changes, or new instructions improve things. For Apple they might not even want to "adjust" the binaries to improve performance, but rather to improve security.

Apple could leverage the work they've already done with Rosetta 2 to do something like this. It would be a lot easier to translate ARM64 to ARM64 than to translate x86 to ARM64! On first run it could recompile the binary optimally for the hardware it is running on. Since almost everyone compiling iOS/macOS code uses Xcode, Apple could have Xcode include some additional information to help that process.

In fact, I dimly recall Apple had something sort of like this a while back - I think it was for iOS, before they switched Macs to ARM. I don't recall the details (maybe @name99 does?), but it had something to do with Xcode emitting something a bit higher level than standard object code, so the final compilation/optimization step could be completed when the app is installed. I can't remember it well enough to find via search, and I'm not 100% sure it is still being used...
 
Exactly. Apple doesn't need the tool because software is ONLY compiled and optimized for Apple architectures. AMD currently is the one who doesn't need the tool because they are competing on the same instruction set as Intel.

Apple has a great architecture, a tightly controlled platform, curated software, limited hardware variation, and they don't need compatibility with ISAs that have been changing over the last 50 years.

If Apple were to compete in the x86 market today with a CPU it would be competitive (at best) with AMD and Intel. Apple engineers aren't any better or worse than the AMD/Intel engineers, they are working with the inherent advantages mentioned above.

This is not a knock on Apple, AMD, or Intel. It's simply the reality of the situation.
SPEC is distributed in source code form, meaning you're free to compile it with whatever flags are necessary. Apple's cores still beat the competition whether the test suite is compiled with clang or gcc. Huang's SPEC results show exactly that, and he typically picks whatever -march setting is best for the given CPU.
 
They will have similar issues down the road as the number of generations of Apple Silicon developers want to support continues to grow - especially if Apple makes some big changes in their cores so optimal scheduling changes, or new instructions improve things
Apple's way is to direct their customers to throw away and buy new at Apple's leisure, and ISVs have no choice but to work within this framework.
 
Exactly. Apple doesn't need the tool because software is ONLY compiled and optimized for Apple architectures. AMD currently is the one who doesn't need the tool because they are competing on the same instruction set as Intel.

Apple has a great architecture, a tightly controlled platform, curated software, limited hardware variation, and they don't need compatibility with ISAs that have been changing over the last 50 years.

If Apple were to compete in the x86 market today with a CPU it would be competitive (at best) with AMD and Intel. Apple engineers aren't any better or worse than the AMD/Intel engineers, they are working with the inherent advantages mentioned above.

This is not a knock on Apple, AMD, or Intel. It's simply the reality of the situation.
I believe this to be true as well.

I do think that Apple engineers are likely better at low-power, phone-type processors, simply because this has been their bread and butter for some time.

AMD and Intel on the other hand are better at DC and HEDT.

The interesting intersection point is laptop IMO.

Now, before everyone jumps up with their hair on fire and points out how great the Apple laptop scores are, I think it's only fair to take into account the PRICE of each solution when comparing them.

Apple (as Hulk so well laid out) also enjoys complete vertical integration, from silicon all the way to the OS and even the application layer. That is a business advantage, not a technical advantage, IMO. It's also why Apple EVERYTHING is more expensive than PC ANYTHING.

I find the entire situation ironic considering that Apple nearly went under with the exact SAME model with the iMac (i.e., total vertical integration). Macs were more expensive but performed better than PCs (especially for graphics, IIRC), yet price won out, and application selection also took a hit. There were LOTS of programs you couldn't run on a Mac that were abundant on the PC.
 
I think the presumed advantage for Apple CPUs is that this kind of optimization is already baked in at compilation time, anything intended for MacOS is targeting a single vendor and a relatively small family of cores
This isn’t true. You still need to optimise. Heck, x265 or AV1 CPU encoding wasn’t even good on Apple’s CPUs until last year because of a lack of support for ARM-specific extensions, and it still could be improved.

What Geekbench shows is the performance you will achieve when the CPU's available extensions are fully utilised.


M5 FEAT_CSSC as an example; I agree it's niche but the problems are creeping in.
Support for that requires compiler support and recompiling, and LLVM on Linux already supports it.

Should be good for INT workflows
Apple's way is to direct their customers to throw away and buy new at Apple's leisure. And ISVs have but to work in this framework.
If that were the case, they would’ve dropped Intel support back in 2022.

Apple makes some big changes in their cores so optimal scheduling changes
Already happened with the M5 Pro/Max. It should be the new standard of scheduling on high-end Apple computers for a couple of years.
Now, before everyone jumps up with their hair on fire and points out how great the apple laptop scores are, I think it’s only fair to take into account the PRICE of each solution when comparing them.

Apple (as Hulk so well laid out) also enjoys a completely vertical integration clear from silicon to OS and even application layers. That is a business advantage, not a technical advantage IMO. It's also why apple EVERYTHING is more expensive than PC ANYTHING.
What does the OS have to do with making good CPU cores? AMD has Linux, and the kernel is open source; your distinction does not make sense.

Regarding price: not true. Have you seen the prices of recent premium Panther Lake laptops? They're more expensive than M5/M5 Pro MacBook Pros.
 

Soooo. Is it now Intel that is saying multi-threading is important?

In all fairness, it looks like Intel is winning in everything except gaming when it comes to desktop and laptop, correct?

Now, I am still of the opinion that when it comes to making money, DC is where it counts and AMD is so far out ahead here that it is hard to imagine Intel catching up.

Also, as I have said in the past, once you start getting into really high core counts, you are into a pro-grade HEDT segment. I believe the people in this segment will be using applications that need more memory bandwidth than a normal desktop processor gets and will therefore be shelling out the money for Threadripper... which, as far as I am aware, is unopposed by anything Intel has to offer.

Still, for the apps Tom's used, the ST of the ARL refresh has a pretty good lead on AMD, and the MT isn't close in those apps. Am I missing anything?
 
Good CPU cores don't care about “vertical integration” or which OS they are running on.

Case in point: Arrow Lake works on x86 macOS, and there is ZERO optimisation for this CPU, and yet it performs just as well as it does in Windows in apps like Cinebench etc. The 265K in Logic Pro beats every Apple mobile CPU apart from the M5 Pro/Max.

 
Good CPU cores don't care about “vertical integration” or which OS they are running on.

Case in point: Arrow Lake works on x86 macOS, and there is ZERO optimisation for this CPU, and yet it performs just as well as it does in Windows in apps like Cinebench etc. The 265K in Logic Pro beats every Apple mobile CPU apart from the M5 Pro/Max.

The cores don't, but the 50 years of legacy instructions do. The point that ARL performs as well on macOS is a great example of what I'm saying, because ARL was DESIGNED to work well under different circumstances/platforms, unlike the Apple CPUs. It was not optimized for a certain OS or set of instructions or software; it has to be a "generalist" by nature.

But I do see Apple CPUs not being so fast in Windows, right? I mean, really, this kind of perfectly illustrates my point.
 
ARL was DESIGNED to work well under different circumstances/platforms, unlike the Apple CPUs. It was not optimized for a certain OS or set of instructions or software; it has to be a "generalist" by nature.

But I do see Apple CPUs not being so fast in Windows, right? I mean, really, this kind of perfectly illustrates my point.
No CPU engineer optimises for a specific OS or specific software. You either have a good CPU core or you don't.

As to why Windows doesn’t work: Apple systems have a different boot model and don’t use UEFI. Linux doesn’t need native UEFI to boot; that’s why Apple CPUs also work on bare Linux installs and are just as fast as they are in macOS, without needing any special optimisation.

If you really want to know how Apple CPUs perform in Windows, here’s a test done in a VM against Lunar Lake.
 
No CPU engineer optimises for a specific OS or specific software. You either have a good CPU core or you don't.

As to why Windows doesn’t work: Apple systems have a different boot model and don’t use UEFI. Linux doesn’t need native UEFI to boot; that’s why Apple CPUs also work on bare Linux installs and are just as fast as they are in macOS, without needing any special optimisation.

If you really want to know how Apple CPUs perform in Windows, here’s a test done in a VM against Lunar Lake.
Tough to make sense of that without knowing the frequency. Were the ARM chips running x86 code or native? If native, then that isn't really Windows.
 
Good CPU cores don't care about “vertical integration” or which OS they are running on.
Oh come on. If you have control over the hardware, the operating system, the compiler, AND how apps are written, you can obviously optimize in ways that non-vertical models can't. Let's not get silly here.
The cores don't, but the 50 years of legacy instructions do. The point that ARL performs as well on macOS is a great example of what I'm saying, because ARL was DESIGNED to work well under different circumstances/platforms, unlike the Apple CPUs. It was not optimized for a certain OS or set of instructions or software; it has to be a "generalist" by nature.

But I do see Apple CPUs not being so fast in Windows, right? I mean, really, this kind of perfectly illustrates my point.
You beat me to it 😉.
 
No CPU engineer optimises for a specific OS or specific software. You either have a good CPU core or you don't.

As to why Windows doesn’t work: Apple systems have a different boot model and don’t use UEFI. Linux doesn’t need native UEFI to boot; that’s why Apple CPUs also work on bare Linux installs and are just as fast as they are in macOS, without needing any special optimisation.

If you really want to know how Apple CPUs perform in Windows, here’s a test done in a VM against Lunar Lake.
Linux was built to be architecture-agnostic. ARM doesn't carry 50 years of baggage like x86, so there is less translation, among other things.

Anyway, I'm still not convinced my original point is incorrect, so let me get back to it to make sure we're not talking past one another, as often happens in these discussions.

Apple processors do better in terms of performance and efficiency compared to x86 for the following reasons:

No legacy baggage from 50 years of "add ons" to the ISA that must be accounted for.
Tight hardware-software integration that can only be achieved when you have control over the entire ecosystem.
Very wide modern design (yes I give you that!)
Unified memory architecture more so than x86

Furthermore I believe that if Apple decided to make an x86 CPU it would simply be competitive with AMD/Intel, that's it. This is because they wouldn't have the advantages above.

This discussion is one of degrees. You favor architecture being the primary reason for Apple's performance and efficiency; I believe they have a great design bolstered by the above points. Intel and AMD also have great designs, but NOT bolstered by the above points. I believe we are going to have to agree to disagree until Apple makes an x86 part and proves one of us wrong.
 
Oh come on. If you have control over the hardware, the operating system, the compiler, AND how apps are written, you can obviously optimize in ways that non-vertical models can't. Let's not get silly here.
AMD and Intel also have control over an OS, it’s called Linux, and there are open-source compilers like GCC and LLVM.

I wasn’t aware that Apple wrote code and optimised for third party applications on macOS.

Do you know who optimises certain apps now? Intel does, with iBOT.

Apple processors do better in terms of performance and efficiency compared to x86 for the following reasons:

No legacy baggage from 50 years of "add ons" to the ISA that must be accounted for.
Tight hardware-software integration that can only be achieved when you have control over the entire ecosystem.
Very wide modern design (yes I give you that!)
Unified memory architecture more so than x86
Just wait for Zen 6; the “baggage” won’t stop AMD from designing a fast core. x86 doesn’t have any limitations that prevent it from surpassing Apple in the applications most people use.

As for the last three points, Qualcomm CPUs don’t have those “advantages” but are still able to surpass mobile x86 CPUs.
 
No legacy baggage from 50 years of "add ons" to the ISA that must be accounted for.
Tight hardware-software integration that can only be achieved when you have control over the entire ecosystem.
Very wide modern design (yes I give you that!)
Unified memory architecture more so than x86
Meaningless buzzwords.
Apple has the fattest cores at a very nice fmax, and they have the rest of the SoC to match, too.
That's how they win.
 
I don't see the scores as particularly tainted. Geekbench is such a mess anyway (the ST Mk.2, a.k.a. MT, score for example) and has huge variance.
How you and I view GB is of little importance here; what matters is how Primate Labs views and markets their product. If a CPU starts getting higher scores while performance in apps stays the same, then GB is in trouble. It could also set a nasty precedent for anyone who decides to tamper with the benchmark code. Primate Labs just marked all ARL and PTL CPUs supported by iBOT as invalid in their GB6 database.

I'm just not totally sure if it is viable w.r.t. the risk of miscompilation (transcompilation?) bugs. It kinda sucks if you can get your game or program 8% faster BUT you don't know whether it's working properly.
Intel says that iBOT currently requires extensive validation, taking a little over a quarter. They claim to be working on optimization for content-creation software.
 
Linux was built to be architecture-agnostic. ARM doesn't carry 50 years of baggage like x86, so there is less translation, among other things.
The Arm architecture is about 40 years old. Granted, it was vastly improved when they defined the 64-bit architecture, and they then gradually removed 32-bit support to concentrate on the more modern ISA, which is now about 15 years old.

The 50 years of x86 baggage is utterly stupid and just useless weight. What's the point of supporting the 16-bit ISA in hardware now? The last time I wanted to play a 30-year-old 32-bit game, it was *much* easier to install an emulator than to play it natively. I bet most old software people use will run fine on any emulator at good-enough speed and with good integration with the host OS.

Static and dynamic recompilation have evolved to the point where such transitions can be successfully made (even Google has done it on Android; it's not only Apple). Intel is gradually moving in that direction with iBOT and APX; let's see how far they go.
 
You can do PGO for consoles, as every console is the same, but PGO benefits for PCs are of limited use.
PGO is fantastic for any software that runs heavy stuff: it helps the compiler decide how to inline better, and that can make a huge difference versus a function call.
 
The 50 years of x86 baggage is utterly stupid and just useless weight. What's the point of supporting the 16-bit ISA in hardware now?
That support takes very few transistors. Keeping it for the sake of 100% backwards compatibility is what made x86 successful; once you start cutting "old stuff" it's a slippery slope and will result in fragmentation. Basically, neither Intel nor AMD will do such madness.

APX is a better way forward, probably the only game in town; plus the FRED stuff will be useful, and it's agreed on now.
 