Zen 6 Speculation Thread


Josh128

Golden Member
Oct 14, 2022
1,060
1,607
106
How reliable is that message, considering the rumored H2 2026 release?
It's super early, given the presumed late-summer/early-fall 2026 release timeframe. That's what makes me the most skeptical about it. If there are already desktop ES chips floating around pushing scores like that, AMD would have a ton of time to optimize microcode, produce new steppings, and hone the boost algorithms. The 9950X released August 15, and we didn't start seeing actual GB or CPU-Z scores until around, what, May/June?
 

511

Platinum Member
Jul 12, 2024
2,937
2,927
106
Nope. It uses SSE2NEON, which is a translator, and that's problematic for ARM CPUs. Ever wondered why Apple's chips do poorly in this subtest versus native Blender, where they beat Intel's equivalents? This subtest doesn't reflect real applications because of outdated code.

AMD also beats Intel in Blender but loses in Cinebench R23.

GB6 probably uses an old build or some very badly optimised build of Intel Embree.
The open Blender benchmark shows that GB6's ray tracer test isn't indicative of real-world performance.

In fact, it shows how far behind Intel is in FP when it can't feed more watts to its cores.

View attachment 127420
Blender uses Embree as well, but as you said, we don't know the exact version they are using for Geekbench. Skymont's FP is also lagging when I looked at CnC; it needs a boost.
 
  • Like
Reactions: Io Magnesso

MS_AT

Senior member
Jul 15, 2024
742
1,500
96
AVX-512 screwing around is fine though
Enabling AVX-512 has benefits for the scalar FP code-gen path too; I don't think this holds for SME. While the benefits are situational and may well not apply in Geekbench, they are there: compiling for AVX-512 gives the compiler access to 32 architectural FP registers even when running scalar code, which means fewer spills to the stack. This levels the field against ARM, which also has 32 architectural registers for the FP paths (both scalar and SIMD).

This subtest doesn’t reflect real applications because of outdated code.
Geekbench is using outdated compilers anyway. I wonder if they will update for the APX rollout, or pretend it doesn't exist.

Nope. It uses SSE2NEON which is a translator. It is problematic for ARM CPUs.
Somebody would need to rewrite the code to wrap the translated intrinsics in something more high-level, like Google's Highway, but I guess using SSE2NEON is a fast way of getting it to compile rather than optimizing for perf. If ARM cared, they would contribute the resources; I wonder if they maintain their own alternative to Embree.
 

511

Platinum Member
Jul 12, 2024
2,937
2,927
106
Enabling AVX-512 has benefits for the scalar FP code-gen path too; I don't think this holds for SME. While the benefits are situational and may well not apply in Geekbench, they are there: compiling for AVX-512 gives the compiler access to 32 architectural FP registers even when running scalar code, which means fewer spills to the stack. This levels the field against ARM, which also has 32 architectural registers for the FP paths (both scalar and SIMD).
Yeah, kind of funny that x86-64 has been stuck at 16 architectural registers since its introduction and they haven't bothered to increase the count until now.
Geekbench is using outdated compilers anyway. I wonder if they will update for the APX rollout, or pretend it doesn't exist.
Don't know about Geekbench, but we will get a new Cinebench version, lol, if their release cycle continues.
Somebody would need to rewrite the code to wrap the translated intrinsics in something more high-level, like Google's Highway, but I guess using SSE2NEON is a fast way of getting it to compile rather than optimizing for perf. If ARM cared, they would contribute the resources; I wonder if they maintain their own alternative to Embree.
 
  • Like
Reactions: Io Magnesso

poke01

Diamond Member
Mar 8, 2022
3,769
5,103
106
Somebody would need to rewrite the code to wrap the translated intrinsics in something more high-level, like Google's Highway, but I guess using SSE2NEON is a fast way of getting it to compile rather than optimizing for perf. If ARM cared, they would contribute the resources; I wonder if they maintain their own alternative to Embree.
It just depends on how the devs implement Intel Embree. Blender uses SIMD everywhere and is a much better implementation than the SSE2NEON in RenderKit.
 
  • Like
Reactions: Io Magnesso

Win2012R2

Senior member
Dec 5, 2024
981
980
96
Yeah, kind of funny that x86-64 has been stuck at 16 architectural registers since its introduction and they haven't bothered to increase the count until now.
1) Large-scale (relative to what's architecturally available) register renaming helped mitigate the problem.
2) Breaking backwards compatibility is a last resort; a fair few other things were done first to gain more headroom.

I hope AMD gets APX ASAP, but I very much doubt it's in Zen 6; it seems an impossibility, really.
 
  • Like
Reactions: Io Magnesso

poke01

Diamond Member
Mar 8, 2022
3,769
5,103
106
Enabling AVX-512 has benefits for the scalar FP code-gen path too; I don't think this holds for SME. While the benefits are situational and may well not apply in Geekbench, they are there: compiling for AVX-512 gives the compiler access to 32 architectural FP registers even when running scalar code, which means fewer spills to the stack. This levels the field against ARM, which also has 32 architectural registers for the FP paths (both scalar and SIMD).
AVX-512 should be compared with SVE2, and it is much more useful than SME. SME is an AI extension, and Primate Labs is ruining its database with it. They need to separate those results or add a toggle for cores that share accelerators.
 

Win2012R2

Senior member
Dec 5, 2024
981
980
96
APX is already supported by LLVM
Adoption should in theory be much quicker than AVX-512's, since a rewrite for SIMD isn't required, but the jury is out on what real benefit APX will bring. Intel's claims are pretty modest; is the extra complexity in a core worth it when you can just add another core?
 

MS_AT

Senior member
Jul 15, 2024
742
1,500
96
Large-scale (relative to what's architecturally available) register renaming helped mitigate the problem.
It does not mitigate the problem I outlined. You can have a 1000-entry register file, but the compiler can operate only on the architectural registers, so it will spill. The cost of spills can be mitigated by some tricks, but those don't always apply, so the best approach is to minimize spills.
If they even bother? Intel just shut down their Clear Linux distro, so they won't even have quick support for their own new stuff.
What does that have to do with anything? How many users did Clear Linux have compared to Red Hat, Ubuntu, or whatever M$ is rolling in Azure? What matters is that the compilers get the support, and they got it already (this is one thing Intel still does much better than AMD, which has been unable to get a Zen 5 scheduler model merged into mainline LLVM a year after Zen 5's release).
 

Io Magnesso

Senior member
Jun 12, 2025
539
146
71
APX is already supported by LLVM
That's right.
It's normal for software and tools to support new features early, before those features actually ship in hardware; the toolchain has to be ready for use when the new features arrive.
 

Io Magnesso

Senior member
Jun 12, 2025
539
146
71
It does not mitigate the problem I outlined. You can have a 1000-entry register file, but the compiler can operate only on the architectural registers, so it will spill. The cost of spills can be mitigated by some tricks, but those don't always apply, so the best approach is to minimize spills.

What does that have to do with anything? How many users did Clear Linux have compared to Red Hat, Ubuntu, or whatever M$ is rolling in Azure? What matters is that the compilers get the support, and they got it already (this is one thing Intel still does much better than AMD, which has been unable to get a Zen 5 scheduler model merged into mainline LLVM a year after Zen 5's release).
Come to think of it, I don't remember whose article it was, but I've seen people say that Zen 5's scheduling isn't going well.
What will happen with Zen 6…
 

MS_AT

Senior member
Jul 15, 2024
742
1,500
96
Come to think of it, I don't remember whose article it was, but I've seen people say that Zen 5's scheduling isn't going well.
I doubt you have read an article anywhere about the LLVM scheduling model ;) You might be confusing the issue with the dual decoders or the FP scheduler latency problem, which are separate things from what I'm complaining about. Right now in LLVM, Zen 5 is using the Zen 4 scheduling/cost model. Generally it's not like it's tanking performance; the difference between the two would most likely show up only in code gen for some specific corner cases, but it does show that AMD doesn't care. In GCC I think Red Hat engineers provided the corrected model and final tunings, though I might be wrong, and in upstream LLVM the initial enablement for Zen 5 (basically to ensure that people using -march=native don't shoot themselves in the foot) came barely in time, exceptionally landing in a release candidate, and the PR to update the cost/scheduling model has been open since March without any updates from AMD's side. I would be more understanding if their own AOCC weren't a downstream fork of LLVM itself. It's not like they don't know the codebase... And AOCC is not available on Windows, while Clang can be installed directly from within Visual Studio...

GPUs on the other hand seem to get timely updates, ahead of release.

ok, rant over;)
 
  • Like
Reactions: Kryohi and 511

511

Platinum Member
Jul 12, 2024
2,937
2,927
106
Don't worry, I'm sure Lip-Bu Tan will cut out all the unnecessary instructions as a cost-reduction measure. He heard SIMD uses a lot of "cash registers" and thought: we can't have that.
He ain't cutting core businesses like NEX/Auto. As for Clear Linux, any company that wants to maintain it can fork it themselves.
 
  • Like
Reactions: Io Magnesso

Io Magnesso

Senior member
Jun 12, 2025
539
146
71
He ain't cutting core business like NEX/Auto as for clear linux the company that wants to maintain it can fork it themselves.
Well, it's open source, so it can be handed off and someone else can carry it on.
Maybe volunteers will take over.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,243
4,741
136
Let me just clarify that both of those scores are exceedingly high outliers for GB5. That 9950X MT score is in the top 0.01%, and both ST scores are top 0.1% or better.

For reference, I had the top GB6 9900X score for several months at least (haven't checked lately, but I'm sure it's been surpassed). I ran GB5 using the same setup.


GB5
View attachment 127422

GB6
View attachment 127423

Like I said, I did not gather the data myself by hand; I used Bing / Copilot (it's very fast).

I did come across one outlier, where Bing gave me 3499 ST for Arrow Lake. I prompted Copilot about it and it corrected itself: the 3499 was a Geekbench 6 score.