Question Zen 6 Speculation Thread


adroc_thurston

Diamond Member
Jul 2, 2023
8,287
11,067
106
Interesting that the newer Intel CPUs (MTL, LNL, PTL, ARL) all have either same or lower clock speeds compared to years old ADL, RPL. While moving to better process nodes.
Way different design methodologies.
They don't have any excuse for being slower on N3 though.
It's possible, but it would be historically bad.
Needing 1.21 gigavolts to hit 5.7 on N3 is also historically bad.
 

inquiss

Senior member
Oct 13, 2010
621
883
136
Interesting that the newer Intel CPUs (MTL, LNL, PTL, ARL) all have either same or lower clock speeds compared to years old ADL, RPL. While moving to better process nodes.
Almost like they haven't yet got to grips with eking out the best of external nodes because they're having to learn what other companies already know. Maybe they'll get better. Need to learn to be fast, fast though. AMD knows how to do this very very well.
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,287
11,067
106
Almost like they haven't yet got to grips with eking out the best of external nodes because they're having to learn what other companies already know. Maybe they'll get better. Need to learn to be fast, fast though. AMD knows how to do this very very well.
everything pre-UC is garbage.
UC is a do or die moment, they gotta fix their logic/phys-des by UC or they're toast.
 

Doug S

Diamond Member
Feb 8, 2020
3,800
6,734
136
Caching might drive performance with RAM at such ridiculous prices. Of course every reviewer has buttloads of RAM rather than an accurate reflection of Joe Consumer.

Honestly that's kind of fair though. The people who are reading reviews are the enthusiasts who are not going to skimp on RAM. The people buying the 8 GB entry level laptops are not looking at reviews, they are looking at prices.
 

OneEng2

Senior member
Sep 19, 2022
975
1,186
106
Pentium 4 to Core / Core 2?
Good call ;).

I personally don't feel that ARL architecture is as critically flawed as the entire design philosophy of Netburst though.

Still, your general assertion is correct. It is possible that NVL does more with less, and that this is a good thing all the way around. That is certainly how Core/Core 2 worked out.
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,287
11,067
106
Still, your general assertion is correct. It is possible that NVL does more with less, and that this is a good thing all the way around. That is certainly how Core/Core 2 worked out.
It's not about what they do, it's what they don't.
Intel is just incompetent at extracting speed from anything that looks like a normal foundry PDK.
 

Geddagod

Golden Member
Dec 28, 2021
1,663
1,695
136
everything pre-UC is garbage.
UC is a do or die moment, they gotta fix their logic/phys-des by UC or they're toast.
LNC is kinda smol though (on the logic front)
Needing 1.21 gigavolts to hit 5.7 on N3 is also historically bad.
Maybe they deserve a bit of leeway given that's the first core with the new phys des, n3b was apparently kinda scuffed, etc etc
PTC vs Zen 6 would be telling tho
Both are quite important. For client, it is ST boost clock that disproportionately affects the performance.
frfr
Interesting that the newer Intel CPUs (MTL, LNL, PTL, ARL) all have either same or lower clock speeds compared to years old ADL, RPL. While moving to better process nodes.
Intel 7 ultra the goat
Obviously, but claiming that Nova Lake gets zero clockspeed increase when it's moving from N3B to N2 is tantamount to saying that Coyote Cove clocks 10% worse than Lion Cove, iso-node. It's possible, but it would be historically bad.
gonk
Well, it’s a new core and design compared to ARL. There’s no reason that it can’t clock the same (or even lower) on a better node. Presumably, the implication is that Intel messed up the physical design or what have you. No idea whether the 5.7GHz figure is accurate or not, to be clear.
My new hopium is that the core is way more power efficient and is very small, area wise.
They're already paying for N3 to have the same freq as AMD at more voltage.
Honestly the results are so bad I'm unsure if AMD is reporting their figures correctly or if AMD's power rails are somehow inherently better T-T
BPU no (already class-leading by a country mile),
All of Zen 5's front end latency weirdness seems to be caused by the BPU.
 

Geddagod

Golden Member
Dec 28, 2021
1,663
1,695
136
Haven't seen this talked about, but Zen 5's branch mispredict penalty appears to be way higher than LNC's. What's up with these test results?:
[Attached images: 1769189854852.png, 1769189880046.png]
And if it's true, is Zen 5 pipelined way more than LNC? Would it make sense for Intel's next gen core to clock a bit lower than AMD's one then if it just has fewer stages?
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,287
11,067
106
LNC is kinda smol though (on the logic front)
no perf.
Maybe they deserve a bit of leeway given that's the first core with the new phys des, n3b was apparently kinda scuffed, etc etc
no.
Honestly the results are so bad I'm unsure if AMD is reporting their figures correctly or if AMD's power rails are somehow inherently better T-T
uh.
All of Zen 5's front end latency weirdness seems to be caused by the BPU.
no.
Haven't seen this talked about, but Zen 5's branch mispredict penalty appears to be way higher than LNC's
yeah man pipeline flushes hurt when your L1's are tiny.
And if it's true, is Zen 5 pipelined way more than LNC? Would it make sense for Intel's next gen core to clock a bit lower than AMD's one then if it just has fewer stages?
you should read the chart again.
 

StefanR5R

Elite Member
Dec 10, 2016
6,834
10,960
136
by the time all cores are loaded there is a 500MHz gap.
Depends a lot on 1. workload (algorithm and dataset) and 2. power limit.
Furthermore, bare clock speed is not a reliable performance metric especially when more cores get utilized. Cores can clock very high if most of their cycles are spent waiting for main memory accesses instead of actually computing something.
 

Geddagod

Golden Member
Dec 28, 2021
1,663
1,695
136
Tons of changes, also first time they moved to 2-2. I think they deserve to catch a bit of a break.
Mostly cuz ARL-H core v/f curve looks so bad, but when you look at strix point vs arl-h platform power tests they look close, if not Intel outright leading.
AFAIK no one has tested ARL-H package power readings yet vs AMD, but unless strix point uncore power sucks vs arl-h, it's very hard to explain the differences.
certainly seems that way
yeah man pipeline flushes hurt when your L1's are tiny.
How does this relate?
you should read the chart again.
Similar clocked products, much higher latency in ns for AMD's core?
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,287
11,067
106
Tons of changes, also first time they moved to 2-2. I think they deserve to catch a bit of a break.
no.
Mostly cuz ARL-H core v/f curve looks so bad, but when you look at strix point vs arl-h platform power tests they look close, if not Intel outright leading.
N3 has Cac reduction vs N4.
certainly seems that way
read the chart again.
How does this relate?
Pipeline flushes mean L2 fetches with tiny L1's.
Similar clocked products, much higher latency in ns for AMD's core?
Yes?
AMD front-end latency is down to having a miserably small 32K L1i (with a much nicer core otherwise).
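For anyone comparing the two charts: since the parts are clocked similarly, a latency in ns maps to core cycles as cycles = ns × GHz. A minimal sketch of that conversion follows; the function names and the example values are placeholders for illustration, not numbers read off the attached charts.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles."""
    # 1 GHz = 1 cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz


# Placeholder values (assumed, not measured): a 3.0 ns penalty at 5.0 GHz.
print(ns_to_cycles(3.0, 5.0))   # cycles
print(cycles_to_ns(15, 5.0))    # ns
```

At equal clocks, a larger ns figure is simply a larger cycle count, whatever the cause turns out to be.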
 

Hulk

Diamond Member
Oct 9, 1999
5,337
4,030
136
Depends a lot on 1. workload (algorithm and dataset) and 2. power limit.
Furthermore, bare clock speed is not a reliable performance metric especially when more cores get utilized. Cores can clock very high if most of their cycles are spent waiting for main memory accesses instead of actually computing something.
Absolutely, 100% true. Those TechPowerUp tests only show how high the chip will go under default BIOS settings, assuming you have adequate power and cooling.

It's just one data set to be evaluated while keeping what you posted in mind.

This is why, as I have been posting, I'm very interested in how these high-core-count parts on the horizon will clock under MT loads.

For example, drawing 230W, my 9950X hits the following all-core clocks (MHz):
CB R26 - 5270
CB R23 - 5030
OCCT cpu stress steady - 4670

All results are below TechPowerUp's 5336 MHz "light load" best-case scenario. Of course you CAN get there if you have adequate cooling and power. Or higher if you want to go PBO.
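To put the gap in numbers, here is a small sketch that computes how far each of the all-core clocks listed above sits below the TechPowerUp figure. The clock values are the ones from this post, treated as MHz; the helper name `shortfall` is just made up for the example.

```python
TPU_LIGHT_LOAD_MHZ = 5336  # TechPowerUp "light load" figure quoted above

# All-core clocks at 230W from the post above, assumed to be in MHz.
all_core_mhz = {
    "CB R26": 5270,
    "CB R23": 5030,
    "OCCT cpu stress steady": 4670,
}


def shortfall(clock_mhz, reference_mhz=TPU_LIGHT_LOAD_MHZ):
    """Return (MHz below reference, percent below reference)."""
    delta = reference_mhz - clock_mhz
    return delta, 100.0 * delta / reference_mhz


for load, mhz in all_core_mhz.items():
    delta, pct = shortfall(mhz)
    print(f"{load}: {mhz} MHz, {delta} MHz ({pct:.1f}%) below {TPU_LIGHT_LOAD_MHZ} MHz")
```

The same helper works for any power cap: swap in the clocks measured at 200W or another limit and compare against the same reference.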

I am interested in what happens with MT clocks in various loads when you cap Zen 6/Nova Lake at 200W or 230W, etc?
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,547
727
126
Why? It doesn't make ST perf lower. It only lowers stupid metrics no one cares about, like Cinebench points per thread.
Point is Zen(5/6) SMT results in Cinememe thread count spam.

We're talking about max MT perf, so perf/thread and thread count is what matters, since both Zen6 and NVL-S will have 48T. And a Zen6 SMT thread will have lower perf than even an NVL-S E-Core thread, and far lower perf than a P-Core thread. This is due to Zen6 using SMT, while NVL-S has one T per C.
 

gdansk

Diamond Member
Feb 8, 2011
4,734
8,019
136
We're talking about max MT perf, so perf/thread and thread count is what matters
Perf per thread does not matter for Cinebench or any other embarrassingly parallel workload.
Consider: if AMD adds SMT4 (lol) suddenly their perf *per thread* drops precipitously even though total throughput would likely increase. It's a useless measurement here. There is no reason to care about it for a test like Cinebench.
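A quick numeric sketch of that point. The core count, per-core scores and SMT yields below are made-up assumptions for illustration, not measurements or specs of any real CPU.

```python
def throughput_summary(cores, threads_per_core, core_throughput):
    """Return (total_threads, total_throughput, perf_per_thread).

    core_throughput is the hypothetical throughput of ONE core with all of
    its hardware threads busy, in arbitrary units.
    """
    total_threads = cores * threads_per_core
    total_throughput = cores * core_throughput
    perf_per_thread = total_throughput / total_threads
    return total_threads, total_throughput, perf_per_thread


# Assumed SMT scaling: one core scores 100 with 1 thread,
# 125 with 2 threads, and 135 with 4 threads.
configs = {
    "no SMT": (1, 100.0),
    "SMT2": (2, 125.0),
    "SMT4": (4, 135.0),
}

for name, (threads_per_core, core_tp) in configs.items():
    threads, total, per_thread = throughput_summary(24, threads_per_core, core_tp)
    print(f"{name}: {threads} threads, total {total:.0f}, per thread {per_thread:.2f}")
```

Under these assumptions, going from SMT2 to SMT4 raises total throughput while perf per thread falls sharply, which is why the per-thread figure says little about an embarrassingly parallel test.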
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,547
727
126
Perf per thread does not matter for Cinebench or any other embarrassingly parallel workload.
Consider: if AMD adds SMT4 (lol) suddenly their perf *per thread* drops precipitously even though total throughput would likely increase. It's a useless measurement here. There is no reason to care about it for a test like Cinebench.
As I mentioned, it’s perf/thread AND thread count. Thread count is 48T in both cases though, so that leaves perf/thread for comparison.