Question Zen 6 Speculation Thread


Fjodor2001

Diamond Member
Feb 6, 2010
4,560
727
126
Thing is that some continue to talk about core count spam, implying that many cores with low perf/core is bad. But actually we should talk about Cinememe thread count spam, with many threads with low perf/thread. Because in the end, it’s threads that are executing, not cores.
 

gdansk

Diamond Member
Feb 8, 2011
4,742
8,027
136
Because in the end, it’s threads that are executing, not cores.
No, actually I'm pretty sure front ends do not execute an instruction (they can, however, fold and eliminate instructions).

Moreover, it's almost deliberately misleading to focus on perf per thread. It obscures core performance in heterogeneous configurations. In the M5, for example, there is nothing working at the rate of "perf per thread". There are 4 cores working at some higher rate and 6 cores working at some lower rate (more or less).
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,560
727
126
No, actually I'm pretty sure front ends do not execute an instruction (they can, however, fold and eliminate instructions)
Not sure what your point is.
Moreover, it's almost deliberately misleading to focus on perf per thread. It obscures core performance in heterogeneous configurations. In the M5, for example, there is nothing working at the rate of "perf per thread". There are 4 cores working at some higher rate and 6 cores working at some lower rate (more or less).
Zen is not heterogeneous, so N/A.

But otherwise you’d have to consider average perf/thread instead. And then what I mentioned is still valid.

And w.r.t. Zen6 vs NVL-S, even an NVL-S E-Core will have higher perf/thread than a Zen6 SMT thread.
 

gdansk

Diamond Member
Feb 8, 2011
4,742
8,027
136
Not sure what your point is.
Threads don't execute anything. They are simply streams of (decoded) instructions. The only valid thing to care about for rendering is total throughput. Any uniform divisor is nearly useless. Perf per core is misleading in heterogeneous designs and perf per thread is misleading for SMT designs.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,560
727
126
Threads don't execute anything. They are simply streams of (decoded) instructions. The only valid thing to care about for rendering is total throughput. Any divisor is nearly useless. Perf per core is misleading in heterogeneous designs and perf per thread is misleading for SMT designs.
See my previous post. You’re trying to overcomplicate the issue to diverge from the main point.

But go with this instead then:

MT throughput = Average perf per thread * Thread count
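
A minimal Python sketch of the identity above. The thread count and per-thread rates are made-up illustration values, not measurements of any real CPU. It shows that the average per-thread figure is simply total throughput divided by thread count, so multiplying it back by the thread count returns the total by construction.

```python
def mt_throughput(avg_perf_per_thread: float, thread_count: int) -> float:
    """MT throughput = average perf per thread * thread count."""
    return avg_perf_per_thread * thread_count


# Hypothetical per-thread rates (arbitrary units) for a 48-thread part.
per_thread_rates = [0.65] * 48

# The average is derived from the total, so the product recovers the total.
avg_perf = sum(per_thread_rates) / len(per_thread_rates)
total = mt_throughput(avg_perf, len(per_thread_rates))

print(f"average perf/thread: {avg_perf:.2f}")
print(f"MT throughput: {total:.1f}")
```

The formula holds for any set of rates, which is also why it says nothing about how the individual threads or cores actually perform.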
 

inquiss

Senior member
Oct 13, 2010
625
884
136
No, that's a bad proxy.
Total MT throughput in Cinebench = sum over performance levels of (perf per core * number of cores at that performance level)

Really, I'm not sure why you would want to bring perf per thread into the discussion at any point. It's needlessly inaccurate.
Sometimes a thread will be happily meandering along on topic and then *pow* a 48T fairy appears and it's hard to know why or if anyone wished the fairy to change the topic at all. But it happens.
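
The per-level sum described in this post can be sketched in Python as follows. The 4 + 6 core split is the M5 example mentioned earlier in the thread; the rates (1.0 and 0.4) and the `CoreClass` type are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CoreClass:
    name: str
    perf_per_core: float  # hypothetical rate, arbitrary units
    count: int


def total_mt_throughput(classes: list[CoreClass]) -> float:
    """Sum perf per core * core count over every performance level."""
    return sum(c.perf_per_core * c.count for c in classes)


# 4 faster cores and 6 slower cores; the rates are made up.
config = [CoreClass("fast", 1.0, 4), CoreClass("slow", 0.4, 6)]

total = total_mt_throughput(config)
core_count = sum(c.count for c in config)
uniform_average = total / core_count

print(f"total MT throughput: {total:.2f}")
print(f"uniform per-core average: {uniform_average:.2f}")
```

With these assumed numbers the uniform average lands between the two real rates, so no core in the configuration actually runs at it.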
 

inquiss

Senior member
Oct 13, 2010
625
884
136
Not sure what your point is.

Zen is not heterogeneous, so N/A.

But otherwise you’d have to consider average perf/thread instead. And then what I mentioned is still valid.

And w.r.t. Zen6 vs NVL-S, even an NVL-S E-Core will have higher perf/thread than a Zen6 SMT thread.
No, it won't. You keep stating this as fact, based on your assumption that "real cores" are faster than threads, but it really depends on the workload and the cores in question.
 

AMDK11

Senior member
Jul 15, 2019
494
435
136
SMT thread performance is irrelevant if both threads together make more efficient use of a single core's resources at the cost of 5% complexity.

SMT Zen5 in CB26 gives +37%.

And 20%+ vs ST LionCove.
 

AMDK11

Senior member
Jul 15, 2019
494
435
136
It matters, especially since Intel's 48T are physical cores that occupy physical space in the silicon as full-fledged cores. A 48T Zen has 24 physical cores and only a 5% increase in complexity for SMT.

As I mentioned, SMT is designed to fully utilize core resources where a single thread finds that difficult or impossible.
 

AMDK11

Senior member
Jul 15, 2019
494
435
136
The performance of the SMT thread does not matter if both threads provide significantly higher performance for a single core.

And that's precisely what SMT is all about. It's not just about the performance of the thread itself, but the sum of both effects on the resource of the entire core.

If SMT gives, in certain conditions (I don't mean in every case), 20-40+% compared to ST at a complexity cost of 5%, then SMT still gives a big gain, especially since it allows performance to increase within the same core.
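
A rough Python sketch of that trade-off. It assumes the 5% "complexity" cost can be treated as a 5% increase in core area, which is an interpretation and not something stated in the post; the function name is hypothetical.

```python
def throughput_per_area_ratio(smt_gain: float, smt_cost: float) -> float:
    """Throughput per unit area of an SMT core relative to the same core without SMT.

    smt_gain: fractional throughput gain from SMT (0.20 means +20%).
    smt_cost: fractional area cost of SMT (assumed stand-in for "complexity").
    """
    return (1.0 + smt_gain) / (1.0 + smt_cost)


# The 20% and 40% gains and the 5% cost are the figures quoted in the post.
for gain in (0.20, 0.40):
    ratio = throughput_per_area_ratio(gain, 0.05)
    print(f"SMT gain {gain:.0%}: throughput per area x{ratio:.3f}")
```

Under that assumption, any SMT gain above the 5% cost leaves the ratio above 1, meaning the core delivers more throughput per unit of area with SMT than without.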
 

Geddagod

Golden Member
Dec 28, 2021
1,667
1,696
136
N3 has Cac reduction vs N4.
Doesn't seem all that large tbh
1769212529078.png
But also, how do improvements in active power help them reduce the idle/SoC power usage that would make up the much lower reported core V/F curve?
read the chart again.
I'm referring to Huang's performance profiling for the BPU comments, not the mispredict penalty graph.
Pipeline flushes mean L2 fetches with tiny L1s.
I don't even think this graph would show that, for 3 reasons:
1. I don't think the array length or branch count are high enough to necessitate a fetch from the L2.
2. Even if they were, that fetch would be required in both the predictable and random cases, meaning that it should cancel out of the difference.
3. If it didn't, missing the L1i and requiring an L2 hit would cause a latency penalty in cycles almost as large as the mispredict penalty already is. It doesn't make any sense; the graph can't be showing that.
Yes?
AMD front-end latency is down to having a miserably small 32K L1i (with a much nicer core otherwise).
That would be reflected in a graph like this:
1769217172948.png
Or this:
1769217195679.png
But not on the graph I posted.
Doesn't seem likely when NVL-S has 75% higher TDP than OR
Way more cores though.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,560
727
126
The performance of the SMT thread does not matter if both threads provide significantly higher performance for a single core.

And that's precisely what SMT is all about. It's not just about the performance of the thread itself, but the sum of both effects on the resource of the entire core.

If SMT gives, in certain conditions (I don't mean in every case), 20-40+% compared to ST at a complexity cost of 5%, then SMT still gives a big gain, especially since it allows performance to increase within the same core.
Agreed in principle. But to get the total MT perf you also have to multiply by core count, which differs per CPU. I.e. something like this:

Total MT perf = Core count * Perf/core without SMT * SMT boost (if applicable)

So e.g. if core count is 2x for CPU A vs B, then a 20-40% SMT boost for B will not be sufficient to counter that, all else equal.
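
The formula and the 2x example can be sketched in Python as below. The core counts (48 vs 24) and the equal perf/core of 1.0 are hypothetical stand-ins for "all else equal", not figures for any specific CPU.

```python
def total_mt_perf(core_count: int, perf_per_core: float, smt_boost: float = 1.0) -> float:
    """Total MT perf = core count * perf/core without SMT * SMT boost."""
    return core_count * perf_per_core * smt_boost


# CPU A: 2x the cores, no SMT. CPU B: half the cores, SMT boost of 20% or 40%.
cpu_a = total_mt_perf(48, 1.0)
cpu_b_low = total_mt_perf(24, 1.0, smt_boost=1.20)
cpu_b_high = total_mt_perf(24, 1.0, smt_boost=1.40)

print(f"CPU A (48 cores, no SMT): {cpu_a:.1f}")
print(f"CPU B (24 cores, +20% SMT): {cpu_b_low:.1f}")
print(f"CPU B (24 cores, +40% SMT): {cpu_b_high:.1f}")

# Perf/core advantage B would need to match A even at the +40% SMT boost.
breakeven_perf_per_core = cpu_a / (24 * 1.40)
print(f"break-even perf/core for B: {breakeven_perf_per_core:.2f}x")
```

With equal perf/core, B stays behind A at either boost level; it only catches up if its perf/core without SMT is also higher, which is exactly the term the "all else equal" condition holds fixed.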
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,560
727
126
Regarding release dates, are we expecting the X3D Zen6 SKUs to be available already in 2026H2? Or, as with Zen5, only non-X3D in the first wave, then X3D in e.g. 2027Q3, and X3D2 some time after that?