
Question Zen 6 Speculation Thread

No, that's not what this post says. This post shows "TDP scaling (Cinebench 2024 MC)", plotted by Computerbase.

C i n e b e n c h

It's something quite special. Better to not make general statements based on data like that.
  • 3D CPU rendering
  • Video encoding (CPU-heavy)
  • Scientific/M&E multi-thread workloads
  • Compilers
  • Anything that scales perfectly with threads and uses FP math heavily

I'll give you that this isn't the bulk of day-to-day usage for the vast majority of x86 users, but that is another subject altogether.

The point I have been debating is that some still believe Zen 6 24c will somehow beat NVL 52c in these kinds of apps. To me, the math just doesn't make sense. Seems like NVL will win easily in these applications.
 
I think the confidence comes from the fact that the Arctic Wolf E-core will be amazing.


It’s still an E-core; it doesn’t matter if it has AVX10.2. No E-core or small core is ever a substitute for an actual P-core like Zen 6 or Coyote Cove.

So for me it’s 16 P-cores + 32 Cinebench accelerators and 4 LPE cores that don’t belong on desktop.

I’d rather have 24 Zen 6 cores. You’re all getting caught up in the number of cores, but what matters is how good 1T is. That’s it.
 
I think the confidence comes from the fact that the Arctic Wolf E-core will be amazing.
Indeed.
Unfortunately, 288 DKTs on 18A struggle to compete with 192c Z5dense on N3e so back to reality they go.
It’s still an E-core; it doesn’t matter if it has AVX10.2. No E-core or small core is ever a substitute for an actual P-core like Zen 6 or Coyote Cove.

So for me it’s 16 P-cores + 32 Cinebench accelerators and 4 LPE cores that don’t belong on desktop.
Really not the problem.
Atoms are competent all-around cores with good area and horrible power.
They just kinda suck at power-limited nT.
 
1T is all that matters for MT. Ooookay.
Ever heard of https://en.wikipedia.org/wiki/Amdahl's_law ?
Video encoding (CPU-heavy)
It has trouble scaling unless you split the video into parts and encode them in parallel. At least that's what I've read on x265 enthusiast forums.
Compilers
Real build systems hit https://en.wikipedia.org/wiki/Amdahl's_law. If you want proof, go over servethehome.com's Linux compile tests over the years and how the methodology changed. Or better yet, try building Chromium with a varying number of workers yourself.
Anything that scales perfectly with threads and uses FP math heavily
Shouldn't these end up on GPUs?
 
No need to spam the thread with such basics again and again. It’s already been posted and discussed numerous times.

We’re talking about a 48T scenario here. So it’s the same situation for both Zen 6 24C/48T and NVL-S 48C/48T.

In PCs (Personal Computers), gains from additional threads at higher thread counts have flatlined, meaning you are getting practically no gains, and the 2nd CCD (for both Zen 6 and NVL) delivers almost nothing.

I am not sure why you are making this argument in the mostly irrelevant (flat) part of the Amdahl curve, when you could instead make an argument comparing 1-CCD Zen 6 and NVL, where in both cases additional threads may still deliver a performance increment that is > 0.

What's the obsession with 48+ threads?
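The flattening being argued about follows directly from Amdahl's law, and it can be made concrete with a few lines of code. This is a minimal sketch: the parallel fraction p = 0.95 is an assumption picked for illustration, not a measured property of any workload or CPU. Note that since both chips in question would be running 48 threads, n is the same for both and they sit at the same point on the curve.

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
# parallel fraction of the workload and n is the thread count.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Hypothetical p = 0.95; real workloads vary widely. Each doubling of
# threads buys less than the previous one as the serial fraction dominates.
for n in (12, 24, 48, 96):
    print(f"{n:3d} threads -> {amdahl_speedup(0.95, n):5.2f}x")
```

Whether the 24-to-48-thread step lands on the "flat" part of the curve depends entirely on p, which is exactly why per-workload data matters more than the core count.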
 
It’s already been posted and discussed numerous times.
And so many times you failed to understand why this is a problem😉
We’re talking about a 48T scenario here. So it’s the same situation for both Zen 6 24C/48T and NVL-S 48C/48T.
It's not. When Amdahl's law hits you, what you basically care about is that the longest subtask everyone else is waiting for runs on the fastest core. That's easier to achieve with homogeneous cores. For example, most build systems are not aware of hybrid CPUs and trust that the OS and Thread Director will do a good job. Spoiler alert: they sometimes fail miserably. People at work explicitly disable E-cores so their compiles go faster (keep in mind this is dev work, which is different from CI work). Disclaimer: we don't have Arrow Lake to test with, only Raptor and Meteor Lake. But Windows still has problems handling this so many years after Alder Lake.
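The critical-path point can be shown with a toy calculation. All numbers below are made up for illustration (a 0.6x relative E-core throughput and one long link/compile step dominating the build); nothing here is a measurement of any real CPU or scheduler.

```python
# Toy sketch: wall time when the one long job in a build lands on a fast
# P-core vs. a slow E-core. Speeds and job sizes are assumptions.
def finish_time(job_work, core_speed):
    return job_work / core_speed

jobs = [10.0, 2.0, 2.0, 2.0]   # units of work; jobs[0] is the long critical job
p_core, e_core = 1.0, 0.6      # assumed relative throughput

# Long job on the P-core: it dominates, the short jobs fit around it.
good = max(finish_time(jobs[0], p_core),
           max(finish_time(j, e_core) for j in jobs[1:]))

# Long job mis-scheduled onto an E-core: everything else finishes and waits.
bad = max(finish_time(jobs[0], e_core),
          max(finish_time(j, p_core) for j in jobs[1:]))

print(f"long job on P-core: {good:.1f}, on E-core: {bad:.1f}")
```

The makespan is set by the slowest placement of the longest job, which is why a scheduler mistake on a hybrid part costs real wall time even when total throughput looks fine on paper.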
There's a ton of usecases for CPU SIMD.
I don't deny it. But the conditions he specified are perfect for GPUs. I mean, if something scales perfectly with increasing thread count, it's probably an embarrassingly parallel workload with little communication between workers, and since it's pure number crunching, there's little need for branch prediction.
 
  • 3D CPU rendering
If we pick just this one as an example MT application area and look across different renderers (and maybe different test scenes on top), we observe that Cinebench does not give the full picture of perf/W for CPU vendor x vs. CPU vendor y. Not at all.
 
Indeed.
Unfortunately, 288 DKTs on 18A struggle to compete with 192c Z5dense on N3e so back to reality they go.

The worst part about Clearwater Forest is that it isn't even available yet, while Turin-dense has been on the market for a while now.

In the meantime, Lisa Su's Thanksgiving turkey size seems to be scaling proportionally with AMD profits.
She hasn't been showing off huge rings recently, so maybe she has to compete with JHH's leather jackets using roast turkeys.
 
In PCs (Personal Computers), gains from additional threads at higher thread counts have flatlined, meaning you are getting practically no gains, and the 2nd CCD (for both Zen 6 and NVL) delivers almost nothing.

I am not sure why you are making this argument in the mostly irrelevant (flat) part of the Amdahl curve, when you could instead make an argument comparing 1-CCD Zen 6 and NVL, where in both cases additional threads may still deliver a performance increment that is > 0.

What's the obsession with 48+ threads?
The context was OneEng2’s statement that NVL-S 48C/T is likely to win over Zen 6 24C/48T in 48T MT scenarios. Both CPUs will be executing 48T and are thus at the same point on Amdahl’s curve, so throwing that curve into the discussion adds nothing for that scenario.
 
And so many times you failed to understand why this is a problem😉

It's not. When Amdahl's law hits you, what you basically care about is that the longest subtask everyone else is waiting for runs on the fastest core. That's easier to achieve with homogeneous cores. For example, most build systems are not aware of hybrid CPUs and trust that the OS and Thread Director will do a good job. Spoiler alert: they sometimes fail miserably. People at work explicitly disable E-cores so their compiles go faster (keep in mind this is dev work, which is different from CI work). Disclaimer: we don't have Arrow Lake to test with, only Raptor and Meteor Lake. But Windows still has problems handling this so many years after Alder Lake.

I don't deny it. But the conditions he specified are perfect for GPUs. I mean, if something scales perfectly with increasing thread count, it's probably an embarrassingly parallel workload with little communication between workers, and since it's pure number crunching, there's little need for branch prediction.
See my previous post above. Also, you have similar problems with fast vs. slow threads on Zen 6 due to SMT, with some tasks/threads executing faster or slower because of it.

Then we also have cases where multiple apps are executing in parallel. Does not have to be a single app using all 48T.

The scenario discussed was when all 48T are actually being used, regardless of how that is done.
 
the conditions he specified are perfect for GPUs. I mean if something scales perfectly with increasing thread count it's probably embarrassingly parallel workload with little communication between workers, and since it's pure number crunching little need for branch prediction.
Eh in the real world it often doesn't work like this.
Maybe for big companies, but otherwise
1. Porting software to use GPGPU is a PITA, only worth it for big and very reusable stuff
2. Abysmal FP64 performance on modern GPUs if you need that
3. Who says code with a lot of branches must necessarily have a lot of communication between threads?
4. Often you need more cores because you have a lot of data to work on in parallel (e.g. a 3-hour 4K video to encode vs. a 10-minute 1080p one, or a bioinformatics pipeline), not because the actual algorithms used are particularly parallelizable (see again e.g. x265)

A lot of CPU cores are useful for a lot of people, though I personally do not like P+E configurations at all, and I know for a fact that people have had trouble with them in a couple of different, widely used scientific packages.
 
1. Porting software to use GPGPU is a PITA, only worth it for big and very reusable stuff
That is generally true, but this particular case (massively parallel, heavy math) should be easier to port than most.
Abysmal FP64 performance on modern GPUs if you need that
That's only the case for consumer GPUs, and even then (iGPUs aside) I am not sure the 9950X has higher FP64 performance than a midrange consumer GPU, especially if you factor in the massive memory bandwidth disadvantage.
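One way to reason about this is a back-of-envelope peak-throughput comparison. Every figure below is an assumption chosen for illustration (core count, all-core clock, FLOPs per cycle, GPU FP32 rate, and the FP64:FP32 ratio), and peak numbers ignore memory bandwidth and achievable efficiency entirely, so treat the result as a sketch of the method, not a verdict.

```python
# Back-of-envelope FP64 throughput. All inputs are illustrative assumptions.
def cpu_fp64_gflops(cores, ghz, flops_per_cycle):
    return cores * ghz * flops_per_cycle

def gpu_fp64_gflops(fp32_tflops, fp64_ratio):
    return fp32_tflops * 1000 * fp64_ratio

# Assumed CPU: 16 cores at ~5 GHz, two 512-bit FMA units per core
# -> 2 units * 8 FP64 lanes * 2 ops (FMA) = 32 FLOPs/cycle/core.
cpu = cpu_fp64_gflops(16, 5.0, 32)

# Assumed GPU: a midrange consumer part, ~30 FP32 TFLOPS at a 1:64 FP64 ratio.
gpu = gpu_fp64_gflops(30.0, 1 / 64)

print(f"CPU ~{cpu:.0f} GFLOPS FP64, GPU ~{gpu:.0f} GFLOPS FP64")
```

Under these particular assumptions the CPU's peak FP64 comes out well ahead, which is the usual consequence of consumer GPUs' heavily cut-down FP64 ratios; different assumed parts, or bandwidth-bound kernels, can flip the picture.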
3. Who says code with a lot of branches must necessarily have a lot of communication between threads?
I don't know. If you reread my message, it said that code with lots of number crunching will not have a lot of branches: FFT kernels, matmul kernels. The only branches come from loops, or you have done something wrong.
Often you need more cores because you have a lot of data to work on in parallel (e.g. a 3-hour 4K video to encode vs. a 10-minute 1080p one, or a bioinformatics pipeline), not because the actual algorithms used are particularly parallelizable (see again e.g. x265)
Sorry, but I'm not sure where this came from. Anyway, I was not saying people shouldn't get more cores if their workflow demands it, just that in a lot of MT cases 1T perf still matters.

I know for a fact people have had trouble with them with a couple of different and widely used scientific software.
Add virtualization software to the list.
 
Is anything known about whether there will be an NPU on Zen 6 DT?

NVL-S is expected to have NPU6 @ 74 TOPS INT8. I assume one of the reasons is to comply with the Microsoft Copilot+ PC requirement of 40+ TOPS. So will Zen6 DT follow the same path, or be declared non-compliant with that requirement?
 