
Question Zen 6 Speculation Thread

Page 265
So dense has the same L3/core as classic this time, instead of half?

I wonder what the difference in size is between the classic and dense then.

Not only that, but with

32 cores * 4 MB = 128 MB

The size of the L3 pool goes up, which allows the active cores that use a lot of memory to get even bigger allocation of that L3. The size of the L3 pool goes up 4x from Turin Dense to Venice dense.
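The arithmetic above can be sanity-checked in a couple of lines. The Turin Dense figures (16-core Zen 5c CCX with 32 MB L3, i.e. 2 MB/core) are from public specs; the Venice Dense figure (32 cores at 4 MB/core) is the rumor being discussed, not a confirmed spec:

```python
# Sanity check of the L3 pool sizes discussed above.
# Turin Dense (Zen 5c): 16-core CCX, 2 MB L3 per core -- public spec.
# Venice Dense (Zen 6c): rumored 32-core CCX, 4 MB L3 per core.

turin_dense_l3 = 16 * 2    # MB per CCX
venice_dense_l3 = 32 * 4   # MB per CCX

print(f"Turin Dense CCX L3:  {turin_dense_l3} MB")
print(f"Venice Dense CCX L3: {venice_dense_l3} MB")
print(f"Pool growth: {venice_dense_l3 // turin_dense_l3}x")
```

Which is where the "4x from Turin Dense to Venice Dense" claim comes from: both the core count per CCX and the L3 per core double.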
 
Or even fewer cores. There are EPYCs with one enabled core per CCD; they are popular with this crowd.
Kind of, you're not winning much freq from going to 16c.
I think, better than 50% probability that AMD comes back with V-Cache for the Zen 6 classic, for highest single core performance.
V$ is completely irrelevant for fmax SKUs.
It's completely irrelevant in server outside of specific HPC workloads.
 
Kind of, you're not winning much freq from going to 16c.
The main purpose being to push core counts down because Oracle doesn't accept "sure we have 32 but we are only using 16 of them".
It's completely irrelevant in server outside of specific HPC workloads.
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.
 
The main purpose being to push core counts down because Oracle doesn't accept "sure we have 32 but we are only using 16 of them".
World ain't ending on Oracle.
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.
Niche and hard to execute on.
So far main V$ usecases in DC are down to carefully MPI-sliced HPC workloads.
That's why Turin-X went the way of the dodo.
 
The main purpose being to push core counts down because Oracle doesn't accept "sure we have 32 but we are only using 16 of them".

If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.

One of the benefits of F CPUs with reduced core count per CCD was increased L3 per CCD. With Genoa-X, V-Cache on top of the CPU was still slowing down the FMax.

The new options on the table for Zen 6 with V-Cache:
- also increase L3 per core
- but don't reduce FMax
So both can be done without wasting precious N2 die by disabling perfectly fine cores.
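The trade-off in that post can be made concrete with a toy comparison of the two routes to more L3 per core. All numbers here are illustrative assumptions (a 12-core CCD with 48 MB on-die L3 and a hypothetical 64 MB stacked slice), not confirmed Zen 6 specs:

```python
# Illustrative comparison: two ways to raise L3-per-core on a CCD.
# All figures below are assumptions for illustration, not real specs.

BASE_CORES = 12       # assumed Zen 6 classic CCD core count
BASE_L3 = 48          # MB on-die L3, assumed
VCACHE_EXTRA = 64     # MB of stacked V-Cache, hypothetical

# Option A (F-style SKU): disable half the cores, keep on-die L3
f_cores = BASE_CORES // 2
f_l3_per_core = BASE_L3 / f_cores

# Option B (V-Cache): keep every core, stack extra L3 on top
x_l3_per_core = (BASE_L3 + VCACHE_EXTRA) / BASE_CORES

print(f"F-style : {f_l3_per_core:.1f} MB/core, {f_cores} usable cores")
print(f"V-Cache : {x_l3_per_core:.1f} MB/core, {BASE_CORES} usable cores")
```

Under these assumptions V-Cache gives more L3 per core while sacrificing zero cores, which is the "don't waste precious N2 die" argument, provided the historic FMax penalty of stacking really is gone.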
 
BTW, does anyone know which version of Venice is being used in the Helios rack-scale installation? The 256-core dense or the 96-core classic?
 
With Genoa-X, V-Cache on top of the CPU was still slowing down the FMax.

The new options on the table for Zen 6 with V-Cache:
- also increase L3 per core
- but don't reduce FMax
So both can be done without wasting precious N2 die by disabling perfectly fine cores.
just forget about the V$.
For reference, Vera is an 88-core CPU, also with SMT.
We don't care what NV does, server Tegras are poo poo.
 
I am not sure what the total market for 128-core Turin or Venice is, but my 9755 is not alone: several of the applications it runs use all 128 fat cores (no SMT) or 256 thin threads (with SMT). I don't know how many setups do the same, but it certainly is not zero.
Turin yes, but not Venice in the very same way. The relationship of Venice's classic and dense parts to each other will no longer be the one we are used to from Turin and from Genoa, Bergamo, Siena. (Correspondingly, the relationship of the sockets SP7 and SP8 to each other will be very different from the relationship of SP5 and SP6 to each other.) Hence, certain conclusions which some have drawn from Turin to Venice here in this thread are not valid.

So, the Turin situation that 9755 > 9745 and that 9755 has got its use (niche or not) is readily apparent. But with Venice, things will play out differently. Remember that the dense CCX is presumed to be heavily upgraded to the same ratio of L3$ to cores as the classic CCX, and thereby to much more L3$ per core complex than classic. And this is only one system-level change among several more from Turin to Venice.
 
One of the benefits of F CPUs with reduced core count per CCD was increased L3 per CCD. [...V-cache as potential alternative...]
Another effect of using fewer cores per CCD is a higher ratio of die-to-die bandwidth to core count. (With the current IFoP implementation, "wide GMI" is an alternative or additional means to this end.) Will be interesting to learn what die-to-die bandwidth the 12c and 32c CCDs will be configured with.
 
Will be interesting to learn what die-to-die bandwidth the 12c and 32c CCDs will be configured with.
As stated before, it must be at least 200 GByte/s per CCD in order to saturate the new maximum memory bandwidth with an 8-CCD SKU under ideal circumstances.
According to C&C, total RAM BW is shared by reads and writes, so with a typical factor of 2:1 I see 128 GByte/s read, 64 GByte/s write as the lower boundary. I would hope for 256 GByte/s r, 128 GByte/s w for two reasons:
It is better for a broader range of workloads with less evenly distributed RAM bandwidth demands.
It gives SKUs with <8 CCDs more room to stretch their legs, just like GMI-Wide does today. And no, I cannot imagine them being able to combine two links for one CCD, as the layout/routing constraints will make that impossible.
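The lower-bound reasoning in that post can be laid out explicitly. It assumes the thread's ~1.6 TB/s total socket bandwidth figure (16 channels of MCRDIMM-12800 at 8 bytes per transfer) divided evenly across 8 CCDs, then split read:write at 2:1; the post rounds the split down to link-width-friendly 128/64 GByte/s:

```python
# Lower-bound estimate of required die-to-die bandwidth per CCD.
# Total socket bandwidth figure is from the thread:
# 16 channels x 12,800 MT/s x 8 B = 1638.4 GB/s (decimal GB).

TOTAL_MEM_BW = 16 * 12_800 * 8 / 1000   # GB/s
CCDS = 8

per_ccd = TOTAL_MEM_BW / CCDS
print(f"Required die-to-die BW: {per_ccd:.1f} GB/s per CCD")

# Split that budget read:write at the typical 2:1 ratio.
# (The post quantizes this to 128 GB/s read, 64 GB/s write.)
read_bw = per_ccd * 2 / 3
write_bw = per_ccd / 3
print(f"~{read_bw:.0f} GB/s read, ~{write_bw:.0f} GB/s write")
```

The hoped-for 256/128 GByte/s option would simply double these per-CCD links, leaving headroom for low-CCD-count SKUs.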
 
Another effect of using fewer cores per CCD is a higher ratio of die-to-die bandwidth to core count. (With the current IFoP implementation, "wide GMI" is an alternative or additional means to this end.) Will be interesting to learn what die-to-die bandwidth the 12c and 32c CCDs will be configured with.

With new technology allowing higher bandwidth delivered efficiently, the cap will likely be increased so much that it is almost never a bottleneck.

So that argument should go away - fewer cores per CCD in order for each core to have more bandwidth.
 
With new technology allowing higher bandwidth delivered efficiently, the cap will likely be increased so much that it is almost never a bottleneck.

So that argument should go away - fewer cores per CCD in order for each core to have more bandwidth.
Well, for AMD it would have been easy to do just what you wrote with Strix Halo - but they decided to keep the exact same bandwidth. So I'd guess that power consumption, area and routing effort is still enough of a thing for them to not widen the interconnect beyond sanity. This is why I fear them to align more with the lower bound described above than anything else.
 
Well, for AMD it would have been easy to do just what you wrote with Strix Halo - but they decided to keep the exact same bandwidth
It's not the exact same, since the writes are now symmetrical with the reads: both 32 B per clock cycle.
It's one SDP there.
My guess is Zen 6 will move to an SDP-per-mesh-column arrangement, giving the baby CCD 64 B/clk, and the big boy a nice and chonky 128 B/clk.
 
That makes sense:
- 64B/clk = 205 GByte/s at MCRDIMM-12'800
- 128B/clk = 410 GByte/s at MCRDIMM-12'800

That is perfectly suited: you can max out the 1.6 TB/s total memory bandwidth either with 8x 12C chiplets (96C) or with all Zen 6c SKUs (4x 32C = 128C, or more CCDs).
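Those figures follow from bytes-per-clock times fabric clock. The conversion assumes FCLK runs at a quarter of the memory data rate (3.2 GHz at MCRDIMM-12800), which is what the 205/410 GByte/s numbers in the post imply:

```python
# Converting SDP width (bytes per fabric clock) to GByte/s.
# Assumption: FCLK = memory data rate / 4, i.e. 3.2 GHz at MCRDIMM-12800.

MT_S = 12_800                  # MCRDIMM-12800 data rate, MT/s
FCLK_GHZ = MT_S / 4 / 1000     # 3.2 GHz

for width in (64, 128):        # bytes per clock
    print(f"{width:3d} B/clk -> {width * FCLK_GHZ:.1f} GByte/s")

# Total socket bandwidth: 16 channels x 12,800 MT/s x 8 B per transfer.
total = 16 * MT_S * 8 / 1000
print(f"Socket total: {total:.1f} GByte/s")
```

So 8 CCDs at 64 B/clk (8 x 204.8) or 4 CCDs at 128 B/clk (4 x 409.6) both land exactly on the 1638.4 GByte/s socket total.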
 
Remember that the dense CCX is presumed to be heavily upgraded to the same ratio of L3$ to cores as the classic CCX, and thereby to much more L3$ per core complex than classic.
Where are you getting this from? No longer just 2 MB L3 per core for Zen 6c, but 4 MB?? That's news to me.
 