
Question Zen 6 Speculation Thread

Page 265
So dense has the same L3/core as classic this time, instead of half?

I wonder what the difference in size is between the classic and dense then.

Not only that, but with

32 cores * 4 MB = 128 MB

The size of the L3 pool goes up, which allows the active cores that use a lot of memory to get even bigger allocation of that L3. The size of the L3 pool goes up 4x from Turin Dense to Venice dense.
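The arithmetic above can be sanity-checked in a couple of lines. The Turin Dense figures (16-core Zen 5c CCX with 32 MB L3, i.e. 2 MB/core) are from public specs; the Venice Dense figure (32 cores at 4 MB/core) is the rumor being discussed, not a confirmed spec:

```python
# Sanity check of the L3 pool sizes discussed above.
# Turin Dense (Zen 5c): 16-core CCX, 2 MB L3 per core -- public spec.
# Venice Dense (Zen 6c): rumored 32-core CCX, 4 MB L3 per core.

turin_dense_l3 = 16 * 2    # MB per CCX
venice_dense_l3 = 32 * 4   # MB per CCX

print(f"Turin Dense CCX L3:  {turin_dense_l3} MB")
print(f"Venice Dense CCX L3: {venice_dense_l3} MB")
print(f"Pool growth: {venice_dense_l3 // turin_dense_l3}x")
```

Which is where the "4x from Turin Dense to Venice Dense" claim comes from: both the core count per CCX and the L3 per core double.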
 
Or even fewer cores. There are EPYCs with one enabled core per CCD; they are popular with this crowd.
Kind of, you're not winning much freq from going to 16c.
I think, better than 50% probability that AMD comes back with V-Cache for the Zen 6 classic, for highest single core performance.
V$ is completely irrelevant for fmax SKUs.
It's completely irrelevant in server outside of specific HPC workloads.
 
Kind of, you're not winning much freq from going to 16c.
The main purpose being to push core counts down because Oracle doesn't accept "sure we have 32 but we are only using 16 of them".
It's completely irrelevant in server outside of specific HPC workloads.
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.
 
The main purpose being to push core counts down because Oracle doesn't accept "sure we have 32 but we are only using 16 of them".
World ain't ending on Oracle.
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.
Niche and hard to execute on.
So far main V$ usecases in DC are down to carefully MPI-sliced HPC workloads.
That's why Turin-X went the way of the dodo.
 
The main purpose being to push core counts down because Oracle doesn't accept "sure we have 32 but we are only using 16 of them".

If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.

One of the benefits of F CPUs with reduced core count per CCD was increased L3 per CCD. With Genoa-X, V-Cache on top of the CPU was still slowing down the FMax.

The new options on the table for Zen 6 with V-Cache:
- also increase L3 per core
- but don't reduce FMax
So both can be done without wasting precious N2 die by disabling perfectly fine cores.
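The trade-off in that post can be made concrete with a toy comparison of the two routes to more L3 per core. All numbers here are illustrative assumptions (a 12-core CCD with 48 MB on-die L3 and a hypothetical 64 MB stacked slice), not confirmed Zen 6 specs:

```python
# Illustrative comparison: two ways to raise L3-per-core on a CCD.
# All figures below are assumptions for illustration, not real specs.

BASE_CORES = 12       # assumed Zen 6 classic CCD core count
BASE_L3 = 48          # MB on-die L3, assumed
VCACHE_EXTRA = 64     # MB of stacked V-Cache, hypothetical

# Option A (F-style SKU): disable half the cores, keep on-die L3
f_cores = BASE_CORES // 2
f_l3_per_core = BASE_L3 / f_cores

# Option B (V-Cache): keep every core, stack extra L3 on top
x_l3_per_core = (BASE_L3 + VCACHE_EXTRA) / BASE_CORES

print(f"F-style : {f_l3_per_core:.1f} MB/core, {f_cores} usable cores")
print(f"V-Cache : {x_l3_per_core:.1f} MB/core, {BASE_CORES} usable cores")
```

Under these assumptions V-Cache gives more L3 per core while sacrificing zero cores, which is the "don't waste precious N2 die" argument, provided the historic FMax penalty of stacking really is gone.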
 
BTW, does anyone know which version of Venice is being used in the Helios rack-scale installation? The 256-core dense or the 96-core classic?
 
With Genoa-X, V-Cache on top of the CPU was still slowing down the FMax.

The new options on the table for Zen 6 with V-Cache:
- also increase L3 per core
- but don't reduce FMax
So both can be done without wasting precious N2 die by disabling perfectly fine cores.
just forget about the V$.
For reference, Vera is an 88-core CPU, also with SMT.
We don't care what NV does, server Tegras are poo poo.
 
I am not sure what the total market for 128-core Turin or Venice is, but my 9755 is not alone: several of the applications it runs use all 128 fat cores (no SMT) or 256 thin threads (with SMT). I don't know how many setups do the same, but it certainly is not zero.
Turin yes, but not Venice in the very same way. The relationship of Venice's classic and dense parts to each other will no longer be the one we are used to from Turin and from Genoa, Bergamo, Siena. (Correspondingly, the relationship of the sockets SP7 and SP8 to each other will be very different from the relationship of SP5 and SP6 to each other.) Hence, certain conclusions which some have drawn from Turin to Venice here in this thread are not valid.

So, the Turin situation that 9755 > 9745 and that 9755 has got its use (niche or not) is readily apparent. But with Venice, things will play out differently. Remember that the dense CCX is presumed to be heavily upgraded to the same ratio of L3$ to cores as the classic CCX, and thereby to much more L3$ per core complex than classic. And this is only one system-level change among several more from Turin to Venice.
 
One of the benefits of F CPUs with reduced core count per CCD was increased L3 per CCD. [...V-cache as potential alternative...]
Another effect of using fewer cores per CCD is a higher ratio of die-to-die bandwidth to core count. (With the current IFoP implementation, "wide GMI" is an alternative or additional means to this end.) Will be interesting to learn what die-to-die bandwidth the 12c and 32c CCDs will be configured with.
 
Will be interesting to learn what die-to-die bandwidth the 12c and 32c CCDs will be configured with.
As stated before, it must be at least 200 GByte/s per CCD in order to saturate the new maximum memory bandwidth with an 8-CCD SKU under ideal circumstances.
According to C&C, total RAM BW is shared by reads and writes, so with a typical factor of 2:1 I see 128 GByte/s read, 64 GByte/s write as the lower boundary. I would hope for 256 GByte/s r, 128 GByte/s w for two reasons:
It is better for a broader range of workloads with less evenly distributed RAM bandwidth demands.
It gives SKUs with <8 CCDs more room to stretch their legs, just like GMI-Wide does today. And no, I cannot imagine them being able to combine two links for one CCD, as the layout/routing constraints will make that impossible.
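The lower-bound reasoning in that post can be laid out explicitly. It assumes the thread's ~1.6 TB/s total socket bandwidth figure (16 channels of MCRDIMM-12800 at 8 bytes per transfer) divided evenly across 8 CCDs, then split read:write at 2:1; the post rounds the split down to link-width-friendly 128/64 GByte/s:

```python
# Lower-bound estimate of required die-to-die bandwidth per CCD.
# Total socket bandwidth figure is from the thread:
# 16 channels x 12,800 MT/s x 8 B = 1638.4 GB/s (decimal GB).

TOTAL_MEM_BW = 16 * 12_800 * 8 / 1000   # GB/s
CCDS = 8

per_ccd = TOTAL_MEM_BW / CCDS
print(f"Required die-to-die BW: {per_ccd:.1f} GB/s per CCD")

# Split that budget read:write at the typical 2:1 ratio.
# (The post quantizes this to 128 GB/s read, 64 GB/s write.)
read_bw = per_ccd * 2 / 3
write_bw = per_ccd / 3
print(f"~{read_bw:.0f} GB/s read, ~{write_bw:.0f} GB/s write")
```

The hoped-for 256/128 GByte/s option would simply double these per-CCD links, leaving headroom for low-CCD-count SKUs.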
 
Another effect of using fewer cores per CCD is a higher ratio of die-to-die bandwidth to core count. (With the current IFoP implementation, "wide GMI" is an alternative or additional means to this end.) Will be interesting to learn what die-to-die bandwidth the 12c and 32c CCDs will be configured with.

With new technology allowing higher bandwidth delivered efficiently, the cap will likely be increased so much that it is almost never a bottleneck.

So that argument should go away - fewer cores per CCD in order for each core to have more bandwidth.
 
With new technology allowing higher bandwidth delivered efficiently, the cap will likely be increased so much that it is almost never a bottleneck.

So that argument should go away - fewer cores per CCD in order for each core to have more bandwidth.
Well, for AMD it would have been easy to do just what you wrote with Strix Halo - but they decided to keep the exact same bandwidth. So I'd guess that power consumption, area and routing effort is still enough of a thing for them to not widen the interconnect beyond sanity. This is why I fear them to align more with the lower bound described above than anything else.
 
Well, for AMD it would have been easy to do just what you wrote with Strix Halo - but they decided to keep the exact same bandwidth
It's not the exact same, since the writes are now symmetrical with the reads: both 32 B per clock cycle.
It's one SDP there.
My guess is Zen 6 will move to an SDP-per-mesh-column arrangement, giving the baby CCD 64 B/clk, and the big boy a nice and chonky 128 B/clk.
 
That makes sense:
- 64B/clk = 205 GByte/s at MCRDIMM-12'800
- 128B/clk = 410 GByte/s at MCRDIMM-12'800

That is perfectly suited: you can max out the 1.6 TB/s total memory bandwidth either with 8x 12C chiplets (96C) or with all Zen 6c SKUs (4x 32C = 128C, or more CCDs).
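Those figures follow from bytes-per-clock times fabric clock. The conversion assumes FCLK runs at a quarter of the memory data rate (3.2 GHz at MCRDIMM-12800), which is what the 205/410 GByte/s numbers in the post imply:

```python
# Converting SDP width (bytes per fabric clock) to GByte/s.
# Assumption: FCLK = memory data rate / 4, i.e. 3.2 GHz at MCRDIMM-12800.

MT_S = 12_800                  # MCRDIMM-12800 data rate, MT/s
FCLK_GHZ = MT_S / 4 / 1000     # 3.2 GHz

for width in (64, 128):        # bytes per clock
    print(f"{width:3d} B/clk -> {width * FCLK_GHZ:.1f} GByte/s")

# Total socket bandwidth: 16 channels x 12,800 MT/s x 8 B per transfer.
total = 16 * MT_S * 8 / 1000
print(f"Socket total: {total:.1f} GByte/s")
```

So 8 CCDs at 64 B/clk (8 x 204.8) or 4 CCDs at 128 B/clk (4 x 409.6) both land exactly on the 1638.4 GByte/s socket total.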
 
Remember that the dense CCX is presumed to be heavily upgraded to the same ratio of L3$ to cores as the classic CCX, and thereby to much more L3$ per core complex than classic.
Where are you getting this from? No longer just 2 MB L3 per core for Zen 6c, but 4 MB?? That's news to me.
 