Zen 6 Speculation Thread


Joe NYC

Diamond Member
V$ is completely irrelevant for fmax SKUs.
It's completely irrelevant in server outside of specific HPC workloads.

I think that information dates to the Zen 3 / Zen 4 era, when V-Cache came with an fmax penalty. So the winning cases were only those where the extra cache could make up for a 300-400 MHz clock speed deficit.

In Zen 5, boost and sustained clocks are much closer between V-Cache and non-V-Cache processors, so that deficit largely disappears.

Overall, Phoronix showed a 12% gain for V-Cache vs. non-V-Cache processors across its server / workstation tests. Database tests, for example, showed big boosts in performance.

Now that Intel has closed the gap a little bit with Xeon 6, I think AMD should extend the lead with Zen 6, so that Intel is, again, not competitive.
 

adroc_thurston

Diamond Member
I think that information dates to the Zen 3 / Zen 4 era, when V-Cache came with an fmax penalty. So the winning cases were only those where the extra cache could make up for a 300-400 MHz clock speed deficit.
Oh I'm not talking about fmax penalties or anything.
V$ is just usable in a few niche workloads in DC and that's it.
Database tests, for example, showed big boosts in performance.
They're fixed, tiny working sets, irrelevant to how real DBs work irl.
I think AMD should extend the lead with Zen 6, so that Intel is, again, not competitive.
venice-d inherently makes the lead bigger than ever.
 

Joe NYC

Diamond Member
Oh I'm not talking about fmax penalties or anything.
V$ is just usable in a few niche workloads in DC and that's it.

Those were the old comparisons. They showed some workloads with big leads, some tied, and some where the V-Cache chip was behind.

But once you eliminate the clock speed penalty, all workloads move up. That's why the 9800X3D leads the 9700X by 11.9% in the Phoronix tests.


They're fixed, tiny working sets, irrelevant to how real DBs work irl.

Database tests are a mixture of data that does and does not fit in the caches. The more of the tables you can fit into cache, the less memory bandwidth is consumed and the more processing can happen at L3 latency instead of DRAM latency.
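A back-of-the-envelope sketch of why that matters; the latency figures and hit rates below are assumed round numbers for illustration, not measurements:

```python
# Crude model of average memory access latency for a query mix.
# All numbers are illustrative assumptions, not measurements.

L3_LATENCY_NS = 12.0     # assumed L3 hit latency
DRAM_LATENCY_NS = 110.0  # assumed server DRAM latency

def avg_latency_ns(l3_hit_rate: float) -> float:
    """Average access latency for a given L3 hit rate."""
    return l3_hit_rate * L3_LATENCY_NS + (1.0 - l3_hit_rate) * DRAM_LATENCY_NS

for hit_rate in (0.70, 0.85, 0.95):
    print(f"L3 hit rate {hit_rate:.0%}: ~{avg_latency_ns(hit_rate):.0f} ns average access")
# Going from 70% to 95% hits (e.g. thanks to a much larger L3) more than halves
# the average latency and cuts DRAM traffic by ~6x in this toy model.
```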

venice-d inherently makes the lead bigger than ever.

For Oracle databases, you would probably want classic Zen 6, due to licensing costs.

And offering them classic Zen 6 with V-Cache would move the performance a generation ahead: something like Zen 7 performance out of a Zen 6 processor.

We can assume that Zen 6 will already be quite well endowed as far as clock speeds go, so the main thing holding it back will be memory latency.
 

adroc_thurston

Diamond Member
That's why the 9800X3D leads the 9700X by 11.9% in the Phoronix tests
12% perf bump on a part that's normally crippled by cIOD being Pretty Bad?
Whoa man serious stuff right here.
Database tests are a mixture of data that does and does not fit in the caches. The more of the tables you can fit into cache, the less memory bandwidth is consumed and the more processing can happen at L3 latency instead of DRAM latency.
The average perf bump will be tiny.
We can assume that Zen 6 will already be quite well endowed as far as clock speeds go, so the main thing holding it back will be memory latency.
And offering them classic Zen 6 with V-Cache would move the performance a generation ahead: something like Zen 7 performance out of a Zen 6 processor.
no, 10% skt perf bump is not "Florence-like performance".
Please get real and stop projecting nerd dreams onto products made for serious people.
 

basix

Senior member
V-Cache in servers will probably get less relevant in the future:
- Z5 = 32 MByte per CCD
- Z6 classic = 48 MByte per CCD
- Z6c = 128 MByte per CCD
- Z7 = 7 MByte/core is rumored; not clear if that's for classic or dense. I would assume dense, which would mean 224+ MByte per CCD (quick math below). Classic could stick with that or move to even more cache per core, because its cores are bigger

For some use cases V-Cache will still bring a performance bump. But the margin should get thinner.
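The quick math behind those per-CCD figures; the cores-per-CCD counts are the rumored ones from this thread, so treat them as assumptions:

```python
# Per-CCD L3 from per-core figures; cores per CCD are rumored/assumed values.
configs = {
    "Zen 5 classic": {"cores": 8,  "mb_per_core": 4},  # 32 MByte per CCD
    "Zen 6 classic": {"cores": 12, "mb_per_core": 4},  # 48 MByte per CCD
    "Zen 6 dense":   {"cores": 32, "mb_per_core": 4},  # 128 MByte per CCD
    "Zen 7 (rumor)": {"cores": 32, "mb_per_core": 7},  # 224 MByte if dense keeps 32 cores
}

for name, cfg in configs.items():
    total = cfg["cores"] * cfg["mb_per_core"]
    print(f"{name}: {cfg['cores']} cores x {cfg['mb_per_core']} MB/core = {total} MByte per CCD")
```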
 

basix

Senior member
Some applications show >50% performance increases with V-Cache (some CFD solvers, RTL simulation, ...). I would not call that thin. But it will get thinner the more cache per core and per CCD in total you have. Diminishing returns, as usual.

V-Cache benefits are very application specific. For most use cases the gains are slim, so V-Cache SKUs are not a no-brainer. You need to know whether it's worth it for your use case.
 

itsmydamnation

Diamond Member
I wonder if they would ever do a memory-side V-Cache. That would/should get better hit rates for in-memory DBs or for jobs using more than a CCD's worth of cores. Obviously worse latency, but you would still complete the access sooner than going all the way out to slow-ass server DRAM.
 

basix

Senior member
You need a very large memory-side cache to see gains compared to a large L3$. Theoretically possible (e.g. stacking a large cache below the IOD), but I'm not sure how big the gains would be.
I think it would make more sense to stack DRAM instead of SRAM on the IOD. Because it is stacked, you could reduce latency and increase bandwidth compared to regular DDR. It would be worse than SRAM in both regards, but offer much higher capacity (e.g. 16 GByte instead of 512 MByte). The result would be similar to what Intel did with the Xeon Max HBM integration, but without the high HBM cost.
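A very crude way to compare the two options; every capacity, latency, and the working-set model itself are assumptions for illustration only:

```python
# Toy model: hit rate ~ capacity / working set (capped at 1), then a weighted latency.
# All capacities and latencies are illustrative assumptions.

DRAM_NS = 110.0  # assumed regular server DRAM latency

def avg_latency_ns(cache_gb: float, cache_ns: float, working_set_gb: float) -> float:
    hit_rate = min(1.0, cache_gb / working_set_gb)
    return hit_rate * cache_ns + (1.0 - hit_rate) * DRAM_NS

WORKING_SET_GB = 8.0  # assumed hot data of an in-memory DB shard

sram_cache   = avg_latency_ns(cache_gb=0.5,  cache_ns=40.0, working_set_gb=WORKING_SET_GB)
stacked_dram = avg_latency_ns(cache_gb=16.0, cache_ns=80.0, working_set_gb=WORKING_SET_GB)

print(f"512 MByte SRAM memory-side cache: ~{sram_cache:.0f} ns average")
print(f"16 GByte stacked DRAM on the IOD: ~{stacked_dram:.0f} ns average")
# With a working set far larger than 512 MByte, the big-but-slower stacked DRAM
# comes out ahead despite its worse per-access latency.
```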
 

StefanR5R

Elite Member
[dense CCD cache size]
Where are you getting this from?
I'd answer if only I could. It's been claimed a while ago, certainly here in this thread. Unfortunately it is too hard to keep track of where the rumors are buried within the speculation.

[CCD--IOD IF width]
Well, for AMD it would have been easy to do just what you wrote with Strix Halo - but they decided to keep the exact same bandwidth.
It's not exactly the same, since writes are now symmetrical with reads, both at 32 B per clock cycle.
I understood that Strix Point has this also, per CCX. (On-die there, of course, but still.) Strix Halo's off-die fabric design was evidently a one-step-at-a-time thing.

[...] I fear them to align more with the lower bound described above than anything else.
I sure hope they don't skimp on that; power requirements are supposed to be a lot lower. Their cores have so much SIMD execution width nowadays... Furthermore, concerning their die-to-die interconnects, hopefully they don't regress from the impressive internal uniformity of their current sIOD, if the next sIOD is made of two chiplets instead.
 

ToTTenTranz

Senior member
So the only difference between Zen 6 and Zen 6 dense is the transistor library? Or is the FPU less wide as well?

I guess now it makes more sense that the PS6 handheld only uses Zen 6c cores. Performance should scale better relative to the home console's CPU cores.
 

BorisTheBlade82

Senior member
That makes sense:
- 64B/clk = 205 GByte/s at MCRDIMM-12'800
- 128B/clk = 410 GByte/s at MCRDIMM-12'800

That is perfectly suited so that you can max. out the 1.6 TB/s total memory bandwidth with 8x 12C chiplets (96C) or with all Zen 6c SKUs (4x 32C = 128C or more CCDs).
Isn't the big unknown the clock for a new interconnect? I mean, with a BoW couldn't they also simply go very wide per clock but clock rather low, as long as latency doesn't nosedive?
I mean, in isolation 2.2 GHz is only worth about 0.5 ns of latency per cycle, so going down to 1 GHz would still only be around 1 ns.
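For reference, the arithmetic behind those numbers; the widths and clocks are the ones quoted above, the rest is just unit conversion:

```python
# Link bandwidth = bytes per clock x fabric clock; cycle time = 1 / frequency.

def link_bw_gbs(bytes_per_clk: int, fclk_ghz: float) -> float:
    return bytes_per_clk * fclk_ghz  # bytes x GHz = GB/s

def cycle_ns(freq_ghz: float) -> float:
    return 1.0 / freq_ghz

print(f"64 B/clk  @ 3.2 GHz = {link_bw_gbs(64, 3.2):.0f} GB/s per CCD link")
print(f"128 B/clk @ 3.2 GHz = {link_bw_gbs(128, 3.2):.0f} GB/s per CCD link")
print(f"8 links x 64 B/clk  = {8 * link_bw_gbs(64, 3.2) / 1000:.2f} TB/s aggregate")

print(f"One cycle at 2.2 GHz = {cycle_ns(2.2):.2f} ns")
print(f"One cycle at 1.0 GHz = {cycle_ns(1.0):.2f} ns")
```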
 

adroc_thurston

Diamond Member
So the only difference between Zen 6 and Zen 6 dense is the transistor library
And physdes. Makes all the difference in the world.
I guess now it makes more sense that the PS6 handheld only uses Zen 6c cores
It's only dense because of a magical thing called 'cost'.
Isn't the big unknown the clock for a new interconnect?
There is no 'interconnect', it's just wires.
Everything's running at fclk speed for very obvious reasons.
 

basix

Senior member
You could theoretically clock it at 50% or 200% of fclk and change the bus width accordingly for the chip-to-chip interface. But that is probably not worth it and you are adding some SERDES again (although very simple 2:1 or 1:2 ones). I mean, 3.2 Gbps PHY speed is very slow and energy efficient; I do not see a reason to reduce clocks even further. RDNA 3 MCDs used much higher 9.2 Gbps, HBM also runs at 8+ Gbps, and IFOP xGMI links even higher at 16 Gbps (Zen 3) or 32 Gbps (Zen 4).

If 2:1 SERDES is maybe added again for DDR6 (keeping the fabric clock at 3.2 GHz for energy-efficiency reasons but increasing the PHY speed to 6.4 Gbps), you are still in a quite cozy frequency range and keep a bunch-of-wires connection scheme.
Or you can simply increase the bus width, if die area allows it.
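A quick sketch of that width-vs-rate trade-off; the lane counts are made up for illustration, only the 3.2 and 6.4 Gbps per-pin rates come from the post:

```python
# Link bandwidth = lanes x per-pin rate / 8 (Gbps -> GB/s).
# Halving the lane count while doubling the per-pin rate (a 2:1 SERDES ratio)
# keeps bandwidth constant; the lane counts are made up for illustration.

def link_bw_gbs(lanes: int, gbps_per_pin: float) -> float:
    return lanes * gbps_per_pin / 8.0

wide_slow   = link_bw_gbs(lanes=512, gbps_per_pin=3.2)  # bunch-of-wires style
narrow_fast = link_bw_gbs(lanes=256, gbps_per_pin=6.4)  # 2:1 serialization

print(f"512 lanes @ 3.2 Gbps = {wide_slow:.0f} GB/s")
print(f"256 lanes @ 6.4 Gbps = {narrow_fast:.0f} GB/s")
assert abs(wide_slow - narrow_fast) < 1e-9  # same bandwidth, half the pins, extra PHY complexity
```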
 

adroc_thurston

Diamond Member
But that is probably not worth it and you are adding some SERDES again
SERDES is a very specific thing, and putting d2d sludge into a separate clock domain does *not* make it serdes lmao.
RDNA 3 MCDs used much higher 9.2 Gbps
uh no they ain't.
If 2:1 SERDES is maybe added again for DDR6 (keeping the fabric clock at 3.2 GHz for energy-efficiency reasons but increasing the PHY speed to 6.4 Gbps)
you do understand that 2.5D at 25um pitch makes pins cheap? SDP spam is way of the future(tm).
 

basix

Senior member
If you change the interface width but keep the bandwidth the same, you have to add SERDES in one way or another. Or change the modulation from NRZ to PAM4, use DDR, or whatever.
But you can explain how you want to do that with just a separate clock domain. Show me your magic tricks.

And yeah, RDNA 3 used 9.2 Gbps. If you don't believe me, check AMD's presentations.