Question Zen 6 Speculation Thread

Page 180

Joe NYC

Diamond Member
Jun 26, 2021
3,264
4,749
136
https://www.anandtech.com/show/16026/tsmc-teases-12-high-3d-stacked-silicon The technology AMD uses to 3D-stack L3, SoIC, has been shown to support 12-hi stacks since 2019 with test chips. It shouldn't come as a surprise that AMD has the ability in 2025/2026 to go above 1-hi stacks.

There must be some cost to validate > 1 layer of V-Cache, and past sales of V-Cache CPUs probably did not justify the investment to turn it into a product.

Fast forward from 2022 to 2025-level sales of V-Cache processors, and the underlying assumptions have changed. The best gaming CPUs are now synonymous with V-Cache and sell in high numbers.

In addition to high profit margins, V-Cache CPUs are free advertisement for AMD - a brand-building product. Now that Intel is going to be releasing bLLC, I think Lisa will want to smack it down really hard.

An additional layer would do just that, but it would do even more. It would show AMD's technological superiority: AMD can just slap on an extra layer, while Intel has to redesign the entire CPU chiplet every time it wants to give it more L3.
 
  • Like
Reactions: Tlh97 and Win2012R2

adroc_thurston

Diamond Member
Jul 2, 2023
6,038
8,526
106
There must be some cost to validate > 1 layer of V-Cache
They already validated 4-Hi stacks on Milan-X.
4 years ago.
Fast forward from 2022 to 2025-level sales of V-Cache processors, and the underlying assumptions have changed. The best gaming CPUs are now synonymous with V-Cache and sell in high numbers.

In addition to high profit margins, V-Cache CPUs are free advertisement for AMD - a brand-building product. Now that Intel is going to be releasing bLLC, I think Lisa will want to smack it down really hard.

An additional layer would do just that, but it would do even more. It would show AMD's technological superiority: AMD can just slap on an extra layer, while Intel has to redesign the entire CPU chiplet every time it wants to give it more L3.
I get that fatter L3 piles give you a raging stiffy, but the ROI of 2-Hi SoIC is zero.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,264
4,749
136
But how do you implement it? There is a problem: if you want to use 3D V-Cache not only from the CPU but also from other units, you have to think about where to put it.

You can look at how it is implemented in Strix Halo. The Strix Halo compute chiplet has its own L3, and the MALL cache is programmed to prioritize GPU requests.

But it is a small MALL: 32 MB of SRAM, vs. a Zen 6 12-core chiplet with its own V-Cache, which would be 144 MB of SRAM.

L3 can't be moved away from the CPU cores and the ring bus. The latency penalty would be too high; it would defeat the purpose.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,264
4,749
136
They already validated 4-Hi stacks on Milan-X.
4 years ago.

I get that fatter L3 piles give you a raging stiffy, but the ROI of 2-Hi SoIC is zero.

My points were:
- ROI in 2026 >> ROI in 2022
- free advertisement worth $100s of millions

Your point was valid in 2022, but it is probably no longer valid in 2026.
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,038
8,526
106
My points were
You had none.
It's all imaginary reasons for personal feefees.
- ROI in 2026 >> ROI in 2022
Nope, games aren't more cache-local than they used to be.
- free advertisement worth $100s of millions
the what.
Your point was valid in 2022, but it is probably no longer valid in 2026.
It's valid forever, until games neatly fit their working set into a 240M-sized slab of L3 (which is never).
 

Joe NYC

Diamond Member
Jun 26, 2021
3,264
4,749
136
Nope, games aren't more cache-local than they used to be.

It's valid forever, until games neatly fit their working set into a 240M-sized slab of L3 (which is never).

It's about cache miss rate.

If the cache size goes up by a factor of 2.5x (240 MB vs. 96 MB), then the cache miss rate would go down to about 63% of the original miss rate (by the square-root rule of thumb, 1/√2.5 ≈ 0.63).

You don't have to fit the entire instruction / data set into L3 to realize the benefit. Any increase provides a benefit.

The only time there would be no benefit is if the game / task already fits into L3 - which is rarely the case.
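
As a quick sanity check on that 63% figure - a minimal sketch of the power-law / square-root rule of thumb; the 0.5 exponent is an assumption, real workloads vary:

```python
# Rule-of-thumb power law: miss_rate scales as cache_size^(-0.5).
# The 0.5 exponent is an assumption; real games deviate from it.
def relative_miss_rate(new_mb: float, old_mb: float, exponent: float = 0.5) -> float:
    return (new_mb / old_mb) ** -exponent

print(relative_miss_rate(240, 96))  # ~0.632, i.e. about 63% of the original miss rate
```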
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,038
8,526
106
It's about cache miss rate.
Well duh.
If the cache size goes up by a factor of 2.5x (240 MB vs. 96 MB), then the cache miss rate would go down to about 63% of the original miss rate.
The point is that 96M is already enough for hot data, and mem hits won't stop until games fit entirely into cache (which they won't).
Any increase provides a benefit.
Very minor.
Again, you should stop.
2-hi V$ is just not happening.
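
To put a number on "very minor" - a back-of-envelope average-memory-access-time (AMAT) sketch; the latencies and the 10% baseline miss rate are purely illustrative assumptions, not figures from anywhere in this thread:

```python
# AMAT = L3_hit_time + L3_miss_rate * DRAM_penalty (cycles; illustrative values only)
L3_HIT, DRAM_PENALTY = 50, 400           # assumed latencies, not measured figures
for miss_rate in (0.10, 0.10 * 0.63):    # baseline vs. 2.5x the cache under the sqrt rule
    amat = L3_HIT + miss_rate * DRAM_PENALTY
    print(f"L3 miss rate {miss_rate:.3f} -> AMAT {amat:.1f} cycles")
# 90.0 vs. 75.2 cycles here: a real but modest gain if the hot set already fits in 96 MB
```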
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,038
8,526
106
Building a brand, marketing.
the what.
You may not be familiar with them, or may not like them, but like it or not, they have a big impact on whether a CPU sells or not.
You have a big impact when you have the fastest gaming CPU.
People dgaf about how much L3 it has or if it comes with a free redacted pony.
 
Last edited by a moderator:
  • Like
Reactions: Io Magnesso

Joe NYC

Diamond Member
Jun 26, 2021
3,264
4,749
136
You have a big impact when you have the fastest gaming CPU.
People dgaf about how much L3 it has or if it comes with a free redacted pony.

That's what I am saying. Having the fastest CPU is worth $100 million in marketing / shilling / bribing of OEMs.

With that (fastest CPU, brand loyalty) you can also sell at a higher price. And you can also up-sell - to, say, 2-hi V-Cache.
 
  • Haha
Reactions: Io Magnesso

Io Magnesso

Senior member
Jun 12, 2025
562
148
71
You can look at how it is implemented in Strix Halo. The Strix Halo compute chiplet has its own L3, and the MALL cache is programmed to prioritize GPU requests.

But it is a small MALL: 32 MB of SRAM, vs. a Zen 6 12-core chiplet with its own V-Cache, which would be 144 MB of SRAM.

L3 can't be moved away from the CPU cores and the ring bus. The latency penalty would be too high; it would defeat the purpose.
What are you talking about?
You were the one who brought up Infinity Cache in the first place, right?
 
Last edited:

LightningZ71

Platinum Member
Mar 10, 2017
2,322
2,915
136
b) no one actually uses iGPUs.
The what?!?!

Are you smoking the good stuff and not sharing? Quick, someone rush out and tell AMD that they're doing the Z series all wrong! Berate them for continuing to expand the iGPU on all their mobile processors. Call them absolutely ignorant for developing the processors for the major consoles and the Steam Deck! Roast them all over the investor calls for the fortune they spent on their Halo line.

ALL of those things use variations on a theme: iGPUs.

Now, if you just mean their desktop processors with their puny "makes a monitor light up" iGPUs, then the majority of their customers STILL use those iGPUs. They just don't give an excrement about the performance.
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,038
8,526
106
The what?!?!
yeah.
Quick, someone rush out and tell AMD that they're doing the Z series all wrong!
Rebrands of existing parts?
Berate them for continuing to expand the iGPU on all their mobile processors.
MDS1 literally has half the iGFX of Strix1.
Call them absolutely ignorant for developing the processors for the major consoles and the Steam Deck!
Consoles are a distinct market with millions of units.
Gabeboy uses MS Surface salvage.
ALL of those things use variations on a theme: iGPUs.
Which is getting smaller with Medusa. Because no one actually uses the GFX where it matters (in ultrathin laptops, including commercial).
 
  • Like
Reactions: Io Magnesso

LightningZ71

Platinum Member
Mar 10, 2017
2,322
2,915
136
Rebrands of existing parts?
That all use the iGPU at max performance...
MDS1 literally has half the iGFX of Strix1.
You know full well that the iGPU of STP1 was an oversized relic from when it had MALL cache and no NPU in the early design. 16CU is WAY too much for dual-channel DDR5.
Consoles are a distinct market with millions of units.
Gabeboy uses MS Surface salvage.
Which all rely on the iGPU
Which is getting smaller with Medusa. Because no one actually really uses the GFX (where it matters, in ultrathin laptops. Including commercial).
You mean right-sized for the rest of the chip? The iGPU still gets used regularly, often even in dGPU-equipped configurations when on battery.
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,038
8,526
106
That all use the iGPU at max performance...
In a tiny irrelevant market.
You know full well that the iGPU of STP1 was an oversized relic of when it had MALL cache and no NPU in early design. 16CU is WAY too much for dual DDR5.
idk chief, 8CU is smaller than even 12CU in Phoenix.
We're so back!
Which all rely on the iGPU
Calling that an iGPU is very dishonest.
You mean right sized for the rest of the chip?
Smaller than everything shipped since 2022?
8CUs, completely and utterly castrated versus the LPDDR speeds they'll be shipping for MDS1.
Too bad!
Still uses the iGPU regularly, often even when configured with a dGPU when on battery.
You don't need more than 4CUs for that anyway.
 
  • Like
Reactions: SteinFG

HurleyBird

Platinum Member
Apr 22, 2003
2,800
1,528
136
There would still be the problem of inter-CCD latency that way.

I'm not thinking about gaming (although it would help those scenarios where the 9950X3D lags the 9800X3D), but more the awkwardness of having two sets of cores that sometimes have next to no performance difference, and other times an enormous difference.

The dual-CCD parts are for productivity and gaming, but the second CCD is basically wasted on gaming. The frequency hit is so minimal now that it would be nicer if both CCDs had the same performance profile.
 

soresu

Diamond Member
Dec 19, 2014
3,898
3,331
136
They've only improved it just about now, in 5.6 - most big game devs use much older versions, as it takes 5-7 years to make a game these days, and upgrading isn't trivial, so they will almost certainly ship on older versions. And frankly, 5.6 isn't exactly solving the problem completely; they only hope to achieve that in UE6 - so that's for games a decade from now.
It's a big shift starting at the lowest levels of the engine code.

Like the similar effort on Firefox/Gecko (Project Electrolysis), it takes time.

It's like bringing in new features: they start as experimental in one version -> beta in a later version -> and finally production in an even later version.

Only this effort is rewriting fundamental parts of the engine rather than just adding parts on, so it's going to affect (and potentially break) everything sitting on top of it, which is going to require an insane amount of testing by comparison.
 
Last edited:

MS_AT

Senior member
Jul 15, 2024
742
1,502
96
The dual-CCD parts are for productivity and gaming, but the second CCD is basically wasted on gaming. The frequency hit is so minimal now that it would be nicer if both CCDs had the same performance profile.
The second X3D CCD would be less than ideal for gaming unless engine developers took CPU topology into account and tried to avoid placing threads that often talk to each other on different CCDs (see the sketch below).
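
For illustration, a minimal Linux-only sketch of what "taking CPU topology into account" could look like - pinning two chatty threads to the same CCD with sched_setaffinity; the core ranges are hypothetical for a dual-CCD part:

```python
import os
import threading

# Hypothetical dual-CCD layout: cores 0-7 on CCD0, cores 8-15 on CCD1.
CCD0 = set(range(0, 8))

def worker(name: str) -> None:
    # Pin the calling thread (pid 0) to CCD0 so the two chatty
    # threads never pay the inter-CCD latency to reach each other.
    os.sched_setaffinity(0, CCD0)
    print(name, "restricted to CPUs", sorted(os.sched_getaffinity(0)))

# Two threads that talk to each other often -> keep them on one CCD.
threads = [threading.Thread(target=worker, args=(n,)) for n in ("producer", "consumer")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```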
 
  • Like
Reactions: Io Magnesso