This seems to blow past one of the key features of die stacking. When you stack the L3 on a separate die, you can use a variant of a given process that is far better suited to L3 cache (and AMD has specifically stated it does this). When you put a large L3 on the same die as the CPU cores, you have to balance the process between the density the L3 wants and the performance the core transistors want. If AMD decides, for example, to tune the CPU die's process to favor the cores and L2 over L3 density, and shrinks the on-die L3 to half its size or even to nothing, it can achieve far higher transistor density and better performance for that part, while reaping the benefits of a more L3-friendly process on the stacked cache die. The latency hit for the L3 can be hidden by expanding the L2, which we know AMD is doing with Zen 4.

It is not a straight answer; there are pros and cons to increasing cache.
For applications with huge datasets it is a big plus, but not for all of them. That makes V-Cache the best solution: stack the additional dies only on the specific SKUs that benefit.
Good writeup here on why increased cache at the cost of latency is not good for most use cases
[Link preview: "IBM showed off a giant 256 MB L3 during its Telum presentation at Hot Chips 2021, and ignited discussion about whether that represents the future of caches." — chipsandcheese.com]
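The latency-vs-capacity argument can be made concrete with a back-of-the-envelope average memory access time (AMAT) calculation. All the cycle counts and hit rates below are illustrative assumptions, not measured figures for any real part:

```python
# Rough AMAT sketch: average memory access time (in cycles) for a
# load that missed L1. All latencies and hit rates are made-up
# illustrative numbers, not measurements of any real CPU.

def amat(l2_lat, l2_hit, l3_lat, l3_hit, mem_lat):
    """AMAT in cycles for an access that reached L2."""
    l2_miss = 1.0 - l2_hit
    l3_miss = 1.0 - l3_hit
    return l2_lat + l2_miss * (l3_lat + l3_miss * mem_lat)

# Workload that nearly fits in the smaller L3: the extra capacity
# barely moves the hit rate, so the bigger cache's added latency
# makes AMAT slightly worse.
small_fast = amat(l2_lat=12, l2_hit=0.95, l3_lat=40, l3_hit=0.90, mem_lat=300)
big_slow   = amat(l2_lat=12, l2_hit=0.95, l3_lat=50, l3_hit=0.92, mem_lat=300)
print(f"cache-friendly: small/fast L3 = {small_fast:.1f}, big/slow L3 = {big_slow:.1f}")

# Huge-dataset workload that misses L2 often and thrashes the small
# L3: now the bigger cache's much higher hit rate dominates, despite
# its extra latency.
small_fast = amat(l2_lat=12, l2_hit=0.50, l3_lat=40, l3_hit=0.40, mem_lat=300)
big_slow   = amat(l2_lat=12, l2_hit=0.50, l3_lat=50, l3_hit=0.80, mem_lat=300)
print(f"huge dataset:   small/fast L3 = {small_fast:.1f}, big/slow L3 = {big_slow:.1f}")
```

This also shows why raising L2 capacity hides the stacked L3's latency: a higher L2 hit rate shrinks the `l2_miss` factor that multiplies the whole L3 term.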
L3 design is itself a tradeoff: you can make the L3 faster, but the cost is density and power.
There are so many dials and levers at play.