This makes too much sense. You can probably vary memory type and capacity on a per-module basis, giving major flexibility with target markets. How much capacity is the million-dollar question.

I think Mi300 (rather than Zen 5) is the product where AMD is going "all in".
Since, according to rumors and AMD's Financial Analyst Day presentation, the compute modules placed on the base die may be interchangeable, the Mi300 approach might eventually overtake dedicated implementations for CPUs (desktop, server), APUs, and GPUs.
The basic Mi300 approach is rumored to be a large (360 mm²?) N6 base die connected to HBM modules, with two N5 compute units stacked on top of it:
[Attachment 72428: diagram of the rumored Mi300 stacking]
It doesn't really make sense, since AMD already confirmed Zen 5 vanilla and Zen 5 3D-stacked variants. So unless they just have a lineup without L3 at all (which I HIGHLY doubt), I don't think 3D-stacked L3 cache is going to be the standard on Zen 5.

This whole thing sounds like complete nonsense. Doubling the cores? +30% IPC? Unified L2 cache? Unified stacked L3 for everything? Yeah, I'm calling BS.
Could X SKUs having V-Cache while non-X don't be a possibility?

It doesn't really make sense, since AMD already confirmed Zen 5 vanilla and Zen 5 3D-stacked variants. So unless they just have a lineup without L3 at all (which I HIGHLY doubt), I don't think 3D-stacked L3 cache is going to be the standard on Zen 5.
It doesn't really make sense, since AMD already confirmed Zen 5 vanilla and Zen 5 3D-stacked variants. So unless they just have a lineup without L3 at all (which I HIGHLY doubt), I don't think 3D-stacked L3 cache is going to be the standard on Zen 5.
Unfortunately, no one has done a multitasking benchmark comparison of the 5800X and 5800X3D. I think the multitasking experience on the 5800X3D would be considerably improved, as more application data would fit in the larger L3 cache.

People keep acting as though V-Cache is a magical solution to everything, when in reality it does very little or nothing for most productivity workloads.
The main reason why V-Cache might become the new normal is this:

People keep acting as though V-Cache is a magical solution to everything, when in reality it does very little or nothing for most productivity workloads. Insisting that it be a part of every CPU is just making them more expensive for the people who won't benefit from the inclusion.
It's certainly good for games, particularly so for a small handful of titles where it's better than moving to a top-end GPU, and cloud providers find it valuable, but for anyone else it really depends on your workload and mix. Mark has said it's good for some of his compute workloads, but for a lot of other professional work it doesn't do much if anything.
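One way to make "it depends on your workload" concrete is a back-of-envelope average memory access time (AMAT) model. All latencies and miss rates below are illustrative round numbers I've assumed, not measured 5800X/5800X3D figures:

```python
# Toy AMAT model: hit latency + miss rate * miss penalty.
# Latencies (ns) and miss rates are assumed round numbers, not measurements.

def amat(l3_hit_ns, l3_miss_rate, dram_ns):
    """Average access time for requests that reach L3."""
    return l3_hit_ns + l3_miss_rate * dram_ns

L3_HIT, DRAM = 10.0, 80.0  # assumed L3 hit latency and DRAM miss penalty

# Cache-sensitive workload: 96 MB captures most of the working set.
game_32mb = amat(L3_HIT, 0.40, DRAM)
game_96mb = amat(L3_HIT, 0.10, DRAM)

# Streaming/productivity workload: working set dwarfs either cache.
stream_32mb = amat(L3_HIT, 0.95, DRAM)
stream_96mb = amat(L3_HIT, 0.90, DRAM)

print(f"cache-sensitive: {game_32mb:.0f} ns -> {game_96mb:.0f} ns")
print(f"streaming:       {stream_32mb:.0f} ns -> {stream_96mb:.0f} ns")
```

With these made-up numbers the cache-sensitive case improves dramatically while the streaming case barely moves, which matches the pattern the thread describes: big wins in a handful of titles, little to nothing elsewhere.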
Unfortunately, no one has done a multitasking benchmark comparison of the 5800X and 5800X3D. I think the multitasking experience on the 5800X3D would be considerably improved, as more application data would fit in the larger L3 cache.
The main reason why V-Cache might become the new normal is this:
IEDM 2022: Did We Just Witness The Death Of SRAM? — "While foundries continue to show strong logic transistor scaling, SRAM scaling has completely collapsed." (fuse.wikichip.org)
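The collapse the article describes can be put in rough numbers. The bitcell figures below are the approximate high-density SRAM cell sizes that have circulated in press coverage of TSMC's disclosures; treat them, and the wafer-price ratio, as assumptions rather than official specs:

```python
# Reported high-density SRAM bitcell sizes (um^2) from press coverage of
# TSMC process disclosures; approximate figures, not official specs.
bitcell = {"N7": 0.027, "N5": 0.021, "N3E": 0.021}

def sram_scaling(old, new):
    """Density gain moving from `old` to `new` (1.0 = no shrink)."""
    return bitcell[old] / bitcell[new]

print(f"N7 -> N5 : {sram_scaling('N7', 'N5'):.2f}x")   # still a real shrink
print(f"N5 -> N3E: {sram_scaling('N5', 'N3E'):.2f}x")  # flat: no shrink at all

# With flat bitcells and pricier wafers, SRAM cost per bit RISES on N3E.
wafer_price_ratio = 1.25  # assumed N3E/N5 wafer price ratio, illustrative
print(f"relative N3E SRAM cost/bit: {wafer_price_ratio / sram_scaling('N5', 'N3E'):.2f}x")
```

The arithmetic is trivial, but it is the whole argument: if the bitcell stops shrinking while wafer prices keep climbing, every MB of cache carried onto the new node costs strictly more than it did before.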
It's an interesting idea, but the misalignment between the amount of IO-die silicon and compute-die silicon might be a problem. Also, it means that for lower-tier SKUs, they can't really scale the L3 cache down. Though perhaps the biggest issue is that it would really constrain them if they wanted to reuse the IO die for another gen.

Now that the IO die is made on N6, an errant thought that's bouncing around in the back of my head: why not put L3 on it and stack the CPU on it? For more V-Cache, add the layers in between the CCD and IOD.
This would eliminate the fairly power-hungry and latency-adding interface between the IOD and the CCD, and allow making the L3 on the more economical N6 node, which is still quite competitive in SRAM density.
Of course, since it would mean every product is stacked, they would need a lot more manufacturing throughput. Probably not viable for Zen 5 yet.
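For scale, here is a rough sketch of the interface power being discussed. The pJ/bit figures are assumptions in the ballpark of publicly quoted numbers for organic-substrate die-to-die links and hybrid-bonded stacks, not AMD specifications:

```python
# Back-of-envelope link-energy comparison. The pJ/bit values are assumed
# ballpark figures, not vendor specifications.
PJ_PER_BIT = {
    "organic-substrate die-to-die": 2.0,   # often-cited order for Zen's IFOP link
    "3D hybrid bond": 0.05,                # order cited for stacked dies
}

def link_power_w(pj_per_bit, gbytes_per_s):
    """Watts burned just moving data across the interface."""
    bits_per_s = gbytes_per_s * 8e9
    return pj_per_bit * 1e-12 * bits_per_s

for name, energy in PJ_PER_BIT.items():
    print(f"{name}: {link_power_w(energy, 64):.3f} W at 64 GB/s")
```

Even with these rough inputs, the gap is well over an order of magnitude, which is why eliminating the IOD-CCD hop is attractive in the first place.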
The entire L3 need not move. They can shrink the L3 on the compute die to 16 or 8 MB and have a large stacked L3 cache die in the 96 MB range. This could allow more efficient die-area usage, still allow lower-end products without the stacked L3 die, allow more aggressive core transistor growth for a given die size, or allow more aggressive CCD shrinkage.

I'd speculated as much myself, but that's going to be a difficult thing to quantify, and the switching has to be frequent enough for people to care.
Unless the entire L3 cache can be moved, it wouldn't make a lot of sense. The L1 and L2 cache sizes don't really need to increase, because the added hit time probably wouldn't offset the reduction in miss rate. Moving those to another layer might not be as viable for timing reasons.
Just like IO which hasn't scaled well for a while now, the companies will just have to suck it up and accept that some parts of the die won't shrink. Smart designers will just use it as a way to help spread heat out more by moving hot spots close to the cache.
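The hit-time vs. miss-rate point can be sketched with simple average-memory-access-time arithmetic; all cycle counts here are illustrative assumptions:

```python
# Why L1 doesn't just grow: a larger L1 cuts the miss rate but adds hit
# latency. All cycle counts are illustrative assumptions.

def amat_cycles(hit, miss_rate, penalty):
    return hit + miss_rate * penalty

L2_PENALTY = 14  # assumed cycles to service an L1 miss from L2

small_l1 = amat_cycles(4, 0.05, L2_PENALTY)  # e.g. 32 KB: 4-cycle hit, 5% misses
big_l1   = amat_cycles(5, 0.04, L2_PENALTY)  # e.g. 64 KB: 5-cycle hit, 4% misses

print(f"small L1 AMAT: {small_l1:.2f} cycles")
print(f"large L1 AMAT: {big_l1:.2f} cycles")  # slower despite fewer misses
```

With these numbers the larger L1 loses outright: the extra cycle on every hit costs more than the modest miss-rate improvement saves, which is the argument for leaving L1/L2 alone and spending area elsewhere.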
This. They can also do this with Zen 4 if they wanted. That would actually give them room to add a third chiplet or more cores. I don't see AMD sticking with a 16-core limit when Intel is clearly willing to push things further.

The entire L3 need not move. They can shrink the L3 on the compute die to 16 or 8 MB and have a large stacked L3 cache die in the 96 MB range. This could allow more efficient die-area usage, still allow lower-end products without the stacked L3 die, allow more aggressive core transistor growth for a given die size, or allow more aggressive CCD shrinkage.
Now that the IO die is made on N6, an errant thought that's bouncing around in the back of my head: why not put L3 on it and stack the CPU on it? For more V-Cache, add the layers in between the CCD and IOD.
This would eliminate the fairly power-hungry and latency-adding interface between the IOD and the CCD, and allow making the L3 on the more economical N6 node, which is still quite competitive in SRAM density.
Of course, since it would mean every product is stacked, they would need a lot more manufacturing throughput. Probably not viable for Zen 5 yet.
The entire L3 need not move. They can shrink the L3 on the compute die to 16 or 8 MB and have a large stacked L3 cache die in the 96 MB range. This could allow more efficient die-area usage, still allow lower-end products without the stacked L3 die, allow more aggressive core transistor growth for a given die size, or allow more aggressive CCD shrinkage.
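Rough area arithmetic shows why this split is attractive. The mm²-per-MB densities below are illustrative placeholders for SRAM macro area, not measured values:

```python
# Rough area arithmetic for shrinking on-die L3 and stacking the rest.
# The mm^2-per-MB figures are illustrative placeholders, not measured values.
MM2_PER_MB = {"N5": 0.5, "N6": 0.7}

onchip_now   = 32 * MM2_PER_MB["N5"]   # full 32 MB on-die L3 today
onchip_small = 8 * MM2_PER_MB["N5"]    # shrunk 8 MB on-die slice
stacked_die  = 96 * MM2_PER_MB["N6"]   # 96 MB cache die on the cheaper node

ccd_area_saved = onchip_now - onchip_small
print(f"N5 CCD area freed for cores: {ccd_area_saved:.1f} mm^2")
print(f"N6 stacked cache die size:   {stacked_die:.1f} mm^2")
```

Under these assumptions, a double-digit chunk of expensive N5 silicon goes back to cores, while the bulk SRAM lands on a node where it is cheap; products without the stacked die simply ship with the small on-die slice.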
For me the writing is on the wall that exactly this will happen: cache on legacy processes.

The silicon cost ends up about the same either way, unless the V-Cache is on an older node that can integrate.
That's the whole point. SRAM doesn't scale further after N5 despite the higher cost. So V-Cache is bound to stay on N5, and as @LightningZ71 suggested, it may even be worth moving existing SRAM to V-Cache to save area on the newer nodes.

unless the v-cache is on an older node that can integrate.
Well, it doesn't scale for N3, and quite likely not even N2, but who knows what the future holds. I don't think SRAM scaling is completely dead.

SRAM doesn't scale further after N5 despite the higher cost.
GAA should improve it again; by how much remains to be seen. But until then, calling it dead is apt. SRAM not scaling makes it very expensive to keep at the same size, never mind increase in size. No more doubling of cache sizes on monolithic dies. Moving SRAM to a separate die on N5 or older is the only economical solution.

Well, it doesn't scale for N3, and quite likely not even N2, but who knows what the future holds. I don't think SRAM scaling is completely dead.
I think that SRAM cache dies will find their next long-term home on N4. While it doesn't offer much in the way of scaling, it should provide the needed switching speed to keep the L3 fast enough to be relevant. GAA should allow for a useful bump in total density, but it will likely not carry forward shrinking even more. At that node, it'll still likely be more efficient to move that die area to N4 stacked cache (for at least a sizeable chunk of it) and use the rest on compute logic.
I would not be shocked if, when backside power is available, the lowest cost per bit is still on some older process, despite the greatly increased density.
Where are you seeing that backside power delivery should double the density for SRAM?

Beyond GAA, backside power delivery is expected to approximately double SRAM density (because at that point the wiring will be the hard constraint by far, and backside power will approximately double the achievable density of logic wiring on the frontside). TSMC is now expecting that to show up in some N2 node (but not the first one).
However, backside power delivery will also add a lot of cost, because it will add a whole new flow of manufacturing steps, and a lot of new things that can hurt yield.
I would not be shocked if, when backside power is available, the lowest cost per bit is still on some older process, despite the greatly increased density.
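That cost-per-bit claim is easy to sanity-check. The wafer prices, usable wafer area, and densities below are placeholder assumptions, not foundry quotes:

```python
# Cost-per-bit sanity check for keeping cache dies on an older node.
# Wafer prices and densities are placeholder assumptions, not quotes.

def cost_per_mb(wafer_cost_usd, mb_per_mm2, wafer_area_mm2=70000):
    """Dollars per MB of SRAM, ignoring yield and dicing overhead."""
    usable_mb = mb_per_mm2 * wafer_area_mm2
    return wafer_cost_usd / usable_mb

old = cost_per_mb(10000, 1.5)   # mature node: cheap wafers, lower density
new = cost_per_mb(25000, 3.0)   # BSPD node: ~2x density, far dearer wafers

print(f"old node: ${old:.4f}/MB, new node: ${new:.4f}/MB")
print("older node still cheaper per bit" if old < new else "new node wins")
```

With these inputs, doubling the density does not make up for a 2.5x wafer price, so the cheapest bits stay on the mature node; the conclusion obviously flips if the price premium is smaller, which is exactly the uncertainty being discussed.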
Yeah, reading this 'news', it looks like on-die L3$ will wind up going the way of the dodo in a few node cycles. So stacked L3$ will rule the day then. I wonder if memory-side caches will come in to alleviate some of the latency issues in the whole memory system (or an L4$, either of them on the IOD). I don't know if memory-side caches need to be snoopable; it seems like it wouldn't be worth it if that was the case.

Sure, but even if that's the case, you will still benefit for the lower cache levels that must remain on-chip. Sure, maybe it is cheaper to stack an L3 that takes 3x as much silicon area per bit in an older process on top of an N2+ BPR die, in exchange for increased latency.
Yeah, reading this 'news', it looks like on-die L3$ will wind up going the way of the dodo in a few node cycles. So stacked L3$ will rule the day then. I wonder if memory-side caches will come in to alleviate some of the latency issues in the whole memory system (or an L4$, either of them on the IOD). I don't know if memory-side caches need to be snoopable; it seems like it wouldn't be worth it if that was the case.