Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games

Kedas

Well, now we know how they will bridge the long wait until Zen 4 on AM5 in Q4 2022.
Production of V-Cache starts at the end of this year, which is too early for Zen 4, so this is certainly coming to AM4.
Lisa said +15% is "like an entire architectural generation".
 

Joe NYC

AMD can't use N6 if they want to do stacking because the TSMC tech is N7 on N7 or N5 on N5

I think that for stacking purposes, N6 might be considered an N7 variant.

And when N5 on N5 becomes available, I think N5 on N7 and N7 on N5 stacking can also be used. But I am sure we will get some progress updates from TSMC along the way. We are still approx. one year away from N5 stacking.
 

Gideon

I think that for stacking purposes, N6 might be considered an N7 variant.

And when N5 on N5 becomes available, I think N5 on N7 and N7 on N5 stacking can also be used. But I am sure we will get some progress updates from TSMC along the way. We are still approx. one year away from N5 stacking.

Yeah, I'm 99% sure 6nm is equivalent to 7nm in terms of stacking; they are already really similar. TSMC predicts that over 50% of 7nm users will pivot to 6nm. No way they'll just shut them out of stacking.
 
May 17, 2020
123
233
116
In the case of AMD stacking SRAM on top of the L3 on the CCD, the stacked SRAM and the L3 should have the same surface area, so both should be made on the same process. How will the TSV points match between the bottom of the SRAM and the L3 cache on the CCD if they are on different processes? Maybe TSMC will bring N6 on N6 too...
 

Joe NYC

In the case of AMD stacking SRAM on top of the L3 on the CCD, the stacked SRAM and the L3 should have the same surface area, so both should be made on the same process. How will the TSV points match between the bottom of the SRAM and the L3 cache on the CCD if they are on different processes? Maybe TSMC will bring N6 on N6 too...

The TSVs have defined, fixed positions, and these must stay consistent even if other areas of the die shrink.

As for the L3 SRAM, I think it is nearly certain to be on N6, even if the Zen 3 CCD stays on N7. But it would be nice if AMD transitioned the Zen 3 CCDs to N6 as well.
 

biostud

I think I am going to disagree here.

AMD specifically emphasized gaming performance. AMD is now fully on board with keeping that crown. The best gaming performance could be obtained from a single-chiplet 8-core CPU with a large L3 cache.

2 reasons:
- more power available to each core in an 8-core vs. a 16-core config, which can lead to higher sustained turbo clocks
- 1 chiplet eliminates coherency traffic and some of the memory latency hit

The fact that the 5950 has a higher boost clock just means someone was not thinking things through fully.

The easiest way for AMD to keep the gaming performance crown is to release the highest-clocked 8-core CPU with 1-4 layers of V-Cache.

I think they might release an 8 core if there is a gap in the market.
 

Topweasel

There's no guarantee that Genoa will have vcache options at launch. It's entirely possible those come later.
You may be right. But I am assuming there is a cache increase in Genoa anyway, and I don't think they need to race themselves on cache sizes at this point (they need to stay ahead, but they already are by miles). It's more likely that AMD would launch it as a second or third batch of SKUs, 6 months or more into Genoa's 2+ years on offer (especially if Zen 5 is big.LITTLE and not meant for enterprise), rather than 2 years into Milan. I think it's for Ryzen and maybe TR at first, RDNA3 second, then Genoa. That gives two to three opportunities and use cases to make sure it works and yields are good before trying to apply it to your server product.
 

naukkis

You may be right. But I am assuming there is a cache increase in Genoa anyway, and I don't think they need to race themselves on cache sizes at this point (they need to stay ahead, but they already are by miles). It's more likely that AMD would launch it as a second or third batch of SKUs, 6 months or more into Genoa's 2+ years on offer (especially if Zen 5 is big.LITTLE and not meant for enterprise), rather than 2 years into Milan. I think it's for Ryzen and maybe TR at first, RDNA3 second, then Genoa. That gives two to three opportunities and use cases to make sure it works and yields are good before trying to apply it to your server product.

They don't need to be massively ahead, but if they could be with some product, they could sell it for an arm and a leg. Sure, staying ahead isn't AMD's main priority, but making buttloads of money is...
 

Zucker2k

They don't need to be massively ahead, but if they could be with some product, they could sell it for an arm and a leg. Sure, staying ahead isn't AMD's main priority, but making buttloads of money is...
I don't think the two are mutually exclusive in this instance.
 

naukkis

I don't think the two are mutually exclusive in this instance.

I mean that a gigabytes-of-SRAM version of any product is a niche product for special cases where customers will happily pay 50K+ per chip to get it. So AMD won't delay such a product, but will make and sell it as soon as they can.
 

Topweasel

They don't need to be massively ahead, but if they could be with some product, they could sell it for an arm and a leg. Sure, staying ahead isn't AMD's main priority, but making buttloads of money is...

But they are already ahead by a huge margin: 3x more cache than Intel's glued CPU, and more in one socket than in a whole 2S system. They could have half a gig of cache in a 2S system right now. So unless a company signed a contract saying they would buy X units of Milan with 1GB of L3 for insane amounts of money, AMD wouldn't do it. The margin there is pretty damn great, but the cost of getting that out the door, testing- and dev-wise, would be pretty high for a CPU that would otherwise already be selling (again, Milan is sooooooo far ahead that if it mattered, these guys are already purchasing Milan for it). So yeah, semi-custom with a customer buying thousands at 50k-100k a pop, I could buy it. But I don't think it would have been a normal part of the dev plan since, again, considering Milan's lead, Milan3D would just be cannibalizing sales.
 

jpiniero

Milan-X could be a bit of a trial, see how much of an effort it would be to do the stacked cache at volume.
 

dr1337

But I don't think it would have been a normal part of the dev plan since, again, considering Milan's lead, Milan3D would just be cannibalizing sales.

The server market moves so slowly, though; Rome still makes up the majority of their sales. I'm not sure they would cannibalize sales at all by announcing faster SKUs. And after all, there is no such thing as a free lunch: the extra cache will help many workloads, but it will require more power, if not exceptional binning or lowered clocks. Also, don't forget it will absolutely add cost over regular Milan. Not all customers are going to have that need, and I assume most will keep buying SKUs from the mainstream enterprise lineup.
 

TBytemaster

Lurker here chiming in, forgive me if this thought is silly.

Might it be that Zen 3 with stacking is coming to AM5? AMD demonstrated/showed off a sample on an AM4 package (at least as far as I am aware), but would there be anything major stopping AMD from using a new AM5/DDR5 IO die with these stacked Zen 3 chiplets?

Maybe even launch it on AM4 first and then AM5 shortly after to get the new platform rolling? Or both at once?

Could also help smooth out any DDR5 or other platform teething issues ahead of Zen 4 by getting early adopters/enthusiasts to stress test AM5.

(Edit for grammar/clarity)
 

Joe NYC

The server market moves so slowly, though; Rome still makes up the majority of their sales. I'm not sure they would cannibalize sales at all by announcing faster SKUs. And after all, there is no such thing as a free lunch: the extra cache will help many workloads, but it will require more power, if not exceptional binning or lowered clocks. Also, don't forget it will absolutely add cost over regular Milan. Not all customers are going to have that need, and I assume most will keep buying SKUs from the mainstream enterprise lineup.

There is no rush to announce faster Milan-X SKUs now.

But AMD should release them to coincide with the release of Sapphire Rapids.
 

Joe NYC

Lurker here chiming in, forgive me if this thought is silly.

Might it be that Zen 3 with stacking is coming to AM5? AMD demonstrated/showed off a sample on an AM4 package (at least as far as I am aware), but would there be anything major stopping AMD from using a new AM5/DDR5 IO die with these stacked Zen 3 chiplets?

Maybe even launch it on AM4 first and then AM5 shortly after to get the new platform rolling? Or both at once?

Could also help smooth out any DDR5 or other platform teething issues ahead of Zen 4 by getting early adopters/enthusiasts to stress test AM5.

AM5 motherboards are only coming out in mid-2022, so a little late for Zen 3D. Also, one of the good features of Zen 3D will be socket compatibility (for upgraders) and platforms that don't carry a super high premium.

AM5 motherboards will start at a higher price, and DDR5 will add additional cost to new builds.

But there are some rumors about future Zen 3 APUs that will support DDR5. The problem with existing APUs is that they can't accommodate stacking. Future ones? We will see...
 

jamescox

Lurker here chiming in, forgive me if this thought is silly.

Might it be that Zen 3 with stacking is coming to AM5? AMD demonstrated/showed off a sample on an AM4 package (at least as far as I am aware), but would there be anything major stopping AMD from using a new AM5/DDR5 IO die with these stacked Zen 3 chiplets?

Maybe even launch it on AM4 first and then AM5 shortly after to get the new platform rolling? Or both at once?

Could also help smooth out any DDR5 or other platform teething issues ahead of Zen 4 by getting early adopters/enthusiasts to stress test AM5.

(Edit for grammar/clarity)
This has been brought up previously. If the Zen 4 IO die uses the same IFIS (SerDes), then it should be backward compatible. It is like PCI Express backward compatibility; there wouldn’t be any technological reason why they wouldn’t be able to make such a part. It might be useful to get AM5 shipping before Zen 4 is available. Would you buy it, though? With Zen 4 likely coming soon after it, there doesn’t seem to be much reason to make it.

The Zen 4 IO die might be incompatible anyway, especially if it uses embedded silicon bridges (LSI) for the IO die to CCD connections. There are quite a few different reasons for choosing either SerDes connections or LSI, so it is hard to tell.
 

jamescox

My completely wild guess would be that there would be a 6950X and a 6800X, in 2x8 and 1x8 core versions respectively, and no 6-core version (as was demonstrated by Lisa Su).

Because Zen 3D stacking is based on known good dies, why waste stacking on a die that has only 6 good cores if there are plenty of dies with 8 good cores?

They might use one cache die all the way down to 6-core parts. Anything lower should probably just be an APU anyway. The cache dies are probably relatively cheap. Each is only 36 mm2, which is tiny. It probably doesn’t use very many metal layers, and it might be made on N6 for actual volume production. That will use more EUV, which avoids double and quad patterning (which multiplies the number of mask and process steps by 2 or 4 for some layers). They may be making CCDs with stacked cache all the way down to 1 active core per CCD for Epyc, like the current 72F3 (8 cores, 256 MB of cache). I kind of hope that we get a full N6 version for Zen 3D. I still doubt that we will see 4-high stacking in anything other than very high-end Epyc. The full 288 MB per CCD version, if it exists, would have 2304 MB of total L3 cache, which is truly ridiculous.
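
The cache totals above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes 32 MB of on-die L3 per Zen 3 CCD, 64 MB per stacked cache die, and 8 CCDs per Epyc package, and the function names are made up for illustration.

```python
# Rough L3 totals for stacked-cache configurations.
# Assumptions (not official specs for any announced product):
#   32 MB of L3 on each Zen 3 CCD, 64 MB per stacked cache die,
#   8 CCDs in a full Epyc package.

BASE_L3_MB = 32
STACK_DIE_MB = 64
EPYC_CCDS = 8


def l3_per_ccd(layers: int) -> int:
    """L3 in MB for one CCD carrying `layers` stacked cache dies."""
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return BASE_L3_MB + layers * STACK_DIE_MB


def l3_per_package(layers: int, ccds: int = EPYC_CCDS) -> int:
    """Total L3 in MB for a package with `ccds` identical CCDs."""
    return ccds * l3_per_ccd(layers)


for layers in (0, 1, 4):
    print(f"{layers} layer(s): {l3_per_ccd(layers)} MB per CCD, "
          f"{l3_per_package(layers)} MB per {EPYC_CCDS}-CCD package")
```

With no stacking this gives the 256 MB of an 8-CCD part like the 72F3, and with 4 layers it gives the 288 MB per CCD and 2304 MB per package mentioned above.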
 

LightningZ71

I suspect that there is still space in the SKU stack for a 12-core part and an 8-core part. It makes sense to me to bin the CCDs for stacking on thermal efficiency: the stacking will likely reduce the thermal dissipation rate of the CCDs, so binning the dies for the lowest power consumption at the target all-core boost, so that they can achieve a clock bump over the existing parts while also getting the L3 stack, seems doable. I think that the 12-core part could be particularly interesting, as the individual cores could boost higher together, and each CCD's 96MB of L3 (32MB plus the 64MB stack) will give 16MB of L3 per core if the cache is being thrashed. On the desktop, and currently on low-end HEDT, the 12-core part will be unparalleled in cache per core and likely in all-core clocks.
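
To put numbers on the cache-per-core point, here is a small sketch. It assumes 32 MB of base L3 per CCD, one 64 MB stacked die, and that a 12-core part is built from two CCDs with 6 active cores each; `l3_per_core_mb` is a hypothetical helper, and the result is just an even split of a shared cache, not a hardware partition.

```python
# Average L3 per active core within one CCD.
# Assumptions: 32 MB base L3 per CCD, 64 MB per stacked cache die,
# and the L3 shared evenly by the active cores of that CCD.

def l3_per_core_mb(active_cores_per_ccd: int, layers: int = 1,
                   base_mb: int = 32, stack_mb: int = 64) -> float:
    """Even share of a CCD's L3 (in MB) per active core."""
    if active_cores_per_ccd <= 0:
        raise ValueError("need at least one active core")
    return (base_mb + layers * stack_mb) / active_cores_per_ccd


# Hypothetical SKU layouts: name -> active cores per CCD.
layouts = {"8-core (1x8)": 8, "12-core (2x6)": 6, "16-core (2x8)": 8}

for name, cores in layouts.items():
    plain = l3_per_core_mb(cores, layers=0)
    stacked = l3_per_core_mb(cores, layers=1)
    print(f"{name}: {plain:.1f} MB/core without V-Cache, "
          f"{stacked:.1f} MB/core with one layer")
```

Under these assumptions the 6-cores-per-CCD layout comes out highest per core, which is the point being made about the 12-core part.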
 

bigboxes

Well pretty much yeah. Except it was the 1.13GHz model that was recalled, not the 1GHz. Small detail, not trying to nitpick just wanted to clarify in case someone tried to go search for info on it.

AMD did beat Intel to 1GHz by a matter of a week or two maybe, but the Pentium 3 model was a paper launch whereas you could get the Athlon 1 GHz.

Kyle Bennett, there's a name I haven't heard in a while. He was right up there with Anand. AnandTech was involved in the recalled P3 as well. With so many reputable sources confirming the problem, Intel had no choice but to go into damage control mode.

I was here for all that. Using Anand's advice, I built my first PC with the Athlon 1200 (Thunderbird). That was a blazing fast PC for its day. I had 768MB of RAM (3 x 256).
 

Kedas

Yeah, I'm 99% sure 6nm is equivalent to 7nm in terms of stacking; they are already really similar. TSMC predicts that over 50% of 7nm users will pivot to 6nm. No way they'll just shut them out of stacking.

About stacking different nodes: I don't see an obvious problem with doing that. The contacts are on a 900nm-pitch metal layer, and I'm pretty sure that can be made on all the latest process nodes, even 12nm. The only 'problem' that I could see is differing transistor properties, when you want them to be as equal as possible on both ends of the wire, but you can account for that in your design.
 

jamescox

I suspect that there is still space in the SKU stack for a 12-core part and an 8-core part. It makes sense to me to bin the CCDs for stacking on thermal efficiency: the stacking will likely reduce the thermal dissipation rate of the CCDs, so binning the dies for the lowest power consumption at the target all-core boost, so that they can achieve a clock bump over the existing parts while also getting the L3 stack, seems doable. I think that the 12-core part could be particularly interesting, as the individual cores could boost higher together, and each CCD's 96MB of L3 (32MB plus the 64MB stack) will give 16MB of L3 per core if the cache is being thrashed. On the desktop, and currently on low-end HEDT, the 12-core part will be unparalleled in cache per core and likely in all-core clocks.

Given the “chonky” package renderings that have been showing up, which may be completely bogus, I am wondering if the chiplet-based parts will actually have an integrated vapor chamber. Spreading the chips apart rather than going monolithic helps with the thermal density, but we are talking about a lot of power out of a very small chip at 5 nm. Perhaps they will have APUs with just a cheap lid.

The stacked cache chips sit on top of the existing cache. The die has been thinned down, with some silicon spacers added on top, but the total thickness is the same, so the heat has the same amount of material to transfer through. There is an extra thermal interface, but the silicon will be exceptionally flat, so I don’t think it matters. The thermals shouldn’t be much worse.
 

LightningZ71

While I agree that they have taken measures to aid thermal transfer, the 64MB of L3 cache is not free with respect to energy dissipation. In addition, while the thermal interface between the stacked dies will be highly optimized, it is not perfect. I think that the CCDs will have to be very carefully binned.

I would like to be surprised and have AMD use N6 for these CCDs. The reduction in energy draw and additional performance available on that node would likely be enough to push Zen 3D into a performance leadership position against Alder Lake's best parts.
 

Joe NYC

They might use one cache die all the way down to 6-core parts. Anything lower should probably just be an APU anyway. The cache dies are probably relatively cheap. Each is only 36 mm2, which is tiny. It probably doesn’t use very many metal layers, and it might be made on N6 for actual volume production. That will use more EUV, which avoids double and quad patterning (which multiplies the number of mask and process steps by 2 or 4 for some layers). They may be making CCDs with stacked cache all the way down to 1 active core per CCD for Epyc, like the current 72F3 (8 cores, 256 MB of cache). I kind of hope that we get a full N6 version for Zen 3D. I still doubt that we will see 4-high stacking in anything other than very high-end Epyc. The full 288 MB per CCD version, if it exists, would have 2304 MB of total L3 cache, which is truly ridiculous.

I think it comes down to packaging cost.

By some calculations, the price of a normal 36mm2 N7 logic chip is estimated to be $6. If you take into consideration the smaller number of metal layers in SRAM and the fewer processing steps on the N6 node, the price can drop even lower, to an almost trivial amount.

As far as capacity goes, wafer capacity is heavily influenced by how quickly a wafer makes it through the fab. If the SRAM wafer makes it through the fab twice as fast, then the wafer capacity doubles.

TSMC has this Wafer on Wafer stacking technology. So, in theory, TSMC can stack and bond 4 wafers, so that some 1600 individual SRAM dies get bonded all at once, and the SRAM would then already be in stacks of 4.
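
For anyone who wants to sanity-check the die counts, here is a rough sketch. It assumes a standard 300 mm wafer and uses a common textbook approximation for gross dies per wafer that ignores scribe lines, edge exclusion, and defects; the 36 mm2, $6, and 1600 figures are the ones quoted in this thread, and everything else is an illustrative assumption.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300 mm wafer
DIE_AREA_MM2 = 36.0        # cache die area quoted in the thread
DIE_PRICE_USD = 6.0        # per-die estimate quoted in the thread
QUOTED_DIES = 1600         # "some 1600" dies per wafer quoted in the thread


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Approximate count of whole dies on a round wafer.

    First term: wafer area divided by die area.
    Second term: correction for partial dies lost along the edge.
    Scribe lines, edge exclusion, and defects are ignored.
    """
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    radius = wafer_diameter_mm / 2.0
    by_area = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(by_area - edge_loss)


gross = gross_dies_per_wafer(DIE_AREA_MM2)
print(f"gross dies per wafer (approximation): {gross}")
print(f"quoted usable dies as a share of gross: {QUOTED_DIES / gross:.0%}")
print(f"value of {QUOTED_DIES} dies at ${DIE_PRICE_USD:.0f} each: "
      f"${QUOTED_DIES * DIE_PRICE_USD:,.0f}")
```

The approximation lands somewhat above 1600, so the quoted figure is plausible once scribe lines and yield are taken into account.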
 

Kedas

I would like to be surprised and have AMD use N6 for these CCDs. The reduction in energy draw and additional performance available on that node would likely be enough to push Zen 3D into a performance leadership position against Alder Lake's best parts.

7nm to 6nm is a production improvement, not a product improvement.
So they get more dies for less cost, not better die performance
(unless you have some TSMC info that claims otherwise).

That doesn't mean they cannot tweak the design to improve it.