Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games

Page 61

Kedas

Senior member
Dec 6, 2018
355
339
136
Well we know now how they will bridge the long wait to Zen4 on AM5 Q4 2022.
V-Cache production starts at the end of this year, which is too early for Zen4, so this is certainly coming to AM4.
Lisa said +15% is "like an entire architectural generation".
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
CoW makes more sense in this situation. AMD also doesn't have to dice before binning at this point, I don't think (not a digital guy so not sure what all they do but I know it's possible). They can use wafer probers with some basic tests to know which dies are which corners before stacking or dicing. They would still thin the wafers though so all the dies would need to either go into a V-cache SKU at that point, or be stacked with just the structural die to make up for the thinned die but I doubt AMD wants to take that path at this point.

I don't see why the CPU would need to be thinned to 20 microns, that's approaching the thickness of what the chip has to have to support something with 9-10 metal layers and is less than what TSMC needs to get a 12 level stack. I don't know if AMD is already thinning their wafers from the ~700 micron default from the fab, but it is extremely rare for companies to go less than ~250 micron or so. Handling the dies becomes a real pain after that. That would mean AMD needs to thin to ~100 micron to fit another layer on top and have the same height as their standard die. I have a feeling they start with a thicker standard die though.
This has been discussed before, either in this thread or zen 4 / Ryzen 7000 thread. Someone had a link for the 20 micron figure. I can’t find it right now. What you describe is not how it works. The thickness of the cpu die for stacking is not determined by the z-height requirement. The thickness for the cache die is likely determined by hitting the z-height requirement though.

The thickness isn’t just required height divided by the number of stacks. For the v-cache part, the cpu die must have TSVs for connection to the cache chip. The cache chip doesn’t need TSVs for a single stack high, so it can just be polished down to meet the z-height requirement. TSVs are made by etching holes in the wafer and filling with metal. To make a 250 micron thick die with TSVs they would need to etch holes over 250 microns deep. That might be on the edge of possible with current tech, but would likely not yield well at all.
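To put rough numbers on the etch-depth argument, here is a minimal sketch. It assumes a hypothetical via diameter of 5 microns, which is not an AMD or TSMC figure; only the 250 and 20 micron die thicknesses come from this thread, and the 50 micron case is the per-layer figure mentioned in the edit below.

```python
def tsv_aspect_ratio(depth_um, diameter_um):
    """Depth-to-diameter ratio of a via that must be etched depth_um deep."""
    return depth_um / diameter_um


# ASSUMPTION: hypothetical via diameter, chosen only for illustration.
ASSUMED_DIAMETER_UM = 5.0

# The via has to be at least as deep as the final (thinned) die is thick.
for die_um in (250, 50, 20):
    ratio = tsv_aspect_ratio(die_um, ASSUMED_DIAMETER_UM)
    print(f"{die_um:>3} um die -> aspect ratio {ratio:.0f}:1")
```

For a fixed diameter the aspect ratio scales linearly with die thickness, which is why a thinner die makes the etch so much easier.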

So, current v-cache parts are likely super thin cpu die with the rest of the thickness from the cache die and filler silicon. The stack is likely made with CoW tech; cpu wafer is very thin, cache chiplets and filler silicon is thicker and already diced. If they go for more than 1 cache die stacked, then the cache die would probably be made with WoW (wafer on wafer) tech. They would stack something like 4 cache wafers, bond, then dice into chiplets. The cache chiplets would then be put into a carrier wafer with filler silicon and stacked on top of the cpu wafer (CoW) and then diced.

edit: this says less than 50 micron per layer:

Still can’t find the older discussion.

edit 2: Also, they would never handle die at 20 micron thickness. The cpu wafer would possibly be that thickness, but it would have a thick cache chip and filler silicon stacked on top before it is diced.
 

Schmide

Diamond Member
Mar 7, 2002
5,712
978
126

Saylick

Diamond Member
Sep 10, 2012
3,923
9,142
136
Dude, premium CPUs command premium prices. The 1800X was launched at $499 and was not even the top dog at gaming. Intel has always charged a pretty penny for top of the line (see 11700K vs 11900K).

The 5800X3D will launch at $500 or more; it's a halo product for extreme gamers. This is not for people worrying about budgets, like many of you do.
Just because it sits on top of AMD's internal stack doesn't mean it can command halo prices. It's still subject to the market forces. A halo product needs to be at the top of the stack, including a competitor's stack, to command halo prices. If your top SKU is only as fast as your competitor's middle level SKU, guess what, you're going to have to limit your asking price to that competitor's middle level SKU price. More importantly, that mid-level SKU ain't going to have halo level pricing.

The question remains: does the 5800X3D offer halo-level performance to justify halo-level pricing? For gaming, it might perform similarly to, say, a 12700K (MSRP $409), but for non-gaming workloads it may be less impressive. I think Intel's Alder Lake offers a lot of gaming performance for the dollar, and the 5800X3D needs a price that reflects its gaming performance relative to Intel's stack, not AMD's own.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
So far this is the list of games with 15%+ performance improvements (3D V-Cache) over a stock 5900X and 12900KS Alder Lake. Pay attention to the Tie/Even games: where it performs the same as the 12900K, there must be a frame cap or something. Also, in CS:GO all three CPUs tie (the 5900X, 12900K and 5800X3D), so the issue is with the game.

DOTA 2: 18% Over 5900X - No info on 12900K
Monster Hunter World: 25% over 5900X - No info on 12900K
League of Legends: 4% Over 5900X - No info on 12900K
Fortnite: 17% Over 5900X - No info on 12900K
Final Fantasy XIV: 20% Over 5900X - 20% Over 12900K
Shadow of the Tomb Raider: 10% Over 5900X - 10% Over 12900K
Far Cry V: 20% Over 5900X - 11% Over 12900K
Gears V: 12% Over 5900X - Tie/Even with 12900K
Watch Dogs Legion: 40% Over 5900X - Tie/Even with 12900K
CS:GO: Tie/Even with 5900X - Tie/Even with 12900K
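As a quick sanity check of the list against the "+15% on average" claim, the figures above can be averaged. This is only a sketch: it treats "Tie/Even" as 0% and uses a plain arithmetic mean, which may not be how AMD computed its average.

```python
from statistics import mean

# Uplift of the 5800X3D over the 5900X in percent, copied from the list above.
# "Tie/Even" is recorded as 0 (an assumption made for this sketch).
uplift_vs_5900x = {
    "DOTA 2": 18,
    "Monster Hunter World": 25,
    "League of Legends": 4,
    "Fortnite": 17,
    "Final Fantasy XIV": 20,
    "Shadow of the Tomb Raider": 10,
    "Far Cry V": 20,
    "Gears V": 12,
    "Watch Dogs Legion": 40,
    "CS:GO": 0,
}


def average_uplift(results):
    """Arithmetic mean of the per-game uplift percentages."""
    return mean(list(results.values()))


def games_at_or_above(results, threshold):
    """Names of the games whose uplift is at least `threshold` percent."""
    return sorted(game for game, pct in results.items() if pct >= threshold)


print(f"average uplift: {average_uplift(uplift_vs_5900x):.1f}%")
print("15% or more:", games_at_or_above(uplift_vs_5900x, 15))
```

Six of the ten titles clear the 15% mark, and the simple mean lands a little above it.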
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The question remains: does the 5800X3D offer halo level performance to justify halo level pricing? For gaming, it might perform similarly to say a 12700K (MSRP $409)
What are you on about? The 5800X3D beats the 12900K at games. The 12700K is an afterthought.
 

Saylick

Diamond Member
Sep 10, 2012
3,923
9,142
136
What are you on about? The 5800X3D beats the 12900K at games. The 12700K is an afterthought.
The 12900K is a few percent faster than the 12700K for like $200 more (+50% price). According to your logic, if Intel priced the 12900K at $10000, would that make the 5800X3D a $10000 product?
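The price/performance point can be sketched with the numbers quoted in this exchange. The $409 MSRP and the $200 gap come from the posts above; reading "a few percent" as 3% is an assumption made only for this example.

```python
def price_premium(base_price, extra):
    """Fractional price increase when paying `extra` on top of `base_price`."""
    return extra / base_price


def value_ratio(perf_gain, price_gain):
    """Performance per dollar of the dearer part relative to the cheaper one."""
    return (1 + perf_gain) / (1 + price_gain)


msrp_12700k = 409         # USD, as quoted earlier in the thread
extra_for_12900k = 200    # USD, "like $200 more"
assumed_perf_gain = 0.03  # ASSUMPTION: "a few percent" taken as 3%

premium = price_premium(msrp_12700k, extra_for_12900k)
print(f"price premium: {premium:.0%}")
print(f"perf per dollar vs 12700K: {value_ratio(assumed_perf_gain, premium):.2f}")
```

Under these assumptions the 12900K delivers roughly 0.7x the gaming performance per dollar of the 12700K, which is the gap the pricing argument turns on.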
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
This has been discussed before, either in this thread or zen 4 / Ryzen 7000 thread. Someone had a link for the 20 micron figure. I can’t find it right now. What you describe is not how it works. The thickness of the cpu die for stacking is not determined by the z-height requirement. The thickness for the cache die is likely determined by hitting the z-height requirement though.

There are actually multiple ways of doing TSVs (mid, first, last); with two of those, the TSVs are implanted before the wafer is even finished. I was assuming that they would only thin the die as much as needed to have equal heights between the two layers, but one die could be significantly thinner than another.

The thickness isn’t just required height divided by the number of stacks. For the v-cache part, the cpu die must have TSVs for connection to the cache chip. The cache chip doesn’t need TSVs for a single stack high, so it can just be polished down to meet the z-height requirement. TSVs are made by etching holes in the wafer and filling with metal. To make a 250 micron thick die with TSVs they would need to etch holes over 250 microns deep. That might be on the edge of possible with current tech, but would likely not yield well at all.

So, current v-cache parts are likely super thin cpu die with the rest of the thickness from the cache die and filler silicon. The stack is likely made with CoW tech; cpu wafer is very thin, cache chiplets and filler silicon is thicker and already diced. If they go for more than 1 cache die stacked, then the cache die would probably be made with WoW (wafer on wafer) tech. They would stack something like 4 cache wafers, bond, then dice into chiplets. The cache chiplets would then be put into a carrier wafer with filler silicon and stacked on top of the cpu wafer (CoW) and then diced.

TSVs don't have to go all the way through. From what I've seen, I don't know how they would connect the cache to the CPU TSVs without having TSVs on the cache die. You can do TSV to bumps, but the ball pitch is significantly larger than the TSV pitch, and I don't see how they could connect all of the Zen3 TSVs to bumps unless they are using multiple TSVs for the same net to connect to 1 bump with some kind of interposer, which I haven't seen done before. Either that or AMD is using far fewer TSVs than it appears and they are spread out across the cache area. If both dies do need TSVs then it makes more sense to keep them at least decently close in height. Doing 200-300 micron deep TSVs can be done, though I'm not sure what the limits are given the TSV pitch AMD is using.

If they do go for multiple stacks or ever had it as a possibility, then the cache dies definitely have to have TSVs. I agree that CoW for the CPU layer makes the most sense with dicing after bonding.

edit: this says less than 50 micron per layer:

Still can’t find the older discussion.

That's for a 12 high stack where it has to be that thin to get that many layers.

edit 2: Also, they would never handle die at 20 micron thickness. The cpu wafer would possibly be that thickness, but it would have a thick cache chip and filler silicon stacked on top before it is diced.

The handling comment was for their standard die thickness, not for the stacked dies.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
There are actually multiple ways of doing TSVs (mid, first, last); with two of those, the TSVs are implanted before the wafer is even finished. I was assuming that they would only thin the die as much as needed to have equal heights between the two layers, but one die could be significantly thinner than another.



TSVs don't have to go all the way through. From what I've seen, I don't know how they would connect the cache to the CPU TSVs without having TSVs on the cache die. You can do TSV to bumps, but the ball pitch is significantly larger than the TSV pitch, and I don't see how they could connect all of the Zen3 TSVs to bumps unless they are using multiple TSVs for the same net to connect to 1 bump with some kind of interposer, which I haven't seen done before. Either that or AMD is using far fewer TSVs than it appears and they are spread out across the cache area. If both dies do need TSVs then it makes more sense to keep them at least decently close in height. Doing 200-300 micron deep TSVs can be done, though I'm not sure what the limits are given the TSV pitch AMD is using.

If they do go for multiple stacks or ever had it as a possibility, then the cache dies definitely have to have TSVs. I agree that CoW for the CPU layer makes the most sense with dicing after bonding.



That's for a 12 high stack where it has to be that thin to get that many layers.



The handling comment was for their standard die thickness, not for the stacked dies.

TSVs have to go all the way through. It is right in the name (Through Silicon Via). Also, TSMC SoIC doesn’t use bumps at all. It is direct contact through the metal of the TSV.


The cpu die will be on the order of 20 microns thick. As I have said before (somewhere), to make the TSVs, they etch into the wafer and fill the holes with metal. Then they build the normal device layers and metal layers on top of that. Once finished, it is flipped over and polished down from what was the bottom of the wafer to expose the TSVs that were etched from the top. As I said, for a single layer of cache, the cache die doesn’t need any TSVs since there is nothing that will be stacked on top of the cache die. For the cache die, they can just make a normal wafer except with contact points on top to match the TSVs on the cpu. This is the same as a normal flip-chip except with contact points to match the TSVs on the cpu instead of IO pads. So they make the cache wafer, dice it into separate chips, flip it over, put it in a carrier wafer with a lot of other cache die/filler silicon, and then bond it to the CPU wafer. It would technically be bonded to the original bottom of the cpu wafer. The top of the cpu wafer will have the IO pads for connection to the substrate, like a normal flip-chip device.

See the hybrid bonding images here:


I have seen some SEM images of actual devices where the silicon thickness appears to be down to about 2x the thickness of the device plus metal layers.


Edit: found the previous discussion on page 44 of this thread. Joe NYC posted a video from AMD mentioning the 20 micron thickness.

 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
TSVs have to go all the way through. It is right in the name (Through Silicon Via). Also, TSMC SoIC doesn’t use bumps at all. It is direct contact through the metal of the TSV.


The cpu die will be on the order of 20 microns thick. As I have said before (somewhere), to make the TSVs, they etch into the wafer and fill the holes with metal. Then they build the normal device layers and metal layers on top of that. Once finished, it is flipped over and polished down from what was the bottom of the wafer to expose the TSVs that were etched from the top. As I said, for a single layer of cache, the cache die doesn’t need any TSVs since there is nothing that will be stacked on top of the cache die. For the cache die, they can just make a normal wafer except with contact points on top to match the TSVs on the cpu. This is the same as a normal flip-chip except with contact points to match the TSVs on the cpu instead of IO pads. So they make the cache wafer, dice it into separate chips, flip it over, put it in a carrier wafer with a lot of other cache die/filler silicon, and then bond it to the CPU wafer. It would technically be bonded to the original bottom of the cpu wafer. The top of the cpu wafer will have the IO pads for connection to the substrate, like a normal flip-chip device.

See the hybrid bonding images here:


I have seen some SEM images of actual devices where the silicon thickness appears to be down to about 2x the thickness of the device plus metal layers.
So it isn't a TSV at first, only when the die is thinned. ;)

Neither that article nor any other I've read says anything substantial on the bonding. They give the appearance of an explanation. We simply don't know how it's done.

I wonder if they etch away a thin layer of silicon leaving the Cu slightly (nm) elevated and then pressure fuse the dies in a vacuum.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
So it isn't a TSV at first, only when the die is thinned. ;)

Neither that article nor any other I've read says anything substantial on the bonding. They give the appearance of an explanation. We simply don't know how it's done.

I wonder if they etch away a thin layer of silicon leaving the Cu slightly (nm) elevated and then pressure fuse the dies in a vacuum.
It isn’t elevated. It is completely flat. See the video I just added to my previous post from the last discussion on this. It takes advantage of the fact that two flat pieces of the same metal will actually weld together in a vacuum. I have seen quite a bit about die stacking over the years and they do polish the wafer down so thin that they are actually floppy.

Edit: also, the first link I posted above literally has SEM images of some of the tech, so we know exactly what some of it looks like.
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
TSVs have to go all the way through. It is right in the name (Through Silicon Via).

They go through the substrate, they don't have to go through the full die. Some do, some don't and depending on the process, the via won't be really a TSV until the wafer is complete and then thinned.

[Image: Through-Silicon Via flavours]



Also, TSMC SoIC doesn’t use bumps at all. It is direct contact through the metal of the TSV.

I had forgotten about the direct bonding method. This way apparently allows for pitch matching the TSVs which is pretty great.


The cpu die will be on the order of 20 microns thick. As I have said before (somewhere), to make the TSVs, they etch into the wafer and fill the holes with metal. Then they build the normal device layers and metal layers on top of that. Once finished, it is flipped over and polished down from what was the bottom of the wafer to expose the TSVs that were etched from the top. As I said, for a single layer of cache, the cache die doesn’t need any TSVs since there is nothing that will be stacked on top of the cache die. For the cache die, they can just make a normal wafer except with contact points on top to match the TSVs on the cpu. This is the same as a normal flip-chip except with contact points to match the TSVs on the cpu instead of IO pads. So they make the cache wafer, dice it into separate chips, flip it over, put it in a carrier wafer with a lot of other cache die/filler silicon, and then bond it to the CPU wafer. It would technically be bonded to the original bottom of the cpu wafer. The top of the cpu wafer will have the IO pads for connection to the substrate, like a normal flip-chip device.

Yeah, the direct bonding makes the dual flip chip design work. I still have a hard time believing the die is 20 microns thick. The BEOL metal stack for a 10+ layer process is going to be ~15 um alone, and that doesn't even include the top layer they add for flip chips or the device implants/wells. Maybe for a process with relatively few, short metal layers, but a CPU wouldn't use such a stackup. I am willing to be wrong on this, but it's hard for me to see the math working out at 20 um. The more I think about it though, I agree that they will make the base die as thin as possible; they want to keep the TSV as short as possible to reduce any signal delays and parasitics associated with it.
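The thickness budget being questioned here is simple subtraction. The sketch below uses only the two figures from this exchange (a 20 micron die and a roughly 15 micron metal stack); both are forum estimates, not measured values.

```python
def remaining_silicon_um(die_um, metal_stack_um):
    """Thickness left for devices, wells and bulk silicon once the metal stack is counted."""
    return die_um - metal_stack_um


claimed_die_um = 20     # figure attributed to an AMD video earlier in the thread
est_metal_stack_um = 15 # estimate from the post above for a 10+ layer process

left = remaining_silicon_um(claimed_die_um, est_metal_stack_um)
print(f"{left} um left for everything below the metal layers")
```

If both estimates held, only about 5 microns would remain for everything else, which is the tension the post is pointing at.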
 

Mopetar

Diamond Member
Jan 31, 2011
8,436
7,631
136
What are you on about? The 5800X3D beats the 12900K at games. The 12700K is an afterthought.

The 5800X3D should probably come out slightly ahead of the 12900K on average, just based on AMD's claim of 15% over the 5800X and looking at relative figures. Using TPU's charts for 1080p gaming, the 5800X3D should be about 6.5% ahead. Tom's would put it 1.5% ahead.
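Relative figures like these combine by ratio rather than by subtraction. The sketch below back-computes what 12900K lead over the baseline chip the two projections imply; the review numbers themselves are not given in the post, so the results are an inference, not quoted data.

```python
def relative_lead(lead_a, lead_b):
    """Lead of chip A over chip B, given each chip's lead over a common baseline."""
    return (1 + lead_a) / (1 + lead_b) - 1


amd_claim = 0.15  # 5800X3D over the baseline chip, per AMD's claim

# Given the projected 5800X3D lead over the 12900K, solve for the 12900K's
# lead over the baseline. The same ratio formula works in both directions.
implied_tpu = relative_lead(amd_claim, 0.065)
implied_toms = relative_lead(amd_claim, 0.015)

print(f"TPU implies 12900K is {implied_tpu:.1%} ahead of the baseline")
print(f"Tom's implies 12900K is {implied_toms:.1%} ahead of the baseline")
```

Feeding an implied lead back into `relative_lead(amd_claim, implied_tpu)` returns the original 6.5%, which is a handy consistency check.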

However, from those same reviews the 12700K is really only 2% or so off of the 12900K in most games. Even the 12600K manages to hang in there pretty well and although I haven't seen any 12600 numbers yet, it shouldn't be much worse and for $220 it's going to be hard to beat. Frankly the more I look at it, the real competition for AMD is going to be the $400 12700K.

A $220 12600 is going to give people an option of getting on to Intel's new platform with a solid upgrade path in the future. Really a person doesn't need to do a completely new build since they could reuse their RAM, which makes sense given the benchmarks tend to show DDR4 performing better.

Depending how much AMD charges for the 5800X3D it may be close to the cost of a new board and a 12600 that doesn't perform much worse and with an option of upgrading the CPU a few years down the road. This makes even more sense for anyone who will be gaming at 1440p since that shifts the bottleneck back to the GPU in most cases and the 12900K is only about 3% better than a 12600K in the benchmarks.

I think it's the 12900K that's mostly irrelevant to the discussion, since most gamers can already get practically equivalent performance for almost $200 less. For anyone who doesn't mind a slightly larger drop, is running a mid-range GPU, or is gaming at 1440p, a 12600/K will get them similar performance at another $100-$200 discount. To me it's the 12900K that's the afterthought.
 

naukkis

Golden Member
Jun 5, 2002
1,004
849
136
The 5800X3D should probably come out slightly ahead of the 12900K on average just based on AMD's claims of 15% over the 5800X and looking at relative figures. Using TPUs charts for 1080p gaming the 5800X3D should be about 6.5% ahead. Tom's would put it 1.5% ahead.

They did compare it to the 5900X, so it is on average 15% faster than the 5900X. When they introduced V-Cache they claimed 15% more performance with the same CPU at iso clocks, but the gains seem to be much larger, as a reduced-clock 5800X is still 15% faster than the 5900X. The 5900X itself is what, something like 5% faster than a normally clocked 5800X?

And as that speed comes from increased IPC, if the 5800X3D overclocks well, the overclocking benefits will be much greater than with any other chip: unquestionably the fastest overclocked gaming CPU.
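The chained comparison above compounds multiplicatively. A minimal sketch, taking the 5% figure for the 5900X over the 5800X as the poster's own guess rather than a measured value:

```python
def compound(*gains):
    """Combine successive fractional gains into one overall gain."""
    total = 1.0
    for gain in gains:
        total *= 1 + gain
    return total - 1


x3d_over_5900x = 0.15   # AMD's claimed average
r5900x_over_5800x = 0.05  # ASSUMPTION: the poster's rough guess

overall = compound(x3d_over_5900x, r5900x_over_5800x)
print(f"5800X3D over 5800X: {overall:.1%}")
```

With those inputs the gain over a stock 5800X comes out near 21%, noticeably more than the 15% iso-clock figure.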
 

Timorous

Golden Member
Oct 27, 2008
1,966
3,850
136
A $220 12600 is going to give people an option of getting on to Intel's new platform with a solid upgrade path in the future. Really a person doesn't need to do a completely new build since they could reuse their RAM, which makes sense given the benchmarks tend to show DDR4 performing better.

12600 does not have E-Cores and the 12400 is on par with a 5600X in gaming and a small amount ahead for productivity. The 12600 will be at best a few % faster due to higher clocks.

Based on UK prices the 12600 is £230 + £200 for a B660M Mortar. OTOH you could go 5600X + GB Aorus Elite for around £350. Sure, the 12600 will be a bit faster, but it also costs 20% more for CPU + mobo. If a user already has, say, a B450 board because they paired it with a 2xxx or 3xxx part, then the upgrade to a 5600X is far better value than switching to ADL.
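The platform comparison can be checked with the UK prices quoted above. This sketch uses only those figures; the variable names are made up for the example.

```python
def platform_premium(cost_a, cost_b):
    """Fractional extra cost of platform A over platform B."""
    return cost_a / cost_b - 1


intel_cpu_gbp = 230     # 12600
intel_board_gbp = 200   # B660M Mortar
amd_bundle_gbp = 350    # 5600X + Aorus Elite, "around £350"

intel_total = intel_cpu_gbp + intel_board_gbp
premium = platform_premium(intel_total, amd_bundle_gbp)
print(f"Intel platform: £{intel_total}, premium over AMD: {premium:.0%}")
```

The quoted prices work out to about 23%, in line with the "20% more" in the post.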

TBH the best non-K part is probably the 12700F, which is £340. Pair that with the B660M Mortar and for £540 you have a very good all-round platform. Depending on 5800X3D pricing, though, I could easily see it being competitive even if the CPU is £400, because there are cheaper B550 motherboards available with good enough VRMs to comfortably handle the 5800X3D. It will be use-case dependent: for gaming only, a £550 5800X3D + WiFi B550M Mortar might be the way to go; for general productivity, the 12700 may be the way to go. I do expect certain niche productivity workloads to excel with the 5800X3D, though; we will just have to wait and see on that front. Just like with the 5600X example above, if a user already has B450 or above then £400 for the 5800X3D is going to offer far better value than any ADL part for gaming.

Obviously if the 5800X3D is a lot more expensive then it makes no sense, but I don't think AMD will price it as a halo part because it only has one true class-leading category, and that is gaming. Even in AMD's own stack the 5800X3D will fall behind the 5900X and 5950X in productivity workloads.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,645
2,464
136
You don't always need such a massive cache. Many programs fit just fine inside Zen's 32 MB of existing L3 cache and won't see any uplift with the extra cache. If the extra cache isn't being used, it doesn't make sense to waste power by continually refreshing it.

SRAM does not need to be refreshed. On a low-leakage process, SRAM power use is very close to 0 when it's not being accessed.
 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
It isn’t elevated. It is completely flat. See the video I just added to my previous post from the last discussion on this. It takes advantage of the fact that two flat pieces of the same metal will actually weld together in a vacuum. I have seen quite a bit about die stacking over the years and they do polish the wafer down so thin that they are actually floppy.

Edit: also, the first link I posted above literally has SEM images of some of the tech, so we know exactly what some of it looks like.
Using that video and assuming detailed knowledge, compares to getting a degree by reading Scientific American.

Those SEM shots from the anandtech article?

Thanks for that. Forgot those exist.

Look at the following (Sub-micron CoW interconnect demonstrated). The boundaries clearly show a very slight (nm ?) penetration into the next level. Notice the gentle saucer shaped deformation? That's material displacement from the center due to pressure by the slimmer shaft. You can even just see the slim pillars penetrating slightly into the conical ones at the ends.

Definitely the Cu pillars are standing a few nm above the silicon plane before bonding.

I'm more convinced now that some sort of pressure welding (+ vacuum & heat ?) is used to fuse the Cu pillars. Clears up a big part of the question I had. How is this actually being manufactured?

The other related topic is the silicon blanks.

Seeing that the cache die are bonding the Cu interconnects by allowing a given pillar to slightly penetrate its mating pillar, we can reasonably assume that the silicon is NOT vacuum bonded just by the common assumption of being perfectly flat ( :rolleyes: ) and being placed next to each other.

Those silicon blanks over the cores must be using a thermal glue/paste for attachment. This might have a negative effect on heat transfer to a larger degree than might have been thought.


[Image: snapshot from the "Advanced Packaging Technology Leadership" video]
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
SRAM does not need to be refreshed. On a low-leakage process, SRAM power use is very close to 0 when it's not being accessed.

That would kind of make AMD's comment about powering it off when not in use silly then :shrug:
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The 12900K is a few percent faster than the 12700K for like $200 more (+50% price). According to your logic, if Intel priced the 12900K at $10000, would that make the 5800X3D a $10000 product?
Remember when Intel used to charge $1,000 for their 6900K? If the 1800X had beaten the 6900K in gaming, you bet they would have priced it at a higher premium. That is how things work. Heck, even the 11900K was overpriced compared to the 11700K, but for many the little extra performance was worth it.