Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games

Page 63 - AnandTech Forums

Kedas

Senior member
Dec 6, 2018
355
339
136
Well, we now know how they will bridge the long wait for Zen 4 on AM5 in Q4 2022.
Production start for V-Cache is at the end of this year, which is too early for Zen 4, so this is almost certainly coming to AM4.
The +15%, Lisa said, is "like an entire architectural generation."
 
Last edited:
  • Like
Reactions: Tlh97 and Gideon

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
From my understanding, the TSV pillars are ever so slightly exposed above the silicon in the end, but I don't know how they are doing it with this new direct bonding technique.
I agree on the slightly elevated pillars. When I mentioned possibly pressure-fusing at an elevated temperature, the imprecision of using just a word descriptor caused a lot of grief.

Elevated temp might be < 200C.
Pressure might just be a few Newtons.
 

Mopetar

Diamond Member
Jan 31, 2011
8,436
7,631
136
As for the effect of turning it on and off, AMD said that the extra V-cache is striped with the existing L3 cache, so as long as the addressing logic knows whether it's on or off, it shouldn't be a problem. But again, I'm not a digital guy, so I'm just guessing here.

What does that actually mean though? If anyone has some article that actually describes how it integrates with the existing L3 I'd appreciate it.

Outside of just increasing the number of sets that are available, being able to turn the entirety of that additional cache on or off would require two different sets of logic to handle where anything gets stored in the cache. It would also mean that switching between the two states would require moving all of the cache entries around to where they belong under the other state since there's no guarantee that they wind up in the same place.

If it's just extra ways, it turns the L3 from a 16-way associative cache into a 48-way associative cache. That probably has some significant diminishing returns, but maybe it's useful for certain applications like games, which can easily use up several GB of memory and would benefit from having some of that data stick around in the cache for much longer.
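A back-of-the-envelope sketch of that point: if the stacked cache adds capacity as extra ways while the set count and line size stay fixed, the indexing logic never changes, which would make enable/disable trivial. The 64 B line size and fixed set count below are assumptions, not confirmed AMD specifications.

```python
# Sketch: if the stacked V-Cache adds capacity as extra ways (sets and
# line size unchanged), a 32 MB 16-way L3 becomes a 96 MB 48-way L3.
# All figures here are illustrative assumptions, not AMD specifications.

LINE_SIZE = 64                      # bytes per cache line (assumed)
BASE_SIZE = 32 * 1024 * 1024        # 32 MB base L3 per CCD
BASE_WAYS = 16                      # associativity of the base L3

sets = BASE_SIZE // (BASE_WAYS * LINE_SIZE)   # set count of the base L3

VCACHE_SIZE = 64 * 1024 * 1024      # 64 MB stacked V-Cache
total_size = BASE_SIZE + VCACHE_SIZE

# Keeping the set count fixed means only the way count grows, so the
# index bits (and thus the addressing logic) stay identical whether the
# extra ways are enabled or not.
total_ways = total_size // (sets * LINE_SIZE)

print(sets, total_ways)   # 32768 48
```

Under those assumptions, disabling the V-Cache would just mean masking off ways 16 through 47; no cache entries would ever need to move.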
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I thought I had made it pretty clear in my post, but I guess not. Some TSVs go through the substrate and attach to (stop at) the FEOL, some go through the substrate and attach to (stop at) the BEOL, and some go fully through the die, including substrate, FEOL, and BEOL. There isn't just one way of doing TSVs, though my understanding is that attaching to the BEOL has more or less been adopted as the industry standard: via-first (FEOL stop) forces the TSV material through the high heat of FEOL processing, which limits the materials you can use for the TSV, while via-last makes it difficult to align with the rest of the circuit but can be easier when you want to go through a full die. In every case the TSV goes through the substrate and is exposed at the bottom of the die, but how far up the die it goes can vary.

I really don't think going a decent depth with a TSV is a yield issue, but I don't have any data on that specifically. I do know there are papers showing you can do 200 um+ TSVs without issue, but papers don't usually have high-volume yield in mind, and this is most likely more a limitation of TSV diameter/pitch. I'm also sure that processing wafers thinned to the absolute thinnest possible introduces its own issues. I'll have to read up on how they handle these issues over the weekend and see what the gives and takes are.
The image you posted does not match what you are saying. You seem to be looking at it upside down. The image is a diagram of the wafer in its original processing orientation: device layers on the bottom and metal layers on the top. For TSVs, it would be flipped over and polished down from what is the bottom in the image. The TSVs go all the way through the silicon to the other side, which I would think most people would say means they go all the way through the die. I don't think it is relevant whether they add more metal later to connect it to a specific layer or even build it up enough to be an IO pad directly. There are polishing steps after each deposition, so that is just an extra metal layer laid down later. It still goes all the way through the silicon to the opposite side of the wafer after polishing.
 
  • Like
Reactions: BorisTheBlade82

jamescox

Senior member
Nov 11, 2009
644
1,105
136
1) Pure Cu is a very "soft" metal, much more ductile than Si. Easily possible.
2) If you magnify the image, you'll see the uniformly cylindrical pillars slightly penetrating the cone-shaped top and bottom connectors.
3) The boundary of the bond is definitely not flat. All of them have a saucer-shaped depression.
4) Cold welding doesn't have to mean cold temperatures, just temperatures below the melting/recrystallization point. Pressure is often used.
5) The sides of the pillars are supported by the surrounding material, preventing compressive buckling failure and distortion.
6) None of the articles give details; they only give the illusion of understanding.

I am interested in the actual details of how this is done.
However, one is free to believe that you just rest them together and the weld happens. No problem.

I don’t see any pancaking along the intersection between the two dies (the horizontal line at center), and what you propose also goes against just about every article I have seen on the subject.
 
  • Like
Reactions: BorisTheBlade82

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
The image you posted does not match what you are saying. You seem to be looking at it upside down. The image is a diagram of the wafer in its original processing orientation: device layers on the bottom and metal layers on the top. For TSVs, it would be flipped over and polished down from what is the bottom in the image. The TSVs go all the way through the silicon to the other side, which I would think most people would say means they go all the way through the die. I don't think it is relevant whether they add more metal later to connect it to a specific layer or even build it up enough to be an IO pad directly. There are polishing steps after each deposition, so that is just an extra metal layer laid down later. It still goes all the way through the silicon to the opposite side of the wafer after polishing.

I'm not confused at all. The whole point of that image was to show that there are currently three different ways to do TSVs. In two of them, the TSV is etched and deposited before the wafer is finished: in via-first it is done before the FEOL, and in via-mid it is done after the FEOL but before the BEOL. In both of those cases the wafer is then completed after the TSV is formed, flipped, and thinned to expose the TSVs on the backside. In the last way (via-last), the FEOL and BEOL are both finished, the wafer is flipped, then thinned, and then the TSV is made. Only with the via-last method can you take the TSV through the entire die, because you bring it in from the backside and can then go through as much of the die as you want; you are building the TSV from the bottom up. For via-first and via-mid you can't do this, because you build from the top down, starting in the first layer or two. You cannot take via-first or via-mid all the way through the die.
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
What does that actually mean though? If anyone has some article that actually describes how it integrates with the existing L3 I'd appreciate it.

Outside of just increasing the number of sets that are available, being able to turn the entirety of that additional cache on or off would require two different sets of logic to handle where anything gets stored in the cache. It would also mean that switching between the two states would require moving all of the cache entries around to where they belong under the other state since there's no guarantee that they wind up in the same place.

If it's just extra ways, it turns the L3 from a 16-way associative cache into a 48-way associative cache. That probably has some significant diminishing returns, but maybe it's useful for certain applications like games, which can easily use up several GB of memory and would benefit from having some of that data stick around in the cache for much longer.

I wish I knew.

Maybe by AMD saying it can be powered off, they are just referring to the natural power/clock gating that already happens at whatever granularity they have when cells aren't actively being used, and there's nothing that controls the V-cache being 'on' or 'off' beyond whether it is actively used. o_O
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I'm not confused at all. The whole point of that image was to show that there are currently three different ways to do TSVs. In two of them, the TSV is etched and deposited before the wafer is finished: in via-first it is done before the FEOL, and in via-mid it is done after the FEOL but before the BEOL. In both of those cases the wafer is then completed after the TSV is formed, flipped, and thinned to expose the TSVs on the backside. In the last way (via-last), the FEOL and BEOL are both finished, the wafer is flipped, then thinned, and then the TSV is made. Only with the via-last method can you take the TSV through the entire die, because you bring it in from the backside and can then go through as much of the die as you want; you are building the TSV from the bottom up. For via-first and via-mid you can't do this, because you build from the top down, starting in the first layer or two. You cannot take via-first or via-mid all the way through the die.
I am not saying you are confused. I am saying you are using different meanings for words than I or most others would use. I consider all forms of TSVs (through-silicon vias) to go through the die. If you want to not count it as going "through the die" when it isn't routed to the top of the metal stack, then so be it.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
In this video, AMD says that there was a performance regression on the two-chiplet part using 3D V-Cache. From what we have seen, I can see that too. The 5900X3D prototype was clocked at 4 GHz and the 5900X was also clocked at 4 GHz, and they still showed a 15% performance improvement, but that was at ISO speed; the improvement would have been closer to 12% with the 5900X3D clocked at 4.5 GHz and the 5900X keeping its stock boost of 4.8 GHz. This is the case because games can't take advantage of the added L3 cache that is on another chiplet.
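One hedged way to reconcile the 15% and 12% figures above: renormalize the ISO-speed gain by the hypothetical retail clock deficit. The 50% clock-sensitivity factor below is purely an assumption (games rarely scale linearly with frequency), so this is an illustrative sketch, not AMD's math.

```python
# Rough sketch of the clock-normalisation argument above. Assumptions:
# the demo compared both parts at 4.0 GHz (ISO speed, +15%), a retail
# 5900X3D might boost to ~4.5 GHz vs the 5900X's 4.8 GHz stock boost,
# and games gain only ~50% of a clock delta (made-up sensitivity).

iso_gain = 1.15                 # +15% at equal clocks (AMD demo)
clock_ratio = 4.5 / 4.8         # hypothetical retail clock deficit
game_clock_sensitivity = 0.5    # assumed: games scale ~half with clocks

effective_ratio = 1 - (1 - clock_ratio) * game_clock_sensitivity
retail_gain = iso_gain * effective_ratio
print(f"{(retail_gain - 1) * 100:.1f}%")   # 11.4%
```

Under these assumed numbers the gain lands near the ~12% figure, but a different sensitivity assumption would shift it.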

 
  • Like
Reactions: Elfear and Mopetar

Det0x

Golden Member
Sep 11, 2014
1,455
4,948
136
In this video, AMD says that there was a performance regression on the two-chiplet part using 3D V-Cache. From what we have seen, I can see that too. The 5900X3D prototype was clocked at 4 GHz and the 5900X was also clocked at 4 GHz, and they still showed a 15% performance improvement, but that was at ISO speed; the improvement would have been closer to 12% with the 5900X3D clocked at 4.5 GHz and the 5900X keeping its stock boost of 4.8 GHz. This is the case because games can't take advantage of the added L3 cache that is on another chiplet.

Can you link the timestamp where that was said? Thanks :)
 

moinmoin

Diamond Member
Jun 1, 2017
5,234
8,442
136
As for pricing, I expect AMD to stay true to its price/perf ratio, which at the regularly cited ~15% perf improvement would amount to ~$517.50. So an MSRP outside $499 to $519 would slightly surprise me.

I'd also expect the 5800X3D to be a good overclocker, with stock specs playing it safe.
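The arithmetic behind that ~$517.5 figure, assuming a ~$450 5800X launch price as the base (the base price is an assumption, not stated in the post):

```python
# Price/perf-neutral MSRP sketch: if the 5800X3D delivers ~15% more
# performance than a ~$450 5800X, the price that keeps the same
# price/perf ratio is 450 * 1.15. The $450 base is an assumption.

base_price = 450.0
perf_uplift = 0.15
neutral_msrp = base_price * (1 + perf_uplift)
print(neutral_msrp)   # 517.5
```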

AMD said that the extra V-cache is striped with the existing L3 cache
Don't you mean interlaced?
 
  • Like
Reactions: Tlh97

eek2121

Diamond Member
Aug 2, 2005
3,384
5,011
136
As for pricing, I expect AMD to stay true to its price/perf ratio, which at the regularly cited ~15% perf improvement would amount to ~$517.50. So an MSRP outside $499 to $519 would slightly surprise me.

I'd also expect the 5800X3D to be a good overclocker, with stock specs playing it safe.


Don't you mean interlaced?

They compared it to a 5900X, so I suspect it will be priced a bit higher than the 5900X. Anywhere from $500 to $550.
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
In this video, AMD says that there was a performance regression on the two-chiplet part using 3D V-Cache. From what we have seen, I can see that too. The 5900X3D prototype was clocked at 4 GHz and the 5900X was also clocked at 4 GHz, and they still showed a 15% performance improvement, but that was at ISO speed; the improvement would have been closer to 12% with the 5900X3D clocked at 4.5 GHz and the 5900X keeping its stock boost of 4.8 GHz. This is the case because games can't take advantage of the added L3 cache that is on another chiplet.


What he says is that if you had one CCD with V-Cache and one without, you might have a performance reduction due to the asymmetric caches. He comments later on the case where both have V-Cache but doesn't talk about a performance regression. Honestly, the answer seemed more like a marketing response, beating around the bush to come up with some reason why they only have a 5800X3D, because I don't think his comments about the synchronous cache having extra latency are even accurate or make sense.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
What he says is that if you had one CCD with V-Cache and one without, you might have a performance reduction due to the asymmetric caches. He comments later on the case where both have V-Cache but doesn't talk about a performance regression. Honestly, the answer seemed more like a marketing response, beating around the bush to come up with some reason why they only have a 5800X3D, because I don't think his comments about the synchronous cache having extra latency are even accurate or make sense.

There were two 5900X3D prototypes at Computex: one with two chiplets with 3D V-Cache, which was shown gaming and provided the 15% performance claims, and one with only one chiplet with 3D V-Cache, which was held in the hands of the illustrious Dr. Lisa Su. I believe they were trying to show a proof of concept for a possible cost-effective 5900X3D with the second prototype, since both of them show the same average 15% performance improvement, but they might have come across a weird game or two that actually had a regression.
 
Last edited:

RnR_au

Platinum Member
Jun 6, 2021
2,532
5,938
136
I made a comment over on /r/Amd regarding V-Cache SKUs before the announcement. Shortly after, I received a private message. I only just saw it now, but the info is 8 days old. I haven't posted it on Reddit, but thought a few here might find it interesting.

Hey,

Can you post this as a reply to your comment? I'm banned from /r/Amd and /r/Hardware for leaking info I wasn't supposed to let out.

I know there are working samples of 5800Xs using a single 8-core CCX made on the 6nm node. 5 GHz+ all-core OCs are easily attainable, but it remains to be seen when/if that launches.

Even without V-Cache, the move to 6nm and the more refined layout and I/O are allowing IF clocks up to 4200 running memory 1:1. With tuned B-die this allows for near parity with the 12600K, which is quite impressive given it does not have the 3D cache.

Keep your fingers crossed;)

My gut says it's bogus. I just can't see AMD spending the coin to get a 5800 design working on N6 for evaluation purposes, especially when we know they have Zen3+ on N6. And I believe the consensus now is that 3D stacking is not ready for N6? So that's another reason AMD wouldn't spend the effort on 5800 silicon on N6.

Anyway, I had to share it with someone. It's my first 'leak'... I almost feel the need to make a YouTube video... :p
 

LightningZ71

Platinum Member
Mar 10, 2017
2,317
2,908
136
Supposedly, a straight shrink to N6 is a minimal-effort process.

Just think: there could be significant demand in the HPC world for an EPYC that is firmware- and logic-compatible with existing Milan SKUs but can sustain significantly higher all-core clocks. The power and density improvements of N6 could make those CCDs both more economical to make and worth more once packaged. Given the stated improvements of the process, and where EPYC lives in high-density deployments, it could be 500 MHz or more better at the same power and thermal load.
 
  • Like
Reactions: lightmanek

NostaSeronx

Diamond Member
Sep 18, 2011
3,809
1,289
136
Supposedly, a straight shrink to N6 is a minimal-effort process.
It isn't a straight shrink, it is a library swap.
Semiwiki 2019:
  • N7 designs could simply “re-tapeout” (RTO) to N6 for improved yield with EUV mask lithography
  • or, N7 designs could submit a new tapeout (NTO) by re-implementing logic blocks using an N6 standard cell library (H240) that leverages a “common PODE” (CPODE) device between cells for an ~18% improvement in logic block density
N6-EUV is either an exact N7 logic device or a unique N6 logic device. SRAM (memories) and AMS (SerDes/IO) are the same between the two.
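Since only logic gets the ~18% density gain from the NTO path while SRAM and AMS stay the same, the die-level saving depends entirely on the logic fraction. The 50/50 logic split and CCD-sized die below are made-up illustrative figures, not a real floorplan breakdown.

```python
# Sketch of what the ~18% logic-density gain of an N6 NTO means at die
# level, given that SRAM and analog/IO blocks do not shrink. The 50%
# logic fraction and 80.7 mm^2 die size are illustrative assumptions.

die_area = 80.7          # mm^2, hypothetical Zen 3 CCD-sized die
logic_fraction = 0.5     # assumed share of the die that is logic
logic_density_gain = 0.18

logic_area = die_area * logic_fraction
other_area = die_area - logic_area           # SRAM + AMS: unchanged
new_logic_area = logic_area / (1 + logic_density_gain)
new_die_area = new_logic_area + other_area

print(f"{new_die_area:.1f} mm^2")  # 74.5 mm^2
```

Even under these generous assumptions the whole die only shrinks by ~8%, which is why an RTO (same layout, better yield) can look more attractive than an NTO.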
 
Last edited:
  • Like
Reactions: RnR_au and Tlh97

RnR_au

Platinum Member
Jun 6, 2021
2,532
5,938
136
From my understanding, masks are also not cheap. Surely CPU simulation tools are good enough nowadays to at least hit the side of a barn.
 
  • Like
Reactions: lightmanek

DrMrLordX

Lifer
Apr 27, 2000
22,701
12,652
136
What he says is that if you had 1 CCD with V-cache and 1 without, you might have a performance reduction due to the asynchronous caches. He comments later on if they both have V-cache but doesn't talk about performance regression. Honestly, the answer seemed more like a marketing response just beating around the bush trying to come up with some reason why they only have a 5800x3d because I don't think his comments about the synchronous cache having extra latency is even accurate or makes sense.

Agreed, it seemed like a non-answer. AMD already has separate L3s local to a specific CCD, so there's already an IF penalty for moving data from one L3 to another (as opposed to grabbing the same data from RAM, which incurs an even worse penalty). They already have cache tags and such to streamline that process, and it would be even smoother with larger L3 blocks per CCD. Nothing was said about clock-speed regression on 2xCCD units.

I made a comment over on /r/Amd in regards to V-Cache sku's before the announcement. Shortly after I received a private message. Only just saw it now, but the info is 8 days old. Haven't posted it at Reddit, but thought that a few here might find it interesting

Who knows? N6 Zen3 is basically Warhol territory, which, from what has been leaked in the past, is a cancelled product.
 

coercitiv

Diamond Member
Jan 24, 2014
7,225
16,982
136
Ah, never mind, I found the comment.
Yeah, I don't fully buy the "you might actually lose a little performance" argument for dual chiplets; the same logic could have been applied to the 5800X vs. 5900X at launch, and the 5900X won or traded blows with the 5800X in gaming. That being said, the extra cache on both parts would probably help the single-die product more, essentially letting it match the dual-die product in most games.

I do believe them when they say that normal consumer apps won't see much benefit from 3D cache, though, which reinforces their decision to go single-chiplet only on the consumer side.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,800
1,528
136
I mean, it is Robert Hallock. You can tell he's delivering spin and propaganda because his mouth is moving.