Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games

Page 63 - AnandTech Forums

Kedas

Senior member
Dec 6, 2018
355
339
136
Well, we now know how they will bridge the long wait for Zen 4 on AM5 in Q4 2022.
Production start for V-Cache is at the end of this year, which is too early for Zen 4, so this is almost certainly coming to AM4.
The +15%, Lisa said, is "like an entire architectural generation."
 
Last edited:
  • Like
Reactions: Tlh97 and Gideon

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
From my understanding, the TSV pillars are ever so slightly exposed above the silicon in the end, but I don't know how they are doing it with this new direct bonding technique.
I agree on the slightly elevated pillars. When I mentioned possibly pressure-fusing at an elevated temperature, the imprecision of using just a word descriptor caused a lot of grief.

Elevated temp might be < 200C.
Pressure might just be a few Newtons.
 

Mopetar

Diamond Member
Jan 31, 2011
8,436
7,631
136
As for the effect of turning it on and off, AMD said that the extra V-cache is striped with the existing L3 cache, so as long as the addressing logic knows whether it's on or off, it shouldn't be a problem. But again, I'm not a digital guy, so I'm just guessing here.

What does that actually mean though? If anyone has some article that actually describes how it integrates with the existing L3 I'd appreciate it.

Outside of just increasing the number of sets that are available, being able to turn the entirety of that additional cache on or off would require two different sets of logic to handle where anything gets stored in the cache. It would also mean that switching between the two states would require moving all of the cache entries around to where they belong under the other state since there's no guarantee that they wind up in the same place.

If it's just extra ways, it turns the L3 from a 16-way associative cache into a 48-way associative cache. That probably has some significant diminishing returns, but maybe it's useful for certain applications like games, which can easily use up several GB of memory and would benefit from having some of that data stick around in the cache for much longer.
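A back-of-the-envelope sketch of that point: if the stacked cache adds capacity as extra ways while the set count and line size stay fixed, the indexing logic never changes, which would make enable/disable trivial. The 64 B line size and fixed set count below are assumptions, not confirmed AMD specifications.

```python
# Sketch: if the stacked V-Cache adds capacity as extra ways (sets and
# line size unchanged), a 32 MB 16-way L3 becomes a 96 MB 48-way L3.
# All figures here are illustrative assumptions, not AMD specifications.

LINE_SIZE = 64                      # bytes per cache line (assumed)
BASE_SIZE = 32 * 1024 * 1024        # 32 MB base L3 per CCD
BASE_WAYS = 16                      # associativity of the base L3

sets = BASE_SIZE // (BASE_WAYS * LINE_SIZE)   # set count of the base L3

VCACHE_SIZE = 64 * 1024 * 1024      # 64 MB stacked V-Cache
total_size = BASE_SIZE + VCACHE_SIZE

# Keeping the set count fixed means only the way count grows, so the
# index bits (and thus the addressing logic) stay identical whether the
# extra ways are enabled or not.
total_ways = total_size // (sets * LINE_SIZE)

print(sets, total_ways)   # 32768 48
```

Under those assumptions, disabling the V-Cache would just mean masking off ways 16 through 47; no cache entries would ever need to move.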
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I thought I had made it pretty clear in my post, but I guess not. Some TSVs go through the substrate and attach to (stop at) the FEOL, some go through the substrate and attach to (stop at) the BEOL, and some go fully through the die, including substrate, FEOL, and BEOL. There isn't just one way of doing TSVs, though my understanding is that attaching to the BEOL has more or less been adopted as the industry standard: via-first (FEOL stop) forces the TSV material through the high heat of FEOL processing, which limits the materials you can use for the TSV, while via-last makes it difficult to align with the rest of the circuit but can be easier when you want to go through a full die. In every case the TSV goes through the substrate and is exposed at the bottom of the die, but how far up the die it goes can vary.

I really don't think going a decent depth with a TSV is a yield issue, but I don't have any data on that specifically. I do know there are papers showing you can do 200 um+ TSVs without issue, but papers don't usually have high-volume yield in mind, and this is most likely more a limitation of TSV diameter/pitch. I'm also sure that processing wafers thinned to the absolute thinnest possible introduces its own issues. I'll have to read up on how they handle these issues over the weekend and see what the gives and takes are.
The image you posted does not match what you are saying. You seem to be looking at it upside down. The image is a diagram of the wafer in its original processing orientation: device layers on the bottom and metal layers on the top. For TSVs, it would be flipped over and polished down from what is the bottom in the image. The TSVs go all the way through the silicon to the other side, which I would think most people would say means they go all the way through the die. I don't think it is relevant whether they add more metal later to connect it to a specific layer or even build it up enough to be an IO pad directly. There are polishing steps after each deposition, so that is just an extra metal layer laid down later. It still goes all the way through the silicon to the opposite side of the wafer after polishing.
 
  • Like
Reactions: BorisTheBlade82

jamescox

Senior member
Nov 11, 2009
644
1,105
136
1) Pure Cu is a very "soft" metal, much more ductile than Si. Easily possible.
2) If you magnify the image, you'll see the uniformly cylindrical pillars slightly penetrating the cone-shaped top and bottom connectors.
3) The boundary of the bond is definitely not flat. All of them have a saucer-shaped depression.
4) Cold welding doesn't have to mean cold temperatures, just temperatures below the melting/recrystallization point. Pressure is often used.
5) The sides of the pillars are supported by the surrounding material, preventing compressive buckling failure and distortion.
6) None of the articles give details; they only give the illusion of understanding.

I am interested in the actual details of how this is done.
However, one is free to believe that you just rest them together and the weld happens. No problem.

I don’t see any pancaking along the intersection between the two dies (the horizontal line at center), and what you propose also goes against just about every article I have seen on the subject.
 
  • Like
Reactions: BorisTheBlade82

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
The image you posted does not match what you are saying. You seem to be looking at it upside down. The image is a diagram of the wafer in its original processing orientation: device layers on the bottom and metal layers on the top. For TSVs, it would be flipped over and polished down from what is the bottom in the image. The TSVs go all the way through the silicon to the other side, which I would think most people would say means they go all the way through the die. I don't think it is relevant whether they add more metal later to connect it to a specific layer or even build it up enough to be an IO pad directly. There are polishing steps after each deposition, so that is just an extra metal layer laid down later. It still goes all the way through the silicon to the opposite side of the wafer after polishing.

I'm not confused at all. The whole point of that image was to show that there are currently three different ways to do TSVs. In two of them, the TSV is etched and deposited before the wafer is finished: in via-first it is done before the FEOL, and in via-mid it is done after the FEOL but before the BEOL. In both of those cases the wafer is then completed after the TSV is formed, flipped, and thinned to expose the TSVs on the backside. In the last way (via-last), the FEOL and BEOL are both finished, the wafer is flipped, then thinned, and then the TSV is made. Only with the via-last method can you take the TSV through the entire die, because you bring it in from the backside and can then go through as much of the die as you want; you are building the TSV from the bottom up. For via-first and via-mid you can't do this, because you build from the top down, starting in the first layer or two. You cannot take via-first or via-mid all the way through the die.
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
What does that actually mean though? If anyone has some article that actually describes how it integrates with the existing L3 I'd appreciate it.

Outside of just increasing the number of sets that are available, being able to turn the entirety of that additional cache on or off would require two different sets of logic to handle where anything gets stored in the cache. It would also mean that switching between the two states would require moving all of the cache entries around to where they belong under the other state since there's no guarantee that they wind up in the same place.

If it's just extra ways, it turns the L3 from a 16-way associative cache into a 48-way associative cache. That probably has some significant diminishing returns, but maybe it's useful for certain applications like games, which can easily use up several GB of memory and would benefit from having some of that data stick around in the cache for much longer.

I wish I knew.

Maybe by AMD saying it can be powered off, they are just referring to the natural power/clock gating that already happens at whatever granularity they have when cells aren't actively being used, and there's nothing that controls the V-cache being 'on' or 'off' beyond whether it is actively used. o_O
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I'm not confused at all. The whole point of that image was to show that there are currently three different ways to do TSVs. In two of them, the TSV is etched and deposited before the wafer is finished: in via-first it is done before the FEOL, and in via-mid it is done after the FEOL but before the BEOL. In both of those cases the wafer is then completed after the TSV is formed, flipped, and thinned to expose the TSVs on the backside. In the last way (via-last), the FEOL and BEOL are both finished, the wafer is flipped, then thinned, and then the TSV is made. Only with the via-last method can you take the TSV through the entire die, because you bring it in from the backside and can then go through as much of the die as you want; you are building the TSV from the bottom up. For via-first and via-mid you can't do this, because you build from the top down, starting in the first layer or two. You cannot take via-first or via-mid all the way through the die.
I am not saying you are confused. I am saying you are using different meanings for words than I or most others would use. I consider all forms of TSVs (through-silicon vias) to go through the die. If you want to not count it as going "through the die" when it isn't routed to the top of the metal stack, then so be it.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
In this video, AMD says that there was a performance regression on the two-chiplet part using 3D V-Cache. From what we have seen, I can see that too. The 5900X3D prototype was clocked at 4 GHz and the 5900X was also clocked at 4 GHz, and they still showed a 15% performance improvement, but that was at ISO speed; the improvement would have been closer to 12% with the 5900X3D clocked at 4.5 GHz and the 5900X keeping its stock boost of 4.8 GHz. This is the case because games can't take advantage of the added L3 cache that is on another chiplet.
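One hedged way to reconcile the 15% and 12% figures above: renormalize the ISO-speed gain by the hypothetical retail clock deficit. The 50% clock-sensitivity factor below is purely an assumption (games rarely scale linearly with frequency), so this is an illustrative sketch, not AMD's math.

```python
# Rough sketch of the clock-normalisation argument above. Assumptions:
# the demo compared both parts at 4.0 GHz (ISO speed, +15%), a retail
# 5900X3D might boost to ~4.5 GHz vs the 5900X's 4.8 GHz stock boost,
# and games gain only ~50% of a clock delta (made-up sensitivity).

iso_gain = 1.15                 # +15% at equal clocks (AMD demo)
clock_ratio = 4.5 / 4.8         # hypothetical retail clock deficit
game_clock_sensitivity = 0.5    # assumed: games scale ~half with clocks

effective_ratio = 1 - (1 - clock_ratio) * game_clock_sensitivity
retail_gain = iso_gain * effective_ratio
print(f"{(retail_gain - 1) * 100:.1f}%")   # 11.4%
```

Under these assumed numbers the gain lands near the ~12% figure, but a different sensitivity assumption would shift it.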

 
  • Like
Reactions: Elfear and Mopetar

Det0x

Golden Member
Sep 11, 2014
1,455
4,948
136
In this video, AMD says that there was a performance regression on the two-chiplet part using 3D V-Cache. From what we have seen, I can see that too. The 5900X3D prototype was clocked at 4 GHz and the 5900X was also clocked at 4 GHz, and they still showed a 15% performance improvement, but that was at ISO speed; the improvement would have been closer to 12% with the 5900X3D clocked at 4.5 GHz and the 5900X keeping its stock boost of 4.8 GHz. This is the case because games can't take advantage of the added L3 cache that is on another chiplet.

Can you link the timestamp where that was said? Thanks :)
 

moinmoin

Diamond Member
Jun 1, 2017
5,234
8,442
136
As for pricing, I expect AMD to stay true to its price/perf ratio, which at the regularly cited ~15% perf improvement would amount to ~$517.50. So an MSRP outside $499 to $519 would slightly surprise me.

I'd also expect the 5800X3D to be a good overclocker, with stock specs playing it safe.
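The arithmetic behind that ~$517.5 figure, assuming a ~$450 5800X launch price as the base (the base price is an assumption, not stated in the post):

```python
# Price/perf-neutral MSRP sketch: if the 5800X3D delivers ~15% more
# performance than a ~$450 5800X, the price that keeps the same
# price/perf ratio is 450 * 1.15. The $450 base is an assumption.

base_price = 450.0
perf_uplift = 0.15
neutral_msrp = base_price * (1 + perf_uplift)
print(neutral_msrp)   # 517.5
```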

AMD said that the extra V-cache is striped with the existing L3 cache
Don't you mean interlaced?
 
  • Like
Reactions: Tlh97

eek2121

Diamond Member
Aug 2, 2005
3,384
5,011
136
As for pricing, I expect AMD to stay true to its price/perf ratio, which at the regularly cited ~15% perf improvement would amount to ~$517.50. So an MSRP outside $499 to $519 would slightly surprise me.

I'd also expect the 5800X3D to be a good overclocker, with stock specs playing it safe.


Don't you mean interlaced?

They compared it to a 5900X, so I suspect it will be priced a bit higher than the 5900X. Anywhere from $500 to $550.
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
In this video, AMD says that there was a performance regression on the two-chiplet part using 3D V-Cache. From what we have seen, I can see that too. The 5900X3D prototype was clocked at 4 GHz and the 5900X was also clocked at 4 GHz, and they still showed a 15% performance improvement, but that was at ISO speed; the improvement would have been closer to 12% with the 5900X3D clocked at 4.5 GHz and the 5900X keeping its stock boost of 4.8 GHz. This is the case because games can't take advantage of the added L3 cache that is on another chiplet.


What he says is that if you had one CCD with V-Cache and one without, you might have a performance reduction due to the asymmetric caches. He comments later on the case where both have V-Cache but doesn't talk about a performance regression. Honestly, the answer seemed more like a marketing response, beating around the bush to come up with some reason why they only have a 5800X3D, because I don't think his comments about the synchronous cache having extra latency are even accurate or make sense.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
What he says is that if you had one CCD with V-Cache and one without, you might have a performance reduction due to the asymmetric caches. He comments later on the case where both have V-Cache but doesn't talk about a performance regression. Honestly, the answer seemed more like a marketing response, beating around the bush to come up with some reason why they only have a 5800X3D, because I don't think his comments about the synchronous cache having extra latency are even accurate or make sense.

There were two 5900X3D prototypes at Computex: one with two chiplets with 3D V-Cache, which was shown gaming and provided the 15% performance claims, and one with only one chiplet with 3D V-Cache, which was held in the hands of the illustrious Dr. Lisa Su. I believe they were trying to show a proof of concept for a possible cost-effective 5900X3D with the second prototype, since both of them show the same average 15% performance improvement, but they might have come across a weird game or two that actually had a regression.
 
Last edited:

RnR_au

Platinum Member
Jun 6, 2021
2,532
5,938
136
I made a comment over on /r/Amd regarding V-Cache SKUs before the announcement. Shortly after, I received a private message. I only just saw it now, but the info is 8 days old. I haven't posted it on Reddit, but thought a few here might find it interesting.

Hey,

Can you post this as a reply to your comment? I'm banned from /r/Amd and /r/Hardware for leaking info I wasn't supposed to let out.

I know there are working samples of 5800Xs using a single 8-core CCX made on the 6nm node. 5 GHz+ all-core OCs are easily attainable, but it remains to be seen when/if that launches.

Even without V-Cache, the move to 6nm and the more refined layout and I/O are allowing IF clocks up to 4200 running memory 1:1. With tuned B-die this allows for near parity with the 12600K, which is quite impressive given it does not have the 3D cache.

Keep your fingers crossed;)

My gut says it's bogus. I just can't see AMD spending the coin to get a 5800 design working on N6 for evaluation purposes, especially when we know they have Zen3+ on N6. And I believe the consensus now is that 3D stacking is not ready for N6? So that's another reason AMD wouldn't spend the effort on 5800 silicon on N6.

Anyway, I had to share it with someone. It's my first 'leak'... I almost feel the need to make a YouTube video... :p
 

LightningZ71

Platinum Member
Mar 10, 2017
2,317
2,908
136
Supposedly, a straight shrink to N6 is a minimal-effort process.

Just think: there could be significant demand in the HPC world for an EPYC that is firmware- and logic-compatible with existing Milan SKUs but can sustain significantly higher all-core clocks. The power and density improvements of N6 could make those CCDs both more economical to make and worth more once packaged. Given the stated improvements of the process, and where EPYC lives in high-density deployments, it could be 500 MHz or more better at the same power and thermal load.
 
  • Like
Reactions: lightmanek

NostaSeronx

Diamond Member
Sep 18, 2011
3,809
1,289
136
Supposedly, a straight shrink to N6 is a minimal-effort process.
It isn't a straight shrink, it is a library swap.
Semiwiki 2019:
  • N7 designs could simply “re-tapeout” (RTO) to N6 for improved yield with EUV mask lithography
  • or, N7 designs could submit a new tapeout (NTO) by re-implementing logic blocks using an N6 standard cell library (H240) that leverages a “common PODE” (CPODE) device between cells for an ~18% improvement in logic block density
N6-EUV is either an exact N7 logic device or a unique N6 logic device. SRAM (memories) and AMS (SerDes/IO) are the same between the two.
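Since only logic gets the ~18% density gain from the NTO path while SRAM and AMS stay the same, the die-level saving depends entirely on the logic fraction. The 50/50 logic split and CCD-sized die below are made-up illustrative figures, not a real floorplan breakdown.

```python
# Sketch of what the ~18% logic-density gain of an N6 NTO means at die
# level, given that SRAM and analog/IO blocks do not shrink. The 50%
# logic fraction and 80.7 mm^2 die size are illustrative assumptions.

die_area = 80.7          # mm^2, hypothetical Zen 3 CCD-sized die
logic_fraction = 0.5     # assumed share of the die that is logic
logic_density_gain = 0.18

logic_area = die_area * logic_fraction
other_area = die_area - logic_area           # SRAM + AMS: unchanged
new_logic_area = logic_area / (1 + logic_density_gain)
new_die_area = new_logic_area + other_area

print(f"{new_die_area:.1f} mm^2")  # 74.5 mm^2
```

Even under these generous assumptions the whole die only shrinks by ~8%, which is why an RTO (same layout, better yield) can look more attractive than an NTO.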
 
Last edited:
  • Like
Reactions: RnR_au and Tlh97

RnR_au

Platinum Member
Jun 6, 2021
2,532
5,938
136
From my understanding, masks are also not cheap. Surely CPU simulation tools are good enough nowadays to at least hit the side of a barn.
 
  • Like
Reactions: lightmanek

DrMrLordX

Lifer
Apr 27, 2000
22,701
12,652
136
What he says is that if you had 1 CCD with V-cache and 1 without, you might have a performance reduction due to the asynchronous caches. He comments later on if they both have V-cache but doesn't talk about performance regression. Honestly, the answer seemed more like a marketing response just beating around the bush trying to come up with some reason why they only have a 5800x3d because I don't think his comments about the synchronous cache having extra latency is even accurate or makes sense.

Agreed, it seemed like a non-answer. AMD already has separate L3s local to a specific CCD, so there's already an IF penalty for moving data from one L3 to another (as opposed to grabbing the same data from RAM, which incurs an even worse penalty). They already have cache tags and such to streamline that process, and it would be even smoother with larger L3 blocks per CCD. Nothing was said about clock-speed regression on 2xCCD units.

I made a comment over on /r/Amd in regards to V-Cache sku's before the announcement. Shortly after I received a private message. Only just saw it now, but the info is 8 days old. Haven't posted it at Reddit, but thought that a few here might find it interesting

Who knows? N6 Zen3 is basically Warhol territory, which, from what has been leaked in the past, is a cancelled product.
 

coercitiv

Diamond Member
Jan 24, 2014
7,225
16,982
136
Ah, never mind, I found the comment.
Yeah, I don't fully buy the "you might actually lose a little performance" argument for dual chiplets; the same logic could have been applied to the 5800X vs. 5900X at launch, and the 5900X won or traded blows with the 5800X in gaming. That being said, the extra cache on both parts would probably help the single-die product more, essentially letting it match the dual-die product in most games.

I do believe them when they say that normal consumer apps won't see much benefit from 3D cache, though, which reinforces their decision to go single-chiplet only on the consumer side.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,800
1,528
136
I mean, it is Robert Hallock. You can tell he's delivering spin and propaganda because his mouth is moving.