Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games


Kedas

Senior member
Dec 6, 2018
355
339
136
Well, now we know how they will bridge the long wait until Zen 4 on AM5 in Q4 2022.
Production of V-Cache starts at the end of this year, which is too early for Zen 4, so this is certainly coming to AM4.
Lisa said the +15% is "like an entire architectural generation".
 
  • Like
Reactions: Tlh97 and Gideon

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Apparently they will fill the void with silicon spacers.


I'm surprised that silicon is the solution.

Yes, they use a dummy substrate to match the height of the stacked die. If this is what you were asking before, then I misunderstood your question; I thought you were asking about the gap between the top of the stacked dies and the heatspreader.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
The wafers are typically around 700 um thick when completed. The layers that are used in stacking will have to be thinned to be a part of the process, so if you need to adjust heights to match heights on a bridged IC, it's not a problem. If you get really complex with multiple layers, then you'll obviously need to make sure you plan accordingly, but you shouldn't have a situation where you have different stack heights unless it's on purpose.
I don’t know if SoIC will actually be used for bridging. The bridged parts will use Elevated FanOut Bridge (possibly TSMC’s InFO-L). This is nowhere near as dense as SoIC with hybrid bonding, if I understand correctly. EFB will likely be used for AMD GPUs. I am still wondering if it will be used for Bergamo, but without SoIC, it will not be L3 cache. It would need to be L4.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
I don’t know if SoIC will actually be used for bridging. The bridged parts will use Elevated FanOut Bridge (possibly TSMC’s InFO-L). This is nowhere near as dense as SoIC with hybrid bonding, if I understand correctly. EFB will likely be used for AMD GPUs. I am still wondering if it will be used for Bergamo, but without SoIC, it will not be L3 cache. It would need to be L4.

Yeah, I don't know either. We don't even have confirmation from AMD yet that they are doing bridges. My remark was very much in the hypothetical realm and really just addressing the matching heights question.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
The stack goes CPU > V-cache > Solder TIM > Heatspreader. The same as a non-Vcache CPU today.

The upper die does not have direct contact with the heatspreader. The V-cache does sit above the normal L3 cache and will cause greater heat buildup in the base L3 cache region, but the L3 cache region should still have lower heat density than other parts of the CPU when under load. The FPU region, for instance, will typically have much higher heat density due to both the transistor density and how often the transistors are actually switching.
The stack is the same height as a standard die, since the CPU die is polished down very thin. Because the height is the same, heat still has the same amount of silicon to travel through. The TSVs are copper, though, so they would likely have higher thermal conductivity. I don’t think the thermal interface between the two dies is actually important; they are polished down to exceptional flatness. The main thing is that the SRAM will produce some extra heat, unlike a passive piece of silicon.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
Geekbench is limited by memory throughput on MT so that makes sense.

Also why are we still entertaining the idea of V-Cache or anything similar on Bergamo ffs. The entire point of it is that it's a balance of price and per-core performance tailored to the hyperscaler market (which is inherently relatively low margin). Stacking memory on top is not an ideal trade-off to be making here, to say the least, PARTICULARLY not shared memory between all cores.
 
  • Like
Reactions: Tlh97 and moinmoin

deasd

Senior member
Dec 31, 2013
603
1,033
136
It's not a surprise, since V-Cache would be a game changer in heavy workloads while doing far less in ST loads. I guess when it comes to rendering in something like Blender, which uses almost all the threads your CPU has, V-Cache could bring much higher efficiency than any CPU without V-Cache (same arch, same core/thread count).
 

coercitiv

Diamond Member
Jan 24, 2014
7,355
17,424
136
I guess when it comes to rendering in something like Blender, which uses almost all the threads your CPU has, V-Cache could bring much higher efficiency than any CPU without V-Cache (same arch, same core/thread count).
[Attached images: 1648021242334.png, 1648021255700.png]

It's not about how many cores the workload uses, but rather about dataset size and affinity towards memory throughput/latency.
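
A rough way to picture the dataset-size point, as a sketch only: bucket a workload by whether its hot working set fits in L3 with and without the extra cache. The 32 MiB base L3 is the Zen 3 CCD figure and the extra 64 MiB is the V-Cache from the thread title; the workload names and sizes are made up purely to hit each bucket and are not measurements.

```python
# Sketch: bucket a workload by whether its hot working set fits in L3.
MIB = 1024 * 1024

L3_BASE = 32 * MIB                # L3 on a Zen 3 CCD
L3_VCACHE = L3_BASE + 64 * MIB    # plus the 64MB stacked V-Cache


def classify(working_set_bytes: int) -> str:
    """Return which L3 configuration could hold the whole working set."""
    if working_set_bytes <= L3_BASE:
        return "fits in base L3"
    if working_set_bytes <= L3_VCACHE:
        return "fits only with V-Cache"
    return "exceeds both"


# Hypothetical working-set sizes, chosen only to land in each bucket.
examples = {
    "small hot loop": 8 * MIB,
    "mid-size game state": 60 * MIB,
    "large streaming dataset": 4096 * MIB,
}

for name, size in examples.items():
    print(f"{name}: {classify(size)}")
```

The middle bucket is where the extra cache would be expected to matter most. Core count never enters the calculation, which is the point being made here.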
 
  • Like
Reactions: Mopetar and Tlh97
Jul 27, 2020
27,998
19,122
146
Does Zen 3 have cache control instructions to prevent certain required data from being evicted again and again due to cache pressure?

My predictions for non-gaming workloads that should benefit from V-Cache include VMs, compression/decompression, anything using JITs (so JavaScript performance in browsers), console emulators, compilation, .NET/Java runtime performance and, last but not least, possibly ZFS.

I'm like :hearteyes: whenever I see benchmarks catapulting i7-5775C ahead of 5950X or where the i7-5775C is really close. Scientists are going to love 5800X3D and I bet quite a few of them already love their i7-5775Cs. Makes me wish AMD had made 5950X3D instead. The ultimate swansong of AM4!

[10 attached images]
 

Timorous

Golden Member
Oct 27, 2008
1,978
3,864
136
Looking at the AMD slides again, they showed a 40% gain over the 5900X in Watch Dogs: Legion, which is really impressive. Makes me think games with high AI counts like Legion, grand strategy, RTS and 4X games might see a benefit from the cache. It also makes me think that in a few games the 5800X3D might be faster than Zen 4.

It also makes me think that when the likes of HUB, GN, LTT, TPU etc. test the CPU, they will do the usual and compare it at 1080p in AAA games, then call it a waste of sand. TPU less so, since they test more games and use 720p.

Shame nobody bothers to test games where 60 FPS is easy to hit and more than enough for the game, but simulation rates suffer late game, where a beefier CPU matters more. You get the odd Civ 6 turn time test and that is about it for non-FPS-based metrics.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,665
2,531
136
Geekbench is limited by memory throughput on MT so that makes sense.

Also why are we still entertaining the idea of V-Cache or anything similar on Bergamo ffs. The entire point of it is that it's a balance of price and per-core performance tailored to the hyperscaler market (which is inherently relatively low margin). Stacking memory on top is not an ideal trade-off to be making here, to say the least, PARTICULARLY not shared memory between all cores.

Without a large, local cache at the chiplet, the power used for memory traffic totally blows out your power budget. You cannot make a CPU for the hyperscaler market by just doubling cores, pulling off the L3 and calling it good. It wouldn't work: it would either be so starved for memory bandwidth that it would be weaker than the variant with fewer cores, or because of power limits you'd have to pull frequencies so low that it would, again, be weaker than the variant with fewer cores.

The rumor floating around is that Bergamo is what Zen looks like when you pull all of the L3 off the base die and stack it on top. So they can fit twice the cores by using the space freed up by the L3, and use a cheaper process (probably N6) for the cache. Cache per chip is a bit lower than a normal EPYC that also has L3 on the same die, cost per chip is similar, but you have twice the cores. The downside is that the cores under the cache have lower max frequency and power.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
Without a large, local cache at the chiplet, the power used for memory traffic totally blows out your power budget. You cannot make a CPU for the hyperscaler market by just doubling cores, pulling off the L3 and calling it good. It wouldn't work: it would either be so starved for memory bandwidth that it would be weaker than the variant with fewer cores, or because of power limits you'd have to pull frequencies so low that it would, again, be weaker than the variant with fewer cores.

The rumor floating around is that Bergamo is what Zen looks like when you pull all of the L3 off the base die and stack it on top. So they can fit twice the cores by using the space freed up by the L3, and use a cheaper process (probably N6) for the cache. Cache per chip is a bit lower than a normal EPYC that also has L3 on the same die, cost per chip is similar, but you have twice the cores. The downside is that the cores under the cache have lower max frequency and power.

That wouldn't be cheaper at all, and what's more, I don't know why you're fixated on removing the L3 cache from the base die. There's no point to that. You're trying to fit the exact same size cores in the exact same amount of space without realising that:

1. The cores themselves might see some changes to better suit them towards hyperscaler workloads (or alternatively, cut corners that would hurt them in the general server market but not in the hyperscaler market to anywhere near the same degree).

2. Doubling the number of cores per die means you don't need to stick to the exact same die size - there's extra space on package. This would be a detriment if N5 was a poorly yielding node, but it's not.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The rumor floating around is that Bergamo is what Zen looks like when you pull all of the L3 off the base die and stack it on top. So they can fit twice the cores by using the space freed up by the L3, and use a cheaper process (probably N6) for the cache. Cache per chip is a bit lower than a normal EPYC that also has L3 on the same die, cost per chip is similar, but you have twice the cores. The downside is that the cores under the cache have lower max frequency and power.

How else do you think they are going to fit 16 cores on a 72.225 mm² chiplet? I've done the math, and even if they cut the L3 to 1/4 of its area (the size of 8 MiB, but at the density of the L3 chiplet that would make it 16 MiB), it would be larger than that, due to the AVX-512 registers taking twice as much area as the current 256-bit ones. Also, the L2 is doubled to 1 MiB.

Let me pull up my mock-up.

Okay, here it is. This is based on Locuza's die annotations of Zen 3. We know Zen 4 will be a die shrink (from 7nm to 5nm) with AVX-512 and double the L2$. If we go by what Apple was able to accomplish, the die area reduction (logic and SRAM) from TSMC 7nm to TSMC 5nm is only about 20%.

The die on top is my mock-up of the Zen 4 core with double L2, double FP registers and a potential 20% die shrink (logic and SRAM).

[Attached image: 1648045003314.png]
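
The area argument can be put into numbers with a small sketch. It uses only the 72.225 mm² chiplet size and the roughly 20% shrink quoted in this post. The 8-core baseline for that chiplet is an assumption, and the function names are hypothetical. It divides the whole chiplet evenly per core, so each share includes cache and uncore, not just the core itself.

```python
# Sketch: per-core area share on a fixed-size chiplet, under stated assumptions.
CHIPLET_AREA_MM2 = 72.225   # leaked chiplet size quoted in this post
ASSUMED_SHRINK = 0.20       # ~20% area reduction from 7nm to 5nm, per this post


def area_per_core(chiplet_area_mm2: float, cores: int) -> float:
    """Whole-chiplet area divided evenly per core (includes cache and uncore)."""
    return chiplet_area_mm2 / cores


def required_shrink(cores_before: int, cores_after: int) -> float:
    """Fractional area cut per core share needed to keep chiplet area fixed."""
    return 1.0 - cores_before / cores_after


budget_8 = area_per_core(CHIPLET_AREA_MM2, 8)     # assumed 8-core baseline
budget_16 = area_per_core(CHIPLET_AREA_MM2, 16)
needed = required_shrink(8, 16)

print(f"8 cores:  {budget_8:.2f} mm^2 per core share")
print(f"16 cores: {budget_16:.2f} mm^2 per core share")
print(f"needed reduction: {needed:.0%}, assumed process shrink: {ASSUMED_SHRINK:.0%}")
```

Doubling the cores at the same chiplet size needs each share to halve, which is well beyond the assumed process shrink. That gap would have to come from somewhere else, such as less L3 or a larger chiplet.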
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Oh, the core count per CCD and the CCD size were already announced?
The core counts of Genoa and Bergamo have been known for a while, but the chiplet size was leaked by Gigabyte.


Genoa will have 12 chiplets, which works out to a lane per chiplet (Milan had 8 chiplets and 8 channel PCIe). What we don't know is how Bergamo will reach 128 cores. Will it be 8 chiplets with 16 cores each?

How is AMD going to fit so many cores in such small chiplets? They said they reworked/tweaked the cache, and they will most likely use TSMC's super-high-density SRAM libraries, so even with 1/4 of the die area it would amount to half the cache (16 MiB), or perhaps even lower, 8 MiB? We know that Zen 2 APUs had that amount of cache and did pretty well in performance.
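
The cache arithmetic above can be sketched as capacity = base capacity × area fraction × density multiplier. The 32 MiB base is the Zen 3 CCD L3; the 2× density multiplier is only what the "1/4 of the area gives half the cache" claim implies, not a confirmed figure for any SRAM library, and the function name is hypothetical.

```python
# Sketch: L3 capacity after shrinking its area and changing SRAM density.
def l3_capacity_mib(base_mib: float, area_fraction: float,
                    density_multiplier: float) -> float:
    """Capacity scales linearly with area and with bits per unit area."""
    return base_mib * area_fraction * density_multiplier


BASE_L3_MIB = 32.0  # L3 per Zen 3 CCD

same_density = l3_capacity_mib(BASE_L3_MIB, 0.25, 1.0)   # quarter area, same cells
dense_cells = l3_capacity_mib(BASE_L3_MIB, 0.25, 2.0)    # quarter area, assumed 2x density

print(f"1/4 area, same density: {same_density:g} MiB")
print(f"1/4 area, 2x density:   {dense_cells:g} MiB")
```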
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
The core counts of Genoa and Bergamo have been known for a while, but the chiplet size was leaked by Gigabyte.


Genoa will have 12 chiplets, which works out to a lane per chiplet (Milan had 8 chiplets and 8 channel PCIe). What we don't know is how Bergamo will reach 128 cores. Will it be 8 chiplets with 16 cores each?

How is AMD going to fit so many cores in such small chiplets? They said they reworked/tweaked the cache, and they will most likely use TSMC's super-high-density SRAM libraries, so even with 1/4 of the die area it would amount to half the cache (16 MiB), or perhaps even lower, 8 MiB? We know that Zen 2 APUs had that amount of cache and did pretty well in performance.

Mind you that chiplet size is specific to Genoa. Bergamo fits more cores per chiplet, but nowhere did AMD say each chiplet was the same size as Genoa.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Mind you that chiplet size is specific to Genoa. Bergamo fits more cores per chiplet, but nowhere did AMD say each chiplet was the same size as Genoa.

But we know they will be using the same Socket and Die Package and they are subject to the same Genoa limitations on Lane count and size.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
But we know they will be using the same Socket and Die Package and they are subject to the same Genoa limitations on Lane count and size.
I don't see why that would mean the chiplets are smaller in size? You only have to fit 8 of them on the package to get 128 cores if each sports 16 cores, not 12 like Genoa.
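
To put rough numbers on that, here is a sketch that assumes the only constraint is total CCD silicon area on the package, which is a simplification (it ignores routing, IOD placement and spacing). The 12 chiplets and 72.225 mm² are the Genoa figures from this thread; the 8 × 16-core Bergamo layout is the hypothetical under discussion.

```python
# Sketch: how large each chiplet could be if only total CCD area were fixed.
GENOA_CHIPLETS = 12
GENOA_CHIPLET_AREA_MM2 = 72.225   # leaked size quoted earlier in the thread
BERGAMO_CHIPLETS = 8              # hypothetical layout under discussion
BERGAMO_CORES_PER_CHIPLET = 16    # hypothetical layout under discussion


def total_ccd_area(chiplets: int, area_mm2: float) -> float:
    """Combined area of all compute chiplets on the package."""
    return chiplets * area_mm2


def area_budget_per_chiplet(total_area_mm2: float, chiplets: int) -> float:
    """Even split of a fixed total area across fewer chiplets."""
    return total_area_mm2 / chiplets


genoa_total = total_ccd_area(GENOA_CHIPLETS, GENOA_CHIPLET_AREA_MM2)
bergamo_budget = area_budget_per_chiplet(genoa_total, BERGAMO_CHIPLETS)
bergamo_cores = BERGAMO_CHIPLETS * BERGAMO_CORES_PER_CHIPLET

print(f"Genoa total CCD area:       {genoa_total:.1f} mm^2")
print(f"Budget per Bergamo chiplet: {bergamo_budget:.1f} mm^2")
print(f"Bergamo cores:              {bergamo_cores}")
```

Under that assumption each of the 8 chiplets could be about 1.5× the Genoa chiplet size before the package carries more CCD silicon than Genoa does.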
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
But we know they will be using the same Socket and Die Package and they are subject to the same Genoa limitations on Lane count and size.
I mean... OK? How does that relate to the exact area of the CCDs? Or do you happen to know the exact parameters by which AMD arranges any given yet-to-be-released package? If so, by all means, please do share!!!
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I don't see why that would mean the chiplets are smaller in size? You only have to fit 8 of them on the package to get 128 cores if each sports 16 cores, not 12 like Genoa.

Without a chiplet resize, I just don't see where they are going to fit 128 regular-sized Zen 4 cores... Hence Zen 4c, which is said to be dense (as in SRAM density, not logic, as TSMC is not there yet, especially on 5nm).