Info: 64 MB V-Cache on 5XXX Zen 3, Average +15% in Games

Kedas

Senior member
Dec 6, 2018
355
339
136
Well, we now know how they will bridge the long wait to Zen 4 on AM5 in Q4 2022.
Production of the V-Cache parts starts at the end of this year, which is too early for Zen 4, so this is certainly coming to AM4.
The +15%, Lisa said, is "like an entire architectural generation".
 

lixlax

Member
Nov 6, 2014
183
150
116
I was expecting 3D-stacked chips to appear first in some super expensive server solutions, but it seems to be client first.
A further 10%+ in gaming performance will be impressive, but I expect only a small number of other apps on the client side to actually benefit from this.
This + DDR5 could make APU graphics performance skyrocket though... exciting times ahead.
 

Hougy

Member
Jan 13, 2021
77
60
61
So if it's 36 mm² × 2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm² of silicon should be costly, and since Zen 3 is 81 mm², 88.9% more silicon should give about 37% more performance by the square-root rule of thumb.
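For reference, a quick sketch of the rule-of-thumb arithmetic I'm using here (it treats the stacked cache as if it were ordinary 2D core area, which is exactly the assumption the replies below push back on):

Python:
# Square-root rule of thumb: performance ~ sqrt(area), applied naively by
# treating the two 36 mm^2 cache stacks as ordinary 2D die area.
from math import sqrt

base_area = 81.0        # Zen 3 CCD, mm^2
extra_area = 2 * 36.0   # two V-Cache stacks, mm^2

area_ratio = (base_area + extra_area) / base_area  # ~1.89, i.e. ~89% more silicon
perf_ratio = sqrt(area_ratio)                      # ~1.37, i.e. ~37% "expected" gain

print(f"{(area_ratio - 1) * 100:.1f}% more area -> {(perf_ratio - 1) * 100:.1f}% expected perf")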
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
It's called 3D V-cache for a reason. Treating the problem as though you're working with a traditional two dimensional chip with a larger area doesn't make sense in this context. They're just building on top of existing real estate, much like we add additional floors to buildings because trying to spread the same amount of office space out over a single floor at ground level would be too expensive from a real estate perspective.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
So if it's 36 mm² × 2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm² of silicon should be costly, and since Zen 3 is 81 mm², 88.9% more silicon should give about 37% more performance by the square-root rule of thumb.
Anything that can keep all of its data in cache from the get-go will see a performance increase, probably more than the 12-15% we see in games.
The thing is, as you rightly said, the cost. This is going to be way too expensive for normal people.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Anything that can keep all of its data in cache from the get-go will see a performance increase, probably more than the 12-15% we see in games.
The thing is, as you rightly said, the cost. This is going to be way too expensive for normal people.
Probably true on the costs. Maybe they'll just extend the range upward and drop prices for the lower SKUs without V-cache. I hope.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
Lisa did say it's for the high-end CPUs (at least at the start).
I don't agree that the cost will be very high. There is only memory on that die, so it's really almost just the cost of the wafer plus assembly, which we all know isn't much compared to high-end CPU prices that have to cover a lot of R&D.
And with a little redundancy you get almost 100% yield on those extra dies.
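A rough illustration of why yield on a small SRAM-only die should be fine (the defect density below is my own assumption for a mature 7nm-class process, not a TSMC figure):

Python:
# Toy Poisson yield model for the small cache die; assumed numbers only.
from math import exp

die_area_cm2 = 0.36    # 6 mm x 6 mm cache die
defect_density = 0.09  # defects per cm^2, assumed for a mature 7nm-class node

raw_yield = exp(-die_area_cm2 * defect_density)  # ~97% before any repair
print(f"raw yield ~ {raw_yield:.1%}")

# SRAM arrays also ship with spare rows/columns, so most single defects are
# repairable and the effective yield gets close to 100%.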
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Lisa did say it's for the high-end CPUs (at least at the start).
I don't agree that the cost will be very high. There is only memory on that die, so it's really almost just the cost of the wafer plus assembly, which we all know isn't much compared to high-end CPU prices that have to cover a lot of R&D.
And with a little redundancy you get almost 100% yield on those extra dies.
Disagree. HBM is just stacked DRAM and yet it's expensive. Defective assemblies are a key aspect of cost, not just die area. Only if they can sell defective stacks as plain non-V-Cache dies will costs be low.

Attempt to stack the cache; if it fails, use it as a normal CPU chiplet. Could this be one way of lowering costs?
 

Kedas

Senior member
Dec 6, 2018
355
339
136
HBM is $120/GB, and we are talking about 0.064 GB here.

But you are right about the assembly; if the process is not running well yet, it can become costly.

The die size is 6 mm × 6 mm, less than half of one Zen 3 die.
$9,000 / 1,500 dies = $6 of extra wafer cost for 64 MB.
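A sketch of that back-of-envelope estimate (the $9,000 wafer price and the 1,500 usable dies are assumptions, not official numbers; only the 6 mm × 6 mm die size comes from the post above):

Python:
# Back-of-envelope cost of the 64 MB cache die, using the numbers above.
import math

wafer_price = 9000.0       # assumed N7 wafer price, USD
wafer_diameter = 300.0     # mm
die_w, die_h = 6.0, 6.0    # mm
die_area = die_w * die_h

# First-order gross-die estimate: wafer area / die area minus an edge-loss term.
gross_dies = (math.pi * (wafer_diameter / 2) ** 2) / die_area \
             - (math.pi * wafer_diameter) / math.sqrt(2 * die_area)
usable_dies = 1500         # conservative, after scribe lines and yield loss

print(f"gross dies ~ {gross_dies:.0f}, cost per usable die ~ ${wafer_price / usable_dies:.2f}")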
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT. That is, unless they already have boxes of a finished product ready to go. I would expect the price to rise some for consumers. If you want a premium product the nearest competition can't hope to touch, then you have to pay for it.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT. That is, unless they already have boxes of a finished product ready to go. I would expect the price to rise some for consumers. If you want a premium product the nearest competition can't hope to touch, then you have to pay for it.
AMD does not exist in isolation. This will hold off Intel till Zen4 launches. There's a reason they showed gaming benchmarks.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
So if it's 36 mm² × 2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm² of silicon should be costly, and since Zen 3 is 81 mm², 88.9% more silicon should give about 37% more performance by the square-root rule of thumb.

Caches are different: they are easier to manufacture and yield better because of their repetitive structure. They are also quite power efficient.

You are applying the square-root law the wrong way. The square-root law is a big penalty because, for cores, power consumption also increases along with the area.

This cache is going to add at best 3-4 W.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
HBM is $120/GB, and we are talking about 0.064 GB here.

But you are right about the assembly; if the process is not running well yet, it can become costly.

The die size is 6 mm × 6 mm, less than half of one Zen 3 die.
$9,000 / 1,500 dies = $6 of extra wafer cost for 64 MB.
So you're telling me AMD sold my Vega 56 to me at a loss on the memory alone...

I think people are vastly overestimating costs, just like it seems we vastly overestimated TSMC N7 wafer prices.
 

CakeMonster

Golden Member
Nov 22, 2012
1,389
496
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT.
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.
 

CakeMonster

Golden Member
Nov 22, 2012
1,389
496
136
Anything that can keep all of its data in cache from the get-go will see a performance increase, probably more than the 12-15% we see in games.
I'm a bit confused about performance expectations. Someone cited Broadwell with its extra cache and said it only performed better in games.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.
Not necessarily. Release doesn't always mean production starts close to that time. Besides, there's a beauty here in AMD using the same chiplet design for Epyc, TR, and Ryzen.
 

jpiniero

Lifer
Oct 1, 2010
14,584
5,207
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT.

I think it's going to be just this DT product. It kinda sounds like Milan-X is semi-custom for a specific customer and might not be publicly available.

The TR that's coming out soon isn't going to have the cache.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.

I agree.

They demoed with Zen 3, and Zen 3 already has infrastructure in place to have stacked SRAM. It's coming with Zen 3.

I'm a bit confused about performance expectations. Someone cited Broadwell with its extra cache and said it only performed better in games.

The per clock gain is about 5%, which is not too shabby, but it can be all over the place. There are some applications where the large cache will beat everything.

Remember though, Broadwell's eDRAM is more like an L4, meaning a request has to go through all the cache levels first; plus it was on a separate die off-chip, so the bandwidth was much lower and the latency higher.

AMD's approach is literally an L3 that's 3x as large. The benefits should be larger and broader. They still won't be huge, but a 5-10% average will be great!

Another similar comparison is the Pentium 4 3.2EE. Compare it to the regular Pentium 4 3.2 (not Prescott) and see how it fares.
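To put the L3-vs-L4 point in numbers, here's a toy average-memory-access-time comparison (every hit rate and latency below is a made-up illustrative value, not a measurement):

Python:
# Toy average-memory-access-time (AMAT) model; illustrative numbers only.
def amat(l3_hit, l3_lat, dram_lat, l4_hit=0.0, l4_lat=0):
    # L3 misses either hit an optional L4 or fall through to DRAM.
    miss = 1.0 - l3_hit
    return l3_hit * l3_lat + miss * (l4_hit * l4_lat + (1.0 - l4_hit) * dram_lat)

# Assumed latencies in core cycles, assumed hit rates for a cache-hungry game:
baseline  = amat(l3_hit=0.70, l3_lat=47, dram_lat=300)
bigger_l3 = amat(l3_hit=0.85, l3_lat=50, dram_lat=300)  # 96 MB L3, slightly slower
edram_l4  = amat(l3_hit=0.70, l3_lat=47, dram_lat=300, l4_hit=0.60, l4_lat=150)  # Broadwell-style L4

for name, cycles in [("32 MB L3", baseline), ("96 MB L3", bigger_l3), ("32 MB L3 + eDRAM L4", edram_l4)]:
    print(f"{name}: ~{cycles:.0f} cycles average")

With those made-up numbers the enlarged L3 comes out ahead simply because the extra capacity sits at L3 latency instead of behind an extra cache level.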
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
I agree.

They demoed with Zen 3, and Zen 3 already has infrastructure in place to have stacked SRAM. It's coming with Zen 3.



The per clock gain is about 5%, which is not too shabby, but it can be all over the place. There are some applications where the large cache will beat everything.

Remember though, Broadwell's eDRAM is more like an L4, meaning a request has to go through all the cache levels first; plus it was on a separate die off-chip, so the bandwidth was much lower and the latency higher.

AMD's approach is literally an L3 that's 3x as large. The benefits should be larger and broader. They still won't be huge, but a 5-10% average will be great!

Another similar comparison is the Pentium 4 3.2EE. Compare it to the regular Pentium 4 3.2 (not Prescott) and see how it fares.
Good point to stress. This IS additional L3 cache, not an additional level of cache.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
So what I don't understand is: if it's only one stack of 64 MB, with the same performance/latency as the existing L3, why is the existing L3 so big? I wonder if it will limit clocks at some point.
From
https://www.anandtech.com/show/1672...acked-vcache-technology-2-tbsec-for-15-gaming
In a call with AMD, we have confirmed the following:

  • This technology will be productized with 7nm Zen 3-based Ryzen processors. Nothing was said about EPYC.
  • Those processors will start production at the end of the year. No comment on availability, although Q1 2022 would fit into AMD's regular cadence.
  • This V-Cache chiplet is 64 MB of additional L3, with no stepped penalty on latency. The V-Cache is address striped with the normal L3 and can be powered down when not in use. The V-Cache sits on the same power plane as the regular L3.
  • The processor with V-Cache is the same z-height as current Zen 3 products - both the core chiplet and the V-Cache are thinned to have an equal z-height as the IOD die for seamless integration
  • As the V-Cache is built over the L3 cache on the main CCX, it doesn't sit over any of the hotspots created by the cores and so thermal considerations are less of an issue. The support silicon above the cores is designed to be thermally efficient.
  • The V-Cache is a single 64 MB die and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process. AMD knows that TSMC can do multiple stacked dies; however, AMD is only talking about a 1-High stack at this time, which it will bring to market.

edit:
This reinforces my opinion that chiplets that fail validation testing can be used for existing products. The z-height is the same, and the V-Cache can be switched off.
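On the "address striped with the normal L3" bullet in the quote above, here is a toy sketch of what striping across cache slices looks like (the slice count and mapping are hypothetical, not AMD's actual scheme):

Python:
# Hypothetical address striping: consecutive cache lines are interleaved across
# all L3 slices (on-die plus stacked), so capacity and bandwidth scale together.
NUM_SLICES = 16   # hypothetical: 8 on-die slices + 8 in the V-Cache stack
LINE_BYTES = 64

def slice_for(addr: int) -> int:
    # Drop the 6 line-offset bits, then interleave line by line across slices.
    return (addr // LINE_BYTES) % NUM_SLICES

for addr in range(0, 8 * LINE_BYTES, LINE_BYTES):
    print(f"address 0x{addr:05x} -> slice {slice_for(addr)}")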
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
So what I don't understand is: if it's only one stack of 64 MB, with the same performance/latency as the existing L3, why is the existing L3 so big? I wonder if it will limit clocks at some point.

It could be slower in terms of latency, so there might be slight differences depending on whether an access goes to the V-Cache stack or to the original L3.

They said 2 TB/s of bandwidth, which is no lower than the bandwidth of the L3 caches in the 5950X.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
It could be slower in terms of latency, so there might be slight differences depending on whether an access goes to the V-Cache stack or to the original L3.

They said 2 TB/s of bandwidth, which is no lower than the bandwidth of the L3 caches in the 5950X.
There might be an extra cycle or two because it's further away, but a CCD doesn't have uniform latency anyway; it can't be massively different or you would hit queuing/transfer/timing issues. The extra bandwidth is because there are more cache slices; obviously it was designed this way from the start.
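Rough numbers behind the "more slices = more bandwidth" point (the per-slice width and the clock are assumptions for illustration, not AMD specs):

Python:
# Aggregate L3 bandwidth scales with how many slices can be read in parallel.
bytes_per_cycle_per_slice = 32  # assumed read width per slice
clock_ghz = 4.0                 # assumed L3/core clock

def aggregate_tb_s(slices: int) -> float:
    return slices * bytes_per_cycle_per_slice * clock_ghz / 1000.0  # GB/s -> TB/s

print(f"8 slices:  ~{aggregate_tb_s(8):.1f} TB/s")   # ballpark for a plain Zen 3 CCD
print(f"16 slices: ~{aggregate_tb_s(16):.1f} TB/s")  # roughly where a ~2 TB/s figure could land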