Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games


Kedas

Senior member
Dec 6, 2018
355
339
136
Well, we now know how they will bridge the long wait until Zen 4 on AM5 in Q4 2022.
Production start for V-Cache is the end of this year, which is too early for Zen 4, so this is certainly coming to AM4.
The +15%, Lisa said, is "like an entire architectural generation".
 
Last edited:
  • Like
Reactions: Tlh97 and Gideon

Hougy

Member
Jan 13, 2021
79
62
61
So if it's 36 mm^2 x 2 for two stacks of additional cache to get 12% more performance, and only in gaming, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should give about 37% more performance by the square-root rule of thumb.
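For reference, the arithmetic behind that rule-of-thumb estimate, taking the 72 mm^2 and 81 mm^2 figures above as given:

```latex
\frac{72}{81} \approx 0.888
\quad\Rightarrow\quad
\sqrt{1 + 0.888} = \sqrt{1.888} \approx 1.374
```

i.e. roughly 37% more performance would be expected if the extra silicon scaled like core logic, which is exactly the point of contention in the replies below.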
 

Mopetar

Diamond Member
Jan 31, 2011
8,086
6,699
136
It's called 3D V-cache for a reason. Treating the problem as though you're working with a traditional two dimensional chip with a larger area doesn't make sense in this context. They're just building on top of existing real estate, much like we add additional floors to buildings because trying to spread the same amount of office space out over a single floor at ground level would be too expensive from a real estate perspective.
 

TheELF

Diamond Member
Dec 22, 2012
4,026
753
126
So if it's 36 mm^2 x 2 for two stacks of additional cache to get 12% more performance, and only in gaming, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should give about 37% more performance by the square-root rule of thumb.
Anything that will have all of its data ready from the get-go will see a performance increase, probably more than the 12-15% seen in games.
The thing is, as you rightly said, the cost. This is going to be way too expensive for normal people.
 
  • Haha
Reactions: spursindonesia

maddie

Diamond Member
Jul 18, 2010
4,878
4,951
136
Anything that will have all of its data ready from the get-go will see a performance increase, probably more than the 12-15% seen in games.
The thing is, as you rightly said, the cost. This is going to be way too expensive for normal people.
Probably true on the costs. Maybe they'll just extend the range upward and drop prices for the lower SKUs without V-cache. I hope.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
Lisa did say it's for the high-end CPUs (at least at the start).
I don't agree on the cost being very high. There is only memory on that die, so in this case it is really almost only the cost of the wafer + assembly, which we all know isn't much compared to high-end CPU prices that include a lot of R&D.
And with a little redundancy you get almost 100% yield on those extra dies.
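As a rough illustration of the yield point, here is a simple Poisson yield model for a 36 mm^2 SRAM die; the defect density used is an assumed ballpark for a mature 7nm-class process, not an AMD or TSMC figure:

```python
import math

def poisson_yield(die_area_mm2: float, defect_density_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-D * A), with the die area A in cm^2."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defect_density_per_cm2 * area_cm2)

# Assumed defect density of 0.09 defects/cm^2 (illustrative only).
raw_yield = poisson_yield(36.0, 0.09)
print(f"Raw yield of a 36 mm^2 die: {raw_yield:.1%}")  # ~96.8%
# With row/column redundancy repairing most single-cell defects,
# the effective yield of an SRAM-only die climbs even closer to 100%.
```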
 

maddie

Diamond Member
Jul 18, 2010
4,878
4,951
136
Lisa did say it's for the high-end CPUs (at least at the start).
I don't agree on the cost being very high. There is only memory on that die, so in this case it is really almost only the cost of the wafer + assembly, which we all know isn't much compared to high-end CPU prices that include a lot of R&D.
And with a little redundancy you get almost 100% yield on those extra dies.
Disagree. HBM is just stacked DRAM and yet it's expensive. Defective assemblies are a key aspect of cost, not just die area. Only if they can sell defective stacks as plain non-V-Cache dies will costs be low.

Attempt to stack the cache; if it fails, use it as a normal CPU chiplet. Could this be one way of lowering costs?
 

Kedas

Senior member
Dec 6, 2018
355
339
136
HBM is $120/GB and we are talking about 0.064 GB here.

But you are right about the assembly; if the process is not running well yet, it can become costly.

Die size is 6 mm x 6 mm, less than half of one Zen 3 die.
$9000 / 1500 dies = $6 of extra wafer cost for 64 MB
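A back-of-the-envelope version of that wafer math; the $9000 N7 wafer price is the figure assumed above, and the dies-per-wafer formula is the usual rough estimate that ignores yield, test and packaging cost:

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Rough estimate: wafer area / die area, minus an edge-loss term."""
    r = wafer_diameter_mm / 2.0
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))

die_area = 6.0 * 6.0                 # 36 mm^2 V-Cache die
dies = gross_dies_per_wafer(300.0, die_area)
wafer_price = 9000.0                 # assumed N7 wafer price, as above
print(dies, round(wafer_price / dies, 2))  # ~1850 gross dies, ~$4.86 per die
# The 1500 dies / $6-per-die figure above is the same calculation with a
# more conservative allowance for scribe lines, edge loss and defects.
```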
 
Last edited:

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT. That is unless they have boxes of a finished product ready to go. I would expect the price to rise some for consumers. If you want a premium product the nearest competition can't hope to touch, then you have to pay for it.
 

maddie

Diamond Member
Jul 18, 2010
4,878
4,951
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT. That is unless they have boxes of a finished product ready to go. I would expect the price to rise some for consumers. If you want a premium product the nearest competition can't hope to touch, then you have to pay for it.
AMD does not exist in isolation. This will hold off Intel till Zen4 launches. There's a reason they showed gaming benchmarks.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
So if it's 36 mm^2 x 2 for two stacks of additional cache to get 12% more performance, and only in gaming, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should give about 37% more performance by the square-root rule of thumb.

Caches are different: they are easier to manufacture and have higher yields because of the repetitive structure. They are also quite power efficient.

You are applying the square root law in the wrong way. That law is a big penalty because the power consumption of the core increases just as much as its area does.

This cache is going to add 3-4 W at most.
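Put as a formula, under the usual reading of that rule (performance scaling with the square root of core area while core power scales roughly linearly with area):

```latex
\text{perf} \propto \sqrt{A_{\text{core}}}, \qquad \text{power} \propto A_{\text{core}}
\quad\Rightarrow\quad
1.888\times\ \text{area} \;\to\; \approx 1.37\times\ \text{perf at } \approx 1.9\times\ \text{power}
```

whereas the SRAM stack adds only the few watts mentioned above, so the rule isn't the right yardstick for a cache die.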
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,914
3,532
136
HBM is $120/GB and we are talking about 0.064 GB here.

But you are right about the assembly; if the process is not running well yet, it can become costly.

Die size is 6 mm x 6 mm, less than half of one Zen 3 die.
$9000 / 1500 dies = $6 of extra wafer cost for 64 MB
So you're telling me AMD sold my Vega 56 to me at a loss on the memory alone...

I think people are vastly overestimating costs, just like it seems we vastly overestimated TSMC N7 wafer prices.
 

CakeMonster

Golden Member
Nov 22, 2012
1,495
658
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT.
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.
 

CakeMonster

Golden Member
Nov 22, 2012
1,495
658
136
Anything that will have all of its data ready from the get-go will see a performance increase, probably more than the 12-15% seen in games.
I'm a bit confused about performance expectations. Someone cited Broadwell with its extra cache and said that it only performed better in games.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.
Not necessarily. Release doesn't always mean production starts close to that time. Besides, there's a beauty here of AMD using the same chiplet design for Epyc, TR and Ryzen.
 

jpiniero

Lifer
Oct 1, 2010
15,125
5,671
136
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT.

I think it's going to be just this DT product. It kinda sounds like Milan-X is semi-custom for a specific customer and might not be publicly available.

The TR that's coming out soon isn't going to have the cache.
 

maddie

Diamond Member
Jul 18, 2010
4,878
4,951
136
I agree.

They demoed with Zen 3, and Zen 3 already has infrastructure in place to have stacked SRAM. It's coming with Zen 3.



The per clock gain is about 5%, which is not too shabby, but it can be all over the place. There are some applications where the large cache will beat everything.

Remember though, Broadwell's eDRAM is more like an L4. That means requests have to go through all the cache levels first, plus it was off-die, so the bandwidth was much lower and the latency higher.

AMD's approach is literally an L3 that's 3x as large. The benefits should be larger and broader. Still won't be huge, but a 5-10% average will be great!

Another similar comparison is the Pentium 4 3.2EE. Compare that to the regular Pentium 4 3.2 (not Prescott) and see how it fares.
Good point to stress. This IS additional L3 cache, not an additional level of cache.
 
  • Like
Reactions: Tlh97 and Makaveli

maddie

Diamond Member
Jul 18, 2010
4,878
4,951
136
So what I don't understand is: if it's only 1 stack for 64 MB and has the same performance/latency as the existing L3, why is the existing L3 so big? I wonder if it will limit clocks at some point?
From
https://www.anandtech.com/show/1672...acked-vcache-technology-2-tbsec-for-15-gaming
In a call with AMD, we have confirmed the following:

  • This technology will be productized with 7nm Zen 3-based Ryzen processors. Nothing was said about EPYC.
  • Those processors will start production at the end of the year. No comment on availability, although Q1 2022 would fit into AMD's regular cadence.
  • This V-Cache chiplet is 64 MB of additional L3, with no stepped penalty on latency. The V-Cache is address striped with the normal L3 and can be powered down when not in use. The V-Cache sits on the same power plane as the regular L3.
  • The processor with V-Cache is the same z-height as current Zen 3 products - both the core chiplet and the V-Cache are thinned to have an equal z-height as the IOD die for seamless integration
  • As the V-Cache is built over the L3 cache on the main CCX, it doesn't sit over any of the hotspots created by the cores and so thermal considerations are less of an issue. The support silicon above the cores is designed to be thermally efficient.
  • The V-Cache is a single 64 MB die, and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process, AMD knows that TSMC can do multiple stacked dies, however AMD is only talking about a 1-High stack at this time which it will bring to market.

edit:
This reinforces my opinion that chiplets that fail validation testing can be used for existing products. The z-height is the same, and the V-Cache can be switched off.
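Purely as an illustration of what the "address striped with the normal L3" bullet above could mean in practice, here is a hypothetical sketch of interleaving cache lines across the base 32 MB and the stacked 64 MB; this is not AMD's actual hash function, just the general idea:

```python
# Hypothetical address-striping sketch: cache lines are interleaved across
# the combined 96 MB pool so software simply sees one larger L3.
LINE_SIZE = 64      # bytes per cache line
BASE_UNITS = 1      # 32 MB of on-die L3   -> 1 stripe unit
VCACHE_UNITS = 2    # 64 MB of stacked L3  -> 2 stripe units

def l3_region(phys_addr: int) -> str:
    line = phys_addr // LINE_SIZE
    stripe = line % (BASE_UNITS + VCACHE_UNITS)
    return "base L3" if stripe < BASE_UNITS else "V-Cache"

for addr in range(0, LINE_SIZE * 6, LINE_SIZE):
    print(hex(addr), "->", l3_region(addr))
```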
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
So what I don't understand is: if it's only 1 stack for 64 MB and has the same performance/latency as the existing L3, why is the existing L3 so big? I wonder if it will limit clocks at some point?

It could be slower in terms of latency, so there might be slight differences depending on whether an access goes to the V-Cache stack or to the original L3.

They said 2 TB/s of bandwidth, which is not lower than the bandwidth of the L3 caches in the 5950X.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,914
3,532
136
It could be slower in terms of latency, so there might be slight differences depending on whether an access goes to the V-Cache stack or to the original L3.

They said 2 TB/s of bandwidth, which is not lower than the bandwidth of the L3 caches in the 5950X.
There might be a cycle or two of extra latency because it's further away, but a CCD doesn't have uniform latency anyway. It can't be massively different, because you would hit queuing/transfer/timing issues. The extra bandwidth is because there are more cache slices; obviously this was designed this way from the start.
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
FYI, according to Ian:

"Confirmed with AMD that V-Cache will be coming to Ryzen Zen 3 products, with production at end of year. "

Zen 3 it is. AMD gets to skip a generation. Let's go!
 
  • Like
Reactions: cytg111

Kedas

Senior member
Dec 6, 2018
355
339
136
About that YouTube video: there is some wrong info in there. The CPU that Lisa showed had 64 MB on both dies; they just removed the top layer of one die so you could see the V-Cache.

The fact that it can be switched off is a nice feature for low power usage.
Zen 3 was built with this add-on in mind, so that info took a long time to get out...
 

Gideon

Golden Member
Nov 27, 2007
1,765
4,114
136
I'm a bit confused about performance expectations. Someone cited Broadwell with its extra cache and said that it only performed better in games.
Broadwell's L4 was a different beast though since it was an MCM solution.
  1. It had poor bandwidth for a cache (50 GiB/s vs 25.6 GiB/s for DDR3 1600)
  2. It also suffered from poor latency of around 150 cycles (vs 200+ cycles for DDR3)

AMD's solution meanwhile has:
  1. 2 TB/s of bandwidth
  2. According to this, latency indiscernible from the rest of the L3, which would be around 50 cycles.
So despite the overall cache sizes being similar (128 MB vs 96 MB), V-Cache can transfer 40x more data at once while doing it over 3x faster.
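For reference, the ratios behind those 40x and 3x figures, using the numbers listed above:

```latex
\frac{2\ \text{TB/s}}{50\ \text{GiB/s}} \approx \frac{2000\ \text{GB/s}}{53.7\ \text{GB/s}} \approx 37\text{--}40\times,
\qquad
\frac{150\ \text{cycles}}{\approx 50\ \text{cycles}} = 3\times
```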

There will still be plenty of use-cases that won't benefit much from the extra cache, but it should be much more capable than Broadwell.
I'm particularly interested in software compiling benchmarks as these tend to scale very well with extra cache.
 
Last edited:

Gideon

Golden Member
Nov 27, 2007
1,765
4,114
136
A large L4 cache might be required for most enthusiast-level future CPUs paired with DDR5, to hide the rumored higher DRAM latencies, the one critical thing that DDR5 doesn't improve upon.
Are you sure? Geil promised 10 ns true-latency modules (7200 MT/s @ CL36 and 6400 MT/s @ CL32) at launch, which is exactly the same as the vast majority of overclocked DDR4 modules. XPG even promises 7400 MT/s modules at unknown latencies.
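The true-latency arithmetic behind those figures:

```latex
t_{CAS} = \frac{CL}{f_{clk}} = \frac{36}{7200/2\ \text{MHz}} = \frac{36}{3600\ \text{MHz}} = 10\ \text{ns},
\qquad
\frac{32}{6400/2\ \text{MHz}} = \frac{32}{3200\ \text{MHz}} = 10\ \text{ns}
```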

I know that Ian was worried about latencies in this article, but those worries do not seem to have materialized.