Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games

Thread starter Kedas
Start date Jun 1, 2021

Toggle sidebar Toggle sidebar

K

Kedas

Senior member

Jun 1, 2021

#1

Well we know now how they will bridge the long wait to Zen4 on AM5 Q4 2022.
Production start for V-cache is end this year so too early for Zen4 so this is certainly coming to AM4.
+15% Lisa said is "like an entire architectural generation"

Last edited: Jun 1, 2021

Reactions: Tlh97 and Gideon

Sort by date Sort by votes

L

lixlax

Member

Jun 1, 2021

#2

I was expecting 3D stacked chips to first start appearing in some super expensive server solutions, but it seems to be client first.
Further 10%+ on gaming perf will be impressive, but I expect only a small number of other apps on the client side to actually benefit from this.
This+DDR5 could make APU graphics perfomance skyrocket though...exciting times ahead.

Reactions: Tlh97, french toast, DarthKyrie and 4 others

Upvote 0 Downvote

M

Mopetar

Diamond Member

Jun 1, 2021

#3

The flip side is that if you do have an application that can now largely fit into cache, the performance uplift is substantial. It wasn't all that long ago that 64 MB was your system memory.

Reactions: Tlh97, french toast and lightmanek

Upvote 0 Downvote

H

Hougy

Member

Jun 1, 2021

#4

So if it's 36 mm^2 X2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should be giving 37% more performance by the square root rule of thumb.

Upvote 0 Downvote

Asterox

Golden Member

Jun 1, 2021

#5

It is much easier to listen, then to read an article.

Last edited: Jun 1, 2021

Reactions: Makaveli, Tlh97, lightmanek and 1 other person

Upvote 0 Downvote

M

Mopetar

Diamond Member

Jun 1, 2021

#6

It's called 3D V-cache for a reason. Treating the problem as though you're working with a traditional two dimensional chip with a larger area doesn't make sense in this context. They're just building on top of existing real estate, much like we add additional floors to buildings because trying to spread the same amount of office space out over a single floor at ground level would be too expensive from a real estate perspective.

Reactions: MangoX, Tlh97, french toast and 3 others

Upvote 0 Downvote

TheELF

Diamond Member

Jun 1, 2021

#7

Hougy said:
So if it's 36 mm^2 X2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should be giving 37% more performance by the square root rule of thumb.

Anything that will have all of it's data ready from the get go will have a performance increase and probably more than the 12-15% of games.
The thing is as you rightly said, the cost. This is going to be way too expensive for normal people.

Reactions: spursindonesia

Upvote 0 Downvote

M

maddie

Diamond Member

Jun 1, 2021

#8

TheELF said:
Anything that will have all of it's data ready from the get go will have a performance increase and probably more than the 12-15% of games.
The thing is as you rightly said, the cost. This is going to be way too expensive for normal people.

Probably true on the costs. Maybe they'll just extend the range upward and drop prices for the lower SKUs without V-cache. I hope.

Upvote 0 Downvote

K

Kedas

Senior member

Jun 1, 2021

#9

Lisa did say it's for the high end CPU's (at least at the start)
I don't agree on the cost being very high. There is only memory on that die so in this case it is really almost only the cost of the wafer + assembly which we all know isn't much compared to the high end CPU prices that includes much R&D.
And with a little redundancy you have almost 100% yield on those extra dies.

Reactions: Tlh97 and lightmanek

Upvote 0 Downvote

M

maddie

Diamond Member

Jun 1, 2021

#10

Kedas said:
Lisa did say it's for the high end CPU's (at least at the start)
I don't agree on the cost being very high. There is only memory on that die so in this case it is really almost only the cost of the wafer + assembly which we all know isn't much compared to the high end CPU prices that includes much R&D.
And with a little redundancy you have almost 100% yield on those extra dies.

Disagree. HBM is just Dram stacked and yet expensive. Defective assemblies is a key aspect of cost, not just die area. Only if they can sell defective stacks as plain non V-cache die will costs be low.

Attempt to stack cache, if fail, then use as normal CPU chiplet. Could this be one way of lowering costs?

Reactions: pcp7, Tlh97, moinmoin and 2 others

Upvote 0 Downvote

K

Kedas

Senior member

Jun 1, 2021

#11

HBM is $120/GB we are talking about 0.064 GB

But you are right about the assembly, if the process in not running wel yet it can become costly.

die size is 6mm x 6mm less than half of one zen3 die.
$9000 / 1500 dies = $6 wafer cost extra for 64MB

Last edited: Jun 1, 2021

Reactions: psolord, Atari2600, Tlh97 and 2 others

Upvote 0 Downvote

A

A///

Diamond Member

Jun 1, 2021

#12

Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT. That is unless they have boxes ready to go of a finished product. I would expect the price to rise some for consumers. If you want a premium product the nearest competition can't hope to touch, then you have to pay for it.

Upvote 0 Downvote

M

maddie

Diamond Member

Jun 1, 2021

#13

A/// said:
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT. That is unless they have boxes ready to go of a finished product. I would expect the price to rise some for consumers. If you want a premium product the nearest competition can't hope to touch, then you have to pay for it.

AMD does not exist in isolation. This will hold off Intel till Zen4 launches. There's a reason they showed gaming benchmarks.

Reactions: Makaveli, bigboxes, Tlh97 and 3 others

Upvote 0 Downvote

IntelUser2000

Elite Member

Jun 1, 2021

#14

Hougy said:
So if it's 36 mm^2 X2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should be giving 37% more performance by the square root rule of thumb.

Caches are different, because they are easier to manufacture and have higher yields because of the repetitive structure. Also they are quite power efficient.

You are applying the square root law in a wrong way. The square root law is a big penalty because the power consumption increase in the core also increases just as much.

This cache is going to add at best 3-4W.

Reactions: ozzy702, Tlh97, Gideon and 3 others

Upvote 0 Downvote

I

itsmydamnation

Platinum Member

Jun 1, 2021

#15

Kedas said:
HBM is $120/GB we are talking about 0.064 GB

But you are right about the assembly, if the process in not running wel yet it can become costly.

die size is 6mm x 6mm less than half of one zen3 die.
$9000 / 1500 dies = $6 wafer cost extra for 64MB

So your tell me AMD sold my vega 56 to me at a loss just on the memory alone........

I think people are vastly over costing things, just like it seems we vastly over priced TMSC n7 wafer prices

Reactions: ozzy702, Tlh97, HurleyBird and 2 others

Upvote 0 Downvote

C

CakeMonster

Golden Member

Jun 1, 2021

#16

A/// said:
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT.

That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.

Upvote 0 Downvote

C

CakeMonster

Golden Member

Jun 1, 2021

#17

TheELF said:
Anything that will have all of it's data ready from the get go will have a performance increase and probably more than the 12-15% of games.

I'm a bit confused about performance expectations. Someone cited the Broadwell with extra cache and that it only performed better in games.

Upvote 0 Downvote

A

A///

Diamond Member

Jun 1, 2021

#18

CakeMonster said:
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.

Not necessarily. Release doesn't always mean production starts close to that time. Besides, there's a beauty here of AMD using the same chiplet design for Epyc, TR and Ryzen.

Upvote 0 Downvote

J

jpiniero

Lifer

Jun 1, 2021

#19

A/// said:
Until Zen 4 launches, I can see the cache stack going live on Epyc first for a while, then TR or alongside TR's launch/announcement, and then mainstream DT.

I think it's going to be just this DT product. It kinda sounds like Milan-X is semi-custom for a specific customer and might not be publicly available.

The TR that's coming out soon isn't going to have the cache.

Upvote 0 Downvote

IntelUser2000

Elite Member

Jun 1, 2021

#20

CakeMonster said:
That would take so much time that Z4 would probably be releasing at the same time. That is unless Z4 also has cache.

I agree.

They demoed with Zen 3, and Zen 3 already has infrastructure in place to have stacked SRAM. It's coming with Zen 3.

CakeMonster said:
I'm a bit confused about performance expectations. Someone cited the Broadwell with extra cache and that it only performed better in games.

The per clock gain is about 5%, which is not too shabby, but it can be all over the place. There are some applications where the large cache will beat everything.

Remember though, Broadwell's eDRAM is more like L4. Meaning it has to go through all stages, plus it was off package so the bandwidth was much lower and latency is higher.

AMD's approach is literally L3 that's 3x as large. The benefits should be larger and more broad. Still won't be huge but 5-10% average will be great!

Another similar comparison is the Pentium 4 3.2EE. Compare that to the regular Pentium 4 3.2(not Prescott) and see how it compares.

Last edited: Jun 1, 2021

Reactions: Kaluan, Tlh97, moinmoin and 5 others

Upvote 2 Downvote

M

maddie

Diamond Member

Jun 1, 2021

#21

IntelUser2000 said:
I agree.

They demoed with Zen 3, and Zen 3 already has infrastructure in place to have stacked SRAM. It's coming with Zen 3.

The per clock gain is about 5%, which is not too shabby, but it can be all over the place. There are some applications where the large cache will beat everything.

Remember though, Broadwell's eDRAM is more like L4. Meaning it has to go through all stages, plus it was off package so the bandwidth was much lower and latency is higher.

AMD's approach is literally L3 that's 3x as large. The benefits should be larger and more broad. Still won't be huge but 5-10% average will be great!

Another similar comparison is the Pentium 4 3.2EE. Compare that to the regular Pentium 4 3.2(not Prescott) and see how it compares.

Good point to stress. This IS additional L3 cache, not an additional level of cache.

Reactions: Tlh97 and Makaveli

Upvote 0 Downvote

I

itsmydamnation

Platinum Member

Jun 1, 2021

#22

So what i dont understand is , if its only 1 stack for 64mb, has the same performance/latency as the existing L3, why is the existing L3 so big. I wonder if it will limit clocks at some point?

Reactions: Tlh97 and lightmanek

Upvote -1 Downvote

M

maddie

Diamond Member

Jun 1, 2021

#23

itsmydamnation said:
So what i dont understand is , if its only 1 stack for 64mb, has the same performance/latency as the existing L3, why is the existing L3 so big. I wonder if it will limit clocks at some point?

From
https://www.anandtech.com/show/1672...acked-vcache-technology-2-tbsec-for-15-gaming
In a call with AMD, we have confirmed the following:

This technology will be productized with 7nm Zen 3-based Ryzen processors. Nothing was said about EPYC.
Those processors will start production at the end of the year. No comment on availability, although Q1 2022 would fit into AMD's regular cadence.
This V-Cache chiplet is 64 MB of additional L3, with no stepped penalty on latency. The V-Cache is address striped with the normal L3 and can be powered down when not in use. The V-Cache sits on the same power plane as the regular L3.
The processor with V-Cache is the same z-height as current Zen 3 products - both the core chiplet and the V-Cache are thinned to have an equal z-height as the IOD die for seamless integration
As the V-Cache is built over the L3 cache on the main CCX, it doesn't sit over any of the hotspots created by the cores and so thermal considerations are less of an issue. The support silicon above the cores is designed to be thermally efficient.
The V-Cache is a single 64 MB die, and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process, AMD knows that TSMC can do multiple stacked dies, however AMD is only talking about a 1-High stack at this time which it will bring to market.

edit:
This reinforces my opinion that chiplets that fail the validation testing can be used for existing products. Z ht the same, V-cache can be switched off.

Last edited: Jun 1, 2021

Reactions: Kaluan, Mopetar, Tlh97 and 4 others

Upvote 0 Downvote

IntelUser2000

Elite Member

Jun 1, 2021

#24

itsmydamnation said:
So what i dont understand is , if its only 1 stack for 64mb, has the same performance/latency as the existing L3, why is the existing L3 so big. I wonder if it will limit clocks at some point?

It could be slower in terms of latency, so it might have slight differences on whether it's going to the V-stack or on the original one.

They said 2TB/s of bandwidth which is not lower than the bandwidth of the L3 caches in 5950X.

Reactions: Tlh97 and lightmanek

Upvote 0 Downvote

I

itsmydamnation

Platinum Member

Jun 1, 2021

#25

IntelUser2000 said:
It could be slower in terms of latency, so it might have slight differences on whether it's going to the V-stack or on the original one.

They said 2TB/s of bandwidth which is not lower than the bandwidth of the L3 caches in 5950X.

There might be a cycle or two because its further away but a CCD doesnt have uniform latency anyway, it cant be massively different because you would hit queuing / transfer/timing etc issues. The more bandwdith is because there is more cache slices, obviously this was designed this way from the start.

Reactions: Kaluan, Tlh97 and maddie

Upvote 0 Downvote

You must log in or register to reply here.

Share:

Facebook X (Twitter) Reddit Tumblr WhatsApp Email Link

TRENDING THREADS

Discussion Intel current and future Lakes & Rapids thread
- Started by TheF34RChannel
- Jun 18, 2017
- Replies: 23K
CPUs and Overclocking
Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)
- Started by DisEnchantment
- Sep 29, 2022
- Replies: 9K
CPUs and Overclocking
Discussion Apple Silicon SoC thread
- Started by Eug
- Nov 10, 2020
- Replies: 6K
CPUs and Overclocking
T
Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads
- Started by Tigerick
- Aug 22, 2022
- Replies: 7K
CPUs and Overclocking
Question Raptor Lake - Official Thread
- Started by Hulk
- Dec 5, 2021
- Replies: 5K
CPUs and Overclocking

Top Bottom

This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
By continuing to use this site, you are consenting to our use of cookies.

Accept Learn more…