Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games


Kedas

Senior member
Dec 6, 2018
355
339
136
Well, we now know how they will bridge the long wait until Zen 4 on AM5 in Q4 2022.
Production of V-Cache starts at the end of this year, which is too early for Zen 4, so this is certainly coming to AM4.
The +15%, Lisa said, is "like an entire architectural generation".
 
Last edited:
  • Like
Reactions: Tlh97 and Gideon

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Talking about gigabytes of SRAM cache is getting a bit ahead of ourselves, methinks.
Isn't there already a Zen 3 BIOS showing settings for up to 4 stacked cache (X3D) dies? 8 chiplets * (32 MB + (64 MB * 4)) = 2304 MB. Over 2 GB seems possible, but it may pull too much power for the general case. It might be limited to specialized enterprise applications (some types of HPC, high-end database servers, etc.).
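A quick sketch of that arithmetic (the 4-layer count is only the rumored BIOS maximum; the 64 MB per layer is from AMD's announcement):

```python
# Rumored maximum stacked-cache config on an 8-chiplet Epyc package.
base_l3_mb = 32     # native L3 per Zen 3 CCD
layer_mb = 64       # announced capacity of one V-Cache die
layers = 4          # rumored BIOS maximum, unconfirmed
chiplets = 8        # CCDs on a full Epyc package

total_mb = chiplets * (base_l3_mb + layer_mb * layers)
print(total_mb, "MB")   # -> 2304 MB, a bit over 2 GB
```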

edit: I guess that could be a fake. I don't actually know the origin of that BIOS settings image, but a lot of people seem to accept it.
 
Last edited:

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
It will be interesting to see what massive SRAM caches can do for GPUs. We already have 128 MB on AMD GPUs, but I doubt that is being taken full advantage of yet. Probably not until we get a chiplet based GPU.
Isn't that basically the first step toward this? I wouldn't be surprised if that was step one in aligning this tech to be used on every product line. Someone can tell me if I am wrong, but this isn't like NAND, where every level is another full plane of NAND cells. They are stacking cache on top of cache, so, for example, they can't put the stacked cache on top of the cores. It makes sense why AMD went with the 8 matrixed cores per CCD in Zen 3 over the old CCXs: with the older CCXs they would have had to leave two unconnected areas on the layer, whereas now it's just one area.

So by building an SRAM cache into Navi 2x, they have the building blocks to start adding more cache. When they do it on Navi, they could do 3 layers and have half a gig of cache. Even without direct software development, that would be a pretty big boost; that's more than all the shared memory the PS3 or Xbox had. Navi, from the looks of it, has a bit of a Zen 1/2 look to it, with its split compute complexes. It's entirely possible that Navi, following a step behind Zen development, ends up with a single compute complex and a single shared L3 cache location next generation. Navi 3x was all about tripling performance efficiency over Navi 1. Maybe that's the final step: using this V-Cache to double or quadruple the L3 (on top of whatever the L3 is next gen). Then you could have an easier time segmenting performance: one layer of V-Cache is bad, cut it off and it's a 7800 XT; two, and it's a 7800. You could cut out compute cores on top of that for overall yields.
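Rough numbers for that, assuming (purely speculatively) that each stacked layer matches the 128 MB Infinity Cache already on Navi 2x:

```python
# Hypothetical stacked Infinity Cache; none of this is an announced product.
base_mb = 128                    # on-die Infinity Cache in Navi 21
layers = 3                       # speculative stacked layers
total_mb = base_mb + layers * base_mb
print(total_mb, "MB")            # -> 512 MB, "half a gig"
```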

But that assumes I am right. I do think they will eventually go to multichip designs. But AMD has been as much about lowering waste and maximizing wafer returns for a long while. Hell, all of Zen might have been an attempt to stay profitable while having to play nice with GloFo's WSA, which allowed for low yields and maybe low clocks. Variable cache layers, and the large impact they would have on big monolithic dies, give AMD another avenue for binning to reduce waste, especially on a video card. AMD is launching this on Zen 3 because it's already on the market, and with a single layer, any non-qualifying part can just be sold as a standard Zen 3 chip and the day is done. It gives a mild refresh while they prep Zen 4, and the process can be used as a pipe cleaner / process tester. But Ryzen isn't where the real money is. That is in Epyc. After that, the best margins and sales will be in the GPU market, with desktop/workstation/server all being high-margin chips.
 
  • Like
Reactions: Tlh97

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
Ehh, cache will grow by what the hierarchy demands and as the number of ways allows. Having more cache than you can search within the allotted clock divider will sooner or later bump up against the memory below it.

Just because you can put it there does not necessarily make it useful. At some point the CPUs above will have to double their cache line size and push the bottleneck down the tree.

Going from 32 MB to 96 MB is more than a normal cache-cycle increment. I wouldn't be surprised if there is some blending of the speed of the normal 32 MB and the width of the extra 64 MB (96 MB total) to give the illusion of uniformity.
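For the "number of ways" point, a rough sketch with 64-byte lines and Zen 3's 16-way L3 (how the 96 MB part actually arranges its sets and ways is an assumption here):

```python
# Capacity = sets * ways * line size; growing capacity 3x means growing
# either the ways (wider, slower compares) or the sets (more index bits).
LINE_BYTES = 64

def num_sets(capacity_mb: int, ways: int) -> int:
    return capacity_mb * 1024 * 1024 // (LINE_BYTES * ways)

print(num_sets(32, 16))   # -> 32768 sets in the base 32 MB, 16-way L3
print(num_sets(96, 48))   # -> 32768 sets if the ways triple to 48
print(num_sets(96, 16))   # -> 98304 sets if 16 ways are kept instead
```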
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Going from 32 MB to 96 MB is more than a normal cache-cycle increment. I wouldn't be surprised if there is some blending of the speed of the normal 32 MB and the width of the extra 64 MB (96 MB total) to give the illusion of uniformity.

They have pretty much revealed all the info, and then some, about how it works. They are adding more L3 cache slices and letting their usual partial-address-bit hash select the slice where a given cache line is placed (and which slice is queried when someone comes looking for an address with those bits).
Currently 8 cores compete for 8 slices, which limits cumulative bandwidth, and there are second-order effects like address conflicts, uneven loading of slices, and cache way limits and conflicts.
They are adding either 8x8 MB or 16x4 MB slices. As long as the chip is designed for it, average L3 latency does not have to increase much, since no extra processing is done to select slices beyond what is done already. It's hard to comment on the extra "physical" latency, but L3 caches are large and comparatively slow, so several extra cycles won't be noticeable.

"Blending" is already in effect for both Intel and AMD: final latency depends on what address a cache line has and how far away, on the ring, crossbar, mesh or whatever, the slice for those hash bits is.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
The obvious answer to this is Epyc. Plenty of datacenter DB and HPC setups see near-endless improvement from more cache and memory.

But you're not wrong. It also seems like a fatalistic take, though: that we shouldn't contemplate cache-size improvements and possibilities because there is some limit or diminishing-returns point out there.

This is where multilayer cache would come in handy, especially with semi-custom. You want 1 GB of L3? Sure, there you go, that's $10k. And even with diminishing returns, customers are often willing to pay big to push through them. It's why companies like Amazon are making their own CPUs: they can scale things like cache and core counts to their requirements.
 

gdansk

Platinum Member
Feb 8, 2011
2,107
2,605
136
Some of this cache talk reminds me of the late PA-RISC chips. All PA-8x00 cores were similar except for the cache arrangements. Later models had a massive 2.25 MB on-die L1 and 32 or 64 MB of external L2. Those were much slower than what AMD is doing, though, and I think the L2 was eDRAM of some sort, not SRAM. But it's interesting to see how much you can get out of a design just by changing the cache configuration.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Some of this cache talk reminds me of the late PA-RISC chips. All PA-8x00 cores were similar except for the cache arrangements. Later models had a massive 2.25 MB on-die L1 and 32 or 64 MB of external L2. Those were much slower than what AMD is doing, though, and I think the L2 was eDRAM of some sort, not SRAM. But it's interesting to see how much you can get out of a design just by changing the cache configuration.
I think I used one of those. It was a lot faster than the x86 and SPARC chips of the time, if I remember correctly. Given the die area taken up, you are buying more of a memory chip than a processing chip these days. It will be interesting to see the compile benchmarks. The Linux kernel compile is already down to 20 seconds on Epyc. We're going to need a bigger benchmark.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
The obvious answer to this is Epyc. Plenty of datacenter DB and HPC setups see near-endless improvement from more cache and memory.

But you're not wrong. It also seems like a fatalistic take, though: that we shouldn't contemplate cache-size improvements and possibilities because there is some limit or diminishing-returns point out there.

This is where multilayer cache would come in handy, especially with semi-custom. You want 1 GB of L3? Sure, there you go, that's $10k. And even with diminishing returns, customers are often willing to pay big to push through them. It's why companies like Amazon are making their own CPUs: they can scale things like cache and core counts to their requirements.
Considering per-core licensing fees and such, $10k might be the lower-end model. Some applications can get a massive boost from this much cache, so the prices might be reasonable in that case. I am curious how much power that much SRAM will draw, though.

LTT would probably get one and play video games on it, which would be amusing. They previously played a game on a 64-core Epyc using a software renderer, no GPU. If we get a Zen 4 Epyc with massively improved FP and massive caches, it might actually be playable.
 
  • Like
Reactions: Tlh97 and Topweasel

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Considering per-core licensing fees and such, $10k might be the lower-end model. Some applications can get a massive boost from this much cache, so the prices might be reasonable in that case. I am curious how much power that much SRAM will draw, though.

LTT would probably get one and play video games on it, which would be amusing. They previously played a game on a 64-core Epyc using a software renderer, no GPU. If we get a Zen 4 Epyc with massively improved FP and massive caches, it might actually be playable.

Possibly, but they have to offer insane value for companies to make the transition from Intel. Some of that is performance and features; some of it is cost. So while I am sure they could charge more for these than either of us would guess, it would be smart for AMD to still try to be price competitive.
 

Doug S

Platinum Member
Feb 8, 2020
2,263
3,515
136
Some of this cache talk reminds me of the late PA-RISC chips. All PA-8x00 cores were similar except for the cache arrangements. Later models had a massive 2.25 MB on-die L1 and 32 or 64 MB of external L2. Those were much slower than what AMD is doing, though, and I think the L2 was eDRAM of some sort, not SRAM. But it's interesting to see how much you can get out of a design just by changing the cache configuration.

PA-RISC was always doing something different from everyone else with caches. Back when 8K to 16K was state of the art for on-chip L1, HP was doing wave-pipelined off-chip L1 caches of 256K to several MB. That was feasible since those CPUs' clock rates were in the 50-to-low-hundreds-of-MHz range. They continued with huge off-chip L1s until, I think, the PA-8500 finally allowed them to integrate the L1 on chip.

Given that PA-RISC's market was servers running massive Oracle databases and workstations running eCAD and mCAD software that cost more to license than the very expensive hardware it ran on, that sort of thing made sense.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Possibly, but they have to offer insane value for companies to make the transition from Intel. Some of that is performance and features; some of it is cost. So while I am sure they could charge more for these than either of us would guess, it would be smart for AMD to still try to be price competitive.
They already offer CPUs with 32 MB of L3 per core, 256 MB total. That is the 72F3, with 8 cores at a 3.7 GHz base clock and 4.1 GHz boost for maximum per-core performance. These are ~$2500 list, but if your software is licensed per core, that is probably a good deal. Intel has 38.5 MB on one chip, but that is a 28-core CPU, so only 1.375 MB per core. I don't know what their current maximum cache-per-core product is.
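The per-core arithmetic, with a hypothetical license fee to show why the 8-core part can be the better deal (the Intel part here is the 28-core, 38.5 MB Xeon; the $5000/core fee is made up for illustration):

```python
# L3 per core for the parts discussed above.
parts = {
    "Epyc 72F3": {"cores": 8, "l3_mb": 256.0},
    "28-core Xeon": {"cores": 28, "l3_mb": 38.5},
}
for name, p in parts.items():
    print(f"{name}: {p['l3_mb'] / p['cores']:.3f} MB L3 per core")

# With per-core licensing, core count dominates total cost of ownership.
license_per_core = 5000   # hypothetical $/core/year, for illustration only
for name, p in parts.items():
    print(f"{name}: ${p['cores'] * license_per_core:,} per year in licenses")
```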

AMD with Rome kind of already offers insane value before we even get into stacked caches; it is up to 256 MB on package already. If Milan-X actually goes up to 4 layers of cache die, then it will absolutely destroy everything else for certain applications. They will be able to sell those at really high prices. Milan-X with even 1 layer of cache die would probably dominate many benchmarks, even more than they already do. Intel hasn't had comparable products for a while.

If they can pull off gigabyte(s) of SRAM in package, that will massively accelerate some HPC applications, high-end database servers, and probably ray-tracing applications that still run on the CPU. The CPU-based render workstation or server may be quickly obsoleted by GPUs, if it isn't already. We are going to get chiplet-based GPUs, possibly with massive amounts of on-die SRAM and DRAM. I am also looking forward to compile benchmarks on the CPU.

I am a little suspicious of the X3D settings in the BIOS, though. If you can enable or disable the cache die, that might mean there is a trade-off somewhere, perhaps higher latency as more cache dies are enabled. That would mean some applications may perform worse with the larger cache due to higher latency and little benefit from the higher hit rate and/or bandwidth.
 
  • Like
Reactions: Tlh97 and moinmoin

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
I'm sure there is at least some trade-off with respect to total package power and peak clocks due to thermals. While the L3 dies aren't big power hogs, they are also certainly not free.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
They already offer CPUs with 32 MB of L3 per core, 256 MB total. That is the 72F3, with 8 cores at a 3.7 GHz base clock and 4.1 GHz boost for maximum per-core performance. These are ~$2500 list, but if your software is licensed per core, that is probably a good deal. Intel has 38.5 MB on one chip, but that is a 28-core CPU, so only 1.375 MB per core. I don't know what their current maximum cache-per-core product is.

AMD with Rome kind of already offers insane value before we even get into stacked caches; it is up to 256 MB on package already. If Milan-X actually goes up to 4 layers of cache die, then it will absolutely destroy everything else for certain applications. They will be able to sell those at really high prices. Milan-X with even 1 layer of cache die would probably dominate many benchmarks, even more than they already do. Intel hasn't had comparable products for a while.

If they can pull off gigabyte(s) of SRAM in package, that will massively accelerate some HPC applications, high-end database servers, and probably ray-tracing applications that still run on the CPU. The CPU-based render workstation or server may be quickly obsoleted by GPUs, if it isn't already. We are going to get chiplet-based GPUs, possibly with massive amounts of on-die SRAM and DRAM. I am also looking forward to compile benchmarks on the CPU.

I am a little suspicious of the X3D settings in the BIOS, though. If you can enable or disable the cache die, that might mean there is a trade-off somewhere, perhaps higher latency as more cache dies are enabled. That would mean some applications may perform worse with the larger cache due to higher latency and little benefit from the higher hit rate and/or bandwidth.

I suspect it is for compatibility. Some applications may not particularly like large caches.

Also, being able to easily turn it off makes it easier to debug and benchmark.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
I am a little suspicious of the X3D settings in the BIOS, though. If you can enable or disable the cache die, that might mean there is a trade-off somewhere, perhaps higher latency as more cache dies are enabled. That would mean some applications may perform worse with the larger cache due to higher latency and little benefit from the higher hit rate and/or bandwidth.

Makes for a good review, doesn't it? If the reviewer can turn the new feature on and off and easily show a big improvement, that helps a lot with marketing.
 
  • Like
Reactions: Joe NYC

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
They already offer CPUs with 32 MB of L3 per core, 256 MB total. That is the 72F3, with 8 cores at a 3.7 GHz base clock and 4.1 GHz boost for maximum per-core performance. These are ~$2500 list, but if your software is licensed per core, that is probably a good deal. Intel has 38.5 MB on one chip, but that is a 28-core CPU, so only 1.375 MB per core. I don't know what their current maximum cache-per-core product is.

AMD with Rome kind of already offers insane value before we even get into stacked caches; it is up to 256 MB on package already. If Milan-X actually goes up to 4 layers of cache die, then it will absolutely destroy everything else for certain applications. They will be able to sell those at really high prices. Milan-X with even 1 layer of cache die would probably dominate many benchmarks, even more than they already do. Intel hasn't had comparable products for a while.

If they can pull off gigabyte(s) of SRAM in package, that will massively accelerate some HPC applications, high-end database servers, and probably ray-tracing applications that still run on the CPU. The CPU-based render workstation or server may be quickly obsoleted by GPUs, if it isn't already. We are going to get chiplet-based GPUs, possibly with massive amounts of on-die SRAM and DRAM. I am also looking forward to compile benchmarks on the CPU.

I am a little suspicious of the X3D settings in the BIOS, though. If you can enable or disable the cache die, that might mean there is a trade-off somewhere, perhaps higher latency as more cache dies are enabled. That would mean some applications may perform worse with the larger cache due to higher latency and little benefit from the higher hit rate and/or bandwidth.
My point was that they might still want to be the value option here for market penetration. I get what Rome and Milan brought compared to the competition, and even the worst-case versions (super high clocks, requiring water cooling) still don't quite reach Intel's high end in cost. Intel is also a moving target: their future packaging moves will allow them to toss on SRAM modules almost on demand. They will have packaging limits AMD won't have, but AMD also has to make sure it uses enough layers to keep up, since it will be harder for them to raise the layer count on a reasonable timetable. And considering the sheer mass of requirements it takes to get datacenter customers to switch, I could see AMD still needing to keep value up across cost, features, and performance, at least for DC.
 

Ajay

Lifer
Jan 8, 2001
15,454
7,862
136
Had the same feeling back when I got my Ryzen 3600. Back in 1999, 32 MB of RAM was pretty common for mid-range systems. 20 years later, that's the CPU's L3 cache. Now that's progress. :D

Got 32 GB of RAM to go along with it. The symmetry is beautiful, isn't it?
32 MB?! In '98 I was running WinNT 4.0 with 256 MB of RAM. I think I doubled that (on different machines/OSes) every year for ~4 years. I think I was spending more on RAM than on my CPU and motherboard.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,691
136
32 MB?! In '98 I was running WinNT 4.0 with 256 MB of RAM. I think I doubled that (on different machines/OSes) every year for ~4 years.

You were lucky. Not all of us could afford that at the time. I only got to 96 MB in my personal machine in '99, and later 160 MB in '00 (64+64+32 MB PC100 SDRAM). Had an MVP3-G2 board in it. Never did manage to get a K6-III for myself before I got an Athlon (600 MHz, I think).

I think I was spending more on RAM than on my CPU and motherboard.

Now, that I can believe. Memory was expensive.
 
  • Like
Reactions: lobz

Joe NYC

Golden Member
Jun 26, 2021
1,948
2,289
106
Isn't there already a Zen 3 BIOS showing settings for up to 4 stacked cache (X3D) dies? 8 chiplets * (32 MB + (64 MB * 4)) = 2304 MB. Over 2 GB seems possible, but it may pull too much power for the general case. It might be limited to specialized enterprise applications (some types of HPC, high-end database servers, etc.).

It could also be a way of recreating the 8-chiplet 72F3 (8 cores, 256 MB L3) with 1 chiplet (8 cores, 288 MB L3), with a slightly different power and performance profile.

If one layer of extra L3 is $6 in die cost and another $6 in assembly, AMD could sell these at $50 to $100 per layer with great margins.
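The margin math under those assumed costs:

```python
# Gross margin per stacked layer, using the $6 + $6 cost assumption above.
die_cost, assembly_cost = 6, 6        # $ per layer, assumed
for price in (50, 100):               # $ charged per layer
    margin = (price - die_cost - assembly_cost) / price
    print(f"${price}/layer -> {margin:.0%} gross margin")
# -> 76% at $50/layer, 88% at $100/layer
```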

It would be the Intel way of thinking, back when Intel had the performance crown, to limit accessible technology to the high priests of some ivory tower; Intel would hold back the technology and play various marketing and segmentation games.

AMD does not owe anything to anybody and does not need to hold back.

As far as power, there are several dimensions to that question:
- If the SRAM is busy serving data, it will use power, but at a fraction of what it would take to send the request to, and receive the response from, main RAM.
- If the cores are kept fed with data faster (from L3 rather than RAM), they will use more power, but they will also do more work.
- The idle power of the extra L3 should be quite low.

So I doubt a lot of power is going to be wasted.
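A back-of-the-envelope for the first point; the energy figures are order-of-magnitude ballparks for modern processes, not AMD measurements:

```python
# Energy saved when a memory access hits in L3 instead of going to DRAM.
PJ_PER_LINE_L3 = 100      # assumed ~100 pJ to serve a 64 B line from L3
PJ_PER_LINE_DRAM = 2000   # assumed ~2 nJ to fetch the same line from DRAM

converted_misses = 1_000_000_000   # misses turned into hits by extra L3
saved_j = converted_misses * (PJ_PER_LINE_DRAM - PJ_PER_LINE_L3) * 1e-12
print(f"~{saved_j:.1f} J saved per billion converted misses")
# The extra SRAM adds some static power, but each avoided DRAM trip is
# roughly an order of magnitude cheaper in energy.
```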

edit: I guess that could be a fake. I don't actually know the origin of that BIOS settings image, but a lot of people seem to accept it.

That was actually the BIOS of an AMD Milan test platform called Daytona, which was provided to reviewers during the Milan launch.
 
  • Like
Reactions: Vattila and bsp2020