Info 64MB V-Cache on 5XXX Zen3 Average +15% in Games


Kedas

Senior member
Dec 6, 2018
355
339
136
Well, now we know how they will bridge the long wait until Zen4 on AM5 in Q4 2022.
Production of V-cache starts at the end of this year, which is too early for Zen4, so this is certainly coming to AM4.
Lisa said the +15% is "like an entire architectural generation".
 

Hitman928

Diamond Member
Apr 15, 2012
6,013
10,319
136
Regarding the building analogy, do we know how the data travels to this additional "floor"? Is it accessible at a single place, like a floor would be via a stairway/elevator shaft, or is it connected at many points all over the surface of the chip?

There will be many TSVs for the routing between dies.
 

maddie

Diamond Member
Jul 18, 2010
4,871
4,934
136

jamescox

Senior member
Nov 11, 2009
644
1,105
136

This article indicates that the spacing between TSVs is 9 microns (9000 nm):


That is from last year though; these look like they may be closer, but it is hard to tell. The article talks about it going down to 0.9 microns, which is still 900 nm. It is a lot closer to being on die compared to micro-solder bump solutions (50 micron pitch), but it still is not the same as on die. It will be interesting to find out how this is logically arranged. The access latency is still very low even on 96 MB.
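To put those pitches side by side, here is a rough sketch of how connection density scales with pitch. It assumes a uniform square grid, which is my simplification and not something the article states; the function name is made up for illustration.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Connections per square millimetre, assuming a uniform square grid
    with the given centre-to-centre pitch in microns (an assumption)."""
    per_mm = 1000.0 / pitch_um  # connections along one millimetre
    return per_mm ** 2


# Pitches quoted in the post above: 50 um micro-bumps, 9 um and 0.9 um TSVs.
for label, pitch_um in [("micro-bump", 50.0), ("TSV, article", 9.0), ("TSV, future", 0.9)]:
    print(f"{label}: {pitch_um} um pitch -> {connections_per_mm2(pitch_um):,.0f} per mm^2")
```

Under that assumption density goes with the inverse square of the pitch, so 9 microns versus 50 microns works out to roughly 30 times as many connections in the same area.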
 

DrMrLordX

Lifer
Apr 27, 2000
21,983
11,518
136
Anyone else getting a Pentium-Pro vibe?

Not entirely. I know what you're getting at - PPros had those on-chip 256k L2s in an era when, up to that point, L2 had been soldered onto the motherboard and ran at the bus speed rather than the CPU speed. Pentium Pro represented lots of other changes to the x86 world, however, that I don't see happening with v-cache-equipped Zen3. And honestly I don't expect the massive L3 of these v-cache CPUs to really outperform existing L3 in terms of bandwidth or latency. Socket 5 and Socket 7 L2 ran at 66 MHz in the days of Pentium Pro (100 MHz FSB Super 7 didn't come until later), whereas Pentium Pro L2 ran as high as 200 MHz, or higher if you were overclocking them (which some people did). L2 latency was hugely improved by PPro. I don't have numbers in front of me since that hardware is ancient by modern standards, but still. That on-board L2 of socket 5/socket 7 was slowwwww.
 

Gideon

Golden Member
Nov 27, 2007
1,765
4,108
136
And honestly I don't expect the massive L3 of these v-cache CPUs to really outperform existing L3 in terms of bandwidth or latency.
It's true that latency will, if anything, be slightly worse, but AMD promised 2 TB/s bandwidth. That has to be a considerable improvement (source):

https://www.guru3d.com/articles-pages/amd-ryzen-9-5900x-and-5950x-review,21.html
 

cytg111

Lifer
Mar 17, 2008
23,916
13,403
136
Not entirely. I know what you're getting at - PPros had those on-chip 256k L2s in an era when, up to that point, L2 had been soldered onto the motherboard and ran at the bus speed rather than the CPU speed. Pentium Pro represented lots of other changes to the x86 world, however, that I don't see happening with v-cache-equipped Zen3. And honestly I don't expect the massive L3 of these v-cache CPUs to really outperform existing L3 in terms of bandwidth or latency. Socket 5 and Socket 7 L2 ran at 66 MHz in the days of Pentium Pro (100 MHz FSB Super 7 didn't come until later), whereas Pentium Pro L2 ran as high as 200 MHz, or higher if you were overclocking them (which some people did). L2 latency was hugely improved by PPro. I don't have numbers in front of me since that hardware is ancient by modern standards, but still. That on-board L2 of socket 5/socket 7 was slowwwww.

Yea ok :). I was merely hovering around "same cpu core and different cache configs" .. and, IIRC, ppro had a beefed-up FPU on top of that as well.
 

DrMrLordX

Lifer
Apr 27, 2000
21,983
11,518
136
It's true that latency will, if anything, be slightly worse, but AMD promised 2 TB/s bandwidth. That has to be a considerable improvement (source):

Okay, fair point. Not sure how that additional bandwidth will affect things, but it should be interesting, especially if the branch prediction is good enough.

Yea ok :). I was merely hovering around "same cpu core and different cache configs" .. and, IIRC, ppro had a beefed-up FPU on top of that as well.

It did, it did. It was pretty revolutionary for its time. In some ways, the Klamath PIIs were a step backwards.
 

DrMrLordX

Lifer
Apr 27, 2000
21,983
11,518
136
Until it became L3 thanks to K6-III.

Since we are going down history lane, you have to remember that the Super 7 boards were running a 100 MHz FSB minimum, which was about 50% faster than the original socket 5/socket 7 boards with on-board L2 cache running on a 66 MHz FSB. Latency and bandwidth both improved considerably. You'll also have to remember that older socket 7 boards had limitations on cacheable area (which were not well-documented at the time, but did exist). Those had been engineered out of Super 7 by the time of K6-III.

Just adding L3 to AMD's Super 7 lineup was a big deal in and of itself, but . . . can you imagine what v-cache would be like, if AMD halved the latency AND increased the size?
 

lightmanek

Senior member
Feb 19, 2017
406
856
136
Had an Epox MVP3-G5 with 2MB(!) L2 for work. Coupled with a K6-3 450MHz that thing was a complete beast.

It had a massive, for the time, 192MB RAM.

Yep, I too had 2MB of on-board L2 or, with my K6-III 400MHz, L3 cache.
Funny that your total RAM at the time will soon be equal to the L3 cache on a consumer desktop CPU :D

The first test I will run on that thing will be the Quake 2 and Quake 3 timedemos :D
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
It's true that latency will, if anything, be slightly worse, but AMD promised 2 TB/s bandwidth. That has to be a considerable improvement (source):

Bandwidth on current Zen 3's L3 cache is no worse when taking into account capacity.

The earlier versions of AIDA64 had problems measuring it properly. With proper testing Zen 3's L3 will measure 1TB/s.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,692
136
Funny that your total RAM at the time will soon be equal to the L3 cache on a consumer desktop CPU :D

Had the same feeling back when I got my Ryzen 3600. Back in 1999 32MB RAM was pretty common for mid-range systems. 20 years later, and that's the CPU's L3 cache. Now that's progress. :D

Got 32GB RAM to go along with it. The symmetry is beautiful, isn't it?
 

zir_blazer

Golden Member
Jun 6, 2013
1,191
483
136
With L3 cache sizes snowballing, I'm actually rather disappointed that AMD removed CAR (Cache-as-RAM) support, as the Coreboot documentation for Zen Picasso notes. CAR was a rather minor and mostly unknown feature, used pretty much exclusively by firmware during hardware initialization: it could set up the cache in CAR mode so that there was some usable memory other than the processor's GPRs (General Purpose Registers) before the DRAM controller and the system RAM behind it were fully operational. The L3 cache isn't a lot, yet I always found interesting the idea of someone being able to get MS-DOS working without any memory modules installed. With snowballing cache sizes, it should be possible to run a small video framebuffer for APUs, too. I always thought it could be very useful to run a system like that: the firmware could run some diagnostic tools from a self-contained processor without requiring installed memory, and you could pretty much consider it a fully operational computer for as long as you didn't go beyond the limited cache memory boundaries.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
So if it's 36 mm^2 X2 for two stacks of additional cache to get 12% more performance in gaming only, it seems very inefficient. The additional 72 mm^2 of silicon should be costly, and since Zen 3 is 81 mm^2, 88.8% more silicon should be giving 37% more performance by the square root rule of thumb.
And we should call that The Hougy Coefficient from now on.
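For anyone who wants to check the arithmetic in the quoted post, here is a quick sketch. The square root rule of thumb (often referred to as Pollack's rule) says performance grows roughly with the square root of the silicon spent; the function name is made up, and the die areas are simply the figures from the quote, not verified numbers.

```python
import math


def expected_speedup(base_area_mm2: float, added_area_mm2: float) -> float:
    """Speedup predicted by the square root rule of thumb:
    performance scales with sqrt(total area / base area)."""
    area_ratio = (base_area_mm2 + added_area_mm2) / base_area_mm2
    return math.sqrt(area_ratio)


base = 81.0        # Zen 3 die area quoted in the post, in mm^2
added = 2 * 36.0   # two 36 mm^2 cache stacks, as assumed in the post

print(f"extra silicon: {added / base:.1%}")
print(f"rule-of-thumb speedup: {expected_speedup(base, added):.3f}x")
```

With those inputs the extra silicon is about 88.9% and the rule of thumb gives roughly 1.37x, which matches the 37% in the quote. Whether the rule should apply to cache silicon at all is a separate question.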
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
With L3 cache sizes snowballing, I'm actually rather disappointed that AMD removed CAR (Cache-as-RAM) support, as the Coreboot documentation for Zen Picasso notes. CAR was a rather minor and mostly unknown feature, used pretty much exclusively by firmware during hardware initialization: it could set up the cache in CAR mode so that there was some usable memory other than the processor's GPRs (General Purpose Registers) before the DRAM controller and the system RAM behind it were fully operational. The L3 cache isn't a lot, yet I always found interesting the idea of someone being able to get MS-DOS working without any memory modules installed. With snowballing cache sizes, it should be possible to run a small video framebuffer for APUs, too. I always thought it could be very useful to run a system like that: the firmware could run some diagnostic tools from a self-contained processor without requiring installed memory, and you could pretty much consider it a fully operational computer for as long as you didn't go beyond the limited cache memory boundaries.
Yeah, I remember when that news broke originally. Maybe AMD can make it work again, but my assumption is that AMD removed public support for this because it already uses CAR privately for running AGESA/PSP and handling SCF with the extensive internal firmware that's likely running there.
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
As someone whose first computer was a maxed out Atari 800 with a glorious 48K of RAM, I have to take issue with the claim that an L3 of tens of megabytes "isn't a lot" :laughing:
Even if Zen4 only increases EPYC4/TR5 core count to 96 we will actually see 1GB+ of SRAM cache in an x86 socket for the first time?
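A back-of-the-envelope check of that 1GB+ figure, as a sketch only. It assumes Zen4 keeps the Zen 3 layout of 8 cores and 32 MB of base L3 per chiplet, and that every chiplet gets the 64 MB V-cache stack from the thread title; none of that is confirmed for Zen4, and the names below are made up.

```python
CORES_PER_CCD = 8    # assumption: same chiplet size as Zen 3
BASE_L3_MB = 32      # assumption: same per-chiplet L3 as Zen 3
VCACHE_MB = 64       # stacked cache per chiplet, from the thread title


def total_l3_mb(cores: int, stacked: bool = True) -> int:
    """Total L3 in MB for a socket with the given core count,
    under the assumptions listed above."""
    ccds = cores // CORES_PER_CCD
    per_ccd = BASE_L3_MB + (VCACHE_MB if stacked else 0)
    return ccds * per_ccd


for cores in (64, 96):
    print(f"{cores} cores: {total_l3_mb(cores)} MB L3 with V-cache, "
          f"{total_l3_mb(cores, stacked=False)} MB without")
```

Under those assumptions 96 cores with V-cache lands at 1152 MB of L3 alone, before counting any L2, so the 1GB+ claim holds if every chiplet is stacked.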