Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

MALL only makes sense for GPUs.
Broadwell, Crystal Well, etc. showed that it can matter for certain applications, especially those with limited data sets that do repetitive computations, like game engines.

But I think MALLs could matter more if a hardware/software profiling solution were developed, so that data which is still needed does not keep getting evicted and reloaded over and over just because there are short periods where it isn't touched.
 

LightningZ71

Platinum Member
Yeah but we need 512MB of that. At least I won't be satisfied till that happens
In general, the miss rate on a last-level cache roughly halves as the size of the cache quadruples. For example, if your hit rate on a 512 KB cache was 90%, your miss rate would be 10%. Doubling that cache twice, to 2 MB, would improve the miss rate to 5% and the hit rate to 95%. It makes a noticeable difference mainly in programs whose hot working set now fits in the expanded cache but spilled before. These are very general numbers for x86 code, as the effect is still HIGHLY dependent on the hot working set size of each program.
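That quadrupling rule (miss rate roughly proportional to one over the square root of cache size) can be sketched numerically. This is only a rule-of-thumb model, and the 512 KB / 90%-hit baseline is just the figure from the example above:

```python
import math

def scaled_miss_rate(base_size_kb: float, base_miss_rate: float,
                     new_size_kb: float) -> float:
    """Rule of thumb: miss rate halves each time cache size quadruples,
    i.e. miss_rate ~ base_miss_rate * sqrt(base_size / new_size)."""
    return base_miss_rate * math.sqrt(base_size_kb / new_size_kb)

# Baseline from the example: 512 KB cache, 10% miss rate (90% hit rate).
for size_kb in (512, 1024, 2048, 8192):
    mr = scaled_miss_rate(512, 0.10, size_kb)
    print(f"{size_kb:>5} KB -> miss rate {mr:.1%}")
# 2048 KB (two doublings) lands at exactly 5.0%, matching the text.
```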

Be aware that, for every doubling of cache size, you either introduce additional access latency (which the program also sees on every memory operation that results from a miss), or you make the design of the cache more complex, taking up more area and adding product cost. Eventually you just aren't making any useful impact on working-set latencies and have to resort to LOTS of speculative prefetches from main memory to preload the cache with data you think the program will need next. That burns a lot of energy on memory accesses that are often unneeded.
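The trade-off can be made concrete with a standard average-memory-access-time (AMAT) calculation. The cycle counts below are illustrative assumptions, not measurements of any real part:

```python
def amat(hit_latency_cycles: float, miss_rate: float,
         miss_penalty_cycles: float) -> float:
    """Average memory access time: the hit cost every access pays,
    plus the miss penalty weighted by how often you actually miss."""
    return hit_latency_cycles + miss_rate * miss_penalty_cycles

# Hypothetical L3 configurations: the bigger cache halves the miss rate
# but costs 10 extra cycles per access, so the net win shrinks.
small = amat(hit_latency_cycles=40, miss_rate=0.10, miss_penalty_cycles=300)
large = amat(hit_latency_cycles=50, miss_rate=0.05, miss_penalty_cycles=300)
print(small, large)  # 70.0 65.0 -- a modest gain despite 4x the capacity
```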

I think that AMD is currently happy with their L3 cache ratio and may look to maintain that ratio into larger CCXs with respect to VCache packages.
 
Eventually, you just aren't making any useful impact in working set latencies and will have to resort to LOTS of predictive extra data loads from main memory to attempt to preload the cache with data that you think that the program will need next. This burns up a lot of energy making memory calls that are often unneeded.
This should be exposed as a BIOS option so users can make that call themselves. I personally have no issue burning a few extra watts for maximum performance.
 

LightningZ71

Platinum Member
I vaguely remember from long ago that there were processors that had bios settings where you could turn cache prefetch on and off. It's been a minute, I've slept since then, and there may have been an alcohol or two in my system along the way, so that's about all I have at the moment.
 

MS_AT

Senior member
I vaguely remember from long ago that there were processors that had bios settings where you could turn cache prefetch on and off. It's been a minute, I've slept since then, and there may have been an alcohol or two in my system along the way, so that's about all I have at the moment.
It should be available on AM5. Usually the option can be found in an AMD-specific menu, but your mileage may vary depending on the motherboard manufacturer.