Question Is using "Infinity Cache" for Navi 2X a good move?


TheRookie

Junior Member
Aug 26, 2019
16
2
16
AMD's previous flirtations with exotic memory were a disaster thanks to high prices and limited availability (think HBM2 in Vega and HBM in Fiji).

With Navi 2X using plain old GDDR6, AMD can source it from anywhere, inexpensively.

NVIDIA is stuck getting relatively expensive GDDR6X memory from Micron.

On the other hand, Infinity Cache has likely ballooned the size of the die, and TSMC's 7nm is capacity constrained.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
I am initially skeptical of why a GPU would need a very large cache, as the parallelism of GPUs hides memory latency. The die area used for it could have gone to more cores instead.

The cache is the first step toward chiplets. A chiplet-based GPU simply needs an on-chiplet cache, because every memory access goes through the IO die, meaning higher latency and higher power use. It's the same thing AMD did with Zen 2 and its large L3.

On top of that, cache scales very well with process node (in contrast to bus size), and it at least partially, if not fully, solves the iGPU bandwidth problem. Of course there are trade-offs, and looking just at N21 I can understand that it might not be worth it, but in the bigger picture there is probably no way around it.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
What I'm interested to see is whether it might backfire on FPS consistency. Average FPS numbers don't tell the whole story. In any case, an interesting month ahead.

Yea, two of the major releases fell slightly short of the hyped expectations: Xe LP graphics in Tiger Lake, and Ampere. Both claimed 2x, but both were "up to" figures, and sometimes you need to add in architecture-specific optimizations like VRS or ray tracing to get there.

Nothing other than multiple 3rd party reviews and many games tested across various test settings will show the truth.

The marketers are really, really good at hyping things up nowadays. What happened to under-promise, over-deliver?
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,855
136
The cache is a cost and power saving feature. It allows AMD to accomplish 3 things:
  • reduce BOM relative to the competition, which could lead to more aggressive pricing next year as supply improves
  • reduce footprint and power in mobile solutions; this is where RDNA2 will shine, and AMD will have a winning combo in 2021
  • maximize availability, as they're using less exotic memory than the competition (they essentially switched places since the Vega w/ HBM era)
There must be some (performance) downsides too, and I'm sure Nvidia will be hard at work pointing them out to the press. Just hit F5 on review day.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,226
5,228
136
The cache is a cost and power saving feature. It allows AMD to accomplish 3 things:
  • reduce BOM relative to the competition, which could lead to more aggressive pricing next year as supply improves
  • reduce footprint and power in mobile solutions; this is where RDNA2 will shine, and AMD will have a winning combo in 2021
  • maximize availability, as they're using less exotic memory than the competition (they essentially switched places since the Vega w/ HBM era)
There must be some (performance) downsides too, and I'm sure Nvidia will be hard at work pointing them out to the press. Just hit F5 on review day.

I am surprised at how well the cache works. If it works that well, why wasn't this done before?

Assuming it holds up like AMD shows, this is the breakthrough feature for Big Navi. As you say, it enables the use of a lower-power, lower-cost memory system. It's win-win.

IMO, AMD catching NVidia is more impressive than AMD being ahead of Intel. Intel has been asleep at the wheel, endlessly fumbling and failing, but NVidia's execution has been consistently moving forward; for AMD to catch right up, and even gain a serious new feature advantage (this cache), is fantastic work.

In the 14 years since AMD acquired ATI, they have never been anywhere near this close to the top of both CPU and GPU...
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
I am surprised at how well the cache works. If it works that well, why wasn't this done before?

Assuming it holds up like AMD shows, this is the breakthrough feature for Big Navi. As you say, it enables the use of a lower-power, lower-cost memory system. It's win-win.

IMO, AMD catching NVidia is more impressive than AMD being ahead of Intel. Intel has been asleep at the wheel, endlessly fumbling and failing, but NVidia's execution has been consistently moving forward; for AMD to catch right up, and even gain a serious new feature advantage (this cache), is fantastic work.

In the 14 years since AMD acquired ATI, they have never been anywhere near this close to the top of both CPU and GPU...
Said about almost every breakthrough ever.
 

majord

Senior member
Jul 26, 2015
433
523
136
I am surprised at how well the cache works. If it works that well, why wasn't this done before?

Assuming it holds up like AMD shows, this is the breakthrough feature for Big Navi. As you say, it enables the use of a lower-power, lower-cost memory system. It's win-win.

IMO, AMD catching NVidia is more impressive than AMD being ahead of Intel. Intel has been asleep at the wheel, endlessly fumbling and failing, but NVidia's execution has been consistently moving forward; for AMD to catch right up, and even gain a serious new feature advantage (this cache), is fantastic work.

In the 14 years since AMD acquired ATI, they have never been anywhere near this close to the top of both CPU and GPU...

Certainly not both together, no. Though they held a significant GPU advantage while waiting for Fermi. CPUs weren't exactly terrible during that time either.
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
With previous nodes, 128 MB of cache would have taken up more space and driven up the cost of the die more than it does now. Perhaps they could have used a smaller cache, and I don't know if 128 MB is the magic number, but there's likely a lower bound below which the cache is too small to be effective.

To some degree the memory controllers don't shrink as well as other circuitry, because they're a physical interface that needs to match up with an actual memory bus. It's part of the reason AMD could fab the IO die for Zen at GF on an old node: moving it to 7nm wouldn't decrease its size much.

Eventually the shrinking cost of the cache and the barely shrinking memory controllers intersect, and what may not have been a feasible approach in the past becomes a viable option in the present.
 

DJinPrime

Member
Sep 9, 2020
87
89
51
AMD gave an Infinity Cache hit rate measurement in one of their footnotes.



Thanks Hitman928!

According to the note:
Measurement calculated by AMD engineering, on a Radeon RX 6000 series card with 128 MB AMD Infinity Cache and 256-bit GDDR6. Measuring 4k gaming average AMD Infinity Cache hit rates of 58% across top gaming titles, multiplied by theoretical peak bandwidth from the 16 64B AMD Infinity Fabric channels connecting the Cache to the Graphics Engine at boost frequency of up to 1.94 GHz. RX-547

A 58% hit rate means that 42% of the time it has to go to GDDR6. I don't know enough about graphics processing to know what impact that will have: ~60% super fast, then ~40% slow. I also assume AMD wouldn't use it if the negatives outweighed the positives. Interesting stuff.
 

Hitman928

Diamond Member
Apr 15, 2012
5,244
7,792
136
Thanks Hitman928!

According to the note:
Measurement calculated by AMD engineering, on a Radeon RX 6000 series card with 128 MB AMD Infinity Cache and 256-bit GDDR6. Measuring 4k gaming average AMD Infinity Cache hit rates of 58% across top gaming titles, multiplied by theoretical peak bandwidth from the 16 64B AMD Infinity Fabric channels connecting the Cache to the Graphics Engine at boost frequency of up to 1.94 GHz. RX-547

A 58% hit rate means that 42% of the time it has to go to GDDR6. I don't know enough about graphics processing to know what impact that will have: ~60% super fast, then ~40% slow. I also assume AMD wouldn't use it if the negatives outweighed the positives. Interesting stuff.

I'm no GPU (or even digital design in general) expert, but as AMD points out, that makes their average bandwidth much higher than going with just GDDR6 memory, even at wider interfaces. I would imagine that as long as you aren't getting many cache misses in a row, you should be fine, and with the average cache hit rate being almost 60%, I don't think it will be a problem. But I'm looking forward to more deep dives and benchmark numbers.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,327
10,034
126
I'm no GPU (or even digital design in general) expert, but as AMD points out, that makes their average bandwidth much higher than going with just GDDR6 memory, even at wider interfaces. I would imagine that as long as you aren't getting many cache misses in a row, you should be fine, and with the average cache hit rate being almost 60%, I don't think it will be a problem. But I'm looking forward to more deep dives and benchmark numbers.
It should be VERY interesting to see what the frametime consistency is in some popular games (at 4K?) with this sort of Infinity Cache setup. Surely, if it's not "perfect", NVidia will use that as a marketing point for their cards. But as far as I'm concerned (pending multiple 3rd-party benchmarks), AMD seems to be the winner of the GPU crown this generation.

Also curious about "Smart Access Memory" and "Rage Mode", too.

But it's insightful that AMD chose to go with "more raster horsepower", effectively, rather than doubling down on RT like NVidia did. Though I do look forward to possibly playing Cyberpunk 2077 with some form of RT... if I can remember my Steam password. :p
 

Hitman928

Diamond Member
Apr 15, 2012
5,244
7,792
136
It should be VERY interesting to see what the frametime consistency is in some popular games (at 4K?) with this sort of Infinity Cache setup. Surely, if it's not "perfect", NVidia will use that as a marketing point for their cards. But as far as I'm concerned (pending multiple 3rd-party benchmarks), AMD seems to be the winner of the GPU crown this generation.

Also curious about "Smart Access Memory" and "Rage Mode", too.

But it's insightful that AMD chose to go with "more raster horsepower", effectively, rather than doubling down on RT like NVidia did. Though I do look forward to possibly playing Cyberpunk 2077 with some form of RT... if I can remember my Steam password. :p

I don't think frame time consistency will be an issue, unless you hit a spot where there's a ton of misses forcing a large amount of data to come from RAM. With this system, even when you have to go to VRAM, it's not like it suddenly becomes slow; it's the same-speed VRAM it would have been otherwise. What you've done is reduce the traffic flowing between the GPU and VRAM, so you don't need nearly as wide a bus. The traffic that does flow still flows at the same rate as without the cache, though, so if that speed was going to be an issue, it would be an issue with or without the cache.
 

naukkis

Senior member
Jun 5, 2002
705
576
136
I am initially skeptical of why a GPU would need a very large cache, as the parallelism of GPUs hides memory latency. The die area used for it could have gone to more cores instead.

If the cache hit rate is 50%, the effective bandwidth of the memory interface is doubled, as only the other half of the accesses has to go to memory.

But the decreased latency for the half of memory accesses that hit also increases the GPU's IPC, so it's a better solution than an equivalent GPU with double the memory bandwidth.

And as AMD measured, the cache hit rate is well over 50%, so they would have needed both a 512-bit GDDR6 interface and more execution units for comparable performance. It sure looks like they used that die space wisely.
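
To put rough numbers on that (a minimal sketch: it assumes cache hits cost no bus traffic and only misses consume bus bandwidth, and the 512 GB/s figure assumes 256-bit GDDR6 at 16 Gbps, which AMD's footnote doesn't actually state):

```python
# Sketch: how a cache hit rate amplifies effective memory bandwidth,
# assuming hits cost no bus traffic and only misses go out to VRAM.

def effective_bandwidth(mem_bw_gbs: float, hit_rate: float) -> float:
    """With hit rate h, only a fraction (1 - h) of accesses touch VRAM,
    so the same bus sustains 1 / (1 - h) times the traffic."""
    return mem_bw_gbs / (1.0 - hit_rate)

GDDR6_256BIT = 512.0  # GB/s, assumed: 256-bit GDDR6 @ 16 Gbps

for h in (0.0, 0.50, 0.58):
    bw = effective_bandwidth(GDDR6_256BIT, h)
    print(f"hit rate {h:4.0%}: {bw:7.1f} GB/s ({1 / (1 - h):.2f}x)")
# hit rate   0%:   512.0 GB/s (1.00x)
# hit rate  50%:  1024.0 GB/s (2.00x)
# hit rate  58%:  1219.0 GB/s (2.38x)
```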
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,115
136
I think Infinity Cache was a strategic decision buoyed by AMD's push into APUs. Integrated GPUs will always be hamstrung by low-bandwidth system RAM, but if the GPU arch (and potentially the CPU as well?!) can all hop into a giant pool of cache instead of making *very* slow trips out to system memory, then a major bottleneck gets removed.

Infinity Cache is going to be the way forward for AMD's APUs to really shore up the bottom end of the market in a big way and pull away from Intel's offerings.

This has huge ramifications for gaming/productivity work on entry-level laptops...
 

zlatan

Senior member
Mar 15, 2011
580
291
136
It should be VERY interesting, what the frametime consistency is on some popular games (at 4K?), with this sort of InfinityCache setup. Surely, if it's not "perfect", then NVidia will use that as a marketing point for their cards. But as far as I'm concerned (pending multiple 3rd-party benchmarks), AMD seems to be the winner of the GPU crown this generation.
Frametimes will generally be better, because there's a lot of shader code where it's not really possible to optimize for good resource allocation. In those cases there won't be enough warps/wavefronts to keep the ALUs busy; the data access latency is simply too high. With Infinity Cache the hit rate is high enough to keep the ALUs busy from cache, and this leads to more consistent frametimes.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
I think Infinity Cache was a strategic decision buoyed by AMD's push into APUs. Integrated GPUs will always be hamstrung by low-bandwidth system RAM, but if the GPU arch (and potentially the CPU as well?!) can all hop into a giant pool of cache instead of making *very* slow trips out to system memory, then a major bottleneck gets removed.

Infinity Cache is going to be the way forward for AMD's APUs to really shore up the bottom end of the market in a big way and pull away from Intel's offerings.

This has huge ramifications for gaming/productivity work on entry-level laptops...
Actually, I wonder if Smart Access Memory isn't something trying to solve that from the other side.

Big shared caches between CPUs and GPUs as well as shared fast memory. Maybe a nightmare from the software side though.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,628
158
106
I am surprised at how well the cache works. If it works that well, why wasn't this done before?
Well, part of the performance improvement of GPUs has been better caches, although neither NVidia nor AMD have been very open about the details and amounts.

Maybe it's just the case that it finally makes more sense, in terms of transistors and/or power and/or die area, to have a big chunk of cache instead of a wider bus or more exotic forms of VRAM like HBM or GDDR6X.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,226
5,228
136
I think Infinity Cache was a strategic decision buoyed by AMD's push into APUs. Integrated GPUs will always be hamstrung by low-bandwidth system RAM, but if the GPU arch (and potentially the CPU as well?!) can all hop into a giant pool of cache instead of making *very* slow trips out to system memory, then a major bottleneck gets removed.

Infinity Cache is going to be the way forward for AMD's APUs to really shore up the bottom end of the market in a big way and pull away from Intel's offerings.

This has huge ramifications for gaming/productivity work on entry-level laptops...

OTOH, the die size penalty of including a large cache might be too great for an APU, and the GPU performance of an APU is not as critical as some people would like to think.
 
Apr 30, 2020
68
170
76
I am surprised at how well the cache works. If it works that well, why wasn't this done before?
Well, we don't know if there's a minimum size for the cache to be effective. We don't know how it scales. So in today's 4K world you might need 128 MB, but with "yesterday's" 1080p you might still have needed 96 MB for it to be effective.

The other major issue is die size. Keep these two things in mind:
  1. SRAM cache requires 6 transistors per bit. DRAM, on the other hand, requires 1 transistor and 1 capacitor. This means the 128 MB of LLC that Navi 2 has requires 6.4 billion transistors by itself (quick sketch below): more transistors than the entire Radeon R9 290X core from just ~7 years ago.
  2. I/O (DRAM, display outputs, etc.) requires a lot of die space. You need "big" transistors that take up lots of space, capable of driving lots of current (relatively speaking) to overcome the capacitance of the PCB traces and connections.
So with past processes (28nm, 40nm, etc.) your I/O transistors were probably "similar" in size to the surrounding logic (or at least did not take up a comparatively large amount of space). But as we get down into these really small process nodes, the I/O transistors more or less need to remain the same size in order to be able to handle the currents necessary for high-speed data communication. This only gets worse when you start using extremely high speed exotic memories like GDDR6X. So at some point, AMD would have been weighing choices, and determined that they could squeeze in a lot more performance per unit of die area with SRAM cache vs. a wider memory bus. It would also save a ton of power that could be budgeted towards higher core clocks.
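
For point 1, the cell arithmetic checks out (a quick sketch; it counts only the 6T storage cells and ignores tags, sense amps, and other peripheral logic, so the real total is somewhat higher):

```python
# Rough transistor budget for a 128 MB SRAM cache: 6 transistors per
# bit for the storage cells alone (tags and peripheral logic are extra).

CACHE_MB = 128
bits = CACHE_MB * 1024 * 1024 * 8   # capacity in bits
transistors = bits * 6              # one 6T SRAM cell per bit

print(f"{transistors / 1e9:.2f} billion transistors")  # 6.44 billion
```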

Why didn't nvidia do that? Who knows. One possible reason is a process disadvantage. GA102 has 28.3 billion transistors - just 5.5% more than the 6800XT, but over 20% larger die area. So a massive SRAM cache may not have been possible given their process constraints. Or maybe they just didn't think about it. Or maybe they just don't have the skill set yet for working with massive caches. Starting with Zen1, AMD has learned a lot about how to engineer and manufacture extremely large, extremely high performance caches. They said they're leveraging their Zen cache design after all. How much of that is truth vs marketing fluff? I don't know.

58% hit rate means that 42% of the time, it will have to go to the GDDR6. I don't know enough about graphic processing to know what impact that will have, 60% super fast, then 40% slow. I also assume AMD wouldn't use it if the negatives out weights the positives. Interesting stuff.
I wouldn't say it means 40% of the accesses are "slow". It's more like the cache reduces the need for a super wide memory bus because it handles ~60% of the data requests.
 

TheRookie

Junior Member
Aug 26, 2019
16
2
16
The cache has a hit rate of 58%.

Consequently, 42% of the time, it has to access the GDDR6 memory.

That gives an effective bus width of 256-bit / 0.42 ≈ 609.52-bit.

Now 609.52 / 256 = 2.38x the effective bandwidth.

According to AMD, the effective bandwidth is 2.17X

I have no idea how AMD came up with that number.
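
One plausible reading of the RX-547 footnote (strictly a guess at AMD's math: the 768 GB/s baseline assumes they're comparing against a 384-bit GDDR6 bus at 16 Gbps, which the footnote doesn't state):

```python
# Two readings of the 58% hit rate. The first treats only misses as
# consuming bus bandwidth (the 2.38x above). The second follows the
# footnote literally: hit rate times the peak bandwidth of the 16 x 64B
# Infinity Fabric channels at 1.94 GHz, added to GDDR6 bandwidth.
# Memory speeds and the 384-bit baseline are assumptions, not footnote facts.

HIT_RATE = 0.58
GDDR6_256 = 512.0                 # GB/s, assumed 256-bit @ 16 Gbps
GDDR6_384 = 768.0                 # GB/s, assumed 384-bit baseline
IF_PEAK = 16 * 64 * 1.94          # GB/s, 16 channels x 64 B x 1.94 GHz

print(f"naive (misses only): {1 / (1 - HIT_RATE):.2f}x")   # 2.38x

effective = HIT_RATE * IF_PEAK + GDDR6_256                  # ~1664 GB/s
print(f"footnote-style: {effective:.0f} GB/s, "
      f"{effective / GDDR6_384:.2f}x vs 384-bit GDDR6")     # 2.17x
```

That second reading lands exactly on AMD's 2.17x, though only AMD knows the actual comparison point.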
 

Schmide

Diamond Member
Mar 7, 2002
5,586
718
126
AMD has always been the king of cache, except for when it wasn't. Back in the day they ran the whole exclusive L1/L2 thing when others were playing the victim cache game. Then they tried to do too much and bulldozed the cache down to the same speed as main memory. Well, they seem to have figured out complexity and speed in their 5000 and 6000 series parts.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
In my opinion, AMD targeted 128 MB to adequately fit all the frame buffers for 4K rendering. I read somewhere that a significant percentage of VRAM access cycles were for the frame buffers, so having the frame buffers in a high-speed cache certainly reduces the load on the memory controllers drastically.

In addition, reducing the memory controller count yields big savings in die area. It doesn't matter how hard a die is to design once you're in mass production; assuming what you're asking for is within the node's capabilities, all that matters is how many dies you can produce per wafer and how capable they are once complete. So AMD gets to use a node advantage to pack more circuits into a smaller area, and also gets away with using fewer, cheaper memory ICs.

Where do I expect the gains to be less pronounced? Lower resolutions. At 1080p, it should only need 32 MB to be just as useful; the extra 96 MB is just a drop in the bucket for texture caching. It might help in RT tasks.
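
As a rough sanity check on those sizes (a sketch assuming 4 bytes per pixel, e.g. RGBA8; real engines keep many render targets of varying formats, so treat these as illustrative lower bounds):

```python
# Rough render-target sizing: how many simple 4-byte-per-pixel targets
# fit in 128 MB at each resolution. Real G-buffer/HDR targets vary.

def target_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    return width * height * bytes_per_pixel / 2**20

for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    mb = target_mb(w, h)
    print(f"{name}: {mb:5.1f} MB per target, {128 // mb:.0f} fit in 128 MB")
# 1080p:   7.9 MB per target, 16 fit in 128 MB
# 4K:      31.6 MB per target, 4 fit in 128 MB
```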