Let's not underestimate the "RT" part of things. By 2020, framebuffer caching, streaming, and chunking are solved problems. Things like a BVH expand the chip's working set in ways other "buffers" don't. Where do I expect the gains to be less pronounced? Lower resolutions. At 1080p, it should only need 32MB to be just as useful. For texture caching, the extra 96MB is just a drop in the bucket. It might help in RT tasks.
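For a rough sense of where a figure like 32MB at 1080p could come from, here is a back-of-the-envelope sketch. The four full-resolution render targets at 4 bytes per pixel are purely my assumption for illustration; real engines use different target counts and formats, so treat the numbers as ballpark only.

```python
def render_targets_mib(width, height, bytes_per_pixel=4, targets=4):
    """Size in MiB of `targets` full-resolution render targets.

    bytes_per_pixel=4 and targets=4 are illustrative assumptions,
    not figures from any real engine or from AMD.
    """
    return width * height * bytes_per_pixel * targets / 2**20


fhd = render_targets_mib(1920, 1080)   # 1080p
uhd = render_targets_mib(3840, 2160)   # 4K has exactly 4x the pixels

print(f"1080p: {fhd:.1f} MiB, 4K: {uhd:.1f} MiB")
# prints "1080p: 31.6 MiB, 4K: 126.6 MiB"
```

Under those assumptions the 1080p working set lands right around 32MB and the 4K one right around 128MB, which is the scaling the resolution argument relies on.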
You have a "tree" structure that, no matter how well you build it, will have far more random access patterns than, say, a framebuffer, vertex, or texture block in memory. You are basically chasing tree node pointers around in memory, and if the full structure does not fit in cache, bad things happen to the rest of the chip.
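To make the access-pattern difference concrete, here is a toy simulation, not a model of any real GPU. It assumes a fully associative LRU cache, 16 elements per cache line for the streaming buffer, and one tree node per cache line with random root-to-leaf walks standing in for BVH traversal. All names and sizes are made up for the sketch.

```python
from collections import OrderedDict
import random

LINE_ELEMS = 16  # assumption: 64-byte line holding 16 four-byte elements


def lru_hit_rate(lines, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for line in lines:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(lines)


def streaming_trace(n_elems):
    """Linear walk over a buffer: consecutive elements share a cache line."""
    return [i // LINE_ELEMS for i in range(n_elems)]


def tree_trace(depth, walks, seed=0):
    """Random root-to-leaf walks over an implicit binary tree, one node per line."""
    rng = random.Random(seed)
    trace = []
    for _ in range(walks):
        node = 0
        for _ in range(depth):
            trace.append(node)
            node = 2 * node + 1 + rng.randrange(2)  # pick left or right child
    return trace


stream = streaming_trace(16_000)
tree = tree_trace(depth=12, walks=5_000)  # 4095 nodes in total

print("stream, tiny cache :", lru_hit_rate(stream, 8))
print("tree, 256 lines    :", lru_hit_rate(tree, 256))
print("tree, whole tree   :", lru_hit_rate(tree, 4095))
```

The streaming walk gets the same hit rate even from a tiny cache, because every miss is followed by hits on the same line. The tree walk only gets good once the cache is big enough to hold the whole structure, which is the "if it does not fit, bad things happen" part.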
So AMD is throwing a big chunk of SRAM at their 1st-gen RT hardware and other workloads. That is probably the reason why all chips have the full 128MB (even if SRAM is easier to yield, it's still a conscious decision not to nerf it to, say, 96MB).
The real question here is how much concurrency the chip can have between different workloads. NV can execute graphics + RT + tensors for DLSS at the same time; can AMD's 1st gen achieve something similar? It would be great to extract the most from these caches.
Also, caching is a policy, so there is some maneuvering a good driver team can do for performance in existing and new games (like "let's pin any BVH no larger than size X for this game in cache").
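Here is a minimal sketch of what a "pin the BVH if it's under size X" policy could look like, written as a toy LRU cache. Everything in it (`PinnableLRU`, `pin`, `max_lines`, the sizes) is hypothetical and only illustrates the idea; it is not AMD's driver interface or how the hardware actually works. As a simplification, pinned lines are treated as preloaded, so even their first access counts as a hit.

```python
from collections import OrderedDict


class PinnableLRU:
    """Toy LRU cache where pinned lines are never evicted (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pinned = set()
        self.lru = OrderedDict()

    def pin(self, lines, max_lines):
        """Pin `lines` only if they fit the budget `max_lines` (the 'size X' rule)."""
        lines = set(lines)
        if len(lines) > max_lines or len(self.pinned | lines) >= self.capacity:
            return False                    # too big: leave it to plain LRU
        self.pinned |= lines
        for line in lines:
            self.lru.pop(line, None)        # pinned lines leave the LRU list
        while len(self.pinned) + len(self.lru) > self.capacity:
            self.lru.popitem(last=False)    # make room by evicting unpinned lines
        return True

    def access(self, line):
        """Return True on a hit, False on a miss."""
        if line in self.pinned:
            return True
        if line in self.lru:
            self.lru.move_to_end(line)
            return True
        self.lru[line] = True
        if len(self.pinned) + len(self.lru) > self.capacity:
            self.lru.popitem(last=False)
        return False


def bvh_hits(pin_bvh, rounds=10, capacity=256):
    """Count BVH hits when each BVH pass is followed by a large streaming pass."""
    cache = PinnableLRU(capacity)
    bvh = range(100)                        # 100-line "BVH"
    if pin_bvh:
        cache.pin(bvh, max_lines=128)
    hits = 0
    next_stream = 1000                      # streaming lines never overlap the BVH
    for _ in range(rounds):
        hits += sum(cache.access(line) for line in bvh)
        for line in range(next_stream, next_stream + 1000):
            cache.access(line)              # streaming pass flushes unpinned lines
        next_stream += 1000
    return hits


print("BVH hits without pinning:", bvh_hits(False))
print("BVH hits with pinning   :", bvh_hits(True))
```

In this toy setup the streaming pass is bigger than the cache, so without pinning the BVH gets flushed every round. With pinning it stays resident, at the cost of less space for everything else, which is exactly the per-game tradeoff a driver team would have to tune.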