
Question Speculation: RDNA2 + CDNA Architectures thread

I really REALLY doubt these RDNA 128MB L2 cache rumors. I am not seeing the point of such a large cache for a gaming card. That much cache would be great on, say, CDNA, because then you could keep a crapload of instructions in there for it to crunch away on. But gaming cards don't get big gains from big caches. The data stored changes too often. The cache just needs to be large enough to feed the GPU for a short time. Too small and it hitches; too big and you just waste die space.
 
Would a 128 MB on-die eDRAM L3 cache make any sense?
Interesting! That would definitely make much more sense than on-die SRAM (100 mm²) or off-die eDRAM.

IBM after all managed to squeeze over a GB of eDRAM (L3 and L4) on their z15 CPU (695.75 mm² Glo-Fo 14nm).

The density they got on Glo-Fo 14nm (0.0174 μm² per bit cell) was over 2 times better than SRAM density on TSMC 7nm (0.027 μm² per bit cell), so a hypothetical 128 MB eDRAM L3 would probably take less than 50 mm² of die area.
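Back-of-envelope on those density figures (raw bit-cell area only; sense amps, decoders, and redundancy are excluded, which is why a realistic macro lands higher than these numbers):

```python
# Raw bit-cell area for a 128 MB cache, using the densities above.
# Array overhead (sense amps, decoders, redundancy) is excluded,
# so a real macro lands noticeably higher -- hence "less than 50 mm^2".
bits = 128 * 1024 * 1024 * 8             # 128 MiB of cache

EDRAM_CELL_UM2 = 0.0174                  # IBM eDRAM bit cell, Glo-Fo 14nm
SRAM_CELL_UM2 = 0.027                    # SRAM bit cell, TSMC 7nm

edram_mm2 = bits * EDRAM_CELL_UM2 / 1e6  # 1 mm^2 = 1e6 um^2
sram_mm2 = bits * SRAM_CELL_UM2 / 1e6

print(f"eDRAM: {edram_mm2:.1f} mm^2")    # ~18.7 mm^2
print(f"SRAM:  {sram_mm2:.1f} mm^2")     # ~29.0 mm^2
```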




I'm no hardware engineer though so I'm not sure if it would make any sense on a GPU, let alone a consumer-only GPU
 
No, it doesn't. Microsoft scrapped the idea with the Xbox One X.
Another good point. According to anandtech:

Xbox 360 had 10MB of eDRAM with 32GB/s bandwidth
Xbox One had 32MB of eSRAM with 102GB/s bandwidth

The bandwidth figures are really anemic, I wonder what number IBM got out of their chips? 100 GB/s is less than you get out of 8-channel DDR4. It would have to be nearer to 1TB/s to make any kind of sense.
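For scale, peak theoretical bandwidth of 8-channel DDR4-3200 (25.6 GB/s per 64-bit channel):

```python
# Peak theoretical bandwidth of 8-channel DDR4-3200 for comparison:
# each 64-bit channel moves 8 bytes per transfer at 3200 MT/s.
per_channel_gbs = 3200e6 * 8 / 1e9     # 25.6 GB/s
octa_channel_gbs = 8 * per_channel_gbs

print(f"{octa_channel_gbs:.1f} GB/s")  # 204.8 GB/s -- double the One's eSRAM
```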
 
No, it doesn't. Microsoft scrapped the idea with the Xbox One X.
Not saying the 128MB cache is true, but don't mistake one implementation of an idea with the idea itself having no value. In engineering, details are often the difference between failure and success.
 
Another good point. According to anandtech:

Xbox 360 had 10MB of eDRAM with 32GB/s bandwidth
Xbox One had 32MB of eSRAM with 102GB/s bandwidth

The bandwidth figures are really anemic, I wonder what number IBM got out of their chips? 100 GB/s is less than you get out of 8-channel DDR4. It would have to be nearer to 1TB/s to make any kind of sense.

Is eDRAM always going to have anemic bandwidth, even on 7nm, and with 128MB vs 10-32MB, or did the MS engineers design those products to meet a certain performance target and not bother going any higher?
 
Dylan522p, from r/hardware...

"As total SRAM, that makes sense actually.

Microsoft claims XSX has a total 76MB SRAM for their 8C, 2SE, 56CU chip.

If we assume that 12MB comes from having extra WGP and the CPU, then we get to 64MB for 2SE and 5 WGP per SA.

Navi 21 is supposed to be 4 SE, so *2 and tada, there's 128MB SRAM."
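The arithmetic in that quote, spelled out (all the per-unit figures are the rumor's assumptions, not confirmed specs):

```python
# Dylan522p's total-SRAM estimate, step by step.
xsx_total_sram_mb = 76     # Microsoft's stated total for the XSX SoC
cpu_extra_wgp_mb = 12      # assumed share of the CPU cores + extra WGPs
per_2se_mb = xsx_total_sram_mb - cpu_extra_wgp_mb  # 64 MB for 2 SEs

navi21_se = 4              # rumored Shader Engine count
navi21_sram_mb = per_2se_mb * navi21_se // 2

print(navi21_sram_mb)      # 128
```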
 
Not saying the 128MB cache is true, but don't mistake one implementation of an idea with the idea itself having no value. In engineering, details are often the difference between failure and success.

Microsoft went from 68 GB/s to 320 GB/s off-chip. I think that is a statement that their approach with the One/S failed. On-chip RAM must still be refreshed to hold its contents, and increasing the L2 cache will result in yield problems.
 
Is eDRAM always going to have anemic bandwidth, even on 7nm, and with 128MB vs 10-32MB, or did the MS engineers design those products to meet a certain performance target and not bother going any higher?
Quick googling led me to this interesting article from 2014:
Intel claims 102.4GB/s at 1W using their OPIO. IBM claims 3TB/s L3 bandwidth (across 12-cores @ 4GHz) with its on-chip eDRAM implementation. Intel’s implementation has provided clear power and performance benefits for its parts, but the IBM results show that there is still more to be had if that eDRAM is placed on chip.
So it should definitely be possible to get > 1TB/s bandwidth from eDRAM @ 2 GHz, interesting!
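Scaling IBM's aggregate figure down to GPU-ish clocks, assuming bandwidth scales linearly with frequency at a fixed interface width (optimistic, but fine as a sanity check):

```python
# IBM quotes 3 TB/s aggregate L3 bandwidth at 4 GHz; assume the
# bandwidth scales linearly with clock at a fixed interface width.
ibm_bw_tbs = 3.0
ibm_clock_ghz = 4.0
gpu_clock_ghz = 2.0

scaled_tbs = ibm_bw_tbs * gpu_clock_ghz / ibm_clock_ghz
print(f"{scaled_tbs} TB/s")  # 1.5 TB/s -- above the ~1 TB/s bar
```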

I would still like to hear from some hardware guys whether such a large eDRAM cache would make any sense on a consumer GPU.
 
Microsoft went from 68 GB/s to 320 GB/s off-chip. I think that is a statement that their approach with the One/S failed. On-chip RAM must still be refreshed to hold its contents, and increasing the L2 cache will result in yield problems.
I didn't see "L2 cache" mentioned, just "cache". Also, cache is easy to make defect-resilient, being a simple repetitive structure.
 
So many comments are about AMD launching later than NVIDIA, as if it means RDNA2 is not as good as first thought. On the contrary, if RDNA2 were not good they would launch it on the back of Zen 3, so positive news on Zen 3 would overshadow RDNA2.

I think they want to see Ampere reviews and actual availability/prices before committing to saying anything.
 
Dylan522p, from r/hardware...

"As total SRAM, that makes sense actually.

Microsoft claims XSX has a total 76MB SRAM for their 8C, 2SE, 56CU chip.

If we assume that 12MB comes from having extra WGP and the CPU, then we get to 64MB for 2SE and 5 WGP per SA.

Navi 21 is supposed to be 4 SE, so *2 and tada, there's 128MB SRAM."
And Andrei from AT said in response, "It infers to a 128MB cache chiplet connected to the GPU. This chiplet would probably also see use in Zen3 CPUs."

I'm not sure how to reconcile this still...
 
Dylan522p, from r/hardware...

"As total SRAM, that makes sense actually.

Microsoft claims XSX has a total 76MB SRAM for their 8C, 2SE, 56CU chip.

If we assume that 12MB comes from having extra WGP and the CPU, then we get to 64MB for 2SE and 5 WGP per SA.

Navi 21 is supposed to be 4 SE, so *2 and tada, there's 128MB SRAM."

Typically I find that these "out of left field" rumors tend to carry weight. It reminds me of the recent peculiar rumor about the Nvidia 12-pin connector, or the "crazy bus" when Fiji first appeared in leaked benchmarks before we knew it used HBM. People who try to spread false rumors usually present something expected, to make the rumor plausible.
 
And Andrei from AT said in response, "It infers to a 128MB cache chiplet connected to the GPU. This chiplet would probably also see use in Zen3 CPUs."

I'm not sure how to reconcile this still...
I mean, the pieces of the puzzle kinda make sense...
1- N21 has 16 slices of L2, which means a 256- or 512-bit memory bus... 512 seems very unlikely, and we also have that photo leak showing 256-bit

2- with a small density increase, this huge cache can fit alongside 80 CUs... also, ray tracing loves huge caches

3- N21 supports xGMI, and we were all puzzled why...
 
And Andrei from AT said in response, "It infers to a 128MB cache chiplet connected to the GPU. This chiplet would probably also see use in Zen3 CPUs."

I'm not sure how to reconcile this still...

Hmm.

Hypothetical question... If both have the 128MB, could these be part of the hybrid RT solution? Maybe a higher-speed interconnect?

Would any of the following patents be applicable?

 
No, it doesn't. Microsoft scrapped the idea with the Xbox One X.
It wasn't a cache, it was a scratchpad.

Microsoft went from 68 GB/s to 320 GB/s off-chip. I think that is a statement that their approach with the One/S failed. On-chip RAM must still be refreshed to hold its contents, and increasing the L2 cache will result in yield problems.

How does increasing SRAM or eDRAM cause yield problems? They are about the easiest structures to make redundant.
 
Come on, AMD is not shipping a high-end GPU with a 256-bit GDDR6 bus. They got burned time and again with memory weirdness; if they haven't learned their lesson by now, there's really no hope for them. It's probably Navi 22, or a Navi 21 cut down to 256 bits that they are using as a placeholder for Navi 22 for their own internal reasons, like deferring tape-out costs.
 
I would like a 512-bit bus, like with Hawaii. At least for the top end, if not using HBM2E.
 
Blame their arrogance towards AMD.

Doesn't quite explain the last few times Nvidia lost to AMD. Granted, few and far in between... Intel didn't think AMD was worthy leading up until March 2017. I know Intel readied themselves for Bulldozer, but when it became apparent that it was a lost cause for AMD, they breathed a sigh of relief.
 