Speculation: RDNA2 + CDNA Architectures thread

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have since backed up. Even still though, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
There is crazy impressive research from AMD that pretty much IS this patent...


Just a quote:
"We extensively evaluate our proposal across 28 GPGPU applications. Our dynamic scheme boosts performance by 22% (up to 52%) and energy efficiency by 49% for the applications that exhibit high data replication and cache sensitivity without degrading the performance of the other applications. This is achieved at a modest
area overhead of 0.09 mm2/core."
There are two directly related patents here.


I think we might be getting an idea of what that already-mythical Infinity Cache might be...
 
  • Like
Reactions: Olikan and Krteq

Mopetar

Diamond Member
Jan 31, 2011
8,113
6,768
136
There are two directly related patents here.


I think we might be getting an idea of what that already-mythical Infinity Cache might be...

I wouldn't get my hopes up about seeing fancy new stuff. We saw the same posts about patents for cool technology when Vega was being discussed. Most of it never made it in, and the magic drivers never materialized for most of it.

Maybe that's how they're getting performance uplifts now though. Older tech that's fallen off our radar or stuff that's finally fully baked.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Maybe that's how they're getting performance uplifts now though. Older tech that's fallen off our radar or stuff that's finally fully baked.

IMO, it's the only way (as we know) that gives big performance and power savings...

The latest big perf/watt increase was Maxwell's unification of SIMD/wavefront sizes. With it, instruction scheduling was greatly simplified and register pressure became a thing of the past (cough Ampere cough)... well, RDNA1 already did that...
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
I wouldn't get my hopes up about seeing fancy new stuff. We saw the same posts about patents for cool technology when Vega was being discussed. Most of it never made it in, and the magic drivers never materialized for most of it.

Maybe that's how they're getting performance uplifts now though. Older tech that's fallen off our radar or stuff that's finally fully baked.
We also were not expecting Super-SIMD patents to appear with RDNA1, right? ;)
 

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
I wouldn't get my hopes up about seeing fancy new stuff. We saw the same posts about patents for cool technology when Vega was being discussed. Most of it never made it in, and the magic drivers never materialized for most of it.

Maybe that's how they're getting performance uplifts now though. Older tech that's fallen off our radar or stuff that's finally fully baked.
Except this time none of what we know about the upcoming consoles would make sense if there weren't some significant improvement in the RDNA arch.

So I am personally excited and hope AMD can displace Nvidia at the top like AMD did with Intel. This would be great for consumers as Nvidia and their dirty tricks are getting stale.
 
  • Like
Reactions: Tlh97

Panino Manino

Senior member
Jan 28, 2017
869
1,119
136
I'm not liking all these rumors and the hype they may create.
It may result in AMD's crowning achievement or a disappointment bigger than Vega.

Don't we already have a good enough idea of how RDNA2 performs based on the new video games?
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
I'm not liking all these rumors and the hype they may create.
It may result in AMD's crowning achievement or a disappointment bigger than Vega.

Don't we already have a good enough idea of how RDNA2 performs based on the new video games?

I think about all we really know so far is what the cooler looks like, along with the fact that it'll perform better than RDNA.

Best to just take all the rumors/alleged leaks/FUD/etc. with a dose of salt and wait and see how it all pans out.
 
darkswordsman17

Mar 11, 2004
23,285
5,724
146
I would personally be pretty surprised if they have a large dedicated cache that isn't about mitigating chiplets, so my guess is that's much more related to the compute stuff (which doesn't mean we won't see it on gaming cards, just that I don't think it'll be across the stack or for anything but probably the highest end "Prosumer" stuff which would probably be $1000+). I could see it being part of an I/O chip, along with HBM on an interposer and so be limited to those cards.

Random out-of-nowhere guess: we might see a dual-chip card for the very high end, where it won't offer as much as the compute version does (where that can offer up to 2x, if not more than 2x, what a single one of those GPUs would, thanks to the cache), but it'll bring enough of an uplift that it'll be up with the 3090/Titan/Ti. But get this, it might not be any more power hungry than the 3090, because it'll be optimized (meaning if a game can be worked to use both GPUs, it'll enable both but at lower clock speeds that keep power and heat in check; in other cases it'll just use the cache, HBM, and a single GPU), with them allowing AIBs to sell unlocked versions (with extra beefy coolers - probably water).

I do also wonder if maybe one of the compute chips might be good for ray-tracing or for DLSS type of thing, so they could pair a gaming GPU with a compute one.

Since the one card shown had a USB-C connector, it makes me wonder if it might not be targeting VR, where they do per-eye rendering and use the cache for shared stuff. I wonder if we might even see something interesting, like AMD releasing a PC version of the next Sony PSVR headset. AMD had talked about the new VR standard over USB-C, which Nvidia already abandoned (because none of the headset makers adopted it), but AMD might have a collaboration with Sony - Sony is already talking about porting their 1st-party games to PC - it seems like they're starting to build the bridge towards streaming.

I don't think AMD should need the cache to compete with the 3080. I do think we'll see a 3090-competitive chip early next year that's pushed very high, using HBM, and I think Nvidia will show off Ti versions and maybe a Titan, to which AMD will then respond with the monster dual-GPU card.

One reason I feel this way is that I think there's work being done to implement mGPU in games (in the way that DX12 was supposed to), as a means of getting to chiplets. Nvidia's recent announcement with regards to SLI, where their rationale is that developers now have the ability to do that themselves, is to me a signal of that. Just my personal opinion. I don't think it'll be a big boost (I'd guess the average boost over a single GPU will be like 25-40%, with outliers all over the place and maybe a few edge cases where it gets a big boost, but for the most part it won't be worth the cost for most people).

I base this off of nothing but my own made-up thoughts; I have no insider info or even a crystal ball, and I have been wrong multiple times (was wrong on Radeon VII, Navi, and arguably wrong on console predictions). Have your Himalayan salt lamp on.
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
I would personally be pretty surprised if they have a large dedicated cache that isn't about mitigating chiplets, so my guess is that's much more related to the compute stuff (which doesn't mean we won't see it on gaming cards, just that I don't think it'll be across the stack or for anything but probably the highest end "Prosumer" stuff which would probably be $1000+). I could see it being part of an I/O chip, along with HBM on an interposer and so be limited to those cards.

Random out-of-nowhere guess: we might see a dual-chip card for the very high end, where it won't offer as much as the compute version does (where that can offer up to 2x, if not more than 2x, what a single one of those GPUs would, thanks to the cache), but it'll bring enough of an uplift that it'll be up with the 3090/Titan/Ti. But get this, it might not be any more power hungry than the 3090, because it'll be optimized (meaning if a game can be worked to use both GPUs, it'll enable both but at lower clock speeds that keep power and heat in check; in other cases it'll just use the cache, HBM, and a single GPU), with them allowing AIBs to sell unlocked versions (with extra beefy coolers - probably water).

I do also wonder if maybe one of the compute chips might be good for ray-tracing or for DLSS type of thing, so they could pair a gaming GPU with a compute one.

Since the one card shown had a USB-C connector, it makes me wonder if it might not be targeting VR, where they do per-eye rendering and use the cache for shared stuff. I wonder if we might even see something interesting, like AMD releasing a PC version of the next Sony PSVR headset. AMD had talked about the new VR standard over USB-C, which Nvidia already abandoned (because none of the headset makers adopted it), but AMD might have a collaboration with Sony - Sony is already talking about porting their 1st-party games to PC - it seems like they're starting to build the bridge towards streaming.

I don't think AMD should need the cache to compete with the 3080. I do think we'll see a 3090-competitive chip early next year that's pushed very high, using HBM, and I think Nvidia will show off Ti versions and maybe a Titan, to which AMD will then respond with the monster dual-GPU card.

One reason I feel this way is that I think there's work being done to implement mGPU in games (in the way that DX12 was supposed to), as a means of getting to chiplets. Nvidia's recent announcement with regards to SLI, where their rationale is that developers now have the ability to do that themselves, is to me a signal of that. Just my personal opinion. I don't think it'll be a big boost (I'd guess the average boost over a single GPU will be like 25-40%, with outliers all over the place and maybe a few edge cases where it gets a big boost, but for the most part it won't be worth the cost for most people).

I base this off of nothing but my own made-up thoughts; I have no insider info or even a crystal ball, and I have been wrong multiple times (was wrong on Radeon VII, Navi, and arguably wrong on console predictions). Have your Himalayan salt lamp on.

Well it is a speculation thread after all.
 

soresu

Diamond Member
Dec 19, 2014
3,230
2,515
136
Since the one card shown had a USB-C connector, it makes me wonder if it might not be targeting VR, where they do per-eye rendering and use the cache for shared stuff. I wonder if we might even see something interesting, like AMD releasing a PC version of the next Sony PSVR headset. AMD had talked about the new VR standard over USB-C, which Nvidia already abandoned (because none of the headset makers adopted it), but AMD might have a collaboration with Sony - Sony is already talking about porting their 1st-party games to PC - it seems like they're starting to build the bridge towards streaming.
It's more likely to be standard DP 2.0 alt mode with DSC, which makes VirtualLink redundant anyway.

There's been enough time since the announcement for it to be integrated into RDNA2 dGPUs.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
Don't we already have a good enough idea of how RDNA2 performs based on the new video games?
Only if you have direct access to the new consoles and can extrapolate data based on a third of what the supposed CU count is, minus the downgrading of graphics (because you wouldn't be running ultra settings on a console), plus a custom Zen 2 APU that reportedly sits somewhere between the Zen+ and Zen 2 APUs, a new decompression method, and more, in a 300-350 watt box.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I haven't bothered to look into it as much as you have, @jamescox, but when Andrei mentioned it last week, he did state (from memory) that it would benefit their entire range of processors as a simple cache system, and it wouldn't be costly at scale.

Zen 4 could be larger. Ideally you wouldn't want it too large with too much space, as that will begin to affect the little things.


Sorry, are you referring to the patent going around with the crossbar in the middle? It's only suspected that it is for CDNA, not for both CDNA and RDNA. It is a stepping stone towards MCM, which is rumored to be slated for RDNA3.

I am just trying to figure out how such an “infinity cache” would make sense for both RDNA and Epyc. I wouldn’t put too much importance on such patents; it may be relevant and it may not. If it is just a kind of single-ended infinity-fabric-connected cache (not pass-through), then it would need to be very configurable, so that patent may be related. It doesn’t seem like it makes sense as a single-link device, since that really isn’t much bandwidth for graphics. It could make a big difference in ray tracing and other compute stuff though. I like the idea that darkswordsman17 had about including a CDNA die. Perhaps the infinity cache is for a compute die that offloads certain tasks from the main RDNA gpu. I am not sure how single-link devices would be used in Epyc. I guess you could just replace some CPU dies with cache dies. With two links, they could have the IO die on one side and the cpu die on the other. You would get big L4 caches, but only 32 cores. I don’t know if we will find anything out about this until the RDNA reveal. The Zen 3 reveal may not include anything about this. I doubt that it would be used in a Ryzen 5000 consumer chip or Threadripper.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
I am just trying to figure out how such an “infinity cache” would make sense for both RDNA and Epyc. I wouldn’t put too much importance on such patents; it may be relevant and it may not. If it is just a kind of single-ended infinity-fabric-connected cache (not pass-through), then it would need to be very configurable, so that patent may be related. It doesn’t seem like it makes sense as a single-link device, since that really isn’t much bandwidth for graphics. It could make a big difference in ray tracing and other compute stuff though. I like the idea that darkswordsman17 had about including a CDNA die. Perhaps the infinity cache is for a compute die that offloads certain tasks from the main RDNA gpu. I am not sure how single-link devices would be used in Epyc. I guess you could just replace some CPU dies with cache dies. With two links, they could have the IO die on one side and the cpu die on the other. You would get big L4 caches, but only 32 cores. I don’t know if we will find anything out about this until the RDNA reveal. The Zen 3 reveal may not include anything about this. I doubt that it would be used in a Ryzen 5000 consumer chip or Threadripper.
That's the thing. Ever since I spotted Andrei's post I've been mulling it over. It makes sense for static data or ray tracing material; it isn't so useful for constantly switching live data. But even then, if you take that out, the then-rumored 256-bit bus makes even less sense. Why would you, if that was the big Navi, cut it off at the kneecaps? My only suggested reasoning is that RTG was assigned a slew of Zen engineers to help or take over. This could all also be a smoke screen by AMD. Those were heat-sensitive labels on that leak photo we saw. This whole charade is bizarre.

I find it a little bizarre that we've barely heard anything about either of AMD's upcoming products other than what they willingly have said, and yet we're supposed to believe that someone at AMD's overseas development centers leaked a photo of their flagship card that's coming out in a month?

Or those photos Jason from Jayz made a video on... If it's a mockup, it's done in-house. Even the renders officially released are a bit of a stretch.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,706
1,233
136
5529x4435 = 251 mm2
532x2948 = 16.054 mm2 <= 8 CUs
8 CUs * 5 = 80.268 mm2; 251 mm2 / 80.268 mm2 = 3.127x (40 CUs to rest of chip).
^-- Navi 10

580x434 = 197.05 mm2
60x359 = 16.861818687 mm2 <= 8 CUs
8 CUs * 10 = 168.618186874 mm2
^-- Van Gogh/Mero
w/ Navi10 rate => 527.2691114 mm2

1797x1323 = 360.4 mm2
1433x156 = 33.888133536 <= 16 CUs
16 CUs * 5 = 169.440667679 mm2 * 3.127x
^-- Arden
w/ Navi10 rate => 529.840967832 mm2

>505 mm2 probably if DUV.
476.856871052 mm2 for 7nm EUV.
7nm+ is in line with 505 mm2 with added components.

Min bound: 190.742748421 mm2 (Alchip); upper bound: 317.904580701 mm2 (Marvell - more likely) for 5nm EUV.
5nm is in line with a Tahiti/Tonga/Vega20 die size => 352 mm2/359 mm2/331 mm2

Tested it against the other Navi die => 160.9416868 mm2, which isn't far from its actual 158 mm2 die size. However, with the Navi10 selections it is 150 mm2.

Which for 5nm is in line with Arcturus's 400~450 mm2 guess for 128 CUs.
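For anyone who wants to replicate the arithmetic, here is a minimal Python sketch of the same scaling method (all inputs are the eyeballed pixel measurements and die areas quoted above, not official figures):

```python
# Die-size extrapolation from die-shot pixel measurements.
# All inputs are the eyeballed measurements quoted above, not official data.

def region_area_mm2(die_px, die_mm2, region_px):
    """Scale a pixel-measured region of a die shot to mm^2."""
    return die_mm2 * (region_px[0] * region_px[1]) / (die_px[0] * die_px[1])

# Navi 10: 251 mm^2 total; an 8-CU block measured at 532x2948 px of 5529x4435 px.
cu8 = region_area_mm2((5529, 4435), 251.0, (532, 2948))   # ~16.05 mm^2
cu40 = cu8 * 5                                            # ~80.27 mm^2 for 40 CUs
chip_to_cu = 251.0 / cu40                                 # ~3.127x rest-of-chip factor

# Arden (Series X SoC): scale its 16-CU block the same way, then apply the
# Navi 10 factor to guess what an 80-CU die would come out to.
cu16 = region_area_mm2((1797, 1323), 360.4, (1433, 156))  # ~33.89 mm^2
guess_80cu = cu16 * 5 * chip_to_cu                        # ~529.8 mm^2

print(f"8 CUs: {cu8:.2f} mm^2, factor: {chip_to_cu:.3f}x, "
      f"80-CU guess: {guess_80cu:.1f} mm^2")
```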

Q2 2017 = 7nm risk production // ~90 masks <- 2nd fastest ramp
Q3 2018 = 7nm+ risk production // ~80 masks <- No ramp
Q1 2019 = 5nm risk production // ~70 masks <- Fastest ramp, highest yield.

Tue May 15 14:59:32 UTC 2018 => drm/amdgpu: Add vega20 pci ids
April 2017(v1.0 pdk) to May 2018 => 13 months <== N7
Mon Jun 17 19:26:04 UTC 2019
=> 26 months <== However, it is N7P, which launched July 2019, with most tapeouts occurring then: "alternatively there is N7P - an improved N7. TSMC had already announced the tape-out of an unknown chip in October 2018." - in regards to N7P and A13 (11 February 2019).

Tue Sep 15 18:24:09 UTC 2020 => drm/amdgpu: add device ID for sienna_cichlid (v2)
June 2018(v1.0 pdk) to September 2020 => 27 months <== N7+
March 2019(v1.0 pdk) to September 2020 => 18 months // to June 2020 => 15 months <== N5
¯\_(ツ)_/¯

If 7nm/7nm+, they are off schedule for 5nm GPUs.
 

Mopetar

Diamond Member
Jan 31, 2011
8,113
6,768
136
Except this time none of what we know about the upcoming consoles would make sense if there weren't some significant improvement in the RDNA arch.

I was mainly talking about some of the very specific examples that get brought up in these threads.

AMD is still making little improvements all the time, and a huge uplift isn't hard to expect for a second-generation card. There's always a lot of low-hanging fruit to pick on newer architectures.

Nvidia got a lot of RT uplift from improving their design instead of just throwing more hardware at the problem. There's no reason to think AMD couldn't develop better implementations of existing tech they already use. Not everything has to be due to some brand-new technologies we've only just uncovered patents for.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
That's the thing. Ever since I spotted Andrei's post I've been mulling it over. It makes sense for static data or ray tracing material; it isn't so useful for constantly switching live data. But even then, if you take that out, the then-rumored 256-bit bus makes even less sense. Why would you, if that was the big Navi, cut it off at the kneecaps? My only suggested reasoning is that RTG was assigned a slew of Zen engineers to help or take over. This could all also be a smoke screen by AMD. Those were heat-sensitive labels on that leak photo we saw. This whole charade is bizarre.

I find it a little bizarre that we've barely heard anything about either of AMD's upcoming products other than what they willingly have said, and yet we're supposed to believe that someone at AMD's overseas development centers leaked a photo of their flagship card that's coming out in a month?

Or those photos Jason from Jayz made a video on... If it's a mockup, it's done in-house. Even the renders officially released are a bit of a stretch.

I haven’t really read into this that much from the GPU side. My interest is almost entirely about how it could be used in Epyc. A lot of places have a vendor lock-in because their entire code base is in CUDA. CPUs can actually be switched much more easily; my work might be getting an Epyc test system with Nvidia GPUs soon, though. Nvidia using Epyc in their DGX A100 systems was, I think, a big wake-up call. Intel just does not have a suitable solution for Nvidia DGX without a lot of compromise. They don’t have PCI-e 4.0, and it probably would have taken 4 Intel Xeons to connect all of the GPUs. Going up to a 4-CPU board is generally avoided; the board size and expense just get out of hand. With Epyc, Nvidia gets very fast access to 16 DDR4 channels with dual processors.

For this cache rumor, it may be a lot more useful for rasterization than people expect. Modern GPUs have switched to essentially tile-based architectures. Nvidia seems to have switched with Maxwell. For AMD it was Vega, with their draw-stream binning rasterizer.


If infinity cache allows them to cache a lot more of the data needed for a bin (essentially a tile), in addition to the probably large on-die caches, then it could increase performance significantly. It may allow them to use larger bins (tiles), which would also be more efficient: less overhead. It would reduce the external memory bandwidth required, so 256-bit GDDR6 may be plenty.
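As a rough illustration of why a big cache could make a 256-bit bus "plenty" (my own toy numbers, not anything AMD has confirmed): every access served from the cache never touches GDDR6, so the bus behaves as if it were wider.

```python
# Toy model: a cache that services `hit_rate` of memory traffic cuts external
# bandwidth demand by that fraction. Numbers are illustrative assumptions.

def effective_bandwidth_gbs(bus_bits, gbps_per_pin, hit_rate):
    raw = bus_bits / 8 * gbps_per_pin   # e.g. 256-bit @ 16 Gbps = 512 GB/s
    return raw / (1.0 - hit_rate)       # bandwidth the bus "appears" to have

for hit in (0.0, 0.25, 0.5, 0.75):
    print(f"hit rate {hit:.0%}: ~{effective_bandwidth_gbs(256, 16, hit):.0f} GB/s effective")
```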

AMD has done a lot of work adding CPU-style virtual memory management to GPUs. This allows much more efficient use of memory, since allocations do not need to be contiguous. They can satisfy an allocation as long as they have enough pages available, and those pages can be mapped anywhere in physical memory. The ability to swap pages out to cpu memory (GPU memory viewed as cache) may be useful if an application uses more gpu memory than available. Most applications are going to try to load everything they *might* need into gpu memory. A lot of that will not actually be used at any given time. In the link above, only about half the allocated memory was used. A page-based system can just allocate a virtual address range and then only use pages when something is actually copied into that memory. If nothing is copied into it, then a real page is never created. If it isn’t accessed, then it can be pushed out to secondary storage if there is more demand for gpu memory than can be satisfied.
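A toy sketch of the page-granular behavior described here (hypothetical code, not AMD's actual MMU): reserving a virtual range costs nothing, and physical pages only materialize on first touch.

```python
# Toy demand-paging model: virtual pages are free until first written.

PAGE = 4096  # assume CPU-style 4 KB pages for illustration

class ToyGpuVm:
    def __init__(self):
        self.physical = {}  # virtual page number -> backing storage

    def alloc(self, size):
        # Reserving address space creates no physical pages.
        return range(0, (size + PAGE - 1) // PAGE)

    def write(self, vpn, data):
        # A physical page is materialized only on first touch.
        self.physical.setdefault(vpn, bytearray(PAGE))[:len(data)] = data

vm = ToyGpuVm()
pages = vm.alloc(64 * 1024 * 1024)   # "allocate" 64 MB of virtual space
vm.write(0, b"texture header")       # touch a single page
print(len(pages), "virtual pages,", len(vm.physical), "physical page(s) used")
```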

It will be interesting to see how this rumored cache would work. I would expect it will cache at some cache-line granularity rather than caching whole pages, but I don’t know how big AMD gpu pages are. CPU memory typically uses 64-byte cache lines and 4KB pages. I have seen cases where the 4KB pages hurt performance significantly, but the newer 2MB pages still seem not to be supported as well as they could be under Linux. I would think that a small cache line would be rather limiting for a gpu, although some types of things may have limited locality. Even for CPUs, a 32-byte line would be getting a bit small: with AVX-256, 32 bytes is just one operand, and FMA takes 3 operands. A larger size may be indicated. When these things move to being on an interposer, they may be transferring 1024 bits (128 bytes) per clock or more, so they may want to move to a larger line.
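Some quick arithmetic on the line-size point (illustrative assumptions only: 32-byte AVX-256 operands and a 1024-bit on-package link moving 128 bytes per clock):

```python
# How many AVX-256 operands a given cache line holds, and how much of one
# wide-link transfer it fills. Purely illustrative numbers.

AVX256_OPERAND_BYTES = 32          # one 256-bit register
LINK_BYTES_PER_CLOCK = 1024 // 8   # assumed 1024-bit interposer link

for line_bytes in (32, 64, 128):
    operands = line_bytes // AVX256_OPERAND_BYTES
    utilization = line_bytes / LINK_BYTES_PER_CLOCK
    print(f"{line_bytes:>3}-byte line: {operands} operand(s), "
          f"{utilization:.0%} of one link transfer")
```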

Given AMD's work on optimizing the memory system, they may be able to make much better use of the cache hierarchy, both on chip and off. They may be using some of the same cache designs or ideas across cpu and gpu caches, so some of that knowledge may come from their cpu designs and cache-coherent infinity architecture. Nvidia wants their own CPU architecture such that they can offer a complete solution like Intel and AMD are offering for some of the next-generation supercomputers, hence the purchase of ARM. Now I am wondering if part of the ARM purchase is to get IP and engineers with virtual memory management experience. I know Nvidia has some features for letting the system automatically manage memory, but I don’t know how that compares to AMD’s virtual memory management. I thought the Nvidia solution was a driver-level thing rather than a hardware-level virtual MMU.

We may not have seen much about this since the cache chip would probably be on the gpu package rather than on the board. You would, at a minimum, need to pull the cooler. With multiple chips in the package, they may have put a lid on to protect the dies or to handle slightly different chip heights. Removing the heat sink may not tell you anything.
 

Gideon

Golden Member
Nov 27, 2007
1,774
4,145
136
Navy Flounder has 40 CUs and a 192-bit bus.

It's that 12 GB GPU from rogame's tweet regarding the VRAM buffers.
Wait, that's Navi 22 (the GPU previously rumored to have 60 CUs) and the chip that should presumably go up against GA104?

If that's the case, I highly doubt Navi 21 doubles the CUs from that.

EDIT:
never mind
  • Sienna Cichlid: 4 SEs x 2 SHs x 10 CUs = 80 total CUs
  • Navy Flounder: 2 SEs x 2 SHs x 10 CUs = 40 total CUs
 
  • Like
Reactions: Tlh97 and Elfear

Devilek

Junior Member
Sep 10, 2020
1
0
6
Seems legit :) -> this Navi 22 (Navy Flounder) will be released next year, right? Because rumors told us something about Navi 21 = 2020, everything else 2021... On the other hand, a 12GB card could be released in 2020, right? (I am confused now...)
 

Viking Warrior

Junior Member
Aug 25, 2020
4
3
41
I came across a diagram of Big Navi showing the die in great detail. I would like opinions on these numbers.

4 shader engines (0-3), 10 WGPs / 20 CUs per engine
80 CUs (in dual-CU pairs), 160 TMUs, 64 ROPs, 5120 ALUs

Ray tracing accelerator built into the texture processor

L0 cache: 16 KB x 160 = 2560 KB
L1 cache: 128 KB x 8 = 1024 KB
L2 cache: 256 KB x 24 = 6144 KB

RAM: 2 x 16-bit x 12 channels
24 GB GDDR6
384-bit bus, 768 GB/s
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Seems legit :) -> this Navi 22 (Navy Flounder) will be released next year, right? Because rumors told us something about Navi 21 = 2020, everything else 2021... On the other hand, a 12GB card could be released in 2020, right? (I am confused now...)

All we know is that Lisa said RDNA2 would launch this year. One could assume it's at minimum Navi20, but we will find out next month.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,522
3,037
136
I came across a diagram of Big Navi showing the die in great detail. I would like opinions on these numbers.

4 shader engines (0-3), 10 WGPs / 20 CUs per engine
80 CUs (in dual-CU pairs), 160 TMUs, 64 ROPs, 5120 ALUs

Ray tracing accelerator built into the texture processor

L0 cache: 16 KB x 160 = 2560 KB
L1 cache: 128 KB x 8 = 1024 KB
L2 cache: 256 KB x 24 = 6144 KB

RAM: 2 x 16-bit x 12 channels
24 GB GDDR6
384-bit bus, 768 GB/s
The number of TMUs and ROPs is the same as the RX 5700 XT, so what you wrote is wrong. Maybe you could provide a link to that diagram so we can have a look.
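For reference, a quick check of that objection, assuming RDNA's usual 4 texture units per CU (true of Navi 10; the diagram could of course deviate):

```python
# TMU sanity check: RDNA ships 4 texture units per CU (Navi 10: 40 CUs -> 160 TMUs).

def expected_tmus(cus, tmus_per_cu=4):
    return cus * tmus_per_cu

print(expected_tmus(40))  # 160 -> matches the RX 5700 XT
print(expected_tmus(80))  # 320 -> what an 80-CU diagram should list, not 160
```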