Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,720
136

With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before the code gets merged to GitHub), but I am not going to link AMD employees.
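For anyone wondering what that enablement actually buys, here is a minimal sketch: a trivial HIP kernel plus the compile line you would use once the toolchain knows about the target. The kernel and the exact invocation are just my illustration, assuming a ROCm/clang build that already lists gfx940 as a processor; nothing here is lifted from the patches.

```cpp
// saxpy.hip -- trivial HIP kernel; the interesting part is only that the
// compiler can now emit code for the gfx940 target at all.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr, *y = nullptr;
    hipMalloc((void**)&x, n * sizeof(float));
    hipMalloc((void**)&y, n * sizeof(float));
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    hipDeviceSynchronize();
    printf("kernel launched\n");
    hipFree(x);
    hipFree(y);
    return 0;
}
// Build, assuming the installed toolchain already knows the target:
//   hipcc --offload-arch=gfx940 saxpy.hip -o saxpy
```

The point is only that once the target lands in LLVM, ordinary HIP code can be compiled for the part long before anyone can actually buy one.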

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5.0 in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again I believe MI300 could launch before it :grimacing:

This is nuts, MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 
Last edited:

soresu

Platinum Member
Dec 19, 2014
2,612
1,810
136
This is nuts, MI100/200/300 cadence is impressive.
I'd say that a concrete plan for at least this much was worked out back when the CDNA/RDNA split was agreed to be necessary.

It will be interesting to see, though, whether CDNA continues to hold to its GFX9/Vega roots or whether it will see something more revolutionary going forward with CDNA4+, assuming that GFX940 is not just a red-herring name for CDNA3, that is.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Isn't the 300 just a doubled-up 200? Probably with extras: stacked cache, PCIe 5, HBM3, etc. Hopper looks like it took back the crown for FP16 AI but will be well behind on FP64, not that it's especially going for that stat.

Any major architecture shift will probably have to wait until gen 4, which one would expect on TSMC 4x. Though it is good to see AMD embrace the chiplet rapid-iteration philosophy here as well, alternating which portions of a chip get updated in order to iterate faster. I wonder what consumer GPU versions of this will be, as RDNA3 definitely seems to be multi-die if nothing else. We could see a GDDR7 version of RDNA3 come out as early as next year, for one example.
 

soresu

Platinum Member
Dec 19, 2014
2,612
1,810
136
Hopper looks like it took back the crown for FP16 AI but will be well behind on FP64, not that it's especially going for that stat.
I'd wager either they did not think FP64 important, or they did not think AMD could catch up quickly enough to pass them, and now that they have, it takes time to turn around a new µArch to focus on that specifically.

The question for me is why did AMD think it was important?

The main possibilities that come to mind are supercomputer contracts and focusing on something they knew nVidia was weaker in, a niche left open that gave AMD space to wedge its own products into the market.
 
  • Like
Reactions: Kaluan

soresu

Platinum Member
Dec 19, 2014
2,612
1,810
136
I wonder what consumer GPU versions of this will be, as RDNA3 definitely seems to be multi-die if nothing else
Rumours frequently point to a doubling of ALUs/CUs per WGP in RDNA3.

So while the 20 WGP N22 has 40 CUs, the 20 WGP N33 would have 80 CUs.
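A quick back-of-the-envelope sketch of what that doubling implies; the 2.5 GHz clock below is purely my own placeholder, and whether the extra ALUs can actually be kept fed is a separate question:

```cpp
#include <cstdio>

// Rough RDNA shader math: each CU has 64 stream processors, and peak FP32
// is shaders * 2 (an FMA counts as two ops) * clock. The config below is
// the rumoured N33 layout, nothing confirmed.
int main() {
    const int wgps         = 20;   // same WGP count as N22
    const int cus_per_wgp  = 4;    // rumoured doubling (RDNA2 has 2 CUs per WGP)
    const int sp_per_cu    = 64;
    const double clock_ghz = 2.5;  // placeholder clock, my assumption only

    const int cus       = wgps * cus_per_wgp;   // 80
    const int shaders   = cus * sp_per_cu;      // 5120
    const double tflops = shaders * 2.0 * clock_ghz / 1000.0;

    printf("%d CUs, %d shaders, ~%.1f FP32 TFLOPS at %.1f GHz\n",
           cus, shaders, tflops, clock_ghz);
    return 0;
}
```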

It's also supposed to bring a serious whopper of an Infinity Cache upgrade - with N33 rumoured at 256 MB.

I'll believe it when I see it, but it would certainly be an impressive change if so even beyond the chiplet use in the N31 and N32 SKUs.
 

Aapje

Golden Member
Mar 21, 2022
1,310
1,771
106
It's also supposed to bring a serious whopper of an Infinity Cache upgrade - with N33 rumoured at 256 MB.

AMD's FSR 2 presentation says that their upscaling can benefit from a lot of cache, so this could actually be an extremely attractive proposition. If the low end can upscale very well, you can still game on 4K/1440p screens with relatively cheap cards, with just a little extra latency and slightly worse quality.
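To put rough numbers on that: FSR 2's published quality modes scale each axis by a fixed factor, so a 4K output is rendered internally at a far lower resolution. A quick sketch using the per-axis scale factors AMD documents for FSR 2:

```cpp
#include <cstdio>

// FSR 2 per-axis scale factors: Quality 1.5x, Balanced 1.7x,
// Performance 2.0x, Ultra Performance 3.0x.
struct Mode { const char* name; double scale; };

int main() {
    const int out_w = 3840, out_h = 2160;  // 4K output
    const Mode modes[] = {
        {"Quality",           1.5},
        {"Balanced",          1.7},
        {"Performance",       2.0},
        {"Ultra Performance", 3.0},
    };
    for (const Mode& m : modes) {
        int rw = static_cast<int>(out_w / m.scale);
        int rh = static_cast<int>(out_h / m.scale);
        double pct = 100.0 * (rw * (double)rh) / (out_w * (double)out_h);
        printf("%-18s renders roughly %4dx%-4d (%.0f%% of the 4K pixel count)\n",
               m.name, rw, rh, pct);
    }
    return 0;
}
```

Quality mode at 4K works out to roughly 1440p's pixel count, which is exactly why a big Infinity Cache plus good upscaling could make cheap cards viable on 4K screens.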

In general, upscaling seems to be the future for those who don't get the god-like cards.
 
  • Like
Reactions: Tlh97 and Elfear

Aapje

Golden Member
Mar 21, 2022
1,310
1,771
106
Actually, I don't see upscaling as becoming significant just in the PC market, but also in the console market. Huge TVs, often 4K and 120 Hz, are very popular. The PS5 can just about do 4K at 60 Hz, although games seem to have to lower the quality to such an extent that many people complain 4K doesn't look any better than 1080p, so it can't take full advantage of this hardware as it is.

At CES 2021, lots of 8K TVs were being shown, so that seems to be the future and will pose an even bigger challenge for the hardware.

So upscaling could be a big boon to consoles. I expect that we'll see FSR 2 implemented on the PS5 and Xbox before too long and that future console games will take advantage of it.

Of course, both the PS5 and Xbox use RDNA2, so FSR 2 should be designed to work well enough with that, but I expect RDNA3 to have hardware designed specifically to make upscaling work well. The PS4 Pro got a GPU upgrade anyway, so we might see a PS5 Pro based on RDNA3 in two years or so.
 
  • Like
Reactions: Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
Upscaling will probably be a bigger deal for consoles. 4K TVs are cheap, but 4K monitors are expensive. Consoles also fall behind from the moment of release and by the end of the cycle are basically a 6-year old midrange gaming PC at best.

People who tend to buy the expensive 4K or high refresh rate 1440p monitors will also buy expensive GPUs. At most technology like FSR and DLSS just offer more options for settings as those cards age out.

Frankly, 1440p isn't even necessary. There are a lot of great 1080p displays and most people with midrange or low end cards are still using that resolution. Turning up the quality settings at 1080p often produces better visuals than running lower settings to hit acceptable frame rates at higher resolutions.
 

Aapje

Golden Member
Mar 21, 2022
1,310
1,771
106
People who tend to buy the expensive 4K or high refresh rate 1440p monitors will also buy expensive GPUs. At most technology like FSR and DLSS just offer more options for settings as those cards age out.

Frankly, 1440p isn't even necessary. There are a lot of great 1080p displays and most people with midrange or low end cards are still using that resolution. Turning up the quality settings at 1080p often produces better visuals than running lower settings to hit acceptable frame rates at higher resolutions.

You can get very cheap 4K 60 Hz monitors now and they will only get cheaper. The $200 next-gen video cards are going to struggle with that.

You are also assuming a high level of competence. How many people simply buy a monitor that looks nice or has the bigger number and only then discover that their video card is too weak?
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Upscaling is definitely in vogue everywhere, even if there are image-quality tradeoffs. It's been tried for years at most major game publishers, but issues with image quality and needing to tweak settings a lot for every last title have kept its usefulness limited. Which is why things like FSR 2 are pretty useful all around: open source and apparently good quality.

Even if you've got a giant card it's useful. The programmer in charge of UE's upscaling wants to add a mode where, even if you're running at native res, you still temporally upscale 4x before downsampling, as that would improve image quality a lot. It is probably aimed at their VFX customers running virtual production, but it would be useful if you have some god-tier card as well.
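Conceptually that mode is supersampling on the cheap: keep the temporal history at, say, twice the native resolution per axis and box-filter it back down for display. A very rough sketch of just the resolve step, with the 2x factor and the buffer layout being my own simplification rather than anything from UE:

```cpp
#include <vector>
#include <cstddef>

// history: accumulation buffer at native_w*2 by native_h*2 (RGB floats),
// built up over several jittered frames. output: native-res image.
// Each native pixel is the average of its 2x2 block in the history buffer,
// i.e. an ordinary box downsample -- the "free" supersampling the quote
// is talking about.
void resolve_downsample(const std::vector<float>& history,
                        std::vector<float>& output,
                        std::size_t native_w, std::size_t native_h) {
    const std::size_t hw = native_w * 2;  // history buffer width
    for (std::size_t y = 0; y < native_h; ++y) {
        for (std::size_t x = 0; x < native_w; ++x) {
            for (int c = 0; c < 3; ++c) {
                float sum = 0.0f;
                for (int dy = 0; dy < 2; ++dy)
                    for (int dx = 0; dx < 2; ++dx)
                        sum += history[((y * 2 + dy) * hw + (x * 2 + dx)) * 3 + c];
                output[(y * native_w + x) * 3 + c] = sum * 0.25f;
            }
        }
    }
}
```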
 

soresu

Platinum Member
Dec 19, 2014
2,612
1,810
136
Upscaling will probably be a bigger deal for consoles. 4K TVs are cheap, but 4K monitors are expensive. Consoles also fall behind from the moment of release and by the end of the cycle are basically a 6-year old midrange gaming PC at best.
I've been using a TV as my PC display for about 7 and a half years now - monitors just don't provide the value proposition they once did.
 

Leeea

Diamond Member
Apr 3, 2020
3,599
5,340
106
I've been using a TV as my PC display for about 7 and a half years now - monitors just don't provide the value proposition they once did.
This.

My cheap TV, purchased at Walmart for less than what I paid for my monitor, provides superior motion, superior color, superior response time, and superior HDR.

I especially noticed it while playing Elden Ring recently, which I did on both screens.

The Vizio E65 TV is 4K at 60 Hz, and the Samsung C32HG70 is 1440p at 144 Hz. Outside of being too large, the cheapo-special Vizio TV is just superior in every way. Its rated refresh rate might be lower, but in reality its response time is 100x faster*.



edit one day later:
*I did some more testing, and it seems the Samsung only really lames out in FreeSync HDR mode. Flipping it to non-FreeSync SDR fixed the response times. Still very disappointed. In HDR mode I could literally see the shimmer/blur effect of the pixels changing in faster-moving scenes. The TV being able to do the same scenes with HDR on just makes it superior.
 
Last edited:

Frenetic Pony

Senior member
May 1, 2012
218
179
116
OK, I've not been thinking. Of course they could save even more time, money, etc. by just making one single compute chiplet, then pairing them up as needed. All the binning, all the time, forever.

If that "double the CU count per" rumor is true, just a re-arrangement of work distribution, then 40CU per chiplet is perfectly doable. And of course they could have 4 chiplets for the max variation, they're apparently doing that with CDNA3 anyway. You'd then get:

Navi 31:
160 CU / 384-bit bus / 24 Gbps GDDR6 / 384 MB cache / 2.6 GHz? / 500+ W / AIO cooler / $2k+ / 48 GB card. "World's fastest" etc. etc.
160 CU / 384-bit bus / 24 Gbps GDDR6 / 256 MB cache / 2.4 GHz / 450 W / 3.5-slot air cooler / $1600 / 24 GB
144 CU / 384-bit bus / 20 Gbps GDDR6 / 256 MB cache / 2.2 GHz / 400 W / 3-slot air cooler / $1200 / 12 GB
120 CU / 320-bit bus / 18 Gbps GDDR6 / 192 MB cache / 2.2 GHz / 350-400 W / 3-slot air / $1000 / 10 GB (20?)

Navi 32:
80 CU / 256-bit bus / 24 Gbps GDDR6 / 192 MB cache / 2.8 GHz / 350 W (PCIe 4) / 3-slot air / $700 / 16 GB
72 CU / 256-bit bus / 20 Gbps GDDR6 / 192 MB cache / 2.6 GHz / 300 W / 2.5-slot air / $500 / 16 GB
60 CU / 192-bit bus / 20 Gbps GDDR6 / 128 MB cache / 2.4 GHz / 200 W / 2-slot air / $400 / 12 GB
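For what it's worth, the bandwidth those bus/GDDR6 combinations imply is simple arithmetic (these are just my speculated configs from above, nothing official):

```cpp
#include <cstdio>

// GDDR6 bandwidth: bus_width_bits / 8 * per-pin data rate in Gbps = GB/s.
double gddr6_gbs(int bus_bits, double gbps) {
    return bus_bits / 8.0 * gbps;
}

int main() {
    printf("384-bit @ 24 Gbps: %4.0f GB/s\n", gddr6_gbs(384, 24));  // 1152
    printf("384-bit @ 20 Gbps: %4.0f GB/s\n", gddr6_gbs(384, 20));  //  960
    printf("320-bit @ 18 Gbps: %4.0f GB/s\n", gddr6_gbs(320, 18));  //  720
    printf("256-bit @ 24 Gbps: %4.0f GB/s\n", gddr6_gbs(256, 24));  //  768
    printf("192-bit @ 20 Gbps: %4.0f GB/s\n", gddr6_gbs(192, 20));  //  480
    return 0;
}
```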
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,720
136
It seems that for 2023 AMD still has enough ammunition to get plenty of PR attention until the big year, 2024. Actually, for me 2023 looks like the more exciting cadence from a tech perspective, though obviously not as a customer (I might consider TR7000, but with Zen 5 closely following, I'm not sure). Bergamo/MI300/Genoa-X/Siena/Raphael X3D/TR7000/XDNA

I am wondering if CXL 2.0 applies to MI300, since it carries IF 4.0.
It would be awesome if GPUs/APUs started pooling memory via CXL so that, when one node goes down, the others can continue.
It would also make for some interesting use cases, like having gigantic ML models fully resident on slow memory pools in CXL-backed memory.
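If such a pool shows up the way current CXL.mem expanders do on Linux, i.e. as a CPU-less NUMA node, then parking a huge model there is mostly an allocation-policy decision. A minimal sketch assuming libnuma, and assuming the CXL pool happens to be node 2 (both of those are my assumptions):

```cpp
#include <numa.h>     // libnuma; link with -lnuma
#include <cstdio>
#include <cstddef>

int main() {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }
    const int cxl_node = 2;                 // assumption: CXL pool exposed as node 2
    const size_t model_bytes = 64UL << 30;  // e.g. a 64 GiB model

    // Bind the allocation to the slow-but-huge CXL-backed node; hot data
    // could still live in local DRAM/HBM.
    void* weights = numa_alloc_onnode(model_bytes, cxl_node);
    if (!weights) {
        fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return 1;
    }
    // ... load/stream the model through this region ...
    numa_free(weights, model_bytes);
    return 0;
}
```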


I also found a patent on the use of a MALL-type cache in CPUs, which could allow CPUs to have insane BW backed by Infinity Cache, as shown in the MI300 slides.
20220318151 - METHOD AND APPARATUS FOR A DRAM CACHE TAG PREFETCHER

So many interesting things to discover.
  • Will MI300 use Zen 4c or standard Zen 4?
  • 2.5D packaging using EFB or InFO (a la RDNA3)?
  • 3D packaging with SoIC, stacking logic on SRAM (vs. stacking SRAM on logic; granted, they are still in a flip-chip arrangement)
    • This could be a trial for @Hans de Vries' hypothesis of stacking cores on L3. It is coming.

It's a small likelihood, but I will not be very surprised if MI300 is actually available before Xeon Max, making it the first x86 CPU with HBM2e/3 you can buy. As a bonus you get a GPU with it ;)
I am curious whether AMD will use the MI300 GPU to beat AMX for PR purposes; both need software enablement to be put to use effectively (granted, less so for AMX).
 

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
Any one want to take a stab at what they think the layout looks like? I am struggling to find a spot for that 9th die...
View attachment 73944
I have to admit that I am completely lost trying to make sense of all of this:
  • I had expected Bergamo. But 24 cores sounds more like standard Zen4. But 3 CCD for 4 base tiles?
  • Connecting the compute chiplets to the base chiplets might be done by some sort of SoIC or InFO. How is it possible to connect a Zen4 CCD to this when the bump pitch is much larger?
  • It is safe to assume that they will need three bridges to connect the base dies to each other. Do they count towards the 9 5nm dies?
We need a much better photograph of this chip in order to get some clues.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
I have to admit that I am completely lost trying to make sense of all of this:
  • I had expected Bergamo. But 24 cores sounds more like standard Zen4. But 3 CCD for 4 base tiles?
  • Connecting the compute chiplets to the base chiplets might be done by some sort of SoIC or InFO. How is it possible to connect a Zen4 CCD to this when the bump pitch is much larger?
  • It is safe to assume that they will need three bridges to connect the base dies to each other. Do they count towards the 9 5nm dies?
We need a much better photograph of this chip in order to get some clues.
I don't think any bridge dies, should they exist, count towards the 9 5nm dies.

The CES video is pretty low resolution, but if you go frame by frame, you can obtain some additional details about the layout. Specifically, the compute portion of the package appears to have a split down the middle in both directions, as evidenced by the edge of those dies having a slight shimmer when the light hits it just right. I'll see if I can provide images at those exact moments.

Here you can see that the compute portion is bisected down the middle, at least that's true for the bottom of the package:
Screenshot_2023-01-05-00-32-02-38_f9ee0578fe1cc94de7482bd41accb329~2.jpg

Here you can see that the bisection also runs up to the middle of the package:
Screenshot_2023-01-05-00-32-08-87_f9ee0578fe1cc94de7482bd41accb329~3.jpg

You can just barely make out that the package is bisected horizontally as well. There's just a glimmer of glare as the light catches the edge where the package is split horizontally:
Screenshot_2023-01-05-00-33-49-57_f9ee0578fe1cc94de7482bd41accb329~2.jpg
 
Last edited:

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
So apart from the missing 9th chiplet this still seems to be the most plausible one:
2022-04-28_9-16-17-low_res-scale-2_00x-Custom.png

Maybe the compute dies house GPU IP as well as 6 Zen 4 cores each - if you look closely, you can make out 6 small structures in these chiplets. Maybe they even share the L3 with the GPU - this could be the 2 bigger blocks above the CPU clusters.
 
  • Like
Reactions: Vattila