Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3Qs to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

soresu · Jan 31, 2024

eek2121 said:
Gamers Nexus routinely reruns their benchmarks.

Phoronix:

DaaQ · Feb 1, 2024

Honestly, OCN is pretty dead, but some of the sub forums are quite active. IE the watercooling and gpu sub forums. https://www.overclock.net/threads/official-amd-radeon-rx-7900-xtx-xt-owners-club.1802706/
Long but worthy read. User input is way more valuable than TPU GN re review charts. IMO.

Tigerick · Feb 8, 2024

While we are waiting for RDNA4 specs, how about some speculation on RDNA5 cards, which are supposedly coming out next year? We still don't know how the chiplets are going to be arranged, be it an SED modular design or a single GCX die. We also don't know how AMD is going to rearrange RDNA5's WGP design. But we do know AMD is going to employ GDDR7 as standard. Let's assume AMD uses GCX and each GCX is linked to a base tile with a 128-bit GDDR7 memory bus, as shown below:

The picture above is from AMD's patent for Navi4c; it could be used for RDNA5. What AMD describes as a virtual compute die could be a single GCX (or multiple SEDs) sitting on top of a base IC die with Infinity Cache and GDDR7 memory controllers. I am not sure bridge chip is necessary though.... Anyhow, since I am maintaining specs of Nvidia's upcoming Blackwell, let's put them together and compare the product positioning with estimated pricing:

RDNA5 Lineup	8600 XT ?	8700 ?	7900M	8700 XT ?	8800 XT ?	8900 XT ?	8900 XTX ?
Estimated Price	$299 ?	$399 ?	N5 Node	$499 ?	$699 ?	$899 ?	$999 - $1,099 ?
GPU chiplet	160 mm2 ?	250 mm2 ?	304 mm2	One	Two	Three	Three
Bridge chip	NA	NA	NA	0	1	3	3
Infinity Cache	32 MB ?	48 MB ?	64 MB	32 MB ?	64 MB ?	80 MB ?	96 MB ?

GDDR7	16GB GDDR6	12GB GDDR6	16GB GDDR6	12 GB	16 GB	20 GB	24 GB
Memory Bus	128-bit	192-bit	256-bit	128-bit	256-bit	320-bit	384-bit
Memory BW	320 GB/s	480 GB/s	576 GB/s	512 GB/s	1 TB/s	1.25 TB/s	1.5 TB/s

WGP ?	16	20	36	16	32	40	48
CU (with DI)	32	40	72	32	64	80	96
CU (4 CU per WGP)	NA	NA	NA	64	128	160	192

Blackwell Lineup	RTX 5060	RTX 5060Ti		RTX 5070	RTX 5070 Ti	RTX 5080	RTX 5080 Ti
Estimated Price	$299 ?	$449 ?		$599 ?	$799 ?	$999 ?	$1,199 ?
GDDR6X	16GB GDDR6	12GB GDDR6		12 GB	16 GB	20 GB	24 GB
Memory BW	320 GB/s	480 GB/s		576 GB/s	768 GB/s	960 GB/s	1.15 TB/s
% of RDNA5				> 12.5%	~ 76%	~ 76%	~ 76%

The WGP numbers are pure speculation; we should look at the final CU numbers, which are crucial to rasterization performance. The CU numbers are slightly lower, but compared to the 7900M's numbers they make sense. Why compare to 7900M? Because 8700XT would most likely replace the current 7900M as 8900M. Going from 7900M's 16GB 256-bit GDDR6 to 8900M's 16GB 128-bit GDDR7 provides the transition from GDDR6 to GDDR7 in mobile notebooks.
Even though 8700XT comes with 18.5% more memory bandwidth, it is still slower than 7800XT. Let's see how much performance AMD is able to squeeze out of RDNA5. 8700XT's competitor would be the upcoming RTX5070 with slightly faster 12GB GDDR6X.
7800XT is considered an oddball for NV; that might explain why NV had to replace RTX4070 with RTX4070S, which is only 7% slower than RTX4070Ti. Even though RTX4070S is faster than 7800XT, 7800XT still has a 4GB RAM advantage. Unfortunately, 8800XT would be priced much higher than 7800XT; my estimated price would be $699, $200 extra. The reason is the additional GCX and base tile. But it is still $100 cheaper than RTX5070Ti, the direct competitor of 8800XT; this time there is no memory size difference, only a memory type difference... 😉
8900XT and 8900XTX replace the current 7900XT series; that is simple to understand. OTOH, most people may be confused about why NV launched RTX4080S with only 2% more performance but a $200 lower price. Well, I think I know why: the upcoming RTX5080 with 20GB is going to replace RTX4080S at the $999 price point, which puts 8900XT in direct competition at a $100 cheaper price point.
If my calculations are correct, RDNA5 with GDDR7 has a memory BW advantage over GDDR6X on the Blackwell series. However, there are power and memory bandwidth penalties due to the chiplet design; that's why NV has kept a monolithic design for the moment. Anyhow, we will see NV and AMD fighting directly once again next year.
There are four models listed above; AMD should also be working on the successor of Navi4c, let's call it Navi5c. I am not sure AMD will pull through though: with four GCXs and four base tiles, the power requirements are quite high. We shall see...
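
As a rough sanity check of the bandwidth rows in the table above, here is a minimal Python sketch. It assumes the usual relation bandwidth (GB/s) = bus width (bits) / 8 × per-pin data rate (Gbps), and it takes the rounded table values as input (1 TB/s as 1000 GB/s, and so on). The column pairing and the function names are only for illustration.

```python
def implied_data_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate implied by bandwidth = bus_width / 8 * data_rate."""
    return bandwidth_gb_s * 8 / bus_width_bits


# Values copied from the speculative table above (rounded as posted):
# (RDNA5 bus width in bits, RDNA5 bandwidth GB/s, Blackwell bandwidth GB/s)
columns = {
    "8700 XT vs RTX 5070": (128, 512, 576),
    "8800 XT vs RTX 5070 Ti": (256, 1000, 768),
    "8900 XT vs RTX 5080": (320, 1250, 960),
    "8900 XTX vs RTX 5080 Ti": (384, 1500, 1150),
}


def summarize(cols):
    """Return {pairing: (implied GDDR7 Gbps, Blackwell BW as % of RDNA5 BW)}."""
    rows = {}
    for name, (bus_bits, amd_bw, nv_bw) in cols.items():
        rate = implied_data_rate_gbps(amd_bw, bus_bits)
        pct = round(100 * nv_bw / amd_bw, 1)
        rows[name] = (rate, pct)
    return rows


for name, (rate, pct) in summarize(columns).items():
    print(f"{name}: implied GDDR7 {rate:.2f} Gbps/pin, Blackwell at {pct}% of RDNA5")
```

With the rounded figures this implies roughly 31 to 32 Gbps per pin for GDDR7, and Blackwell bandwidth at about 76 to 77% of RDNA5 for the three upper tiers (112.5% for the first pairing), which lines up with the "% of RDNA5" row.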

Well, that's all for the moment. There are a lot of assumptions above, so let's see what percentage of them I get right 😛. Feel free to disagree, that's the purpose of discussion....

Shmee · Feb 8, 2024

DaaQ said:
Honestly, OCN is pretty dead, but some of the sub forums are quite active. IE the watercooling and gpu sub forums. https://www.overclock.net/threads/official-amd-radeon-rx-7900-xtx-xt-owners-club.1802706/
Long but worthy read. User input is way more valuable than TPU GN re review charts. IMO.

I found in the past OCN to be very helpful with BIOS mods/flashing and the like. I think there was info there that showed me how to mod the BIOS on my X99 for PCIe bifurcation.

darkswordsman17 · Feb 8, 2024

Er, I thought we're supposed to be getting megahammer 999999XTXXXXXTTTTXXXXTTTXXT with MSRP of $3k+?

Why are you labeling RDNA5 as 7xxx and 8xxx series? Wait, were you the person insisting that 7800 is RDNA4? Oh, you are. Yeah, so you've already been shown to have no clue what you're talking about.

Sorry we really don't need this thread mucked up even more than it is with inane pointless speculation based on essentially nothing. Go make the RDNA5/CDNA4 thread if you want to start in on that.

branch_suggestion · Feb 8, 2024

Tigerick said:
I am not sure bridge chip is necessary though...

It is cheaper, more flexible and just plain better than CoWoS, so it really is necessary.

adroc_thurston · Feb 8, 2024

branch_suggestion said:
It is cheaper

No.

branch_suggestion said:
more flexible

Kinda.

branch_suggestion said:
just plain better than CoWoS,

Most definitely.

branch_suggestion · Feb 9, 2024

adroc_thurston said:
No.

Once the package is large enough, unless there are a lot of bridge connections, but that isn't a big deal at the ASP's for such things. Package yields become the main concern.

adroc_thurston said:
Kinda.

I suppose it does depend, the bridge connection is SoIC, no? So that is better than 2.5D routing underneath for raw metrics. Both can do stuff the other cannot so that is the main thing.

adroc_thurston · Feb 9, 2024

branch_suggestion said:
Once the package is large enough, unless there are a lot of bridge connections, but that isn't a big deal at the ASP's for such things. Package yields become the main concern.

Piling up on SoIC means lower packaging yield.

branch_suggestion said:
I suppose it does depend, the bridge connection is SoIC, no? So that is better than 2.5D routing underneath for raw metrics. Both can do stuff the other cannot so that is the main thing.

Better, yes.
More money, also yes.

Tigerick · Feb 9, 2024

I have updated the table with the bridge chips needed. With 3 base die, the amount has increased from 1 to 3...

branch_suggestion said:
It is cheaper, more flexible and just plain better than CoWoS, so it really is necessary.

krawcmac · Feb 10, 2024

Hello guys, please don't hate me but there is a new video from MLID.

The video is on projected performance and launch window for RDNA4 cards. He talks about Navi 48 and 44. MLID gives a very wide window for launch for those cards (Q3.24 - Q1.25). It is stated that Navi 48 can clock to 3-3.3 GHz. So it looks like RDNA4 is fixed RDNA3 with some additional features. One thing I am missing is the projection on RDNA4 power consumption.

soresu · Feb 10, 2024

krawcmac said:
Hello guys, please don't hate me but there is a new video from MLID.

The video is on projected performance and launch window for RDNA4 cards. He talks about Navi 48 and 44. MLID gives a very wide window for launch for those cards (Q3.24 - Q1.25). It is stated that Navi 48 can clock to 3-3.3 GHz. So it looks like RDNA4 is fixed RDNA3 with some additional features. One thing I am missing is the projection on RDNA4 power consumption.

#1. A short text summary is enough to suffice what little information is actually imparted in any video from MLID, RGT or GamerMeld.

There's no need for promoting these channels directly by posting the video itself 😒

#2. Lower end RDNA3 dies are fabbed on N6.

If his information is correct then N48 and N44 are on N4P which is enough to account for the clocking difference.

S'renne · Feb 10, 2024

What about RDNA 3.5, the one rumoured to be used on Strix Halo

DisEnchantment · Feb 10, 2024

krawcmac said:
Hello guys, please don't hate me

Ahh ...

soresu said:
#1. A short text summary is enough to suffice what little information is actually imparted in any video from MLID, RGT or GamerMeld.

- 9 months launch window
- RDNA4 has Matrix ops (LLVM says only GFX940)
- His leaks are more worthy than usual 'Linux driver stuffs' (the changes are actually in LLVM)
Not sure if these "leaks" are the kind that you like to hear 🙂

One interesting thing is that RGT said he heard RDNA5 has register renaming. Which is odd; I have seen that only in a patent. Not sure if he is actively digging through patents. If this register renaming is there, they could be solving lots of stalling issues.

Glo. · Feb 10, 2024

DisEnchantment said:
One interesting thing is that RGT said he heard RDNA5 has register renaming. Which is odd; I have seen that only in a patent. Not sure if he is actively digging through patents. If this register renaming is there, they could be solving lots of stalling issues.

And improve bandwidth(internal) and efficiency of the GPUs with this arch.

Especially this would be interesting for APUs.

SolidQ · Feb 10, 2024

DisEnchantment said:
RDNA4 has Matrix ops

It will be interesting to see whether RDNA4 introduces FSR AI.
I'd assume AMD will ignore this, because they are an Open Source company, and not all hardware supports AI yet. Even Intel still hasn't made XeSS open source.

soresu · Feb 12, 2024

CUDA -> HIP/AMD finally with minimum to zero work from ZLUDA....

AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source - Phoronix

www.phoronix.com

Ajay · Feb 12, 2024

soresu said:
CUDA -> HIP/AMD finally with minimum to zero work from ZLUDA....

AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source - Phoronix

www.phoronix.com

Pretty good move, IMHO. At least since ROCm is having a tough time catching on (does AMD fund a University program like NV used to?). If not, they should.

moinmoin · Feb 12, 2024

soresu said:
CUDA -> HIP/AMD finally with minimum to zero work from ZLUDA....

AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source - Phoronix

www.phoronix.com

Yay, more open source!

Kinda bad news:

It was open sourced because AMD stopped funding it.
It used to only support Intel GPUs, now it only supports AMD GPUs.

It being open source I hope it will still see wide usage which ideally helps making CUDA projects running on non-Nvidia GPUs a more common expectation.

soresu · Feb 12, 2024

moinmoin said:
Yay, more open source!

Kinda bad news:

It was open sourced because AMD stopped funding it.

It used to only support Intel GPUs, now it only supports AMD GPUs.

It being open source I hope it will still see wide usage which ideally helps making CUDA projects running on non-Nvidia GPUs a more common expectation.

Indeed.

I was initially hopeful about hipSYCL but now it seems like they are just fooling around with names while ZLUDA seems to be making dreams come true.

If this gets even a little bit of GPU rendering access with Arnold it's a win for me.

The fact that AMD and Intel in one way or another chose to cut funding or not fund to begin with makes me wonder what is going on.

Perhaps the coder is just difficult to work with.

Either way it's already doing better than the current HIP backend for Blender Cycles, so I'd rather see AMD's money going toward ZLUDA rather than Blender given it could allow commercial GPU renderers yet lacking HIP backends to run on it.

Aapje · Feb 13, 2024

What we really need is an industry-standard like in gaming, where all cards support DirectX, so the better implementation of the same API wins.

I think that the cloud companies are the most likely to force that onto the market, as they want to offer GPU-computing to the market, but don't want to be forced into choosing one vendor. They are even designing their own chips, which allows them to set a standard. So I can see Google/Amazon/etc coming up with a shared API and then demanding that Nvidia, AMD & Intel support it for them to even be in the running to deliver chips.

PJVol · Feb 14, 2024

branch_suggestion said:
It is cheaper, more flexible and just plain better than CoWoS, so it really is necessary.

Oh, not this again!
In all honesty, I'd like to see AMD to throw out all this bridge / interconnect / whatever **** from gaming segment and pack what's left in bga, leaving "innovations" for their datacenter moneybags.

carrotmania · Feb 14, 2024

PJVol said:
I'd like to see AMD to throw out all this bridge ... from gaming segment

You might get your wish with Zen6, according to adroc, now that they have money to make a server and a client part.

PJVol · Feb 14, 2024

carrotmania said:
You might get your wish with Zen6, according to adroc, now that they have money to make a server and a client part.

Actually I was talking about GPU, so unlikely. And afaik desktop Zen 6 was originally designed as MCM, which is welcomed

SteinFG · Feb 17, 2024

Some people are guessing, so I think why not post my expectations. I'm considering recent (unreliable) rumors: 64 CU Navi 48, 7900 XT level of performance; 32 CU Navi 44.

	VRAM	Cores	Memory bus	TDP	Comparable to	Price
RX 8800 XT	16GB	64 CU	256bit, 20Gbps	260W	~7900 XT	$530
RX 8700 XT	16GB	56 CU	256bit, 18Gbps	230W	~6950 XT	$470
RX 8600 XT	16GB	32 CU	128bit, 20Gbps	160W	~4060 Ti 16G	$330
RX 8500 XT	8GB	28 CU	128bit, 18Gbps	130W	~3060 Ti	$250

I expect 8700 XT to have 16GB (7700XT with 12GB is a fail). I also think there's no demand for an 8GB card that's more powerful than 3060 Ti, that's why there's no 8GB 32 CU chip on this table. RX 7600 should slip to $200-$220 and RX 6600 will get discontinued.
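
The table above lists bus width and memory speed but not the resulting bandwidth. A minimal Python sketch to fill that in, assuming the standard relation bandwidth (GB/s) = bus width (bits) / 8 × data rate (Gbps); the card names and specs are taken from the table, the function name is just illustrative:

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, memory speed in Gbps) as guessed in the table above
cards = {
    "RX 8800 XT": (256, 20),
    "RX 8700 XT": (256, 18),
    "RX 8600 XT": (128, 20),
    "RX 8500 XT": (128, 18),
}

bandwidths = {name: bandwidth_gb_s(*spec) for name, spec in cards.items()}

for name, bw in bandwidths.items():
    print(f"{name}: {bw:.0f} GB/s")
```

That works out to 640, 576, 320 and 288 GB/s respectively for these guessed configurations.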
