Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

TESKATLIPOKA · Sep 25, 2022

I am interested in how much better RT performance will get.

Spec	N23 (RX 6650 XT)	N33	Difference
WGP (old CU)	16 (32)	16	0%
Shaders	2048	4096	+100%
Boost clock	2635 MHz	~3.6 GHz ?	+37%
FP32 throughput (TFLOPS)	10.793	29.491	+173%
Ray Accelerators	32	?	?

The much increased processing power should help a lot with RT performance. What I am interested in is whether the Ray Accelerator in RDNA3 will be more capable or whether there will be more of them per WGP.
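For anyone who wants to check the TFLOPS row, here is a minimal sketch of the arithmetic. It assumes the usual peak-rate convention of one FMA (2 FLOPs) per shader per clock, and it takes the rumoured ~3.6 GHz N33 clock at face value. The function name is just for illustration.

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12


# N23 (RX 6650 XT) figures from the table; N33 clock is the rumoured value.
n23 = fp32_tflops(2048, 2635)
n33 = fp32_tflops(4096, 3600)
gain = (n33 / n23 - 1) * 100

print(f"N23: {n23:.3f} TFLOPS, N33: {n33:.3f} TFLOPS, gain: {gain:+.0f}%")
```

This reproduces the 10.793 and 29.491 TFLOPS figures and the +173% difference. It is a paper peak only and says nothing about how well the extra ALUs get utilized.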

Link

RDNA 2 introduces a new Ray Accelerator – one for each Compute Unit.

The Ray Accelerator is a fixed-function ray tracing acceleration engine to deliver real-time lighting, shadow and reflection realism through DirectX Raytracing (DXR).

It will calculate the intersections of the rays with the scene geometry as represented in a Bounding Volume Hierarchy, sort them, and return the information to the shaders for further scene traversal or result shading.

Each Ray Accelerator can calculate up to 4 ray/box intersections or 1 ray/triangle intersection per clock cycle.
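To put those per-clock rates into absolute numbers, here is a rough sketch for N23. It assumes all 32 Ray Accelerators run at the 2635 MHz boost clock and are never stalled, so these are theoretical ceilings, not measured rates. The function name is made up for the example.

```python
def peak_intersections_per_sec(ray_accelerators: int, clock_ghz: float,
                               tests_per_clock: int) -> float:
    """Theoretical peak intersection tests per second across all Ray Accelerators."""
    return ray_accelerators * tests_per_clock * clock_ghz * 1e9


# N23: 32 Ray Accelerators, 2.635 GHz boost clock (assumed sustained).
box = peak_intersections_per_sec(32, 2.635, 4)  # ray/box tests
tri = peak_intersections_per_sec(32, 2.635, 1)  # ray/triangle tests

print(f"ray/box: {box / 1e9:.2f} G/s, ray/triangle: {tri / 1e9:.2f} G/s")
```

So either doubling the units per WGP or making each unit do more per clock would raise this ceiling, which is exactly the open question for RDNA3.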

Timorous · Sep 25, 2022

TESKATLIPOKA said:
I am interested in how much better RT performance will get.

Spec	N23 (RX 6650 XT)	N33	Difference
WGP (old CU)	16 (32)	16	0%
Shaders	2048	4096	+100%
Boost clock	2635 MHz	~3.6 GHz ?	+37%
FP32 throughput (TFLOPS)	10.793	29.491	+173%
Ray Accelerators	32	?	?

The much increased processing power should help a lot with RT performance. What I am interested in is whether the Ray Accelerator in RDNA3 will be more capable or whether there will be more of them per WGP.

Link

I believe there are twice as many per WGP as in RDNA2, but I have no idea where I read or heard that, so I can't link a source. I may also be misremembering something.

KompuKare · Sep 25, 2022

Kaluan said:
So you're dissing his insider math data based on just your outsider presumptions? Ok then.

My first time reading a SemiAnalysis article and while I was there I read the one about Samsung's foundry woes.

I'm sure they have lots of industry and insider knowledge but in that (unrelated) article about Micron racing ahead with just DUV while Hynix also have DRAM issues due to using some EUV steps... Well that's great for now but long-term, where and when are Micron getting their commercial EUV experience?
Okay, just my amateur observation and the article is not related to their next gen GPU costs one, but it wouldn't be the first time some "analyst" has missed an important detail.
Point being, they are not infallible.

scineram · Sep 25, 2022

Glo. said:
So, solely based on the sheer throughput increase we should see not only the higher utilization, but higher throughput of those ALUs/Shaders.

2.5 Times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.

I think it(full fat N31) will be faster than 4090. But how much more performance is there?

Not much more if the texture mapping or rasterization chokes on all that bandwidth. How about those?

TESKATLIPOKA · Sep 25, 2022

scineram said:
Not much more if the texture mapping or rasterization chokes on all that bandwidth. How about those?

N33 would be the most interesting one to find this out. But that one will come out last.

GodisanAtheist · Sep 25, 2022

Will be interesting to see how AMD and NV's approach to bandwidth plays out.

NV went for a beefier L2 and is probably looking at GDDR7 or faster GDDR6X, while AMD has stacked V-Cache in the wings.

Some interesting divergence in solving the bandwidth problem after so many years of just "make bus bigger"

TESKATLIPOKA · Sep 25, 2022

I am starting to miss the good old days with 512bit bus width.

igor_kavinski · Sep 25, 2022

With how much Nvidia is charging for the 4090, it really should have a 2048 bit bus.

DiogoDX · Sep 25, 2022

HBM solves the bandwidth problem but seems like the cost is still too high for the game market.

coercitiv · Sep 25, 2022

igor_kavinski said:
With how much Nvidia is charging for the 4090, it really should have a 2048 bit bus.

Careful what you wish for...

Panino Manino · Sep 25, 2022

DiogoDX said:
HBM solves the bandwidth problem but seems like the cost is still too high for the game market.

For how much it costs, it should come with 4096bit already.

RnR_au · Sep 25, 2022

Panino Manino said:
For how much it costs, it should come with 4096bit already.

The R9 Nano back in 2015 had a 4096bit bus.
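Bus width alone doesn't tell you the bandwidth, though. A quick sketch of bandwidth = bus width / 8 x per-pin data rate shows why. The per-pin rates below (1 Gbps for the Nano's HBM, 21 Gbps for the 4090's GDDR6X on its 384-bit bus) are not from this thread; they are taken from the published specs and should be treated as assumptions here.

```python
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin


# (bus width in bits, per-pin data rate in Gbps); rates assumed from public specs.
cards = {
    "R9 Nano (HBM, 4096-bit)": (4096, 1.0),
    "RTX 4090 (GDDR6X, 384-bit)": (384, 21.0),
}

results = {name: bandwidth_gbs(bits, rate) for name, (bits, rate) in cards.items()}
for name, gbs in results.items():
    print(f"{name}: {gbs:.0f} GB/s")
```

Under those assumptions the 384-bit card ends up with roughly twice the bandwidth of the 4096-bit one, because the per-pin rate went up far more than the bus shrank.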

Timorous · Sep 26, 2022

Mopetar said:
Unless AMD and NVidia just postpone their low end GPUs they'll have to compete with the flood of used previous generation GPUs that were locked up in mining rigs until now. Even the sale of new high end 4080s and 7900s will put cheap 3080s and 6900s into the channel.

It doesn't matter what AMD and NVidia want to charge, it's what the market will let them.

If N33 does have 6900XT level 1080p performance and better RT performance then at a $400-$500 asking price it is a compelling option despite the 2nd hand flood coming. Especially since it will have much lower power draw than the 6900XTs.

Also if the Angstronomics die size is accurate then at $400 for N33 AMD are increasing the margin quite a bit vs N23 because it is A) smaller and B) on a cheaper node so getting it out ASAP might be a good strategy.

If they release it next year a good chunk of their target audience may have already purchased a 2nd hand part and won't be looking to upgrade.

jpiniero · Sep 26, 2022

Timorous said:
If N33 does have 6900XT level 1080p performance and better RT performance

I was looking at that and ~6900 XT looks unrealistic. I think the fanboiz are extrapolating that since the 2x shaders, it should be 2x performance of the 6650 XT. As you see with Ada it's not so simple. Especially when it almost has to be basically the same transistor count as N23 if you factor in that it's 15% smaller. You gut the fp64, you gut the lanes to 4. But there's no increased L3 and the memory bandwidth is probably only the 11% more you get for 20 instead of 18. I don't see how they can also fit in 2x ROPs and TMUs.

Timorous · Sep 26, 2022

jpiniero said:
I was looking at that and ~6900 XT looks unrealistic. I think the fanboiz are extrapolating that since the 2x shaders, it should be 2x performance of the 6650 XT. As you see with Ada it's not so simple. Especially when it almost has to be basically the same transistor count as N23 if you factor in that it's 15% smaller. You gut the fp64, you gut the lanes to 4. But there's no increased L3 and the memory bandwidth is probably only the 11% more you get for 20 instead of 18. I don't see how they can also fit in 2x ROPs and TMUs.

It is 2x the shaders and ~50% more clock speed, so even with the same ROPs and TMUs their throughput increases by 50% just through clock speed. Bandwidth will be the limitation, but I expect it to be okay at 1080p, especially with faster Infinity Cache. At 1440p it will hurt a bit, and at 4K it will fall down.

So ultimately 1.8x the 6650XT is probably doable, and that will make it more CPU-bound at 1080p and line it up with the 6900XT.
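A rough sketch of where that 1.8x sits, assuming peak throughput of a block scales as (unit count ratio) x (clock ratio). The 2x shader and 1.5x clock ratios are the speculative figures from this post, and the helper name is only for illustration.

```python
def throughput_scale(unit_ratio: float, clock_ratio: float) -> float:
    """Peak throughput scaling: more units times higher clock."""
    return unit_ratio * clock_ratio


clock_ratio = 1.5  # speculative ~50% clock increase

alu = throughput_scale(2.0, clock_ratio)      # 2x shaders
rop_tmu = throughput_scale(1.0, clock_ratio)  # same ROP/TMU count

estimate = 1.8  # the overall speedup guessed above
print(f"ALU: {alu:.1f}x, ROP/TMU: {rop_tmu:.1f}x, estimate: {estimate:.1f}x")
```

The 1.8x guess lands between the 1.5x fixed-function scaling and the 3.0x ALU scaling, so it implicitly assumes the shaders are the bottleneck some of the time but not all of it.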

Saylick · Sep 26, 2022

Timorous said:
It is 2x the shaders and ~50% more clock speed, so even with the same ROPs and TMUs their throughput increases by 50% just through clock speed. Bandwidth will be the limitation, but I expect it to be okay at 1080p, especially with faster Infinity Cache. At 1440p it will hurt a bit, and at 4K it will fall down.

So ultimately 1.8x the 6650XT is probably doable, and that will make it more CPU-bound at 1080p and line it up with the 6900XT.

6900XT or even 6800XT performance at 1080p in a mobile chip is pretty dang good already, and it's supposed to be within 160W. Considering a 6800XT can already do >120fps at 1080p in a lot of titles, I think people should just underclock/undervolt and try to get to 65W while over 100fps.

TESKATLIPOKA · Sep 26, 2022

Timorous said:
It is 2x the shaders and ~50% more clock speed, so even with the same ROPs and TMUs their throughput increases by 50% just through clock speed. Bandwidth will be the limitation, but I expect it to be okay at 1080p, especially with faster Infinity Cache. At 1440p it will hurt a bit, and at 4K it will fall down.

So ultimately 1.8x the 6650XT is probably doable, and that will make it more CPU-bound at 1080p and line it up with the 6900XT.

Clock speed increase won't be that much, 6650XT has ~2600MHz on average, so +50% would mean 3.9GHz.
I think +30% or ~3.4GHz is much more realistic for N33.

The Achilles heel will be the bandwidth. Even if they used 24 Gbps Samsung modules, it would be only 37% more than the RX 6650XT.
The number of TMUs and ROPs is also questionable. I think ROPs could stay at 64, but it would be great if the number of TMUs increased by 50%, from 8 -> 12 per WGP.
What I would love to see as N33 is this -> 16WGP:4096SP:192TMU:64ROP:64RA. I am a bit skeptical, considering the die is smaller than N23.
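The bandwidth percentages are easy to sanity-check. This sketch assumes N33 keeps the 128-bit bus of the RX 6650 XT (a rumour, not a confirmed spec) and uses the 6650 XT's 17.5 Gbps GDDR6 as the baseline. The names are illustrative.

```python
BUS_BITS = 128  # RX 6650 XT bus width; assumed unchanged for N33


def bandwidth_gbs(gbps_per_pin: float, bus_bits: int = BUS_BITS) -> float:
    """Peak memory bandwidth in GB/s for a given per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin


baseline = bandwidth_gbs(17.5)  # RX 6650 XT

for rate in (18, 20, 24):
    bw = bandwidth_gbs(rate)
    gain = (bw / baseline - 1) * 100
    print(f"{rate} Gbps: {bw:.0f} GB/s ({gain:+.0f}% vs RX 6650 XT)")

gain_24 = (bandwidth_gbs(24) / baseline - 1) * 100
gain_20_vs_18 = (bandwidth_gbs(20) / bandwidth_gbs(18) - 1) * 100
```

With 24 Gbps modules that works out to about +37% over the 6650 XT, and 20 Gbps versus 18 Gbps is the ~11% mentioned earlier in the thread. Either way it is far short of the increase in shader throughput.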

Glo. · Sep 26, 2022

jpiniero said:
I was looking at that and ~6900 XT looks unrealistic. I think the fanboiz are extrapolating that since the 2x shaders, it should be 2x performance of the 6650 XT. As you see with Ada it's not so simple. Especially when it almost has to be basically the same transistor count as N23 if you factor in that it's 15% smaller. You gut the fp64, you gut the lanes to 4. But there's no increased L3 and the memory bandwidth is probably only the 11% more you get for 20 instead of 18. I don't see how they can also fit in 2x ROPs and TMUs.

It is unrealistic SOLELY because the VGPR size is unchanged.

If it were indeed 192 KB instead of 128 KB, we would see a massive increase in memory efficiency and throughput of the cores.

Because of this, with N33 we are looking at more like RX 6800 - RX 6800 XT performance.
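To illustrate why the register file size matters, here is a simplified occupancy sketch. It assumes the 128 KB and 192 KB figures are per-SIMD VGPR files, wave32 with 4-byte registers, and a cap of 16 waves per SIMD. It ignores allocation granularity and every other occupancy limiter, so treat the function and its parameters as hypothetical.

```python
def waves_per_simd(vgpr_file_kb: int, vgprs_per_thread: int,
                   lanes: int = 32, bytes_per_reg: int = 4,
                   max_waves: int = 16) -> int:
    """Waves that fit in the VGPR file (simplified; assumed cap of max_waves)."""
    regs_per_lane = vgpr_file_kb * 1024 // (lanes * bytes_per_reg)
    return min(max_waves, regs_per_lane // vgprs_per_thread)


# Compare a light, a medium, and a register-heavy shader.
for vgprs in (64, 96, 128):
    small = waves_per_simd(128, vgprs)
    large = waves_per_simd(192, vgprs)
    print(f"{vgprs} VGPRs/thread: {small} waves at 128 KB, {large} waves at 192 KB")
```

Under these assumptions a shader using 128 VGPRs per thread goes from 8 to 12 waves in flight, which means more latency hiding. A light shader at 64 VGPRs already hits the cap in both cases and gains nothing.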

TESKATLIPOKA · Sep 26, 2022

Glo. said:
It is unrealistic SOLELY because the VGPR size is unchanged.

If it were indeed 192 KB instead of 128 KB, we would see a massive increase in memory efficiency and throughput of the cores.

Because of this, with N33 we are looking at more like RX 6800 - RX 6800 XT performance.

I still don't understand why VGPR size is different.
RDNA3 WGP is supposedly a bit smaller than RDNA2 WGP on the same process, but which WGP? The one in N33 or in N31-32?

Joe NYC · Oct 4, 2022

This technology, called SoIC-H (for Horizontal) can enable multi-GCD cards in the future.

It is a silicon interposer that replaces bumps with SoIC connections. Different chips can be placed on the interposer including
- multiple GCDs
- multiple MCDs
- HBM

Maybe RDNA4, or even CDNA3?

Edit: There was a lot of speculation about a stacked Silicon Bridge between GCDs, and many of us were thrown off by the pictures in the patent applications and did not see this obvious answer...

--------------------------------------------

Abstract:
A System on Integrated Chip_Horizontal (SoIC_H) technology for heterogeneous system integration in high-performance computing (HPC) is proposed. Compute logic chiplets and memory cubes are tightly integrated on a Si interposer via ultrafine pitch SoIC bond to provide low parasitic and high density in input/output (I/O) interconnects. To demonstrate the advantages of SoIC_H technology over μbump in HPC applications, the electrical performance of a face-to-face (F2F), 3-μm pitch (μmP), and low-temperature (LT) SoIC bonding on a silicon interposer was conducted and compared with the ones using μbump. Through system technology co-optimization (STCO), the proposed SoIC_H technology at the bond pitch of 3 μm improves energy per bit and latency for die-to-die I/O link and on-chip fan-in/fan-out design through the simulation. For memory cube integration, if μbumps between stacked dies are replaced by SoIC bonds, lower latency, higher bandwidth, and lower energy per bit for 4-Hi static random access memory (SRAM) cache and 12-Hi high bandwidth memory (HBM) are obtained. Moreover, the proposed structure provides significant thermal resistance improvements along the thermal conduction path of logic and memory cubes attached to the Si interposer. With much improved electrical and thermal performance, the SoIC_H technology enables energy-efficient heterogeneous system integration and applications.

SoIC_H Technology for Heterogeneous System Integration

A System on Integrated Chip_Horizontal (SoIC_H) technology for heterogeneous system integration in high-performance computing (HPC) is proposed. Compute logic chiplets and memory cubes are tightly integrated on a Si interposer via ultrafine pitch SoIC bond to provide low parasitic and high...

ieeexplore.ieee.org

GodisanAtheist · Oct 4, 2022

AMD really knows how to build anticipation for their launches. Less than a month away from a potentially sea-change GPU launch and we got just about nada here. NV was leaking like a sieve at this point.

Leeea · Oct 4, 2022

GodisanAtheist said:
AMD really knows how to build anticipation for their launches. Less than a month away from a potentially sea-change GPU launch and we got just about nada here. NV was leaking like a sieve at this point.

I think the AIBs hate nvidia, and leak just to piss off Jensen.

moinmoin · Oct 5, 2022

Leeea said:
I think the AIBs hate nvidia, and leak just to piss off Jensen.

The amount of leaks as a parameter of confidence of all partners involved in said product and their trust in its manufacturer is an interesting thought.

beginner99 · Oct 5, 2022

moinmoin said:
The amount of leaks as a parameter of confidence of all partners involved in said product and their trust in its manufacturer is an interesting thought.

Agreed, as it makes too much sense to ignore. It's kind of passive-aggressive. Not really passive, but more like "if you annoy the **** out of me I will pay it back by leaking". Kind of a way to gain back control and power over a despotic ruler.

Stuka87 · Oct 5, 2022

beginner99 said:
Agreed, as it makes too much sense to ignore. It's kind of passive-aggressive. Not really passive, but more like "if you annoy the **** out of me I will pay it back by leaking". Kind of a way to gain back control and power over a despotic ruler.

Which in turn results in nVidia not giving them any details on anything until 30 days before launch, as was the case with the 3090.
