Speculation: RDNA3 + CDNA2 Architectures Thread

Page 69

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
I am interested in how much better RT performance will get.

                          | N23 (RX 6650 XT) | N33        | Difference
WGP (old CU)              | 16 (32)          | 16         | 0%
Shaders                   | 2048             | 4096       | +100%
Clock speed (boost)       | 2635 MHz         | ~3.6 GHz ? | +37%
Processing power (TFLOPS) | 10.79            | 29.49      | +173%
Ray accelerators          | 32               | ?          | ?%

The much-increased processing power should help a lot with RT performance. What I am interested in is whether the Ray Accelerator in RDNA3 will be more capable, or whether there will simply be more of them per WGP.

Link
RDNA 2 introduces a new Ray Accelerator – one for each Compute Unit.

The Ray Accelerator is a fixed-function ray tracing acceleration engine to deliver real-time lighting, shadow and reflection realism through DirectX Raytracing (DXR).

It will calculate the intersections of the rays with the scene geometry as represented in a Bounding Volume Hierarchy, sort them, and return the information to the shaders for further scene traversal or result shading.

Each Ray Accelerator can calculate up to 4 ray/box intersections or 1 ray/triangle intersection per clock cycle.
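
For reference, here is a rough sketch of the arithmetic behind the table and the quoted intersection rates. The 2 FP32 ops per shader per clock and the N33 clock are my own assumptions / rumoured figures, not confirmed specs:

Code:
# Back-of-envelope check of the table above.
# Assumptions: 2 FP32 ops per shader per clock (FMA), boost clocks as listed,
# rumoured (unconfirmed) N33 figures.
def tflops(shaders, clock_ghz, ops_per_clock=2):
    return shaders * ops_per_clock * clock_ghz / 1000.0  # GFLOPS -> TFLOPS

n23 = tflops(2048, 2.635)  # ~10.79 TFLOPS (RX 6650 XT)
n33 = tflops(4096, 3.6)    # ~29.49 TFLOPS (rumoured N33)
print(f"+{(n33 / n23 - 1) * 100:.0f}%")  # ~+173%

# Peak rates implied by the quote (RDNA2: 4 ray/box or 1 ray/triangle
# intersections per Ray Accelerator per clock), for N23's 32 RAs:
ray_box_per_s = 32 * 4 * 2.635e9  # ~337 billion ray/box tests per second
ray_tri_per_s = 32 * 1 * 2.635e9  # ~84 billion ray/triangle tests per second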
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,751
136
I am interested in how much better RT performance will get.

                          | N23 (RX 6650 XT) | N33        | Difference
WGP (old CU)              | 16 (32)          | 16         | 0%
Shaders                   | 2048             | 4096       | +100%
Clock speed (boost)       | 2635 MHz         | ~3.6 GHz ? | +37%
Processing power (TFLOPS) | 10.79            | 29.49      | +173%
Ray accelerators          | 32               | ?          | ?%

The much-increased processing power should help a lot with RT performance. What I am interested in is whether the Ray Accelerator in RDNA3 will be more capable, or whether there will simply be more of them per WGP.

Link

I believe there are 2x as many per WGP as in RDNA2, but I have no idea where I read/heard that, so I can't link a source. I may also be misremembering something.
 

KompuKare

Golden Member
Jul 28, 2009
1,013
924
136
So you're dissing his insider math data based on just your outsider presumptions? Ok then.
This was my first time reading a SemiAnalysis article, and while I was there I read the one about Samsung's foundry woes.

I'm sure they have lots of industry and insider knowledge, but in that (unrelated) article about Micron racing ahead with just DUV while Hynix also has DRAM issues due to using some EUV steps... well, that's great for now, but long term, where and when are Micron getting their commercial EUV experience?
Okay, that's just my amateur observation, and the article is not the one about next-gen GPU costs, but it wouldn't be the first time some "analyst" has missed an important detail.
Point being, they are not infallible.
 
  • Like
Reactions: Tlh97 and Kaluan

scineram

Senior member
Nov 1, 2020
361
283
106
So, solely based on the sheer throughput increase, we should see not only higher utilization but also higher throughput of those ALUs/shaders.

2.5 times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.

I think it (full-fat N31) will be faster than the 4090. But how much more performance is there?
Not much more if the texture mapping or rasterization chokes on all that bandwidth. How about those?
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,117
136
It will be interesting to see how AMD's and NV's approaches to bandwidth play out.

NV went for a beefier L2 and is probably looking at GDDR7 or faster GDDR6X, while AMD has stacked V-Cache in the wings.

Some interesting divergence in solving the bandwidth problem after so many years of just "make the bus bigger".
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,751
136
Unless AMD and NVidia just postpone their low-end GPUs, they'll have to compete with the flood of used previous-generation GPUs that were locked up in mining rigs until now. Even the sale of new high-end 4080s and 7900s will put cheap 3080s and 6900s into the channel.

It doesn't matter what AMD and NVidia want to charge; it's what the market will let them charge.

If N33 does have 6900 XT-level 1080p performance and better RT performance, then at a $400-$500 asking price it is a compelling option despite the second-hand flood coming, especially since it will have much lower power draw than the 6900 XT.

Also, if the Angstronomics die size is accurate, then at $400 for N33 AMD are increasing the margin quite a bit vs N23, because it is A) smaller and B) on a cheaper node, so getting it out ASAP might be a good strategy.

If they release it next year, a good chunk of their target audience may have already purchased a second-hand part and won't be looking to upgrade.
 

jpiniero

Lifer
Oct 1, 2010
14,584
5,206
136
If N33 does have 6900 XT-level 1080p performance and better RT performance

I was looking at that, and ~6900 XT looks unrealistic. I think the fanboiz are extrapolating that since it has 2x the shaders, it should have 2x the performance of the 6650 XT. As you can see with Ada, it's not so simple, especially when it almost has to have basically the same transistor count as N23 if you factor in that it's 15% smaller. You gut the FP64, you gut the PCIe lanes to 4. But there's no increased L3, and the memory bandwidth is probably only the 11% more you get from 20 Gbps modules instead of 18 Gbps. I don't see how they can also fit in 2x the ROPs and TMUs.
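
A quick sanity check of that bandwidth point, assuming N33 keeps N23's 128-bit bus (the 18 Gbps and 20 Gbps module speeds are the figures from the post above):

Code:
# Bandwidth of a 128-bit GDDR6 bus. Assumption: N33 keeps N23's bus width.
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8.0  # GB/s

bw_18 = bandwidth_gb_s(128, 18)  # 288 GB/s
bw_20 = bandwidth_gb_s(128, 20)  # 320 GB/s
print(f"+{(bw_20 / bw_18 - 1) * 100:.0f}%")  # ~+11%, as stated above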
 
  • Like
Reactions: scineram and Leeea

Timorous

Golden Member
Oct 27, 2008
1,608
2,751
136
I was looking at that, and ~6900 XT looks unrealistic. I think the fanboiz are extrapolating that since it has 2x the shaders, it should have 2x the performance of the 6650 XT. As you can see with Ada, it's not so simple, especially when it almost has to have basically the same transistor count as N23 if you factor in that it's 15% smaller. You gut the FP64, you gut the PCIe lanes to 4. But there's no increased L3, and the memory bandwidth is probably only the 11% more you get from 20 Gbps modules instead of 18 Gbps. I don't see how they can also fit in 2x the ROPs and TMUs.

It is 2x the shaders and ~50% more clock speed, so even with the same ROPs and TMUs their rates increase by 50% just through clock speed. Bandwidth will be the limitation, but I expect it to be okay at 1080p, especially with faster Infinity Cache. At 1440p it will hurt a bit, and at 4K it will fall down.

So ultimately 1.8x the 6650 XT is probably doable, and that will make it more CPU-bound at 1080p and line it up with the 6900 XT.
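
Roughly what that scaling argument looks like in numbers (the 2x shaders, ~1.5x clock and unchanged ROP/TMU counts are assumptions from this discussion, not confirmed specs):

Code:
# Peak-rate scaling vs the RX 6650 XT under the assumptions above.
clock_scale   = 1.5                 # ~50% higher clock speed
fp32_scale    = 2.0 * clock_scale   # 2x shaders at 1.5x clock -> 3.0x peak FP32
rop_tmu_scale = 1.0 * clock_scale   # same ROP/TMU counts -> 1.5x fill/texture rate

# A game-level result of ~1.8x the 6650 XT would sit between the
# fixed-function limit (1.5x) and the shader limit (3.0x).
print(rop_tmu_scale, fp32_scale)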
 

Saylick

Diamond Member
Sep 10, 2012
3,125
6,296
136
It is 2x the shaders and ~50% more clock speed, so even with the same ROPs and TMUs their rates increase by 50% just through clock speed. Bandwidth will be the limitation, but I expect it to be okay at 1080p, especially with faster Infinity Cache. At 1440p it will hurt a bit, and at 4K it will fall down.

So ultimately 1.8x the 6650 XT is probably doable, and that will make it more CPU-bound at 1080p and line it up with the 6900 XT.
6900 XT or even 6800 XT performance at 1080p in a mobile chip is pretty dang good already, and it's supposed to be within 160 W. Considering a 6800 XT can already do >120 fps at 1080p in a lot of titles, I think people should just underclock/undervolt and try to get to 65 W while staying over 100 fps.

 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
It is 2x the shaders and ~50% more clock speed, so even with the same ROPs and TMUs their rates increase by 50% just through clock speed. Bandwidth will be the limitation, but I expect it to be okay at 1080p, especially with faster Infinity Cache. At 1440p it will hurt a bit, and at 4K it will fall down.

So ultimately 1.8x the 6650 XT is probably doable, and that will make it more CPU-bound at 1080p and line it up with the 6900 XT.
The clock speed increase won't be that much; the 6650 XT averages ~2600 MHz, so +50% would mean 3.9 GHz.
I think +30%, or ~3.4 GHz, is much more realistic for N33.

The Achilles' heel will be the bandwidth. Even if they used 24 Gbps Samsung modules, it would be only 37% more than the RX 6650 XT.
The number of TMUs and ROPs is also questionable. I think the ROPs could stay at 64, but it would be great if the number of TMUs increased by 50%, from 8 -> 12 per WGP.
What I would love to see as N33 is this -> 16 WGP : 4096 SP : 192 TMU : 64 ROP : 64 RA. I am a bit skeptical, considering the die size is smaller than N23's.
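
Checking that +37% figure (assumptions: N33 keeps a 128-bit bus, and the RX 6650 XT's modules run at 17.5 Gbps):

Code:
# Memory bandwidth on a 128-bit bus, in GB/s.
# Assumption: N33 keeps N23's 128-bit bus width.
def bw(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8.0

print(bw(128, 17.5))  # 280 GB/s - RX 6650 XT (17.5 Gbps GDDR6)
print(bw(128, 24.0))  # 384 GB/s - hypothetical 24 Gbps Samsung modules, ~+37%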
 
Last edited:
  • Like
Reactions: Leeea

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
I was looking at that, and ~6900 XT looks unrealistic. I think the fanboiz are extrapolating that since it has 2x the shaders, it should have 2x the performance of the 6650 XT. As you can see with Ada, it's not so simple, especially when it almost has to have basically the same transistor count as N23 if you factor in that it's 15% smaller. You gut the FP64, you gut the PCIe lanes to 4. But there's no increased L3, and the memory bandwidth is probably only the 11% more you get from 20 Gbps modules instead of 18 Gbps. I don't see how they can also fit in 2x the ROPs and TMUs.
It is unrealistic SOLELY because of the unchanged VGPR size.

If it were indeed 192 KB instead of 128 KB, we would see a massive increase in memory efficiency and in throughput of the cores.

With N33 we are looking more at RX 6800 - RX 6800 XT performance because of this.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
It is unrealistic SOLELY because of the unchanged VGPR size.

If it were indeed 192 KB instead of 128 KB, we would see a massive increase in memory efficiency and in throughput of the cores.

With N33 we are looking more at RX 6800 - RX 6800 XT performance because of this.
I still don't understand why the VGPR size would be different.
The RDNA3 WGP is supposedly a bit smaller than the RDNA2 WGP on the same process, but which WGP? The one in N33, or the one in N31/N32?
 
Last edited:
  • Like
Reactions: Leeea

Joe NYC

Golden Member
Jun 26, 2021
1,934
2,272
106
This technology, called SoIC-H (H for Horizontal), can enable multi-GCD cards in the future.

It is a silicon interposer that replaces bumps with SoIC connections. Different chips can be placed on the interposer, including:
- multiple GCDs
- multiple MCDs
- HBM

Maybe RDNA4, or even CDNA3?

Edit: There was a lot of speculation about a stacked silicon bridge between GCDs, and many of us were thrown off by the pictures in the patent applications and missed this obvious answer...

--------------------------------------------


Abstract:
A System on Integrated Chip_Horizontal (SoIC_H) technology for heterogeneous system integration in high-performance computing (HPC) is proposed. Compute logic chiplets and memory cubes are tightly integrated on a Si interposer via ultrafine-pitch SoIC bonds to provide low parasitics and high density in input/output (I/O) interconnects. To demonstrate the advantages of SoIC_H technology over μ-bump in HPC applications, the electrical performance of a face-to-face (F2F), 3-μm-pitch (μmP), low-temperature (LT) SoIC bond on a silicon interposer was evaluated and compared with that of μ-bumps. Through system technology co-optimization (STCO), the proposed SoIC_H technology at a bond pitch of 3 μm improves energy per bit and latency for the die-to-die I/O link and the on-chip fan-in/fan-out design in simulation. For memory cube integration, if μ-bumps between stacked dies are replaced by SoIC bonds, lower latency, higher bandwidth, and lower energy per bit are obtained for a 4-Hi static random access memory (SRAM) cache and 12-Hi high-bandwidth memory (HBM). Moreover, the proposed structure provides significant thermal resistance improvements along the thermal conduction path of logic and memory cubes attached to the Si interposer. With much improved electrical and thermal performance, SoIC_H technology enables energy-efficient heterogeneous system integration and applications.

 
  • Like
Reactions: Elfear and Kaluan

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
The amount of leaks as a parameter of confidence of all partners involved in said product and their trust in its manufacturer is an interesting thought.

Agreed, as it makes too much sense to ignore. It's kind of passive-aggressive. Not really passive, but more like "if you annoy the **** out of me, I will pay it back by leaking". Kind of a way to gain back control and power over a despotic ruler.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Agreed, as it makes too much sense to ignore. It's kind of passive-aggressive. Not really passive, but more like "if you annoy the **** out of me, I will pay it back by leaking". Kind of a way to gain back control and power over a despotic ruler.

Which in turn results in nVidia not giving them any details on anything until 30 days before launch, as was the case with the 3090.