Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,622
5,880
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even still though, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,880
146
There is no way CUs would scale linearly; doubling the CUs would require a major redesign of the core to optimize feeding all the CUs and shaders, which is a very intricate and tough thing to do. Considering the RTX 3080 is about 80% faster than the RX 5700 XT, it's pretty clear that AMD has achieved similar performance to Nvidia and about 80% scaling with double the CUs, which is quite a reasonable number.
The 3080 is literally 2x 5700XT performance.
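Just to put numbers on what "80% scaling" vs "literally 2x" would mean, here's a quick toy calculation (all inputs are illustrative assumptions, not measurements): given an observed speed-up and the CU ratio, you can back out the implied per-CU scaling efficiency.

```python
# Toy calculation: implied CU-scaling efficiency from an observed speed-up.
# All inputs are illustrative assumptions, not measured data.

def implied_cu_scaling(perf_ratio, cu_ratio, clock_ratio=1.0, ipc_gain=0.0):
    """Fraction of the extra CU throughput that shows up as real performance."""
    per_cu_speedup = clock_ratio * (1.0 + ipc_gain)
    return (perf_ratio / per_cu_speedup - 1.0) / (cu_ratio - 1.0)

# A doubled-CU part that is "80% faster" at the same clocks and IPC:
print(implied_cu_scaling(perf_ratio=1.8, cu_ratio=2.0))  # 0.8 -> ~80% scaling
# A doubled-CU part that is literally 2x the RX 5700 XT:
print(implied_cu_scaling(perf_ratio=2.0, cu_ratio=2.0))  # 1.0 -> ~100% scaling
```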
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
There is no way CUs would scale linearly; doubling the CUs would require a major redesign of the core to optimize feeding all the CUs and shaders, which is a very intricate and tough thing to do. Considering the RTX 3080 is about 80% faster than the RX 5700 XT, it's pretty clear that AMD has achieved similar performance to Nvidia and about 80% scaling with double the CUs, which is quite a reasonable number.
It was already done with RDNA1...

RDNA2 was further redesigned to reduce memory bandwidth dependency and achieve higher cache hit rates.

Both affect scaling and utilization at higher CU counts.
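A rough way to see why the cache hit rate matters for feeding more CUs (toy model with made-up numbers, nothing to do with AMD's actual figures): DRAM traffic scales with the miss rate, so a bigger on-die cache lets the same external bus feed proportionally more compute.

```python
# Toy model: off-die (DRAM) bandwidth demand vs. on-die cache hit rate.
# The 900 GB/s shader-array demand is an illustrative assumption.

def dram_bandwidth_needed(total_demand_gbs, cache_hit_rate):
    """Bandwidth that must come from DRAM when a fraction of accesses hit the cache."""
    return total_demand_gbs * (1.0 - cache_hit_rate)

total_demand = 900.0  # GB/s the shader array would like to consume

for hit_rate in (0.0, 0.3, 0.6):
    need = dram_bandwidth_needed(total_demand, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {need:.0f} GB/s must come off-die")
```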
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
GPUs are already very scalable; each CU is essentially a CCX or a processor itself, whatever you want to call it. The issue comes down to feeding all the CUs and keeping them optimally used.

There is no need for a chiplet design for GPUs, at least not in the desktop market and similar. Supercomputers and such are a different story though.
That's exactly why you're wrong. Hopper is already going to do this. There is no code in the world that will feed CUs at 100% efficiency at this scale. That's why you start separating out workloads via multi-GPU. In four years' time you will have 20/40 CU chiplets that have IC as the glue. I'd think we will start seeing 4x 40 CU chiplets here soon.
 
  • Like
Reactions: IcedEarth

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
GPUs are already very scalable; each CU is essentially a CCX or a processor itself, whatever you want to call it. The issue comes down to feeding all the CUs and keeping them optimally used.

There is no need for a chiplet design for GPUs, at least not in the desktop market and similar. Supercomputers and such are a different story though.
Cost, and all the other benefits that come with chiplets, unless you look forward to paying ever-increasing prices as nodes shrink.
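The cost argument is easy to sketch with a toy yield model (wafer price, usable area, and defect density below are all assumptions picked for illustration, not foundry numbers): smaller dies yield better, so four small chiplets can come out cheaper than one big die of the same total area.

```python
# Toy die-cost comparison: one ~500 mm^2 die vs. four 125 mm^2 chiplets.
# Simple Poisson yield model; all constants are illustrative assumptions.

import math

WAFER_COST = 10_000.0   # $ per wafer (assumed)
WAFER_AREA = 70_000.0   # usable mm^2 on a 300 mm wafer (approximate)
DEFECT_DENSITY = 0.001  # defects per mm^2 (assumed)

def cost_per_good_die(die_area_mm2):
    dies_per_wafer = WAFER_AREA / die_area_mm2               # ignores edge losses
    yield_fraction = math.exp(-DEFECT_DENSITY * die_area_mm2)
    return WAFER_COST / (dies_per_wafer * yield_fraction)

monolithic = cost_per_good_die(500.0)
chiplets = 4 * cost_per_good_die(125.0)

print(f"500 mm^2 monolithic die: ${monolithic:.0f} per good die")
print(f"4 x 125 mm^2 chiplets:   ${chiplets:.0f} per GPU (before packaging cost)")
```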
 
  • Like
Reactions: Tlh97

kurosaki

Senior member
Feb 7, 2019
258
250
86
Would be cool having 4 separate 40 CU chiplets, each churning out a frame per cycle. If it works as intended, that would be like a mini-CrossFire type of setup.
 

PhonakV30

Senior member
Oct 26, 2009
987
378
136
I think you’re wrong on the chiplet design. Imagine


Because the IPC increase isn't there.
AMD claimed RDNA2 is +50% more perf/watt

AMD-RDNA2-Slide.jpg


It's impossible to tweak N7 for 50% more performance at the same wattage, so how can they gain 50% more perf/watt? Also, look at the word "IPC": this is official from AMD's slides. AMD claims there is improved perf-per-clock (IPC) over RDNA1, unless they mean that using features like VSR/DirectML etc. can improve perf.
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
AMD claimed RDNA2 is +50% more perf/watt

AMD-RDNA2-Slide.jpg


It's impossible to tweak N7 for 50% more performance at the same wattage, so how can they gain 50% more perf/watt? Also, look at the word "IPC": this is official from AMD's slides. AMD claims there is improved perf-per-clock (IPC) over RDNA1, unless they mean that using features like VSR/DirectML etc. can improve perf.
Yeah, I'm aware. I've seen that slide a hundred times. If the card demoed on stage is the best RDNA2, there is absolutely no way it achieved a raw 50% improvement at the same speed.

If they are adding cache mechanisms, some sort of AI to do branch prediction, etc., I can understand.
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
They've been capable of doing CF/SLI on a board for quite some time.

The whole point would be to put all the chiplets to work on EACH frame.

The real trick is making it seamless so that developers don't need to change their code to take advantage of it, as was often the case with CF/SLI, where some games had great support with near-100% scaling and others didn't support it at all.

Getting that part down is probably the biggest hurdle, because most companies don't want to devote resources to making something that only a small number of users run work perfectly.

If it were easy we'd have seen it done previously. I suppose companies always had some incentive not to care about it too much, since it's a lot harder to sell a top-end card if you can get identical performance from two or more lower-end cards that are priced better.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
AMD claimed RDNA2 is +50% more perf/watt

It's Impossible to tweak N7 for 50% more performance at the same watt.so How can they gain 50% more perf/watt ? also Look at the word "IPC" , This is official from AMD slides.AMD claims there is improved Perf-per-clock (IPC) over RDNA1 , unless they mean using features like VSR/DirectML or etc can improve perf.

'Impossible' is never a word you should use when describing technological improvements. It's very closed-minded.

There are numerous ways to improve performance per watt. Tweaking the process, or moving to a more energy-efficient process, is one of them. Another way is to improve the performance of the IC itself. The third common way is to improve the efficiency of the IC.

Let's say the big cache we have been seeing is true, and they do not require a big 384-bit bus but can instead use a much smaller 256-bit bus. That saves quite a bit of energy, as the GPU is no longer having to go off-die to get commonly used data. It also means that they don't have to use energy-hungry memory (such as GDDR6X).
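For the sake of argument, here's how a few independent gains could compound into roughly +50% perf/watt. Every factor below is an assumption picked purely for illustration, not a leaked spec:

```python
# Toy perf/watt composition: process, design (IPC), and power savings multiply.
# All factors are illustrative assumptions.

process_gain = 1.10   # assumed perf/watt gain from a tuned N7 process
ipc_gain     = 1.15   # assumed performance gain at the same clock and power
power_ratio  = 0.85   # assumed power after a narrower bus + big on-die cache

perf_per_watt = process_gain * ipc_gain / power_ratio
print(f"Combined perf/watt gain: {perf_per_watt:.2f}x")  # ~1.49x
```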
 

Midwayman

Diamond Member
Jan 28, 2000
5,723
325
126
If you don't give people the option of multi-card SLI and instead only have chiplets, you can really ratchet up the profit margin on those high-yield chips. How amazing would it be if AMD announced they cracked it? It would take Nvidia years to be able to respond. I don't think it's likely, but nice to dream.
 
  • Like
Reactions: Tlh97 and kurosaki

Konan

Senior member
Jul 28, 2017
360
291
106
If you don't give people the option of multi-card SLI and instead only have chiplets, you can really ratchet up the profit margin on those high-yield chips. How amazing would it be if AMD announced they cracked it? It would take Nvidia years to be able to respond. I don't think it's likely, but nice to dream.
They need to crack RT and a DLSS competitor first... IMO.
An MCM approach is incoming for both companies with 5nm.
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
If you don't give people the option of multi-card SLI and instead only have chiplets, you can really ratchet up the profit margin on those high-yield chips. How amazing would it be if AMD announced they cracked it? It would take Nvidia years to be able to respond. I don't think it's likely, but nice to dream.
Yes, this is what I mean. With IC, you're sharing much of the data between the GPUs. It solves SFR's limitation, and is a no-brainer to implement.

Not my quote, but you get the idea:

"Overdrawing is the biggest problem here; all of the vertices for scene geometry have to be transformed by each GPU even if they are not within the GPU's assigned region, meaning geometry performance cannot scale like it does with AFR, and any polygon between multiple rendering regions has to be fully textured and shaded by each GPU whose region it occupies, which is wasteful. Of course, there are also complications that can rise from inaccurate workload allocations."

If you're able to share the data between GPUs, the redundancy isn't required.
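To make the overdraw point concrete, here's a toy sketch (random triangles, strip regions, made-up counts; not any real renderer) that counts how often a triangle has to be processed more than once when the screen is split into one strip per GPU:

```python
# Toy illustration of duplicated work under naive split-frame rendering (SFR).
# Triangles are reduced to horizontal screen-space extents; numbers are made up.

import random

random.seed(0)
NUM_GPUS = 4
# Each GPU owns an equal vertical strip of a screen of width 1.0.
regions = [(i / NUM_GPUS, (i + 1) / NUM_GPUS) for i in range(NUM_GPUS)]

triangles = []
for _ in range(10_000):
    x0 = random.random()
    x1 = min(1.0, x0 + random.uniform(0.0, 0.1))
    triangles.append((x0, x1))

# Any triangle overlapping a region boundary gets processed by every GPU it touches.
jobs = sum(1 for x0, x1 in triangles
             for r0, r1 in regions
             if x0 < r1 and x1 > r0)

print(f"{len(triangles)} triangles -> {jobs} shading jobs "
      f"({jobs / len(triangles):.2f}x duplication across {NUM_GPUS} GPUs)")
```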
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
RTX 2080 Ti - 50% faster than RX 5700 XT at 4K, mainly due to VRAM buffer limit on RX 5700 XT.

Navi 21 has 80 CUs, a 256-bit memory bus, and 2.2 GHz clock speeds, at the very least.

2.2 GHz is 16% above what the RX 5700 XT clocked. And RDNA2 GPUs are supposed to have higher IPC than RDNA1.

On CU count and VRAM buffer size alone, Navi 21 should achieve 100% more performance than the RX 5700 XT.

And that is excluding IPC and clock speed differences.

And yet, AMD demoed a GPU that is 70% faster at 4K than RX 5700 XT.

Something does not add up.
A little follow up.

If RDNA2 achieved a 10% IPC increase, 80 RDNA2 CUs at 1.8 GHz would perform just like 88 RDNA1 CUs clocked at 1.8 GHz.

2.2 GHz clock speeds are 16% above the RX 5700 XT's clock speeds (1887 MHz).

In essence, if AMD has found a way to make a 256-bit memory bus enough for 80 RDNA2 CUs, that GPU, in full config at 2.2 GHz, should be around 135% above the RX 5700 XT.

Around 10-15% above the RTX 3090. In 4K.

And this is only with a 10% IPC increase.

This is all a theoretical calculation.

So we better pray that the 256-bit bus is enough to feed those CUs, and that AMD was able to find a way for those CUs to scale in performance similarly to the RDNA1 architecture.
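For what it's worth, the estimate above can be sketched as a quick back-of-the-envelope calculation. The CU count, IPC, and clock figures are the speculative assumptions from the post; the scaling-efficiency knob is an extra assumption to show how a figure in the ~135% range can come out of less-than-perfect CU scaling:

```python
# Back-of-the-envelope Navi 21 vs. RX 5700 XT estimate.
# All inputs are speculative assumptions from the thread, not official data.

def perf_vs_5700xt(cu_ratio, ipc_gain, clock_mhz, base_clock_mhz=1887,
                   cu_scaling_eff=1.0):
    """Estimated performance relative to the RX 5700 XT (1.0 = equal)."""
    cu_factor = 1.0 + (cu_ratio - 1.0) * cu_scaling_eff  # how well extra CUs scale
    return cu_factor * (1.0 + ipc_gain) * (clock_mhz / base_clock_mhz)

# 80 vs. 40 CUs, +10% IPC, 2.2 GHz, perfect CU scaling:
print(f"{perf_vs_5700xt(2.0, 0.10, 2200):.2f}x")                      # ~2.56x
# Same, but only ~80% of the extra CUs turning into performance:
print(f"{perf_vs_5700xt(2.0, 0.10, 2200, cu_scaling_eff=0.8):.2f}x")  # ~2.31x
```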