
Speculation: RDNA2 + CDNA Architectures thread

There is no way CUs would scale linearly; doubling the CUs would require a major redesign of the core to optimize feeding all the CUs and shaders, which is a very intricate and tough thing to do. Considering the RTX 3080 is about 80% faster than the RX 5700 XT, it's pretty clear that AMD has achieved similar performance to Nvidia and about 80% scaling with double the CUs, which is quite a reasonable number.
The 3080 is literally 2x 5700XT performance.
 
There is no way CUs would scale linearly; doubling the CUs would require a major redesign of the core to optimize feeding all the CUs and shaders, which is a very intricate and tough thing to do. Considering the RTX 3080 is about 80% faster than the RX 5700 XT, it's pretty clear that AMD has achieved similar performance to Nvidia and about 80% scaling with double the CUs, which is quite a reasonable number.
It was already done with RDNA1...

RDNA2 was further redesigned to reduce the memory bandwidth dependency and achieve higher cache hit rates.

Both affect scaling and utilization at higher CU counts.
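The scaling claims traded back and forth above can be sanity-checked with quick arithmetic. This is an illustrative sketch only; the 80%/2x figures come straight from the posts, and the function name is my own:

```python
# Back-of-the-envelope CU scaling check, using the figures quoted in
# the thread (RTX 3080 vs RX 5700 XT, with roughly double the units).
def incremental_scaling(speedup, unit_ratio):
    """Fraction of the *added* units' ideal gain actually realized.

    speedup    -- measured performance ratio vs the baseline card
    unit_ratio -- ratio of compute units (e.g. 80 CU / 40 CU = 2.0)
    """
    return (speedup - 1) / (unit_ratio - 1)

# "RTX 3080 is about 80% faster" -> 1.8x speedup with 2x units
print(incremental_scaling(1.8, 2.0))  # -> 0.8, i.e. the post's "80% scaling"
# "The 3080 is literally 2x 5700XT performance" -> perfect scaling
print(incremental_scaling(2.0, 2.0))  # -> 1.0
```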
 
GPUs are already very scalable; each CU is essentially a CCX or a processor in itself, whatever you want to call it. The issue comes down to feeding all the CUs and keeping them optimally used.

There is no need for a chiplet design for GPUs, at least not in the desktop market and similar. Supercomputers and such are a different story though.
That’s exactly why you’re wrong. Hopper is already going to do this. There is no code in the world that will feed CUs at 100% efficiency at this scale. That’s why you start separating out workloads via multi-GPU. In four years' time you will have 20/40-CU chiplets that have IC as the glue. I’d think we will start seeing 4x 40-CU chiplets here soon.
 
GPUs are already very scalable; each CU is essentially a CCX or a processor in itself, whatever you want to call it. The issue comes down to feeding all the CUs and keeping them optimally used.

There is no need for a chiplet design for GPUs, at least not in the desktop market and similar. Supercomputers and such are a different story though.
Cost, plus all the other benefits that come with chiplets; unless you look forward to paying ever-increasing prices as node sizes decrease.
 
Would be cool having four separate 40-CU chiplets, churning out a frame each, in a cycle. If it works as intended, that would be like a mini-CrossFire type of setup.
 
I think you’re wrong on the chiplet design. Imagine


Because the IPC increase isn't there.
AMD claimed RDNA2 is +50% more perf/watt

[Image: AMD RDNA2 slide]


It's impossible to tweak N7 for 50% more performance at the same wattage, so how can they gain 50% more perf/watt? Also, look at the word "IPC"; this is official, from AMD's slides. AMD claims there is improved perf-per-clock (IPC) over RDNA1, unless they mean that using features like VSR/DirectML can improve performance.
 
AMD claimed RDNA2 is +50% more perf/watt

[Image: AMD RDNA2 slide]


It's impossible to tweak N7 for 50% more performance at the same wattage, so how can they gain 50% more perf/watt? Also, look at the word "IPC"; this is official, from AMD's slides. AMD claims there is improved perf-per-clock (IPC) over RDNA1, unless they mean that using features like VSR/DirectML can improve performance.
Yeah, I'm aware. I've seen that slide a hundred times. If the card demoed on stage is the best RDNA2 can do, there is absolutely no way it achieved a raw 50% improvement at the same speed.

If they are adding cache mechanisms, some sort of AI to do branch prediction, etc., I can understand.
 
They've been capable of doing CF/SLI on a board for quite some time.

The whole point would be to put all the chiplets on the job on EACH frame.

The real trick is making it seamless, so that developers don't need to change their code to take advantage of it, unlike what was often the case with CF/SLI, where some games had great support with near-100% scaling and others didn't support it at all.

If you can get that part down, it's probably the biggest hurdle, because most companies don't want to devote resources to getting something that such a small number of users run working perfectly.

If it were easy we'd have seen it done previously. I suppose companies always had some incentive not to care about it too much since it's a lot harder to sell a top-end card if you can get identical performance from two or more low end cards that are priced better.
 
AMD claimed RDNA2 is +50% more perf/watt

It's impossible to tweak N7 for 50% more performance at the same wattage, so how can they gain 50% more perf/watt? Also, look at the word "IPC"; this is official, from AMD's slides. AMD claims there is improved perf-per-clock (IPC) over RDNA1, unless they mean that using features like VSR/DirectML can improve performance.

'Impossible' is never a word you should use when describing technological improvements. It's very close-minded.

There are numerous ways to improve performance per watt. Tweaking the process, or moving to a more energy-efficient process, is one of them. Another way is to improve the performance of the IC itself. The third common way is to improve the efficiency of the IC.

Let's say the big cache we have been seeing is real, and they do not require a big 384-bit bus but can instead use a much smaller 256-bit bus. That saves quite a bit of energy, as the GPU is no longer having to go off-die to get commonly used data. It also means that they don't have to use energy-hungry memory (such as GDDR6X).
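The bus-width argument can be made concrete with a simple model in which requests served by an on-die cache never touch DRAM. All the numbers here (pin rates, the 50% hit rate) are hypothetical assumptions, not confirmed specs:

```python
# Rough model of how a large on-die cache can substitute for bus width.
# All figures are illustrative assumptions, not confirmed hardware specs.
def raw_bandwidth_gbs(bus_bits, gbps_per_pin):
    # Bus width in bits times per-pin data rate, converted to GB/s.
    return bus_bits * gbps_per_pin / 8

def effective_bandwidth_gbs(raw, hit_rate):
    # If a fraction `hit_rate` of requests is served on-die, DRAM only
    # sees (1 - hit_rate) of the traffic, amplifying effective bandwidth.
    return raw / (1 - hit_rate)

wide = raw_bandwidth_gbs(384, 19)    # 384-bit GDDR6X-class: 912 GB/s
narrow = raw_bandwidth_gbs(256, 16)  # 256-bit GDDR6: 512 GB/s
# A (hypothetical) 50% hit rate would let the narrow bus behave like
# roughly 1024 GB/s -- more than the wide bus, at lower power.
print(effective_bandwidth_gbs(narrow, 0.5))
```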
 
If you don't give people the option of multi-card SLI and instead only have chiplets, you can really ratchet up the profit margin on those high-yield chips. How amazing would it be if AMD announced they cracked it? It would take Nvidia years to be able to respond. I don't think it's likely, but it's nice to dream.
 
If you don't give people the option of multi-card SLI and instead only have chiplets, you can really ratchet up the profit margin on those high-yield chips. How amazing would it be if AMD announced they cracked it? It would take Nvidia years to be able to respond. I don't think it's likely, but it's nice to dream.
They need to crack RT and a DLSS competitor first...IMO
MCM approach for both companies with 5nm incoming
 
If you don't give people the option of multi-card SLI and instead only have chiplets, you can really ratchet up the profit margin on those high-yield chips. How amazing would it be if AMD announced they cracked it? It would take Nvidia years to be able to respond. I don't think it's likely, but it's nice to dream.
Yes, this is what I mean. With IC, you're sharing much of the data between the GPUs. It solves SFR's limitation, and is a no-brainer to implement.

Not my quote, but you get the idea:

"Overdrawing is the biggest problem here; all of the vertices for scene geometry have to be transformed by each GPU even if they are not within the GPU's assigned region, meaning geometry performance cannot scale like it does with AFR, and any polygon between multiple rendering regions has to be fully textured and shaded by each GPU whose region it occupies, which is wasteful. Of course, there are also complications that can rise from inaccurate workload allocations."

If you're able to share the data between GPUs, the redundancy isn't required.
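The overdraw problem in the quote can be illustrated with a toy split-frame setup: any triangle spanning multiple GPU regions must be processed by every GPU whose region it touches, so total work exceeds the triangle count. The function and numbers below are my own illustration, not any real renderer:

```python
# Toy illustration of SFR overdraw: the screen is split into vertical
# strips (one per GPU), and any triangle overlapping a strip must be
# processed by that strip's GPU. Triangles crossing a strip boundary
# are therefore processed more than once.
def sfr_workload(triangles, num_gpus, screen_width=1920):
    strip = screen_width / num_gpus
    work = [0] * num_gpus
    for x_min, x_max in triangles:  # each triangle as an x-extent
        first = int(x_min // strip)
        last = min(int(x_max // strip), num_gpus - 1)
        for gpu in range(first, last + 1):
            work[gpu] += 1
    return work

# Three triangles; the middle one straddles the seam between two GPUs,
# so it gets counted twice: 3 triangles become 4 units of work.
tris = [(0, 100), (900, 1000), (1800, 1900)]
print(sfr_workload(tris, 2))  # -> [2, 2]
```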
 
RTX 2080 Ti: 50% faster than the RX 5700 XT at 4K, mainly due to the VRAM buffer limit on the RX 5700 XT.

Navi 21 has 80 CUs, a 256-bit memory bus, and 2.2 GHz clock speeds, at the very least.

2.2 GHz is 16% above what the RX 5700 XT clocked. And RDNA2 GPUs are supposed to have higher IPC than RDNA1.

On CU count and VRAM buffer size alone, Navi 21 should achieve 100% performance above the RX 5700 XT.

And that is excluding IPC and clock speed differences.

And yet, AMD demoed a GPU that is 70% faster at 4K than the RX 5700 XT.

Something does not add up.
A little follow-up.

If RDNA2 achieves a 10% IPC increase, 80 RDNA2 CUs at 1.8 GHz will perform like 88 RDNA1 CUs clocked at 1.8 GHz.

2.2 GHz clock speeds are 16% above RX 5700 XT clock speeds (1887 MHz).

In essence, if AMD has found a way to make a 256-bit memory bus enough for 80 RDNA2 CUs, that GPU, in full config at 2.2 GHz, should be around 135% above the RX 5700 XT.

Around 10-15% above the RTX 3090. In 4K.

And this is only with a 10% IPC increase.

This is all theoretical calculation.

So we had better pray that the 256-bit bus is enough to feed those CUs, and that AMD found a way to make those CUs scale in performance similarly to the RDNA1 architecture.
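The arithmetic in that post can be laid out explicitly. The inputs (80 vs 40 CUs, 10% IPC, 2.2 GHz vs 1887 MHz) come from the post itself; the CU scaling efficiency is the unknown knob, and the function name is my own:

```python
# Back-of-the-envelope Navi 21 estimate from the thread's own inputs.
# Nothing here is a confirmed spec; `cu_scaling` is the big unknown.
def navi21_vs_5700xt(cu_ratio=2.0, ipc_gain=1.10,
                     clock=2200, base_clock=1887, cu_scaling=1.0):
    # Performance ratio vs RX 5700 XT: unit count x scaling x IPC x clock.
    return cu_ratio * cu_scaling * ipc_gain * (clock / base_clock)

# Perfectly linear CU scaling gives ~2.56x, i.e. ~156% above the 5700 XT.
print(navi21_vs_5700xt())
# Assuming ~90% scaling of the doubled CUs pulls the estimate down into
# the neighborhood of the ~135%-above figure used in the thread.
print(navi21_vs_5700xt(cu_scaling=0.9))
```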
 