Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 38

uzzi38

Platinum Member
Oct 16, 2019
2,624
5,891
146

Mopetar

Diamond Member
Jan 31, 2011
7,833
5,981
136
So with MCM the work distribution, i.e. which lanes run what, is all still centrally controlled. There's one input stream to deal with, and all the figuring out of how to route that instruction stream and get results back to where they're needed, back and forth across chiplets, is done in hardware itself, over what needs to be a very high bandwidth, very low latency, preferably very cheap connection (which is why it's taken so long to make).

The problem with "SLI" and the like is indeed latency, in part, and bandwidth. There are two instruction streams going to two different chips, which then may have to shuttle data back and forth to each other, requiring a lot of bandwidth and careful handling of latency (and complication, lots of it) over PCIe (which might already be saturated at points), or everything has to be copied out twice to both cards, meaning a lot of the benefit is lost since you can't perfectly "split the screen in half" and have one card do one half and the other do the other half. Either way, you pay double the cost for something that is not double the performance.

Meanwhile, if you split a GPU in two, each half costs less than one big whole, depending on the sizes: splitting a 200mm² chip into two 100mm² chips doesn't save much, and it might even cost more since you end up needing extra hardware for the link between chiplets. But if you split a big enough chip, like a huge GPU or a 64-core CPU, into two or more chiplets, then even with the extra hardware added for the chiplet links your overall cost goes down.
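To put the cost argument above in rough numbers, here is a minimal sketch using a simple Poisson yield model; the defect density, die sizes, and 10% chiplet overhead are made-up illustrative figures, not TSMC or AMD numbers.

```python
# Illustrative only: a toy yield model (simple Poisson defect model) showing
# why splitting a large die into chiplets can lower cost. The defect density
# and die sizes below are assumed example values, not real foundry figures.
import math

def yield_rate(area_mm2, defects_per_mm2=0.001):
    """Poisson yield: fraction of dies with zero defects."""
    return math.exp(-defects_per_mm2 * area_mm2)

def relative_cost(area_mm2, defects_per_mm2=0.001):
    """Silicon cost per good die, proportional to area / yield."""
    return area_mm2 / yield_rate(area_mm2, defects_per_mm2)

big = relative_cost(600)              # one monolithic 600 mm^2 die
split = 2 * relative_cost(300) * 1.1  # two 300 mm^2 chiplets, +10% assumed link/packaging overhead

print(f"monolithic: {big:.0f}, two chiplets: {split:.0f}")
```

With these assumptions the split of a large die comes out clearly ahead, while splitting a ~200mm² die roughly breaks even once the overhead is counted, which matches the point made above.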

Outside of latency (and I suppose multiple PCI bus connections), though, what is the difference between coordinating two chiplets on the same PCB and two chiplets on different cards?

What would prevent the same issues that plagued SLI/Xfire implementations from occurring, or remove the need for additional work from developers to extract the performance gain?

From the past we know that it's possible to get 70%+ scaling with SLI/Xfire, and in some cases it's about as close to perfect as could be expected. In those cases not even the inter-card latency costs more than a percentage point or two of raw performance.

Presumably, for AMD to take an MCM approach, some part of the issue that prevents SLI/Xfire from just working without additional support from developers has been solved.
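For a rough sense of what that 70%+ scaling figure means in value terms, here is a trivial sketch; the assumption that a second identical card doubles the cost is taken from the earlier post, not from measured data.

```python
# Quick back-of-the-envelope on multi-GPU value, assuming (hypothetically)
# that a second identical card doubles the cost and scaling lands at 70-100%.
def value(scaling, cards=2):
    perf = 1 + (cards - 1) * scaling   # relative performance vs one card
    cost = cards                       # relative cost vs one card
    return perf, perf / cost           # performance, performance-per-dollar

for s in (0.7, 0.9, 1.0):
    perf, ppd = value(s)
    print(f"{s:.0%} scaling -> {perf:.2f}x perf, {ppd:.2f}x perf/$")
# Even at 70% scaling you get 1.7x the performance for 0.85x the
# performance-per-dollar, which is why good scaling mattered so much.
```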
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Outside of latency (and I suppose multiple PCI bus connections), though, what is the difference between coordinating two chiplets on the same PCB and two chiplets on different cards?

What would prevent the same issues that plagued SLI/Xfire implementations from occurring, or remove the need for additional work from developers to extract the performance gain?

From the past we know that it's possible to get 70%+ scaling with SLI/Xfire, and in some cases it's about as close to perfect as could be expected. In those cases not even the inter-card latency costs more than a percentage point or two of raw performance.

Presumably, for AMD to take an MCM approach, some part of the issue that prevents SLI/Xfire from just working without additional support from developers has been solved.
I don't understand the assumption that chiplet = complete stand alone GPU.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,117
136
I don't understand the assumption that chiplet = complete stand alone GPU.

...and from the sound of the rumors, we're not even getting multiple logic chiplets. Sounds like cache/memory controller/I/O chiplets, etc., which are a completely different beast from logic dies.

People are really making some wild and crazy assumptions based on a couple of extra *PCIe* ports...
 

Mopetar

Diamond Member
Jan 31, 2011
7,833
5,981
136
...and from the sound of the rumors, we're not even getting multiple logic chiplets. Sounds like cache/memory controller/I/O chiplets, etc., which are a completely different beast from logic dies.

People are really making some wild and crazy assumptions based on a couple of extra *PCIe* ports...

The rumors seem to have changed at some point, but once upon a time it seemed like there were multiple dies containing the shaders. They certainly could have a design that has all of the compute on a monolithic die that connects to multiple cache dies that also handle the memory interface connections.

Perhaps the lack of multiple shader chiplets does indicate they haven't worked out all of the problems there yet.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
The rumors seem to have changed at some point, but once upon a time it seemed like there were multiple dies containing the shaders. They certainly could have a design that has all of the compute on a monolithic die that connects to multiple cache dies that also handle the memory interface connections.

Perhaps the lack of multiple shader chiplets does indicate they haven't worked out all of the problems there yet.

I suspect the problem is more logistical than anything else. Moving to one chiplet means they can make twice as many cards, and they probably realized that raising clocks allows them to hit internal performance targets just as easily.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
I suspect the problem is more logistical than anything else. Moving to one chiplet means they can make twice as many cards, and they probably realized that raising clocks allows them to hit internal performance targets just as easily.

There are a number of advantages to one I/O chiplet and one compute chiplet. 6nm should be more available and, probably, cheaper to produce (it's definitely cheaper to design for) than 5nm, and since I/O doesn't draw as much power relative to compute, power isn't as much of a priority there, nor does it need ultra-high clockspeeds.

The odd part, needing to grab everything from RAM and then shuttle it across to the compute chiplet, increasing power cost and complication, might still be worth it for things like the cost savings. On top of that you get the option of a tick-tock cycle of upgrading your I/O and compute chiplets separately. Chip design costs a hell of a lot these days in both money and time, and getting upgrades out regularly is helpful for keeping up with the competition.

But there are arguments against it. Moving the I/O to a newer node like 5nm would save power regardless, both from improved power usage when retrieving data and from less need to shuttle traffic across chiplets. It also means you could, potentially, design just one chiplet and then put as many of those as you need on a package: incredibly easy scalability! So it's all a bunch of tradeoffs.
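To put the "shuttle it across" power concern in rough numbers, here is a toy estimate; the energy-per-bit values and the 1 TB/s traffic figure are generic ballpark assumptions, not RDNA3 specifics.

```python
# Toy estimate of the extra power from moving memory traffic across an
# on-package link instead of keeping it on-die. The energy-per-bit numbers
# below are generic ballpark assumptions, not AMD figures.
ON_DIE_PJ_PER_BIT = 0.1      # assumed: short on-die wires
ON_PACKAGE_PJ_PER_BIT = 0.5  # assumed: chiplet-to-chiplet link

def link_power_watts(bandwidth_gbytes_s, pj_per_bit):
    bits_per_s = bandwidth_gbytes_s * 1e9 * 8
    return bits_per_s * pj_per_bit * 1e-12

bw = 1000  # GB/s of traffic between the memory/cache die and the compute die (assumed)
extra = link_power_watts(bw, ON_PACKAGE_PJ_PER_BIT) - link_power_watts(bw, ON_DIE_PJ_PER_BIT)
print(f"~{extra:.1f} W extra at {bw} GB/s")  # ~3.2 W with these assumptions
```

A few watts at these assumed rates is real but small next to a several-hundred-watt card, which is roughly the tradeoff the post is describing.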
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
I suspect the problem is more logistical than anything else. Moving to one chiplet means they can make twice as many cards, and they probably realized that raising clocks allows them to hit internal performance targets just as easily.
There seems to be some assumption that moving to chiplets is easy, but it's not. You introduce a load of problems with the sharing of resources and how to break up execution between the chiplets, particularly for games where the whole lot needs to generate a single output (a frame). Almost certainly the reason AMD haven't done it is that they haven't got it working seamlessly or efficiently yet.
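As a crude illustration of the single-frame-output problem described above (purely a toy split-frame example, nothing like a real driver or GPU work distributor), note how both halves have to be finished and merged before anything can be presented:

```python
# Toy illustration of why splitting one frame's work across two devices is
# awkward: both halves must finish and be merged before the frame can be
# presented, and anything shared (textures, render targets) must be visible
# to both workers.
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4

def shade_rows(start, stop):
    """Pretend 'chiplet' shading a horizontal slice of the frame."""
    return [[(x + y) % 256 for x in range(WIDTH)] for y in range(start, stop)]

with ThreadPoolExecutor(max_workers=2) as pool:
    top = pool.submit(shade_rows, 0, HEIGHT // 2)
    bottom = pool.submit(shade_rows, HEIGHT // 2, HEIGHT)
    frame = top.result() + bottom.result()  # hard sync/merge point every frame

print(len(frame), "rows rendered")
```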
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,117
136
There seems to be some assumption that moving to chiplets is easy, but it's not. You introduce a load of problems with the sharing of resources and how to break up execution between the chiplets, particularly for games where the whole lot needs to generate a single output (a frame). Almost certainly the reason AMD haven't done it is that they haven't got it working seamlessly or efficiently yet.

"If it was easy someone would have done it already"
 

biostud

Lifer
Feb 27, 2003
18,241
4,755
136
There seems to be some assumption that moving to chiplets is easy, but it's not. You introduce a load of problems with the sharing of resources and how to break up execution between the chiplets, particularly for games where the whole lot needs to generate a single output (a frame). Almost certainly the reason AMD haven't done it is that they haven't got it working seamlessly or efficiently yet.

Not being an expert in any way, but in the end isn't it about latency between the different chiplets? Even in a single GPU you have to move data around between the thousands of cores inside a modern GPU.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
There seems to be some assumption that moving to chiplets is easy, but it's not. You introduce a load of problems with the sharing of resources and how to break up execution between the chiplets, particularly for games where the whole lot needs to generate a single output (a frame). Almost certainly the reason AMD haven't done it is that they haven't got it working seamlessly or efficiently yet.

Oh, I never said it was easy. However, AMD has filed many patents, and they have had plenty of time to get it right. The fact that the "leaks" changed so late in the game suggests either a last-minute problem or a logistical issue. Normally you would assume a problem; however, it wasn't long ago that GPUs were very hard to find, and AMD may wish to avoid a repeat of that.

EDIT: and with their Ryzen chips hitting 5.5 GHz, chances are good that their GPUs will see a large clockspeed increase as well.
 

ModEl4

Member
Oct 14, 2019
71
33
61
Can someone summarize the current "consensus" for the N31 vs N21 gaming performance jump, based on the unofficial AMD leakers?
About a year ago we had 2.5X-3X depending on the leaker (specifically referring to the gaming performance (not FP32) difference...)
 

ModEl4

Member
Oct 14, 2019
71
33
61
I feel like it's still hovering around 2-2.5x based on leaked specs, but it's all hand waving.
Thanks, is this 2-2.5X from the same known leakers, or is it your interpretation based on the specs and leaks?
I have my own prediction; I just wanted to see how far it is from the current official-unofficial leaks and whether they've changed tune.
 

Aapje

Golden Member
Mar 21, 2022
1,379
1,855
106
My estimate would be 1.5 to 2 times across the lineup for rasterization and more than that for ray-tracing.

I think that the higher estimates that leakers give are for ray-tracing.
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
My estimate would be 1.5 to 2 times across the lineup for rasterization and more than that for ray-tracing.

That isn't going to be enough versus nVidia's top models. Now, I could see AMD looking at the TSMC price increases and the mining slump and getting scared that they won't be able to sell $2k+ cards regardless of competitiveness, especially if N31 is going to be released 6 months later.
 

Saylick

Diamond Member
Sep 10, 2012
3,127
6,303
136
Seeing as how N33 is supposed to beat the 6900XT at 1080p, match it at 1440p, and lose to it at 4K, I think 2x 6900XT is conservative for N31, which is in essence triple N33 in specs (3x the shaders, 3x Infinity Cache, allegedly 3x the memory bus width). The only thing that isn't 3x is the TDP, which is closer to 2x, maybe 2.5x tops (assuming N33 is 200W, N31 is 450W). How high it can scale is anyone's guess, but 2x seems like a good floor. Personally, I'd like to see 2.5x 6900XT, which puts it at roughly 2.2x RTX 3090. They kind of need it that fast if they want to compete at the super high end.
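Restating the reasoning in that post as arithmetic, purely under the poster's own assumptions (N33 roughly matching a 6900 XT, a 3x spec ratio, 200W/450W TDPs, and a ~13% 3090-over-6900XT gap at 4K); none of these are confirmed figures.

```python
# Restating the speculation above as arithmetic. All inputs are the poster's
# assumed/rumored figures, not confirmed specs.
n33_vs_6900xt = 1.0          # assumed: N33 roughly matches a 6900 XT overall
spec_ratio = 3.0             # rumored: N31 has ~3x the shaders/cache/bus of N33
tdp_ratio = 450 / 200        # assumed TDPs -> 2.25x the power budget

# Performance rarely scales 1:1 with specs, so bracket the estimate between
# the TDP ratio (pessimistic floor) and the spec ratio (optimistic ceiling).
low, high = n33_vs_6900xt * tdp_ratio, n33_vs_6900xt * spec_ratio
print(f"N31 estimate: {low:.2f}x to {high:.2f}x a 6900 XT")

gap_3090_vs_6900xt = 1.13    # assumed ~13% 4K gap, matching the 2.5x -> ~2.2x claim
print(f"2.5x 6900 XT ≈ {2.5 / gap_3090_vs_6900xt:.1f}x RTX 3090")
```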
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
It's both. Remember it's basically a double shrink. Maybe even more than that when you factor in how garbage, quality-wise, SS8 is.
At a high level, processing throughput efficiency and shader numbers & speeds are what matter. SS8 vs N4 is merely one of the building blocks for achieving the end result. Using the node the way you do here is deceptive.