Info Big Navi - Radeon 5950 XT specs leak.


Hans de Vries

Senior member
May 2, 2008
321
1,018
136
www.chip-architect.com
Seems this might be a valid leak from inside memory partner SK Hynix (posted on Twitter).


Big Navi - Radeon 5950XT: twice the compute units of Navi 10.

Shading units: 5120
TMUs: 320
Compute units: 80
ROPs: 96
L2 cache: 12MB
Memory: 24 GB (4 x HBM2e stacks, 3 dies each)
Memory bus: 4096-bit
Bandwidth: 2048 GB/s




All these Big Navi numbers are perfectly consistent. The 96 ROPs are tiny pieces of logic at the edge of the memory tiles of the 12 MB L2 cache, which explains the factor of 3 in these numbers (96 and 12 are both 3 times a power of two, rather than the usual power-of-two counts). An old example of this kind of layout: https://bjorn3d.com/2010/01/nvidia-gf100-fermi-gpu/

SK Hynix will make 6 GB HBM2e stacks with 3 dies per stack for consumer applications on request. SK Hynix recently announced HBM2e at 512 GB/s per stack at ISSCC 2020. Samsung went a step further with 640 GB/s HBM2e (5 Gb/s/pin).
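For what it's worth, the leaked capacity, bus width, and bandwidth all fall straight out of those per-stack figures. A quick back-of-the-envelope check (the 2 GB-per-die and 4 Gb/s/pin figures are my assumptions chosen to match a 6 GB, 512 GB/s stack, not anything SK Hynix has confirmed for this card):

```python
# Sanity check of the leaked Big Navi memory numbers from the quoted
# per-stack HBM2e figures. Per-die capacity and per-pin speed are
# assumptions chosen to match a 6 GB / 512 GB/s stack.
STACKS = 4               # "4 x HBM2e"
DIES_PER_STACK = 3       # 3-die stacks
GB_PER_DIE = 2           # 16 Gb (2 GB) dies -> 6 GB per stack
PINS_PER_STACK = 1024    # standard HBM interface width
GBPS_PER_PIN = 4.0       # ~4 Gb/s/pin -> 512 GB/s per stack

capacity_gb = STACKS * DIES_PER_STACK * GB_PER_DIE   # 24 GB
bus_width = STACKS * PINS_PER_STACK                  # 4096 bits
bandwidth = bus_width * GBPS_PER_PIN / 8             # 2048.0 GB/s

print(capacity_gb, bus_width, bandwidth)             # 24 4096 2048.0
```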

 


sandorski

No Lifer
Oct 10, 1999
70,101
5,640
126
From what I understand, multi-GPU support may allow such an increase.

If everyone got together and pushed dev/app support for Vulkan/DX12 parallelism, it would probably create a Zen-like increase in performance/$ across the board for AMD, Nvidia, and Intel. But I doubt any of them have a vested interest in that, due to the non-linear scaling of GPU cost as performance increases. Once the cat's out of the bag with multi-GPU support, there's nothing stopping folks from sticking two $280 5700s in a system instead of buying, say, a $999 5950XT. The same applies to Nvidia's lineup. The GPU manufacturers, however, could choose to steer developers toward on-die multi-GPU support but disable support for multiple cards, allowing them to slap several chiplets on one package and still price as they wish, without consumer competition from people buying two cheaper cards for similar performance.

Though I'm not sure I have a great grasp on this, it seems like all the tools are there to make the leap; I'm just not sure AMD, Nvidia, and Intel see any major benefit in it.

I think IF MGPU is for Professional/Industrial applications.
 

gk1951

Member
Jul 7, 2019
170
150
116
fleshconsumed, great clip from Notting Hill. What did Spike call him? A daft P-----k? HAHA

We need to lighten up and realize AMD has been playing catch-up with Nvidia in the GPU arena for quite a while. They now seem to be on a path to releasing a solid competitor.
 
  • Like
Reactions: zinfamous

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Open question:

Do GPUs not intrinsically lend themselves to the MCM approach? [Thinking back to the 4870x2]

Workloads are highly parallel and mostly independent. A central controller could surely farm the work out to worker chiplets and reconcile the returned data without the need for crossfire/SLI.

Which means smaller chips, better yields and an easier roll out of a full product stack.


Unless I am missing something?
 

CastleBravo

Member
Dec 6, 2019
119
271
96
Open question:

Do GPUs not intrinsically lend themselves to the MCM approach? [Thinking back to the 4870x2]

Workloads are highly parallel and mostly independent. A central controller could surely farm the work out to worker chiplets and reconcile the returned data without the need for crossfire/SLI.

Which means smaller chips, better yields and an easier roll out of a full product stack.


Unless I am missing something?


I think the big technical challenge is that an MCM GPU interconnect would need a lot more bandwidth than an MCM CPU interconnect.
 

Krteq

Senior member
May 22, 2015
991
671
136
Open question:

Do GPUs not intrinsically lend themselves to the MCM approach? [Thinking back to the 4870x2]

Workloads are highly parallel and mostly independent. A central controller could surely farm the work out to worker chiplets and reconcile the returned data without the need for crossfire/SLI.

Which means smaller chips, better yields and an easier roll out of a full product stack.


Unless I am missing something?
Coherent memory space, a fast interconnect, scheduling complexity, etc.

The MCM approach is quite complicated, but it can have some great benefits (as you wrote). Pure compute GPUs can profit much more from this solution, and there are already some working concepts (AMD has been working on its "ExaScale" design since 2014, NV is rumored to use MCM on its "Hopper" GPU, and Intel seems to be using this concept with its Xe GPUs).
 
Mar 11, 2004
23,077
5,558
146
Multi-GPU support (from both AMD and Nvidia) has been waning for years. The driver stacks don't even officially support it any longer afaik, and devs are not coding for it.

The old way of doing mGPU, yes. From what I understand, companies are now looking at a completely different approach. I'm not really sure why that is, as mGPU scaling was still good enough to be viable (seriously, even +50% performance would be enough for quite a few people), and with the extra burden of higher resolutions (4K-8K), higher refresh rates, and features like ray tracing, I'd think it'd be even more so. Likewise for VR, where they could do per-eye rendering.

The really boggling thing is that Microsoft showed off DX12's mGPU support, where they got very good scaling while mixing cards from both vendors (Nvidia and AMD). And they claimed it was easier for developers to implement.

Incidentally, enterprise was what was driving the development of the large GPUs, and with that market able to adopt mGPU more easily, it really makes me wonder about consumer. It's getting squeezed from three directions: divergence from enterprise, the rise of streaming, and the push for mobile. Makes me wonder if the push for ray tracing is out of necessity (them realizing that rasterization-focused processors are going to be a bit harder to justify).

I think IF MGPU is for Professional/Industrial applications.

It explicitly is, from what AMD has said so far. I don't know of any consumer GPU that has IF exposed, and their recent announcements in that regard are all about enterprise, not consumer. That's not surprising; don't GPUs still fail to saturate even an x8 PCIe 3.0 link? It's mostly enterprise that is looking to massively increase overall data throughput and would even need it right now. Not that it wouldn't be beneficial to consumers (I think it'll be important for whatever GPU chiplet plans AMD has; arguably that's where it'll end up being most important), but there isn't a dire need for it there like there is elsewhere. At least right now.
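For context on the PCIe point, here is a rough comparison of host-link bandwidth against a GPU's local memory bandwidth (approximate figures, not measurements):

```python
# Approximate usable bandwidth of common host links vs. a GPU's local
# memory, illustrating why consumer cards don't need IF-class links yet.
GBS_PER_PCIE3_LANE = 0.985   # 8 GT/s with 128b/130b encoding

links_gbs = {
    "PCIe 3.0 x8":           8 * GBS_PER_PCIE3_LANE,       # ~7.9 GB/s
    "PCIe 3.0 x16":          16 * GBS_PER_PCIE3_LANE,      # ~15.8 GB/s
    "PCIe 4.0 x16":          2 * 16 * GBS_PER_PCIE3_LANE,  # ~31.5 GB/s
    "Navi 10 GDDR6 (local)": 448.0,                        # on-board memory
}

for name, gbs in links_gbs.items():
    print(f"{name:<24} {gbs:6.1f} GB/s")
```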
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
The really boggling thing is that Microsoft showed off DX12's mGPU support, where they got very good scaling while mixing cards from both vendors (Nvidia and AMD). And they claimed it was easier for developers to implement.

Yeah, devs can't even implement DX12 without mGPU properly, so it almost certainly isn't "that easy". I was a proponent of DX12 and low-level APIs, but now we have to conclude they have pretty much failed, as many games are simply more playable in DX11 mode (frame-rate variance, micro-stutter, glitches, ...).
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
I think the big technical challenge is that an MCM GPU interconnect would need a lot more bandwidth than an MCM CPU interconnect.
Isn't that the reason AMD is splitting GCN into GPUs (RDNA) and GPGPU (CDNA)?

CDNA2 will have an implementation similar to what the TR 3990X or TR 3970X has: an I/O die with the IMC(s), the IF links, and the PCIe port(s), plus some number of chiplets with the 'stream/compute' processors.

Not sure about doing it for graphics; it would require some analysis data. For one rendered frame displayed on the monitor, for example: how much data comes from textures, from compute, from rendering and processing, how much bandwidth each stage consumes, and how much graphics-card memory each of those occupies.

Just one more thing: if each chiplet in the TR 3990X has ~50 GB/s of link bandwidth, the eight chiplets total an aggregate bandwidth of ~400 GB/s, which is similar to the memory bandwidth of today's GPUs... And the new TR and EPYC CPUs are also getting into GPU-level TDPs. Coincidence? I don't think so.
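Spelling out that aggregate-bandwidth comparison (the 50 GB/s per-link figure is the estimate from the post above, not an official spec):

```python
# Aggregate IF link bandwidth of a TR 3990X-style layout vs. the memory
# bandwidth of a current GPU, using the rough per-link estimate above.
CHIPLETS = 8          # CCDs on a TR 3990X
LINK_GBS = 50         # assumed IF bandwidth per chiplet link, GB/s

aggregate = CHIPLETS * LINK_GBS      # 400 GB/s across all chiplet links
navi10_memory = 448                  # GB/s, RX 5700 XT GDDR6

print(f"aggregate IF bandwidth: {aggregate} GB/s")
print(f"Navi 10 memory bandwidth: {navi10_memory} GB/s")
```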
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
Check out this research paper: https://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf

Nvidia talked a lot about what the bandwidth requirements and such would be, in the context of a compute-oriented GPU. Their theoretical 4-chiplet GPU would have 768 GB/s per link, totalling about 3 TB/s (four links, two per chiplet, putting two dies within one hop and the remaining die two hops away).

MCM GPUs are coming for sure -- probably first for compute, but graphics will undoubtedly follow.
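A rough restatement of those numbers (a sketch of the topology as described above, not the paper's actual model):

```python
# Four chiplets, each with two links to its neighbours, i.e. a ring:
# every die is one hop from two dies and two hops from the opposite die.
CHIPLETS = 4
LINKS_PER_CHIPLET = 2
LINK_GBS = 768                # GB/s per link, as quoted from the paper

links = CHIPLETS * LINKS_PER_CHIPLET // 2     # each link joins 2 dies -> 4
aggregate_gbs = links * LINK_GBS              # 3072 GB/s, i.e. ~3 TB/s

print(f"{links} links, {aggregate_gbs} GB/s aggregate")
```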
 
  • Like
Reactions: RetroZombie

maddie

Diamond Member
Jul 18, 2010
4,746
4,687
136
Check out this research paper: https://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf

Nvidia talked a lot about what the bandwidth requirements and such would be, in the context of a compute-oriented GPU. Their theoretical 4-chiplet GPU would have 768 GB/s per link, totalling about 3 TB/s (four links, two per chiplet, putting two dies within one hop and the remaining die two hops away).

MCM GPUs are coming for sure -- probably first for compute, but graphics will undoubtedly follow.
Didn't the thinking here conclude that power consumption due to greater data movement would, mainly, be the limiting factor? After all, the first computers were composed of discrete components wired together. All (haha) integration allowed was denser circuitry at lower power and failure rates.
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
It's been a while since I read it through, but I believe they basically concluded that it's all doable -- data movement is a big issue because the cost in terms of pJ/bit is much higher, but with some crafty tricks to reduce bandwidth usage they can make it work pretty well.
 

joesiv

Member
Mar 21, 2019
75
24
41
Didn't the thinking here conclude that power consumption due to greater data movement would, mainly, be the limiting factor? After all, the first computers were composed of discrete components wired together. All (haha) integration allowed was denser circuitry at lower power and failure rates.
I would say so. Look at how much, percentage-wise, AMD has spent in terms of watts in its chiplet designs.

In that case it's around 20% to uncore, a lot of that being interconnect. GPUs are power-limited these days, and they would require even higher bandwidth, as data sharing between shader cores is critical, so I'd guess an MCM/chiplet GPU would need much more than 20% of its TDP to go towards the interconnect.

It may be the only way to scale if you want to go larger than the reticle limit, and eventually it'll offer more performance, but yeah, I think power consumption on the interconnect is what makes it less of an all-win situation.
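To put some rough numbers on that, here is how energy-per-bit turns into watts at the link bandwidths discussed above (the pJ/bit values are illustrative assumptions on my part, not figures from the paper):

```python
# Interconnect power = bits moved per second * energy per bit.
def link_power_w(bandwidth_gbs: float, pj_per_bit: float) -> float:
    bits_per_second = bandwidth_gbs * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

AGGREGATE_GBS = 3072          # the 4-link, 768 GB/s/link example above
for pj in (0.5, 1.0, 2.0):    # assumed on-package signalling costs
    watts = link_power_w(AGGREGATE_GBS, pj)
    print(f"{pj} pJ/bit -> {watts:.0f} W")   # ~12 W, ~25 W, ~49 W
```

Against a ~300 W card, even a couple of pJ/bit at those rates is a noticeable slice of the power budget, which is roughly the concern raised above.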
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
MCM GPUs are coming for sure -- probably first for compute, but graphics will undoubtedly follow.
The article is very interesting, but it's very compute-related. We already know that for compute, even without an architecture specially tailored to it, GPUs already scale very well.

There are also rare cases with games where SLI and CrossFire achieved 2x scaling, but the developer effort required is so big that it isn't worth it.


I'd guess an MCM/chiplet GPU would need much more than 20% of its TDP to go towards the interconnect.
I think the efficiency gains from having smaller dies would offset that. The big problem with the chiplet approach is the very high idle power draw: the I/O die cannot be powered down or power-gated because it always needs to be 'ready to work' and to wake the chiplets. For now, at least, it seems that way.
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,709
136
PS5 specs were unveiled today: 36 CUs @ 2.23 GHz max. It bodes well that RDNA 2 can clock so high.

Yes. Actually the PS5 GPU is surprisingly similar to the Radeon 5700 (non-XT). They have the same number of CUs and the same memory bandwidth. Both have a similar style of boost clock that is achievable in the vast majority of games (so the PS5 boost should be very similar to the Radeon's Game Clock). The Radeon 5700's TDP is 180 W. We don't know the console's TDP, but its GPU portion is almost guaranteed to be below 180 W (considering the PS4 Pro had 180 W for the entire console). Finally, if I remember correctly, Navi is actually made on TSMC's N7P process, not vanilla N7 (as Renoir is).

This in turn makes them surprisingly comparable. The same info in table form:

                                        Radeon 5700    PS5
CUs                                     36             36
Boost Clock / Game Clock (MHz)          1650           2233
Clock Speed (compared to Radeon 5700)   100%           137%
TDP                                     180 W          ? (should be in the ballpark, possibly even less)
Memory Bandwidth                        448.0 GB/s     448.0 GB/s
Memory Capacity                         8 GB           16 GB (shared)
Process                                 TSMC N7(P?)    TSMC N7P

This bodes very well for RDNA 2.0 on desktop. The clocks should improve 30% at minimum, probably closer to 40+% for the same CU count and TDP.

Now it's really interesting to know whether RDNA 1.0 really does use N7P (I can't find the source, but I distinctly remember it). Anyway, AMD has clearly stated (in slides and the article quoted below) that whatever 7nm process RDNA 2.0 uses, it will be improved compared to RDNA 1.0's. If RDNA 1.0 did indeed use N7P, the new one must be either N7+ or even N6 (unlikely considering ramp-up time and other factors). Anyway, if that's the case, there should be a little extra performance coming from that too.
Process nodes will also play some kind of a role here. While AMD is still going to be on a 7nm process here – and they are distancing themselves from saying that they'll be using TSMC’s EUV-based “N7+” node – the company has clarified that they will be using an enhanced version of 7nm. To what extent those enhancements are we aren’t sure (possibly using TSMC’s N7P?), but AMD won’t be standing still on process tech.
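For reference, the clock-scaling arithmetic behind the table above, with a hypothetical extrapolation to desktop RDNA 2 at the same CU count (an illustration of the reasoning, not a prediction):

```python
# PS5 vs. RX 5700 clock ratio, then what a similar uplift would mean
# for a desktop RDNA 2 part at the same CU count (hypothetical).
rx5700_game_clock = 1650   # MHz, from the table above
ps5_gpu_clock = 2233       # MHz, peak

ratio = ps5_gpu_clock / rx5700_game_clock
print(f"PS5 vs. RX 5700: +{(ratio - 1) * 100:.0f}%")        # ~ +35%

for uplift in (0.30, 0.35, 0.40):
    print(f"+{uplift:.0%} -> {rx5700_game_clock * (1 + uplift):.0f} MHz")
```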
 
  • Like
Reactions: Elfear and Saylick

soresu

Platinum Member
Dec 19, 2014
2,664
1,863
136
Micron is finally launching an HBM2 product this year, apparently - most auspicious if Big Navi uses HBM, and at the very least it's likely to benefit CDNA costs.
 

NobleX13

Member
Apr 7, 2020
27
18
41
I am so excited for "Big Navi". Not because I am overconfident that it will suddenly hold the performance crown, but because this will ultimately drive NVIDIA to not hold anything back, and possibly stave off another round of NVIDIA price increases.

Let's just hope TSMC has enough capacity to make these GPUs alongside all of the console APUs they are churning out these days. Not to mention AMD's CPU line.