Info Big Navi - Radeon 5950 XT specs leak.


Hans de Vries

Senior member
May 2, 2008
321
1,018
136
www.chip-architect.com
Seems this might be a valid leak from inside memory partner SK Hynix (posted on Twitter).


Big Navi - Radeon 5950XT: twice the compute units of Navi 10.

Shading units: 5120
TMUs: 320
Compute units: 80
ROPs: 96
L2 cache: 12MB
Memory: 24 GB (4 x HBM2e stacks, 3 dies each)
Memory bus: 4096-bit
Bandwidth: 2048 GB/s




All these Big Navi numbers are perfectly consistent. The 96 ROPs are tiny pieces of logic at the edge of the memory tiles of the 12 MB L2 cache, which explains the factor of 3 in these numbers (96 and 12 are both 3 times a power of two, rather than the usual power-of-two counts). An old example of this kind of layout: https://bjorn3d.com/2010/01/nvidia-gf100-fermi-gpu/

SK Hynix will make 6 GB HBM2e stacks with 3 dies per stack for consumer applications on request. SK Hynix recently announced HBM2e at 512 GB/s per stack at ISSCC 2020. Samsung went a step further with 640 GB/s HBM2e (5 Gb/s/pin).
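For what it's worth, the leaked capacity, bus width, and bandwidth all fall straight out of those per-stack figures. A quick back-of-the-envelope check (the 2 GB-per-die and 4 Gb/s/pin figures are my assumptions chosen to match a 6 GB, 512 GB/s stack, not anything SK Hynix has confirmed for this card):

```python
# Sanity check of the leaked Big Navi memory numbers from the quoted
# per-stack HBM2e figures. Per-die capacity and per-pin speed are
# assumptions chosen to match a 6 GB / 512 GB/s stack.
STACKS = 4               # "4 x HBM2e"
DIES_PER_STACK = 3       # 3-die stacks
GB_PER_DIE = 2           # 16 Gb (2 GB) dies -> 6 GB per stack
PINS_PER_STACK = 1024    # standard HBM interface width
GBPS_PER_PIN = 4.0       # ~4 Gb/s/pin -> 512 GB/s per stack

capacity_gb = STACKS * DIES_PER_STACK * GB_PER_DIE   # 24 GB
bus_width = STACKS * PINS_PER_STACK                  # 4096 bits
bandwidth = bus_width * GBPS_PER_PIN / 8             # 2048.0 GB/s

print(capacity_gb, bus_width, bandwidth)             # 24 4096 2048.0
```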

 


sandorski

No Lifer
Oct 10, 1999
70,101
5,640
126
From what I understand, multi-GPU support may allow such an increase.

If everyone got together and pushed dev/app support for Vulkan/DX12 parallelism, it would probably create a Zen-like increase in performance/$ across the board for AMD, Nvidia, and Intel. But I doubt any of them have a vested interest in that, due to the non-linear scaling of GPU cost as performance increases. Once the cat's out of the bag with multi-GPU support, there's nothing stopping folks from sticking two $280 5700s in a system instead of buying, say, a $999 5950XT. The same applies to Nvidia's lineup. The GPU manufacturers, however, could choose to steer developers toward on-die multi-GPU support but disable support for multiple cards, allowing them to slap several chiplets on one package and still price as they wish, without consumer competition from people buying two cheaper cards for similar performance.

Though I'm not sure I have a great grasp on this, it seems like all the tools are there to make the leap; I'm just not sure AMD, Nvidia, and Intel see any major benefit in it.

I think IF MGPU is for Professional/Industrial applications.
 

gk1951

Member
Jul 7, 2019
170
150
116
fleshconsumed, great clip from Notting Hill. What did Spike call him? A daft P-----k? HAHA

We need to lighten up and realize AMD has been playing catch-up with Nvidia in the GPU arena for quite a while. They now seem to be on a path to releasing a solid competitor.
 
  • Like
Reactions: zinfamous

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Open question:

Do GPUs not intrinsically lend themselves to the MCM approach? [Thinking back to the 4870x2]

Workloads are highly parallel and mostly independent. A central controller could surely farm the work out to worker chiplets and reconcile the returned data without the need for crossfire/SLI.

Which means smaller chips, better yields and an easier roll out of a full product stack.


Unless I am missing something?
 

CastleBravo

Member
Dec 6, 2019
119
271
96
Open question:

Do GPUs not intrinsically lend themselves to the MCM approach? [Thinking back to the 4870x2]

Workloads are highly parallel and mostly independent. A central controller could surely farm the work out to worker chiplets and reconcile the returned data without the need for crossfire/SLI.

Which means smaller chips, better yields and an easier roll out of a full product stack.


Unless I am missing something?


I think the big technical challenge is that an MCM GPU interconnect would need a lot more bandwidth than an MCM CPU interconnect.
 

Krteq

Senior member
May 22, 2015
991
671
136
Open question:

Do GPUs not intrinsically lend themselves to the MCM approach? [Thinking back to the 4870x2]

Workloads are highly parallel and mostly independent. A central controller could surely farm the work out to worker chiplets and reconcile the returned data without the need for crossfire/SLI.

Which means smaller chips, better yields and an easier roll out of a full product stack.


Unless I am missing something?
Coherent memory space, a fast interconnect, scheduling complexity, etc.

The MCM approach is quite complicated, but it can have some great benefits (as you wrote). Pure compute GPUs can profit much more from this solution, and there are already some working concepts (AMD has been working on its "ExaScale" design since 2014, NV is rumored to use MCM on its "Hopper" GPU, and Intel seems to be using this concept with its Xe GPUs).
 
Mar 11, 2004
23,077
5,558
146
Multi-GPU support (from both AMD and Nvidia) has been waning for years. The driver stacks don't even officially support it any longer afaik, and devs are not coding for it.

The old way of doing mGPU, yes. From what I understand, companies are now looking at a completely different approach. I'm not really sure why that is, as mGPU scaling was still good enough to be viable (seriously, even +50% performance would be enough for quite a few people), and with the extra burden of higher resolutions (4K-8K), higher refresh rates, and features like ray tracing, I'd think it'd be even more so. Likewise for VR, where they could do per-eye rendering.

The really boggling thing is that Microsoft showed off DX12's mGPU support, where they got very good scaling while mixing cards from both vendors (Nvidia and AMD). And they claimed it was easier for developers to implement.

Incidentally, enterprise was what was driving the development of the large GPUs, and with that market able to adopt mGPU more easily, it really makes me wonder about consumer. It's getting squeezed from three directions: divergence from enterprise, the rise of streaming, and the push for mobile. Makes me wonder if the push for ray tracing is out of necessity (them realizing that rasterization-focused processors are going to be a bit harder to justify).

I think IF MGPU is for Professional/Industrial applications.

It explicitly is, from what AMD has said so far. I don't know of any consumer GPU that has IF exposed, and their recent announcements in that regard are all about enterprise, not consumer. That's not surprising; don't GPUs still fail to saturate even an x8 PCIe 3.0 link? It's mostly enterprise that is looking to massively increase overall data throughput and would even need it right now. Not that it wouldn't be beneficial to consumers (I think it'll be important for whatever GPU chiplet plans AMD has; arguably that's where it'll end up being most important), but there isn't a dire need for it there like there is elsewhere. At least right now.
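For context on the PCIe point, here is a rough comparison of host-link bandwidth against a GPU's local memory bandwidth (approximate figures, not measurements):

```python
# Approximate usable bandwidth of common host links vs. a GPU's local
# memory, illustrating why consumer cards don't need IF-class links yet.
GBS_PER_PCIE3_LANE = 0.985   # 8 GT/s with 128b/130b encoding

links_gbs = {
    "PCIe 3.0 x8":           8 * GBS_PER_PCIE3_LANE,       # ~7.9 GB/s
    "PCIe 3.0 x16":          16 * GBS_PER_PCIE3_LANE,      # ~15.8 GB/s
    "PCIe 4.0 x16":          2 * 16 * GBS_PER_PCIE3_LANE,  # ~31.5 GB/s
    "Navi 10 GDDR6 (local)": 448.0,                        # on-board memory
}

for name, gbs in links_gbs.items():
    print(f"{name:<24} {gbs:6.1f} GB/s")
```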
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
The really boggling thing is that Microsoft showed off DX12's mGPU support, where they got very good scaling while mixing cards from both vendors (Nvidia and AMD). And they claimed it was easier for developers to implement.

Yeah, devs can't even implement DX12 without mGPU properly, so it almost certainly isn't "that easy". I was a proponent of DX12 and low-level APIs, but now we have to conclude they have pretty much failed, as many games are simply more playable in DX11 mode (frame-rate variance, micro-stutter, glitches, ...).
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
I think the big technical challenge is that an MCM GPU interconnect would need a lot more bandwidth than an MCM CPU interconnect.
Isn't that the reason AMD is splitting GCN into GPUs (RDNA) and GPGPU (CDNA)?

CDNA2 will have an implementation similar to what the TR 3990X or TR 3970X has: an I/O die with the IMC(s), the IF links, and the PCIe port(s), plus some number of chiplets with the 'stream/compute' processors.

Not sure about doing it for graphics; it would require some analysis data. For one rendered frame displayed on the monitor, for example: how much data comes from textures, from compute, from rendering and processing, how much bandwidth each stage consumes, and how much graphics-card memory each of those occupies.

Just one more thing: if each chiplet in the TR 3990X has ~50 GB/s of link bandwidth, the eight chiplets total an aggregate bandwidth of ~400 GB/s, which is similar to the memory bandwidth of today's GPUs... And the new TR and EPYC CPUs are also getting into GPU-level TDPs. Coincidence? I don't think so.
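Spelling out that aggregate-bandwidth comparison (the 50 GB/s per-link figure is the estimate from the post above, not an official spec):

```python
# Aggregate IF link bandwidth of a TR 3990X-style layout vs. the memory
# bandwidth of a current GPU, using the rough per-link estimate above.
CHIPLETS = 8          # CCDs on a TR 3990X
LINK_GBS = 50         # assumed IF bandwidth per chiplet link, GB/s

aggregate = CHIPLETS * LINK_GBS      # 400 GB/s across all chiplet links
navi10_memory = 448                  # GB/s, RX 5700 XT GDDR6

print(f"aggregate IF bandwidth: {aggregate} GB/s")
print(f"Navi 10 memory bandwidth: {navi10_memory} GB/s")
```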
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
Check out this research paper: https://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf

Nvidia talked a lot about what the bandwidth requirements and such would be, in the context of a compute-oriented GPU. Their theoretical 4-chiplet GPU would have 768 GB/s per link, totalling about 3 TB/s (four links, two per chiplet, putting two dies within one hop and the remaining die two hops away).

MCM GPUs are coming for sure -- probably first for compute, but graphics will undoubtedly follow.
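A rough restatement of those numbers (a sketch of the topology as described above, not the paper's actual model):

```python
# Four chiplets, each with two links to its neighbours, i.e. a ring:
# every die is one hop from two dies and two hops from the opposite die.
CHIPLETS = 4
LINKS_PER_CHIPLET = 2
LINK_GBS = 768                # GB/s per link, as quoted from the paper

links = CHIPLETS * LINKS_PER_CHIPLET // 2     # each link joins 2 dies -> 4
aggregate_gbs = links * LINK_GBS              # 3072 GB/s, i.e. ~3 TB/s

print(f"{links} links, {aggregate_gbs} GB/s aggregate")
```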
 
  • Like
Reactions: RetroZombie

maddie

Diamond Member
Jul 18, 2010
4,746
4,687
136
Check out this research paper: https://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf

Nvidia talked a lot about what the bandwidth requirements and such would be, in the context of a compute-oriented GPU. Their theoretical 4-chiplet GPU would have 768 GB/s per link, totalling about 3 TB/s (four links, two per chiplet, putting two dies within one hop and the remaining die two hops away).

MCM GPUs are coming for sure -- probably first for compute, but graphics will undoubtedly follow.
Didn't the thinking here conclude that power consumption due to greater data movement would, mainly, be the limiting factor? After all, the first computers were composed of discrete components wired together. All (haha) integration allowed was denser circuitry at lower power and failure rates.
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
It's been a while since I read it through, but I believe they basically concluded that it's all doable -- data movement is a big issue because the cost in terms of pJ/bit is much higher, but with some crafty tricks to reduce bandwidth usage they can make it work pretty well.
 

joesiv

Member
Mar 21, 2019
75
24
41
Didn't the thinking here conclude that power consumption due to greater data movement would, mainly, be the limiting factor? After all, the first computers were composed of discrete components wired together. All (haha) integration allowed was denser circuitry at lower power and failure rates.
I would say so. Look at how much, percentage-wise, AMD has spent in terms of watts in its chiplet designs.

In that case it's around 20% to uncore, a lot of that being interconnect. GPUs are power-limited these days, and they would require even higher bandwidth, as data sharing between shader cores is critical, so I'd guess an MCM/chiplet GPU would need much more than 20% of its TDP to go towards the interconnect.

It may be the only way to scale if you want to go larger than the reticle limit, and eventually it'll offer more performance, but yeah, I think power consumption on the interconnect is what makes it less of an all-win situation.
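To put some rough numbers on that, here is how energy-per-bit turns into watts at the link bandwidths discussed above (the pJ/bit values are illustrative assumptions on my part, not figures from the paper):

```python
# Interconnect power = bits moved per second * energy per bit.
def link_power_w(bandwidth_gbs: float, pj_per_bit: float) -> float:
    bits_per_second = bandwidth_gbs * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

AGGREGATE_GBS = 3072          # the 4-link, 768 GB/s/link example above
for pj in (0.5, 1.0, 2.0):    # assumed on-package signalling costs
    watts = link_power_w(AGGREGATE_GBS, pj)
    print(f"{pj} pJ/bit -> {watts:.0f} W")   # ~12 W, ~25 W, ~49 W
```

Against a ~300 W card, even a couple of pJ/bit at those rates is a noticeable slice of the power budget, which is roughly the concern raised above.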
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
MCM GPUs are coming for sure -- probably first for compute, but graphics will undoubtedly follow.
The article is very interesting, but it's very compute-related. We already know that for compute, even without an architecture specially tailored to it, GPUs already scale very well.

There are also rare cases with games where SLI and CrossFire achieved 2x scaling, but the developer effort required is so big that it isn't worth it.


I'd guess an MCM/chiplet GPU would need much more than 20% of its TDP to go towards the interconnect.
I think the efficiency gains from having smaller dies would offset that. The big problem with the chiplet approach is the very high idle power draw: the I/O die cannot be powered down or power-gated because it always needs to be 'ready to work' and to wake the chiplets. For now, at least, it seems that way.
 

Gideon

Golden Member
Nov 27, 2007
1,646
3,709
136
PS5 specs were unveiled today: 36 CUs @ 2.23 GHz max. It bodes well that RDNA 2 can clock so high.

Yes. Actually the PS5 GPU is surprisingly similar to the Radeon 5700 (non-XT). They have the same number of CUs and the same memory bandwidth. Both have a similar style of boost clock that is achievable in the vast majority of games (so the PS5 boost should be very similar to the Radeon's Game Clock). The Radeon 5700's TDP is 180 W. We don't know the console's TDP, but its GPU portion is almost guaranteed to be below 180 W (considering the PS4 Pro had 180 W for the entire console). Finally, if I remember correctly, Navi is actually made on TSMC's N7P process, not vanilla N7 (as Renoir is).

This in turn makes them surprisingly comparable. The same info in table form:

                                        Radeon 5700    PS5
CUs                                     36             36
Boost Clock / Game Clock (MHz)          1650           2233
Clock Speed (compared to Radeon 5700)   100%           137%
TDP                                     180 W          ? (should be in the ballpark, possibly even less)
Memory Bandwidth                        448.0 GB/s     448.0 GB/s
Memory Capacity                         8 GB           16 GB (shared)
Process                                 TSMC N7(P?)    TSMC N7P

This bodes very well for RDNA 2.0 on desktop. The clocks should improve 30% at minimum, probably closer to 40+% for the same CU count and TDP.

Now it's really interesting to know whether RDNA 1.0 really does use N7P (I can't find the source, but I distinctly remember it). Anyway, AMD has clearly stated (in slides and the article quoted below) that whatever 7nm process RDNA 2.0 uses, it will be improved compared to RDNA 1.0's. If RDNA 1.0 did indeed use N7P, the new one must be either N7+ or even N6 (unlikely considering ramp-up time and other factors). Anyway, if that's the case, there should be a little extra performance coming from that too.
Process nodes will also play some kind of a role here. While AMD is still going to be on a 7nm process here – and they are distancing themselves from saying that they'll be using TSMC’s EUV-based “N7+” node – the company has clarified that they will be using an enhanced version of 7nm. To what extent those enhancements are we aren’t sure (possibly using TSMC’s N7P?), but AMD won’t be standing still on process tech.
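For reference, the clock-scaling arithmetic behind the table above, with a hypothetical extrapolation to desktop RDNA 2 at the same CU count (an illustration of the reasoning, not a prediction):

```python
# PS5 vs. RX 5700 clock ratio, then what a similar uplift would mean
# for a desktop RDNA 2 part at the same CU count (hypothetical).
rx5700_game_clock = 1650   # MHz, from the table above
ps5_gpu_clock = 2233       # MHz, peak

ratio = ps5_gpu_clock / rx5700_game_clock
print(f"PS5 vs. RX 5700: +{(ratio - 1) * 100:.0f}%")        # ~ +35%

for uplift in (0.30, 0.35, 0.40):
    print(f"+{uplift:.0%} -> {rx5700_game_clock * (1 + uplift):.0f} MHz")
```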
 
  • Like
Reactions: Elfear and Saylick

soresu

Platinum Member
Dec 19, 2014
2,664
1,863
136
Micron is finally launching an HBM2 product this year, apparently - most auspicious if Big Navi uses HBM, and at the very least it's likely to benefit CDNA costs.
 

NobleX13

Member
Apr 7, 2020
27
18
41
I am so excited for "Big Navi". Not because I am overconfident that it will suddenly hold the performance crown, but because this will ultimately drive NVIDIA to not hold anything back, and possibly stave off another round of NVIDIA price increases.

Let's just hope TSMC has enough capacity to make these GPUs alongside all of the console APUs they are churning out these days. Not to mention AMD's CPU line.