Discussion Nvidia Blackwell in Q4-2024 ?


Saylick

Diamond Member
Sep 10, 2012
3,035
6,059
136
Yeah, doesn't seem like we're going to get an intermediate generation or refresh. Next consumer release from Nvidia will be Blackwell on TSMC N3E in 2025, likely their last monolithic node at the high end because TSMC N2 likely uses high-NA, which has a reticle limit that is half of the current one.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
Reticle limit is 858 mm^2 so there's room to grow. We've seen this in the past where NVidia has two generations on the same node. Smaller designs in the first generation mean better yields and room to grow. By the time the second generation goes into production the process should be mature enough to support the bigger dies.
We'll never see a consumer die that large. When Nvidia needs to move to TSMC's High NA node (N2P) then that reticle limit drops by half (IIRC). That, and yield requirements will make tiles a must even on high end consumer GPUs.
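The reticle arithmetic above can be sketched quickly. This is only an illustration: the 26 mm x 33 mm full field gives the 858 mm^2 figure quoted above, the half field assumes high-NA scanners halve one dimension to 26 mm x 16.5 mm, and the 609 mm^2 figure is AD102's die size as cited later in this thread.

```python
# Full EUV exposure field vs. the half field of a high-NA scanner.
FULL_FIELD_MM = (26.0, 33.0)   # 26 x 33 = 858 mm^2, the limit quoted above
HALF_FIELD_MM = (26.0, 16.5)   # assumption: high-NA halves one field dimension


def field_area(dims):
    """Area of an exposure field in mm^2."""
    width, height = dims
    return width * height


def fits(die_area_mm2, dims):
    """Area-only check; a real die must also fit the field's width and height."""
    return die_area_mm2 <= field_area(dims)


AD102_MM2 = 609.0  # AD102 die size as cited elsewhere in this thread

full = field_area(FULL_FIELD_MM)
half = field_area(HALF_FIELD_MM)
print(f"full field: {full} mm^2, half field: {half} mm^2")
print("AD102-sized die fits full field:", fits(AD102_MM2, FULL_FIELD_MM))
print("AD102-sized die fits half field:", fits(AD102_MM2, HALF_FIELD_MM))
```

On area alone, a die the size of AD102 would not fit in a halved field, which is the argument for tiles on such a node.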

Edit: @Saylick beat me by mere minutes :confused:
 

MoogleW

Member
May 1, 2022
32
21
41
Can someone tell me why this is compared to RDNA5 despite AMD's 2 year cadence?

As usual, Nvidia probably designed new tensor cores and maybe will turn one of their research papers into DLSS 4, like the texture compression/upscaling, neural radiance cache, etc.

On a hardware level, I don't think they will go lower than FP8, so any model using it should support the RTX 40 series as well. It will be interesting to see if the RT core gets the same attention as this gen, with new extra features to offload even more work from the shaders plus doubled performance.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,152
136
According to RedGamingTech, Nvidia is 'breaking tradition' with its SM design
Assuming SM here means streaming multiprocessors, RGT is not the one who broke that news. I heard that rumor about Blackwell almost 2 years ago. I don't know much about video cards other than that this is a very big deal for Nvidia.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,152
136
Can someone tell me why this is compared to RDNA5 despite AMD's 2 year cadence?
A leak and confirmation came out a while back that there won't be any high end cards for RDNA4 that compete with Nvidia's high end. That doesn't explain why RDNA5 is being compared here, because that would in theory come out a year after Blackwell, in 2026. I'm as confused as you are.
 

MoogleW

Member
May 1, 2022
32
21
41
A leak and confirmation came out a while back that there won't be any high end cards for RDNA4 that compete with Nvidia's high end. That doesn't explain why RDNA5 is being compared here, because that would in theory come out a year after Blackwell, in 2026. I'm as confused as you are.
Leakers Kopite7kimi and AGF are now disputing the 2025 rumor; the slide was potentially fabricated or misunderstood.
 

Tigerick

Senior member
Apr 1, 2022
476
409
96

Blackwell-Perf.png

Well, XDA picked up the leaks from the Chiphell forum (it is in Chinese, BTW, and I got the English version), which basically confirm the 512-bit memory bus that I calculated, with more details:
  • +52% memory bandwidth: my figure is +50%.
  • +78% cache, equal to 128MB of L2 cache: a pretty large area of cache, even though the N3E process won't scale it much. With a 512-bit memory bus, that means each memory controller is connected to 16MB of L2 cache.
  • +50% scale most likely refers to CUDA cores, which equals 24,576 CUDA cores. XDA mentioned 192 SMs in total, which might not be the correct number because nVidia is going to change the structure of the SM. Nonetheless, the increase in CUDA cores aligns with the memory bandwidth improvement.
  • +15% frequency increase: not much, because most of the advancement is in the number of CUDA cores.
  • 1.7X improvement: the 70% improvement most likely refers to rasterization performance. The RTX 4090 has an average relative performance improvement of 45%. If nVidia manages to pull it off, then yeah, Blackwell's performance upgrade will be bigger than Ada Lovelace's.
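For anyone who wants to check where the figures in the list come from, here is a rough sketch. It assumes the percentages are relative to the RTX 4090 as shipped (16,384 CUDA cores, 72MB of L2, 384-bit GDDR6X at 21 Gbps), and that "16MB per controller" implies 64-bit controllers; neither assumption is stated in the leak itself.

```python
# RTX 4090 baseline (published specs); assumed to be the reference point of the leak.
BASE_CUDA_CORES = 16_384
BASE_L2_MB = 72
BASE_BUS_BITS = 384
BASE_GBPS_PER_PIN = 21.0
BASE_BANDWIDTH_GB_S = BASE_BUS_BITS * BASE_GBPS_PER_PIN / 8  # 1008 GB/s

# Apply the leaked scaling factors.
cuda_cores = BASE_CUDA_CORES * 1.50
l2_mb = BASE_L2_MB * 1.78
bandwidth_gb_s = BASE_BANDWIDTH_GB_S * 1.52

# L2 slice per memory controller on a 512-bit bus.
BUS_BITS = 512
CONTROLLER_BITS = 64  # assumption: this is what "16MB each" implies
controllers = BUS_BITS // CONTROLLER_BITS
l2_per_controller_mb = 128 / controllers

# Per-pin data rate needed to reach +52% bandwidth on a 512-bit bus.
gbps_per_pin = bandwidth_gb_s * 8 / BUS_BITS

print(f"CUDA cores: {cuda_cores:.0f}")
print(f"L2 cache: {l2_mb:.1f} MB")
print(f"L2 per controller: {l2_per_controller_mb:.0f} MB across {controllers} controllers")
print(f"Bandwidth: {bandwidth_gb_s:.0f} GB/s, implying {gbps_per_pin:.1f} Gbps per pin")
```

Under these assumptions the +52% bandwidth figure comes out to roughly 24 Gbps per pin on a 512-bit bus, so most of the gain would come from bus width rather than memory speed.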
I actually maintain a table of the full RTX 5000 series range, and the most significant upgrades in the Blackwell series are actually in the 2nd and 3rd tiers. With the flagship GB202 having a 512-bit memory bus, GB203 is most likely going to get a 384-bit bus like the RTX 4090, and GB205 a 256-bit bus like the RTX 4080. The bandwidth improvement will be even bigger than GB202's, thus I am expecting the upcoming 5070 Ti and 5080 Ti to have at least +50% performance at much lower price points.
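A quick sketch of the bus-width part of that argument. The Ada widths are the shipping ones (AD102 384-bit, AD103 256-bit, AD104 192-bit); the Blackwell widths are the rumored ones from this post, and pairing GB205 with AD104 as the "3rd tier" is my assumption. Memory speed is deliberately left out, so this only shows the uplift from width alone.

```python
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_bits * gbps_per_pin / 8


# Bus widths in bits, by die tier.
ADA_BUS = {"tier 1 (AD102)": 384, "tier 2 (AD103)": 256, "tier 3 (AD104)": 192}
RUMORED_BUS = {"tier 1 (AD102)": 512, "tier 2 (AD103)": 384, "tier 3 (AD104)": 256}


def width_uplift(tier):
    """Fractional bandwidth gain from bus width alone, at an unchanged per-pin rate."""
    return RUMORED_BUS[tier] / ADA_BUS[tier] - 1


for tier in ADA_BUS:
    print(f"{tier}: {ADA_BUS[tier]}-bit -> {RUMORED_BUS[tier]}-bit, "
          f"{width_uplift(tier):+.0%} from width alone")

# Sanity check of the formula against the RTX 4090 (384-bit at 21 Gbps).
print(bandwidth_gb_s(384, 21.0), "GB/s")
```

If the rumored widths hold, the second tier gets the largest width-only gain, which is consistent with the claim that the lower tiers see the bigger bandwidth jump.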
 

jpiniero

Lifer
Oct 1, 2010
14,391
5,109
136
+78% cache, equal to 128MB of L2 cache: a pretty large area of cache, even though the N3E process won't scale it much. With a 512-bit memory bus, that means each memory controller is connected to 16MB of L2 cache.

That's the part that's sketchy. With no scaling at all, that's a lot of very expensive real estate taken up by the L2. If this is even remotely accurate, it almost means that GB202 is near reticle size. Again, it seems like that's $3k+ Titan territory and not a 5090.
 

KompuKare

Golden Member
Jul 28, 2009
1,004
900
136
The bandwidth improvement will be even bigger than GB202's, thus I am expecting the upcoming 5070 Ti and 5080 Ti to have at least +50% performance at much lower price points.
Could I interest you in buying a bridge?!

Anyhow, what these rumours are missing is node and die size.
Those are the things which will determine Nvidia's cost to make - not that cost to make means that much for a fabless company with ~70% margins.
On 5nm ("4N"), AD102 at 609mm², for instance, doesn't have much scope to go bigger.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
That's the part that's sketchy. With no scaling at all, that's a lot of very expensive real estate taken up by the L2. If this is even remotely accurate, it almost means that GB202 is near reticle size. Again, it seems like that's $3k+ Titan territory and not a 5090.
I do wonder if NV is going to use a chiplet/tile architecture to get the % of KGD they need to keep costs under control (even though packaging will be more expensive).
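To put rough numbers on the KGD point, here is a minimal sketch using a simple Poisson yield model. The defect density and die sizes are hypothetical values picked for illustration, not TSMC or Nvidia figures, and the model ignores packaging cost, wafer edge loss, and the fact that partially defective dies can be sold as cut-down parts.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under a Poisson model: Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(die_area_mm2, dies_per_product, defects_per_cm2):
    """Wafer area (mm^2) consumed per working product, assuming every die
    is tested before packaging so only known good dies are assembled."""
    die_yield = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / die_yield


D0 = 0.1  # defects per cm^2: hypothetical, for illustration only

mono = silicon_per_good_product(800.0, 1, D0)   # one near-reticle die
tiled = silicon_per_good_product(400.0, 2, D0)  # two half-size tiles

print(f"monolithic die yield: {poisson_yield(800.0, D0):.1%}")
print(f"tile yield:           {poisson_yield(400.0, D0):.1%}")
print(f"silicon per good product: {mono:.0f} mm^2 vs {tiled:.0f} mm^2")
```

With these made-up inputs the tiled design burns noticeably less silicon per working product, which is the trade-off against the more expensive packaging.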
 

jpiniero

Lifer
Oct 1, 2010
14,391
5,109
136
I do wonder if NV is going to use a chiplet/tile architecture to get the % of KGD they need to keep costs under control (even though packaging will be more expensive).

Appears they aren't with Blackwell Gaming. It sounds like the compute product uses MCM.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,797
106
  • +50% scale most likely refers to CUDA cores, which equals 24,576 CUDA cores. XDA mentioned 192 SMs in total, which might not be the correct number because nVidia is going to change the structure of the SM. Nonetheless, the increase in CUDA cores aligns with the memory bandwidth improvement.
  • +15% frequency increase: not much, because most of the advancement is in the number of CUDA cores.
  • 1.7X improvement: the 70% improvement most likely refers to rasterization performance. The RTX 4090 has an average relative performance improvement of 45%. If nVidia manages to pull it off, then yeah, Blackwell's performance upgrade will be bigger than Ada Lovelace's.
RTX 4090 vs RTX 3090TI
SM: +52% (128SM vs 84SM)
Base frequency: +43%(boost: +35.5%)
Raster performance 4K: +47%
RT performance 4K: +51%
Yet from Blackwell we should expect 70% higher performance from 72.5% more TFLOPs?
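The comparison above can be written out as a small calculation. It is a sketch under one simplifying assumption: theoretical TFLOPs scale with unit count times clock, with the same work per unit per clock in both generations. All inputs are the figures quoted in this post and in the leak.

```python
def tflops_scale(unit_scale, clock_scale):
    """Theoretical throughput scaling, assuming unchanged work per unit per clock."""
    return unit_scale * clock_scale


def scaling_efficiency(perf_scale, throughput_scale):
    """How much of the theoretical throughput gain shows up as real performance."""
    return perf_scale / throughput_scale


# Ada: RTX 4090 vs RTX 3090 Ti (128 vs 84 SMs, boost clock +35.5%, 4K raster +47%).
ada_tflops = tflops_scale(128 / 84, 1.355)
ada_eff = scaling_efficiency(1.47, ada_tflops)

# Rumored Blackwell: +50% units, +15% clock, 1.7x claimed performance.
bw_tflops = tflops_scale(1.50, 1.15)
bw_eff = scaling_efficiency(1.70, bw_tflops)

print(f"Ada:       {ada_tflops:.2f}x TFLOPs -> 1.47x raster, efficiency {ada_eff:.2f}")
print(f"Blackwell: {bw_tflops:.3f}x TFLOPs -> 1.70x claimed, efficiency {bw_eff:.2f}")
```

Ada turned roughly twice the TFLOPs into about 1.5x the raster performance, so the rumor would need Blackwell to scale almost linearly with TFLOPs.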
 

eek2121

Platinum Member
Aug 2, 2005
2,860
3,833
136
RTX 4090 vs RTX 3090TI
SM: +52% (128SM vs 84SM)
Base frequency: +43%(boost: +35.5%)
Raster performance 4K: +47%
RT performance 4K: +51%
Yet from Blackwell we should expect 70% higher performance from 72.5% more TFLOPs?
Looks to be about 30-40% from architecture and the rest from frequency, if that is accurate.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
Appears they aren't with Blackwell Gaming. It sounds like the compute product uses MCM.
Well, we knew that for compute. If not this gen, then the next gen of gaming GPUs will, at least for the x80 and x90 GPUs. Massive dice just aren't going to be economical, and scaling is running out of gas until more vertical xtors with different materials come online, beyond GAA devices. AMD is on the right track, maybe just a bit early given the challenges at hand.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
That's the part that's sketchy. With no scaling at all, that's a lot of very expensive real estate taken up by the L2. If this is even remotely accurate, it almost means that GB202 is near reticle size. Again, it seems like that's $3k+ Titan territory and not a 5090.

I mean, at least other than that it sounds like Nvidia and way more real than "512bit".

GDDR7, way too big and too much power just like the 4090 so they have to cut it down just to get in a 450w tdp. So I'll not discount it completely, but I'd think they'd want more bandwidth for the L2 instead of making it bigger, and save the sram for bigger L1 and register.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
I mean, at least other than that it sounds like Nvidia and way more real than "512bit".

GDDR7, way too big and too much power just like the 4090 so they have to cut it down just to get in a 450w tdp. So I'll not discount it completely, but I'd think they'd want more bandwidth for the L2 instead of making it bigger, and save the sram for bigger L1 and register.
That's a fair point. Both companies are upping SRAM (on chip and on package) to eke out more performance, rather than relying on screaming high amounts of DRAM bandwidth (which has terrible latency by comparison).
 

jpiniero

Lifer
Oct 1, 2010
14,391
5,109
136

kopite7kimi seems to think GB202 only has 192 SMs, or about a third more than AD102. They could increase the number of cores per SM. Still thinks it's 512-bit too.
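A rough check of the SM arithmetic, assuming the full AD102 has 144 SMs, the RTX 4090 enables 128 of them, and cores per SM stay at Ada's 128 (the last one being exactly the thing that might change):

```python
AD102_FULL_SMS = 144    # full AD102 die
RTX_4090_SMS = 128      # SMs enabled on the RTX 4090
GB202_SMS = 192         # rumored
ADA_CORES_PER_SM = 128  # assumption: unchanged from Ada

vs_full_die = GB202_SMS / AD102_FULL_SMS - 1
vs_4090 = GB202_SMS / RTX_4090_SMS - 1
cores_at_ada_ratio = GB202_SMS * ADA_CORES_PER_SM

print(f"vs full AD102: {vs_full_die:+.1%}")
print(f"vs RTX 4090:   {vs_4090:+.1%}")
print(f"CUDA cores at 128 per SM: {cores_at_ada_ratio}")
```

So 192 SMs is a third more than the full AD102 but +50% over the 4090, which lines up with the "+50% scale" and 24,576 CUDA cores from the Chiphell leak without needing more cores per SM.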
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136

kopite7kimi seems to think GB202 only has 192 SMs, or about a third more than AD102. They could increase the number of cores per SM. Still thinks it's 512-bit too.
Yes, but the 5090 may not use all 512b. Could be that 512b is used for an enterprise product. Might be that the 5090 is cut down a bit from the max SMs and only has a 384b bus. Of course, as mentioned just above, the upcoming changes in the SMs could change the performance quite a bit and make the comparison unequal.
 

jpiniero

Lifer
Oct 1, 2010
14,391
5,109
136
Yes, but the 5090 may not use all 512b. Could be that 512b is used for an enterprise product. Might be that the 5090 is cut down a bit from the max SMs and only has a 384b bus. Of course, as mentioned just above, the upcoming changes in the SMs could change the performance quite a bit and make the comparison unequal.

I suspect the 5090 will be the full GB203 with a 384-bit bus. I also suspect GB202 is going to be close to the reticle limit of N3E, so any product has to be stupidly expensive.

Edit: If GB203 is 7*8, that would make the full GB203 have 112 SMs, or about 12.5% fewer than the 4090. It'd still be faster of course because of the bandwidth.
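The SM count in that edit can be reproduced with a short sketch. It reads "7*8" as 7 GPCs with 8 TPCs each and assumes 2 SMs per TPC, as on recent Nvidia architectures; both readings are assumptions, not confirmed GB203 details.

```python
def sm_count(gpcs, tpcs_per_gpc, sms_per_tpc=2):
    """Total SMs for a GPC x TPC layout; 2 SMs per TPC is an assumption."""
    return gpcs * tpcs_per_gpc * sms_per_tpc


RTX_4090_SMS = 128  # SMs enabled on the RTX 4090

gb203_sms = sm_count(7, 8)
delta_vs_4090 = gb203_sms / RTX_4090_SMS - 1

print(f"GB203 (7 GPCs x 8 TPCs): {gb203_sms} SMs")
print(f"vs RTX 4090: {delta_vs_4090:+.1%}")
```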
 