Discussion: Nvidia Blackwell in Q4 2024?

Tigerick

Senior member
Apr 1, 2022
Here comes the thread for discussion of the whole future Blackwell GPU family, aka the RTX 5000 series. We also know the codenames of all five dies: GB202, GB203, GB205, GB206 and GB207. Below is a table with some speculation of my own - you guys are welcome to pitch in :p I also put in the upcoming RDNA5 for comparison's sake...


| Codename | Possible Model Number | Possible Memory Configuration | Possible Mobile GPU | Possible AMD response |
|---|---|---|---|---|
| GB202 | 5090 Ti | 384-bit 36 GB GDDR7 | | |
| | 5090 | 384-bit 24 GB GDDR7 | | |
| GB203 | 5080 Ti | 384-bit 24 GB GDDR6X | | N51 384-bit 24GB GDDR7 |
| | 5080 | 320-bit 20 GB GDDR6X | | N51 320-bit 20GB GDDR7 |
| GB205 | 5070 Ti | 256-bit 16 GB GDDR6X | 256-bit 16GB GDDR6 | N52 256-bit 16GB GDDR7 |
| GB206 | 5070 | 192-bit 12 GB GDDR6X | 192-bit 12GB GDDR6 | N53 128-bit 12GB GDDR7 |
| | 5060 Ti | 192-bit 12 GB GDDR6 | 128-bit 8GB GDDR6 ? | |
| GB207 | 5060 | 128-bit 16 GB GDDR6 | 128-bit 8GB GDDR6 | |



There were rumors about 384-bit GDDR7 memory before; I believe nVidia is testing both 384-bit GDDR7 and 512-bit GDDR6X and will then decide on the GDDR7 option. Both configurations have the same bandwidth even though the interface width is different.
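A quick back-of-the-envelope check of that equivalence (a sketch; the 32 Gbps and 24 Gbps per-pin rates are the rumored figures, not confirmed specs):

```python
# Aggregate memory bandwidth in GB/s:
# (bus width in bits) * (per-pin data rate in Gbps) / 8 bits-per-byte.
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8

print(bandwidth_gb_s(384, 32))  # 384-bit GDDR7  @ 32 Gbps -> 1536.0 GB/s
print(bandwidth_gb_s(512, 24))  # 512-bit GDDR6X @ 24 Gbps -> 1536.0 GB/s
# Identical ~1.5 TB/s either way, which is why the two options are
# interchangeable from a pure bandwidth standpoint.
```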

As for the number of CUDA cores: nVidia is changing the architecture with Blackwell, and so far there are no leaks about the structure of the new shader model... However, we should expect at least a 50% performance improvement, given the extra 50% memory bandwidth...

Update 1:
| Model | Codename | Die Size (mm²) | SM/CU | CUDA | L2 Cache | Memory | Memory BW | BW +/- | TPU Perf |
|---|---|---|---|---|---|---|---|---|---|
| RTX 3090 Ti | GA102 | 628 | 84 | 10752 | 6 MB | 384-bit 24GB GDDR6X | 1 TB/s | | |
| RTX 4090 | AD102 | 609 | 128 | 16384 | 72 MB | 384-bit 24GB GDDR6X | 1 TB/s | +0% | +45% |
| RTX 5090 | GB202 | ? | ? | ? | 84 MB? | 384-bit 24GB GDDR7 | <1.5 TB/s | <+50% | ? |
| RTX 5090 Ti | GB202 | ? | 192? | ? | 96 MB? | 384-bit 36GB GDDR7 | 1.5 TB/s | +50% | ? |
| RTX 3080 Ti | GA102 | 628 | 80 | 10240 | 6 MB | 384-bit 12GB GDDR6X | 912.4 GB/s | | |
| RTX 4080 | AD103 | 379 | 76 | 9728 | 64 MB | 256-bit 16GB GDDR6X | 716 GB/s | | +33% |
| RTX 4080 Super | AD103 | 379 | 80 | 10240 | 64 MB | 256-bit 16GB GDDR6X | 736.3 GB/s | +3% | +2% |
| RTX 5080 | GB203 | ? | ? | ? | 80 MB | 320-bit 20GB GDDR6X | 960 GB/s (75% of N51) | +30%? | |
| RTX 5080 Ti | GB203 | ? | 144? | ? | 96 MB | 384-bit 24GB GDDR6X | 1152 GB/s (75% of N51) | +56%? | |
| RTX 3080 | GA102 | 628 | 68 | 8704 | 5 MB | 320-bit 10GB GDDR6X | 760.3 GB/s | | |
| RTX 4070 Ti | AD104 | 294 | 60 | 7680 | 48 MB | 192-bit 12GB GDDR6X | 504.2 GB/s | -34% | +17% |
| RTX 4070 Ti Super | AD103 | 379 | 66 | 8448 | 48 MB | 256-bit 16GB GDDR6X | 672.3 GB/s | +33% | +9% |
| RTX 5070 Ti | GB205 | ? | 80? | ? | 64 MB | 256-bit 16GB GDDR6X | 768 GB/s (75% of N52) | +14%? | |
| RTX 3070 Ti | GA104 | 392 | 48 | 6144 | 4 MB | 256-bit 8GB GDDR6X | 608.3 GB/s | | |
| RTX 4070 | AD104 | 294 | 46 | 5888 | 36 MB | 192-bit 12GB GDDR6X | 504.2 GB/s | -17% | +14% |
| RTX 4070 Super | AD104 | 294 | 56 | 7168 | 48 MB | 192-bit 12GB GDDR6X | 504.2 GB/s | 0% | +16% |
| RTX 5070 | GB206 | ? | 60? | ? | 48 MB | 192-bit 12GB GDDR6X | 576 GB/s (12.5% > N53) | +14%? | |
| RTX 3060 Ti | GA104 | 392 | 38 | 4864 | 4 MB | 256-bit 8GB GDDR6 | 448 GB/s | | |
| RTX 4060 Ti | AD106 | 188 | 34 | 4352 | 32 MB | 128-bit 8/16 GB GDDR6 | 288 GB/s | -36% | +11% |
| RTX 5060 Ti | GB206 | ? | ? | ? | 48 MB? | 192-bit 12GB GDDR6 | 480 GB/s | +67%? | |
| RTX 3060 | GA106 | 276 | 28 | 3584 | 3 MB | 192-bit 12GB GDDR6 | 360 GB/s | | |
| RTX 4060 | AD107 | 159 | 24 | 3072 | 24 MB | 128-bit 8GB GDDR6 | 272 GB/s | -25% | +18% |
| RTX 5060 | GB207 | ? | 24-36? | ? | 32 MB? | 128-bit 16GB GDDR6 | 320 GB/s | +18%? | |
| RX 6950 XT | Navi 21 | 520 | 80 | 5120 | 128 MB | 256-bit 16GB GDDR6 | 576 GB/s | | |
| RX 7900 XTX | Navi 31 | 529 | 96 | 6144 | 96 MB | 384-bit 24GB GDDR6 | 960 GB/s | +67% | +36% |
| RX 8900 XT? | Navi 51 | ? | ? | ? | 96 MB? | 384-bit 24GB GDDR7 | 1.5 TB/s | +60%? | |
| RX 6750 XT | Navi 22 | 335 | 40 | 2560 | 96 MB | 192-bit 12GB GDDR6 | 432 GB/s | | |
| RX 7800 XT | Navi 32 | 346 | 60 | 3840 | 64 MB | 256-bit 16GB GDDR6 | 620.8 GB/s | +44% | +40% |
| RX 8800 XT? | Navi 52 | ? | ? | ? | 64 MB? | 256-bit 16GB GDDR7 | 1 TB/s | +65%? | |
| RX 7700 XT | Navi 32 | 346 | 54 | 3456 | 48 MB | 192-bit 12GB GDDR6 | 432 GB/s | | |
| RX 8700 XT? | Navi 53 | ? | ? | ? | 32 MB? | 128-bit 12GB GDDR7 | 512 GB/s | +18.5%? | |



Update 2:
| Die | My version | Size | Memory BW | L2 Cache | SM? | RGT version | Size | Memory BW | L2 Cache | SM |
|---|---|---|---|---|---|---|---|---|---|---|
| GB202 | 384-bit GDDR7 | 36 GB | 1.5 TB/s | 96 MB | 192 | 384-bit GDDR7 | 36 GB | 1.5 TB/s | 96 MB | 192 |
| | 384-bit GDDR7 | 24 GB | <1.5 TB/s | 84 MB | ? | | | | | |
| GB203 | 384-bit GDDR6X | 24 GB | 1.15 TB/s | 96 MB | 144 | 256-bit GDDR7 | 24 GB | 1 TB/s | 64 MB | 108 |
| | 320-bit GDDR6X | 20 GB | 960 GB/s | 80 MB | ? | | 16 GB :p | | 64 MB | |
| GB205 | 256-bit GDDR6X | 16 GB | 768 GB/s | 64 MB | 80 | 192-bit GDDR7 | 18 GB | 768 GB/s | 48 MB | 72 |
| GB206 | 192-bit GDDR6X | 12 GB | 576 GB/s | 48 MB | 60 | 128-bit GDDR7 | 12 GB | 512 GB/s | 32 MB | 44 |
| | 192-bit GDDR6 | 12 GB | 480 GB/s | 48 MB | ? | | 8 GB :p | | 32 MB | |
| GB207 | 128-bit GDDR6 | 16 GB | 320 GB/s | 32 MB | 24-36 | 96-bit GDDR7 | 9 GB | 384 GB/s | 32 MB | 28 |
 

Ajay

Lifer
Jan 8, 2001
With Kopite7kimi pretty much confirming the 512-bit memory bus of the upcoming Blackwell GPU, here comes the thread for discussion of the whole future Blackwell GPU family, aka the RTX 5000 series. We also know the codenames of all five dies: GB202, GB203, GB205, GB206 and GB207. Below is a table with some speculation of my own - you guys are welcome to pitch in :p I also put in the upcoming RDNA5 for comparison's sake...



| Codename | Possible Model Number | Possible Memory Configuration | Possible Mobile GPU | Possible AMD response |
|---|---|---|---|---|
| GB202 | 5090 Ti | 512-bit 32GB | | |
| | 5090 | 448-bit 28GB | | |
| GB203 | 5080 Ti | 384-bit 24GB | | N51 384-bit 24GB GDDR7 |
| | 5080 | 320-bit 20GB | | N51 320-bit 20GB GDDR7 |
| GB205 | 5070 Ti | 256-bit 16GB | 256-bit 16GB GDDR6 | N52 256-bit 16GB GDDR7 |
| GB206 | 5070 | 192-bit 12GB | 192-bit 12GB GDDR6 | N52 192-bit 12GB GDDR7 |
| GB207 | 5060 | 128-bit 8/16 GB | 128-bit 8GB GDDR6 | N43/N44/N53 ??? |
| | 5050 | 128-bit 8GB | | |

Even though we know nVidia will implement a 512-bit memory bus, we don't know what type of memory they will choose. I've listed all the possible memory choices below:

| Option | RTX 4090 equivalent | BW gain | Pros | Cons |
|---|---|---|---|---|
| 384-bit GDDR6(X) (today) | 24GB 21Gbps, 1 TB/s | baseline | | |
| 384-bit GDDR7 | 24GB 32Gbps, 1.5 TB/s | +50% | Lower TDP due to GDDR7's lower voltage; extra 50% bandwidth | More expensive than GDDR6X |
| 512-bit GDDR7 | 32GB 32Gbps, 2 TB/s | +100% | Lower TDP due to GDDR7's lower voltage; bigger memory size; double the memory bandwidth | Highest BOM; more PCB layers to accommodate 16 memory chips |
| 512-bit GDDR6X | 32GB 24Gbps, 1.5 TB/s | +50% | Bigger memory size; extra 50% bandwidth; cheaper to source | More PCB layers to accommodate 16 memory chips; highest TDP |

For comparison, the AMD Radeon RX 7900 XTX sits at 24GB 20Gbps, 960 GB/s today; a 384-bit GDDR7 version would be 24GB 32Gbps, 1.5 TB/s (+60%).
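The chip counts behind the "16 pcs memory chips" con follow directly from the 32-bit interface of each GDDR6X/GDDR7 package. A sketch, assuming today's common 16 Gbit (2 GB) chip density:

```python
# Each GDDR6X/GDDR7 package has a 32-bit interface, so chip count
# is fixed by the bus width; capacity is chips * per-chip density.
def chips(bus_bits: int) -> int:
    return bus_bits // 32

def capacity_gb(bus_bits: int, gb_per_chip: int = 2) -> int:
    return chips(bus_bits) * gb_per_chip

print(chips(384), capacity_gb(384))  # 12 chips -> 24 GB
print(chips(512), capacity_gb(512))  # 16 chips -> 32 GB, hence the PCB-routing concern
```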

There were rumors about 384-bit GDDR7 memory before; I believe nVidia is testing both 384-bit GDDR7 and 512-bit GDDR6X and will then decide on the GDDR6X option. Both configurations have the same bandwidth even though the interface width is different.

Hmm, it seems to me that GDDR6X makes more sense compared to GDDR7 - what do you think?

As for the number of CUDA cores: nVidia is changing the architecture with Blackwell, and so far there are no leaks about the structure of the new shader model... However, we should expect at least a 50% performance improvement, given the extra 50% memory bandwidth...

Since the release date is somewhere in 2025, I think GDDR7 is highly probable - at least for GB102-based GFX AIBs. Power consumption per Gb is a bit high on the current gen of GDDR7, but, at least with Samsung, there is a gen2 in the works that will use less power per Gb (not sure if this comes from a process change or an implementation change). As for everything else - who knows. Well, who knows that can talk! No details on uarch changes, CCs, SMs, RT cores, etc. The one thing that's clear, given current trends, is that you'd better be prepared to sell one of your children to buy the top SKU!
 

CakeMonster

Golden Member
Nov 22, 2012
I don't feel qualified to guess at all, but right now I'm just hoping for better memory bandwidth as well as a top card with 32GB, for hobbyist AI stuff. If more AI progress is made, those will no doubt be useful in all sorts of cool things we can hardly imagine now. For gaming, I frankly have no idea, as the various upscaling and frame generation technologies don't necessarily scale with the same specs that raster did before, and those are taking over - so hardware support for features unknown as of now might be more impactful. I can't imagine there would be a gaming need for >24GB even if it releases in 2025, but I'll of course be happy to take the headroom if the price isn't more than a 4090.

Edit: Also, given the CPU limitations rearing their head these days for those with 4090s, we might have different wish-list specs for the 50xx series depending on how CPUs perform when we're closer to release, as both AMD's and Intel's actual next gen (not just a refresh) should be out by then.
 

Mopetar

Diamond Member
Jan 31, 2011
If I had to guess it's unlikely that the consumer card will get the full bus. The (mostly) uncut cards are going into the data center. Consumer cards are the salvage parts and with a bus that big it's getting cut down.
 

Ajay

Lifer
Jan 8, 2001
If I had to guess it's unlikely that the consumer card will get the full bus. The (mostly) uncut cards are going into the data center. Consumer cards are the salvage parts and with a bus that big it's getting cut down.
Datacenter? Hope not, that would be lame (though I guess some of that is currently happening with the 4090). Workstation cards will be built off the top two chips - so there could be a supply problem. We'll just have to see what Nvidia's wafer allotment is.
 

MoogleW

Member
May 1, 2022
If I had to guess it's unlikely that the consumer card will get the full bus. The (mostly) uncut cards are going into the data center. Consumer cards are the salvage parts and with a bus that big it's getting cut down.
If by datacenter you mean Quadro, then you're likely right, although that doesn't matter much since they typically seem to target the same performance - Quadro just gets there with more hardware, while consumer does it with higher clocks.

Like the 4090 vs the RTX A6000 Ada: the 4090 often outperforms it despite having less hardware, but the A6000 Ada is more efficient and in a smaller footprint.

I have been wondering why there are dozens of pages of RDNA4 and CDNA3 but absolutely no discussion of GB100 and GB102/202
 

Ajay

Lifer
Jan 8, 2001
I have been wondering why there are dozens of pages of RDNA4 and CDNA3 but absolutely no discussion of GB100 and GB102/202
Probably because Nvidia has been very tight-lipped on this and has been successful at keeping info out of leakers' hands, for now.
 

jpiniero

Lifer
Oct 1, 2010
I think the reason that GB202 has 512-bit is mainly for Quadro (ie: they want more memory capacity) but maybe they will do a Titan as well. It would not surprise me if they slash the L2 cache size because of N3E's lack of SRAM scaling... and really on the lower models it also wouldn't surprise me if they also cut the memory controller count too. GDDR7 is likely not going to be cheap either.

ie:

GB202 512-bit
GB203 320-bit
GB205 192-bit
GB206 128-bit
GB207 128-bit (or less)
 

Ajay

Lifer
Jan 8, 2001
I think the reason that GB202 has 512-bit is mainly for Quadro (ie: they want more memory capacity) but maybe they will do a Titan as well. It would not surprise me if they slash the L2 cache size because of N3E's lack of SRAM scaling... and really on the lower models it also wouldn't surprise me if they also cut the memory controller count too. GDDR7 is likely not going to be cheap either.
Lack of SRAM scaling isn't a reason to cut it. It could be a reason for not increasing it - it'll just take up a larger portion of the GPU. Though changes are coming, even to consumer GPUs, that will move 'uncore' stuff out of the rendering core. It's a necessity, if for no other reason than that high-NA EUV will reduce reticle sizes (and, of course, it improves yields).
 

jpiniero

Lifer
Oct 1, 2010
Lack of SRAM scaling isn't a reason to cut it. It could be a reason for not increasing it - it'll just take up a larger portion of the GPU. Though changes are coming, even to consumer GPUs, that will move 'uncore' stuff out of the rendering core. It's a necessity, if for no other reason than that high-NA EUV will reduce reticle sizes (and, of course, it improves yields).

I'm expecting die sizes to be cut (other than GB202) to compensate for the higher costs of N3E and GDDR7.
 

Tigerick

Senior member
Apr 1, 2022
I'm expecting die sizes to be cut (other than GB202) to compensate for the higher costs of N3E and GDDR7.
That's why I think NV will use a wider GDDR6X bus instead of more expensive GDDR7 on Blackwell to counter the upcoming RDNA5 with GDDR7. With GDDR6X, NV might be able to offer GB203 and GB205 with a performance increase while maintaining current pricing... The current RTX 4080 with 256-bit GDDR6X offers raster performance similar to the 7900 XTX with 384-bit GDDR6. We might see another round of NV magic :cool:
 

Ajay

Lifer
Jan 8, 2001
That's why I think NV will use a wider GDDR6X bus instead of more expensive GDDR7 on Blackwell to counter the upcoming RDNA5 with GDDR7. With GDDR6X, NV might be able to offer GB203 and GB205 with a performance increase while maintaining current pricing... The current RTX 4080 with 256-bit GDDR6X offers raster performance similar to the 7900 XTX with 384-bit GDDR6. We might see another round of NV magic :cool:
Nvidia has always pushed the memory for its GPUs to the max. They won't bother with saving a bit of money. These are going to be very expensive GFX AIBs - no need to cut corners.
 

jpiniero

Lifer
Oct 1, 2010
Nvidia has always pushed the memory for its GPUs to the max. They won't bother with saving a bit of money. These are going to be very expensive GFX AIBs - no need to cut corners.

nVidia is more likely to use fewer chips, especially assuming there's a capacity increase. That will help defray the cost increase, at the expense of reducing the bandwidth increase (of course).

At 3 GB/chip, 128-bit becomes 12 GB and 192-bit becomes 18 GB.
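The arithmetic behind that, sketched out (32-bit-wide chips; the 3 GB, i.e. 24 Gbit, density is the hypothetical part here):

```python
# With 32-bit-wide memory chips, capacity scales linearly with bus width:
# capacity = (bus width / 32) chips * density per chip.
def capacity_gb(bus_bits: int, gb_per_chip: int) -> int:
    return (bus_bits // 32) * gb_per_chip

for bus in (128, 192, 256, 320, 384, 512):
    print(f"{bus}-bit: {capacity_gb(bus, 3)} GB")
# 128-bit -> 12 GB and 192-bit -> 18 GB, as stated; note that a 512-bit
# GB202 would hit 48 GB with the same parts.
```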
 

Ajay

Lifer
Jan 8, 2001
nVidia is more likely to use fewer chips, especially assuming there's a capacity increase. That will help defray the cost increase, at the expense of reducing the bandwidth increase (of course).

At 3 GB/chip, 128-bit becomes 12 GB and 192-bit becomes 18 GB.
If that's the case, GB202 at 512-bit and GB203 at 320-bit could have insane amounts of memory. So, probably just the GPUs with narrow buses?
 

dr1337

Senior member
May 25, 2020
Where's the 2025 rumor coming from? Nvidia has had a strong 2 year cadence going for almost a decade and I'd be really surprised if they broke it now.
 

Ajay

Lifer
Jan 8, 2001
Where's the 2025 rumor coming from? Nvidia has had a strong 2 year cadence going for almost a decade and I'd be really surprised if they broke it now.
I believe it was @adroc_thurston who mentioned it (for NV). Not sure why. Wafer shortage because of G(H)100/200 - though I thought the problem was packaging (CoWoS)??
 

MoogleW

Member
May 1, 2022
Where's the 2025 rumor coming from? Nvidia has had a strong 2 year cadence going for almost a decade and I'd be really surprised if they broke it now.
According to RedGamingTech, Nvidia is 'breaking tradition' with its SM design. He also mentioned an improved TPC design scaled down from H100 (DSMEM) that reduces trips to the L2 cache (it was seemingly shelved for Lovelace, or was always meant for RTX 50), so that could be a factor. Another could be bugs that require further work. Last but not least, Nvidia could be waiting for 3nm to mature a bit more, for more performance or lower cost.

As for the source of the leak, it's an interpretation of a slide.
 

KompuKare

Golden Member
Jul 28, 2009
Last but not least, Nvidia could be waiting for 3nm to mature a bit more, for more performance or lower cost.
The cadence of new nodes has certainly slowed down.
Now for Ada (aside from the xx90 models) that doesn't matter much, as all the rest are small enough that on the present node they have plenty of scope not just for a refresh but for an actual second generation. Hence the performance stagnation at most tiers.

What was surprising about AD102 was that it was so big. Better for sales this generation, but even +20% above the 3090 Ti would have sold. 600mm² doesn't leave them much of an option but to move to a new node. BW102 at maybe 450mm² on 3nm, with a BW101 at around 600mm² but with no intention of launching a gaming card on the latter? That might make sense (for Nvidia).

My feeling is that from now on, node progression will be trickled out. Well, unless the calculation is that a rival might use that to get ahead - in which case something like a big BW101 die might be their answer.
 

jpiniero

Lifer
Oct 1, 2010
If that CoWoS shortage talk is legit, it wouldn't surprise me if GB202's main purpose is to be a GDDR7 compute product. It may not be suitable for gaming - and if it is, it's going to be very expensive (ie: Titan only).
 

Mopetar

Diamond Member
Jan 31, 2011
What was surprising about AD102 was that it was so big. Better for sales this generation, but even +20% above the 3090 Ti would have sold. 600mm² doesn't leave them much of an option but to move to a new node.

Reticle limit is 858 mm^2 so there's room to grow. We've seen this in the past where NVidia has two generations on the same node. Smaller designs in the first generation mean better yields and room to grow. By the time the second generation goes into production the process should be mature enough to support the bigger dies.

The big die might be restricted to professional and data center users. Even the 4090 is pretty cut down, and since AMD isn't offering anything that can match NVidia on performance, there's not much incentive to release another card above that. It's possible we get more resources on a slightly smaller die for a 5090 just because it isn't as cut down.

NVidia has options, but a lot of what gets sold and under what label depends on what the alternatives look like at the time or what NVidia is predicting the competition to have.
 

jpiniero

Lifer
Oct 1, 2010
Reticle limit is 858 mm^2 so there's room to grow. We've seen this in the past where NVidia has two generations on the same node. Smaller designs in the first generation mean better yields and room to grow. By the time the second generation goes into production the process should be mature enough to support the bigger dies.

I had kind of wondered if nVidia would do a respin of Ada on N4P (?) and throw in GDDR7. It wouldn't be any smaller, but perhaps slightly higher clock speeds, plus any performance gains they can get from the higher bandwidth.
 

Mopetar

Diamond Member
Jan 31, 2011
I think there's some confusion over their process node. They've decided to call their custom set of stuff 4N, but it's not the same as TSMC's N4 nodes.

I could see them changing anyway even if it doesn't get them any extra density. The power or performance boost is usually worth it.
 

jpiniero

Lifer
Oct 1, 2010
I think there's some confusion over their process node. They've decided to call their custom set of stuff 4N, but it's not the same as TSMC's N4 nodes.

I could see them changing anyway even if it doesn't get them any extra density. The power or performance boost is usually worth it.

It doesn't appear they are doing any sort of Ada refresh.