Discussion: Nvidia Blackwell in Q4-2024?


tamz_msc

Diamond Member
Jan 5, 2017
3,671
3,534
136
For the GB207 that will go into the 5050 series, I guess they'll use a 128-bit bus, so 8GB for those cards. If they price it right (around $250-300), it could be a viable choice, as it'll land between 3070 and 3080 levels of performance, or slightly higher than the current 4060 Ti.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,671
3,534
136
For instance, say the 5070/Ti was 160-bit. With 3 GB chips that would be 15 GB. An odd number would be unusual, sure... but that would be 5 memory chips instead of 8.
Nah, I don't think they'll go for those weird configurations. I wonder, though, if you could mod 3GB modules onto cards that come with 2GB chips, like it has been done before.
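The bus-width arithmetic in the quoted post can be checked with a minimal sketch. It assumes each GDDR chip sits on its own 32-bit channel and ignores clamshell (two chips per channel) layouts; the function name `vram_config` is purely illustrative.

```python
# Assumption: one GDDR chip per 32-bit channel, no clamshell mode.
BITS_PER_CHIP = 32


def vram_config(bus_width_bits: int, gb_per_chip: int) -> tuple[int, int]:
    """Return (chip_count, total_gb) for a bus width and per-chip capacity."""
    if bus_width_bits % BITS_PER_CHIP != 0:
        raise ValueError("bus width must be a multiple of 32 bits")
    chips = bus_width_bits // BITS_PER_CHIP
    return chips, chips * gb_per_chip


# 128-bit with 2 GB chips, the hypothetical 160-bit with 3 GB chips,
# and a 256-bit card with 2 GB chips for comparison.
for bus, density in [(128, 2), (160, 3), (256, 2)]:
    chips, total = vram_config(bus, density)
    print(f"{bus}-bit bus, {density} GB chips -> {chips} chips, {total} GB")
```

Under those assumptions the 160-bit case works out to 5 chips and 15 GB, and a 128-bit bus with 2 GB chips gives the 8 GB mentioned earlier in the thread.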
 

tamz_msc

Diamond Member
Jan 5, 2017
3,671
3,534
136
I would not expect much, especially at the low end. If the 5060 is more than 10% faster than the 4060 I would be surprised.
It is always the case that the performance of a tier in any given generation is matched or exceeded by the corresponding card one tier lower of the subsequent generation.
 

jpiniero

Lifer
Oct 1, 2010
14,410
5,120
136
It is always the case that the performance of a tier in any given generation is matched or exceeded by the corresponding card one tier lower of the subsequent generation.

Clearly you haven't been paying attention to nVidia lately.

Edit: Keep in mind I am projecting GB205/6/7 to be substantially smaller than their Ada counterparts... and there's no Cache/IO scaling with N3E.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,671
3,534
136
Clearly you haven't been paying attention to nVidia lately.
You don't say?

Let's focus on the xx60 tier for the moment:

4060 Ti matching 3070, coming close to 3070 Ti

[Image: 1700105304912.png]

3060 matching 2070

[Image: 1700105464984.png]

2060 6GB matching 1070 Ti

[Image: 1700105535513.png]

All of these are from launch reviews.
 

MoogleW

Member
May 1, 2022
33
23
41
Clearly you haven't been paying attention to nVidia lately.

Edit: Keep in mind I am projecting GB205/6/7 to be substantially smaller than their Ada counterparts... and there's no Cache/IO scaling with N3E.
Using the targeted resolutions, let's compare the chips, shall we:

AD107 is faster than GA106 (4060 vs 3060) at 1080p, faster in RT and AI

AD106 7% slower vs GA104 (4060ti vs 3070ti) at 1080p, faster in RT and AI (https://www.techpowerup.com/review/nvidia-geforce-rtx-4060-ti-founders-edition/32.html)

AD104 matches GA102 (4070ti vs 3090ti) at 1440p, faster in RT and AI (https://www.techpowerup.com/review/asus-geforce-rtx-4070-ti-tuf/32.html)

AD103 is faster than GA102 (4080 vs 3090ti) at 1440p and 4K, faster in RT and AI
(https://www.techpowerup.com/review/nvidia-geforce-rtx-4080-founders-edition/32.html)

GA104 is faster than or equal to TU102 (3070ti vs 2080ti; the rtx Titan was not reviewed, but linear performance scaling inferred from its specs still has it losing to the 3070ti) at 1440p, even faster in RT and AI
(https://www.techpowerup.com/review/nvidia-geforce-rtx-3070-ti-founders-edition/28.html)

GA106 and GA107 are the odd ones out vs TU104 and TU106. TU104 and TU106 are larger dies with better specs in every way, so the rtx 3060 loses to the rtx 2080 super (TU104) and rtx 2070 (TU106); they only match in RT and AI

For rtx 5060 (assuming GB207) to fail to be faster or equal to 4060ti (AD106), either:

1) GB207 stays at 24 SMs, which is unlikely, since Blackwell is changing the number of TPCs in a GPC from 6 to 8. AD107 has 2 GPCs, so 24 SMs. 2 GB207 GPCs means 32 SMs, not to mention architectural improvements and maybe clocks.
2) Nvidia moves the 5060ti to GB207, the 5070 to GB206, the 5080 to GB205 and the 5080ti to GB203. I doubt it, but up and down movements of SKUs vs chips have happened.
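The SM counts in point 1 follow from a simple product, sketched below. The GPC and TPC figures are the ones given in the post (the GB207 values are rumor, not confirmed specs), and 2 SMs per TPC is an assumption carried over from recent Nvidia architectures.

```python
# Assumption: 2 SMs per TPC, as on recent Nvidia gaming architectures.
SMS_PER_TPC = 2


def sm_count(gpcs: int, tpcs_per_gpc: int) -> int:
    """Total SMs on a full die built from identical GPCs."""
    return gpcs * tpcs_per_gpc * SMS_PER_TPC


# Figures as stated in the post; the GB207 layout is speculative.
ad107 = sm_count(gpcs=2, tpcs_per_gpc=6)
gb207 = sm_count(gpcs=2, tpcs_per_gpc=8)

print(f"AD107: {ad107} SMs")
print(f"GB207 (rumored): {gb207} SMs")
print(f"SM increase: {gb207 / ad107 - 1:.0%}")
```

If the per-GPC change holds, the same 2-GPC layout would carry a third more SMs before any clock or architectural gains.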
 

MoogleW

Member
May 1, 2022
33
23
41
None of the other chips give a good whole number with the ratio of cache to memory that 128MB gives for GB202. Either GB202 will have less relative cache or more relative cache.

If GB202 has more relative cache, then the XX203 chips and below would have the same size L2 cache as the rtx 40 series. I think that makes sense due to poor memory scaling: the rtx 40 series basically has 4 bits of bus per 1MB of cache, while 128MB gives 3 bits per 1MB of cache. The lower chips would still enjoy a raw memory bandwidth upgrade from GDDR7.
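A rough sketch of the bus-to-cache ratio being described. The Ada entries use commonly cited full-die bus widths and L2 sizes, which should be treated as assumptions here. The 384-bit GB202 bus is not stated in the post; it is only what 3 bits per MB at 128MB would imply.

```python
def bus_bits_per_mb(bus_width_bits: int, l2_mb: int) -> float:
    """Memory bus width (bits) divided by L2 cache size (MB)."""
    return bus_width_bits / l2_mb


# Assumed full-die figures: (bus width in bits, L2 in MB).
ada_full_dies = {
    "AD102": (384, 96),
    "AD103": (256, 64),
    "AD104": (192, 48),
    "AD106": (128, 32),
}

for name, (bus, l2) in ada_full_dies.items():
    print(f"{name}: {bus_bits_per_mb(bus, l2):.1f} bits per MB of L2")

# Rumored GB202: 128 MB of L2; the 384-bit bus is inferred from the
# post's "3 bits per MB" figure, not taken from any specification.
print(f"GB202 (rumored): {bus_bits_per_mb(384, 128):.1f} bits per MB of L2")
```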
 

jpiniero

Lifer
Oct 1, 2010
14,410
5,120
136
1) GB207 stays at 24 SMs, which is unlikely, since Blackwell is changing the number of TPCs in a GPC from 6 to 8. AD107 has 2 GPCs, so 24 SMs. 2 GB207 GPCs means 32 SMs, not to mention architectural improvements and maybe clocks.

I believe GB205/6/7 will have fewer SMs compared to the Ada counterparts.
 

MrTeal

Diamond Member
Dec 7, 2003
3,520
1,607
136
Using the targeted resolutions, let's compare the chips, shall we:

AD107 is faster than GA106 (4060 vs 3060) at 1080p, faster in RT and AI

AD106 7% slower vs GA104 (4060ti vs 3070ti) at 1080p, faster in RT and AI (https://www.techpowerup.com/review/nvidia-geforce-rtx-4060-ti-founders-edition/32.html)

AD104 matches GA102 (4070ti vs 3090ti) at 1440p, faster in RT and AI (https://www.techpowerup.com/review/asus-geforce-rtx-4070-ti-tuf/32.html)

AD103 is faster than GA102 (4080 vs 3090ti) at 1440p and 4K, faster in RT and AI
For chips, that is true. Chip numbering doesn't really matter when the cards and price points don't match up gen to gen though. We now get 94% of AD106 in the $399 4060 Ti 8G as a replacement for the 79% of GA104 in the $399 3060 Ti 8G, and performance uplift is... uninspired.
[Image: 1700145761742.png]
It basically splits the difference between the 3060 Ti and 3070.

We'll have to wait and see if Blackwell brings an Ampere level of performance uplift at similar cards and price points, or whether we get another Turing or Ada.
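The 94% and 79% figures above are consistent with the share of SMs enabled on each card. A minimal sketch, assuming the usual SM counts (34 of 36 on the 4060 Ti's AD106, 38 of 48 on the 3060 Ti's GA104):

```python
def enabled_share(enabled_sms: int, full_die_sms: int) -> float:
    """Fraction of the full die's SMs that are enabled on a given card."""
    return enabled_sms / full_die_sms


# Assumed SM counts: (card, die, enabled SMs, SMs on the full die).
cards = [
    ("RTX 4060 Ti", "AD106", 34, 36),
    ("RTX 3060 Ti", "GA104", 38, 48),
]

for card, die, enabled, full in cards:
    share = enabled_share(enabled, full)
    print(f"{card}: {enabled}/{full} SMs of {die} = {share:.0%}")
```

So at the same $399 price point, the newer card uses almost all of a smaller die where the older card used a heavily cut-down larger one.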
 

jpiniero

Lifer
Oct 1, 2010
14,410
5,120
136
Will say that the full GB203 should be comparable or maybe even a tad faster than the 4090. What they will call it (and how close the MSRP is to the 4090's) remains to be seen.
 

MoogleW

Member
May 1, 2022
33
23
41
For chips, that is true. Chip numbering doesn't really matter when the cards and price points don't match up gen to gen though. We now get 94% of AD106 in the $399 4060 Ti 8G as a replacement for the 79% of GA104 in the $399 3060 Ti 8G, and performance uplift is... uninspired.
It basically splits the difference between the 3060 Ti and 3070.

We'll have to wait and see if Blackwell brings an Ampere level of performance uplift at similar cards and price points, or whether we get another Turing or Ada.
It may be more expensive but the uplift will be there. That is unless you assume 5070 will take over GB206 from AD104
 

jpiniero

Lifer
Oct 1, 2010
14,410
5,120
136

Because I think the dies will be much smaller comparatively speaking... and with the Cache/IO having no scaling, that doesn't leave a lot of room left. And I expect that room to be taken by AI and possibly RT. And of course by cutting the number of SMs.

I don't think it will be slower in raster but I wouldn't expect much.
 

MoogleW

Member
May 1, 2022
33
23
41
Because I think the dies will be much smaller comparatively speaking... and with the Cache/IO having no scaling, that doesn't leave a lot of room left. And I expect that room to be taken by AI and possibly RT. And of course by cutting the number of SMs.

I don't think it will be slower in raster but I wouldn't expect much.
While I do believe GB207 will be small, focusing entirely on cost, even a 32SM chip will be smaller than the current 24SM chip, especially if it retains the same cache and memory bus width. N3E offers up to 60% logic scaling. GB207 with 32 SMs, a 128-bit bus and 32MB of L2 cache, plus GDDR6 and architectural improvements, would likely still be 10% smaller than AD107.
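A back-of-envelope sketch of that die-size argument. It reads "60% logic scaling" as 1.6x logic density, assumes cache, IO and memory PHYs do not shrink at all, and assumes all logic area grows with SM count. The logic share of the AD107 die is not public, so it is left as a hypothetical input.

```python
def relative_die_area(logic_fraction: float, sm_ratio: float,
                      logic_density_gain: float) -> float:
    """Area of the new die relative to the old one (old die = 1.0).

    logic_fraction: share of the old die that is logic (hypothetical input)
    sm_ratio: new SM count divided by old SM count
    logic_density_gain: 1.6 if "60% logic scaling" means 1.6x density
    Cache, IO and memory PHYs are assumed to keep their old area.
    """
    fixed = 1.0 - logic_fraction
    logic = logic_fraction * sm_ratio / logic_density_gain
    return fixed + logic


# Sweep the unknown logic share for a 24 SM -> 32 SM die.
for share in (0.4, 0.5, 0.6, 0.7):
    area = relative_die_area(share, sm_ratio=32 / 24, logic_density_gain=1.6)
    print(f"logic share {share:.0%}: new die is {area:.2f}x the old area")
```

Under these assumptions, the "10% smaller" estimate corresponds to roughly 60% of the old die being logic; a smaller logic share leaves less room to shrink.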

The more Nvidia focuses on RT, the more they need CUDA cores, i.e. 'raster performance', because after calculating the ray paths and hits, the GPU needs to handle things like color and textures, denoising (outside of DLSS3.5) and resource management. Scaling that hardware down would hinder their efforts, unless they clock 30% better again. RT hardware improvements are not mutually exclusive with gaming performance; they intersect with gaming hardware improvements.

Page 65 or slide 71 are in-depth breakdowns of RT frames and the frametimes of the stages in an RT frame.
TLDR: Doubling down on improving RT performance requires more TFLOPs (which means more CUDA cores or very high clocks), and more TFLOPs also help in games.
 

jpiniero

Lifer
Oct 1, 2010
14,410
5,120
136
While I do believe GB207 will be small, focusing entirely on cost, even a 32SM chip will be smaller than the current 24SM chip, especially if it retains the same cache and memory bus width. N3E offers up to 60% logic scaling. GB207 with 32 SMs, a 128-bit bus and 32MB of L2 cache, plus GDDR6 and architectural improvements, would likely still be 10% smaller than AD107.

I think it's going to be more than 10% smaller plus you have to account for any changes they make to the SM structure.
 

SteinFG

Senior member
Dec 29, 2021
385
443
106
I think it's going to be more than 10% smaller plus you have to account for any changes they make to the SM structure.
It's probably N4 or something. Cost per transistor is higher on newer nodes, so it doesn't make sense for a cost-effective part to be 3nm. It's feasible that 192-bit chips and below will use N4.
 

CakeMonster

Golden Member
Nov 22, 2012
1,378
469
136
October announcement as usual? I'm assuming the """" means you think they might delay the availability somewhat. And I don't really care about the AI chips; I'm thinking of the **90 desktop part, as most people here probably are.

GDDR7 speculation