Discussion Nvidia Blackwell in Q1-2025


jpiniero

Lifer
Oct 1, 2010
16,792
7,242
136
Well, the 4090 was bandwidth limited and the 5090 power limited. And there might be some CPU limitation in some games as well.
 

branch_suggestion

Senior member
Aug 4, 2023
821
1,782
106
I didn’t see a Rubin thread, so I’ll post this here:

It looks like Nvidia are pre-announcing what I’m guessing is the professional version of what will likely be the same die as the RTX 6090, especially since it employs GDDR7 memory. I also count what appears to be a 512-bit memory interface:
View attachment 129892
Looks like 16*12 to me.
Unless NV is trolling, but that SM layout looks different. So same SM number but way bigger.
Assuming the same number of everything as GB202 with a clock bump to ~3.1 GHz, you can achieve 30PF of sparse FP4 with 6x the FP4 throughput of Blackwell...
For reference I expect AT0 to be a bit over 5PF sparse FP4, leapfrogging GB202 and nothing more, investing the rest of the area gains into things that actually matter.
So yeah, way bigger systolic arrays. Nothing about the diagram looks like 256-288SM and there is no way it is 384, so 192 is my bet with the info we have.
We know GB202 is 12*16 as is visible here.
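A rough sanity check of those numbers, as a sketch only. It takes the figures in the post at face value (16*12 = 192 SMs, ~3.1 GHz, 30 PF of sparse FP4, a 6x per-SM uplift over Blackwell); none of these are confirmed specs, and the function and constant names are made up for illustration.

```python
# Back-of-envelope check of the speculated figures quoted above.
# All inputs are assumptions taken from the post, not confirmed specs.

def flops_per_sm_per_clock(total_flops: float, sm_count: int, clock_hz: float) -> float:
    """Throughput each SM must deliver per clock to reach total_flops."""
    return total_flops / (sm_count * clock_hz)

SM_COUNT = 16 * 12            # SM grid counted from the die render
CLOCK_HZ = 3.1e9              # speculated clock (~3.1 GHz)
SPARSE_FP4_FLOPS = 30e15      # 30 PF sparse FP4
UPLIFT_VS_BLACKWELL = 6       # claimed per-SM FP4 uplift

rubin_rate = flops_per_sm_per_clock(SPARSE_FP4_FLOPS, SM_COUNT, CLOCK_HZ)
implied_blackwell_rate = rubin_rate / UPLIFT_VS_BLACKWELL

print(f"SMs: {SM_COUNT}")
print(f"Sparse FP4 per SM per clock (speculated part): {rubin_rate:,.0f}")
print(f"Implied Blackwell per SM per clock: {implied_blackwell_rate:,.0f}")
```

If the implied Blackwell rate lands near a power of two, that suggests the 192 SM / 6x reading is at least self-consistent, which is all this check can show.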
288 SMs? Looks like an additional row on each side vs GB202
View attachment 129893
 

Saylick

Diamond Member
Sep 10, 2012
4,029
9,448
136
Looks like 16*12 to me.
Unless NV is trolling, but that SM layout looks different. So same SM number but way bigger.
Assuming the same number of everything as GB202 with a clock bump to ~3.1 GHz, you can achieve 30PF of sparse FP4 with 6x the FP4 throughput of Blackwell...
For reference I expect AT0 to be a bit over 5PF sparse FP4, leapfrogging GB202 and nothing more, investing the rest of the area gains into things that actually matter.
So yeah, way bigger systolic arrays. Nothing about the diagram looks like 256-288SM and there is no way it is 384, so 192 is my bet with the info we have.
We know GB202 is 12*16 as is visible here.
Good catch. Yeah, it looks like in terms of rows*columns at each quadrant, Blackwell was 8x6 while Rubin is 6x8. Same number of SMs, just a different aspect ratio.

Blackwell:
1757443881626.png

Rubin:
1757443903223.png
 

jpiniero

Lifer
Oct 1, 2010
16,792
7,242
136
Good catch. Yeah, it looks like in terms of rows*columns at each quadrant, Blackwell was 8x6 while Rubin is 6x8. Same number of SMs, just a different aspect ratio.

Hmm... guess Rubin is what I thought Blackwell would be... AI all the things. I am not sure how this will translate to gaming performance. The higher clocks will help for sure.

Think doubling or more of tensor cores.
 

branch_suggestion

Senior member
Aug 4, 2023
821
1,782
106
Hmm... guess Rubin is what I thought Blackwell would be... AI all the things. I am not sure how this will translate to gaming performance. The higher clocks will help for sure.
"New Paradigm!"
But yeah, generic SM PPC gains in games will be interesting to see, they are finally investing a lot of area into SM upgrades, most of it looks like GEMM bloat though.
Won't speculate further for now.
Think doubling or more of tensor cores.
I expect 2x for FP16/8, maybe 4x.
6x for FP4.
hey slight chance AMD ships a 5% bin into gaming now
88-92 WGP equiv seems certain for volume now.
Full die for limited volume or mid gen refresh is increasingly likely.
 

jpiniero

Lifer
Oct 1, 2010
16,792
7,242
136
Ooh, Videocardz says it's 128 GB so that must mean it is using 4 GB chips. That's a surprise it'd be available that soon, even in small quantities.
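The chip-size inference can be checked with a little arithmetic. This is only a sketch: it assumes the 512-bit bus counted from the render earlier in the thread, one GDDR7 device per 32 bits of bus, and a clamshell layout (two devices sharing each 32-bit slice). The helper name is hypothetical.

```python
# Which GDDR7 module size gets a 512-bit card to 128 GB?
# Assumptions: 512-bit bus (counted from the render, not confirmed),
# 32 bits of bus per device, clamshell doubles the device count.

def vram_capacity_gb(bus_width_bits: int, module_gb: int, clamshell: bool = False) -> int:
    chips = bus_width_bits // 32
    if clamshell:
        chips *= 2
    return chips * module_gb

for module_gb in (2, 3, 4):
    single = vram_capacity_gb(512, module_gb)
    double = vram_capacity_gb(512, module_gb, clamshell=True)
    print(f"{module_gb} GB modules: {single} GB single-sided, {double} GB clamshell")
```

Under those assumptions only the 4 GB (32 Gbit) modules in clamshell reach 128 GB.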
 

jpiniero

Lifer
Oct 1, 2010
16,792
7,242
136
Getting this back on Blackwell at least... given the lack of rumors about the launch I suspect the Blackwell Super isn't happening until next year.

Wonder if it's mainly because of the tariffs, waiting to see if they get ruled illegal.
 

basix

Senior member
Oct 4, 2024
237
473
96
Good catch. Yeah, it looks like in terms of rows*columns at each quadrant, Blackwell was 8x6 while Rubin is 6x8. Same number of SMs, just a different aspect ratio.

Interestingly, in these die shot visualizations Rubin CPX looks more like a gaming GPU than a datacenter GPU. Very close to AD102 and GB202. And in the press release they said availability by end of 2026, which would also align with the next gaming card release cycle.

So my questions would be:
  • Could it really be that Nvidia massively increases tensor core capabilities in GR102 and re-uses it in datacenter?
    • Would also make sense for PCIe based accelerator cards and RTX Pro
  • Does this include only FP4 inferencing or all tensor core data formats?
  • Does this also apply for the smaller gaming GPUs in the Rubin line-up?
Good news for us is that 32 Gbit GDDR7 modules are available by end of 2026. So manufacturers can choose between 16 / 24 / 32 Gbit modules and therefore, no cards with subpar VRAM capacities anymore.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,015
9,734
106
Could it really be that Nvidia massively increases tensor core capabilities in GR102 and re-uses it in datacenter?
Looks to be 2x rates for FP8 and 4x rates for FP4.
Does this include only FP4 inferencing or all tensor core data formats?
Area's pricey so only the relevant MAC arrays grow.
Does this also apply for the smaller gaming GPUs in the Rubin line-up?
Probably, NV doesn't have the hweng manpower to keep 3 SM variants afloat.
that 32 Gbit GDDR7 modules are available by end of 2026.
Pipe down, you're not getting fat DRAM piles in client.
So manufacturers can choose between 16 / 24 / 32 Gbit modules and therefore, no cards with subpar VRAM capacities anymore.
You'll have to wait a bit to get 32Gb parts in volume.
 

fastandfurious6

Senior member
Jun 1, 2024
740
941
96
AMD can't even be bothered to do Vulkan properly for RDNA4. It's not detected in LM Studio yet both Intel ARC and Nvidia cards (even the 1080 Ti) have no issue being detected by LM Studio.


any specialized senior gpu engineer can 'vibe code' that... makes you wonder why it's not there yet
 

CakeMonster

Golden Member
Nov 22, 2012
1,629
809
136
I realize the term is vague and controversial, but for gaming, designs after Turing seem more or less 'iterative'. Obviously 2025 Blackwell is very different from 2018 Turing, but they've been making very gradual and not really fundamental changes feature-wise from one gen to the next. Frankly the RT/DLSS performance compared to raster has hardly been increasing in Ada and Blackwell. AI/ML performance has increased more, so it's not that they're phoning it in, but IMO people are right to expect something more gaming-wise with the 6000 series.

On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.
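The 8/12/16 to 12/18/24 step is what falls out of swapping 2 GB GDDR7 modules for 3 GB ones on an unchanged bus. A minimal sketch, assuming one 32-bit device per channel, no clamshell, and the usual 128/192/256/512-bit bus widths (the names here are illustrative only):

```python
# Capacity per bus width when moving from 2 GB to 3 GB GDDR7 modules.
# Assumes one device per 32 bits of bus and no clamshell.

def capacity_gb(bus_width_bits: int, module_gb: int) -> int:
    return (bus_width_bits // 32) * module_gb

upgrade_table = {
    bus: (capacity_gb(bus, 2), capacity_gb(bus, 3))
    for bus in (128, 192, 256, 512)
}

for bus, (with_2gb, with_3gb) in upgrade_table.items():
    print(f"{bus}-bit: {with_2gb} GB -> {with_3gb} GB")
```

The 512-bit row is the 32 to 48 GB case mentioned for the 6090.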
 

jpiniero

Lifer
Oct 1, 2010
16,792
7,242
136
On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.

I believe the 7 die will be cut to 96-bit, so 9 GB for the 6060. There'll be plenty for the complainers to complain about.
 

branch_suggestion

Senior member
Aug 4, 2023
821
1,782
106
I realize the term is vague and controversial, but for gaming, designs after Turing seem more or less 'iterative'. Obviously 2025 Blackwell is very different from 2018 Turing, but they've been making very gradual and not really fundamental changes feature-wise from one gen to the next. Frankly the RT/DLSS performance compared to raster has hardly been increasing in Ada and Blackwell. AI/ML performance has increased more, so it's not that they're phoning it in, but IMO people are right to expect something more gaming-wise with the 6000 series.

On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.
The regfile per SMSP and the high-level SM layout as a whole are unchanged since Maxwell.
Maxwell is NV's Conroe, and to avoid the same fate as Intel they do need to move away with prudence before it is too late.
 

Mopetar

Diamond Member
Jan 31, 2011
8,484
7,716
136
The price whining is gonna be fun. Gotta love nFlation

GPUs so powerful they can interpolate additional 0's into the price tag.

I hope they limit the 4 GB memory module cards to the professional market if only so the gaming cards don't get snatched up by people wanting something with which to train their LLM on the cheap.
 

jpiniero

Lifer
Oct 1, 2010
16,792
7,242
136
I hope they limit the 4 GB memory module cards to the professional market if only so the gaming cards don't get snatched up by people wanting something with which to train their LLM on the cheap.

Oh I would expect it to be like 3 GB was with Blackwell at best - maybe the top mobile part will get it. Then we will see about Rubin Super.
 

ToTTenTranz

Senior member
Feb 4, 2021
686
1,142
136
GPUs so powerful they can interpolate additional 0's into the price tag.

I hope they limit the 4 GB memory module cards to the professional market if only so the gaming cards don't get snatched up by people wanting something with which to train their LLM on the cheap.
No one is going to train LLMs on mid-range GPUs with 32GB VRAM.


Unless you mean running LLMs and AI agents on those GPUs. In which case yes, they will.
 

reaperrr3

Member
May 31, 2024
130
374
96
I realize the term is vague and controversial, but for gaming, designs after Turing seem more or less 'iterative'. Obviously 2025 Blackwell is very different from 2018 Turing, but they've been making very gradual and not really fundamental changes feature-wise from one gen to the next. Frankly the RT/DLSS performance compared to raster has hardly been increasing in Ada and Blackwell. AI/ML performance has increased more, so it's not that they're phoning it in, but IMO people are right to expect something more gaming-wise with the 6000 series.
I wouldn't expect too much honestly, at least in terms of raster IPC increase per SM.

Most of the area/power savings from the newer process will in my opinion be invested in AI/ML (tensor), higher clocks, PT/RT, modestly more SM for some client GPUs, in about that order of priority.

GR102 - 192 SM
GR103/4 - 96 SM (+12 vs. BW)
GR104/5 - 60 SM (+10)
GR106 - 36 SM
GR107 - 24 SM (+4)

~5-10% more raster IPC per SM
~10-30% more RT/PT IPC per SM
~15-20% higher clocks
would be my guess.
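To see what those guesses add up to per SM, here is a quick sketch that simply multiplies the IPC and clock ranges. It assumes the two gains compound cleanly, which ignores bandwidth and power limits, so treat the result as an upper-end estimate. The names are illustrative.

```python
# Compound per-SM uplift from the guessed IPC and clock ranges above.
# Assumes IPC and clock gains multiply, ignoring bandwidth/power limits.

def compound(ipc_gain: float, clock_gain: float) -> float:
    return ipc_gain * clock_gain

CLOCK_RANGE = (1.15, 1.20)     # ~15-20% higher clocks
RASTER_IPC_RANGE = (1.05, 1.10)  # ~5-10% more raster IPC per SM
RT_IPC_RANGE = (1.10, 1.30)      # ~10-30% more RT/PT IPC per SM

raster = tuple(compound(i, c) for i, c in zip(RASTER_IPC_RANGE, CLOCK_RANGE))
rt = tuple(compound(i, c) for i, c in zip(RT_IPC_RANGE, CLOCK_RANGE))

print(f"Raster per SM: +{(raster[0] - 1) * 100:.0f}% to +{(raster[1] - 1) * 100:.0f}%")
print(f"RT/PT per SM:  +{(rt[0] - 1) * 100:.0f}% to +{(rt[1] - 1) * 100:.0f}%")
```

For dies where the SM count also grows, the SM ratio would be one more factor in the same product.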

On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.
I'm genuinely curious what NV will do with the 5060 Ti successor.
I'd say continuing to use clamshell for 16GB makes the most sense (assuming the big 4GB/32Gb chips won't reach enough volume/competitive per-GB prices anytime soon), but knowing NV, I wouldn't put a temporary downgrade to 12GB past them.

I believe the 7 die will be cut to 96-bit, so 9 GB for the 6060. There'll be plenty for the complainers to complain about.
Nah.
I mean yeah, they might do such an SKU, but I don't see them calling that 6060.

6090 - 170-176 SM, 512bit, 48 GB @ 36 Gbps
6080 - 92 SM, 256bit, 24 GB @ 36 Gbps
6070 Ti - 76 SM, 192bit, 18 GB @ 36 Gbps
6070 - 56 SM, 192bit, 18 GB @ 32 Gbps
6060 Ti - 36 SM, 128bit, 16 GB (clamshell) @ 32 Gbps
6060 - 32 SM, 128bit, 12 GB @ 28-32 Gbps
6050 Ti - 24 SM, 128bit, 8-12 GB @ 28 Gbps
6050 - 20 SM, 128/96bit, 8/9 GB @ 28 Gbps

There, full initial Rubin desktop line-up (disclaimer: 100% pure speculation).
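For reference, the memory bandwidth each of those speculated configs would give is just bus width times per-pin data rate, divided by 8. A small sketch using only the entries with a single data rate listed; the SKU names and numbers are the guesses from the list above, not real products.

```python
# Peak memory bandwidth for the speculated SKUs that list one data rate.
# bandwidth (GB/s) = bus width (bits) * per-pin rate (Gbps) / 8

def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits * gbps_per_pin / 8

SPECULATED_LINEUP = [
    ("6090", 512, 36),
    ("6080", 256, 36),
    ("6070 Ti", 192, 36),
    ("6070", 192, 32),
    ("6060 Ti", 128, 32),
    ("6050 Ti", 128, 28),
]

for name, bus, rate in SPECULATED_LINEUP:
    print(f"{name}: {bus}-bit @ {rate} Gbps = {bandwidth_gb_s(bus, rate):.0f} GB/s")
```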

Oh I would expect it to be like 3 GB was with Blackwell at best - maybe the top mobile part will get it. Then we will see about Rubin Super.
I wouldn't be so pessimistic about it, in part because NV absolutely wants to keep mem interfaces as narrow as they can, 8GB (and 12GB in the $500+ range) is becoming unpopular with desktop consumers, and clamshell isn't that popular with AIBs.

I expect a good chunk of desktop Rubin to get those 3GB chips.

Same for AMD, AT3/4 won't use G7 and AT2 with its 192bit interface surely will have neither 12 nor 24GB as the standard config, which leaves only 1 logical option for it.
And 24GB looks insufficient for that alleged 384bit gaming AT0, too.

I think the first-gen 2GB G7 modules will get phased out rather quickly (in terms of relative volume compared to 3GB chips).