Discussion Nvidia Blackwell in Q1-2025

jpiniero · Sep 9, 2025

Well, the 4090 was bandwidth limited and the 5090 power limited. And there might be some CPU limitation in some games as well.

adroc_thurston · Sep 9, 2025

jpiniero said:
Well, the 4090 was bandwidth limited

nope.

jpiniero said:
and the 5090 power limited

Nope.

branch_suggestion · Sep 9, 2025

Saylick said:
I didn’t see a Rubin thread, so I’ll post this here:

NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M+ Token Context Workloads | NVIDIA Technical Blog

Inference has emerged as the new frontier of complexity in AI. Modern models are evolving into agentic systems capable of multi-step reasoning, persistent memory, and long-horizon context—enabling…

developer.nvidia.com

It looks like Nvidia are pre-announcing what I’m guessing is the professional version of what will likely be the same die as the RTX 6090, especially since it employs GDDR7 memory. I also count what appears to be a 512-bit memory interface:
View attachment 129892

Looks like 16*12 to me.
Unless NV is trolling, but that SM layout looks different. So same SM number but way bigger.
Assuming the same number of everything as GB202 with a clock bump to ~3.1 GHz, you can achieve 30PF of sparse FP4 with 6x the FP4 throughput of Blackwell...
For reference I expect AT0 to be a bit over 5PF sparse FP4, leapfrogging GB202 and nothing more, investing the rest of the area gains into things that actually matter.
So yeah, way bigger systolic arrays. Nothing about the diagram looks like 256-288SM and there is no way it is 384, so 192 is my bet with the info we have.
We know GB202 is 12*16 as is visible here.
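
As a rough sanity check on the numbers in this post, here is a minimal Python sketch that works backward from the quoted figures (192 SMs, ~3.1 GHz, 30 PF sparse FP4, 6x per-SM uplift) to the per-SM, per-clock FP4 rate they imply. The function name and the even split of throughput across SMs are assumptions for illustration, not vendor figures.

```python
def fp4_ops_per_sm_per_clock(total_pflops, sm_count, clock_ghz):
    """Sparse FP4 ops each SM would need to retire per clock to reach the total.

    Assumes throughput is spread evenly over all SMs at a fixed clock.
    """
    total_ops_per_s = total_pflops * 1e15
    clocks_per_s = clock_ghz * 1e9
    return total_ops_per_s / (sm_count * clocks_per_s)


# Figures taken from the post: 30 PF sparse FP4, 192 SMs, ~3.1 GHz.
rubin_rate = fp4_ops_per_sm_per_clock(30, 192, 3.1)

# The post's "6x the FP4 throughput of Blackwell" then implies this baseline.
implied_blackwell_rate = rubin_rate / 6

print(f"Rubin CPX (speculated): {rubin_rate:.0f} ops/clk/SM")
print(f"Implied Blackwell baseline: {implied_blackwell_rate:.0f} ops/clk/SM")
```

Under those assumptions the speculated part needs roughly 50,400 sparse FP4 ops per clock per SM, which puts the implied Blackwell baseline near 8,400.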

Kepler_L2 said:
288 SMs? Looks like an additional row on each side vs GB202
View attachment 129893

Saylick · Sep 9, 2025

branch_suggestion said:
Looks like 16*12 to me.
Unless NV is trolling, but that SM layout looks different. So same SM number but way bigger.
Assuming the same number of everything as GB202 with a clock bump to ~3.1 GHz, you can achieve 30PF of sparse FP4 with 6x the FP4 throughput of Blackwell...
For reference I expect AT0 to be a bit over 5PF sparse FP4, leapfrogging GB202 and nothing more, investing the rest of the area gains into things that actually matter.
So yeah, way bigger systolic arrays. Nothing about the diagram looks like 256-288SM and there is no way it is 384, so 192 is my bet with the info we have.
We know GB202 is 12*16 as is visible here.

Good catch. Yeah, it looks like in terms of rows*columns at each quadrant, Blackwell was 8x6 while Rubin is 6x8. Same number of SMs, just a different aspect ratio.

Blackwell:

Rubin:

adroc_thurston · Sep 9, 2025

Oh we're getting a 96/192 core slapfight.
Fun.
Very samurai duel here.

branch_suggestion · Sep 9, 2025

adroc_thurston said:
Oh we're getting a 96/192 core slapfight.
Fun.
Very samurai duel here.

AMD could have the FP32 TFLOPS lead for the first time since Fiji.
This could be good.

jpiniero · Sep 9, 2025

Saylick said:
Good catch. Yeah, it looks like in terms of rows*columns at each quadrant, Blackwell was 8x6 while Rubin is 6x8. Same number of SMs, just a different aspect ratio.

Hmm... guess Rubin is what I thought Blackwell would be... AI all the things. I am not sure how this will translate to gaming performance. The higher clocks will help for sure.

Think doubling or more of tensor cores.

adroc_thurston · Sep 9, 2025

branch_suggestion said:
AMD could have the FP32 TFLOPS lead for the first time since Fiji.
This could be good.

hey slight chance AMD ships a 5% bin into gaming now

branch_suggestion · Sep 9, 2025

jpiniero said:
Hmm... guess Rubin is what I thought Blackwell would be... AI all the things. I am not sure how this will translate to gaming performance. The higher clocks will help for sure.

"New Paradigm!"
But yeah, generic SM PPC gains in games will be interesting to see, they are finally investing a lot of area into SM upgrades, most of it looks like GEMM bloat though.
Won't speculate further for now.

jpiniero said:
Think doubling or more of tensor cores.

I expect 2x for FP16/8, maybe 4x.
6x for FP4.

adroc_thurston said:
hey slight chance AMD ships a 5% bin into gaming now

88-92 WGP equiv seems certain for volume now.
Full die for limited volume or mid gen refresh is increasingly likely.

adroc_thurston · Sep 9, 2025

branch_suggestion said:
88-92 WGP equiv seems certain for volume now.

Oh no.

branch_suggestion said:
Full die for limited volume or mid gen refresh is increasingly likely.

Lmao not happening.

jpiniero · Sep 9, 2025

Ooh, Videocardz says it's 128 GB so that must mean it is using 4 GB chips. That's a surprise it'd be available that soon, even in small quantities.
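
The inference from 128 GB to 4 GB chips can be sketched as simple arithmetic. This is a minimal sketch assuming the 512-bit interface counted earlier in the thread, one 32-bit GDDR7 device per channel, and a clamshell layout (two devices sharing each channel); the function name is made up for illustration.

```python
def vram_capacity_gb(bus_width_bits, module_gb, clamshell=False):
    """Total VRAM for a given bus width and per-device capacity.

    Assumes each memory device occupies a 32-bit slice of the bus;
    clamshell mode places two devices on each slice.
    """
    channels = bus_width_bits // 32
    devices = channels * (2 if clamshell else 1)
    return devices * module_gb


# 512-bit bus, clamshell, for each module size discussed in the thread.
for module_gb in (2, 3, 4):
    total = vram_capacity_gb(512, module_gb, clamshell=True)
    print(f"{module_gb} GB modules -> {total} GB")
```

On a 512-bit clamshell board, 2 GB and 3 GB devices top out at 64 GB and 96 GB, so only 4 GB devices reach 128 GB.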

jpiniero · Sep 9, 2025

Getting this back on Blackwell at least... given the lack of rumors about the launch I suspect the Blackwell Super isn't happening until next year.

Wonder if it's mainly because of the tariffs, see if the tariffs get ruled illegal.

branch_suggestion · Sep 9, 2025

jpiniero said:
Ooh, Videocardz says it's 128 GB so that must mean it is using 4 GB chips. That's a surprise it'd be available that soon, even in small quantities.

Oh the AT0 leak already confirmed 4GB modules are coming soon.
But they will not be in client cards until prices and supply are better, that will be a while.

adroc_thurston · Sep 9, 2025

jpiniero said:
Wonder if it's mainly because of the tariffs

amerimuttia isn't the world.

gdansk · Sep 9, 2025

jpiniero said:
Ooh, Videocardz says it's 128 GB so that must mean it is using 4 GB chips. That's a surprise it'd be available that soon, even in small quantities.

Micron had their "24gb+" 36 Gbps GDDR7 chip on their roadmap for the 2nd half of 2026. So it is all lining up.

basix · Sep 10, 2025

Saylick said:
Good catch. Yeah, it looks like in terms of rows*columns at each quadrant, Blackwell was 8x6 while Rubin is 6x8. Same number of SMs, just a different aspect ratio.

Blackwell:

Rubin:

Interestingly, on these die shot visualizations Rubin CPX looks more like a gaming GPU than a datacenter GPU. Very close to AD102 and GB202. And in the press release they said availability by end of 2026, which would also align with the next gaming card release cycle.

So my questions would be:

Could it really be that Nvidia massively increases tensor core capabilities in GR102 and re-uses it in datacenter?
- Would also make sense for PCIe based accelerator cards and RTX Pro
Does this include only FP4 inferencing or all tensor core data formats?
Does this also apply for the smaller gaming GPUs in the Rubin line-up?

The good news for us is that 32 Gbit GDDR7 modules are available by end of 2026. So manufacturers can choose between 16 / 24 / 32 Gbit modules and therefore, no cards with subpar VRAM capacities anymore.

adroc_thurston · Sep 10, 2025

basix said:
Could it really be that Nvidia massively increases tensor core capabilities in GR102 and re-uses it in datacenter?

Looks to be 2x rates for FP8 and 4x rates for FP4.

basix said:
Does this include only FP4 inferencing or all tensor core data formats?

Area's pricey so only the relevant MAC arrays grow.

basix said:
Does this also apply for the smaller gaming GPUs in the Rubin line-up?

Probably, NV doesn't have the hweng manpower to keep 3 SM variants afloat.

basix said:
that 32 Gbit GDDR7 modules are available by end of 2026.

Pipe down, you're not getting fat DRAM piles in client.

basix said:
So manufacturers can choose between 16 / 24 / 32 Gbit modules and therefore, no cards with subpar VRAM capacities anymore.

You'll have to wait a bit to get 32Gb parts in volume.

fastandfurious6 · Sep 10, 2025

igor_kavinski said:
AMD can't even be bothered to do Vulkan properly for RDNA4. It's not detected in LM Studio yet both Intel ARC and Nvidia cards (even the 1080 Ti) have no issue being detected by LM Studio.

any specialized senior gpu engineer can 'vibe code' that... makes you wonder why it's not there yet

CakeMonster · Sep 10, 2025

I realize the term is vague and controversial, but for gaming, designs after Turing seem more or less 'iterative'. Obviously 2025 Blackwell is very different from 2018 Turing, but they've been making very gradual and not really fundamental changes feature-wise from one gen to the next. Frankly the RT/DLSS performance compared to raster has hardly been increasing in Ada and Blackwell. AI/ML performance has increased more, so it's not that they're phoning it in, but IMO people are right to expect something more gaming-wise with the 6000 series.

On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.

jpiniero · Sep 10, 2025

CakeMonster said:
On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.

I believe the 7 die will be cut to 96-bit, so 9 GB for the 6060. There'll be plenty for the complainers to complain about.

branch_suggestion · Sep 10, 2025

CakeMonster said:
I realize the term is vague and controversial, but for gaming, designs after Turing seem more or less 'iterative'. Obviously 2025 Blackwell is very different from 2018 Turing, but they've been making very gradual and not really fundamental changes feature-wise from one gen to the next. Frankly the RT/DLSS performance compared to raster has hardly been increasing in Ada and Blackwell. AI/ML performance has increased more, so it's not that they're phoning it in, but IMO people are right to expect something more gaming-wise with the 6000 series.

On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.

The regfile per SMSP and the high level SM layout as a whole is unchanged since Maxwell.
Maxwell is NV's Conroe, and to avoid the same fate as Intel they do need to move away with prudence before it is too late.

Mopetar · Sep 10, 2025

jpiniero said:
The price whining is gonna be fun. Gotta love nFlation

GPUs so powerful they can interpolate additional 0's into the price tag.

I hope they limit the 4 GB memory module cards to the professional market if only so the gaming cards don't get snatched up by people wanting something with which to train their LLM on the cheap.

jpiniero · Sep 10, 2025

Mopetar said:
I hope they limit the 4 GB memory module cards to the professional market if only so the gaming cards don't get snatched up by people wanting something with which to train their LLM on the cheap.

Oh I would expect it to be like 3 GB was with Blackwell at best - maybe the top mobile part will get it. Then we will see about Rubin Super.

ToTTenTranz · Sep 10, 2025

Mopetar said:
GPUs so powerful they can interpolate additional 0's into the price tag.

I hope they limit the 4 GB memory module cards to the professional market if only so the gaming cards don't get snatched up by people wanting something with which to train their LLM on the cheap.

No one is going to train LLMs on mid-range GPUs with 32GB VRAM.

Unless you mean running LLMs and AI agents on those GPUs. In which case yes, they will.

reaperrr3 · Sep 11, 2025

CakeMonster said:
I realize the term is vague and controversial, but for gaming, designs after Turing seem more or less 'iterative'. Obviously 2025 Blackwell is very different from 2018 Turing, but they've been making very gradual and not really fundamental changes feature-wise from one gen to the next. Frankly the RT/DLSS performance compared to raster has hardly been increasing in Ada and Blackwell. AI/ML performance has increased more, so it's not that they're phoning it in, but IMO people are right to expect something more gaming-wise with the 6000 series.

I wouldn't expect too much honestly, at least in terms of raster IPC increase per SM.

Most of the area/power savings from the newer process will in my opinion be invested in AI/ML (tensor), higher clocks, PT/RT, modestly more SM for some client GPUs, in about that order of priority.

GR102 - 192 SM
GR103/4 - 96 SM (+12 vs. BW)
GR104/5 - 60 SM (+10)
GR106 - 36 SM
GR107 - 24 SM (+4)

~5-10% more raster IPC per SM
~10-30% more RT/PT IPC per SM
~15-20% higher clocks
would be my guess.
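
To see what those guesses add up to at an equal SM count, here is a small sketch that compounds the per-SM IPC and clock ranges above. It assumes the gains multiply independently and that performance scales linearly with clock, which is optimistic; the helper name is made up for illustration.

```python
def combined_gain(*pct_gains):
    """Multiply independent percentage gains into one speedup factor."""
    factor = 1.0
    for pct in pct_gains:
        factor *= 1 + pct / 100
    return factor


# Raster: 5-10% IPC per SM combined with 15-20% higher clocks.
raster_low = combined_gain(5, 15)
raster_high = combined_gain(10, 20)

# RT/PT: 10-30% IPC per SM combined with the same clock range.
rt_low = combined_gain(10, 15)
rt_high = combined_gain(30, 20)

print(f"raster per SM: {raster_low:.2f}x to {raster_high:.2f}x")
print(f"RT/PT per SM:  {rt_low:.2f}x to {rt_high:.2f}x")
```

With these inputs the per-SM estimate lands at about 1.21x to 1.32x for raster and about 1.27x to 1.56x for RT/PT, before any change in SM count.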

CakeMonster said:
On the VRAM, you don't need to be a YT grifter to assume that there will be an increase from 8/12/16 to 12/18/24 on the low/mid range, although for the 6090 I don't dare hope for 32-->48GB.

I'm genuinely curious what NV will do with the 5060 Ti successor.
I'd say continuing to use clamshell for 16GB makes the most sense (assuming the big 4GB/32Gb chips won't reach enough volume/competitive per-GB prices anytime soon), but knowing NV, I wouldn't put a temporary downgrade to 12GB past them.

jpiniero said:
I believe the 7 die will be cut to 96-bit, so 9 GB for the 6060. There'll be plenty for the complainers to complain about.

Nah.
I mean yeah, they might do such an SKU, but I don't see them calling that 6060.

6090 - 170-176 SM, 512bit, 48 GB @ 36 Gbps
6080 - 92 SM, 256bit, 24 GB @ 36 Gbps
6070 Ti - 76 SM, 192bit, 18 GB @ 36 Gbps
6070 - 56 SM, 192bit, 18 GB @ 32 Gbps
6060 Ti - 36 SM, 128bit, 16 GB (clamshell) @ 32 Gbps
6060 - 32 SM, 128bit, 12 GB @ 28-32 Gbps
6050 Ti - 24 SM, 128bit, 8-12 GB @ 28 Gbps
6050 - 20 SM, 128/96bit, 8/9 GB @ 28 Gbps

There, full initial Rubin desktop line-up (disclaimer: 100% pure speculation).
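
For a sense of what the speculated configs above would mean for memory bandwidth, here is a minimal sketch using the usual peak formula: bus width in bits times per-pin data rate in Gbps, divided by 8. The SKU names and numbers are copied from the speculative list, so the results are only as good as that guess; the function name is an assumption for illustration.

```python
def bandwidth_gb_s(bus_width_bits, pin_speed_gbps):
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * pin_speed_gbps / 8


# (bus width in bits, pin speed in Gbps) from the speculative line-up above.
lineup = {
    "6090": (512, 36),
    "6080": (256, 36),
    "6070 Ti": (192, 36),
    "6070": (192, 32),
    "6060 Ti": (128, 32),
    "6050 Ti": (128, 28),
}

for name, (bus, speed) in lineup.items():
    print(f"{name}: {bandwidth_gb_s(bus, speed):.0f} GB/s")
```

Under this formula the speculated 6090 config works out to 2304 GB/s and the 6080 config to 1152 GB/s.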

jpiniero said:
Oh I would expect it to be like 3 GB was with Blackwell at best - maybe the top mobile part will get it. Then we will see about Rubin Super.

I wouldn't be so pessimistic about it, in part because NV absolutely wants to keep mem interfaces as narrow as they can, while 8GB (and 12GB in the $500+ range) is becoming unpopular with desktop consumers and clamshell isn't that popular with AIBs.

I expect a good chunk of desktop Rubin to get those 3GB chips.

Same for AMD, AT3/4 won't use G7 and AT2 with its 192bit interface surely will have neither 12 nor 24GB as the standard config, which leaves only 1 logical option for it.
And 24GB looks insufficient for that alleged 384bit gaming AT0, too.

I think the first-gen 2GB G7 modules will get phased out rather quickly (in terms of relative volume compared to 3GB chips).
