Discussion Nvidia Blackwell in Q4-2024 ?


ToTTenTranz

Member
Feb 4, 2021
86
132
76
Stealing lunch money from NPUs?
No, as always their aim is to establish the perception of a premium experience if you buy an Nvidia graphics card. This time it's based on being able to run LLMs locally.

Though I'm not sure how they're going to convince people their anemic VRAM amounts are adequate to run LLMs.

Laptop SoCs can often be paired with 32GB LPDDR nowadays, but getting lots of VRAM on consumer GPUs isn't compatible with nvidia's usual planned obsolescence.

The mental gymnastics for convincing people that 8-12GB is good enough for LLMs is going to be interesting to watch.
 

Mahboi

Senior member
Apr 4, 2024
658
1,079
91
No, as always their aim is to establish the perception of a premium experience if you buy an Nvidia graphics card. This time it's based on being able to run LLMs locally.

Though I'm not sure how they're going to convince people their anemic VRAM amounts are adequate to run LLMs.

Laptop SoCs can often be paired with 32GB LPDDR nowadays, but getting lots of VRAM on consumer GPUs isn't compatible with nvidia's usual planned obsolescence.

The mental gymnastics for convincing people that 8-12GB is good enough for LLMs is going to be interesting to watch.
I'm sure NV will come up with some "LLM hardware compression" that'll somehow compress things by 5%, and they'll present that as revolutionary.

Also this came out https://wccftech.com/amd-instinct-m...memory-up-to-4x-speedup-versus-discrete-gpus/ as a "hi" to Jensen. It also explains why NV entirely dropped out of HPC, given Blackwell's announced FP64 numbers. NV doesn't like being number 2.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,248
5,247
136
No, as always their aim is to establish the perception of a premium experience if you buy an Nvidia graphics card. This time it's based on being able to run LLMs locally.

Though I'm not sure how they're going to convince people their anemic VRAM amounts are adequate to run LLMs.

Laptop SoCs can often be paired with 32GB LPDDR nowadays, but getting lots of VRAM on consumer GPUs isn't compatible with nvidia's usual planned obsolescence.

The mental gymnastics for convincing people that 8-12GB is good enough for LLMs is going to be interesting to watch.

But laptop SoCs are likely too slow to run very large LLMs. I don't see any push to run LLMs on SoCs...

Consumer applications of AI will likely involve models optimized for 8GB.

Hobbyist stuff can expand as much as you want, but if you want to run those massive models, you likely aren't going to choose an SoC.
 
Jul 27, 2020
17,155
11,022
106
But laptop SoCs are likely too slow to run very large LLMs. I don't see any push to run LLMs on SoCs...
Really? https://www.tomshardware.com/tech-i...wers-more-than-500-ai-models-the-company-says

Consumer applications of AI will likely involve models optimized for 8GB.
Just come out and say that even a 3050 8GB is better for AI than an SoC's NPU.

Hobbyist stuff can expand as much as you want, but if you want to run those massive models, you likely aren't going to choose an SoC.
Intel/AMD are working hard to prove your sire Jensen wrong.
 
  • Like
Reactions: MoogleW

Mopetar

Diamond Member
Jan 31, 2011
7,961
6,312
136
The mental gymnastics for convincing people that 8-12GB is good enough for LLMs is going to be interesting to watch.

Pfft. 8 GB is more than fine for (L)ittle (L)anguage (M)odels, which shouldn't be confused with those other pesky LLMs people have been talking about.
 

ToTTenTranz

Member
Feb 4, 2021
86
132
76
But laptop SoCs are likely too slow to run very large LLMs. I don't see any push to run LLMs on SoCs...
They're not. Microsoft / OpenAI know what they're doing in asking for 45 TOPS for compliance.
8GB won't be enough even for the tiny 3B models, but an SoC with access to 32GB of RAM can run 7B models quite well.

Token/s performance is easier to optimize than memory footprint.
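For anyone wanting to sanity-check the 8GB vs. 32GB argument, here's a back-of-the-envelope sketch. The overhead factor is an illustrative assumption (headroom for KV cache and activations), not a measurement of any specific model or runtime:

```python
# Rough VRAM/RAM estimate for holding an LLM's weights locally.
# The 20% overhead for KV cache and activations is an assumption.

def model_memory_gb(params_billion: float, bytes_per_param: float,
                    overhead: float = 1.2) -> float:
    """Approximate memory (GB) to hold the weights plus some headroom."""
    return params_billion * 1e9 * bytes_per_param * overhead / 1e9

# FP16 weights: 2 bytes/param; 4-bit quantized: 0.5 bytes/param
for params in (3, 7, 13):
    fp16 = model_memory_gb(params, 2.0)
    q4 = model_memory_gb(params, 0.5)
    print(f"{params}B model: ~{fp16:.1f} GB at FP16, ~{q4:.1f} GB at 4-bit")
```

By this estimate a 7B model needs roughly 17 GB at FP16 but only ~4 GB at 4-bit, which is why quantization is what makes 8GB cards usable at all, and why 32GB of unified RAM gives an SoC so much breathing room.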
 

dr1337

Senior member
May 25, 2020
357
606
136
Does an LLM even get that many tokens per second with 40 TOPS? From what I understand, a card like the 3060 already has well over 200 AI TOPS and is still quite slow for LLMs. I thought everything in mobile AI was about small, highly tuned models.
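Worth noting that TOPS mostly isn't the limiting factor here: single-stream token generation has to stream the full weight set from memory for every token, so memory bandwidth usually sets the ceiling. A rough sketch (the bandwidth and model-size figures are illustrative assumptions, not exact specs):

```python
# Upper bound on batch-1 decode speed: each generated token must read
# all the weights from memory, so tokens/s <= bandwidth / model size.
# Bandwidth and model-size numbers below are rough assumptions.

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens per second at batch size 1."""
    return bandwidth_gb_s / model_gb

# e.g. a 7B model quantized down to ~4 GB of weights:
for name, bw in (("LPDDR5X SoC (~100 GB/s, assumed)", 100),
                 ("RTX 3060-class GDDR6 (~360 GB/s)", 360)):
    print(f"{name}: ~{max_tokens_per_s(bw, 4.0):.0f} tok/s ceiling")
```

So a 3060's extra TOPS buys relatively little for decode; the ~3.6x bandwidth advantage over a typical LPDDR SoC is the more honest comparison.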
 

MoogleW

Member
May 1, 2022
62
29
61

New rumor alleges that the GB203-based RTX 5080 is expected to launch before the RTX 5090, contradicting earlier rumors that the RTX 50 series is hurried and only the RTX 5090 will come in 2024.

Since only the RTX 5080 is mentioned, the assumption is that the GB205-based RTX 5070 (Ti) will launch later, possibly in the same timeframe as the AD104-based RTX 4070 Ti did, so maybe a January 2025 window.

If January 2025 is right for GB205, then that chip (named 5070 or 5070 Ti) may get a mention at the GTC architecture unveiling, with first-party benchmarks and other marketing
 
Last edited:

Mopetar

Diamond Member
Jan 31, 2011
7,961
6,312
136
Makes sense, since AMD doesn't have a big GPU to compete against the 5080, so NVidia has every reason to hold back and put all of its big dies into the more profitable datacenter or professional products.
 

xpea

Senior member
Feb 14, 2014
447
141
116

New rumor alleges that the GB203-based RTX 5080 is expected to launch before the RTX 5090, contradicting earlier rumors that the RTX 50 series is hurried and only the RTX 5090 will come in 2024.

Since only the RTX 5080 is mentioned, the assumption is that the GB205-based RTX 5070 (Ti) will launch later, possibly in the same timeframe as the AD104-based RTX 4070 Ti did, so maybe a January 2025 window.

If January 2025 is right for GB205, then that chip (named 5070 or 5070 Ti) may get a mention at the GTC architecture unveiling, with first-party benchmarks and other marketing
The 5080 and 5090 will be announced at the same time. Availability will be separated by a few weeks...
 

jpiniero

Lifer
Oct 1, 2010
14,738
5,368
136
There is a good argument against doing that, given that the 5080 is unlikely to be that much better in games than the 4090, because it should be too bandwidth-limited. It may, however, have way better TOPS (at least on paper), so I suppose... but it seems like a better strategy to just release the 5090 and let 4090 supply thin out before releasing the 5080.
 

SmokSmog

Member
Oct 2, 2020
59
98
61

Aapje

Golden Member
Mar 21, 2022
1,451
1,999
106
There is a good argument against doing that, given that the 5080 is unlikely to be that much better in games than the 4090, because it should be too bandwidth-limited. It may, however, have way better TOPS (at least on paper), so I suppose... but it seems like a better strategy to just release the 5090 and let 4090 supply thin out before releasing the 5080.

I think that it makes perfect sense to announce them at the same time. The 5090 will allow them to boast about the performance improvement, but the price is surely getting another increase with the 512 bit bus and 32 GB. So the 5080 is needed for Nvidia to defend against angry comments about the price. I expect the 5080 to stay at $999 exactly so they can argue that they only increased the 5090 price because they had to.

And I expect the 5080 to be at 4090D level so they can sell it in China. Announcing the 5080 at the same time as the 5090 prevents Nvidia from angering the Chinese, who will not be able to get the 5090 (legally). Then the message from Nvidia to China is: you can now get 4090D level performance for $999 instead of the $1800 that they have to pay for the actual 4090D.
 

jpiniero

Lifer
Oct 1, 2010
14,738
5,368
136
If anything, I'd expect the 5080's tensor performance to be way better than the 4090's. So yes, under the current sanctions it won't be available in China.

Edit: I am expecting the die size to be very big, partly from adding the additional SMs but also because of the inevitable additional tensor cores. It'll be closer to AD102 than AD103, with GB202 being at the reticle limit.
 
Last edited:
  • Like
Reactions: gdansk

SteinFG

Senior member
Dec 29, 2021
498
590
106
Kopite says the three PCBs are IO, PCIe connector, and the main GPU. The PCIe connector is just that, a connector.

What are they using to connect those PCBs lol, there's ~0.5 terabits/s of data on that PCIe bus. I guess it's a flex cable of some kind.
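The half-terabit figure checks out, assuming the card is PCIe 5.0 x16 (the per-lane rate and encoding below are the published PCIe 5.0 numbers):

```python
# Sanity check on "0.5 terabits/s": PCIe 5.0 x16 link bandwidth.
GT_PER_S = 32    # PCIe 5.0 raw rate per lane, in gigatransfers/s
LANES = 16
# PCIe 3.0+ uses 128b/130b encoding, so ~98.5% of the raw rate is payload
payload_gbit_s = GT_PER_S * LANES * (128 / 130)
print(f"~{payload_gbit_s:.0f} Gbit/s ≈ {payload_gbit_s / 8:.0f} GB/s")
```

That's ~504 Gbit/s (~63 GB/s) each way, all of which has to cross whatever interconnect joins those boards.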
 
  • Like
Reactions: Mopetar

MrTeal

Diamond Member
Dec 7, 2003
3,580
1,725
136
Kopite says the three PCBs are IO, PCIe connector, and the main GPU. The PCIe connector is just that, a connector.

What are they using to connect those PCBs lol, there's 0.5 Terabits/s of data on that PCIe bus. I guess it's a flex cable of some kind.
It wouldn't have to be anything different from a really short PCIe riser, really.