Discussion Nvidia Blackwell in Q4-2024?


ToTTenTranz

Member
Feb 4, 2021
Stealing lunch money from NPUs?
No. As always, their aim is to establish a perception of a premium experience if you buy an Nvidia graphics card, this time based on being able to run LLMs locally.

Though I'm not sure how they're going to convince people that their anemic VRAM amounts are adequate for running LLMs.

Laptop SoCs are often paired with 32GB of LPDDR nowadays, but putting lots of VRAM on consumer GPUs isn't compatible with Nvidia's usual planned obsolescence.

The mental gymnastics needed to convince people that 8-12GB is good enough for LLMs will be interesting to watch.
 

Mahboi

Senior member
Apr 4, 2024
ToTTenTranz said:
No. As always, their aim is to establish a perception of a premium experience if you buy an Nvidia graphics card, this time based on being able to run LLMs locally.

Though I'm not sure how they're going to convince people that their anemic VRAM amounts are adequate for running LLMs.

Laptop SoCs are often paired with 32GB of LPDDR nowadays, but putting lots of VRAM on consumer GPUs isn't compatible with Nvidia's usual planned obsolescence.

The mental gymnastics needed to convince people that 8-12GB is good enough for LLMs will be interesting to watch.
I'm sure NV will come up with some "LLM hardware compression" that'll somehow compress things by 5%, and they'll present it as revolutionary.

Also, this came out https://wccftech.com/amd-instinct-m...memory-up-to-4x-speedup-versus-discrete-gpus/ as a "hi" to Jensen. It also explains why NV entirely dropped out of HPC, judging by Blackwell's announced FP64 numbers. NV doesn't like being number 2.
 

Heartbreaker

Diamond Member
Apr 3, 2006
ToTTenTranz said:
No. As always, their aim is to establish a perception of a premium experience if you buy an Nvidia graphics card, this time based on being able to run LLMs locally.

Though I'm not sure how they're going to convince people that their anemic VRAM amounts are adequate for running LLMs.

Laptop SoCs are often paired with 32GB of LPDDR nowadays, but putting lots of VRAM on consumer GPUs isn't compatible with Nvidia's usual planned obsolescence.

The mental gymnastics needed to convince people that 8-12GB is good enough for LLMs will be interesting to watch.

But laptop SoCs are likely too slow to run very large LLMs. I don't see any push to run LLMs on SoCs...

Consumer applications of AI will likely involve models optimized for 8GB.

Hobbyist stuff can expand as much as you want, but if you want to run those massive models, you likely aren't going to choose an SoC.
 
Jul 27, 2020
Heartbreaker said:
But laptop SoCs are likely too slow to run very large LLMs. I don't see any push to run LLMs on SoCs...
Really? https://www.tomshardware.com/tech-i...wers-more-than-500-ai-models-the-company-says

Heartbreaker said:
Consumer applications of AI will likely involve models optimized for 8GB.
Just come out and say that even a 3050 8GB is better for AI than an SoC's NPU.

Heartbreaker said:
Hobbyist stuff can expand as much as you want, but if you want to run those massive models, you likely aren't going to choose an SoC.
Intel/AMD are working hard to prove your sire Jensen wrong.
 

Mopetar

Diamond Member
Jan 31, 2011
ToTTenTranz said:
The mental gymnastics needed to convince people that 8-12GB is good enough for LLMs will be interesting to watch.

Pfft. 8 GB is more than fine for (L)ittle (L)anguage (M)odels, which shouldn't be confused with those other pesky LLMs people have been talking about.
 

ToTTenTranz

Member
Feb 4, 2021
Heartbreaker said:
But laptop SoCs are likely too slow to run very large LLMs. I don't see any push to run LLMs on SoCs...
They're not too slow. Microsoft / OpenAI know what they're doing when they ask for 45 TOPS for compliance.
8GB won't be enough even for the tiny 3B models, but an SoC with access to 32GB of RAM can handle 7B models quite well.

Tokens/s performance is easier to optimize than memory footprint.
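For a rough sense of those sizes, here's a back-of-envelope footprint sketch. It assumes decimal gigabytes and ignores KV cache and runtime overhead; the parameter counts and bit-widths are illustrative, not tied to any specific model or runtime:

```python
# Rough LLM weight footprint: params x bytes-per-weight, in decimal GB.
# Illustrative assumptions only; KV cache and runtime overhead come on top.

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

for params in (3, 7, 13):
    for bits in (16, 8, 4):
        gb = weight_footprint_gb(params, bits)
        print(f"{params}B @ {bits}-bit weights: ~{gb:.1f} GB (plus KV cache and overhead)")
```

On that math a 7B model is around 14 GB at FP16 but only about 3.5 GB at 4-bit, so the quantization level matters about as much as the parameter count when deciding whether 8GB or 32GB is enough.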
 

dr1337

Senior member
May 25, 2020
Does an LLM even get that many tokens per second with 40 TOPS? From what I understand, a card like the 3060 already has well over 200 AI TOPS and is still quite slow for LLMs. I thought everything in mobile AI was about small, highly tuned models.
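One napkin-math way to look at it: single-stream decode re-reads roughly the whole weight set for every generated token, so memory bandwidth, not TOPS, is usually the ceiling. The bandwidth figures, efficiency factor, and model size below are assumptions for illustration, not measurements:

```python
# Back-of-envelope decode speed: tokens/s ~= usable bandwidth / bytes streamed
# per token (~ the weight footprint). All numbers are illustrative assumptions.

def decode_tokens_per_s(model_gb: float, bandwidth_gb_s: float, efficiency: float = 0.6) -> float:
    """Estimated generation speed when decode is memory-bandwidth bound."""
    return bandwidth_gb_s * efficiency / model_gb

model_gb = 4.0  # e.g. a ~7B model quantized to roughly 4-bit
for name, bw in (("laptop SoC LPDDR5X, ~100 GB/s", 100.0),
                 ("RTX 3060 GDDR6, ~360 GB/s", 360.0)):
    print(f"{name}: ~{decode_tokens_per_s(model_gb, bw):.0f} tokens/s for a ~{model_gb:.0f} GB model")
```

By that sketch the 3060's edge comes from its memory bandwidth rather than its headline TOPS, and an NPU's 40-odd TOPS matters less than the LPDDR bandwidth feeding it.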
 

MoogleW

Member
May 1, 2022

New rumor alleging that the GB203-based RTX 5080 is expected to launch before the RTX 5090, contradicting earlier rumors that RTX 50 is being hurried and only the RTX 5090 will come in 2024.

Since only the RTX 5080 is mentioned, the assumption is that the GB205-based RTX 5070 (Ti) will launch later, possibly in the same timeframe the AD104-based RTX 4070 Ti had, so maybe a January 2025 window.

If GB205 comes in January 2025, it (whether named 5070 or 5070 Ti) may get a mention at the GTC architecture unveiling, with first-party benchmarks and other marketing.
 

Mopetar

Diamond Member
Jan 31, 2011
Makes sense. AMD doesn't have a big GPU to compete against the 5080, so Nvidia has no reason not to hold back and put all of its big dies into the more profitable datacenter or professional products.
 

xpea

Senior member
Feb 14, 2014

MoogleW said:
New rumor alleging that the GB203-based RTX 5080 is expected to launch before the RTX 5090, contradicting earlier rumors that RTX 50 is being hurried and only the RTX 5090 will come in 2024.

Since only the RTX 5080 is mentioned, the assumption is that the GB205-based RTX 5070 (Ti) will launch later, possibly in the same timeframe the AD104-based RTX 4070 Ti had, so maybe a January 2025 window.

If GB205 comes in January 2025, it (whether named 5070 or 5070 Ti) may get a mention at the GTC architecture unveiling, with first-party benchmarks and other marketing.
The 5080 and 5090 will be announced at the same time. Availability will be separated by a few weeks...
 

jpiniero

Lifer
Oct 1, 2010
There is a good argument against doing that... given that the 5080 is unlikely to be much better in games than the 4090, because it should be too bandwidth-limited. It may, however, have way better TOPS (at least on paper), so I suppose... but it seems like a better strategy to just release the 5090 and let 4090 supply thin out before releasing the 5080.