Discussion Qualcomm Snapdragon Thread


Raqia

Member
Nov 19, 2008
122
84
101
Those want memory bandwidth that LPDDR can never offer.

DC GPUs have long had their fixed-function graphics hardware excised.
For bulk inferencing where multiple models may be resident, a MUCH bigger pool of LPDDR per card means lower costs, better component availability, less memory thrashing, and lower cooling requirements and failure rates over the lifetime of ownership.

A focused inferencing design is also not necessarily a workable, efficient GPU design, and vice versa. There are real differences in the scale of data movement (and hence cache architecture) and in the relative die area dedicated to lower-precision datatypes and higher-order math ops and data structures. AMD and Nvidia bolted lower-precision datatypes and tensor units onto their GPU slices and HBM onto their packages to address the market ASAP, but there is surely a more efficient architecture, with much less data movement, if you are targeting inferencing only.

It's a nascent space, with advances in model architecture likely to keep informing hardware design, but there won't be as much return on investment from using hyper-expensive, power-hungry GPUs for inferencing loads as from more dedicated solutions. We'll see what the "near-memory" architecture of the AI250 brings to the table -- but even more radical approaches, like baking useful models into photonics, might be needed for the scale of inferencing investment being discussed to be sustainable power-wise.

It's surely a bubble, but breaking out dedicated inferencing from training is at least moving hardware in the right direction.
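
To make the capacity-versus-bandwidth trade-off concrete, here's a quick back-of-envelope. Single-stream LLM decode re-reads roughly all resident weights for every generated token, so it is memory-bandwidth-bound; the two card configurations below are invented for illustration and are not real AI250, MI450, or GPU specs.

```python
# Back-of-envelope, not a benchmark: single-stream LLM decode is usually
# memory-bandwidth-bound, because each generated token re-reads (roughly)
# all resident weights. The card numbers below are illustrative placeholders,
# NOT real AI250 / MI450 / GPU specs.

def decode_ceiling(model_params_b, bytes_per_param, mem_capacity_gb, mem_bw_gbs):
    """Return (resident model copies that fit, tokens/s upper bound per stream)."""
    model_gb = model_params_b * bytes_per_param            # weight footprint in GB
    fits = int(mem_capacity_gb // model_gb)                # how many models stay resident
    toks_per_s = mem_bw_gbs / model_gb if model_gb else 0  # bandwidth roofline
    return fits, toks_per_s

model_b = 70    # hypothetical 70B-parameter model
bytes_pp = 1    # int8 weights, i.e. ~70 GB per model copy

for name, cap_gb, bw_gbs in [("LPDDR-heavy card", 768, 500),
                             ("HBM card",         192, 4000)]:
    fits, tps = decode_ceiling(model_b, bytes_pp, cap_gb, bw_gbs)
    print(f"{name:18s} fits ~{fits} models, <= {tps:5.1f} tok/s per stream")
```

Batching amortizes the weight reads, so real serving throughput sits well above this single-stream ceiling, but the shape of the trade-off -- many resident models on one side, tokens/sec per stream on the other -- is exactly what's being argued about here.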
 

mvprod123

Senior member
Jun 22, 2024
472
503
96
Four years have passed since the acquisition of Nuvia, and there is still no Oryon-based server CPU. Wasn't that the primary desire of Nuvia's executives?
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,365
10,112
106
For bulk inferencing where multiple models may be resident, a MUCH bigger pool of LPDDR per card means lower costs, better component availability, less memory thrashing, and lower cooling requirements and failure rates over the lifetime of ownership.
Tons of words to say tokens/sec ratio will be bad.
BTW, have you seen the picture of the MI450, with HBM, and the base die of each HBM stack also having a memory controller for 2x LPDDR channels? Looks insane...
Looks normal.
 

coercitiv

Diamond Member
Jan 24, 2014
7,394
17,539
136
Wake me up when y'all want to talk about inferencing
Wake me up when Qualcomm wants to talk about inference. The joke I posted above only works because QC did not offer any meaningful metric that presents their hardware as efficient. No performance numbers, no power usage attached to that performance.

Buncha trolls here :p
SemiAnalysis make their living out of taking both the industry in general and AI in particular VERY seriously. So sleep on this: if QC had a truly valuable product in their hands, we'd be swimming in benchmarks and efficiency claims.
 

Raqia

Member
Nov 19, 2008
122
84
101
Wake me up when Qualcomm wants to talk about inference. The joke I posted above only works because QC did not offer any meaningful metric that presents their hardware as efficient. No performance numbers, no power usage attached to that performance.


SemiAnalysis make their living out of taking both the industry in general and AI in particular VERY seriously. So sleep on this: if QC had a truly valuable product in their hands, we'd be swimming in benchmarks and efficiency claims.
I'm not convinced by someone just because he takes something seriously -- you still have to know what you're talking about and not have a narrative agenda in mind before writing. Not sure I can take seriously an appeal to authority, either, or the leap from "no benchmarks yet" to "no valuable product."

They didn't just enter the AI market: they have had success in DCs with the AI100, released in 2019, and have dominated mobile AI in both hardware and software for a decade (until MediaTek's strong entry this year...). They pioneered model quantization techniques and have a unique perception stack that uses gauge equivariant CNNs. Along with Nvidia's DLSS, their modem/RF systems are among the few pieces of consumer tech that fruitfully run non-trivial neural nets wholly at the client level.
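
To ground the quantization point: shrinking each weight from FP32/FP16 to INT8 (or lower) directly cuts the bytes that have to cross the memory bus per token. This is a generic sketch of symmetric post-training INT8 quantization, not Qualcomm's actual AIMET code or API.

```python
# Generic sketch of symmetric, per-tensor post-training INT8 quantization --
# the basic idea behind the techniques mentioned above, not Qualcomm's
# actual AIMET implementation.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 plus a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0                       # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float):
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)       # one FP32 weight matrix
q, s = quantize_int8(w)

print("bytes FP32:", w.nbytes, " bytes INT8:", q.nbytes)  # 4x fewer bytes to move
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```

Fewer bytes per weight feeds straight into the bandwidth ceiling sketched earlier in the thread.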

Given their low-power pedigree, I expect the parts to be very efficient, and their direct attack on memory-movement power overhead with a near-memory compute architecture looks right on the money to me. I expect the parts to be a popular choice for inferencing, able to improve compute density per rack by being easier to cool.
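
For a sense of why the data-movement angle matters, here is a one-multiplication estimate of the power spent just streaming weights: traffic in bytes/s times energy per byte. The pJ/byte figures are rounded, order-of-magnitude ballparks from the computer-architecture literature, not AI250 or any vendor's numbers.

```python
# Power spent purely on moving weights = traffic (bytes/s) x energy per byte.
# The pJ/byte values are rough order-of-magnitude ballparks, not vendor specs.

TRAFFIC_GBS = 1400        # e.g. ~70 GB of weights re-read ~20x per second

for tier, pj_per_byte in [("off-package LPDDR", 120),
                          ("HBM",                40),
                          ("on/near-die memory",  4)]:
    watts = TRAFFIC_GBS * 1e9 * pj_per_byte * 1e-12   # (bytes/s) * (J/byte)
    print(f"{tier:18s} ~{watts:6.1f} W just for weight movement")
```

Whatever form the AI250's near-memory scheme actually takes, the win it is chasing is the gap between that last line and the first.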
 

Joe NYC

Diamond Member
Jun 26, 2021
3,733
5,295
136
Looks normal.

24 channels of LPDDR looks like a big deal. I wonder who makes the base die for the HBM -- whether it is AMD's own design or someone else's.

Edit: Also, it looks like NVDA missed the boat on this one. LPDDR hanging off the Grace CPU is kind of a yawn compared to LPDDR connected directly to the GPGPU.
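
Rough numbers on what 24 LPDDR channels could add, with every figure below (channel width, data rate, capacity per channel, and the HBM comparison) an assumption for illustration rather than an MI450 spec:

```python
# Back-of-envelope for an LPDDR tier hung off the HBM base dies.
# Channel width, data rate, die capacity and the HBM figures are all
# assumptions for illustration, not MI450 specs.

CHANNELS       = 24
CH_WIDTH_BITS  = 32      # assumed; LPDDR "channel" width varies by convention
DATA_RATE_MTS  = 8533    # assumed LPDDR5X-class transfer rate
GB_PER_CHANNEL = 32      # assumed capacity hung off each channel

# (bytes per transfer) * (MT/s) = MB/s per channel; /1000 -> GB/s; x channels
lpddr_bw_gbs = CHANNELS * (CH_WIDTH_BITS / 8) * DATA_RATE_MTS / 1000
lpddr_cap_gb = CHANNELS * GB_PER_CHANNEL

print(f"LPDDR tier: ~{lpddr_cap_gb} GB at ~{lpddr_bw_gbs:.0f} GB/s aggregate")
# Assumed HBM tier for comparison: 8 stacks x 36 GB x ~1.2 TB/s each
print(f"HBM tier:   ~{8 * 36} GB at ~{8 * 1.2:.1f} TB/s aggregate")
```

On those made-up numbers the LPDDR tier adds several times the HBM capacity at a fraction of its bandwidth -- a second, slower tier for cold weights and KV cache rather than a replacement for HBM.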
 

Hitman928

Diamond Member
Apr 15, 2012
6,720
12,418
136
I'm not convinced by someone just because he takes something seriously -- you still have to know what you're talking about and not have a narrative agenda in mind before writing. Not sure I can take seriously an appeal to authority, either, or the leap from "no benchmarks yet" to "no valuable product."

They didn't just enter the AI market: they have had success in DCs with the AI100, released in 2019, and have dominated mobile AI in both hardware and software for a decade (until MediaTek's strong entry this year...). They pioneered model quantization techniques and have a unique perception stack that uses gauge equivariant CNNs. Along with Nvidia's DLSS, their modem/RF systems are among the few pieces of consumer tech that fruitfully run non-trivial neural nets wholly at the client level.

Given their low-power pedigree, I expect the parts to be very efficient, and their direct attack on memory-movement power overhead with a near-memory compute architecture looks right on the money to me. I expect the parts to be a popular choice for inferencing, able to improve compute density per rack by being easier to cool.

How much success have they had in the DC AI market? As far as I can see, the DCAI share of their revenue is effectively 0% for the past few years.