
Discussion Qualcomm Snapdragon Thread

Those want memory bandwidth that LPDDR can never offer.

DC GPUs had their fixed-function graphics hardware excised long ago.
For bulk inferencing where multiple models may be resident, a MUCH bigger pool of LPDDR per card will mean lower costs, better component availability, less memory thrashing, and lower cooling requirements and failure rates over the lifetime of ownership.

A focused inferencing design is also not necessarily a workable, efficient GPU design, and vice versa. There are real differences in the scale of data movement (and hence cache architecture) and in the relative die space dedicated to lower-precision datatypes, higher-order math ops, and data structures. AMD and nVidia bolted some lower-precision datatypes and tensor units onto their GPU slices and HBM onto their packages to address the market ASAP, but there's surely a more efficient architecture with much less data movement if you are targeting inferencing only.

It's a nascent space, with advances in model architecture likely to continuously inform hardware design, but there won't be as much return on investment from using hyper-expensive and power-hungry GPUs for inferencing loads over more dedicated solutions. We'll see what the "near-memory" architecture of the AI250 brings to the table, but even more radical architectures, like baking useful models into photonics, might be needed for the scale of inferencing investments being discussed to be sustainable power-wise.

It's surely a bubble, but breaking out dedicated inferencing from training is at least moving hardware in the right direction.
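The bandwidth objection behind "tokens/sec will be bad" can be made concrete with back-of-envelope math. Single-batch LLM decode is typically memory-bandwidth bound, so the throughput ceiling is roughly bandwidth divided by bytes read per token. All figures below are hypothetical round numbers for illustration, not specs of any product in this thread:

```python
# Back-of-envelope: decode is memory-bandwidth bound, so the ceiling on
# tokens/s is roughly (usable bandwidth) / (bytes read per token).
# All numbers are hypothetical round figures, not vendor specs.

def decode_tokens_per_sec(mem_bw_gbs: float, model_gb: float) -> float:
    """Upper bound when every weight is streamed once per generated token."""
    return mem_bw_gbs / model_gb

model_gb = 70  # e.g. a 70B-parameter model at ~1 byte/weight

for name, bw_gbs in [("LPDDR-class card", 500), ("HBM-class card", 4000)]:
    ceiling = decode_tokens_per_sec(bw_gbs, model_gb)
    print(f"{name}: ~{ceiling:.0f} tokens/s ceiling")
```

The gap narrows under heavy batching (weights are amortized across requests), which is part of why the bulk-inferencing use case is argued differently from latency-sensitive serving.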
 
Four years have passed since the acquisition of Nuvia, and there is still no Oryon-based server CPU. Wasn't that the primary desire of Nuvia's executives?
 
For bulk inferencing where multiple models may be resident, a MUCH bigger pool of LPDDR per card will mean lower costs, better component availability, less memory thrashing, and lower cooling requirements and failure rates over the lifetime of ownership.
Tons of words to say tokens/sec ratio will be bad.
BTW, have you seen the picture of the MI450, with HBM, where the base die of each HBM stack also has a memory controller for 2x LPDDR channels? Looks insane...
Looks normal.
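The capacity side of the quoted argument can also be sketched numerically: how many quantized models can stay resident per card before you have to swap (thrash) them in and out. The capacities, model size, and overhead below are purely illustrative assumptions:

```python
# Sketch of the "bigger pool" argument: count how many quantized models
# fit resident in a card's memory before swapping is forced.
# Capacities, model size, and overhead are illustrative, not product specs.

def models_resident(card_mem_gb: float, model_gb: float,
                    overhead_gb: float = 8) -> int:
    """Whole models that fit after reserving some memory for KV cache etc."""
    usable = card_mem_gb - overhead_gb
    return int(usable // model_gb)

model_gb = 35  # hypothetical 70B model at ~4-bit quantization
print(models_resident(192, model_gb))  # HBM-class capacity -> 5
print(models_resident(768, model_gb))  # large LPDDR pool    -> 21
```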
 
Wake me up when y'all want to talk about inferencing
Wake me up when Qualcomm wants to talk about inference. The joke I posted above only works because QC did not offer any meaningful metric that presents their hardware as efficient. No performance numbers, no power usage attached to that performance.

Buncha trolls here 😛
SemiAnalysis make their living out of taking both the industry in general and AI in particular VERY seriously. So sleep on this: if QC had a truly valuable product in their hands, we'd be swimming in benchmarks and efficiency claims.
 
Wake me up when Qualcomm wants to talk about inference. The joke I posted above only works because QC did not offer any meaningful metric that presents their hardware as efficient. No performance numbers, no power usage attached to that performance.


SemiAnalysis make their living out of taking both the industry in general and AI in particular VERY seriously. So sleep on this: if QC had a truly valuable product in their hands, we'd be swimming in benchmarks and efficiency claims.
I'm not convinced by someone just because he takes something seriously; you still have to know what you're talking about, and not have a narrative agenda in mind before writing. Not sure I can take an appeal to authority, or a claim of false causation regarding the lack of benchmarks, seriously either.

They didn't just enter the AI market: they have had success in DCs with their AI 100s, released in 2019, and have dominated mobile AI, both hardware- and software-wise, for a decade (until MediaTek's strong entry this year...). They pioneered model quantization techniques and have a unique perception stack that uses gauge-equivariant CNNs. Along with nVidia's DLSS, their modem/RF systems are among the few pieces of consumer tech that use non-trivial neural nets fruitfully, wholly at the client level.

Given their low-power pedigree, I expect the parts to be very efficient, and their direct attack on memory-movement power overhead with a near-memory compute architecture looks right on the money to me. I expect the parts to be a popular choice for inferencing that can improve compute density per rack by being easier to cool.
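The memory-movement power argument can be sketched with rough energy accounting: moving a byte in from off-chip DRAM costs orders of magnitude more energy than arithmetic done on it, which is the usual motivation for near-memory designs. The pJ/byte figures below are textbook-style estimates, not Qualcomm numbers:

```python
# Rough energy accounting for why near-memory compute helps: off-chip
# data movement dominates. The pJ/byte values are textbook-style
# estimates for illustration, not measured figures for any product.

PJ_PER_BYTE = {"off-chip DRAM": 100.0, "on-die SRAM": 1.0}

def joules_to_move(gbytes: float, tier: str) -> float:
    """Energy in joules to stream `gbytes` through the given memory tier."""
    return gbytes * 1e9 * PJ_PER_BYTE[tier] * 1e-12  # pJ -> J

weights_gb = 70  # one full sweep of a hypothetical 70 GB weight set
for tier in PJ_PER_BYTE:
    print(f"{tier}: {joules_to_move(weights_gb, tier):.2f} J per weight sweep")
```

Multiply that per-sweep cost by tokens per second and the off-chip number turns directly into sustained watts, which is where the architecture question bites.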
 
Looks normal.

24 channels of LPDDR looks like a big deal. I wonder who makes the base die for the HBM, and whether it is AMD's own design or someone else's.

Edit: Also, looks like NVDA missed the boat on this one. Their LPDDR connected to the Grace CPU is kind of a yawn compared to LPDDR connected directly to the GPGPU.
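To put a rough number on why 24 LPDDR channels is a big deal, aggregate bandwidth is just channels × channel width × per-pin data rate. The 32-bit channel width and 9.6 Gb/s/pin rate below are assumed LPDDR5X-class values, not confirmed specs for this part:

```python
# Aggregate bandwidth from a wide LPDDR setup. Channel width and per-pin
# data rate are assumed LPDDR5X-class figures, not confirmed specs.

def aggregate_bw_gbs(channels: int, bits_per_channel: int,
                     gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s across all channels."""
    return channels * bits_per_channel * gbps_per_pin / 8

bw = aggregate_bw_gbs(24, 32, 9.6)  # hypothetical: 24 ch x 32-bit x 9.6 Gb/s
print(f"~{bw:.0f} GB/s aggregate")  # ~922 GB/s
```

Still well short of HBM-stack bandwidth, but attached at far higher capacity per dollar and per watt, which is the trade the thread is debating.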
 
I'm not convinced by someone just because he takes something seriously; you still have to know what you're talking about, and not have a narrative agenda in mind before writing. Not sure I can take an appeal to authority, or a claim of false causation regarding the lack of benchmarks, seriously either.

They didn't just enter the AI market: they have had success in DCs with their AI 100s, released in 2019, and have dominated mobile AI, both hardware- and software-wise, for a decade (until MediaTek's strong entry this year...). They pioneered model quantization techniques and have a unique perception stack that uses gauge-equivariant CNNs. Along with nVidia's DLSS, their modem/RF systems are among the few pieces of consumer tech that use non-trivial neural nets fruitfully, wholly at the client level.

Given their low-power pedigree, I expect the parts to be very efficient, and their direct attack on memory-movement power overhead with a near-memory compute architecture looks right on the money to me. I expect the parts to be a popular choice for inferencing that can improve compute density per rack by being easier to cool.

How much success have they had in the DC AI market? As far as I can see, the DCAI share of their revenue is effectively 0% for the past few years.
 
All things considered, that's not a great result for the X2 Elite vs the R9. 12-core vs 12-core, the Snapdragon CPU is ~6% faster at 50W, ~6% at 40W, ~9% at 30W, and ~15% at 20W.

The 18-core looks very impressive though.
The M4 Max achieves 2000+ points with a power consumption of 57-60 watts.
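For comparison across the quoted figures, the only perf-per-watt ratio computable from this post is the M4 Max's (the X2 Elite and R9 absolute scores aren't given here), so treat the result as a single reference point rather than a ranking:

```python
# Perf-per-watt from the numbers quoted above: ~2000+ points at 57-60 W
# for the M4 Max. Other chips' absolute scores aren't in the post, so
# only this one ratio can be computed.

def pts_per_watt(score: float, watts: float) -> float:
    return score / watts

low = pts_per_watt(2000, 60)   # worst case of the quoted power range
high = pts_per_watt(2000, 57)  # best case
print(f"M4 Max: ~{low:.1f}-{high:.1f} pts/W")
```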
 
A lot of strange stuff in the slide deck.
The text implies that there are 4 branch units in the int pipeline, yet the diagram only shows 2.
Conveniently, the diagram is an exact copy of the one from Gen 1.
[attached: slide screenshots]
The load/store slide looks like an exact copy as well. It also implies a 96KB L1D (like Gen 1), yet Geekerwan says 128KB. Unless it's configured differently between laptop and mobile, of course.
[attached: slide screenshot]
 