Discussion Qualcomm Snapdragon Thread

igor_kavinski · Aug 1, 2024

I'm a bit confused.

Are pipeline stages == clock cycles?

That's what DavidC1 seems to be implying with that post.

GTracing · Aug 1, 2024

igor_kavinski said:
I'm a bit confused.

Are pipeline stages == clock cycles?

That's what DavidC1 seems to be implying with that post.

No, but adding pipeline stages allows designers to increase clock speed.

In a CPU with no pipelining, the core has to decode and execute the whole instruction in a single cycle. It takes a long time for the chip to do all that work, so that single cycle has to be long.

Pipelining is splitting the work into multiple stages. By splitting the work into 10 stages, each stage does about 1/10th of the work to decode and execute an instruction. And then you can clock the processor much faster. The downside is that it takes 10 cycles to complete a single instruction. But while the first instruction is on stage 6, the second one can be on stage 5, the third instruction can be on stage 4 etc. So overall it's still a huge speedup.

Where this gets hairy is when there's a branch in the code. The CPU doesn't know what instruction comes next until the branch finishes executing, so it just guesses. CPUs are pretty good at guessing correctly, but when they get it wrong they have to discard all the work they did on the "guess". The time lost working on the wrong guess is known as the branch misprediction penalty.
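To put rough numbers on the above, here is a toy Python model. It is only a sketch under simplifying assumptions of my own: every stage takes exactly 1/N of the unpipelined time, there is no latch overhead, one instruction enters the pipeline per cycle, and a mispredicted branch costs a full refill (stages - 1 cycles). The function names and the 20% branch / 5% mispredict figures are made up for illustration, not measurements of any real core.

```python
def unpipelined_time(n_instructions, work_ns):
    """Each instruction does all of its work in one long cycle."""
    return n_instructions * work_ns


def pipelined_time(n_instructions, work_ns, stages,
                   branch_fraction=0.0, mispredict_rate=0.0):
    """Idealized in-order pipeline that issues one instruction per cycle.

    Assumption: a mispredicted branch throws away (stages - 1) cycles of work.
    """
    cycle_ns = work_ns / stages              # shorter cycle, so a higher clock
    fill_cycles = stages - 1                 # latency before the first result
    mispredicts = n_instructions * branch_fraction * mispredict_rate
    penalty_cycles = mispredicts * (stages - 1)
    total_cycles = fill_cycles + n_instructions + penalty_cycles
    return total_cycles * cycle_ns


n, work = 1000, 10.0  # 1000 instructions, 10 ns of logic per instruction
print(f"no pipeline:        {unpipelined_time(n, work):.0f} ns")
print(f"10 stages, ideal:   {pipelined_time(n, work, 10):.0f} ns")
print(f"10 stages, 1% miss: {pipelined_time(n, work, 10, 0.2, 0.05):.0f} ns")
```

With these made-up inputs the ideal 10-stage pipeline finishes in roughly a tenth of the time, and mispredicting 5% of branches (1% of all instructions) adds back about 9% on top of that. A deeper pipeline makes each misprediction cost more cycles, which is the trade-off being discussed.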

Nothingness · Aug 1, 2024

@GTracing Good summary, thanks 🙂

I would just add that even after dispatching instructions the penalty is variable due to instructions not needing the same number of stages/cycles to execute (int add vs FP64 fma for instance).

Also as @DavidC1 said in his post there are several other variables that change latency. That’s why I don’t understand what the single 11 figure for X925 represents. It should be a range or flagged as a max latency. I looked at X925 Software Optimization Guide and couldn’t find any information about that.

Nothingness · Aug 1, 2024

igor_kavinski said:
Are pipeline stages == clock cycles?

1 pipe stage usually is 1 clock cycle, but that’s not always the case. IIRC the P4 was using double clock speed for its ALUs (where is Sarah when you need her?).

FlameTail · Aug 1, 2024

Cheaper Snapdragon laptops are coming.... Next year!?

$700 Snapdragon X PCs will be available starting next year, says Qualcomm CEO

More affordable computers with more efficient chips are coming in 2025.

www.tomshardware.com

“We expect PC to be the next biggest driver of diversification for the company,” says Amon. Business will be “slow and steady as the market transitions,” he admitted, but we already see some good signs coming out of the woodwork. Amon said that some Snapdragon X PCs have already sold out, while Geekbench also posted on X that 6.5% of Geekbench 6 benchmarks from June 15 to July 15, 2024, were also run on Snapdragon X devices — good signs for Qualcomm, especially as it had launched less than a month before that.

FlameTail · Aug 2, 2024

Should Qualcomm create a minimum laptop specification standard, like Intel Evo?

FlameTail · Aug 2, 2024

Snapdragon 8 Gen 4 leak:

FlameTail · Aug 2, 2024

Also, a comment on efficiency concerns about the 8 Gen 4. Next-gen Chinese phones have been rumoured to have much larger batteries (5500, 6000, 7000 mAh) compared to the current crop of phones (4500, 5000, 6000 mAh). Some speculated this is due to bad power efficiency of 8G4.

https://twitter.com/x/status/1816918497869304200

adroc_thurston · Aug 2, 2024

FlameTail said:
Should Qualcomm create a minimum laptop specification standard, like Intel Evo?

They don't have the volume to enforce anything.

FlameTail said:
Some speculated this is due to bad power efficiency of 8G4.

It is the return of sd810 yes, but that's unrelated to the battery capacity bumps.

FlameTail · Aug 2, 2024

Sebastian Aaltonen comments again about GPU architecture of Qualcomm.

https://twitter.com/x/status/1819378559472792058

And now you understand why GPU-driven games such as Rainbow Six Siege run very poorly on Qualcomm Snapdragon X. Same for Nanite.

Other Android GPU vendors have similar bottlenecks. This is not just Qualcomm. These GPUs are optimized for traditional VS+PS workloads. All big data (matrices, material) should be loaded from uniform buffer using fixed address. This hits fast paths.

If you want to know how HypeHype’s new renderer optimizes around these bottlenecks, check our new SIGGRAPH 2024 presentation (in Moving Mobile Graphics track). Slide deck will be public next week.

Most people don’t know that the biggest improvement in Turing wasn’t ray-tracing or tensor cores. It was 28 cycle L1$ latency (vs 85 cycles) and 2x L1$ bandwidth (source: CUDA paper). This is great for GPU-driven render + V-buffer. And great for ray-tracing too.

Nvidia wasn’t great in GPU compute before Turing. They also lacked async compute. Kepler also emulated LDS atomics. Turing was a massive enabler for Nvidia -> $3 trillion company now. But Pascal had such fast geometry processing that gamers didn’t notice GCN's compute advantage.

Back in the Pascal days Nvidia made a series of blog posts advising devs to use uniform buffers for their new deferred lighting shaders. IIRC Pascal was suffering 30% from modern raw buffer compute code. Now Qualcomm is in the same position when you run modern workloads.

soresu · Aug 2, 2024

Nvidia wasn’t great in GPU compute before Turing. They also lacked async compute. Kepler also emulated LDS atomics. Turing was a massive enabler for Nvidia -> $3 trillion company now. But Pascal had such fast geometry processing that gamers didn’t notice GCN's compute advantage.

This is a big understatement.

If not for their CUDA walled garden they would not have the market dominance in GPU general compute they enjoy at the moment.

Case in point CDNAx is still based on Vega/GFX9.

GFX9 remains good enough for straight compute, it just was poorly designed to scale with games while filling the CUs with work.

I have heard rumblings though that a near future iteration of CDNA could be based on GFX13 or 14 (RDNA5 onward).

adroc_thurston · Aug 2, 2024

soresu said:
Case in point CDNAx is still based on Vega/GFX9.

Very little of that left outside of the overall 1 scheduler/4 SIMD arrangement.
But yeah, it took until GA100 for NV to reach featureset parity with Southern Islands.

FlameTail · Aug 3, 2024

Snapdragon 8 Gen 4 might match the Single Core performance of Apple A18 in Geekbench 5, where SME does not add to the score.

gdansk · Aug 3, 2024

FlameTail said:
Snapdragon 8 Gen 4 might match the Single Core performance of Apple A18 in Geekbench 5, where SME does not add to the score.

I find that claim questionable.

jdubs03 · Aug 3, 2024

FlameTail said:
Snapdragon 8 Gen 4 might match the Single Core performance of Apple A18 in Geekbench 5, where SME does not add to the score.

Do we know the X Elite GB5 score? I can’t find anything on it. That’d be a decent proxy.
Would have to get around 2500-2600 to challenge.

Nothingness · Aug 3, 2024

jdubs03 said:
Do we know the X Elite GB5 score? I can’t find anything on it. That’d be a decent proxy.
Would have to get around 2500-2600 to challenge.

Here you go: https://browser.geekbench.com/v5/cpu/search?utf8=✓&q=X1e84100

jdubs03 · Aug 3, 2024

Awesome thank you.

Yeah, so based on that, the X1E-84-100 score is around 1940 (median of 5 results). The A17 Pro is already at 2130ish on average.

The A18 Pro will probably be around 2400* EDIT. Doesn’t seem possible that the Snapdragon 8 Gen 4 will be anywhere near that. I don’t even know if we can expect it to be at 2000.

FlameTail · Aug 3, 2024

Nothingness said:
Here you go: https://browser.geekbench.com/v5/cpu/search?utf8=✓&q=X1e84100

Oh no. Apple is stronger in GB5 than GB6?

Chip      GB5     GB6     CB2024
X1E-84    1950    2950    131
M3 Max    2350    3150    144
Delta     +20.5%  +6.8%   +9.9%
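For anyone who wants to check the Delta row: it is the M3 Max's lead over the X1E-84, i.e. the M3 Max score divided by the X1E-84 score, minus one. A quick Python sketch using only the scores in the table (the `lead_percent` helper name is just something I picked), with results rounded to one decimal place:

```python
# Single-core scores from the table: (X1E-84, M3 Max)
scores = {
    "GB5": (1950, 2350),
    "GB6": (2950, 3150),
    "CB2024": (131, 144),
}


def lead_percent(base, other):
    """How far `other` is ahead of `base`, in percent."""
    return (other / base - 1) * 100


for bench, (x1e, m3_max) in scores.items():
    print(f"{bench}: M3 Max leads by {lead_percent(x1e, m3_max):.1f}%")
```

The point of the comparison is that the gap is roughly three times larger in GB5 than in GB6.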

jdubs03 · Aug 3, 2024

That score I think is for the 80W reference design.
The 23W reference was 124.
And I’ve seen scores around for X1E-84 of 125/126 and 129 (in a Best Buy review) from the Galaxy Book 4 Edge, which seems most representative of real world results as of now.

hemedans · Aug 3, 2024

jdubs03 said:
Awesome thank you.

Yeah, so based on that, the X1E-84-100 score is around 1940 (median of 5 results). The A17 Pro is already at 2130ish on average.

The A18 Pro will probably be around 2400* EDIT. Doesn’t seem possible that the Snapdragon 8 Gen 4 will be anywhere near that. I don’t even know if we can expect it to be at 2000.

4.2 GHz is a massive jump in frequency. With that plus a minor IPC improvement, it's possible for the 8 Gen 4 to reach ~3500 in GB6, which is A18 level.

FlameTail · Aug 3, 2024

The Ryzen AI 9 HX 370 is also surpassing the X1E-84-100 in GB5

AI 370 - Geekbench 5 CPU Search - Geekbench

Exceeding 2050 points.

FlameTail · Aug 3, 2024

hemedans said:
4.2 GHz is a massive jump in frequency. With that plus a minor IPC improvement, it's possible for the 8 Gen 4 to reach ~3500 in GB6, which is A18 level.

The rumour says 4.37 GHz ST boost for 8G4.

X Elite @ 4.3 GHz can hit 3200 in Linux (3000 in Windows). If we assume that carries over to Android (because it uses the Linux kernel), and that the score scales with clock speed up to 4.37 GHz, we might be looking at about 3250 points for 8G4.

I also see a possibility that Phoenix-L in 8G4 might have some single digit (<5%) IPC gains compared to Phoenix in X Elite, due to a more robust memory subsystem etc...

jdubs03 · Aug 3, 2024

Tbh I’d be very surprised if a mobile derivative of the X Elite will perform as high as their laptop part at its highest wattage.
Keep in mind their GB6 score of ~2965 was for the 80W reference part. Their 23W reference design achieved ~2765.

If I had to guess, it’d be under 3000.
3250 seems too high. That’s almost 10% faster than the 80W reference.
Why wouldn’t they want to use that same performance core in their flagship laptops?

FlameTail · Aug 3, 2024

jdubs03 said:
Tbh I’d be very surprised if a mobile derivative of the X Elite will perform as high as their laptop part at its highest wattage.
Keep in mind their GB6 score of ~2965 was for the 80W reference part. Their 23W reference design achieved ~2765.

If I had to guess, it’d be under 3000.
3250 seems too high. That’s almost 10% faster than the 80W reference.
Why wouldn’t they want to use that same performance core in their flagship laptops?

A single core is definitely not guzzling 80W.

Also that 3200 is for Linux. In Windows it does 3000 (for reasons that I cannot explain -_-)

jdubs03 · Aug 3, 2024

FlameTail said:
A single core is definitely not guzzling 80W.

Also that 3200 is for Linux. In Windows it does 3000 (for reasons that I cannot explain -_-)

Surely not. But the 23W design is more representative of a mobile SKU.
Using the 2765 baseline from the 23W design, applying the Linux adjustment (3200/3000) gets to 2950. Being generous with another 5% for IPC improvements gets to 3100. And another adjustment for 4.37 GHz vs 4.3 GHz gets to 3150.

It’s tough to assume that an 8W phone would score that high. Just due to the form factor. If they can hit 3000 that would be a surprise.
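For reference, here is that back-of-the-envelope chain as a small Python sketch. It assumes the score scales linearly with each factor (OS, IPC, clock), which is a simplification and not something Geekbench guarantees. The `estimate` helper is a name I made up, and the 5% IPC gain is the generous guess from above, not a known figure.

```python
def estimate(baseline, *factors):
    """Multiply a baseline score by a chain of assumed scaling factors."""
    score = baseline
    for factor in factors:
        score *= factor
    return score


linux_over_windows = 3200 / 3000   # X Elite GB6: Linux vs Windows
ipc_gain = 1.05                    # assumed, generous
clock_gain = 4.37 / 4.3            # rumoured 8G4 boost vs X Elite boost

# Starting from the 23W reference design score
step1 = estimate(2765, linux_over_windows)
step2 = estimate(2765, linux_over_windows, ipc_gain)
step3 = estimate(2765, linux_over_windows, ipc_gain, clock_gain)
print(f"23W + Linux:   {step1:.0f}")
print(f"+ 5% IPC:      {step2:.0f}")
print(f"+ 4.37 GHz:    {step3:.0f}")

# Starting instead from the 3200 Linux score and scaling only the clock
print(f"3200 + clock:  {estimate(3200, clock_gain):.0f}")
```

The two starting points land about 100 points apart, so the disagreement is mostly about which baseline is representative of a phone, not about the scaling math.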
