Discussion Qualcomm Snapdragon Thread

adroc_thurston · Apr 26, 2024

SpudLobby said:
You’ll see him also say adroc is a liar etc.

Oh he coped alright.
It's funny, but I'm just a messenger anyway.

DrMrLordX · Apr 27, 2024

SpudLobby said:
Chips n Cheese discord. You’ll see him also say adroc is a liar etc.

Damn he went that far? Does @adroc_thurston even post there? Things are getting too hot around here. Folks need to chillax a little bit.

Henry swagger · Apr 27, 2024

DrMrLordX said:
Damn he went that far? Does @adroc_thurston even post there? Things are getting too hot around here. Folks need to chillax a little bit.

Who is adroc? Is he an insider or a mild type?

moinmoin · Apr 27, 2024

DrMrLordX said:
Damn he went that far?

He has always been the outspoken kind in forums. It's just sad this is no longer backed by any excellent public analysis like the kind he did in his AT articles.

Nothingness · Apr 28, 2024

I remind people that one of the reasons Andrei left this forum (not AT) was constant unfounded criticisms by some here. OTOH I guess he could not tell us much anyway due to where he works now.

SpudLobby · May 2, 2024

Ghostsonplanets said:
OOOOF I guess it's a good thing Flametail took a break:

View attachment 97827

View attachment 97828

That’s the one on a different die, i.e. cheaper physically speaking. The others are just Elites binned down. So tbh it’ll be fine and suggests they’re going to take it mainstream if it’s like that and with an 8c max.

poke01 · May 4, 2024

llvm-project/llvm/lib/Target/AArch64/AArch64SchedOryon.td at 8aebe46d7fdd15f02a9716718f53b03056ef0d19 · llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - llvm/llvm-project

github.com

Decoder width : 14-wide
Reorder buffer : 376 entries

Nothingness · May 4, 2024

poke01 said:
llvm-project/llvm/lib/Target/AArch64/AArch64SchedOryon.td at 8aebe46d7fdd15f02a9716718f53b03056ef0d19 · llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - llvm/llvm-project

github.com

Decoder width : 14-wide
Reorder buffer : 376 entries

Unless I missed something it's the dispatch width which is 14 uops. It's not the same as a 14 wide instruction decoder.

SpudLobby · May 4, 2024

poke01 said:
llvm-project/llvm/lib/Target/AArch64/AArch64SchedOryon.td at 8aebe46d7fdd15f02a9716718f53b03056ef0d19 · llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - llvm/llvm-project

github.com

Decoder width : 14-wide
Reorder buffer : 376 entries

Where did you see this originally?

poke01 · May 4, 2024

SpudLobby said:
Where did you see this originally?

https://twitter.com/x/status/1786755551688524124

here

DisEnchantment · May 6, 2024

Nothingness said:
Unless I missed something it's the dispatch width which is 14 uops. It's not the same as a 14 wide instruction decoder.

Dispatch width is extremely wide. Since there is no uop cache, those uops should come entirely from instruction decode. Compare that with 6 for Z4 or 8 for Z5, though it's not an apples-to-apples comparison.
Same mis-predict penalty vs Z4
Same L1 latency vs Z4
It has lower latency for vector (NEON), but that's not an apples-to-apples comparison with x86 vector
ROB is surprisingly conservative for a 2024 core.

Similar number of LS and ALU ports compared to Z5, although Z5/Z4 have two additional FP ports. Store and load queue depths are comparable to Z4.
Not sure how branch addresses are calculated; for instance, Z5 has 4 AGUs for those

It looks OK, nothing in particular stands out. If it is mostly an efficiency play then we will see soon.

Nothingness · May 6, 2024

DisEnchantment said:
Dispatch width is extremely wide. Since there is no uop cache, those uops should come entirely from instruction decode. Compare that with 6 for Z4 or 8 for Z5, though it's not an apples-to-apples comparison.

That can be quite complex: uops are stored in queues between decoder and dispatch; also, a decoder could emit two uops per cycle. For instance, a mem operation with writeback can be split into two uops, one going into the ALU queue(s) for the writeback of the base register, while the other goes into the load/store queue(s). So I'm afraid at this point nothing can be guessed about decoder width (though I agree it's surely wide, but it's unlikely to be 14-wide).

I've often been wrong, so I won't exclude I'm wrong again 🙂
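
To illustrate the point about one decoded instruction turning into more than one uop, here is a toy sketch in Python. The split rule (a load/store with base writeback becomes one load/store uop plus one ALU uop) and the names `uops_for` and `count_uops` are assumptions made up for illustration; nothing here reflects how Oryon actually cracks instructions.

```python
# Toy model: count the uops a decoder might emit for a short AArch64 sequence.
# The split rules are illustrative assumptions, not Oryon's actual behaviour.

def uops_for(instr):
    """Return hypothetical queue destinations for the uops of one instruction."""
    text = instr.strip()
    mnemonic = text.split()[0]
    is_mem = mnemonic in ("ldr", "str")
    # Post-index "[x1], #8" and pre-index "[x1, #8]!" forms write the base back.
    has_writeback = is_mem and ("]," in text or text.endswith("!"))
    if has_writeback:
        return ["ls", "alu"]  # memory access + base-register update
    if is_mem:
        return ["ls"]
    return ["alu"]

def count_uops(instrs):
    """Total uops for a list of instructions under the assumed split rule."""
    return sum(len(uops_for(i)) for i in instrs)

block = [
    "ldr x0, [x1], #8",    # post-index load: 2 uops in this model
    "add x2, x2, x0",      # plain ALU op: 1 uop
    "str x2, [x3, #16]!",  # pre-index store: 2 uops in this model
    "ldr x4, [x5]",        # plain load: 1 uop
]
print(len(block), "instructions ->", count_uops(block), "uops")
```

In this model four decoded instructions become six uops, which is why a dispatch width counted in uops can't be read back directly as a decode width counted in instructions.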

DisEnchantment said:
Same mis-predict penalty vs Z4
Same L1 latency vs Z4
It has lower latency for vector (NEON), but that's not an apples-to-apples comparison with x86 vector
ROB is surprisingly conservative for a 2024 core.

Similar number of LS and ALU ports compared to Z5, although Z5/Z4 have two additional FP ports. Store and load queue depths are comparable to Z4.
Not sure how branch addresses are calculated; for instance, Z5 has 4 AGUs for those

It looks OK, nothing in particular stands out. If it is mostly an efficiency play then we will see soon.

I agree with you on all these points. Can't wait to see the reverse engineering of the uarch details by talented hackers! And benchmarks.

PS - A writeback operation in AArch64 is for instance a ldr x0, [x1], #8 which will do the load then add 8 (size of x0) to x1.
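
For anyone who wants the semantics spelled out, here is a minimal Python sketch of what that post-index load does architecturally. The register file and memory are plain dicts, and the helper name `ldr_post_index` is made up for illustration; it ignores alignment, faults and the case where the destination and base registers are the same.

```python
# Minimal sketch of "ldr dst, [base], #imm" (post-index load with writeback).
# Registers and memory are modelled as dicts purely for illustration.

def ldr_post_index(regs, mem, dst, base, imm):
    addr = regs[base]        # address comes from the base register as-is
    regs[dst] = mem[addr]    # 1) the load itself
    regs[base] = addr + imm  # 2) the writeback: base += imm after the access

mem = {0x1000: 11, 0x1008: 22}
regs = {"x0": 0, "x1": 0x1000}

ldr_post_index(regs, mem, "x0", "x1", 8)  # ldr x0, [x1], #8
first = regs["x0"]
ldr_post_index(regs, mem, "x0", "x1", 8)  # same instruction again walks the array
print(first, regs["x0"], hex(regs["x1"]))
```

The two numbered steps are independent register writes, which is what makes it natural for a core to handle them as two uops.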

SarahKerrigan · May 6, 2024

Nothingness said:
That can be quite complex: uops are stored in queues between decoder and dispatch; also, a decoder could emit two uops per cycle. For instance, a mem operation with writeback can be split into two uops, one going into the ALU queue(s) for the writeback of the base register, while the other goes into the load/store queue(s). So I'm afraid at this point nothing can be guessed about decoder width (though I agree it's surely wide, but it's unlikely to be 14-wide).

I've often been wrong, so I won't exclude I'm wrong again 🙂

I agree with you on all these points. Can't wait to see the reverse engineering of the uarch details by talented hackers! And benchmarks.

PS - A writeback operation in AArch64 is for instance a ldr x0, [x1], #8 which will do the load then add 8 (size of x0) to x1.

There's also fusion to consider. As you know, "width" is kind of a fuzzy concept, especially with aggressively OoO machines where the number of uops executing in a given cycle can greatly exceed the machine's sustained whole-pipe width.

Note that Neoverse V2, which is emphatically an 8-wide core, is listed as 16-wide in its LLVM machine model. Vendors tend to do the high-level machine-model variables in their own unique ways, often based on quantitative analysis on codegen rather than on the uarch manual.

soresu · May 7, 2024

Went back to that ARM rumor site and found something odd under Cortex X6:

Implication seems to be a new core IP segment between X and A7xx starting with this 'Alto'.

Not sure if this is just a bad translation or not.

SpudLobby · May 7, 2024

soresu said:
Went back to that ARM rumor site and found something odd under Cortex X6:

View attachment 98541
Implication seems to be a new core IP segment between X and A7xx starting with this 'Alto'.

Not sure if this is just a bad translation or not.

If ELP = the A5x isn’t that in between A5x and A7x? Or is ELP extra large perf not extra low power?

adroc_thurston · May 7, 2024

SpudLobby said:
ELP = the A5x isn’t that in between A5x and A7x?

It is.

SpudLobby · May 7, 2024

adroc_thurston said:
It is.

Ya figured.

soresu · May 7, 2024

SpudLobby said:
If ELP = the A5x isn’t that in between A5x and A7x? Or is ELP extra large perf not extra low power?

ELP is the internal terminology for the Cortex X cores.

The X cores previous to Blackhawk were labeled Makalu-ELP and Hunter-ELP internally.

trivik12 · May 8, 2024

I hope we get independent benchmarks after Asus releases its 1st laptop with X Elite.

SpudLobby · May 8, 2024

soresu said:
ELP is the internal terminology for the Cortex X cores.

The X cores previous to Blackhawk were labeled Makalu-ELP and Hunter-ELP internally.

Oh

SpudLobby · May 8, 2024

trivik12 said:
I hope we get independent benchmarks after Asus releases its 1st laptop with X Elite.

Watch it be barely any different lmao

The battery life is the only real reveal.
Just a guess, it’s gonna be good.

adroc_thurston · May 8, 2024

SpudLobby said:
Just a guess, it’s gonna be good.

It's ok.

SpudLobby · May 8, 2024

adroc_thurston said:
It's ok.

Don’t bother lol. We’ll see.

Nothingness · May 8, 2024

For people with too much money on their hands, there's a board with a Cortex-X3 based Qualcomm SoC:

C8550 Development Kit - Thundercomm

Thundercomm TurboX C8550 Development Kit is a high performance Development Kit which is powered by next Gen Flagship Qualcomm® Snapdragon™ QCS8550 processor. It supports Android, featuring in advanced AI increases, huge camera and video advancements and evolved graphics capability. It is an...

www.thundercomm.com

The TurboX SOM specification: https://thundercomm.s3.ap-northeast...0-en]_TurboX_C8550_SOM_Product_Brief_V1.0.pdf

Qualcomm® QCS8550
Kryo™ CPU
Adreno™ 740 GPU
GPU Spectra™ ISP
12GB

QCS8550: https://docs.qualcomm.com/bundle/pu..._QCS8550_QCM8550_PROCESSORS_PRODUCT_BRIEF.pdf

Qualcomm® Kryo™ CPU; 64-bit architecture
- 1 Prime core, up to 3.36 GHz with Arm® Cortex®-X3 technology
- 4 Performance cores, up to 2.8 GHz
- 3 Efficiency cores, up to 2.0 GHz

$1600 with only 12 GB is way too much for my toying needs 🙂

SarahKerrigan · May 8, 2024

Nothingness said:
For people with too much money on their hands, there's a board with a Cortex-X3 based Qualcomm SoC:

C8550 Development Kit - Thundercomm

Thundercomm TurboX C8550 Development Kit is a high performance Development Kit which is powered by next Gen Flagship Qualcomm® Snapdragon™ QCS8550 processor. It supports Android, featuring in advanced AI increases, huge camera and video advancements and evolved graphics capability. It is an...

www.thundercomm.com

The TurboX SOM specification: https://thundercomm.s3.ap-northeast-1.amazonaws.com/uploads/web/c8550/[tc-P-1110-en]_TurboX_C8550_SOM_Product_Brief_V1.0.pdf

QCS8550: https://docs.qualcomm.com/bundle/pu..._QCS8550_QCM8550_PROCESSORS_PRODUCT_BRIEF.pdf

$1600 with only 12 GB is way too much for my toying needs 🙂

Embeddedified Snapdragon 8g2, looks like. A little steep for me as well, though certainly quick.
