adroc_thurston
Diamond Member
- Jul 2, 2023
Oh he coped alright.
It's funny, but I'm just a messenger anyway.
Chips n Cheese discord. You’ll see him also say adroc is a liar etc.
Damn he went that far? Does @adroc_thurston even post there? Things are getting too hot around here. Folks need to chillax a little bit.
Who is adroc? Is he an insider or a mild type?
He has always been the outspoken kind in forums. It's just sad that this is no longer backed by excellent public analysis like the kind he did in his AT articles.
That’s the one on a different die, i.e. cheaper physically speaking. The others are just Elites binned down. So tbh it’ll be fine and suggests they’re going to take it mainstream if it’s like that and with an 8c max.
Unless I missed something it's the dispatch width which is 14 uops. It's not the same as a 14 wide instruction decoder.
llvm-project/llvm/lib/Target/AArch64/AArch64SchedOryon.td at 8aebe46d7fdd15f02a9716718f53b03056ef0d19 · llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - llvm/llvm-project
github.com
Decoder width : 14-wide
Reorder buffer : 376 entries
Where did you see this originally?
Dispatch width is extremely wide. Since there is no uop cache, those uops should come fully from instruction decode. Compare that with 6 for Z4 or 8 for Z5, though it's not an apples to apples comparison.
Same mis-predict penalty vs Z4
Same L1 latency vs Z4
It has lower latency for vector (NEON), but that's not apples to apples comparable vs x86 vector
ROB is surprisingly conservative for a 2024 core.
Similar amounts of LS and ALU ports compared to Z5, although Z5/Z4 has two additional FP ports. Store and Load queues depths comparable to Z4.
Not sure how branch addresses are calculated; for instance, Z5 has 4 AGUs for those
It looks OK, nothing in particular stands out. If it is mostly an efficiency play then we will see soon.
That can be quite complex: uops are stored in queues between decoder and dispatch; also a decoder could emit two uops per cycle. For instance a mem operation with writeback can be split in two uops, one going into ALU queue(s) for the writeback of the base register, while the other goes into load/store queue(s). So I'm afraid at this point nothing can be guessed about decoder width (though I agree it's surely wide, but it's unlikely to be 14-wide).
I've often been wrong, so I won't rule out that I'm wrong again.
I agree with you on all these points. Can't wait to see the reverse engineering of the uarch details by talented hackers! And benchmarks.
PS - A writeback operation in AArch64 is for instance a ldr x0, [x1], #8 which will do the load then add 8 (size of x0) to x1.
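To make the writeback point concrete, here is a minimal Python sketch of how a decoder might crack a post-indexed load into two uops. Everything in it is an assumption for illustration: the `crack` and `uop_count` helpers, the two queue names, and the rule that every other instruction maps to exactly one uop. None of it is taken from the Oryon scheduling model or from any real core.

```python
import re
from typing import List, Tuple

# Hypothetical uop representation: (target queue, uop text).
Uop = Tuple[str, str]

# Matches post-indexed loads such as "ldr x0, [x1], #8".
POST_INDEX_LDR = re.compile(r"ldr\s+(\w+),\s*\[(\w+)\],\s*#(-?\d+)$")


def crack(instr: str) -> List[Uop]:
    """Split one instruction into uops under the assumed toy model."""
    text = instr.strip()
    m = POST_INDEX_LDR.match(text)
    if m:
        dst, base, imm = m.groups()
        return [
            ("load_store", f"ldr {dst}, [{base}]"),   # the memory access itself
            ("alu", f"add {base}, {base}, #{imm}"),   # writeback of the base register
        ]
    # Assumption: anything else is a single uop, routed by mnemonic.
    queue = "load_store" if text.split()[0] in ("ldr", "str") else "alu"
    return [(queue, text)]


def uop_count(instrs: List[str]) -> int:
    """Total uops produced by a list of instructions."""
    return sum(len(crack(i)) for i in instrs)
```

Under this toy model, seven post-indexed loads already produce 14 uops (`uop_count(["ldr x0, [x1], #8"] * 7)`), which is one way a 14-uop dispatch width can be fed by a decoder narrower than 14 instructions.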
Went back to that ARM rumor site and found something odd under Cortex X6:
View attachment 98541
Implication seems to be a new core IP segment between X and A7xx starting with this 'Alto'.
Not sure if this is just a bad translation or not.
It is.
If ELP = the A5x isn’t that in between A5x and A7x? Or is ELP extra large perf not extra low power?
ELP is the internal terminology for the Cortex X cores.
The X cores previous to Blackhawk were labeled Makalu-ELP and Hunter-ELP internally.
Oh
I hope we get independent benchmarks after Asus releases its 1st laptop with X Elite.
Watch it be barely any different lmao
QCS8550: https://docs.qualcomm.com/bundle/pu..._QCS8550_QCM8550_PROCESSORS_PRODUCT_BRIEF.pdf
Qualcomm® QCS8550
Kryo™ CPU
Adreno™ 740 GPU
GPU Spectra™ ISP
12GB
Qualcomm® Kryo™ CPU; 64-bit architecture
- 1 Prime core, up to 3.36 GHz with Arm® Cortex®-X3 technology
- 4 Performance cores, up to 2.8 GHz
- 3 Efficiency cores, up to 2.0 GHz
For people with too much money on their hands, there's a board with a Cortex-X3 based Qualcomm SoC:
The TurboX SOM specification: https://thundercomm.s3.ap-northeast-1.amazonaws.com/uploads/web/c8550/[tc-P-1110-en]_TurboX_C8550_SOM_Product_Brief_V1.0.pdf
C8550 Development Kit - Thundercomm
Thundercomm TurboX C8550 Development Kit is a high performance Development Kit which is powered by next Gen Flagship Qualcomm® Snapdragon™ QCS8550 processor. It supports Android, featuring in advanced AI increases, huge camera and video advancements and evolved graphics capability. It is an...
www.thundercomm.com
$1600 with only 12 GB is way too much for my toying needs