Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

Shivansps · Feb 4, 2025

Staff Graphics Software Engineer in Raanana at Arm

Learn more about and apply for the Staff Graphics Software Engineer job in Raanana at Arm

careers.arm.com

Looks like ARM is looking ahead to Mali on Windows.

DZero · Feb 4, 2025

Shivansps said:
Staff Graphics Software Engineer in Raanana at Arm

Learn more about and apply for the Staff Graphics Software Engineer job in Raanana at Arm

careers.arm.com

Looks like ARM is looking ahead to Mali on Windows.

About time. Seems that ARM on Windows will deliver the drivers everyone wants (in terms of emulation).

soresu · Feb 5, 2025

Oh @NostaSeronx found another rOoO paper.....

ReOVE: Restricted Out-of-Order Execution for Superscalar Processors with Vector Extension | Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design

dl.acm.org

soresu · Feb 5, 2025

It's RV specific, but I'd be surprised if the principles couldn't apply to ARM too.

soresu · Feb 5, 2025

Oh look and another....

CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows | Ipoom Jeong

In this work, we propose a CASINO core microarchitecture that dynamically and speculatively generates out-of-order instruction issue schedules with complexity close to in-order scheduling by leveraging CAScaded IN-Order scheduling windows.

ipoom-jeong.com

soresu · Feb 5, 2025

Edit: Moved to new µArch thread.

soresu · Feb 5, 2025

Hmmmm, I'd like to dive deeper into this particular rabbit hole, but I don't want to derail this thread further so I'm starting new threads for CPU and GPU µArch research.

naukkis · Feb 5, 2025

soresu said:
It's RV specific, but I'd be surprised if the principles couldn't apply to ARM too.

Sure, after ARM adds a vector ISA to their instruction set. A vector ISA is pretty much designed to run long daisy-chained instruction loops without any need to rearrange execution, so in-order execution of the vector side is a pretty obvious thing to do.

Shivansps · Feb 8, 2025

Cix Technology Group Co., Ltd. CIX Phecda Board - Geekbench

Benchmark results for a Cix Technology Group Co., Ltd. CIX Phecda Board with an ARM ARMv8 processor.

browser.geekbench.com

Another benchmark of the CIX SoC, now at 3 GHz

DZero · Feb 8, 2025

I am now convinced: ARM must follow Huawei's path and deliver a small out of order core.

ARM's A7XX cores are excellent, but using them as small cores wouldn't be a good idea. Even Apple pulled that.
Maybe a nerfed A7XX branded as an A6XX core could be ideal for it.

DrMrLordX · Feb 9, 2025

Shivansps said:
Another benchmark of the CIX SoC, now at 3 GHz

Hmm that's not too bad? At least glancing at the GB6 numbers. Still not M1 territory but let's be honest, who expected otherwise? It's beating the RK3588 pretty convincingly.

Nothingness · Feb 9, 2025

naukkis said:
Sure, after ARM adds a vector ISA to their instruction set. A vector ISA is pretty much designed to run long daisy-chained instruction loops without any need to rearrange execution, so in-order execution of the vector side is a pretty obvious thing to do.

You mean SVE which Arm introduced years ago? And which the article linked to explicitly mentions?

Shivansps · Feb 9, 2025

DrMrLordX said:
Hmm that's not too bad? At least glancing at the GB6 numbers. Still not M1 territory but let's be honest, who expected otherwise? It's beating the RK3588 pretty convincingly.

It's OK, almost Ryzen 7 3700X territory. At just 3 GHz it's really not that bad.

naukkis · Feb 9, 2025

Nothingness said:
You mean SVE which Arm introduced years ago? And which the article linked to explicitly mentions?

SVE ain't a vector ISA but scalable SIMD. It is designed to be OOO-friendly to implement. RV instead has a full vector ISA, for which a hardware OOO implementation was not long ago considered pretty much impossible. Seems that OOO implementations are indeed doable, but executing the vector path OOO is pretty questionable, as the whole design is built to extract enough parallelism from code to make wide in-order cores work efficiently.

DZero · Feb 9, 2025

DrMrLordX said:
Hmm that's not too bad? At least glancing at the GB6 numbers. Still not M1 territory but let's be honest, who expected otherwise? It's beating the RK3588 pretty convincingly.

The RK3588 was asking to be beaten; now it's Raspberry Pi that is in the "can't catch up" situation, since there are more options available, and cheaper ones too.
And that's with Rockchip already having the 3688 in the works.

DrMrLordX · Feb 9, 2025

DZero said:
The RK3588 was asking to be beaten; now it's Raspberry Pi that is in the "can't catch up" situation, since there are more options available, and cheaper ones too.
And that's with Rockchip already having the 3688 in the works.

One hopes availability on the Orion O6 (and RK3688) is better than the RK3588 which was delayed for so long.

soresu · Feb 9, 2025

DrMrLordX said:
One hopes availability on the Orion O6 (and RK3688) is better than the RK3588 which was delayed for so long.

RK3588 was announced long before it actually went to fabs I think.

The specs they originally announced were different to what they later made.

DZero · Feb 9, 2025

soresu said:
RK3588 was announced long before it actually went to fabs I think.

The specs they originally announced were different to what they later made.

That is interesting, what were the OG Specs of the RK 3588?

LightningDust · Feb 9, 2025

naukkis said:
RV instead has a full vector ISA, for which a hardware OOO implementation was not long ago considered pretty much impossible.

Maybe if by "not long ago" you mean the 1980s. NEC has been doing out-of-order on vector computers for a while - since at least SX-9 and I believe SX-8 as well.

naukkis · Feb 9, 2025

LightningDust said:
Maybe if by "not long ago" you mean the 1980s. NEC has been doing out-of-order on vector computers for a while - since at least SX-9 and I believe SX-8 as well.

I mean out-of-order hardware. SX-9 has a vector overtake instruction for vector operations, so the compiler can mark non-dependent instructions to overtake long-latency shuffle instructions. That's still pure in-order vector hardware. The non-vector side of the CPU could of course be OOO, just like the invention under discussion suggests.

OOO hardware is there to execute instructions at the rate memory reads are served. A vector ISA does that at the ISA level by putting data in long daisy-chained vectors that are executed sequentially, so the hardware should be able to fully utilize its available memory bandwidth without OOO.
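
To make the "long vectors executed sequentially" point a bit more concrete, here is a minimal Python sketch of a strip-mined loop, loosely in the style of a `vsetvli`-based RVV loop. Everything named here (`set_vl`, `axpy_strip_mined`, `vlmax`) is made up for illustration, and the `min()` rule is a simplification of what real hardware may grant. It models only the loop structure, not timing, memory bandwidth, or any actual core.

```python
def set_vl(remaining, vlmax):
    """Simplified model of a vsetvl-style request (assumption, not the real spec):
    grant as many elements as are left, capped at the hardware maximum."""
    return min(remaining, vlmax)


def axpy_strip_mined(a, x, y, vlmax):
    """Compute a*x + y in chunks of at most vlmax elements.

    Assumes len(x) == len(y). Returns (result, vector_iterations), where
    vector_iterations counts how many chunk-sized "vector instructions"
    the loop needed.
    """
    n = len(x)
    out = list(y)
    i = 0
    iterations = 0
    while i < n:
        vl = set_vl(n - i, vlmax)
        # One "vector instruction": vl independent element operations,
        # processed in order within the chunk.
        out[i:i + vl] = [a * xe + ye for xe, ye in zip(x[i:i + vl], y[i:i + vl])]
        i += vl
        iterations += 1
    return out, iterations


if __name__ == "__main__":
    x = list(range(10))
    y = [1] * 10
    print(axpy_strip_mined(2, x, y, vlmax=4))    # short vectors: more iterations
    print(axpy_strip_mined(2, x, y, vlmax=256))  # long vectors: one iteration
```

The same code runs unchanged for any `vlmax`; a longer maximum vector length just means fewer trips through the loop, each carrying more independent element work.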

LightningDust · Feb 9, 2025

naukkis said:
I mean out-of-order hardware. SX-9 has a vector overtake instruction for vector operations, so the compiler can mark non-dependent instructions to overtake long-latency shuffle instructions. That's still pure in-order vector hardware. The non-vector side of the CPU could of course be OOO, just like the invention under discussion suggests.

Nope. SX-9 can do straight-up out-of-order issue of vector ops. SX-ACE extends it further.

naukkis · Feb 9, 2025

LightningDust said:
Nope. SX-9 can do straight-up out-of-order issue of vector ops. SX-ACE extends it further.

Might be, I'm not familiar with those designs. Doing a quick search I just found this, which only explains that software-based reordering capability. Even that document mentions that doing hardware OOO for 256-element vectors would be challenging. https://link.springer.com/article/10.1007/s11227-017-1993-y

MS_AT · Feb 9, 2025

Here is a link to the Hot Chips slides for the SX-Aurora Vector Engine Processor: https://old.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf. Anandtech did a live blog on it here: https://www.anandtech.com/show/13259/hot-chips-2018-nec-vector-processor-live-blog. The blog post mentions OoO, while the slides use the term "OoO scheduling". You can also find the SX-ACE slides here: https://old.hotchips.org/wp-content...e-epub/HC26.11.110-SX-ACE-MOMOSE-NEC-v004.pdf

From a high-level point of view they seem similar; the SX-ACE Hot Chips slides don't mention OoO explicitly as far as I can tell. Still, Aurora seems like an evolution of ACE, so they evidently thought adding OoO scheduling was important.

soresu · Feb 9, 2025

DZero said:
That is interesting, what were the OG Specs of the RK 3588?

IIRC the GPU spec changed from one more contemporary with the A76 to the G610.

I might be misremembering things though.

Rockchip has an annoying tendency to be ambiguous about the specs of future SoCs sometimes: the RK3688 mentions a v9.3-A CPU core, but according to the latest rumours the X930 and Ax30 are v9.4-A instead 😒

soresu · Feb 9, 2025

naukkis said:
SVE ain't a vector ISA but scalable SIMD

A bit of reading seems to imply that, by SIMD's basic definition, it is a vector, depending on what you classify as a vector:

Row and column vectors - Wikipedia

en.wikipedia.org
