Question Qualcomm's first Nuvia-based SoC - Hamoa


Doug S

Platinum Member
Feb 8, 2020
2,302
3,605
136
Have you ever heard people say megahertz doesn't matter? This is why. It is possible to build a 1 GHz chip that runs just as fast. AMD and Intel just design their chips around a very specific performance/power/cost/marketing profile.

No, it's not. I don't know what the lowest frequency is at which you could get that level of performance, but it is far higher than 1 GHz. We can't come close to extracting enough parallelism to quadruple IPC over M3, or sextuple it over Intel/AMD, which is what would be required to match that performance level at 1 GHz.
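To put rough numbers on that (a back-of-the-envelope sketch, treating single-thread performance as roughly IPC times clock, and assuming ballpark peak clocks of about 4 GHz for M3 and about 6 GHz for current x86 flagships):

```latex
% perf ~ IPC x f; the ~4 GHz (M3) and ~6 GHz (x86 flagship) peak clocks are ballpark assumptions
\text{perf} \propto \text{IPC} \times f
\;\Rightarrow\;
\text{IPC needed at 1 GHz} \approx \text{IPC}_{\text{M3}} \times \frac{4\,\text{GHz}}{1\,\text{GHz}} \approx 4 \times \text{IPC}_{\text{M3}}
\quad\text{or}\quad
\text{IPC}_{\text{x86}} \times \frac{6\,\text{GHz}}{1\,\text{GHz}} \approx 6 \times \text{IPC}_{\text{x86}}
```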
 

mikegg

Golden Member
Jan 30, 2010
1,785
413
136
I definitely don't think the 4.3 GHz ST will look great. And we know the MT clocks are at best 3.8 GHz, FWIW. It tops out at 80 W platform power there, so in practice for ST I could see the power draw at 9-12 W or something at 3.8 GHz. Which isn't too bad; the perf on GB6 is around 2770 or so at that point? I think AMD still draws more for similar performance.



Here's a review that took power from the wall minus idle (so the display and other static draw are mostly removed). M2 ST is about 8 W.
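In other words, the methodology is roughly the following (the 8 W result is from the review; the 18 W / 10 W split is made up purely to illustrate the subtraction):

```latex
% Wall-minus-idle estimate of load power; example numbers are illustrative only
P_{\text{load}} \approx P_{\text{wall, ST load}} - P_{\text{wall, idle}}
\quad\text{e.g.}\quad 18\,\text{W} - 10\,\text{W} \approx 8\,\text{W for M2 ST}
```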

You'll also notice MT power (and we saw this in Andrei's Mac Mini review too) is a bit higher than Apple's claims, probably because Apple quotes power roughly the way Intel/AMD do: when they say the M2 tops out at 15 W (or the M3 at 17 W), they are only referring to the CPU or SoC.

Also, see that ST power is a bit higher even for the AMD and Intel stuff. I'm not 100% sure what Phoenix looks like, but I'd wager it peaks just as badly as Rembrandt, just with more performance.
They had to use Cinebench R23 for this. Cinebench R23 is hand optimized for AVX and poorly translated to use NEON.

Once again, lazy reviewer basing perf/watt on Cinebench between ARM and x86.

 

SpudLobby

Senior member
May 18, 2022
638
382
96
They had to use Cinebench R23 for this. Cinebench R23 is hand optimized for AVX and poorly translated to use NEON.

Once again, lazy reviewer basing perf/watt on Cinebench between ARM and x86.

I am aware of that, but it's not going to affect the wattage itself substantially, and you see the same totals elsewhere. I am only interested in the wattage here, because it overlaps with ST dynamic power in other workloads where the code is fair, and this is what was available. Even Andrei's M1 Mini test ended up around 7 W of dynamic ST power. The point is, nearly everyone underestimates what their chips are actually consuming, and this is abetted by AMD/Intel, and to a lesser extent Apple, given how they qualify power consumption.
 

Nothingness

Platinum Member
Jul 3, 2013
2,450
777
136
They had to use Cinebench R23 for this. Cinebench R23 is hand optimized for AVX and poorly translated to use NEON.
For what it's worth:

Cinebench R23 ST score: M2 is 28% below the 14900K.
Cinebench R24 ST score: M2 is 14% below the 14900K.

For the M1 it's -35% for R23 and -20% for R24.

It's hard to say how that'd translate to power usage and efficiency, but I guess power would be very similar given that, even though translation is poor, NEON is used in R23.
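As a rough way to quantify what the poor NEON path costs in score (a back-of-the-envelope calculation from the percentages above, nothing more):

```latex
% M2-to-14900K ratio in each Cinebench version, using the deltas quoted above
\frac{\left.\text{M2}/\text{14900K}\right|_{\text{R24}}}{\left.\text{M2}/\text{14900K}\right|_{\text{R23}}}
= \frac{1 - 0.14}{1 - 0.28} = \frac{0.86}{0.72} \approx 1.19
```

So moving from R23 to R24 improves the M2's relative standing by roughly 19%; that quantifies the score penalty, though as said above it probably changes the power draw very little.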

Refs:
 

SpudLobby

Senior member
May 18, 2022
638
382
96
For what it's worth:

Cinebench R23 ST score: M2 is 28% below the 14900K.
Cinebench R24 ST score: M2 is 14% below the 14900K.

For the M1 it's -35% for R23 and -20% for R24.

It's hard to say how that'd translate to power usage and efficiency, but I guess power would be very similar given that, even though translation is poor, NEON is used in R23.

Refs:
Yep. It would hurt energy efficiency by virtue of being less performant, but the *power* figures themselves are unlikely to change much, and I posted them more as proxies for what these chips look like under load in various workloads.

C24 is probably fairer to Arm, I'd bet.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,027
136
No they are not. It’s not on the die and it’s optional. You’re in for a ride.

Buddy, if you think opposing MHz wars helps you, I have awful news for you. We're talking frequencies because we care about performance, and AMD and Intel need higher clocks for the same performance. Asking where the frequency/power points on a core's curve lie is just standard.
Frequencies, power, and IPC are mostly disconnected. You can have a 6 GHz CPU that sips power and a 1 GHz CPU that drinks it, both made on the same process.

I suppose you think Intel is the first company with a 6 GHz CPU? IBM had this game down years ago.
No, it's not. I don't know what the lowest frequency is at which you could get that level of performance, but it is far higher than 1 GHz. We can't come close to extracting enough parallelism to quadruple IPC over M3, or sextuple it over Intel/AMD, which is what would be required to match that performance level at 1 GHz.
NVIDIA and AMD disagree: GPUs are what's powering AI right now, mind you. They are massively parallel by design, low frequency, powerful, efficient for what they do, and Turing complete.
 

Doug S

Platinum Member
Feb 8, 2020
2,302
3,605
136
Frequencies, power, and IPC are mostly disconnected. You can have a 6 GHz CPU that sips power and a 1 GHz CPU that drinks it, both made on the same process.

I suppose you think Intel is the first company with a 6 GHz CPU? IBM had this game down years ago.

NVIDIA and AMD disagree: GPUs are what's powering AI right now, mind you. They are massively parallel by design, low frequency, powerful, efficient for what they do, and Turing complete.


GPUs are only useful for running massively parallel code, which is what both shaders and 'AI' are. They are nothing like CPUs, and comparing them in this way is just plain foolish. Compile normal straight-line code like Geekbench's LLVM test on one and it would absolutely choke, with performance far worse than an Apple or Intel/AMD CPU downclocked to the same frequency, because it isn't designed for straight-line code.
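Toy illustration of the distinction (a hypothetical C++ sketch, not anything from Geekbench): the first loop is embarrassingly parallel and maps naturally onto thousands of GPU lanes, while the second has a loop-carried dependency, so its throughput is set by single-thread latency and clock, which is exactly where a GPU shader core is weakest.

```cpp
#include <cstddef>
#include <vector>

// Embarrassingly parallel: every iteration is independent, so the work can be
// spread across thousands of GPU lanes. Shaders and most "AI" kernels look like this.
void scale(std::vector<float>& v, float k) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= k;
}

// Loop-carried dependency: iteration i needs the result of iteration i - 1.
// Extra threads don't help; throughput is set by single-thread latency and clock,
// which is why "straight-line" code like this runs poorly on a GPU.
float dependent_chain(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
        acc = acc * 0.5f + v[i];   // each step depends on the previous result
    return acc;
}
```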
 

gdansk

Platinum Member
Feb 8, 2011
2,179
2,751
136
They had to use Cinebench R23 for this. Cinebench R23 is hand optimized for AVX and poorly translated to use NEON.

Once again, lazy reviewer basing perf/watt on Cinebench between ARM and x86.

And just like that, it's a reminder of the real world. Plenty of AVX-optimized open source software without NEON ports yet.
And only a few in the opposite direction.
 

ikjadoon

Member
Sep 4, 2006
118
167
126
To be fair, Oryon only has a rough timeline of late Q2 / early Q3 ("mid-year 2024"). Arm v Qualcomm's trial won't start until September 2024, so perhaps just after Qualcomm's intended timeline.

The lawsuit is still very much ongoing; you can follow along here:


Interesting tidbit #1: AMD, Apple, Ampere, MediaTek, TSMC, NVIDIA, Cadence, Google, Synopsys, Intel, etc. are all involved in the trial now. Lots of people are giving depositions / receiving subpoenas to testify in court.

Interesting tidbit #2: So far with discovery & depositions, the judge is mostly siding with Arm and against Qualcomm.
  1. ALAs from Arm: Judge says Qualcomm's motion is partly granted, partly denied. Can't see the details.
  2. Qualcomm tried to get a deposition from Masayoshi Son. Judge rules against Qualcomm here.
  3. Qualcomm wanted discovery of Arm's IPO. Judge rules against Qualcomm here.
  4. Qualcomm wanted docs from Antonio Viana at Arm. Judge rules against Qualcomm here.
  5. Qualcomm wanted Apple's & Ampere's specific ALAs. Judge rules against Qualcomm here.
The last update is October 25, so literally yesterday haha.

Not many updates (as expected after just 3 weeks).

//

Pure speculation on my part: Axios reported recently that "Arm is in advanced talks on a large deal with an existing customer that, if it closes by year-end, would bring Q3 revenue at the high end of its guidance. But Haas [Arm CEO] says it's a 'complex deal' that might bleed into January, particularly given how negotiations can slow around the holidays. If so, Q3 would come in light but the full fiscal year would be okay. He adds that Arm has a very high degree of confidence the new contract will close."

I initially thought it can't be Qualcomm; Hamoa hasn't launched, so Arm's upcoming Q3/Q4 financials won't be significantly affected. But then I wondered: perhaps Arm & Qualcomm are restructuring all of Qualcomm's Arm licenses as part of the NUVIA settlement deal.

Or maybe this is another customer (e.g., NVIDIA or AMD for their alleged consumer desktop CPUs), so it might not matter re: this lawsuit.
 

Nothingness

Platinum Member
Jul 3, 2013
2,450
777
136
Nope. Just uncool stuff like PHP.
That's significant; I think PHP is still used a lot. I wonder what the impact of AVX use is; I couldn't find any benchmark. The JIT supports AArch64, but it doesn't seem to emit NEON instructions.

But that's just one example, and one that doesn't matter for end users; we're not talking servers, are we? I know AArch64 is a bit behind x86, but it's getting better and better support, and I don't think there's that much OSS with AVX support and no NEON.
 

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,276
106
This SoC supports only DirectX12.

So will DX11 and older games work on it?

Edit: Okay, so as per my research- DirectX is reverse compatible with older versions. So in theory games using older DX versions should run.
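For what it's worth, what the runtime will actually hand an application can be probed directly. Below is a minimal, hypothetical C++ sketch (assuming the Windows SDK headers and linking against d3d11.lib; not tested on any Snapdragon device) that asks the D3D11 runtime which feature levels the installed driver will create a hardware device for. Whether a given older game then runs well is a separate question, as the replies below get into.

```cpp
// Minimal probe of which Direct3D feature levels the installed driver exposes.
// Hypothetical sketch: assumes the Windows SDK (d3d11.h) and linking against d3d11.lib.
#include <windows.h>
#include <d3d11.h>
#include <cstdio>

int main() {
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1, D3D_FEATURE_LEVEL_10_0,
        D3D_FEATURE_LEVEL_9_3,  D3D_FEATURE_LEVEL_9_1,
    };
    ID3D11Device* device = nullptr;
    ID3D11DeviceContext* context = nullptr;
    D3D_FEATURE_LEVEL achieved = D3D_FEATURE_LEVEL_9_1;

    // Ask for a hardware device on the default adapter; the runtime reports the
    // highest level from 'requested' that the driver will actually create.
    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                   requested, ARRAYSIZE(requested),
                                   D3D11_SDK_VERSION, &device, &achieved, &context);
    if (SUCCEEDED(hr)) {
        std::printf("Highest D3D feature level exposed: 0x%04x\n",
                    static_cast<unsigned>(achieved));
        context->Release();
        device->Release();
    } else {
        std::printf("Hardware D3D11 device creation failed: 0x%08lx\n",
                    static_cast<unsigned long>(hr));
    }
    return 0;
}
```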
 

soresu

Platinum Member
Dec 19, 2014
2,701
1,904
136
Edit: Okay, so as per my research- DirectX is reverse compatible with older versions. So in theory games using older DX versions should run.
DX12 is a completely different beast from DX10/11 or DX7/8/9.

Otherwise DXVK wouldn't need to write new code to support DX8 given how well it already supports DX10/11.

What you call "reverse compatible" is most likely MS speak for "we force ODMs to write drivers for everything else we did earlier before they get the thumbs up for DX12".

If DX12 is anything like Vulkan in versatility, you could write a translation layer for everything that came before it - but given the sorry state of MS's internal OGL -> DX12 translation layer's performance relative to Zink (OGL -> VK), it doesn't seem to be so hot for that task.
 
Jul 27, 2020
16,646
10,656
106
Edit: Okay, so as per my research- DirectX is reverse compatible with older versions. So in theory games using older DX versions should run.
Nuh uh. If it were that simple, Intel's driver team wouldn't have had to go through so much trouble with Arc, trying to support older DirectX AAA titles. QC isn't targeting gaming, so most likely older DX games will either not run or will suffer from glitches that no one other than the community will bother to fix.
 

uzzi38

Platinum Member
Oct 16, 2019
2,662
6,163
146
This SoC supports only DirectX12.

So will DX11 and older games work on it?

Edit: Okay, so as per my research- DirectX is reverse compatible with older versions. So in theory games using older DX versions should run.
So far QC's D3D11 driver sucks balls.

QC have to improve a LOT if they want the iGPU to perform reasonably across a wide variety of games. Or well... even work at all in some games, going off of prior 8cx devices.
 

soresu

Platinum Member
Dec 19, 2014
2,701
1,904
136
QC isn't targeting gaming so most likely, older DX games will not run or suffer from glitches that no one other than the community will have to fix.
By then QC will likely have a Vulkan driver running on WoA with DXVK providing all the magic, leaving only x86 binary translation as the significant stumbling block.
 

FlameTail

Platinum Member
Dec 15, 2021
2,356
1,276
106
By then QC will likely have a Vulkan driver running on WoA with DXVK providing all the magic, leaving only x86 binary translation as the significant stumbling block.
Wow, DXVK sounds amazing, converting DX9/10/11 to Vulkan.

Do you think Qualcomm will get Zink working on it too? Zink converts OpenGL to Vulkan.

That would centralise a lot of the GPU stack on Vulkan, which is not necessarily a bad thing, since the Adreno GPUs in their smartphone Snapdragons have excellent Vulkan performance.
 

soresu

Platinum Member
Dec 19, 2014
2,701
1,904
136
Do you think Qualcomm will get Zink working on it too? Zink converts OpenGL to Vulkan.
That would centralise a lot of the GPU stack on Vulkan, which is not necessarily a bad thing, since the Adreno GPUs in their smartphone Snapdragons have excellent Vulkan performance.
Not sure - it's possible that they will just leave that to devs, much as some have been doing with DXVK-Native, which links directly into the game code so that it produces Vulkan output without the OS/drivers coming into it.

In the future I could see Zink displacing native OGL driver compilers for any significantly new GPU µarch that would demand a serious compiler rewrite.

OGL is a mountain to implement all the way up to v4.6 of the API, so it seems like the path of least resistance is likely to be taken going forward, even with the translation penalty vs a well optimised native OGL driver.