Question Qualcomm's first Nuvia based SoC - Hamoa

poke01 · Nov 8, 2022

Qualcomm is working on a 2024 PC chip codenamed "Hamoa" with up to 12 cores (8P+4E).

Said to have the same cache layout as M1: large private L1$, per-cluster L2$ (cluster = 4 cores, 12 MB for every cluster) and a lot of LLC.

source:

https://twitter.com/x/status/1589405172979339264

JoeRambo · Oct 30, 2023

Anyone else also wishing that QC "hired" Chips And Cheese sometime in the next year but before release, to do a proper deep dive slash sneak peek? Now that would be awesome, a true arch and perf deep dive at low level and not what marketing parrots allowed.
Some of those results are truly awesome, hopefully they are already hard at work with followup!

FlameTail · Oct 30, 2023

JoeRambo said:
Anyone else also wishing that QC "hired" Chips And Cheese sometime in the next year but before release, to do a proper deep dive slash sneak peek? Now that would be awesome, a true arch and perf deep dive at low level and not what marketing parrots allowed.
Some of those results are truly awesome, hopefully they are already hard at work with followup!

We want Andrei Frumusanu

moinmoin · Oct 30, 2023

FlameTail said:
We want Andrei Frumusanu

No longer a 3rd party.

FlameTail · Oct 30, 2023

So will Strix Point beat the Adreno GPU in X Elite?

Right now Adreno is 50% ahead of the Radeon 780M with 12 CU RDNA3.

Strix Point is 16 CU RDNA4 I hear?

So the raw CU gain (12 -> 16) alone will give 33% performance gain and then IPC/Clock speed gains will determine if it will beat Adreno.
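
The arithmetic in this post can be sketched quickly. This is only a back-of-the-envelope check using the figures quoted above (12 CU, a rumoured 16 CU, Adreno 50% ahead); the assumption that performance scales linearly with CU count is optimistic, and the variable names are made up for illustration.

```python
# Figures taken from the post above; nothing here is measured data.
cu_now = 12          # Radeon 780M compute units
cu_next = 16         # rumoured Strix Point compute units
adreno_lead = 1.50   # Adreno said to be 50% ahead of the 780M

# Assumption: performance scales linearly with CU count (best case).
cu_scaling = cu_next / cu_now

# Extra per-CU uplift (IPC x clock) needed just to draw level with Adreno.
per_cu_needed = adreno_lead / cu_scaling

print(f"CU scaling alone: +{(cu_scaling - 1) * 100:.1f}%")
print(f"Per-CU uplift needed to match Adreno: +{(per_cu_needed - 1) * 100:.1f}%")
```

Under those assumptions the CU increase covers roughly 33% of the 50% gap, which leaves about 12.5% to come from IPC and clocks just to tie.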

FlameTail · Oct 30, 2023

FlameTail said:
We want Andrei Frumusanu

Man was an absolute legend.

I don't know about you guys but I myself occasionally refer to his articles every now and then. They are an absolute gold mine.

FlameTail · Oct 30, 2023

Just checked his one on the M1 chips.

M1 has a 16 MB SLC. M1 Pro has 24 MB and M1 Max has a humongous 48 MB SLC.

Incredible.

That absolutely blows out of the water the tiny 6 MB unit on the Snapdragon X Elite.

FlameTail · Oct 30, 2023

Are we even sure the "42 MB Total Cache" which was quoted for the X Elite includes the SLC?

For one, they said 42 MB specifically during the CPU part of the announcement. By contrast, the SLC is shared across all blocks in the chip.

The other is that we know there are 3 slices of 12 MB L2 cache distributed among the 3 clusters. What then about the L1? If we perhaps assume 512 KB of L1 per core, then 512 KB × 12 = 6 MB.

36 MB L2 + 6 MB L1 = 42 MB 'total cache'
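
For anyone who wants to check the sum, here is a minimal sketch of the same calculation. The 512 KB of L1 per core is the guess made in this post, not a confirmed figure, and the variable names are only illustrative.

```python
KB_PER_MB = 1024

clusters = 3
l2_per_cluster_mb = 12   # per-cluster L2 quoted for the X Elite
cores = 12
l1_per_core_kb = 512     # assumption from the post, not a confirmed spec

l2_total_mb = clusters * l2_per_cluster_mb         # total L2 across clusters
l1_total_mb = cores * l1_per_core_kb / KB_PER_MB   # total L1 across cores
total_mb = l2_total_mb + l1_total_mb

print(f"L2: {l2_total_mb} MB, L1: {l1_total_mb:g} MB, total: {total_mb:g} MB")
```

If the L1 guess is wrong, changing `l1_per_core_kb` shows how far the total drifts from the quoted 42 MB.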

SpudLobby · Oct 30, 2023

FlameTail said:
Also the reference device Qualcomm showed for the X Elite apparently had a whopping 64 GB of RAM, gleaning from the Task Manager Geekerean showed in his video.

Samsung unleashes industry's first LPDDR5X DRAM

Built using the 14 nm technology, Samsung's first 16 Gb LPDDR5X memory chips promise to drive further growth beyond the mobile sector, also targeting 5G and AI applications, as well as the metaverse. The new 16 Gb LPDDR5X comes three years after Samsung introduced the industry's first 8 Gb...

www.notebookcheck.net

So this means the entire 64 GB RAM is one module? That would be impressive.

For reference if you look at the motherboard of an M2 Max macbook, the M2 Max chip has 4 'modules' of RAM around it.

It has 4 modules I believe because the bus width is 512 bit…..

SpudLobby · Oct 30, 2023

FlameTail said:
Are we even sure the "42 MB Total Cache" which was quoted for the X Elite includes the SLC?

For one, they said 42 MB specifically during the CPU part. The SLC is shared across all blocks in the chip.

The other is that we know there are 3 slices of L2 cache distributed among the 3 clusters. What then about the L1? If we perhaps assume 512 KB of L1 per core, then 512 KB × 12 = 6 MB.

36 MB L2 + 6 MB L1 = 42 MB 'total cache'

Yes this is one possibility RE the cache: that it’s 36 MB of L2 + 512 KB × 12 of L1. Apple’s is at 320 KB of L1 so it’s not impossible, and it certainly won’t be the size of the Cortex X4’s.

But I wouldn’t count on it

FlameTail · Oct 30, 2023

SpudLobby said:
It has 4 modules I believe because the bus width is 512 bit…..

The underlying reason is that each memory controller is 128 bits wide.
So there are 4 memory controllers to reach the 512-bit bus width.

4 memory controllers would require 4 modules. I guess that explains it.
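
The reasoning above boils down to one division. A minimal sketch, assuming (as this post does) 128-bit controllers and one memory package per controller; `controllers_needed` is a hypothetical helper, not anything from a vendor tool.

```python
def controllers_needed(bus_width_bits: int, controller_width_bits: int = 128) -> int:
    """Return how many controllers of a given width make up the full bus."""
    count, leftover = divmod(bus_width_bits, controller_width_bits)
    if leftover:
        raise ValueError("bus width is not a whole multiple of the controller width")
    return count

# Assumption from the post: one memory package ("module") per controller.
for bus in (128, 256, 512):
    print(f"{bus}-bit bus -> {controllers_needed(bus)} controller(s)")
```

On those assumptions a 512-bit bus works out to 4 controllers, matching the 4 packages seen around the M2 Max.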

See this die shot:


Apple M1 Pro and M1 Max: Specs, Performance, Everything We Know

Apple takes its fight against Intel to a second round.

www.anandtech.com

FlameTail · Oct 30, 2023

SpudLobby said:
Yes this is one possibility RE the cache: that it’s 36 MB of L2 + 512 KB × 12 of L1. Apple’s is at 320 KB of L1 so it’s not impossible, and it certainly won’t be the size of the Cortex X4’s.

But I wouldn’t count on it

Hey can't we ping and ask the man himself who architected this thing? 😃

SpudLobby · Oct 30, 2023

FlameTail said:
The underlying reason is that each memory controller is 128 bits wide.
So there are 4 memory controllers to reach the 512-bit bus width.

4 memory controllers would require 4 modules. I guess that explains it.

See this die shot:


Apple M1 Pro and M1 Max: Specs, Performance, Everything We Know

Apple takes its fight against Intel to a second round.

www.anandtech.com

Right yeah.

Doug S · Oct 30, 2023

moinmoin said:
If the interests of a company and those of its employees are mismatched, the company still has several ways to go about that. If Apple indeed shut down any efforts to work on server chips even if that's what the employees that left absolutely wanted to work on, then it didn't value those employees highly enough. It's really not a new thing nowadays to allow highly valued employees to work at least part time on stuff that the company proper sees no value in. If that's really how Apple lost most of its previously formidable CPU team, tough luck and zero commiseration (especially considering where they ended up).

Working on something that the company explicitly says it has no interest in ever using, let alone selling in products, would be pointless. If they had said "we want to work on generative AI chips" and Apple said "well we don't know if that's something that fits our future but what the heck", that's one thing, but if Apple shut down their server CPU ambitions it was because Apple was 100% sure they didn't want to have anything to do with servers even internally. I always thought Apple might have a use for servers built using their own chips but Tim Cook may disagree.

I doubt they would have been that interested in working on server CPUs part time that would never even get fabbed. And no, Apple was not going to fab their designs just to keep them happy, not when mask sets cost tens of millions of dollars in leading edge nodes.

SpudLobby · Oct 30, 2023

FlameTail said:
So will Strix Point beat the Adreno GPU in X Elite?

Right now Adreno is 50% ahead of the Radeon 780M with 12 CU RDNA3.

Strix Point is 16 CU RDNA4 I hear?

So the raw CU gain (12 -> 16) alone will give 33% performance gain and then IPC/Clock speed gains will determine if it will beat Adreno.

To be fair that was only one benchmark for mobile. Realistically the best of Strix Point will beat Adreno at peak performance, yes, and will somewhat fix some bandwidth issues I think Phoenix has, but it’s not going to be crazy and I bet Adreno will still do better in low power scenarios.

Doug S · Oct 30, 2023

FlameTail said:
Hey can't we ping and ask the man himself who architected this thing? 😃

I imagine anywhere he feels a desire to chime in and correct our misapprehensions he would, but honestly in his shoes I'd probably get a bigger kick out of laughing at everything we get wrong!

SpudLobby · Oct 30, 2023

FlameTail said:
Just checked his one on the M1 chips.

M1 has a 16 MB SLC. M1 Pro has 24 MB and M1 Max has a humongous 48 MB SLC.

Incredible.

That absolutely blows out of the water the tiny 6 MB unit on the Snapdragon X Elite.

Dude the M1 Pro and M1 Max have gigantic GPUs or gigantic GPU options, and a core point of that added SLC is to improve bandwidth for the GPU, since it’s a system cache not an L3, and also to lower power use.

Adreno with the X Elite is closer to the M2/M1 standard with higher power peaks, or a base level M1/2 Pro GPU.

Just because the thing has 12 cores and aims for CPU MT on par with the Mx Pro/Max doesn’t mean it’s overall a similar chip target. If it were, the GPU would be larger and they’d go with a 256/512-bit bus.

Better to think of this as a competitor to the M2 Pro solely on the CPU front and then really a Phoenix/Strix/MTL/ADL competitor.

It’s not an overall M2 Pro/Max competitor or a Strix Halo part.

SpudLobby · Oct 30, 2023

I can confirm the L1 is larger than the X4’s 128 KB or AMD and Intel’s paltry stuff. It’s humongous, similar to Apple’s, for whatever that’s worth. This checks out based on the designer history, the shared L2 design etc. It basically moves the cache hierarchy down and closer to the core, which is great for power + perf with a massive core.

I dunno if it’ll be similar to Apple’s on the mark or even larger, though.

SpudLobby · Oct 30, 2023

uzzi38 said:
@SpudLobby Slight correction: it's not full platform power, it's the cooling capacity of the chassis. Significant difference between the two tbh.

From his explanation over on Discord, it sounds a lot closer to what STAPM power indicates on Ryzen laptops.

Thank you, my apologies — I take that back for anyone who read that.

I thought it was platform power akin to his measurements for the benchmarks, appears not.

SpudLobby · Oct 30, 2023

FlameTail said:
Do you think Snapdragon on PC will become a loved choice by content creators?

Content creators love the incredible battery life and performance of the Apple Silicon Macbooks. But the thing is Apple Silicon has special accelerators for content creation stuff like video editing - such as the Media Engine. If Qualcomm does not have the likes of the media engine, it may not be as performant as the Apple Silicon Macs.

Yes I think it has that potential if they work with firms, which they seem intent on, as with DaVinci Resolve, which when ported to WoA will be accelerated for Qualcomm. They’ve claimed they’re doing the same with Adobe.

However others are working on accelerators etc and Microsoft had announced their own collaboration with Adobe that simply makes use of an on-device NPU from any OEM via DirectML, so to some extent stuff like this will be a matter of hardware devoted to it and possibly generic.

They have dedicated transcoding and all yes.

Look up the DaVinci Resolve announcement. Very clear they’re not sitting idly by RE: content creators and acceleration; the port will make use of the Hexagon NPU.

roger_k · Oct 30, 2023

moinmoin said:
Thanks for the Cinebench 24 comparison. That's impressive indeed.

I think I'll start calling GB6's "MT" CT instead, as in "Consumer limited Test". GB6 does not contain a classical MT test score, of which people intuitively have the expectation of being able to fairly compare chips with vastly different amounts of cores. So any Cinebench version is better than GB6's "MT" any way any day.

Cinebench is basically a test of floating point SIMD throughput across all cores. The older R23 version tests the case when all data fits in the L2 cache, the new 2024 version uses a larger data set where the data does not fit in the cache (and I am pretty sure the only reason why Apple and Nuvia show good results here is because they have larger caches than x86 CPUs).

You may criticise GB6 MC all you want, but Cinebench is about as useful for inferring general purpose multi-core performance as the 0-100 time is informative about the usability of a car for a family of five in rush hour traffic. It's a niche, domain-specific benchmark that doesn't translate well to anything outside. BTW, for all intents and purposes Cinebench is embedded within GB6 (albeit on a very simple scene) and exhibits perfect MC scaling, as expected.

John Bruno · Oct 30, 2023

FlameTail said:
Man was an absolute legend.

I don't know about you guys but I myself occasionally refer to his articles every now and then. They are an absolute gold mine.

What do you mean *was* a legend? He still is.

John Bruno · Oct 30, 2023

Scary fast:

Qualcomm Snapdragon X Elite Benchmarks: A Potential Game-Changer

Qualcomm's Snapdragon X Elite is poised to shake up the PC landscape, with great performance, wireless connectivity, and next-level efficiency.

hothardware.com

SpudLobby · Oct 30, 2023

Can’t wait to see Oryon in 8 Gen 4, and specifically energy efficient versions tbh

trivik12 · Oct 30, 2023

Based on what we saw from Apple today, X Elite should be better than everything but highest end M3 Max unless Qualcomm has oversold it big time.

lopri · Oct 31, 2023

Performance seems similar to Zen 4 and M2? All three of them on similar (same) nodes..

Kudos to Qualcomm for this milestone. I do not know if this core will make it into smartphones, but I do hope revisions of it will. Maybe I will stay with Android instead of jumping ship to iOS.
