Question Qualcomm's first Nuvia based SoC - Hamoa


JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Anyone else also wishing that QC "hired" Chips And Cheese sometime in the next year but before release, to do a proper deep dive slash sneak peek? Now that would be awesome, a true arch and perf deep dive at low level and not what marketing parrots allowed.
Some of those results are truly awesome, hopefully they are already hard at work with followup!
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,761
106
Anyone else also wishing that QC "hired" Chips And Cheese sometime in the next year but before release, to do a proper deep dive slash sneak peek? Now that would be awesome, a true arch and perf deep dive at low level and not what marketing parrots allowed.
Some of those results are truly awesome, hopefully they are already hard at work with followup!
We want Andrei Frumusanu
 

FlameTail

So will Strix Point beat the Adreno GPU in X Elite?

Right now Adreno is 50% ahead of the Radeon 780M with 12 CU RDNA3.

Strix Point is 16 CU RDNA4 I hear?

So the raw CU gain (12 -> 16) alone will give 33% performance gain and then IPC/Clock speed gains will determine if it will beat Adreno.
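To put rough numbers on that, here is a minimal sketch of the arithmetic. It takes the figures in this post at face value (Adreno 50% ahead of the 12 CU 780M, a 16 CU successor) and assumes performance scales perfectly linearly with CU count, which real GPUs rarely manage. The function names are made up for illustration.

```python
from fractions import Fraction


def cu_scaling_gain(old_cus: int, new_cus: int) -> Fraction:
    """Speedup from CU count alone, ASSUMING perfectly linear scaling."""
    return Fraction(new_cus, old_cus)


def per_cu_gain_needed(rival_lead: Fraction, cu_gain: Fraction) -> Fraction:
    """Extra per-CU uplift (IPC x clock) needed to draw level with the rival."""
    return rival_lead / cu_gain


cu_gain = cu_scaling_gain(12, 16)        # 16/12 = 4/3
adreno_lead = Fraction(3, 2)             # "50% ahead", as claimed in the post
needed = per_cu_gain_needed(adreno_lead, cu_gain)  # (3/2) / (4/3) = 9/8

print(f"Gain from CU count alone: +{float(cu_gain - 1):.0%}")
print(f"Per-CU uplift needed to match Adreno: +{float(needed - 1):.1%}")
```

Under those assumptions, the extra CUs close most of the gap, and roughly another 12.5% from IPC and clocks combined would be needed to draw level.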
 

FlameTail

Just checked his one on the M1 chips.

M1 has a 16 MB SLC. M1 Pro has 24 MB and M1 Max has a humongous 48 MB SLC.

Incredible.

That absolutely blows the tiny 6 MB unit on the Snapdragon X Elite out of the water.
 

FlameTail

Are we even sure the "42 MB Total Cache" which was quoted for the X Elite includes the SLC?

For one, they said 42 MB specifically during the CPU part of the announcement. By contrast, the SLC is shared across all blocks in the chip.

The other is that we know there are 3 slices of 12 MB L2 cache distributed among the 3 clusters. What then about the L1? If we perhaps assume 512 KB of L1 per core, then 512 KB × 12 = 6 MB.

36 MB L2 + 6 MB L1 = 42 MB 'total cache'
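A quick sketch of that sum, with the guess spelled out. The 512 KB of L1 per core is purely this post's assumption, not a confirmed spec, and the variable names are illustrative.

```python
KB_PER_MB = 1024

l2_slices = 3                    # one L2 slice per cluster
l2_per_slice_mb = 12
cores = 12
assumed_l1_per_core_kb = 512     # ASSUMPTION from the post, not a known figure

l2_total_mb = l2_slices * l2_per_slice_mb                  # 3 * 12
l1_total_mb = cores * assumed_l1_per_core_kb / KB_PER_MB   # 12 * 512 / 1024
total_mb = l2_total_mb + l1_total_mb

print(f"L2: {l2_total_mb} MB, L1 (assumed): {l1_total_mb:g} MB, total: {total_mb:g} MB")
```

Note that the arithmetic alone cannot separate this reading from 36 MB of L2 plus the 6 MB SLC, since both come to 42 MB.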
 

SpudLobby

Golden Member
May 18, 2022
1,041
702
106
Also the reference device Qualcomm showed for the X Elite apparently had a whopping 64 GB of RAM, judging from the Task Manager Geekerwan showed in his video.



So this means the entire 64 GB RAM is one module? That would be impressive.

For reference, if you look at the motherboard of an M2 Max MacBook, the M2 Max chip has 4 'modules' of RAM around it.
It has 4 modules I believe because the bus width is 512 bit…..
 

SpudLobby

Are we even sure the "42 MB Total Cache" which was quoted for the X Elite includes the SLC?

For one, they said 42 MB specifically during the CPU part. The SLC is shared across all blocks in the chip.

The other is that we know there are 3 slices of L2 cache distributed among the 3 clusters. What then about the L1? If we perhaps assume 512 KB of L1 per core, then 512 KB × 12 = 6 MB.

36 MB L2 + 6 MB L1 = 42 MB 'total cache'
Yes this is one possibility RE the cache: that it’s 36 MB of L2 + 512 KB × 12 of L1. Apple’s is at 320 KB of L1 so it’s not impossible, and it certainly won’t be the size of the Cortex X4’s.

But I wouldn’t count on it
 

FlameTail

It has 4 modules I believe because the bus width is 512 bit…..
The underlying reason is that each memory controller is 128 bits wide.
So there are 4 memory controllers to get to the 512-bit width.

4 memory controllers would require 4 modules. I guess that explains it.
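As a tiny sketch of that relationship: the controller count is just the bus width divided by the controller width. The one-module-per-controller mapping is this post's guess, and `controllers_needed` is a made-up helper name.

```python
def controllers_needed(bus_width_bits: int, controller_width_bits: int) -> int:
    """Number of equal-width memory controllers that make up a given bus width."""
    if bus_width_bits % controller_width_bits != 0:
        raise ValueError("bus width must be a whole multiple of the controller width")
    return bus_width_bits // controller_width_bits


# M2 Max figures as stated in this thread: 512-bit bus, 128-bit controllers.
m2_max_controllers = controllers_needed(512, 128)
print(f"Controllers (and, by the guess above, modules): {m2_max_controllers}")
```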

See this die shot:


 

FlameTail

Yes this is one possibility RE the cache: that it’s 36 MB of L2 + 512 KB × 12 of L1. Apple’s is at 320 KB of L1 so it’s not impossible, and it certainly won’t be the size of the Cortex X4’s.

But I wouldn’t count on it
Hey can't we ping and ask the man himself who architected this thing? 😃
 

Doug S

Diamond Member
Feb 8, 2020
3,351
5,870
136
If the interests of a company and those of its employees are mismatched, the company still has several ways to go about that. If Apple indeed shut down any efforts to work on server chips even if that's what the employees that left absolutely wanted to work on, then it didn't value those employees highly enough. It's really not a new thing nowadays to allow highly valued employees to work at least part time on stuff that the company proper sees no value in. If that's really how Apple lost most of its previously formidable CPU team, tough luck and zero commiseration (especially considering where they ended up).

Working on something that the company explicitly says it has no interest in ever using, let alone selling in products, would be pointless. If they had said "we want to work on generative AI chips" and Apple said "well we don't know if that's something that fits our future but what the heck", that's one thing, but if Apple shut down their server CPU ambitions it was because Apple was 100% sure they didn't want to have anything to do with servers even internally. I always thought Apple might have a use for servers built using their own chips but Tim Cook may disagree.

I doubt they would have been that interested in working on server CPUs part time that would never even get fabbed. And no, Apple was not going to fab their designs just to keep them happy, not when mask sets cost tens of millions of dollars in leading edge nodes.
 

SpudLobby

So will Strix Point beat the Adreno GPU in X Elite?

Right now Adreno is 50% ahead of the Radeon 780M with 12 CU RDNA3.

Strix Point is 16 CU RDNA4 I hear?

So the raw CU gain (12 -> 16) alone will give 33% performance gain and then IPC/Clock speed gains will determine if it will beat Adreno.
To be fair that was only one benchmark for mobile. Realistically, the best of Strix Point will beat Adreno at peak performance, yes, and will somewhat fix some bandwidth issues I think Phoenix has, but it’s not going to be crazy and I bet Adreno will still do better in low power scenarios.
 

SpudLobby

Just checked his one on the M1 chips.

M1 has a 16 MB SLC. M1 Pro has 24 MB and M1 Max has a humongous 48 MB SLC.

Incredible.

That absolutely blows the tiny 6 MB unit on the Snapdragon X Elite out of the water.
Dude, the M1 Pro and M1 Max have gigantic GPUs or gigantic GPU options, and a core point of that added SLC is to improve bandwidth for the GPU, since it’s a system cache not an L3, and also to lower power use.

Adreno with the X Elite is closer to the M2/M1 standard with higher power peaks, or a base level M1/2 Pro GPU.

Just because the thing has 12 cores and aims for CPU MT on par with the Mx Pro/Max doesn’t mean it’s overall a similar chip target. If it were, the GPU would be larger and they’d go with a 256/512-bit bus.

Better to think of this as a competitor to the M2 Pro solely on the CPU front and then really a Phoenix/Strix/MTL/ADL competitor.

It’s not an overall M2 Pro/Max competitor or a Strix Halo part.
 

SpudLobby

I can confirm the L1 is larger than the X4’s 128 KB or AMD and Intel’s paltry stuff; it’s humongous, similar to Apple’s, for whatever that’s worth. This checks out based on the designer history, the shared L2 design etc. Moves the cache hierarchy down and closer to the core basically, which is great for power + perf with a massive core.

I dunno if it’ll be similar to Apple’s on the mark or even larger, though.
 

SpudLobby

@SpudLobby Slight correction: it's not full platform power, it's the cooling capacity of the chassis. Significant difference between the two tbh.

From his explanation over on Discord, it sounds a lot closer to what STAPM power indicates on Ryzen laptops.
Thank you, my apologies — I take that back for anyone who read that.

I thought it was platform power akin to his measurements for the benchmarks, appears not.
 

SpudLobby

Do you think Snapdragon on PC will become a loved choice by content creators?

Content creators love the incredible battery life and performance of the Apple Silicon MacBooks. But the thing is Apple Silicon has special accelerators for content creation stuff like video editing - such as the Media Engine. If Qualcomm does not have the likes of the Media Engine, it may not be as performant as the Apple Silicon Macs.
Yes I think it has that potential if they work with firms, which they seem intent on, like with DaVinci Resolve, which when ported to WoA will be accelerated for Qualcomm. They’ve claimed they’re doing the same with Adobe.

However others are working on accelerators etc and Microsoft had announced their own collaboration with Adobe that simply makes use of an on-device NPU from any OEM via DirectML, so to some extent stuff like this will be a matter of hardware devoted to it and possibly generic.


They have dedicated transcoding and all yes.

Look up the DaVinci Resolve announcement. Very clear they’re not sitting idly by RE: content creators and acceleration; the port will make use of the Hexagon NPU.
 

roger_k

Member
Sep 23, 2021
102
219
86
Thanks for the Cinebench 24 comparison. That's impressive indeed.

I think I'll start calling GB6's "MT" CT instead, as in "Consumer limited Test". GB6 does not contain a classical MT test score, of which people intuitively have the expectation of being able to fairly compare chips with vastly different amounts of cores. So any Cinebench version is better than GB6's "MT" any way any day.

Cinebench is basically a test of floating point SIMD throughput across all cores. The older R23 version tests the case when all data fits in the L2 cache, the new 2024 version uses a larger data set where the data does not fit in the cache (and I am pretty sure the only reason why Apple and Nuvia show good results here is because they have larger caches than x86 CPUs).

You may criticise GB6 MC all you want, but Cinebench is about as useful for inferring general purpose multi-core performance as the 0-100 time is informative about the usability of a car for a family of five in rush hour traffic. It's a niche, domain-specific benchmark that doesn't translate well to anything outside. BTW, for all intents and purposes Cinebench is embedded within GB6 (albeit on a very simple scene) and exhibits perfect MC scaling, as expected.
 

trivik12

Senior member
Jan 26, 2006
348
318
136
Based on what we saw from Apple today, X Elite should be better than everything but highest end M3 Max unless Qualcomm has oversold it big time.
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Performance seems similar to Zen 4 and M2? All three of them on similar (same) nodes..

Kudos to Qualcomm for this milestone. I do not know if this core will make it into smartphones, but I do hope later revisions will. Maybe I will stay with Android instead of jumping ship to iOS.