Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

SpudLobby · Jan 13, 2024

Doug S said:
I wasn't suggesting ACTUALLY doing that, since as you say getting an A76 and an X4 able to use the same DRAM is probably not possible. Just that this effect should be taken into account as well. When Apple went from LPDDR4 to LPDDR4X to LPDDR5, some of their gains would have been from those transitions, and they will get a bump from LPDDR5X when they make that transition.

How much that accounts for is unknown (but in the low single digits unless you choose specific tests) but the more your benchmark components are sensitive to memory bandwidth (memory latency is unlikely to matter much in the smallish subtests GB6 is doing, especially at today's L3/SLC sizes) the more of an effect it will have.

Yep.

SarahKerrigan · Jan 13, 2024

soresu said:
I wouldn't say flop, there are likely SVE2 instructions in Android code or anything else major, but right now anything that doesn't have money going into it regularly doesn't yet feel the need to adopt SVE2 over polishing their NEON code paths.

The situation isn't a million miles from early AVX512 adoption with Apple's continued v8-A use acting as a sort of analogue to the fragmentation issues that had (and still has 😂) plus the lack of AMD support.

High-visibility vendors of SoC's with SVE2-capable cores shipping with SVE2 disabled, and nobody particularly caring, is an interesting definition of "not a flop."

Edited for clarity: Maybe "flop" is too harsh, but it sure ain't great. That's a lot more serious than "welllllll apps haven't really needed to implement SVE2 fastpaths yet, so some of them haven't."

DrMrLordX · Jan 13, 2024

SVE2 makes less sense for phone SoCs anyway. It makes a lot of sense for ARM server SoCs and possible future desktop SoCs.

soresu · Jan 13, 2024

SarahKerrigan said:
High-visibility vendors of SoC's with SVE2-capable cores shipping with SVE2 disabled, and nobody particularly caring, is an interesting definition of "not a flop."

Oh damn I forgot about that.

But yeah as @DrMrLordX points out it doesn't make as much sense in mobile where greater than 128 bit processing would have to be flawlessly low power in order to not burn your junk off when its in your front pocket half of the time 😂

It doesn't make much sense to leave it out of Apple Mx though - especially for the higher end Max and Pro SKUs, so I do wonder what is keeping them here.

Perhaps their next significant IPC jump is a µArch overhaul that also switches to v9-A.

naukkis · Jan 14, 2024

soresu said:
Oh damn I forgot about that.

But yeah as @DrMrLordX points out it doesn't make as much sense in mobile where greater than 128 bit processing would have to be flawlessly low power in order to not burn your junk off when its in your front pocket half of the time 😂

It doesn't make much sense to leave it out of Apple Mx though - especially for the higher end Max and Pro SKUs, so I do wonder what is keeping them here.

Perhaps their next significant IPC jump is a µArch overhaul that also switches to v9-A.

Every Apple chip has powerful GPU for highly paraller workloads - offering few tens of times more flops than CPU part. Complicating and making cpu part more inefficient for few tens of percent more flops for workloads that are better handled with GPU won't make any sense. Wide SIMD in x86 cpu's doesn't either for consumer class workloads.

FlameTail · Jan 14, 2024

naukkis said:
Every Apple chip has powerful GPU for highly paraller workloads - offering few tens of times more flops than CPU part. Complicating and making cpu part more inefficient for few tens of percent more flops for workloads that are better handled with GPU won't make any sense. Wide SIMD in x86 cpu's doesn't either for consumer class workloads.

I don't buy that argument.

soresu · Jan 14, 2024

naukkis said:
Every Apple chip has powerful GPU for highly paraller workloads - offering few tens of times more flops than CPU part. Complicating and making cpu part more inefficient for few tens of percent more flops for workloads that are better handled with GPU won't make any sense. Wide SIMD in x86 cpu's doesn't either for consumer class workloads.

Eh?

I'm sure anyone aspiring to run software video encoders on Mac Studio might disagree.

Surely there are other such branch heavy code use cases that do not favor GPU compute?

naukkis · Jan 14, 2024

soresu said:
Eh?

I'm sure anyone aspiring to run software video encoders on Mac Studio might disagree.

Surely there are other such branch heavy code use cases that do not favor GPU compute?

Video encoding is just example of programs that will not benefit much from larger than 128-bit vector lengths. AVX2 does give x86 some 10% speedup and AVX512 about same but that comes from combination of additional useful instructions and wider registers. Apple cpu does better with it quad 128-bit NEON pipelines, more SIMD execution units far outweight benefits from going wider registers. Apple's designs may lack cpu core counts but SIMD registers aren't problem at all.

Doug S · Jan 14, 2024

naukkis said:
Every Apple chip has powerful GPU for highly paraller workloads - offering few tens of times more flops than CPU part. Complicating and making cpu part more inefficient for few tens of percent more flops for workloads that are better handled with GPU won't make any sense. Wide SIMD in x86 cpu's doesn't either for consumer class workloads.

Between the AMX unit for more complex numerical work like matrixes and the GPU for standard muladd type stuff they have things pretty well covered. For specialty stuff like counting/shifting bits, divide, trig, etc. they have the four NEON units plus those are good for "standard" code where you want instructions output by a compiler rather than using a library.

What makes Apple's GPU a better option when compared to PCs or Android is that EVERY iPhone and EVERY Apple Silicon Mac has the same GPU. On a PC you might have an Intel GPU, might have an AMD GPU, might have an Nvidia GPU, or might (on a server) have no GPU. On Android you have the same problems, though at least you know for sure you will have a GPU.

Better yet, Apple's GPUs share cache coherent high bandwidth memory with the CPU, so throwing something off to the GPU to calculate is much easier/quicker than it is on systems where the GPU has its own RAM.

All that really boxes in the advantages for SVE2. Not saying there is NOTHING it wouldn't do better, but if there was something it could do better that Apple cared about they could very likely address that use case in future implementations of AMX or their GPU.

moinmoin · Jan 14, 2024

SpudLobby said:
Reading into it too much. No company likes competition but it’s a boon for us. Of course a great X5 on par with Apple/QC on perf & in the same league on perf/W is a PITA — frustrating, annoying, even, for everyone but MediaTek, Samsung and Nvidia.

In a business that has lead times of usually 3 years and more from inception to a finished product on the market it's foolish to see the competition in terms of "like", "PITA", "frustration" and "annoying". To survive in such an environment every company that wants to compete is required to make own projections of where the competition will be in 3+ years to make any sensible business decisions about its own R&D and products in that future. Also decisions like whether to buy startups like Nuvia only can be done sensibly when such projections exist and buying such companies gives a significant advantage for the financial transaction to make sense.

In this business what we see playing out in front of us in the market are the cumulation of decisions made at different independent companies multiple years up to a decade in the past. If a company fails to compete at any given point most likely its initial projection failed thus having led to bad decision making.

SpudLobby · Jan 14, 2024

moinmoin said:
In a business that has lead times of usually 3 years and more from inception to a finished product on the market it's foolish to see the competition in terms of "like", "PITA", "frustration" and "annoying". To survive in such an environment every company that wants to compete is required to make own projections of where the competition will be in 3+ years to make any sensible business decisions about its own R&D and products in that future.

Sure they are well aware of what the X5 should look like. Ultimately I think this is just borderline neurodivergent to read into for cheap points — I’m not literally under the impression these multinational F500 firms are personified cartoon characters yet it’s not uncommon to see analogies to that effect for clarity or wit.

soresu · Jan 15, 2024

SpudLobby said:
Of course a great X5 on par with Apple/QC on perf & in the same league on perf/W is a PITA — frustrating, annoying, even, for everyone but MediaTek, Samsung and Nvidia.

Remember that X5 is also likely basis for Neo V3 too, so that's a whole extra slew of licensees very happy about it.

soresu · Jan 15, 2024

SpudLobby said:
I mean Qualcomm’s 8 Gen 2 don’t have SVE2 even included/enabled fwiw, and the X Elite is Arm V8. So it’s pretty dire lol

My main point there was that X1 got a serious boost to NEON performance from a NEON unit doubling over A77 (2 -> 4), so with or without SVE2 the current SIMD perf is no slouch even if individual code issue is limited to 128 bit per unit.

Perhaps X5 might bring another boost there too.

Not sure if X2 -> X4 changed much on the SIMD side of things, especially given current Snapdragon SoCs apparently don't even have SVE2.

ikjadoon · Jan 17, 2024

Notebookcheck's Xiaomi 14 Pro review a few weeks ago shed a bit more light on the Snapdragon 8 Gen 3's power.

SD 8G2: X3 + 2x A715 + 2x A710 + 3x A510 (8C = 1 big, 4 mid, 3 small)
SD 8G3: X4 + 5x A720 + 2x A520 (8C = 1 big, 5 mid, 2 small)

They have the power of a Geekbench 5.5 full run (1T + nT) with the screen on, and then idle numbers. Ideally, we could have the raw data to tease out 1T vs nT, but without that, here are the idle-normalized averages:

SD8G2: 4.54W - 0.902W idle = 3.638W
SD8G3: 5.2W - 0.732W idle = 4.468W (+22.81% more power)

Notably, the 8G3 has one more mid core and one less small core; the X4 is clocked at 3.3 GHz vs 3.2 GHz in the X3; and none of the cores are the same; so it's hard to make a core-to-core comparison (unless we had the raw data for 1T).

I tried to match peaks, but it's too messy to give a number, especially when even 0.5W is 6% of 8W (the highest I can see). Would love the raw data to get some joules out of these.

ikjadoon · Jan 21, 2024

Much higher power consumption on 8G3 vs 8G2. Can't tell if 17.37W is the GPU or CPU or both, as I'm not familiar with this benchmark.

On an nT workload (Burnout Benchmark) at XDA:

Absolutely obscene +49% increase in 1T GB6. I need to go back and re-read this review and see what's up.

soresu · Jan 21, 2024

ikjadoon said:
Much higher power consumption on 8G3 vs 8G2. Can't tell if 17.37W is the GPU or CPU or both, as I'm not familiar with this benchmark.

On an nT workload (Burnout Benchmark) at XDA:

View attachment 92291

Absolutely obscene +49% increase in 1T GB6. I need to go back and re-read this review and see what's up.

GB6 MT increase is pretty close to that figure too at just under +45%.

soresu · Jan 21, 2024

The actual text says "computational workload", so it may just be a subset of GB6 scores?

soresu · Jan 22, 2024

ikjadoon said:
On an nT workload (Burnout Benchmark) at XDA:

When you look at the proportional increase in CPU perf vs power it is still a win for perf/watt though?

That being said, 17.37W is getting mighty damn toasty for something many people put so close to their junk.

I'm hoping that is an absolute worst case peak value otherwise children will no longer be on the cards 😂

Nothingness · Jan 22, 2024

soresu said:
When you look at the proportional increase in CPU perf vs power it is still a win for perf/watt though?

Not necessarily, you'd need the average power consumption to conclude that, not peak power.

BTW the 13.67W figure for Gen 2 comes from their previous review of Gen 2 vs Gen 1 where the curve shows 16W peek rather than 13.7W (the 11.5W quoted for Gen 1 matches the curve). That's strange.

soresu · Jan 22, 2024

Indeed that does seem like a strange screw up to make:

snapdragon-8-gen-2-vs-snapdragon-8-plus-gen-1-wattage.jpg

In that case it would be the Gen 1 -> Gen 2 that was the big peak increase, while Gen 2 -> Gen 3 is a relatively minor 9.4%.

Tup3x · Jan 22, 2024

ikjadoon said:
Much higher power consumption on 8G3 vs 8G2. Can't tell if 17.37W is the GPU or CPU or both, as I'm not familiar with this benchmark.

On an nT workload (Burnout Benchmark) at XDA:

View attachment 92291

Absolutely obscene +49% increase in 1T GB6. I need to go back and re-read this review and see what's up.

Those Gen 2 benches make no sense. My S23 does ~2000 single and ~5400 multi.

Nothingness · Jan 22, 2024

Tup3x said:
Those Gen 2 benches make no sense. My S23 does ~2000 single and ~5400 multi.

Are you sure?

My S23 Ultra gets 1585/4482 on 6.0.0 and 1615/4664 on 6.2.2.

EDIT: looking at the GB DB, it looks like 2000/5400 is the norm. Now the question is: why do I get as bad results as XDA?

EDIT 2: mystery solved, I was in power saving mode. I now get 2001/5206.
So I wonder if XDA ran Gen 2 in power saving mode, did they also run Gen 3 in power saving mode? The >40% ST increase in performance looks dubious so I guess Gen 3 was not in power saving mode.

This raises concerns questions about XDA results

DrMrLordX · Jan 22, 2024

Nothingness said:
This raises concerns questions about XDA results

Makes me wish they could run the chips @isopower.

FlameTail · Jan 22, 2024

soresu said:
When you look at the proportional increase in CPU perf vs power it is still a win for perf/watt though?

That being said, 17.37W is getting mighty damn toasty for something many people put so close to their junk.

I'm hoping that is an absolute worst case peak value otherwise children will no longer be on the cards 😂

17W?

They should put this thing in laptops.

Tup3x · Jan 22, 2024

Nothingness said:
Are you sure?

My S23 Ultra gets 1585/4482 on 6.0.0 and 1615/4664 on 6.2.2.

EDIT: looking at the GB DB, it looks like 2000/5400 is the norm. Now the question is: why do I get as bad results as XDA?

EDIT 2: mystery solved, I was in power saving mode. I now get 2001/5206.
So I wonder if XDA ran Gen 2 in power saving mode, did they also run Gen 3 in power saving mode? The >40% ST increase in performance looks dubious so I guess Gen 3 was not in power saving mode.

This raises concerns questions about XDA results

Gen 3 had all power savings turned off and it ran full throttle. They used ASUS gaming phone so obviously the results are going to look like this. By default only something like OnePlus devices seem to give these really low numbers.

Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

Golden Member

Senior member

Lifer

Diamond Member

Golden Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Senior member

Senior member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Lifer

Diamond Member

Golden Member