Question: x86 and ARM architectures comparison thread


OneEng2

Senior member
Sep 19, 2022
736
983
106
Nope, we still compared Zen 4 to Raptor Lake. AMD was ahead on node and result.
Of course we compared them!

We didn't conclude that Raptor Lake was an inferior architecture, though, only that Zen 4 had slightly better performance. These days, it is always noted which processor is benefiting from a newer or more expensive process node; it has become hyper-critical. "The best" architecture is the one that makes the most money. This involves being able to do more with less.

I am an engineer. I can tell you that most any idiot can "do more with more". Doing more with less requires talent, though.
 

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
And I never stated within a power envelope. Zen 5 can pound M4 in MT by a large margin .... even in laptops .... but ESPECIALLY everywhere else, where Zen 5's infrastructure is simply light years ahead of M4.
No, it can't. It only does so by pounding it with 2x the power.
 

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
By all means. Compare away! I agree... BUT if we are evaluating an ARCHITECTURE, it seems like a very THIN argument to say .... "M4 is good in laptops, therefore it is fundamentally better everywhere" .... and that is even more true since my argument is that M4 isn't even fundamentally better than Zen 5 in laptops! It is better at some things, but not most things. Not sure how that makes it "better". Furthermore, it is more expensive AND has a process advantage.
How is the Apple M4 uArch not better when Apple is only losing by 64% while AMD has 2x the P cores and 2x the threads?

The NODE advantage does NOT give Apple this big of a lead.
 

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
We didn't conclude that Raptor Lake was an inferior architecture, though, only that Zen 4 had slightly better performance. These days, it is always noted which processor is benefiting from a newer or more expensive process node; it has become hyper-critical. "The best" architecture is the one that makes the most money. This involves being able to do more with less.
It was, though; Zen 4 is better than Raptor Lake. Raptor Lake only seemed like it was winning benchmarks because it consumed a LOT more power.
 

Doug S

Diamond Member
Feb 8, 2020
3,369
5,917
136
Apple does amazing things with their ARM processors, but to dream that this somehow makes ARM fundamentally superior to x86 is just silly IMO.

I can't speak for others, but I've never claimed ARM is fundamentally superior to x86. I've even argued that x86's more complex instruction decoding doesn't make any real difference power-wise in the era of billion-transistor chips, because the additional transistors it requires are such a tiny portion of a modern core.

But a lot of people here seem to be trying to argue that ARM cores in general, or Apple's in particular, are somehow unsuitable for DC. That's ridiculous on its face. No one can point to a benchmark that shows Apple cores as not being appropriate for DC tasks. Test x number of Apple P cores against the same number of x86 P cores, and unless you're talking about tasks that "just happen" to be all about AVX512 (or alternatively all about SVE2), you can't find any big difference in either direction. The only thing people can point to is "well, x86 scales up to 192 cores and Apple doesn't", trying to imply that this is proof that Apple can't. That's just ignorant reasoning. It is the exact same reasoning people once used to claim Apple's cores weren't appropriate for PCs, because Apple was only using them in phones.

Assuming Qualcomm's next-gen cores are competitive with Apple's (likely better, I think, since it appears Qualcomm will be binning on frequency) and they start selling server chips as rumored, then we'll have a good comparison: the same ARM cores being used in both phones and servers. People will have to stop lying that ARM and/or Apple's cores are somehow unsuitable for DC loads. Well, maybe they'll still try to claim that about Apple's cores despite the evidence from Qualcomm's, because they just can't help themselves, but they'll be hanging on by the thinnest of threads.
 

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
1754181969604.png

We have the full-blown M4 Pro here, with 10P+4E/14 threads, against Strix Halo. AMD has 60% more P cores and 32 threads. Yet the difference in scene completion is 5%.

How is Zen 5 on N3E going to make up for that?


Let me put it this way: if Intel had a uArch that is 5% slower and used much less power, but with 10P+4E and 14 threads compared to AMD's 16C/32T, the industry would go nuts.
 
  • Like
Reactions: Mopetar

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
I do not know why we are picking random benchmarks but let's have at it.
Oh boy, you should not have used that at ALL, because I know which benchmark that is, and it's run using an x86_64 kernel, so Rosetta 2 is being used.
IMG_2345.png


If the kernel had been native, i.e. arm64, it would say so, like below.
IMG_2346.png
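
If you want to double-check this on your own machine rather than relying on the Open Data listing, here is a minimal sketch you could paste into Blender's built-in Python console (assumptions on my part: a macOS host, and Apple's documented sysctl.proc_translated key for Rosetta detection):

Code:
import platform
import subprocess

# Under Rosetta 2 the translated process reports an x86_64 machine type,
# just like the "x86_64 kernel" shown on the Blender Open Data pages.
# 'arm64' means a native build; 'x86_64' means an Intel build (translated on Apple silicon).
print("Reported machine:", platform.machine())

# Distinguish "translated by Rosetta 2" from "actually running on an Intel Mac":
# sysctl.proc_translated is 1 under Rosetta, 0 for native arm64 processes,
# and absent on Intel hardware (-i suppresses the error in that case).
translated = subprocess.run(
    ["sysctl", "-in", "sysctl.proc_translated"],
    capture_output=True, text=True,
).stdout.strip()
print("Running under Rosetta 2:", translated == "1")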
 
  • Like
Reactions: Mopetar and gdansk

Schmide

Diamond Member
Mar 7, 2002
5,726
1,015
126
I think your power figures are off.

From notebookcheck

Cyberpunk 2077 2.2 Phantom Liberty, Ultra preset, 1920x1080:
AMD Ryzen AI Max+ PRO 395 (Radeon 8060S): 80.7 fps at 49.1 W = 1.64 fps/W
Apple M4 Max (16 cores, 40-core GPU): 47.4 fps at 42.2 W = 1.12 fps/W

Other metrics: Load Maximum 49.2 W vs 42.2 W, Load Average 42.2 W vs 42.2 W, respectively (whatever that is). They are close.

Depending on the form factor, it can go either way.
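
Spelling the perf-per-watt arithmetic out, a quick Python sketch using only the two data points above (nothing beyond those figures is assumed):

Code:
# Perf-per-watt from the notebookcheck figures above
# (Cyberpunk 2077 2.2 Phantom Liberty, Ultra preset, 1920x1080).
results = {
    "AMD Ryzen AI Max+ PRO 395 / Radeon 8060S": (80.7, 49.1),  # (fps, watts)
    "Apple M4 Max (16 cores) / 40-core GPU": (47.4, 42.2),
}

for name, (fps, watts) in results.items():
    print(f"{name}: {fps / watts:.2f} fps/W")  # ~1.64 and ~1.12 fps/W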
 

gdansk

Diamond Member
Feb 8, 2011
4,342
7,288
136
Oh boy, you should not have used that at ALL, because I know which benchmark that is, and it's run using an x86_64 kernel, so Rosetta 2 is being used.
View attachment 128134


If the kernel had been native, i.e. arm64, it would say so, like below.
View attachment 128135
Good catch. I can't find any Blender 4.4 results for M4 Max that aren't Rosetta. I can run it on my M4 Pro but I have no way of instrumenting power.
But in either case, the Strix results suggest that using Linux again causes a large performance increase, larger than the difference between the M4 Pro and Strix Halo, which the chart you posted didn't consider.
 
  • Like
Reactions: Mopetar and Gideon

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
I think your power figures are off.

From notebookcheck

Cyberpunk 2077 2.2 Phantom Liberty, Ultra preset, 1920x1080:
AMD Ryzen AI Max+ PRO 395 (Radeon 8060S): 80.7 fps at 49.1 W = 1.64 fps/W
Apple M4 Max (16 cores, 40-core GPU): 47.4 fps at 42.2 W = 1.12 fps/W

Other metrics: Load Maximum 49.2 W vs 42.2 W, respectively (whatever that is). They are close.

Depending on the form factor, it can go either way.
Cyberpunk does not max out the CPU cores, so power consumption is lower than in Blender, and games like Cyberpunk are not heavily nT applications.
 
  • Like
Reactions: Mopetar

Schmide

Diamond Member
Mar 7, 2002
5,726
1,015
126
Cyberpunk does not max out the CPU cores, so power consumption is lower than in Blender, and games like Cyberpunk are not heavily nT applications.
Yeah, but if power is lower by a few watts while performance is higher, it's basically a wash. If you normalize performance to watts, you would be hard pressed to find large differences.
 

poke01

Diamond Member
Mar 8, 2022
3,910
5,225
106
Yeah, but if power is lower by a few watts while performance is higher, it's basically a wash. If you normalize performance to watts, you would be hard pressed to find large differences.

Yes, but that's only if you ignore the differences in physical and logical core counts. And that's the point: you cannot do so when comparing uArchs on real-world nT applications.
 

OneEng2

Senior member
Sep 19, 2022
736
983
106
No, it can't. It only does so by pounding it with 2x the power.
... and who cares again? Explain why power should enter into an outright performance discussion?
How is the Apple M4 uArch not better when Apple is only losing by 64% while AMD has 2x the P cores and 2x the threads?

The NODE advantage does NOT give Apple this big of a lead.
It is losing .... full stop. It is just insult to injury that it is losing AND likely costs more to manufacture AND has a process node advantage.

You seem to be fixated on the number of cores. That's kind of silly in this day and age.

You buy processor X for some price and it performs at some level, and you buy processor Y for another price and it performs at another level. This is how the consumer sees it.

For the company, it costs X to produce processor 1 and I make this much on each one; it costs Y to produce processor 2 and I make something different.

These are the only things that matter.
I can't speak for others, but I've never claimed ARM is fundamentally superior to x86. I've even argued that x86's more complex instruction decoding doesn't make any real difference power-wise in the era of billion-transistor chips, because the additional transistors it requires are such a tiny portion of a modern core.

But a lot of people here seem to be trying to argue that ARM cores in general, or Apple's in particular, are somehow unsuitable for DC. That's ridiculous on its face. No one can point to a benchmark that shows Apple cores as not being appropriate for DC tasks. Test x number of Apple P cores against the same number of x86 P cores, and unless you're talking about tasks that "just happen" to be all about AVX512 (or alternatively all about SVE2), you can't find any big difference in either direction. The only thing people can point to is "well, x86 scales up to 192 cores and Apple doesn't", trying to imply that this is proof that Apple can't. That's just ignorant reasoning. It is the exact same reasoning people once used to claim Apple's cores weren't appropriate for PCs, because Apple was only using them in phones.

Assuming Qualcomm's next-gen cores are competitive with Apple's (likely better, I think, since it appears Qualcomm will be binning on frequency) and they start selling server chips as rumored, then we'll have a good comparison: the same ARM cores being used in both phones and servers. People will have to stop lying that ARM and/or Apple's cores are somehow unsuitable for DC loads. Well, maybe they'll still try to claim that about Apple's cores despite the evidence from Qualcomm's, because they just can't help themselves, but they'll be hanging on by the thinnest of threads.
That is quite a fair assessment of x86's "extra" decode into RISC-like, equal-length instructions .... I agree.

It is hard to prove one way or the other whether the M4 core itself would perform as well as Zen 5 in DC, since no platform exists to test the theory. I SUSPECT that it would not perform as well, simply because that is NOT what it was designed for. Zen 5 (and several previous generations) has been specifically architected "Server First" (AMD's words, not mine). It is therefore likely that M4 wouldn't fare well in such a contest.

On the flip side, Zen 5 wouldn't work well at all in a phone or tablet.

To date, this is the only ARM vs Zen 5 benchmark in DC I have seen:

It didn't look very flattering for ARM.
View attachment 128130

We have the full-blown M4 Pro here, with 10P+4E/14 threads, against Strix Halo. AMD has 60% more P cores and 32 threads. Yet the difference in scene completion is 5%.

How is Zen 5 on N3E going to make up for that?


Let me put it this way: if Intel had a uArch that is 5% slower and used much less power, but with 10P+4E and 14 threads compared to AMD's 16C/32T, the industry would go nuts.
I have definitely never said M4 was not good at anything. It does well at Blender, yet even then, it does so with a full node advantage .... and still loses to a Zen 5 part that likely costs less to make.

BTW, I also wonder how important memory bandwidth is to the Blender CPU benchmark. The M4 Max has a huge memory bandwidth advantage that may well aid it significantly in rendering a 1440p scene.
 
  • Like
Reactions: booklib28 and Tlh97