Question Intel Raptor Lake vs AMD Zen 4 vs Apple M2


Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
These CPUs are all going to square off against each other at some point this year, assuming nothing catastrophic occurs to delay any of the product launches. So, going by what we know from official sources and informed rumor mongers (many of whom were very accurate before Alder Lake and the M1 launched), which CPU do you think will win in these categories?

1) Single threaded performance
2) Multithreaded performance
3) Gaming performance
4) Performance per watt
5) Overall performance (who wins the majority of applications)

While I've been keeping a close eye on rumors and leaks for Zen 4 and Raptor Lake, I admittedly have not been doing so for the M2, as I'm an unrepentant Apple hater :innocent: At least I'm honest about it... That said, this is my ranking based on what I've seen and heard:

I think the single threaded crown will go to Raptor Lake, and I say this based on informed rumors that Raptor Lake will have up to 10% more IPC from microarchitectural updates, cache upgrades and higher clock speeds than Alder Lake. From what I've seen, gauging IPC performance isn't easy as it varies so much based on application, but I'd say Alder Lake already has at least a 15% across the board IPC advantage over Zen 3, so Raptor Lake could conceivably have 25% better IPC than Zen 3, which is similar to what Zen 4 will reportedly possess. But I doubt Zen 4 will match Raptor Lake in clock speeds and memory latency performance, which is why I'm predicting Raptor Lake will take the single threaded performance crown.
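As a quick sanity check on the figures above: stacked percentage gains multiply rather than add, so a 15% lead with a further 10% on top comes to slightly more than 25%. A minimal Python sketch, using only the rumored percentages quoted in this post (the function name `compound_gain_pct` is made up for illustration):

```python
def compound_gain_pct(*gains_pct):
    """Combine successive percentage gains multiplicatively."""
    total = 1.0
    for gain in gains_pct:
        total *= 1.0 + gain / 100.0
    return (total - 1.0) * 100.0


# Rumored/estimated figures from the post, not measurements.
alder_over_zen3 = 15.0    # Alder Lake IPC lead over Zen 3 (poster's estimate)
raptor_over_alder = 10.0  # Raptor Lake uplift over Alder Lake (rumor)

raptor_over_zen3 = compound_gain_pct(alder_over_zen3, raptor_over_alder)
print(f"Raptor Lake over Zen 3: about {raptor_over_zen3:.1f}%")
```

This only compounds the two quoted numbers. It says nothing about how much of the rumored 10% is true IPC and how much is clock speed.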

For multithreaded performance, Zen 4 should easily take it due to having more big cores than its Intel counterpart and similar IPC.

Gaming performance is more complicated. Some games are inherently more reliant on single core performance (strategy games, for instance), but 3D engines are becoming increasingly parallel thanks to the adoption of Vulkan and DX12 and more modern programming methods. Still, very few 3D engines can scale beyond 8 threads, and 6 to 8 cores remains the sweet spot for gaming and will be for some time. So overall, I feel more comfortable going with Raptor Lake for the gaming crown. Also, if rumors are correct, Raptor Lake will officially support DDR5-5600 off the bat while Zen 4 will reportedly use DDR5-5200. The raw memory speed likely won't be a significant factor, but Intel's memory controller will be right next to the CPU cores, while Zen 4's will be in the I/O die. That is still on the same package, but it will definitely incur a significant latency penalty, which I'm sure will be offset by a massive L3 cache. :)
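The trade-off between DRAM latency and L3 size can be sketched with the usual average-access-time formula. This is a rough illustration only: every latency and hit rate below is a made-up placeholder, not a measurement of Raptor Lake or Zen 4, and the function and variable names are hypothetical.

```python
def average_latency_ns(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    """Average latency for accesses that reach L3: hits pay the L3
    latency, misses pay the full DRAM latency."""
    return l3_hit_rate * l3_latency_ns + (1.0 - l3_hit_rate) * dram_latency_ns


# Placeholder numbers chosen only to show the shape of the trade-off.
# Design A: faster DRAM path, smaller L3 (lower hit rate).
design_a = average_latency_ns(l3_hit_rate=0.60, l3_latency_ns=10.0, dram_latency_ns=70.0)
# Design B: slower DRAM path (e.g. via an I/O die), larger L3 (higher hit rate).
design_b = average_latency_ns(l3_hit_rate=0.75, l3_latency_ns=12.0, dram_latency_ns=85.0)

print(f"Design A: {design_a:.2f} ns, Design B: {design_b:.2f} ns")
```

With these placeholder inputs the larger cache more than makes up for the slower DRAM path. With a workload that misses L3 more often, the result flips, which is why the outcome depends on the game.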

On performance per watt, one would think the M2 should take this category easily... but from the small amount of research I've done on it, it seems that there won't be much of a performance increase with the M2, if any. Some rumors are even suggesting there may be a bit of a regression in that respect. Also, since Zen 4 will be on TSMC's 5nm node, it will undoubtedly have excellent performance per watt, and I believe it will also easily crush Apple's best in single core. So for performance per watt, I'm going to go with Zen 4.

When it comes to overall performance, I'm leaning towards Zen 4, but it will be close. Raptor Lake will supposedly double the number of Gracemont efficiency cores, which will certainly help in multithreaded performance per watt, but ultimately they won't be a match for Zen 4's 16 big cores with SMT. AMD will have the core count advantage, and when that's combined with IPC parity with Raptor Lake, Zen 4 will win the majority of the benchmarks.
 

Abwx

Lifer
Apr 2, 2011
10,854
3,298
136
That's not true. A CPU may get better perf/watt (or worse perf/watt) if it uses wider SIMD, depending on too many factors to list.

More FP calculations imply more power, in proportion to the throughput improvement. A SIMD instruction might allow, say, 2x execution throughput per cycle, but that necessitates twice as many adders and multipliers, and hence 2x the power for the most power-hungry block. If anything, AVX-512 is a good example, since frequencies have to be lowered considerably to cope with the higher throughput per cycle.
 

thunng8

Member
Jan 8, 2013
152
61
101
I have no idea why you are continuing with this. Cinebench R23 is definitely not optimised to run on the M1, as many have already pointed out. The SIMD code runs through a wrapper. We have no idea how efficient that wrapper is or how it affects performance, but anything that is not optimised will not be as efficient.
 

Abwx

Lifer
Apr 2, 2011
10,854
3,298
136

As I said, if the wrapper were 100% efficient it would dispatch more instructions per cycle, which would of course lead to more FP calculations per cycle, and power would increase accordingly.

To put it another way: if the wrapper led to only 80% CPU utilisation, then we can agree that a perfect one would allow 100% CPU utilisation. That 25% better throughput, with the CPU at 100%, wouldn't come for free power-wise.
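The arithmetic in this argument can be sketched as follows. Going from 80% to 100% utilisation is a 1/0.8 = 1.25x throughput ratio, and whether perf/watt changes depends entirely on how power scales with that extra throughput. The strictly proportional case is the assumption made in this post. The `power_exponent` parameter and both function names are hypothetical, added only to show how sensitive the conclusion is to that assumption.

```python
def throughput_ratio(util_before, util_after=1.0):
    """Throughput ratio when utilisation rises from util_before to util_after."""
    return util_after / util_before


def perf_per_watt_ratio(tp_ratio, power_exponent=1.0):
    """Perf/watt ratio under the assumed model power ~ throughput ** power_exponent.

    power_exponent = 1.0 is the 'power rises in proportion' assumption;
    any other value is purely illustrative, not a measured figure.
    """
    power_ratio = tp_ratio ** power_exponent
    return tp_ratio / power_ratio


ratio = throughput_ratio(0.80)                       # 80% -> 100% utilisation
proportional = perf_per_watt_ratio(ratio, 1.0)       # power scales 1:1 with throughput
sublinear = perf_per_watt_ratio(ratio, 0.5)          # hypothetical weaker scaling

print(f"throughput gain: {(ratio - 1.0) * 100.0:.0f}%")
print(f"perf/watt ratio, proportional power: {proportional:.2f}")
print(f"perf/watt ratio, hypothetical sub-linear power: {sublinear:.2f}")
```

Under strictly proportional power, efficiency is unchanged. Under any weaker scaling it improves, and without measurements of the wrapper there is no way to tell which case applies.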
 

thunng8

Member
Jan 8, 2013
152
61
101
Now you are just being silly. We have no idea how much more efficient it would be. No amount of hand waving will change that. I have measured many other applications, and the M1 performs very badly in Cinebench in terms of efficiency compared to other industry standard benchmarks and applications.

In fact, Cinebench is possibly the worst benchmark for comparing x64 to ARM and drawing any conclusions.
 

Henry swagger

Senior member
Feb 9, 2022
356
235
86
Cinebench is the gold standard. ARM chips are weak because they have no SMT2.
 

Doug S

Platinum Member
Feb 8, 2020
2,203
3,405
136
Abwx said: More FP calculations imply more power, in proportion to the throughput improvement. A SIMD instruction might allow, say, 2x execution throughput per cycle, but that necessitates twice as many adders and multipliers, and hence 2x the power for the most power-hungry block. If anything, AVX-512 is a good example, since frequencies have to be lowered considerably to cope with the higher throughput per cycle.

You are assuming identical FP calculations in different functional blocks will require identical power. That's only true if they have the exact same design, which they don't: the SIMD units don't perform the same functions (e.g. the FP units may perform sqrt while the SIMD units don't, and the FP units don't support popcount while the SIMD units do), and beyond that they may have completely different design points.

For example, a muladd in the FP units may use more power due to higher fanout circuits in order to achieve lower latency (because you assume the SIMD units will be doing a lot of successive calculations where throughput matters more). They may not even use the same transistor types, with one unit using lower power transistors and another using high performance transistors.

Next we have to consider wider SIMD units that obviously won't support as many instructions executing in parallel, and may have different restrictions on the types of operations that can be issued/executed/retired in parallel than narrower units.

This is only scratching the surface of possible differences. You might as well try to tell us that an FP calculation in efficiency cores is the same performance/watt as the same calculation in performance cores.
 

TheELF

Diamond Member
Dec 22, 2012
3,967
720
126
According to Intel, Cinema4D's Cinebench is the least relevant benchmark of all: ranked 1331st.

Oh wait... :p

View attachment 62821
Meh, I can easily believe that fewer than 1% of notebook users can even start up Cinema4D, let alone do any work with it.
uztHhVu.jpg
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
Just like to point out, this is what happens when you bench Rocket Lake with a Maximus motherboard and tuned Kingston Hyper-X DDR4, and use crap motherboards and cheap DDR4-3200 / DDR5-4800 on everything else on an RTX 2080.

1660673454040.png
 