Question: Intel Raptor Lake vs AMD Zen 4 vs Apple M2

Page 8 - AnandTech community forum

Carfax83

Diamond Member
Nov 1, 2010
These CPUs are all going to square off against each other at some point this year, assuming nothing catastrophic delays any of the product launches. So going by what we know from official sources and informed rumor mongers (many of whom were very accurate before Alder Lake and the M1 launched), which CPU do you think will win out in these categories?

1) Single threaded performance
2) Multithreaded performance
3) Gaming performance
4) Performance per watt
5) Overall performance (who wins the majority of applications)

While I've been keeping a close eye on rumors and leaks for Zen 4 and Raptor Lake, I admittedly have not been doing so for the M2, as I'm an unrepentant Apple hater :innocent: At least I'm honest about it... That said, here is my ranking based on what I've seen and heard:

I think the single threaded crown will go to Raptor Lake, based on informed rumors that Raptor Lake will have up to 10% more IPC than Alder Lake from microarchitectural updates and cache upgrades, on top of higher clock speeds. From what I've seen, gauging IPC isn't easy as it varies so much by application, but I'd say Alder Lake already has at least a 15% across-the-board IPC advantage over Zen 3, so Raptor Lake could conceivably have about 25% better IPC than Zen 3, which is similar to what Zen 4 will reportedly possess. But I doubt Zen 4 will match Raptor Lake in clock speeds and memory latency, which is why I'm predicting Raptor Lake will take the single threaded performance crown.
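The IPC estimate above compounds multiplicatively rather than additively. A minimal sketch using the post's own rumoured figures (+15% for Alder Lake over Zen 3, +10% for Raptor Lake over Alder Lake); these are the poster's estimates, not measurements.

```python
# Combine successive relative IPC gains multiplicatively.
# Inputs are the post's rumoured estimates, not benchmark results.

def compound_gain(*gains: float) -> float:
    """Combine relative gains (0.15 means +15%) into one overall relative gain."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0

raptor_vs_zen3 = compound_gain(0.15, 0.10)  # Alder vs Zen 3, then Raptor vs Alder
print(f"Raptor Lake vs Zen 3 IPC: +{raptor_vs_zen3:.1%}")
```

Since 1.15 x 1.10 is about 1.265, the two estimates compound to roughly +26.5%, in line with the post's ballpark of 25%.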

For multithreaded performance, Zen 4 should easily take it due to having more big cores than its Intel counterpart and similar IPC.

Gaming performance is more complicated: while some games are inherently more reliant on single core performance (strategy games, for instance), more and more 3D engines are becoming parallel due to the adoption of Vulkan and DX12 and modernized programming methods. Still, very few 3D engines scale beyond 8 threads, and 6 to 8 cores remains the sweet spot for gaming and will be for some time. So overall, I feel more comfortable going with Raptor Lake for the gaming crown. Also, if rumors are correct, Raptor Lake will officially support DDR5-5600 off the bat while Zen 4 will reportedly use DDR5-5200. The raw memory speed likely won't be a significant factor, but Intel's memory controller will be right next to the CPU cores, while Zen 4's will be in the I/O die, which, while still on the same package, will definitely incur a significant latency penalty. I'm sure that will be offset by a massive L3 cache. :)
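The latency point above (a memory controller on a separate I/O die versus a bigger L3 that avoids more trips to DRAM) can be sketched with the textbook average-memory-access-time model. All latencies and miss rates below are hypothetical placeholders, not Zen 4 or Raptor Lake figures.

```python
# Average memory access time (AMAT), simplified to two levels: L3, then DRAM.
# Every number here is a hypothetical placeholder chosen only to show the trade-off.

def amat(l3_hit_ns: float, l3_miss_rate: float, dram_ns: float) -> float:
    """AMAT = hit time + miss rate * miss penalty, all times in nanoseconds."""
    return l3_hit_ns + l3_miss_rate * dram_ns

# Hypothetical design A: memory controller next to the cores, smaller L3.
near_controller = amat(l3_hit_ns=12.0, l3_miss_rate=0.30, dram_ns=70.0)
# Hypothetical design B: controller on an I/O die (+10 ns to DRAM), larger L3.
io_die_big_l3 = amat(l3_hit_ns=12.0, l3_miss_rate=0.20, dram_ns=80.0)

print(f"near controller: {near_controller:.1f} ns")
print(f"I/O die + bigger L3: {io_die_big_l3:.1f} ns")
```

With these made-up inputs the larger cache more than hides the extra DRAM latency, but which side a real chip lands on depends on measured miss rates that the thread does not have.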

On performance per watt, one would think the M2 should take this category easily... but from the small amount of research I've done on it, it seems there won't be much of a performance increase with the M2, if any. Some rumors even suggest there may be a slight regression in that aspect. Also, since Zen 4 will be on TSMC's 5nm node, it will undoubtedly have excellent performance per watt, and I believe it will also easily crush Apple's best in single core. So for performance per watt, I'm going to go with Zen 4.

When it comes to overall performance, I'm leaning towards Zen 4, but it will be close. Raptor Lake will supposedly double the number of Gracemont efficiency cores, which will certainly help multithreaded performance per watt, but ultimately they won't be a match for Zen 4's 16 big cores with SMT. AMD will have the core count advantage, and when that's combined with IPC parity with Raptor Lake, Zen 4 will win the majority of the benchmarks.
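The core-count argument above can be framed as a toy throughput sum over core groups. A minimal sketch: the Zen 4 count (16 cores with SMT) is from the post, and the Raptor Lake side assumes Alder Lake's 8 P-cores plus 8 E-cores with the E-cores doubled as the post describes. The per-core throughput and SMT uplift values are hypothetical relative units, not data.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy multithreaded throughput model: sum over core groups of
# cores * per-core throughput * (1 + SMT uplift).
# Per-core throughput and SMT uplift values are hypothetical placeholders.

@dataclass
class CoreGroup:
    count: int          # number of physical cores in this group
    per_core: float     # relative throughput of one core running one thread
    smt_uplift: float   # extra throughput from a second thread (0.0 = no SMT)

def mt_throughput(groups: list[CoreGroup]) -> float:
    """Idealised MT throughput; ignores power limits, memory bandwidth and scaling losses."""
    return sum(g.count * g.per_core * (1.0 + g.smt_uplift) for g in groups)

zen4 = [CoreGroup(count=16, per_core=1.0, smt_uplift=0.25)]
raptor = [
    CoreGroup(count=8, per_core=1.0, smt_uplift=0.25),   # P-cores with Hyper-Threading
    CoreGroup(count=16, per_core=0.5, smt_uplift=0.0),   # E-cores, no SMT
]

print(f"Zen 4 (toy units): {mt_throughput(zen4):.1f}")
print(f"Raptor Lake (toy units): {mt_throughput(raptor):.1f}")
```

With these made-up inputs the 16-core part comes out ahead, matching the post's reasoning, but raising the E-core figure to around 0.65 flips the result, which is why the inputs are kept as explicit parameters.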
 

Abwx

Lifer
Apr 2, 2011
For the time being, AMD's 6nm-based APU is about as efficient in MT as Apple's 5nm-based M1, and it will be no different for the M2.

Apple's marketing material in its comparisons with x86 SoCs is dishonest at best; they use Intel for comparison since Intel's designs are significantly less efficient than AMD's...

 

uzzi38

Platinum Member
Oct 16, 2019
OK. What are they actually focusing on?
I don't know a whole lot about Meteor Lake, but I can speak with confidence about Phoenix. That said, I'm still going to stick to talking about public rumours etc. So there are 3 points I want to bring up that we already know about:

CPU: Zen 4, duh. You're gonna see significant power efficiency gains coming out of this. Shouldn't be much of a surprise, so I'm gonna just leave that there.

GPU: As has been pointed out by RGT, you have an upgrade to RDNA3, which will bring improvements to performance and power efficiency. As for how large those gains are, well, they won't disappoint. Unless you're a certain other person on this forum who'd be disappointed by anything short of a miracle from anyone other than Intel.

Platform features: Every time AMD has moved platform, we've seen a massive gain in power management and platform features. FP5 with Raven Ridge was a giant leap forwards (although still lacklustre overall vs Intel), and FP6 with Renoir shrank the gap hugely and in some instances surpassed Intel. Now FP7/FP7r2 with Rembrandt has given AMD a significant edge. Phoenix has long been rumoured to bring a new FP8 platform, so it's probably a safe bet that we'll see a further improvement once again.

As a much more ambiguous statement: with Rembrandt AMD for the first time added IPUs and other IPs on die. You should expect Phoenix to integrate more.
 

JasonLD

Senior member
Aug 22, 2017
Abwx said:
For the time being, AMD's 6nm-based APU is about as efficient in MT as Apple's 5nm-based M1, and it will be no different for the M2.

Apple's marketing material in its comparisons with x86 SoCs is dishonest at best; they use Intel for comparison since Intel's designs are significantly less efficient than AMD's...


Using just Cinebench to compare the M1 and the 6800U and drawing any conclusion from it is a bad take. Besides that, that review is pretty terrible to begin with. They are basically taking laptops from different brands, with CPUs from different performance segments, and trying to compare them. With all those variables in play, an accurate comparison is impossible.
 

Thala

Golden Member
Nov 12, 2014
Abwx said:
For the time being, AMD's 6nm-based APU is about as efficient in MT as Apple's 5nm-based M1, and it will be no different for the M2.

Apple's marketing material in its comparisons with x86 SoCs is dishonest at best; they use Intel for comparison since Intel's designs are significantly less efficient than AMD's...


Using Cinebench is not really valid for comparisons between ARM and x64 architectures, due to its use of the Intel Embree library, which just has a static SIMD wrapper for ARM that heavily degrades performance.
 

Abwx

Lifer
Apr 2, 2011
Thala said:
Using Cinebench is not really valid for comparisons between ARM and x64 architectures, due to its use of the Intel Embree library, which just has a static SIMD wrapper for ARM that heavily degrades performance.
CB R23 has been fully ported to the ARM ISA; it's Cinebench R15 that is executed through emulation, according to the article.
 

thunng8

Member
Jan 8, 2013
Abwx said:
CB R23 has been fully ported to the ARM ISA; it's Cinebench R15 that is executed through emulation, according to the article.
No, it hasn't. It might be compiled for Arm, but all the SIMD instructions go through a wrapper, and it is nowhere near optimized for it.

FYI, I measured CPU efficiency for some typical operations in Adobe Photoshop and Lightroom, and the M1 core is still >2x more efficient.
 

Abwx

Lifer
Apr 2, 2011
thunng8 said:
No, it hasn't. It might be compiled for Arm, but all the SIMD instructions go through a wrapper, and it is nowhere near optimized for it.

FYI, I measured CPU efficiency for some typical operations in Adobe Photoshop and Lightroom, and the M1 core is still >2x more efficient.

The numbers contradict what you're saying: it wouldn't perform that well in ST if it were not accurately ported, SIMD included, since if it were only a wrapper the ST score would be awful.

Edit: How did you measure efficiency?
I hope power was not measured at the mains, because that reading is not accurate: Apple devices use inadequate AC adapters that do not supply the full power drawn, and the remainder is drained from the battery. With the 68W Apple adapter, NBC measures 68W at the mains, but once they use a 140W adapter the device draws 80W...
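The wall-measurement concern can be put as a simple power balance: when the adapter can't cover the load, the battery supplies the difference, so a wall meter under-reads the system's draw. A minimal sketch using the 68W and 80W figures quoted above, assuming a lossless adapter for simplicity; real adapters have conversion losses that this ignores.

```python
# Power balance for a laptop on AC: system draw = adapter output + battery discharge.
# Assumes a lossless adapter, so the wall reading equals the adapter's output here.

def battery_discharge_w(system_draw_w: float, adapter_limit_w: float) -> float:
    """Watts the battery must supply when the load exceeds the adapter's limit."""
    return max(0.0, system_draw_w - adapter_limit_w)

system_draw = 80.0    # draw seen with the 140W adapter (figure from the post)
small_adapter = 68.0  # wall reading with the 68W adapter (figure from the post)

print(f"battery supplies {battery_discharge_w(system_draw, small_adapter):.0f} W")
print(f"wall reading under-reports by {1 - small_adapter / system_draw:.0%}")
```

Under these assumptions the battery covers 12W, so a wall reading with the smaller adapter would understate the draw by about 15%.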
 

thunng8

Member
Jan 8, 2013
Abwx said:
The numbers contradict what you're saying: it wouldn't perform that well in ST if it were not accurately ported, SIMD included, since if it were only a wrapper the ST score would be awful.

Edit: How did you measure efficiency?
I hope power was not measured at the mains, because that reading is not accurate: Apple devices use inadequate AC adapters that do not supply the full power drawn, and the remainder is drained from the battery. With the 68W Apple adapter, NBC measures 68W at the mains, but once they use a 140W adapter the device draws 80W...
It is well known that Cinebench runs on Intel Embree, as pointed out before, which is not coded in NEON. It uses a wrapper to convert Intel SIMD code to NEON, which is not ideal. How much performance drops we don't really know, but out of all the benchmarks out there, Cinebench on the M1 performs comparatively very badly.

No measurement from the wall; it was done using powermetrics. It is also interesting to note that when the MacBook Air throttles, its performance-core power drops to approx 2W, but its frequency still hovers at about 2.5GHz. That is just amazing perf/watt. No wonder it doesn't need a fan. Ryzen is nowhere near that efficient.
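Measuring with a software power counter (macOS `powermetrics`, as above) rather than at the wall turns efficiency into energy per task: integrate the sampled power over the run and divide the work done by it. A minimal sketch; the sample values and task count are hypothetical placeholders, and parsing real `powermetrics` output is left out because its format isn't shown in the thread.

```python
from __future__ import annotations

# Energy per task from periodic power samples (e.g. CPU power read once per interval).
# Sample values are hypothetical placeholders, not measurements from the thread.

def energy_joules(power_samples_w: list[float], interval_s: float) -> float:
    """Rectangle-rule integration: each sample is assumed constant over its interval."""
    return sum(power_samples_w) * interval_s

def perf_per_watt(tasks_done: float, power_samples_w: list[float], interval_s: float) -> float:
    """Tasks per joule, which equals (tasks per second) / (average watts)."""
    return tasks_done / energy_joules(power_samples_w, interval_s)

samples = [6.0, 5.5, 5.0, 4.5]  # hypothetical watts, one sample per second
print(f"energy: {energy_joules(samples, 1.0):.1f} J")
print(f"tasks per joule: {perf_per_watt(10, samples, 1.0):.3f}")
```

Comparing tasks per joule for the same workload across machines is what makes a ">2x more efficient" claim checkable, as long as both sides measure the same power domain.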
 
Jul 27, 2020
From what I've read around the web, Intel is the loudest, with the fan ramping up to jet engine levels and temperatures approaching 100 degrees Celsius. Ryzen is much less annoying, and the M1 is almost whisper quiet.
 

Abwx

Lifer
Apr 2, 2011
thunng8 said:
It is well known that Cinebench runs on Intel Embree, as pointed out before, which is not coded in NEON. It uses a wrapper to convert Intel SIMD code to NEON, which is not ideal. How much performance drops we don't really know, but out of all the benchmarks out there, Cinebench on the M1 performs comparatively very badly.

Embree has been used since Cinebench R15, and indeed the M1 doesn't perform well in this older bench, contrary to its score in CB R23, which says that the latter was well ported.
Besides, most of the instructions used in CB are SSE2-SSE4.2; there are very few AVX1/2 instructions, if any, even in R23. Otherwise, as already said, its R23 ST score would mimic its ST score in R15, which is emulated, probably at a 40% performance deficit.

Actually, R23 is representative of its FP capabilities; it may perform quite a bit better in INT-based code due to the sheer number of ALUs.
 

Doug S

Platinum Member
Feb 8, 2020
Even R20 supported AVX, AVX2 and AVX512, though how much it benefits from them is another matter. Maybe there's a way to run it with and without AVX2 and AVX512 to see how much it affects the results?
 

Thala

Golden Member
Nov 12, 2014
Abwx said:
Embree has been used since Cinebench R15, and indeed the M1 doesn't perform well in this older bench, contrary to its score in CB R23, which says that the latter was well ported.

Please inform yourself. Open the latest code - it is open source after all - and look for yourself. Neon is just wrapped inside SSE/AVX intrinsics. The result is a serious performance degradation for any ARM CPU.
This is different from Cinebench R15, which is runtime-translated via Rosetta on the M1.


@Doug S, you can completely switch off SIMD when compiling Embree. However, we have no way to change how Embree is compiled for Cinebench.
 

Doug S

Platinum Member
Feb 8, 2020
Thala said:
Please inform yourself. Open the latest code - it is open source after all - and look for yourself. Neon is just wrapped around SSE/AVX intrinsics. The result is a serious performance degradation for any ARM CPU.

@Doug S, you can completely switch off SIMD when compiling Embree. However, we have no way to change how Embree is compiled for Cinebench.


I didn't realize it was open source. It sure would be interesting for someone with access to the latest Intel and AMD CPUs to run the current version alongside one with AVX2 and AVX512 disabled to see what, if any, benefit it gets from wider SIMD.

And theoretically at least, someone who knows NEON really well could do a proper port and eliminate those wrappers. I wonder what that sort of setup does to Apple's ability to use all four NEON pipes?
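The experiment proposed above comes down to comparing scores from builds that differ only in the widest SIMD ISA they enable. A minimal sketch of that bookkeeping; the build labels and scores are hypothetical placeholders, and how each build would actually be produced is not shown here.

```python
from __future__ import annotations

from statistics import median

# Relative gain of each wider-SIMD build over a baseline build.
# Scores are hypothetical placeholders; real numbers would come from running the
# same scene with each build on the same machine, several times each.

def relative_gain(baseline_runs: list[float], candidate_runs: list[float]) -> float:
    """Median-based speedup of candidate over baseline, minus 1 (0.10 means +10%)."""
    return median(candidate_runs) / median(baseline_runs) - 1.0

runs = {
    "sse4.2": [100.0, 101.0, 99.0],   # hypothetical baseline build (no AVX2/AVX512)
    "avx2": [110.0, 111.0, 109.0],    # hypothetical
    "avx512": [112.0, 113.0, 111.0],  # hypothetical
}
for isa in ("avx2", "avx512"):
    print(f"{isa}: {relative_gain(runs['sse4.2'], runs[isa]):+.1%}")
```

Using the median of several runs per build keeps one noisy run (thermal throttling, background tasks) from dominating the comparison.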
 

Abwx

Lifer
Apr 2, 2011
Thala said:
Please inform yourself. Open the latest code - it is open source after all - and look for yourself. Neon is just wrapped inside SSE/AVX intrinsics. The result is a serious performance degradation for any ARM CPU.
This is different from Cinebench R15, which is runtime-translated via Rosetta on the M1.


@Doug S, you can completely switch off SIMD when compiling Embree. However, we have no way to change how Embree is compiled for Cinebench.

That changes nothing. If CB were perfectly optimised and that allowed, say, a 20% better MT score, then the cores would consume 20% more to provide this throughput. Hence the current numbers are enough to characterise FP computation perf/watt, and as said, it's not more efficient than an AMD APU in this respect.
 

Doug S

Platinum Member
Feb 8, 2020
Abwx said:
That changes nothing. If CB were perfectly optimised and that allowed, say, a 20% better MT score, then the cores would consume 20% more to provide this throughput. Hence the current numbers are enough to characterise FP computation perf/watt, and as said, it's not more efficient than an AMD APU in this respect.

That's not true. A CPU may get better perf/watt (or worse perf/watt) if it uses wider SIMD, depending on too many factors to list.
 

Abwx

Lifer
Apr 2, 2011
Doug S said:
That's not true. A CPU may get better perf/watt (or worse perf/watt) if it uses wider SIMD, depending on too many factors to list.

More FP calculations will imply more power, proportional to the throughput improvement. A SIMD instruction would allow, say, 2x execution throughput per cycle, and it would necessitate twice the number of adders and multipliers, hence 2x the power for the most power-hungry block. If anything, AVX512 is a good example, since frequencies have to be lowered considerably to cope with the higher throughput per cycle.
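For context on the scaling argument above, the textbook first-order CMOS power model is sketched below. Dynamic power grows with the activity factor α, the switched capacitance C, the square of the supply voltage V and the clock frequency f, while static (leakage) power does not depend on how much useful work is done. Doubling the active datapath roughly doubles αC, which is the proportional case argued in the post; leakage and fixed per-instruction costs such as fetch and decode do not double, and a lower f often permits a lower V. This is a generic approximation, not a model of any particular core.

```latex
P_{\text{total}} \approx \underbrace{\alpha\, C\, V^{2} f}_{\text{dynamic}} \;+\; \underbrace{V\, I_{\text{leak}}}_{\text{static}}
```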
 

thunng8

Member
Jan 8, 2013
Abwx said:
More FP calculations will imply more power, proportional to the throughput improvement. A SIMD instruction would allow, say, 2x execution throughput per cycle, and it would necessitate twice the number of adders and multipliers, hence 2x the power for the most power-hungry block. If anything, AVX512 is a good example, since frequencies have to be lowered considerably to cope with the higher throughput per cycle.
I have no idea why you are continuing with this. Cinebench R23 is definitely not optimised to run on the M1, as many have already pointed out. The SIMD code runs through a wrapper. We have no idea how efficient that wrapper is and how it affects performance, but anything that is not optimised will not be as efficient.
 

Abwx

Lifer
Apr 2, 2011
11,542
4,325
136
thunng8 said:
We have no idea how efficient that wrapper is and how it affects performance, but anything that is not optimised will not be as efficient.

As said, if the wrapper were 100% efficient, it would dispatch more instructions per cycle, and this would of course lead to more FP calculations per cycle; power would increase accordingly.

To put it another way: if the wrapper led to only 80% CPU utilisation, then we can agree that a perfect one would allow 100% CPU utilisation, and that 25% better throughput, with the CPU at 100%, wouldn't come for free power-wise.
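The arithmetic in this step is worth making explicit: going from 80% to 100% utilisation is a 1/0.8 = 1.25x throughput gain, i.e. 25%. Whether perf/watt then stays flat depends on how power splits between a fixed part and a part that scales with work, which is exactly what the thread disputes. A minimal sketch under a deliberately simple linear power model; the wattages are hypothetical placeholders.

```python
# Linear power model: P = P_fixed + P_per_util * utilisation.
# Both coefficients are hypothetical; they only show how a fixed component
# changes the perf/watt outcome of raising utilisation.

def throughput_gain(util_before: float, util_after: float) -> float:
    """Relative throughput gain, taking throughput as proportional to utilisation."""
    return util_after / util_before - 1.0

def perf_per_watt(util: float, p_fixed: float, p_per_util: float) -> float:
    """Throughput (1.0 = full utilisation) divided by modelled power in watts."""
    return util / (p_fixed + p_per_util * util)

print(f"throughput gain: {throughput_gain(0.8, 1.0):.0%}")

for p_fixed in (0.0, 5.0):  # purely proportional power vs. a 5 W fixed overhead
    before = perf_per_watt(0.8, p_fixed, 20.0)
    after = perf_per_watt(1.0, p_fixed, 20.0)
    print(f"P_fixed={p_fixed:.0f} W: perf/W change {after / before - 1:+.1%}")
```

With no fixed component, the proportional argument holds exactly (no perf/W change); with any fixed overhead, better utilisation improves perf/W (+5% here). Real chips sit somewhere in between, and the coefficients would have to be measured.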
 

thunng8

Member
Jan 8, 2013
Abwx said:
As said, if the wrapper were 100% efficient, it would dispatch more instructions per cycle, and this would of course lead to more FP calculations per cycle; power would increase accordingly.

To put it another way: if the wrapper led to only 80% CPU utilisation, then we can agree that a perfect one would allow 100% CPU utilisation, and that 25% better throughput, with the CPU at 100%, wouldn't come for free power-wise.
Now you are just being silly. We have no idea how much more efficient it would be, and no amount of hand waving will change that. I have measured many other applications, and the M1's efficiency in Cinebench is very poor compared to other industry-standard benchmarks and applications.

In fact, Cinebench is possibly the worst benchmark for comparing x64 to ARM and drawing any conclusions.
 

Henry swagger

Senior member
Feb 9, 2022
thunng8 said:
Now you are just being silly. We have no idea how much more efficient it would be, and no amount of hand waving will change that. I have measured many other applications, and the M1's efficiency in Cinebench is very poor compared to other industry-standard benchmarks and applications.

In fact, Cinebench is possibly the worst benchmark for comparing x64 to ARM and drawing any conclusions.
Cinebench is the gold standard... ARM chips are weak because they have no SMT2.
 

Doug S

Platinum Member
Feb 8, 2020
Abwx said:
More FP calculations will imply more power, proportional to the throughput improvement. A SIMD instruction would allow, say, 2x execution throughput per cycle, and it would necessitate twice the number of adders and multipliers, hence 2x the power for the most power-hungry block. If anything, AVX512 is a good example, since frequencies have to be lowered considerably to cope with the higher throughput per cycle.

You are assuming identical FP calculations in different functional blocks will require identical power. That's only true if they have the exact same design. Which isn't true, because the SIMD units don't perform the same functions (i.e. the FP units may perform sqrt while the SIMD units don't, the FP units don't support popcount while the SIMD units do) and beyond that may have completely different design points.

For example, a muladd in the FP units may use more power due to higher-fanout circuits in order to achieve lower latency (because you assume the SIMD units will be doing a lot of successive calculations where throughput matters more). They may not even use the same transistor types, with one unit using low-power transistors and another using high-performance transistors.

Next, we have to consider that wider SIMD units obviously won't support as many instructions executing in parallel, and may have different restrictions than narrower units on the types of operations that can be issued/executed/retired in parallel.

This is only scratching the surface of possible differences. You might as well try to tell us that an FP calculation in efficiency cores is the same performance/watt as the same calculation in performance cores.
 