Question Intel Raptor Lake vs AMD Zen 4 vs Apple M2

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
These CPUs are all going to square off against each other at some point this year, assuming nothing catastrophic occurs to delay any of the product launches. So going by what we know from official sources and informed rumor mongers (many of whom were very accurate before Alder Lake and the M1 launched), which CPU do you think will win out in these categories?

1) Single threaded performance
2) Multithreaded performance
3) Gaming performance
4) Performance per watt
5) Overall performance (who wins the majority of applications)

While I've been keeping a close eye on rumors and leaks for Zen 4 and Raptor Lake, I admittedly have not been doing so for the M2, as I'm an unrepentant Apple hater :innocent: At least I'm honest about it... That said, this is my ranking based on what I've seen and heard:

I think the single threaded crown will go to Raptor Lake, and I say this based on informed rumors that Raptor Lake will have up to 10% more IPC from microarchitectural updates, cache upgrades and higher clock speeds than Alder Lake. From what I've seen, gauging IPC performance isn't easy as it varies so much based on application, but I'd say Alder Lake already has at least a 15% across the board IPC advantage over Zen 3, so Raptor Lake could conceivably have 25% better IPC than Zen 3, which is similar to what Zen 4 will reportedly possess. But I doubt Zen 4 will match Raptor Lake in clock speeds and memory latency performance, which is why I'm predicting Raptor Lake will take the single threaded performance crown.
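The roughly 25% figure follows from compounding the two estimates rather than adding them. A minimal Python sketch, assuming the 15% and 10% figures above (both are estimates and rumors from this post, not measurements); the helper name `compound_uplift` is made up for illustration:

```python
def compound_uplift(*uplifts):
    """Combine successive fractional uplifts multiplicatively."""
    total = 1.0
    for u in uplifts:
        total *= 1.0 + u
    return total - 1.0


# Assumed inputs taken from the post, not measured values.
alder_over_zen3 = 0.15    # poster's estimate of Alder Lake vs Zen 3
raptor_over_alder = 0.10  # rumored upper bound for Raptor Lake vs Alder Lake

raptor_over_zen3 = compound_uplift(alder_over_zen3, raptor_over_alder)
print(f"Raptor Lake over Zen 3: {raptor_over_zen3:.1%}")  # prints 26.5%
```

So the "25%" in the post is a slight rounding down of about 26.5%, and it only holds if both input estimates do.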

For multithreaded performance, Zen 4 should easily take it due to having more big cores than its Intel counterpart and similar IPC.

Gaming performance is more complicated because while some games are inherently more reliant on single core performance (strategy games for instance), more and more 3D engines are becoming increasingly parallel due to the adoption of Vulkan and DX12 in addition to modernized programming methods. Still, very few 3D engines can scale beyond 8 threads, and 6 to 8 cores remains the sweet spot for gaming and will be for some time. So overall, I feel more comfortable going with Raptor Lake for the gaming crown. Also, if rumors are correct, Raptor Lake will officially support DDR5-5600 off the bat while Zen 4 will reportedly use DDR5-5200. The raw memory speed likely won't be a significant factor, but Intel's memory controller will be right next to the CPU cores, while Zen 4's will be in the I/O die, which, while still on the same package, will definitely incur a significant latency penalty, though I'm sure that will be offset by a massive L3 cache. :)

On performance per watt, one would think the M2 should take this category easily... but from the small amount of research that I've collected on it, it seems that there won't be much of a performance increase with the M2, if any. Some rumors are even suggesting there may be a bit of a regression in that aspect. Also, since Zen 4 will be on TSMC's 5nm node, it will undoubtedly have excellent performance per watt and I believe it will also easily crush Apple's best in single core. So for performance per watt, I'm going to go with Zen 4.

When it comes to overall performance, I'm leaning towards Zen 4 but it will be close. Raptor Lake will supposedly double the amount of Gracemont efficiency cores which will certainly help in multithreaded performance per watt, but ultimately they won't be a match for Zen 4's 16 big cores with SMT. AMD will have the core count advantage and when that's combined with IPC parity with Raptor Lake, Zen 4 will win the majority of the benchmarks.
 

mikk

Diamond Member
May 15, 2012
4,133
2,136
136
Thanks, and from the same source I quoted from earlier, but considerably more recent. So the Xe2 branding does seem to be going to Gen12.9 and Xe3 to Gen13, while the LP nomenclature is being replaced with LPG starting with Meteor Lake. Arrow Lake is LPG Plus and it looks like ELG is a Gen12.9 part that was inserted into the line up either ahead or in lieu of DG3 and JPS. Which gives us:


Yes, this is correct from the information we have. The important bit to know is that LPG on MTL/ARL (or whatever they call it in the end) is also based on Gen12HPG/Gen12.7/Xe HP.


But if you're parsing this like a sane person, Xe, Xe2, Xe3 is like RDNA, RDNA2, RDNA3, etc. There are some differences between the HP and HPG microarchitectures, and there may be something aside from just branding differentiating LP and LPG as well, but they're all still the same overall Xe architecture. The driver apparently groups ATS (Xe HP), DG2 (Xe HPG), MTL (Xe LPG), and ARL (Xe LPG Plus) all under the heading "gen12-hp",

It also groups them all under Xe_HPG_Core as well as Gen12_7. It can't be clearer than that. Sure, they could always disable some features like raytracing if they wanted, but this is not the question for now.

In the end, MTL is a client platform with a typical integrated GPU that uses a low-power variant of Gen12 Xe with up to 192 EUs and a shared memory interface. I wouldn't expect any miracles here.

Gen12HP @6nm is a big upgrade over Gen12LP @10SF. Intel claimed there is a 1.5x improvement in performance/watt. Meteor Lake GPU tile might use 4nm and some say 3nm which further increases the difference. They also claimed Gen12HP is able to clock roughly 1.5x higher than Gen12LP at the same voltage. The current Xe LP has a limited clock speed scaling.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,858
136
Could this be the culprit then? It makes no sense to me that even Intel marketing would push for so much power for so little gain after the 150w point.
Some boards overvolt, my "budget" MSI Z690 board certainly did at stock. However, the OP in that Reddit thread had an Asus TUF board, and AFAIK Asus managed to get their firmware settings right from the first UEFI versions. The P cores got the Vcore they needed, in fact the config was probably a bit aggressive, the OP appears to have done a bit of tweaking. The lower gains of higher TDP scenarios were also slightly exacerbated by the OP setting 38x multi on E cores, stock would be 36x.

I honestly don't know what Intel was thinking, but I can tell you that once we go past 150W package power, ADL-S not only fights that nasty V/f curve and the diminishing returns of higher clocks, it also starts overvolting E cores (this time for real, as in they actually need less voltage for their operating clocks). This is why I believe that keeping PL2 to a more reasonable 150W for 12700K and maybe 170W for 12900K would have done wonders with respect to efficiency (still bad numbers but more in line with what we expect from a flagship desktop).

The irony is that today the 12700k/12900k are probably the best CPUs for a very silent & very powerful system. On a good quality board and using low RPM fans or even passive cooling, these chips really shine around 60-90W. Even the stock Intel cooler is much better today, actually good to keep when buying a non-K SKU. And all that because they had to push thermal transfer higher and higher. So there's the half-full part of the cup, I guess.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Kinda hard for Intel to do much about voltage optimization ( short of going full FIVR route again, and even then MB VRM would have outsized impact ).
There's the factory VID curve, which is interpolated between points, there's the AVX guard band, there's the thermal velocity blabla tech that adjusts voltage with temperature. This results in a VID value, which is then adjusted using the AC/DC Loadline params, and the VRM is asked to supply the voltage. Except of course the MB can evaluate the requested VID as pessimistic/optimistic or simply add more voltage on BIOS author birthdays.
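As a rough illustration of the chain described above, here is a toy Python model: a VID curve interpolated between points, offsets stacked on top, then a board-side adjustment. Every number, the linear interpolation, and the names `VID_POINTS`, `interpolate_vid` and `requested_voltage` are assumptions for illustration only; Intel's actual algorithm and the real loadline math are more involved and are not reproduced here.

```python
from bisect import bisect_right

# Hypothetical factory VID points: (frequency in MHz, VID in volts).
VID_POINTS = [(800, 0.70), (3000, 0.95), (4500, 1.15), (5200, 1.35)]


def interpolate_vid(freq_mhz, points=VID_POINTS):
    """Linearly interpolate VID between the two nearest curve points."""
    freqs = [f for f, _ in points]
    if freq_mhz <= freqs[0]:
        return points[0][1]
    if freq_mhz >= freqs[-1]:
        return points[-1][1]
    i = bisect_right(freqs, freq_mhz)
    (f0, v0), (f1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (freq_mhz - f0) / (f1 - f0)


def requested_voltage(freq_mhz, guard_band_v=0.0, thermal_offset_v=0.0,
                      loadline_adjust_v=0.0, board_offset_v=0.0):
    """Stack the (hypothetical) offsets on top of the interpolated VID."""
    vid = interpolate_vid(freq_mhz) + guard_band_v + thermal_offset_v
    # The loadline adjustment and any board-side offset are modeled here
    # as plain additive terms, which is a simplification.
    return vid + loadline_adjust_v + board_offset_v


print(round(interpolate_vid(3750), 3))
print(round(requested_voltage(3750, guard_band_v=0.02, board_offset_v=0.03), 3))
```

The point of the sketch is only that several independent margins add up, so the voltage the VRM finally supplies can sit well above what a given chip needs.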

The goal of manufacturers is to work at stock in worst possible (but one that is still within spec) configuration in each app possible without crashing. No joke in case of 12900K and its nearly 200 amps of juice in Prime95.

Imo the job of the enthusiast in OC nowadays is not 100 MHz beyond what Intel allowed, but rather thinning the margin of voltage Intel/Nvidia/AMD add on their CPUs. That always results in a drop in power/thermals and can increase performance substantially in some cases too.
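Why trimming voltage pays off: to a first approximation, dynamic CPU power scales with V² × f, so at a fixed clock a small voltage cut gives a larger power cut. A minimal sketch under that textbook approximation (it ignores leakage, and the 1.20 V and 1.14 V values are hypothetical, not taken from any real chip):

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of dynamic power after a change, using P ~ V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Hypothetical 5% undervolt at an unchanged clock speed.
ratio = dynamic_power_ratio(v_new=1.14, v_old=1.20)
print(f"Dynamic power at the same clock: {ratio:.3f}x stock")
```

Under these assumptions a 5% undervolt trims dynamic power by a little under 10%, which is headroom a power-limited chip can spend on holding higher clocks.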

And let's not forget silicon variability as well: huge differences between factory VID curves exist, and there are P cores, E cores and uncore, each with its own curve. The highest VID is then selected using the above rules to request from the MB VRM.
 
Jul 27, 2020
16,164
10,240
106
M2 again puts Apple in the slowest yet most power efficient category, compared to Zen 4 and Raptor Lake. With 18% faster CPU and up to 35% better GPU performance, Apple is focusing on a balanced approach to improving the performance of the whole SoC while Intel/AMD are putting the brunt of their engineering efforts in their CPU designs.
 

Ajay

Lifer
Jan 8, 2001
15,430
7,849
136
M2 again puts Apple in the slowest yet most power efficient category, compared to Zen 4 and Raptor Lake. With 18% faster CPU and up to 35% better GPU performance, Apple is focusing on a balanced approach to improving the performance of the whole SoC while Intel/AMD are putting the brunt of their engineering efforts in their CPU designs.
Also, the base M2 isn't Apple's performance line of CPUs. That’s just for their productivity class systems. The high performance line would be the Max and Ultra.
 

Doug S

Platinum Member
Feb 8, 2020
2,252
3,483
136
Also, the base M2 isn't Apple's performance line of CPUs. That’s just for their productivity class systems. The high performance line would be the Max and Ultra.

Yes, but they will only be faster for MT; the ST performance will be the same. Apple's designs don't allow a single core to consume the amount of power a single Intel or AMD core can, because they are designed for phones, where that amount of power consumption (or the associated heat transfer) is not feasible.

Technically they could allow cores to "turbo" in Macs with sufficient cooling, but given that most of their Mac sales are laptops and one of the main reasons for appeal is lack of fan or at least a fan that almost never runs, they may never bother for the relative niche of Studio & Mac Pro that would have the ability to shed a lot of heat.
 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,880
146
M2 again puts Apple in the slowest yet most power efficient category, compared to Zen 4 and Raptor Lake. With 18% faster CPU and up to 35% better GPU performance, Apple is focusing on a balanced approach to improving the performance of the whole SoC while Intel/AMD are putting the brunt of their engineering efforts in their CPU designs.
Absolutely cringe take, considering you have no clue what AMD and Intel are actually focusing on for their mobile SoCs. Seriously, why dig a hole like that?
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
For the time being AMD's 6nm-based APU is about as efficient in MT as Apple's 5nm-based M1, and it will be no different for the M2.

Apple's marketing material comparing against x86 SoCs is dishonest at best; they use Intel for the comparison since Intel's designs are significantly less efficient than AMD's...

 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,880
146
OK. What are they actually focusing on?
I don't know a whole lot about Meteor Lake, but I can speak with confidence about Phoenix. But I'm still going to stick to talking about public rumours etc. So there's 3 points I want to bring up that we already know about:

CPU: Zen 4, duh. You're gonna see significant power efficiency gains coming out of this. Shouldn't be much of a surprise, so I'm gonna just leave that there.

GPU: As has been pointed out by RGT, you have an upgrade to RDNA3 which will bring an improvement to performance and power efficiency. As for how large those gains are, well they won't disappoint. Unless you're a certain other person on this forum who'd be disappointed by anything less than a miracle if you're not Intel.

Platform features: Every time AMD has moved platform we've seen a massive gain to power management and platform features. FP5 with Raven Ridge was a giant leap forwards (although still lacklustre overall vs Intel), FP6 with Renoir shrunk the gap hugely and in some instances, surpassed Intel's. Now FP7/FP7r2 with Rembrandt has given AMD a significant edge. Phoenix has long since been rumoured to bring a new FP8 platform, and so it's probably a safe bet that we'll see a further improvement once again.

As a much more ambiguous statement: with Rembrandt AMD for the first time added IPUs and other IPs on die. You should expect Phoenix to integrate more.
 

JasonLD

Senior member
Aug 22, 2017
485
445
136
For the time being AMD's 6nm-based APU is about as efficient in MT as Apple's 5nm-based M1, and it will be no different for the M2.

Apple's marketing material comparing against x86 SoCs is dishonest at best; they use Intel for the comparison since Intel's designs are significantly less efficient than AMD's...


Using just Cinebench to compare the M1 and 6800U and declaring anything out of it is just a bad take. Besides that, that review is pretty terrible to begin with. They are basically taking all those different laptops from different brands, using CPUs for different performance segments, and trying to do a comparison. With all those different variables coming into play, it is impossible to get an accurate comparison.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
For the time being AMD's 6nm-based APU is about as efficient in MT as Apple's 5nm-based M1, and it will be no different for the M2.

Apple's marketing material comparing against x86 SoCs is dishonest at best; they use Intel for the comparison since Intel's designs are significantly less efficient than AMD's...


Using Cinebench is not really valid for comparison between ARM and x64 architectures, due to use of the Intel Embree library, which just has a static SIMD wrapper for ARM - which heavily degrades performance.
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
Using Cinebench is not really valid for comparison between ARM and x64 architectures, due to use of the Intel Embree library, which just has a static SIMD wrapper for ARM - which heavily degrades performance.
CB R23 has been fully ported to the ARM ISA; it's Cinebench R15 that is executed through emulation according to the article.
 

thunng8

Member
Jan 8, 2013
152
61
101
CB R23 has been fully ported to the ARM ISA; it's Cinebench R15 that is executed through emulation according to the article.
No. It hasn’t. It might be compiled for Arm, but all the SIMD instructions are going through a wrapper and are nowhere near optimized for it.

FYI, I measured CPU efficiency for some typical operations in Adobe Photoshop and Lightroom, and the M1 core is still >2x more efficient.
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
No. It hasn’t. It might be compiled for Arm, but all the SIMD instructions are going through a wrapper and are nowhere near optimized for it.

FYI, I measured CPU efficiency for some typical operations in Adobe Photoshop and Lightroom, and the M1 core is still >2x more efficient.

The numbers contradict what you're saying. It wouldn't perform that well in ST if it was not accurately ported, SIMD included, since if it were only a wrapper the ST score would be awful.

Edit: How did you measure efficiency?
Hope power was not measured at the mains, because that reading is not accurate: the Apple device uses an undersized AC adapter that does not supply the full power drawn, and the remainder is drained from the battery. With the 68W Apple adapter NBC measures 68W at the mains, but once they use a 140W adapter the device draws 80W...
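The adapter numbers above imply the following arithmetic. A minimal sketch, assuming the 68W and 80W readings quoted in this post and ignoring adapter conversion losses; `battery_supplement_w` is a hypothetical helper, not part of any tool:

```python
def battery_supplement_w(system_draw_w, adapter_limit_w):
    """Power the battery must cover when the adapter is the bottleneck."""
    return max(0.0, system_draw_w - adapter_limit_w)


# Readings quoted in the post (not independently verified).
small_adapter = battery_supplement_w(system_draw_w=80, adapter_limit_w=68)
large_adapter = battery_supplement_w(system_draw_w=80, adapter_limit_w=140)

print(f"Hidden from a wall meter with the 68W adapter: {small_adapter:.0f} W")
print(f"Hidden from a wall meter with the 140W adapter: {large_adapter:.0f} W")
```

With these figures a wall reading on the smaller adapter would understate the real draw by 12 W, or 15% of the 80 W total.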
 

thunng8

Member
Jan 8, 2013
152
61
101
The numbers contradict what you're saying. It wouldn't perform that well in ST if it was not accurately ported, SIMD included, since if it were only a wrapper the ST score would be awful.

Edit: How did you measure efficiency?
Hope power was not measured at the mains, because that reading is not accurate: the Apple device uses an undersized AC adapter that does not supply the full power drawn, and the remainder is drained from the battery. With the 68W Apple adapter NBC measures 68W at the mains, but once they use a 140W adapter the device draws 80W...
It is well known that Cinebench runs on Intel Embree, as pointed out before, which is not coded in NEON. It uses a wrapper to convert Intel SIMD code to NEON, which is not ideal. How much performance is lost we don’t really know, but out of all the benchmarks out there, Cinebench on an M1 performs comparatively very badly.

No measurement from the wall. Done using powermetrics. It is also interesting to note that when the MacBook Air throttles, its power drops to approx 2W per performance core, but its frequency is still hovering at about 2.5GHz. That is just amazing perf/watt. No wonder it doesn’t need a fan. Ryzen is nowhere near that efficient.
 
Jul 27, 2020
16,164
10,240
106
From what I've read around the web, Intel is the loudest, with the fan ramping up to jet engine levels and temperatures approaching 100 degrees Celsius. Ryzen is much less annoying and M1 is almost whisper quiet.
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
It is well known that Cinebench runs on Intel Embree, as pointed out before, which is not coded in NEON. It uses a wrapper to convert Intel SIMD code to NEON, which is not ideal. How much performance is lost we don’t really know, but out of all the benchmarks out there, Cinebench on an M1 performs comparatively very badly.

Embree has been used since Cinebench R15, and indeed the M1 doesn't perform well in this older bench, contrary to its score in CB R23, which says that the latter was well ported.
Besides, most of the instructions used in CB are SSE2-SSE4.2; there are very few AVX1/2 instructions, if any, even in R23. Otherwise, as already said, its R23 ST score would mimic the ST score in R15, which is emulated, probably at a 40% perf deficit.

Actually R23 is representative of its FP capabilities, it may perform quite better in INT based code due to the sheer amount of ALUs.
 

Doug S

Platinum Member
Feb 8, 2020
2,252
3,483
136
Even R20 supported AVX, AVX2 and AVX512, though how much it benefits from them is another matter. Maybe there's a way to run it with and without AVX2 and AVX512 to see how much it affects the results?
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Embree has been used since Cinebench R15, and indeed the M1 doesn't perform well in this older bench, contrary to its score in CB R23, which says that the latter was well ported.

Please inform yourself. Open the latest code - it is open source after all - and look for yourself. Neon is just wrapped inside SSE/AVX intrinsics. The result is a serious performance degradation for any ARM CPU.
This is different than Cinebench R15, which is runtime translated via Rosetta on M1.


@Doug S, you can completely switch off SIMD, when compiling Embree. However we have no way to change how Embree is compiled for Cinebench.
 

Doug S

Platinum Member
Feb 8, 2020
2,252
3,483
136
Please inform yourself. Open the latest code - it is open source after all - and look for yourself. Neon is just wrapped around SSE/AVX intrinsics. The result is a serious performance degradation for any ARM CPU.


@Doug S, you can completely switch off SIMD, when compiling Embree. However we have no way to change how Embree is compiled for Cinebench.


I didn't realize it was open source. It sure would be interesting for someone with access to the latest Intel and AMD CPUs to run the current version alongside one with AVX2 and AVX512 disabled to see what, if any, benefit it gets from wider SIMD.

And theoretically at least, someone who knows NEON really well could do a proper port and eliminate those wrappers. I wonder what that sort of setup does to Apple's ability to use all four NEON pipes?
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
Please inform yourself. Open the latest code - it is open source after all - and look for yourself. Neon is just wrapped inside SSE/AVX intrinsics. The result is a serious performance degradation for any ARM CPU.
This is different than Cinebench R15, which is runtime translated via Rosetta on M1.


@Doug S, you can completely switch off SIMD, when compiling Embree. However we have no way to change how Embree is compiled for Cinebench.

That changes nothing. If CB were perfectly optimised and that allowed, say, a 20% better MT score, then the cores would consume 20% more power to provide this throughput. Hence the current numbers are enough to characterise the FP computation perf/watt, and as said, it's not more efficient than an AMD APU in this respect.
 

Doug S

Platinum Member
Feb 8, 2020
2,252
3,483
136
That changes nothing. If CB were perfectly optimised and that allowed, say, a 20% better MT score, then the cores would consume 20% more power to provide this throughput. Hence the current numbers are enough to characterise the FP computation perf/watt, and as said, it's not more efficient than an AMD APU in this respect.

That's not true. A CPU may get better perf/watt (or worse perf/watt) if it uses wider SIMD, depending on too many factors to list.
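A small worked example of the two positions, with made-up scores and power figures (none of these are measurements): if power rises in step with throughput, perf/watt stays flat, but any other power response changes it.

```python
def perf_per_watt(score, watts):
    """Benchmark score divided by average package power."""
    return score / watts


# Hypothetical baseline: current build of the benchmark.
baseline = perf_per_watt(score=1000, watts=20)

# Case 1: 20% more throughput costs exactly 20% more power.
proportional = perf_per_watt(score=1200, watts=24)

# Case 2: 20% more throughput costs only 10% more power.
sublinear = perf_per_watt(score=1200, watts=22)

print(f"baseline:     {baseline:.1f} pts/W")
print(f"proportional: {proportional:.1f} pts/W")
print(f"sublinear:    {sublinear:.1f} pts/W")
```

Which case a better-optimised build would actually land in is exactly what the current numbers cannot tell us.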