Question Raptor Lake - Official Thread

Page 182

Hulk

Diamond Member
Oct 9, 1999
Since we already have the first Raptor Lake leak, I'm thinking it should have its own thread.
What do we know so far?
From AnandTech's Intel Process Roadmap articles from July:

Built on Intel 7 with upgraded FinFET
10-15% PPW (performance-per-watt)
Last non-tiled consumer CPU as Meteor Lake will be tiled

I'm guessing this will be a minor update to ADL with just a few microarchitecture changes to the cores. The larger change will be the new process refinement allowing 8+16 at the top of the stack.

Will it work with current Z690 motherboards? If yes, that could be a major selling point for people to move to ADL now rather than wait.
 

A///

Diamond Member
Feb 24, 2017
> I dare say that hybrid design at Intel has been cooking for at least a decade. It's been a fundamental part of mobile ARM products for at least that long, and while we may disrespect Intel from time to time, they're still a staggeringly smart bunch over there and wouldn't disregard a working solution in the industry just because they didn't have a current need for it. On the software side, though, Windows is the prevailing OS platform on x86, and it had no real scheduler support for hybrid x86 designs until just a few years ago, so the focus on making it work is much more recent. I believe it took a bit of Microsoft's work getting Windows on ARM running on hybrid ARM designs, combined with Intel needing to "catch up", for hybrid x86 Windows to become a thing, as opposed to it being a more natural progression of the tech.
That is why I proposed my question. Intel has enough money and staff to throw at creative skunkworks projects to figure out what the next best thing is. I'd put the start of their development around 4th gen or earlier.
 

Exist50

Platinum Member
Aug 18, 2016
> If not for the process, Intel could probably have had 12-16c Golden Cove/Raptor Cove and been an actual competitor. Hybrid happened because of the process.
The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Exist50 said:
> The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.
I will take that bet... The 7950X/X3D are both just awesome in perf and perf/watt. Raptor on a node shrink would only improve power and perf/watt. Worst case, they might be equal. Best case is that AMD would still have an edge.
 

Exist50

Platinum Member
Aug 18, 2016
Markfw said:
> I will take that bet... The 7950X/X3D are both just awesome in perf and perf/watt. Raptor on a node shrink would only improve power and perf/watt. Worst case, they might be equal. Best case is that AMD would still have an edge.
Taking the same 8+16 config, a full node shrink would allow for comparable perf/watt and/or significantly higher peak perf. With 8+32 (as suggested), it would be a blowout.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Exist50 said:
> Taking the same 8+16 config, a full node shrink would allow for comparable perf/watt and/or significantly higher peak perf. With 8+32 (as suggested), it would be a blowout.
First, I did not see 8+32, and I won't comment on that, except to say that by then AMD may have something else.

Let's just say for now that a node shrink will help them a LOT. That I agree with.
 

controlflow

Member
Feb 17, 2015
Markfw said:
> First, I did not see 8+32, and I won't comment on that, except to say that by then AMD may have something else.
>
> Let's just say for now that a node shrink will help them a LOT. That I agree with.

It seems probable that sooner or later AMD will adopt hybrid as well. If Intel does gain process node parity, it is going to be hard to keep up with the performance per die area possible with 16-32 future Mont cores. Power usage can be drastically reduced, not just from the intrinsic benefits of the smaller node, but also from not having to push the cores to very high frequencies, which sit at unfavorable spots on the V/F curve.

I believe this notion that hybrid is some kind of fad or gimmick will be proven incorrect. The complexity of dealing with more than one core type is worth it for the massive flexibility you gain by having cores that specialize in PPA vs. all-out IPC/1T performance vs. low power/PPW. The mobile ecosystem has already proven that this is a viable direction.
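The V/F-curve point can be made concrete with the textbook CMOS dynamic-power approximation, P = C * V^2 * f. This is a hedged illustration only: the two operating points (voltages and clocks) are hypothetical round numbers, not measured Intel figures, performance is assumed to scale linearly with clock, and leakage power is ignored.

```python
# Hypothetical illustration of why backing off the V/F curve saves power.
# Uses the classic CMOS dynamic-power approximation P = C * V^2 * f.
# All operating points below are made-up round values, NOT measured data,
# and static (leakage) power is ignored.

def dynamic_power(cap: float, volts: float, freq_ghz: float) -> float:
    """Relative dynamic power for switched capacitance, voltage and clock."""
    return cap * volts ** 2 * freq_ghz


# Assumed operating points for the same core design (hypothetical).
HIGH = {"volts": 1.4, "freq_ghz": 5.5}  # pushed near the top of the V/F curve
LOW = {"volts": 0.9, "freq_ghz": 3.5}   # a more efficient spot on the curve

p_high = dynamic_power(1.0, **HIGH)
p_low = dynamic_power(1.0, **LOW)

# Assumption: performance scales linearly with clock (a simplification).
perf_ratio = LOW["freq_ghz"] / HIGH["freq_ghz"]
power_ratio = p_low / p_high

print(f"perf at low point:  {perf_ratio:.0%} of high point")
print(f"power at low point: {power_ratio:.0%} of high point")
print(f"perf/watt gain:     {perf_ratio / power_ratio:.2f}x")
```

Under these assumptions the lower point delivers roughly 64% of the performance for roughly 26% of the dynamic power, a perf/watt gain equal to (1.4/0.9)^2, about 2.42x. That is the effect the post describes: many moderately clocked cores can out-throughput a few cores pushed to the edge of the curve at the same power.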
 

Abwx

Lifer
Apr 2, 2011
Exist50 said:
> Taking the same 8+16 config, a full node shrink would allow for comparable perf/watt and/or significantly higher peak perf. With 8+32 (as suggested), it would be a blowout.

8 + 32 is equivalent to 24 P cores and would perform poorly with software using 8 to 24 threads, because not everything is Cinebench.

Assuming 24 threads are fully used, the 24 P core part would perform at 2400 while the 8 + 32 would be at 1600.

This is already the case with current SKUs: with 16 threads, a 7950X performs at, say, 1600 while the 13900K is at 1280.

All in all, the hybrid concept doesn't work for the PC world, mainly because power is at a much higher level than in smartphones, and what is optimal for the ultra-mobile segment becomes subpar given PCs' high throughput capabilities.
 

Hulk

Diamond Member
Oct 9, 1999
Abwx said:
> 8 + 32 is equivalent to 24 P cores and would perform poorly with software using 8 to 24 threads, because not everything is Cinebench.
>
> Assuming 24 threads are fully used, the 24 P core part would perform at 2400 while the 8 + 32 would be at 1600.
>
> This is already the case with current SKUs: with 16 threads, a 7950X performs at, say, 1600 while the 13900K is at 1280.
>
> All in all, the hybrid concept doesn't work for the PC world, mainly because power is at a much higher level than in smartphones, and what is optimal for the ultra-mobile segment becomes subpar given PCs' high throughput capabilities.

Like many people around here, you misunderstand the point of the hybrid approach. It's not power efficiency, it's area efficiency. The fact that Intel is at least a node behind the TSMC-made AMD parts and is still competitive in performance proves the hybrid approach works.

Without hybrid, Intel loses to AMD on both performance and power efficiency. With it, they only lose on efficiency. Believe it or not, the Intel engineers are pretty on the ball.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Abwx said:
> 8 + 32 is equivalent to 24 P cores and would perform poorly with software using 8 to 24 threads, because not everything is Cinebench.
It wouldn't perform "poorly"; it just wouldn't be fully utilized. Just like any CPU with such a thread count. I was responding to someone primarily focused on embarrassingly parallel tasks, so not really relevant.
> This is already the case with current SKUs: with 16 threads, a 7950X performs at, say, 1600 while the 13900K is at 1280.
Why did you specifically choose 16 threads, instead of, say, 8?
> All in all, the hybrid concept doesn't work for the PC world, mainly because power is at a much higher level than in smartphones, and what is optimal for the ultra-mobile segment becomes subpar given PCs' high throughput capabilities.
Huh? What's your argument here? That workloads simply are not well threaded enough to ever use >16 or >24 cores?
 

deasd

Senior member
Dec 31, 2013
Hulk said:
> Without hybrid, Intel loses to AMD on both performance and power efficiency. With it, they only lose on efficiency. Believe it or not, the Intel engineers are pretty on the ball.

But in mobile, the 7945HX's MT performance still beats the 13980HX by 5%, even though the 13980HX has DDR5-5600 while the 7945HX has DDR5-4800. Efficiency goes without saying.

Everything makes it look like Intel implemented E cores because of its process node disadvantage. And with Xeon/WS/Sapphire Rapids, which are less visible to the public, they decided to delay, delay, delay and avoid E cores, which could have allowed an earlier launch, for unknown reasons (AVX-512?).
 

Abwx

Lifer
Apr 2, 2011
Exist50 said:
> It wouldn't perform "poorly"; it just wouldn't be fully utilized. Just like any CPU with such a thread count. I was responding to someone primarily focused on embarrassingly parallel tasks, so not really relevant.
>
> Why did you specifically choose 16 threads, instead of, say, 8?
>
> Huh? What's your argument here? That workloads simply are not well threaded enough to ever use >16 or >24 cores?

Notice that I talked about threads, not cores.

If we put the ST perf of a P core at 100, then with 8 threads both a 16P chip like the 7950X and an 8P + 16E chip like the 13900K will perform similarly.

From 8 to 16 threads, the 16P will scale by 100 for each added thread, while the 8 + 16 will scale by 60 due to the lower E core IPC.

So as the thread count increases, the 16P will pull further and further ahead, with the difference peaking at 16 threads at 25% higher throughput.

An 8 + 32 would lag even more compared to a 24P, as at 24 threads the difference would be 50%.

So once software uses more than 8 and up to 16 threads, the 16P will fare better, and we know that current software is more likely to scale decently up to 16 threads, if not fewer, as with games.
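The scaling argument above can be written as a small throughput model. This is a sketch of the post's own simplifications, not a benchmark: one busy P core scores 100, one busy E core a fixed fraction of that (0.6 by default, which reproduces the 1280 figure), threads fill P cores before E cores, and SMT, clocks, memory and scheduling are all ignored. The function name `throughput` is hypothetical.

```python
# Toy model of the thread-scaling argument: threads go to P cores first,
# then overflow to E cores; one thread per physical core (SMT ignored).
# Scores are relative to one busy P core = 100. Purely illustrative.

def throughput(threads: int, p_cores: int, e_cores: int, e_rel: float = 0.6) -> float:
    """Relative MT score when `threads` busy threads are spread over the cores."""
    on_p = min(threads, p_cores)                     # threads placed on P cores
    on_e = min(max(threads - p_cores, 0), e_cores)   # overflow placed on E cores
    return 100 * on_p + 100 * e_rel * on_e


for t in (8, 12, 16):
    all_p = throughput(t, 16, 0)    # 16P, the 7950X case in the post
    hybrid = throughput(t, 8, 16)   # 8P + 16E, the 13900K case in the post
    print(f"{t:2d} threads: 16P={all_p:.0f}  8P+16E={hybrid:.0f}  ratio={all_p / hybrid:.2f}")

# 24 threads: 24P vs 8P + 32E, for two assumed E-core strengths.
for e_rel in (0.6, 0.5):
    ratio = throughput(24, 24, 0) / throughput(24, 8, 32, e_rel)
    print(f"24 threads, E = {e_rel:.0%} of P: 24P leads by {ratio - 1:.0%}")
```

With E at 60% of P, the model gives a tie at 8 threads and a 25% lead for the 16P at 16 threads, matching the post. At 24 threads the same 60% assumption gives about a 36% gap; the 50% figure corresponds to an E core worth half a P core, which is also what the earlier "8 + 32 is equivalent to 24 P cores" equivalence implies.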
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Markfw said:
> First, I did not see 8+32, and I won't comment on that, except to say that by then AMD may have something else.
>
> Let's just say for now that a node shrink will help them a LOT. That I agree with.
More realistically, I can see a 16+8 or even a 16+16 chip, but it might be really big at first.
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
Exist50 said:
> The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.

So are you saying that the moment you need more than 8 cores, you're in Cinebench territory, and for such use cases more little cores are always better than additional big cores?
In other words, there are no cases where 12, 16, 24, etc. big cores would be a better-performing solution than 8P + 16/32/etc. E cores?
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
I can guarantee that if AMD had two kinds of chiplets, one with large cores and the other with small cores, they would be selling these 8 + 24-32 core monsters like hotcakes.

And the 8 + 8 and 24-32 + 24-32 combinations would find their customers too.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
@Geddagod is saying that the Redwood Cove core in Meteor Lake is only 30% larger than Zen 4. That's pretty much iso-process.

And the Crestmont core in Meteor Lake is even smaller relative to its bigger neighbour, Redwood Cove, meaning it's about 1/3 the size of Zen 4. It's also likely we'll see bigger performance gains going from Raptormont to Crestmont than from Raptor Cove to Redwood Cove.

Raptor Lake's E cores are relatively better than Alder Lake's E cores, and Meteor Lake does even better. Each generation gets a little better in density AND in performance.

In Alder Lake, Golden Cove was 45-50% faster per clock than Gracemont, and the size ratio was something like 3.43:1. In Raptor Lake, the Gracemont cores close that gap by 2-4%. In Meteor Lake, the ratio is 3.6:1.

Let's say that by the Arrow Lake generation, the per-clock gap between Lion Cove and Skymont is only 30%. Then it becomes 8x Lion Cove, which is 30% faster than Raptor Cove, plus 16x cores that perform like Redwood Cove.

Then it becomes really, really compelling. I bet it's going to be good not just in performance/mm2 but in performance/watt.
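The per-area argument can be checked with a bit of arithmetic. The inputs are the post's own rough figures (Golden Cove 45-50% faster per clock than Gracemont at about 3.43x the area; Redwood Cove about 30% larger than Zen 4 and 3.6x Crestmont). Treating those figures as exact, and using per-clock performance as a stand-in for throughput, are simplifying assumptions.

```python
# Back-of-the-envelope perf/area from the rough figures quoted in the post.
# Assumption: per-clock performance is used as a stand-in for throughput.

P_PERF_LO, P_PERF_HI = 1.45, 1.50  # Golden Cove per-clock perf, Gracemont = 1.0
ADL_AREA_RATIO = 3.43              # Golden Cove area / Gracemont area

for p_perf in (P_PERF_LO, P_PERF_HI):
    # E-core perf/area relative to P: (1.0 / 1.0) / (p_perf / ADL_AREA_RATIO)
    e_advantage = ADL_AREA_RATIO / p_perf
    print(f"P = {p_perf:.2f}x per clock -> E cores give {e_advantage:.2f}x perf per mm2")

# Crestmont size relative to Zen 4, from "RWC ~30% larger than Zen 4" and "3.6:1".
RWC_VS_ZEN4 = 1.30
MTL_AREA_RATIO = 3.6               # Redwood Cove area / Crestmont area
crestmont_vs_zen4 = RWC_VS_ZEN4 / MTL_AREA_RATIO
print(f"Crestmont is about {crestmont_vs_zen4:.2f}x the area of Zen 4")
```

At these figures a Gracemont core delivers roughly 2.3-2.4x the per-clock throughput per mm2 of a Golden Cove core, and 1.30 / 3.6 = 0.36 is where the "about 1/3 the size of Zen 4" estimate comes from. Per Geddagod's reply, the 30% figure is core-only, excluding L2 and L3.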
 

Geddagod

Golden Member
Dec 28, 2021
1,149
1,007
106
IntelUser2000 said:
> @Geddagod is saying that the Redwood Cove core in Meteor Lake is only 30% larger than Zen 4. That's pretty much iso-process.
>
> And the Crestmont core in Meteor Lake is even smaller relative to its bigger neighbour, Redwood Cove, meaning it's about 1/3 the size of Zen 4. It's also likely we'll see bigger performance gains going from Raptormont to Crestmont than from Raptor Cove to Redwood Cove.
>
> Raptor Lake's E cores are relatively better than Alder Lake's E cores, and Meteor Lake does even better. Each generation gets a little better in density AND in performance.
>
> In Alder Lake, Golden Cove was 45-50% faster per clock than Gracemont, and the size ratio was something like 3.43:1. In Raptor Lake, the Gracemont cores close that gap by 2-4%. In Meteor Lake, the ratio is 3.6:1.
>
> Let's say that by the Arrow Lake generation, the per-clock gap between Lion Cove and Skymont is only 30%. Then it becomes 8x Lion Cove, which is 30% faster than Raptor Cove, plus 16x cores that perform like Redwood Cove.
>
> Then it becomes really, really compelling. I bet it's going to be good not just in performance/mm2 but in performance/watt.
Hold up, hold up. I'm saying ~30% for the core only, excluding L2 and L3. The difference is probably larger once you add those.
RWC's 512KB L2 cache array is denser than Zen 4's, but RWC has 2x as much L2.
I don't have the L3 cache area on hand for both of them, but IIRC RWC's L3 block is both denser and smaller (3MB vs 4MB) than Zen 4's, so they might save some space there.
 

ondma

Platinum Member
Mar 18, 2018
2,720
1,280
136
Timmah! said:
> So are you saying that the moment you need more than 8 cores, you're in Cinebench territory, and for such use cases more little cores are always better than additional big cores?
> In other words, there are no cases where 12, 16, 24, etc. big cores would be a better-performing solution than 8P + 16/32/etc. E cores?
Well, obviously, there are few situations without at least some exceptions. A company simply has to make the best solution possible for the majority of use cases. That said, I wonder if the hybrid approach will be able to keep up with future gaming needs when (if) games require more than 8 big cores. I would also like to see some tests of streaming while gaming to see how the hybrid approach handles that.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,007
136
Timmah! said:
> So are you saying that the moment you need more than 8 cores, you're in Cinebench territory, and for such use cases more little cores are always better than additional big cores?
> In other words, there are no cases where 12, 16, 24, etc. big cores would be a better-performing solution than 8P + 16/32/etc. E cores?

Yes.

If you are talking about well multi-threaded applications like Cinebench that will utilize all available threads, then for a given die area, lots of small cores will perform better than a smaller number of large cores.

First of all, note "for a given die area."
Also, I'm specifically talking about Intel Golden/Raptor Cove "big" cores and Gracemont "E cores."

The question is nuanced.

Let's say you can have 13 P cores OR 40 E cores. That's about what would fit in the 13900's die area.

For Cinebench R23 MT on a 13900K, assuming the P cores are running at 5.5GHz and the E cores at 4.3GHz, the all-P part and the all-E part will perform about the same at around 30 threads. But as the thread count increases to 40, the all-E part will pull ahead, as those cores are more area efficient than the big cores and we are area constrained in this example.
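The "13 P cores OR 40 E cores" comparison can be sketched as a toy model. Everything here beyond the core counts and the 30-thread tie is a hypothetical assumption: one E core is 1.0 unit of throughput, a second SMT thread on a P core adds 30% of that core's throughput, and the P-core rate is back-solved so the two parts tie at 30 threads, as the post states. The function names are illustrative only.

```python
# Toy model of "13 P cores OR 40 E cores in the same die area".
# Hypothetical assumptions (not measured): E core = 1.0 unit, a 2nd SMT
# thread adds SMT_YIELD of a P core, and the P-core rate is back-solved
# so both parts tie at TIE_AT threads, as stated in the post.

P_CORES, E_CORES = 13, 40
SMT_YIELD = 0.30   # assumed extra throughput from a P core's 2nd thread
TIE_AT = 30        # thread count where the post says the parts match
E_RATE = 1.0
P_RATE = TIE_AT * E_RATE / (P_CORES * (1 + SMT_YIELD))  # back-solved


def all_p(threads: int) -> float:
    """13 P cores / 26 threads: fill physical cores first, then SMT siblings."""
    first = min(threads, P_CORES)
    second = min(max(threads - P_CORES, 0), P_CORES)
    return P_RATE * (first + SMT_YIELD * second)


def all_e(threads: int) -> float:
    """40 E cores, one thread each, no SMT."""
    return E_RATE * min(threads, E_CORES)


for t in (1, 8, 13, 26, 30, 40):
    print(f"{t:2d} threads: all-P {all_p(t):5.1f}   all-E {all_e(t):5.1f}")
```

In this sketch the all-P part leads at low and moderate thread counts, stops scaling at 26 threads, and the all-E part ends about a third ahead at 40 threads (40 vs 30 units). Because the SMT yield is a free parameter, the model says nothing about how an E core actually compares to an HT thread on a big core.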
 

ondma

Platinum Member
Mar 18, 2018
2,720
1,280
136
Hulk said:
> Yes.
>
> If you are talking about well multi-threaded applications like Cinebench that will utilize all available threads, then for a given die area, lots of small cores will perform better than a smaller number of large cores.
>
> First of all, note "for a given die area."
> Also, I'm specifically talking about Intel Golden/Raptor Cove "big" cores and Gracemont "E cores."
>
> The question is nuanced.
>
> Let's say you can have 13 P cores OR 40 E cores. That's about what would fit in the 13900's die area.
>
> If you have an application that uses up to around 26 threads, the all-big-core part is going to be more performant. And as the number of threads decreases from 26 to 1, the all-big-core part will pull further and further ahead of the all-small-core part. This is due to the fact that less HT is being used.
>
> Now, after 26 threads the big core part is out of threads, but the all-small-core part can continue scaling all the way to 40 threads.
>
> For Cinebench R23 MT on a 13900K, assuming the P cores are running at 5.5GHz and the E cores at 4.3GHz, the all-P part and the all-E part will perform about the same at around 30 threads. But as the thread count increases to 40, the all-E part will pull ahead, as those cores are more area efficient than the big cores and we are area constrained in this example.
How does an "E" core compare to an HT "core" on the big cores?
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,830
136
Exist50 said:
> The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.

Amdahl's Law. There's a reason why AMD hasn't pushed past 16c for high-end consumer parts. The 3950X was arguably absurd for its time, but it has grown into its niche nicely (along with the other 16c parts).

Plus, from a marketing standpoint, what Intel has always needed to do since Zen 2 is release a product with a better core, higher clocks, the same number of cores, and (overall) better perf/watt, and then ram it straight down AMD's throat. The last time they (kind of) did that was the 9900K, which whipped the 2700X in everything except maybe perf/watt (though there's room for debate there).
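For reference, Amdahl's Law caps the speedup on n cores when only a fraction p of the work runs in parallel: speedup = 1 / ((1 - p) + p / n). The sketch below just evaluates that formula; the parallel fractions are illustrative assumptions, not measurements of any real application, and it assumes identical cores, so it does not model hybrid P/E parts.

```python
# Amdahl's Law: speedup on n identical cores when a fraction `p` of the
# runtime is perfectly parallelizable and the rest is strictly serial.

def amdahl_speedup(p: float, n: int) -> float:
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction p must be in [0, 1]")
    if n < 1:
        raise ValueError("core count n must be >= 1")
    return 1.0 / ((1.0 - p) + p / n)


# Illustrative (assumed) parallel fractions, from lightly to highly threaded.
for p in (0.50, 0.90, 0.99):
    row = "  ".join(f"{n:2d}c={amdahl_speedup(p, n):5.2f}x" for n in (8, 16, 24, 32))
    print(f"p={p:.2f}: {row}")
```

With p = 0.90, going from 16 to 32 cores raises the speedup only from 6.4x to about 7.8x, while at p = 0.99 it goes from about 13.9x to about 24.4x, which is why the value of piling on more cores depends so heavily on how parallel the workload is.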