Question Raptor Lake - Official Thread

Hulk · Dec 5, 2021

Since we already have the first Raptor Lake leak I'm thinking it should have its own thread.
What do we know so far?
From AnandTech's Intel Process Roadmap articles from July:

Built on Intel 7 with upgraded FinFET
10-15% PPW (performance-per-watt) gain
Last non-tiled consumer CPU as Meteor Lake will be tiled

I'm guessing this will be a minor update to ADL with just a few microarchitecture changes to the cores. The larger change will be the new process refinement allowing 8+16 at the top of the stack.

Will it work with current Z690 motherboards? If yes, then that could be a major selling point for people to move to ADL rather than wait.

A/// · Mar 22, 2023

hemedans said:
Lakefield is as old as Zen 2, so definitely before Zen 2.

Q2 2020, but you missed the point: the question is when design began, not when it was released.

A/// · Mar 22, 2023

LightningZ71 said:
I dare say that hybrid design at Intel has been cooking for at least a decade. It's been a fundamental part of mobile ARM products for at least that long and while we may disrespect Intel from time to time, they're still a staggeringly smart bunch over there and wouldn't disregard a working solution in the industry just because they didn't have a current need for it. Now, on the software side, Windows being the prevailing OS platform in x86 and it having no real scheduler support for hybrid designs on the x86 side of the house until just a few years ago, we have a much more recent focus on making it work. I believe that it took a bit of Microsoft working to get Windows ARM working on the hybrid ARM designs combined with Intel needing to "catch up" for Hybrid x86 Windows to become a thing as opposed to being a more natural progression of the tech.

That is why I posed my question. Intel has enough money and staff to throw at creative skunk projects to figure out the next best thing. I'd put their dev time around 4th gen or earlier.

Exist50 · Mar 22, 2023

DrMrLordX said:
If not for the process, Intel could probably have had 12-16c Golden Cove/Raptor Cove and been an actual competitor. Hybrid happened because of the process.

The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.

Markfw · Mar 22, 2023

Exist50 said:
The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.

I will take that bet... 7950X/X3D are both just awesome in perf and perf/watt. Raptor on a node shrink would only improve watt and perf/watt. They might be equal worst case. Best case is that AMD would still have an edge.

Exist50 · Mar 22, 2023

Markfw said:
I will take that bet... 7950X/X3D are both just awesome in perf and perf/watt. Raptor on a node shrink would only improve watt and perf/watt. They might be equal worst case. Best case is that AMD would still have an edge.

Taking the same 8+16 config, a full node shrink would allow for comparable perf/watt and/or significantly higher peak perf. With 8+32 (as suggested), it would be a blowout.

Markfw · Mar 22, 2023

Exist50 said:
Taking the same 8+16 config, a full node shrink would allow for comparable perf/watt and/or significantly higher peak perf. With 8+32 (as suggested), it would be a blowout.

First, I did not see 8+32, but that I won't comment on, except by then, AMD may have something else.

Let's just say for now that a node shrink will help them a LOT. That I agree with.

controlflow · Mar 22, 2023

Markfw said:
First, I did not see 8+32, but that I won't comment on, except by then, AMD may have something else.

Let's just say for now that a node shrink will help them a LOT. That I agree with.

It seems probable that sooner or later AMD will adopt hybrid as well. If Intel does gain process node parity, it is going to be hard to keep up with the performance per die area possible with 16-32 future Mont cores. The power usage can be drastically reduced not just from the intrinsic benefits of the smaller node but also from not having to push the cores to very high frequencies and such unfavorable spots on the V/F curve.

I believe this notion that hybrid is some kind of fad or gimmick will be proven incorrect. The complexity of dealing with more than 1 core type is worth it to gain the massive flexibility you earn by having cores that specialize in PPA vs all out IPC/1T performance vs low power/PPW. The mobile ecosystem has already proven that this is a viable direction.

Abwx · Mar 22, 2023

Exist50 said:
Taking the same 8+16 config, a full node shrink would allow for comparable perf/watt and/or significantly higher peak perf. With 8+32 (as suggested), it would be a blowout.

8 + 32 is equivalent to 24 P cores and would perform poorly with software using 8 to 24 threads, because not everything is Cinebench.

Assuming 24 threads are fully used, the 24 P part would perform at 2400 while the 8 + 32 would be at 1600.

This is already the case with current SKUs: a 7950X using 16 threads performs at, say, 1600 while the 13900K is at 1280.

All in all the hybrid concept doesn't work for the PC world, mainly because power is at a much higher level than in smartphones, and what is optimal for the ultra mobile segment becomes subpar given the PC's high throughput capabilities.

Hulk · Mar 22, 2023

Abwx said:
8 + 32 is equivalent to 24 P cores and would perform poorly with software using 8 to 24 threads, because not everything is Cinebench.

Assuming 24 threads are fully used, the 24 P part would perform at 2400 while the 8 + 32 would be at 1600.

This is already the case with current SKUs: a 7950X using 16 threads performs at, say, 1600 while the 13900K is at 1280.

All in all the hybrid concept doesn't work for the PC world, mainly because power is at a much higher level than in smartphones, and what is optimal for the ultra mobile segment becomes subpar given the PC's high throughput capabilities.

Like many people around here, you misunderstand the point of the hybrid approach. It's not power efficiency, it's area efficiency. The fact that Intel is at least a node behind TSMC/AMD parts and is competitive in performance proves the hybrid approach works.

Without hybrid Intel loses to AMD on both performance and power efficiency. With it they only lose on efficiency. Believe it or not the Intel engineers are pretty on the ball.

Exist50 · Mar 22, 2023

Abwx said:
8 + 32 is equivalent to 24 P cores and would perform poorly with software using 8 to 24 threads, because not everything is Cinebench.

It wouldn't perform "poorly"; it just wouldn't be fully utilized. Just like any CPU with such a thread count. I was responding to someone primarily focused on embarrassingly parallel tasks, so not really relevant.

Abwx said:
This is already the case with current SKUs: a 7950X using 16 threads performs at, say, 1600 while the 13900K is at 1280.

Why did you specifically choose 16 threads, instead of, say, 8?

Abwx said:
All in all the hybrid concept doesn't work for the PC world, mainly because power is at a much higher level than in smartphones, and what is optimal for the ultra mobile segment becomes subpar given the PC's high throughput capabilities.

Huh? What's your argument here? That workloads simply are not well threaded enough to ever use >16 or >24 cores?

A/// · Mar 23, 2023

Markfw said:
First, I did not see 8+32, but that I won't comment on, except by then, AMD may have something else.

I brought it up months ago based on an ancient rumor that got peddled around several years ago when we first learned of Alder Lake.

deasd · Mar 23, 2023

Hulk said:
Without hybrid Intel loses to AMD on both performance and power efficiency. With it they only lose on efficiency. Believe it or not the Intel engineers are pretty on the ball.

But in mobile, the 7945HX's MT perf still beats the 13980HX by 5%, even though the 13980HX has DDR5-5600 while the 7945HX has DDR5-4800. Efficiency isn't even in question.

AMD Ryzen 9 7945HX "Dragon Range" processor has better multi-core performance at less power than Intel Core i9-13980HX - VideoCardz.com

videocardz.com

Everything makes it look like Intel implemented E cores because of its process node disadvantage. And for Xeon/WS/Sapphire Rapids, which are less well known to the public, they decided to delay, delay, delay and avoid E cores, which could have allowed an earlier launch, for unknown reasons (AVX-512?).

Abwx · Mar 23, 2023

Exist50 said:
It wouldn't perform "poorly"; it just wouldn't be fully utilized. Just like any CPU with such a thread count. I was responding to someone primarily focused on embarrassingly parallel tasks, so not really relevant.

Why did you specifically choose 16 threads, instead of, say, 8?

Huh? What's your argument here? That workloads simply are not well threaded enough to ever use >16 or >24 cores?

Notice that I talked about threads, not cores.

If we put the ST perf of a P core at 100, then with 8 threads both a 16P part like the 7950X and an 8P + 16E 13900K will perform similarly.

From 8 to 16 threads the 16P will scale by 100 for each added thread, while the 8 + 16 will scale by 60 due to the lower E core IPC.

So as thread count increases the 16P will pull further and further ahead, with the difference peaking at 16 threads and 25% better throughput.

An 8 + 32 would lag even more compared to a 24P, as at 24 threads the difference would be 50%.

So once software involves more than 8 and up to 16 threads the 16P will fare better, and we know that current software is more likely to scale decently up to 16 threads, if not fewer, as with games.
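The scaling argument above can be written down as a toy model. This is only a sketch under the post's own assumptions: a thread on a P core is worth 100, a thread on an E core is worth 60, threads fill the P cores first, and HT, clocks, and power limits are ignored. The function name `throughput` and its parameters are made up for illustration.

```python
def throughput(threads, p_cores, e_cores, p_perf=100, e_perf=60):
    """Toy throughput: fill P cores first, then E cores; no HT, no clock effects."""
    on_p = min(threads, p_cores)
    on_e = min(max(threads - p_cores, 0), e_cores)
    return on_p * p_perf + on_e * e_perf

# 16P (7950X-like) vs 8P + 16E (13900K-like) at 16 threads
all_p_16 = throughput(16, p_cores=16, e_cores=0)
hybrid_16 = throughput(16, p_cores=8, e_cores=16)
print(all_p_16, hybrid_16, all_p_16 / hybrid_16)   # 1600 1280 1.25

# 24P vs 8P + 32E at 24 threads, with an E thread worth 60 and worth 50
all_p_24 = throughput(24, p_cores=24, e_cores=0)
hybrid_24 = throughput(24, p_cores=8, e_cores=32)
hybrid_24_half = throughput(24, p_cores=8, e_cores=32, e_perf=50)
print(all_p_24, hybrid_24, hybrid_24_half)          # 2400 1760 1600
```

With 60 per E thread the model puts the 8 + 32 at 1760 for 24 threads (roughly a 36% gap); the 1600 and 50% figures correspond to an E thread being worth 50 rather than 60. At 40 threads the same model gives 2720 for the 8 + 32 against 2400 for the 24P, so the outcome depends entirely on how many threads the software actually uses.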

dark zero · Mar 23, 2023

Markfw said:
First, I did not see 8+32, but that I won't comment on, except by then, AMD may have something else.

Let's just say for now that a node shrink will help them a LOT. That I agree with.

I can see a 16+8 or even 16+16 chip as more realistic, but it might be really big to start.

Timmah! · Mar 23, 2023

Exist50 said:
The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.

So what you are saying is that the moment you need more than 8 cores, you are in Cinebench territory, and for such use cases more little cores are always better than additional big cores?
In other words, there are no cases where 12, 16, 24, etc... big cores would be a better-performing solution than 8P + 16/32/etc... E cores?

Kocicak · Mar 23, 2023

I can guarantee that if AMD had two kinds of chiplets, one with large and the other with small cores, they would be selling these 8 + 24-32 core monsters like hot cakes.

And the 8 + 8 and 24-32 + 24-32 combinations would find their customers too.

IntelUser2000 · Mar 23, 2023

@Geddagod is saying that Redwood Cove core in Meteorlake is only 30% larger than Zen 4. That's pretty much ISO-process.

And the Crestmont core in Meteorlake is even smaller relative to the bigger neighbour Redwood Cove. Meaning it's 1/3rd the size of Zen 4. It's also likely we get greater performance gains Crestmont vs Raptormont versus Redwood Cove versus RaptorCove.

Raptorlake's E cores are relatively better than Alderlake's E cores. And Meteorlake does even better. Each generation it gets a little bit better in density AND in performance.

In Alderlake, Golden Cove performed 45-50% faster per clock and density ratio was something like 3.43:1. In Raptorlake, Gracemont cores close that gap by 2-4%. In Meteorlake, the ratio is 3.6:1.

Let's say by Arrowlake generation the gap between Lion Cove and Skymont is only 30% per clock. So it becomes 8x Lion Cove which is 30% faster than Raptor Cove plus 16x Redwood Cove-like cores.

Then it becomes, really, really compelling. I bet it's going to be good in not just performance/mm2 but performance/watt.
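To put rough numbers on the area argument: if one P core takes the area of `density_ratio` E cores and is faster per clock by `p_per_clock_advantage`, then the E cores that fit in that area deliver `density_ratio / (1 + p_per_clock_advantage)` times the per-clock throughput. This is a back-of-the-envelope sketch using only the Alder Lake figures quoted above; it ignores clock speed, HT, cache, and fabric area, and the function name is hypothetical.

```python
def e_throughput_per_p_area(density_ratio, p_per_clock_advantage):
    """Per-clock throughput of the E cores that fit in one P core's area,
    relative to that single P core (1.0 would be break-even)."""
    return density_ratio / (1.0 + p_per_clock_advantage)

# Alder Lake figures from the post: 3.43:1 density, P core 45-50% faster per clock
low = e_throughput_per_p_area(3.43, 0.50)
high = e_throughput_per_p_area(3.43, 0.45)
print(f"{low:.2f}x to {high:.2f}x")  # 2.29x to 2.37x
```

Other density and per-clock combinations can be plugged in the same way; a smaller per-clock gap or a higher density ratio both push the result up.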

A/// · Mar 23, 2023

I'm more interested in rpl cache refresh.

Geddagod · Mar 23, 2023

IntelUser2000 said:
@Geddagod is saying that Redwood Cove core in Meteorlake is only 30% larger than Zen 4. That's pretty much ISO-process.

And the Crestmont core in Meteorlake is even smaller relative to the bigger neighbour Redwood Cove. Meaning it's 1/3rd the size of Zen 4. It's also likely we get greater performance gains Crestmont vs Raptormont versus Redwood Cove versus RaptorCove.

Raptorlake's E cores are relatively better than Alderlake's E cores. And Meteorlake does even better. Each generation it gets a little bit better in density AND in performance.

In Alderlake, Golden Cove performed 45-50% faster per clock and density ratio was something like 3.43:1. In Raptorlake, Gracemont cores close that gap by 2-4%. In Meteorlake, the ratio is 3.6:1.

Let's say by Arrowlake generation the gap between Lion Cove and Skymont is only 30% per clock. So it becomes 8x Lion Cove which is 30% faster than Raptor Cove plus 16x Redwood Cove-like cores.

Then it becomes, really, really compelling. I bet it's going to be good in not just performance/mm2 but performance/watt.

Hold up, hold up. I'm saying ~30% for the core only, excluding L2 and L3. The gap is probably different (as in larger) once those are added as well.
RWC's 512KB L2 cache array is denser than Zen 4's, however RWC has 2X as much.
I don't have the L3 cache size on hand for both of them, but IIRC RWC's L3 block is both denser and smaller (3MB vs 4MB) compared to Zen 4 so they might save some space there.

ondma · Mar 23, 2023

Timmah! said:
So what you are saying is that the moment you need more than 8 cores, you are in Cinebench territory, and for such use cases more little cores are always better than additional big cores?
In other words, there are no cases where 12, 16, 24, etc... big cores would be a better-performing solution than 8P + 16/32/etc... E cores?

Well, obviously, there are few situations where there are not at least some exceptions. A company simply has to make the best solution possible for the majority of use cases. That said, I wonder if the hybrid approach will be able to keep up with future gaming needs when (if) games require more than 8 big cores. I also would like to see some tests of streaming while gaming to see how the hybrid approach would handle that.

Hulk · Mar 23, 2023

Timmah! said:
So what you are saying is that the moment you need more than 8 cores, you are in Cinebench territory, and for such use cases more little cores are always better than additional big cores?
In other words, there are no cases where 12, 16, 24, etc... big cores would be a better-performing solution than 8P + 16/32/etc... E cores?

Yes.

If you are talking about well multi-threaded applications like Cinebench that will utilize all available threads, then for a given die area lots of small cores will perform better than a smaller number of large cores.

First of all note "for a given die area."
Also I'm specifically talking about Intel Golden/Raptor Cove "big" cores and Gracemont "E cores."

The question is nuanced.

Let's say you can have 13 P's OR 40 E's. That's about what would fit given 13900 die size.

For Cinebench R23 MT on a 13900K, assuming P's are running 5.5GHz and E's are running 4.3GHz, at around 30 threads the all P part and all E part will perform about the same. But as the thread count increases to 40, the all E part will pull ahead, as those cores are more area efficient than the big cores and we are area constrained in this example.
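The area-constrained example can be turned into a quick calculation. The sketch below takes the post's numbers at face value: a 13 P-core part and a 40 E-core part fit in the same area, and the two tie at around 30 busy threads. It also assumes the all E part scales linearly with thread count, which is a simplification. All names are illustrative.

```python
E_CORES = 40
P_CORES = 13
BREAK_EVEN_THREADS = 30   # thread count where both parts tie, per the post

# Measure everything in "E-core units": one busy E core = 1.0.
# At the tie point the fully loaded all P part equals 30 busy E cores.
all_p_peak = float(BREAK_EVEN_THREADS)

def all_e_throughput(threads):
    """Linear scaling until every E core is busy (an assumption)."""
    return float(min(threads, E_CORES))

implied_p_core = all_p_peak / P_CORES              # one fully loaded P core, in E-core units
advantage_at_40 = all_e_throughput(40) / all_p_peak

print(f"one P core ~ {implied_p_core:.2f} E cores")
print(f"all E lead at 40 threads ~ {(advantage_at_40 - 1) * 100:.0f}%")
```

Under these assumptions a fully loaded P core is worth a bit over two E cores at the stated clocks, and the all E part ends up about a third ahead once all 40 cores are busy.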

ondma · Mar 23, 2023

Hulk said:
Yes.

If you are talking about well multi-threaded applications like Cinebench that will utilize all available threads, then for a given die area lots of small cores will perform better than a smaller number of large cores.

First of all note "for a given die area."
Also I'm specifically talking about Intel Golden/Raptor Cove "big" cores and Gracemont "E cores."

The question is nuanced.

Let's say you can have 13 P's OR 40 E's. That's about what would fit given 13900 die size.

If you have an application that uses up to around 26 threads, the all big core part is going to be more performant. And as the number of threads decreases from 26 to 1, the all big core part will pull further and further ahead of the all small core part. This is due to the fact that less HT is being used.

Now after 26 threads the big core part is out of threads but the all small core part can continue scaling all the way to 40 threads.

For Cinebench R23 MT on a 13900K, assuming P's are running 5.5GHz and E's are running 4.3GHz, at around 30 threads the all P part and all E part will perform about the same. But as the thread count increases to 40, the all E part will pull ahead, as those cores are more area efficient than the big cores and we are area constrained in this example.

How does an "e" core compare to a HT "core" in the big cores?

igor_kavinski · Mar 23, 2023

ondma said:
How does an "e" core compare to a HT "core" in the big cores?

HT is the weakest core in the hybrid core hierarchy.

IntelUser2000 · Mar 23, 2023

Geddagod said:
Hold up, hold up. I'm saying ~30% for the core only, excluding L2 and L3. The gap is probably different (as in larger) once those are added as well.

SRAM is a different thing so L3 caches are not counted, and for better accuracy L2 is quite large so even that's omitted.

DrMrLordX · Mar 23, 2023

Exist50 said:
The argument for hybrid is process agnostic. For your hypothetical 12-16 GLC arrangement, they could have had 8+16/32, which with a full node shrink would absolutely destroy anything else on the market.

Amdahl's Law. There's a reason why AMD hasn't pushed past 16c for high-end consumer. The 3950x was arguably absurd for its time, but it has grown into its niche nicely (along with the other 16c parts).
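For reference, Amdahl's Law puts the speedup on n cores at 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. A small sketch shows why going past 16 cores pays off less and less unless p is very close to 1. The values of p here are arbitrary illustrations, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup over one core for a workload with the given parallel fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for p in (0.90, 0.95, 0.99):
    s16 = amdahl_speedup(p, 16)
    s32 = amdahl_speedup(p, 32)
    print(f"p={p:.2f}: 16 cores -> {s16:.1f}x, 32 cores -> {s32:.1f}x")
```

At p = 0.90 doubling from 16 to 32 cores moves the ideal speedup from about 6.4x to about 7.8x, while at p = 0.99 it moves from about 13.9x to about 24.4x, so the value of extra cores hinges on how parallel the workload really is.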

Plus from a marketing standpoint, what Intel has always needed to do since Zen 2 was to release a product with a better core, higher clocks, the same number of cores, and (overall) better perf/watt and then ram it straight down AMD's throat. The last time they (kind of) did that was the 9900K, which whipped the 2700X in everything except maybe perf/watt (though there's room for debate there).
