Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Tigerick · Aug 22, 2022

Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die consisting of CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Expected to launch in Q1 2026.

	Intel Raptor Lake U	Intel Wildcat Lake 15W?	Intel Lunar Lake	Intel Panther Lake 4+4+4
Launch Date	Q1-2024	Q2-2026	Q3-2024	Q1-2026
Model	Intel 150U	Intel Core 7	Core Ultra 7 268V	Core Ultra 7 365
Dies	2	2	2	3
Node	Intel 7 + ?	Intel 18-A + TSMC N6	TSMC N3B + N6	Intel 18-A + Intel 3 + TSMC N6

CPU	2 P-core + 8 E-cores	2 P-core + 4 LP E-cores	4 P-core + 4 LP E-cores	4 P-core + 4 LP E-cores
Threads	12	6	8	8
Max Clock	5.4 GHz	?	5 GHz	4.8 GHz
L3 Cache	12 MB		12 MB	12 MB
TDP	15 - 55 W	15 W ?	17 - 37 W	25 - 55 W

Memory	128-bit LPDDR5-5200	64-bit LPDDR5	128-bit LPDDR5x-8533	128-bit LPDDR5x-7467
Size	96 GB		32 GB	128 GB
Bandwidth			136 GB/s

GPU	Intel Graphics	Intel Graphics	Arc 140V	Intel Graphics
RT	No	No	YES	YES
EU / Xe	96 EU	2 Xe	8 Xe	4 Xe
Max Clock	1.3 GHz	?	2 GHz	2.5 GHz

NPU	GNA 3.0	18 TOPS	48 TOPS	49 TOPS

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

IntelUser2000 · Nov 6, 2022

Geddagod said:
Ye but core architecture also plays a part in clock frequency. Different architectures have different frequency curves and max frequencies. So would a longer pipelined architecture allow for higher max frequency or just better frequency at iso power?

The reason for my post is that higher clocks always result in higher power consumption, because you are raising the clock of the entire CPU core. You need some radical differences (like Pentium 4 vs Pentium M) before one is more "efficient" per MHz. Actually, Pentium M vs Pentium 4 is solid evidence that hyper-pipelined CPUs use way more power per MHz.

Voltage scaling is pretty much dead. At the load clocks you aren't reducing voltage to any significant degree, if at all. So the whole thing about using deeper pipelines to clock higher so you can save power is thrown out the window.

Besides, extreme pipelined CPUs basically did not meet a single goal of the designers. Higher clocks? Barely. Efficient? Think opposite. Streamlined? Nope, it's more complex.

Realistically, when you increase pipeline stages a lot, all you get is lower performance per clock while noticeably increasing transistor count, die size, and power use. Look at Power 6, in-order Atoms (pre-22nm), Bulldozer, and Netburst uarch CPUs. The successors performed better, used less power, and were simpler!

Thunder 57 · Nov 6, 2022

A/// said:
I know what a moonshot is. I was asking if it was a new core intel was designing. Zen possibly, core not really.

That wasn't obvious from your post.

IntelUser2000 · Nov 6, 2022

A/// said:
30% IPC gain over meteorlake or raptorlake?

Meteorlake.

@mikk We're getting 2 straight years of small gains and we have evidence of BIG changes with the Lion Cove core. It would actually be a disappointment, but I guess it won't be a huge surprise since Intel usually disappoints.

It'll also be weird to see why they would add TWO extra decoders if they could just get away with 1.

Exist50 · Nov 6, 2022

DrMrLordX said:
This reason is another one why Intel in particular isn't keen on fixed-function hardware blocks for video encode/decode. They already leverage the iGPU.

The media block is an independent IP on most SoCs I'm aware of. You can use the GPU for hybrid decode, but I think that's relatively rare.

Exist50 · Nov 6, 2022

A/// said:
and this raichu person has never been wrong? Why should I place more faith in this person over that person from august?

¯\_(ツ)_/¯ then don't. For myself, I'm stating "Lion Cove is not Royal" in the same way I'd say "The sky is blue". It'll all bear out in due time.

A/// said:
I know what a moonshot is. I was asking if it was a new core intel was designing. Zen possibly, core not really.

Then yes, Royal is a new core.

BorisTheBlade82 · Nov 6, 2022

Geddagod said:
I'm guessing the problem with server is that Intel tiles have large sections of the tile stuffed with EMIB connectors, but also stuff like IO, which for AMD is moved off to its own chiplet.

Yes, Intel uses about 20% IIRC for the EMIB interconnect alone. This is a massive bandwidth considering how dense in Gbps/mm2 the connection is. They need this in order to make all the resources of one tile available to another tile. (https://www.anandtech.com/show/1692...nextgen-xeon-scalable-gets-a-tiling-upgrade/2)
Interestingly, AMD with its topology does not need these bandwidths and still there does not seem to be a big impact on performance.
When I first heard of SPR my initial impression was that Intel might turn around DC with it. And in 2020 that might have worked. But Milan and Genoa show the superiority of AMD's approach and made Intel rethink theirs as well.

mikk · Nov 6, 2022

IntelUser2000 said:
Meteorlake.

@mikk We're getting 2 straight years of small gains and we have evidence of BIG changes with the Lion Cove core. It would actually be a disappointment, but I guess it won't be a huge surprise since Intel usually disappoints.

It'll also be weird to see why they would add TWO extra decoders if they could just get away with 1.

I would like to add that the chief architect of Intel's performance core recently said we will see bigger and bigger jumps after Raptor Lake and Meteor Lake. Coupled with the stronger competition and a fixed count of 8 big cores for now (thanks to big little), I can believe we might see bigger improvements compared to the past. Intel was stuck on 14nm and 10nm for many years. We will see Intel 4/TSMC 3nm/20A/18A in a relatively short timeframe, which allows investing in more transistors and bigger architectures in the next few years.

Exist50 · Nov 6, 2022

BorisTheBlade82 said:
Yes, Intel uses about 20% IIRC for the EMIB interconnect alone. This is a massive bandwidth considering how dense in Gbps/mm2 the connection is. They need this in order to make all the resources of one tile available to another tile. (https://www.anandtech.com/show/1692...nextgen-xeon-scalable-gets-a-tiling-upgrade/2)
Interestingly, AMD with its topology does not need these bandwidths and still there does not seem to be a big impact on performance.
When I first heard of SPR my initial impression was that Intel might turn around DC with it. And in 2020 that might have worked. But Milan and Genoa show the superiority of AMD's approach and made Intel rethink theirs as well.

I think the topology differences are more about cost than anything else. AMD pays some power/area overhead with the SERDES links and extra L3, but gains better yields, a relatively cheap IO die, and avoids some of the cost of advanced packaging. But if SPR had launched around when Milan did, performance would likely not have been a major issue for Intel. It's the delays, independent of chiplet strategy, that have sunk their performance competitiveness.

Though that said, it's difficult to assess the pros and cons of each from the fairly limited testing most outlets perform. The greatest weakness of AMD's chiplet strategy would be things like bin-packing VMs with only 8c granularities per CCX. You're not going to see that kind of stuff from Cinebench, Geekbench, or SPEC. But clearly those are fairly minor issues in the big picture.

I think GNR vs Turin will make for some very interesting comparisons. Should be roughly iso-process, and I expect AMD to have a core uarch advantage if the RWC+ rumor is true, but topologically, seems like Intel's still using large tiles.

Hulk · Nov 6, 2022

I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake? I've read that there will probably be a bit of a clock speed regression in moving to Intel 4 so some ground may be lost there.

It would require a 33% IPC increase in the P cores for 6 to equal 8. As we all know, that's enormous.
And even if Hyperthreading were enabled on the E's, which provided about a 26% MT uplift for Skylake, that would still mean 12.7 E's would be required for parity.
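
The two figures above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes perfect multithreaded scaling and equal clocks, and the function names are made up for illustration.

```python
def required_ipc_uplift(old_cores: int, new_cores: int) -> float:
    """Per-core throughput gain needed for new_cores to match old_cores.

    Assumes perfect scaling and identical clocks (a simplification).
    """
    return old_cores / new_cores - 1.0


def cores_needed_with_smt(baseline_cores: int, smt_uplift: float) -> float:
    """How many SMT-enabled cores match baseline_cores without SMT."""
    return baseline_cores / (1.0 + smt_uplift)


# 6 P-cores matching 8 P-cores
p_uplift = required_ipc_uplift(8, 6)
# E-cores with a Skylake-like 26% HT uplift matching 16 plain E-cores
e_needed = cores_needed_with_smt(16, 0.26)

print(f"P-core IPC uplift needed: {p_uplift:.1%}")  # 33.3%
print(f"E-cores with HT needed:   {e_needed:.1f}")  # 12.7
```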

Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no? The only thing I'm thinking is that logical cores are already weaker than physical ones and in a hypothetical 8+16 arrangement you'd have 8 "weaker" logical P cores and 16 "weaker yet" logical E cores. So, in order to make good use of all of those threads you'd need a really well optimized MT application, and many of them don't exist outside of benchmarks so that is why Intel has not gone down this path?

Intel has set itself a pretty high performance bar with the 13900K. Or more correctly AMD forced their hand in setting this bar. Now they have to figure a way to jump it on their next pass of the track.

This feels similar to the situation with 10900K to 11900K if 6P core rumors are true for ML-S.

nicalandia · Nov 6, 2022

Hulk said:
I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake?

They can't. The e cores are not getting a 100% IPC boost.

But nothing prevents them from making 6P + 16e

https://twitter.com/x/status/1535725309265424389

Exist50 · Nov 6, 2022

Hulk said:
I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake?

The latest rumor is 6+16, fwiw.

Hulk said:
It would require a 33% IPC increase in the P cores for 6 to equal 8. As we all know, that's enormous.

Now, I'm not going to claim to know how MTL will compare to RPL in everything, but they wouldn't need such an IPC increase. Even with no IPC gains, you can use the performance gains from the new node for better clocks at iso-power. If Intel 4 were worse than Intel 7 across the VF curve, it would be DOA.

Hulk said:
Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no?

I think the reality tends to be a bit more complicated.

igor_kavinski · Nov 6, 2022

Hulk said:
Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no?

I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster and 80% overall for all the clusters combined for 16 cores. Seems wasteful if it's there but disabled.

Further, the extra threads would increase pressure on their shared cache. They would also need more bandwidth from RAM coz the extra threads need to be fed with data. All of this activity will produce extra heat in the already crammed area taken up by the closely packed E-cores. It's possible that Intel has tried this already and the cons outweighed the pros. Maybe in future when they are able to refine E-cores further.

Exist50 · Nov 6, 2022

igor_kavinski said:
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster and 80% overall for all the clusters combined for 16 cores. Seems wasteful if it's there but disabled.

Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.

Doug S · Nov 6, 2022

Exist50 said:
It's a question of overhead. With zero overhead, more pipestages reduced your critical path, giving you proportionally more speed OR you can lower the voltage for the same speed (saving power), or any combination of the two. But as others have pointed out, the flops between each stage add power, timing, and performance overhead, so there's a balance. IIRC, roughly 16 FO4 delay has been something of a floor, but I don't recall where/when I heard that, so take it with a grain of salt.

To expand on this a little, there is some engineering margin or "slop factor" in every pipe stage, because the work in a stage MUST complete during the clock cycle. Some stages may have tighter timing margins and others looser, depending on how much work there is in a particular stage for a particular function.

So e.g. splitting up a 15 stage pipeline into 30 stages won't let you double your frequency, because that engineering margin "slop factor" is paid 30x instead of 15x.
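
A toy model makes the point concrete. The sketch below assumes a fixed per-stage overhead (flop delay plus timing margin) added to an evenly divided logic delay; the 3.0 ns and 0.05 ns figures are invented purely for illustration and do not describe any real CPU.

```python
def clock_ghz(total_logic_ns: float, stages: int, overhead_ns: float) -> float:
    """Max clock when logic is split evenly and every stage pays a fixed overhead."""
    cycle_ns = total_logic_ns / stages + overhead_ns
    return 1.0 / cycle_ns


LOGIC_NS = 3.0      # hypothetical total logic delay through the pipeline
OVERHEAD_NS = 0.05  # hypothetical per-stage flop + margin overhead

f15 = clock_ghz(LOGIC_NS, 15, OVERHEAD_NS)
f30 = clock_ghz(LOGIC_NS, 30, OVERHEAD_NS)
speedup = f30 / f15

print(f"15 stages: {f15:.2f} GHz")
print(f"30 stages: {f30:.2f} GHz")
print(f"speedup:   {speedup:.2f}x (2.00x only if overhead were zero)")
```

With these made-up numbers, doubling the stage count gives well under double the frequency, and the gap widens as the per-stage overhead grows relative to the logic in each stage.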

If asynchronous CPUs ever became a thing then this wouldn't be a problem, because you wouldn't have that wasted time in each cycle, and without a clock network you'd save that power too (though that's probably largely paid back, or even more than paid back, by the latching network that would replace it).

nicalandia · Nov 6, 2022

igor_kavinski said:
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster

Exist50 said:
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.

To be honest a 5% per core would translate to 5% per cluster.

But the issue is the e core's design. They are simply not designed for either HT or AVX-512; as you have seen from the Meteor Lake and Arrow Lake diagrams, they follow the same design philosophy.

igor_kavinski · Nov 6, 2022

Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?

Exist50 · Nov 6, 2022

nicalandia said:
To be honest a 5% per core would translate to 5% per cluster.

I wanted to avoid any dependency on module overhead. And also show where that "20%" could mistakenly come from.

igor_kavinski said:
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?

Say you have a 100mm2 core, just to make the math prettier. 5% HT overhead on top would be 0.05 * 100mm2 = 5mm2. If you have four cores, you have 4 x 5mm2 = 20mm2 for HT, but that's on top of 4 x 100mm2 = 400mm2 baseline. 20mm2/400mm2 = 5mm2/100mm2 = 5%.

Or perhaps more intuitively, if you increase the area of part of the die by 5%, you'll always get ≤5% for the die as a whole.
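
The same arithmetic as a short script, using the 100mm2 core and 5% overhead from the example above (the variable names are just for illustration). It also shows where a mistaken "20%" can come from: dividing the total added area by a single core instead of by the whole cluster.

```python
CORE_MM2 = 100.0    # example core size from the post, chosen for easy math
HT_OVERHEAD = 0.05  # 5% HT area overhead per core, taken at face value
CORES = 4

added_mm2 = CORES * CORE_MM2 * HT_OVERHEAD  # total HT area across the cluster
baseline_mm2 = CORES * CORE_MM2             # four cores with no HT

vs_single_core = added_mm2 / CORE_MM2   # wrong denominator: one core
vs_cluster = added_mm2 / baseline_mm2   # right denominator: all four cores

print(f"added area:          {added_mm2:.0f} mm2")
print(f"relative to 1 core:  {vs_single_core:.0%}")
print(f"relative to cluster: {vs_cluster:.0%}")
```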

nicalandia · Nov 6, 2022

igor_kavinski said:
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?

Let's use MTL Crestmont e core as an example.

1 e core size is 1.046 mm^2, let's round that to 1 mm^2
1 Quad Cluster size is 5.907 mm^2, but for illustration purposes we will say 4 mm^2 to keep numbers even

The e core die area is about 1 mm^2, so a 5% increase makes it 1.05 mm^2, right? So 1.05 x 4 = 4.2 mm^2, and 4.2/4 is 1.05, which is a 5% increase...

So at worst a 5% increase in die area per core would translate to 5% die area per cluster.
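
As a rough check of the "at worst" part, the sketch below runs the numbers twice: once with the rounded figures, and once with the unrounded 1.046 mm^2 core and 5.907 mm^2 cluster quoted above. It assumes the 5% applies only to the core area and that the rest of the cluster is unchanged; the function name is hypothetical.

```python
def cluster_growth(core_mm2: float, cores: int,
                   per_core_overhead: float, cluster_mm2: float) -> float:
    """Fractional cluster area increase when only the cores grow."""
    added = cores * core_mm2 * per_core_overhead
    return added / cluster_mm2


# Rounded figures: 1 mm^2 cores, 4 mm^2 cluster (cluster is nothing but cores)
rounded = cluster_growth(1.0, 4, 0.05, 4.0)

# Unrounded figures from the post: the cluster is larger than four cores
measured = cluster_growth(1.046, 4, 0.05, 5.907)

print(f"rounded:  {rounded:.1%}")
print(f"measured: {measured:.1%}")
```

Because the cluster contains area beyond the four cores, the cluster-level increase comes out below 5% with the unrounded sizes.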

igor_kavinski · Nov 6, 2022

Exist50 said:
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.

Ah! I understand it now. Thanks!

5% of single core IS NOT 5% of cluster. DOH!

nicalandia · Nov 6, 2022

igor_kavinski said:
Ah! I understand it now. Thanks!

5% of single core IS NOT 5% of cluster. DOH!

Well... 5% die area is still 5% die area regardless of how many clusters they put in. As far as I am aware, MTL-S will have 4 quad clusters at the very top of the SKU stack (14900K). Still, 5% is really nothing compared to the theoretical maximum MT performance boost, which is 15%-30%...

Except those e cores were not designed with HT and AVX-512 in mind.. Intel pulled a 6% IPC boost from Gracemont to Raptormont just by doubling the L2(I don't have the exact die area size so it could be more than 5%) I suspect that the MTL-S will have 6MiB per cluster, faster internal ring bus and we could see double digit IPC boosts on those e cores.

Hulk · Nov 6, 2022

Exist50 said:
The latest rumor is 6+16, fwiw.

6+16 along with the rumors that while the P's are lightly upgraded the E's are significantly enhanced makes sense.
First, Intel is probably noticing that many apps that only really rely on 8 cores, can do as well with 6 and if those 6 cores have better IPC than Raptor then it's all the better.
Second, as we move into the future, applications are getting better at MT, so moving some more compute to the E's makes sense. 16 greatly enhanced E's would be very beneficial to having ML surpass RPL. Currently 1 P is worth about 2 E's based on IPC alone, more when you figure in the increased clocks on the P's. If they could reduce that disparity by 15 or 20%.. well, there you go. At this point in the hybrid Big.Little development I would think that there is lower hanging fruit on the E trees compared to the P trees.
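
To put rough numbers on that, here is a deliberately crude model that counts everything in "E-core equivalents" using the 1 P = 2 E rule of thumb above. It ignores clocks, HT, memory and scheduling entirely, and reading "reduce that disparity by 15 or 20%" as a 15-20% E-core gain is an assumption; the function and parameter names are invented for this sketch.

```python
def mt_score(p_cores: int, e_cores: int, p_per_e: float = 2.0,
             p_gain: float = 0.0, e_gain: float = 0.0) -> float:
    """Multithreaded throughput in E-core equivalents (very rough model)."""
    return p_cores * p_per_e * (1.0 + p_gain) + e_cores * (1.0 + e_gain)


rpl_8_16 = mt_score(8, 16)                   # 8+16 baseline
mtl_6_16_flat = mt_score(6, 16)              # 6+16, no per-core gains
mtl_6_16_e15 = mt_score(6, 16, e_gain=0.15)  # 6+16, E's 15% faster
mtl_6_16_e20 = mt_score(6, 16, e_gain=0.20)  # 6+16, E's 20% faster

for name, score in [("8+16 baseline", rpl_8_16),
                    ("6+16 flat", mtl_6_16_flat),
                    ("6+16, E +15%", mtl_6_16_e15),
                    ("6+16, E +20%", mtl_6_16_e20)]:
    print(f"{name:14s} {score:5.1f} E-equivalents")
```

Passing a nonzero `p_gain` as well shows how a light P-core upgrade and a bigger E-core upgrade combine under the same assumptions.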

poke01 · Nov 6, 2022

Arrow Lake will be Intel's M1; that is, it will be a wide arch like Firestorm. Guess what, Firestorm clocks at 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks but still very powerful. I never liked high-clock archs or, as some people say, speed demons. These sorts of archs, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers that will have an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.

Exist50 · Nov 6, 2022

poke01 said:
Arrow Lake will be Intel's M1; that is, it will be a wide arch like Firestorm. Guess what, Firestorm clocks at 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks but still very powerful. I never liked high-clock archs or, as some people say, speed demons. These sorts of archs, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers that will have an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.

Well slow down there a second. I really don't know what IPC gains we'll see with Lion Cove. Certainly bigger than RWC, but beyond that, not sure. I do expect it to look a whole lot better from a PPA standpoint, but how all that shakes out is very tbd. I would be surprised if it's <6GHz on 20A, however.

nicalandia · Nov 6, 2022

Meteor Lake on Top of Raptor Lake

Alder Lake On Top of Raptor Lake

Meteor Lake and Alder Lake appear to be very similar in design layout (at the same size, for comparison).

Geddagod · Nov 6, 2022

poke01 said:
Arrow Lake will be Intel's M1; that is, it will be a wide arch like Firestorm. Guess what, Firestorm clocks at 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks but still very powerful. I never liked high-clock archs or, as some people say, speed demons. These sorts of archs, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers that will have an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.

I think Arrow Lake will still have high clocks. When Intel went wider with Golden Cove, they still managed to keep 5 GHz, same as Willow Cove. They also used a better process, and Arrow Lake will also use a better process compared to Meteor Lake.
Apple and Intel being 8-wide is nice, but you gotta wonder about Zen 5. Decode width isn't everything, but I think Zen 5 is going to end up being less wide than Lion Cove. I really don't think AMD is going to double their width in one generation with Zen 5.
