Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
846
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Launch is expected in Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Memory Size | 16 GB | ? | 32 GB | 24 GB ? |
| Memory Bandwidth | | ~ 55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
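
The bandwidth figures in the table follow from bus width and transfer rate. Here is a minimal Python sketch, assuming theoretical peak bandwidth = (bus width in bits / 8) x transfer rate in MT/s. The function name and the dictionary are only for illustration, and sustained bandwidth on real hardware will be lower.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mts / 1000


# Memory configurations as listed in the table: (bus width in bits, MT/s).
CONFIGS = {
    "ADL-N, 64-bit LPDDR5-4800": (64, 4800),
    "WCL, 64-bit LPDDR5-6800 (unconfirmed)": (64, 6800),
    "LNL, 128-bit LPDDR5X-8533": (128, 8533),
    "D9500, 64-bit LPDDR5X-10667": (64, 10667),
}

for name, (width, rate) in CONFIGS.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

This gives roughly 38.4, 54.4, 136.5 and 85.3 GB/s. The D9500 result lands slightly under the 85.6 GB/s listed, which presumably comes from rounding the transfer rate up.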






PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



LNL-MX.png
 


IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Ye but core architecture also plays a part in clock frequency. Different architectures have different frequency curves and max frequencies. So would a longer pipelined architecture allow for higher max frequency or just better frequency at iso power?

The reason for my post is that higher clocks always result in higher power consumption, because you are raising the clock of the entire CPU core. You need some radical differences (like Pentium 4 vs Pentium M) before one is more "efficient" per MHz. Actually, Pentium M vs Pentium 4 is solid evidence that hyper-pipelined CPUs use way more power per MHz.

Voltage scaling is pretty much dead. At the load clocks you aren't reducing voltage to any significant degree, if at all. So the whole thing about using deeper pipelines to clock higher so you can save power is thrown out the window.

Besides, extreme pipelined CPUs basically did not meet a single goal of the designers. Higher clocks? Barely. Efficient? Think opposite. Streamlined? Nope, it's more complex.

Realistically, when you increase pipeline stages a lot, all you get is lower performance per clock while noticeably increasing transistor count, die size, and power use. Look at Power 6, in-order Atoms (pre-22nm), Bulldozer, and Netburst uarch CPUs. The successors performed better, used less power and were simpler!
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
30% IPC gain over meteorlake or raptorlake?

Meteorlake.

@mikk We're getting 2 straight years of small gains and we have evidence of BIG changes with the Lion Cove core. It would actually be a disappointment, but I guess it won't be a huge surprise since Intel usually disappoints.

It'll also be weird to see why they would add TWO extra decoders if they could just get away with 1.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
This reason is another one why Intel in particular isn't keen on fixed-function hardware blocks for video encode/decode. They already leverage the iGPU.
The media block is an independent IP on most SoCs I'm aware of. You can use the GPU for hybrid decode, but I think that's relatively rare.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
and this raichu person has never been wrong? Why should I place more faith in this person over that person from august?
¯\_(ツ)_/¯ then don't. For myself, I'm stating "Lion Cove is not Royal" in the same way I'd say "The sky is blue". It'll all bear out in due time.
I know what a moonshot is. I was asking if it was a new core intel was designing. Zen possibly, core not really.
Then yes, Royal is a new core.
 

BorisTheBlade82

Senior member
May 1, 2020
707
1,130
136
I'm guessing the problem with server is that Intel tiles have large sections of the tile stuffed with EMIB connectors, but also stuff like IO, which for AMD is moved off to its own chiplet.
Yes, Intel uses about 20% IIRC for the EMIB interconnect alone. This is a massive bandwidth considering how dense in Gbps/mm2 the connection is. They need this in order to make all the resources of one tile available to another tile. (https://www.anandtech.com/show/1692...nextgen-xeon-scalable-gets-a-tiling-upgrade/2)
Interestingly, AMD with its topology does not need these bandwidths and still there does not seem to be a big impact on performance.
When I first heard of SPR my initial impression was that Intel might turn around DC with it. And in 2020 that might have worked. But Milan and Genoa show the superiority of AMD's approach and made Intel rethink theirs as well.
 

mikk

Diamond Member
May 15, 2012
4,296
2,382
136
Meteorlake.

@mikk We're getting 2 straight years of small gains and we have evidence of BIG changes with the Lion Cove core. It would actually be a disappointment, but I guess it won't be a huge surprise since Intel usually disappoints.

It'll also be weird to see why they would add TWO extra decoders if they could just get away with 1.


I would like to add that the chief architect of Intel's performance core recently said we will see bigger and bigger jumps after Raptor Lake and Meteor Lake. Coupled with the stronger competition and the fixed count of 8 big cores for now (thanks to big little), I can believe we might see bigger improvements compared to the past. Intel was stuck on 14nm and 10nm for many years. We will see Intel 4/TSMC 3nm/20A/18A in a relatively short timeframe, which allows investing in more transistors and bigger architectures in the next few years.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Yes, Intel uses about 20% IIRC for the EMIB interconnect alone. This is a massive bandwidth considering how dense in Gbps/mm2 the connection is. They need this in order to make all the resources of one tile available to another tile. (https://www.anandtech.com/show/1692...nextgen-xeon-scalable-gets-a-tiling-upgrade/2)
Interestingly, AMD with its topology does not need these bandwidths and still there does not seem to be a big impact on performance.
When I first heard of SPR my initial impression was that Intel might turn around DC with it. And in 2020 that might have worked. But Milan and Genoa show the superiority of AMD's approach and made Intel rethink theirs as well.
I think the topology differences are more about cost than anything else. AMD pays some power/area overhead with the SERDES links and extra L3, but gains better yields, a relatively cheap IO die, and avoids some of the cost of advanced packaging. But if SPR had launched around when Milan did, performance would likely not have been a major issue for Intel. It's the delays, independent of chiplet strategy, that have sunk their performance competitiveness.

Though that said, it's difficult to assess the pros and cons of each from the fairly limited testing most outlets perform. The greatest weakness of AMD's chiplet strategy would be things like bin-packing VMs with only 8c granularities per CCX. You're not going to see that kind of stuff from Cinebench, Geekbench, or SPEC. But clearly those are fairly minor issues in the big picture.

I think GNR vs Turin will make for some very interesting comparisons. Should be roughly iso-process, and I expect AMD to have a core uarch advantage if the RWC+ rumor is true, but topologically, seems like Intel's still using large tiles.
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake? I've read that there will probably be a bit of a clock speed regression in moving to Intel 4 so some ground may be lost there.

It would require a 33% IPC increase in the P cores for 6 to equal 8. As we all know, that's enormous.
And even if Hyperthreading were enabled on the E's, which provided about 26% MT uplift for Skylake, that would still mean 12.7 E's would be required for parity.
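
To make the arithmetic above explicit, here is a small Python sketch. It assumes throughput scales linearly with core count and per-core IPC at equal clocks, and it takes the ~26% HT uplift from the post at face value. The variable names are only for illustration.

```python
# P-cores: how much more per-core throughput do 6 cores need to match 8?
RPL_P_CORES = 8
MTL_P_CORES = 6
ipc_gain_needed = RPL_P_CORES / MTL_P_CORES - 1

# E-cores: how many HT-enabled E-cores match 16 E-cores without HT,
# assuming HT adds 26% multithreaded throughput per core?
RPL_E_CORES = 16
HT_UPLIFT = 0.26
e_cores_needed = RPL_E_CORES / (1 + HT_UPLIFT)

print(f"IPC gain needed for 6P to equal 8P: {ipc_gain_needed:.1%}")
print(f"HT-enabled E-cores needed to equal 16 E-cores: {e_cores_needed:.1f}")
```

Under these assumptions the two figures come out at about 33% and 12.7, matching the numbers in the post.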

Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no? The only thing I'm thinking is that logical cores are already weaker than physical ones and in a hypothetical 8+16 arrangement you'd have 8 "weaker" logical P cores and 16 "weaker yet" logical E cores. So, in order to make good use of all of those threads you'd need a really well optimized MT application, and many of them don't exist outside of benchmarks so that is why Intel has not gone down this path?

Intel has set itself a pretty high performance bar with the 13900K. Or more correctly AMD forced their hand in setting this bar. Now they have to figure a way to jump it on their next pass of the track.

This feels similar to the situation with 10900K to 11900K if 6P core rumors are true for ML-S.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake?
The latest rumor is 6+16, fwiw.
It would require a 33% IPC increase in the P cores for 6 to equal 8. As we all know, that's enormous.
Now, I'm not going to claim to know how MTL will compare to RPL in everything, but they wouldn't need such an IPC increase. Even with no IPC gains, you can use the performance gains from the new node for better clocks at iso-power. If Intel 4 were worse than Intel 7 across the VF curve, it would be DOA.
Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no?
I think the reality tends to be a bit more complicated.
 
Jul 27, 2020
27,991
19,121
146
Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no?
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster and 80% overall for all the clusters combined for 16 cores. Seems wasteful if it's there but disabled.

Further, the extra threads would increase pressure on their shared cache. They would also need more bandwidth from RAM coz the extra threads need to be fed with data. All of this activity will produce extra heat in the already crammed area taken up by the closely packed E-cores. It's possible that Intel has tried this already and the cons outweighed the pros. Maybe in future when they are able to refine E-cores further.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster and 80% overall for all the clusters combined for 16 cores. Seems wasteful if it's there but disabled.
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.
 

Doug S

Diamond Member
Feb 8, 2020
3,574
6,310
136
It's a question of overhead. With zero overhead, more pipe stages reduce your critical path, giving you proportionally more speed OR you can lower the voltage for the same speed (saving power), or any combination of the two. But as others have pointed out, the flops between each stage add power, timing, and performance overhead, so there's a balance. IIRC, roughly 16 FO4 delay has been something of a floor, but I don't recall where/when I heard that, so take it with a grain of salt.


To expand on this a little, there is some engineering margin or "slop factor" in every pipe stage, because the work in a stage MUST complete during the clock cycle. Some stages may have tighter timing margins and others looser, depending on how much work there is in a particular stage for a particular function.

So e.g. splitting up a 15 stage pipeline into 30 stages won't let you double your frequency, because that engineering margin "slop factor" is paid 30x instead of 15x.
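
One simple way to picture this is a toy model where the cycle time is the logic delay per stage plus a fixed per-stage overhead (flop delay plus margin). The Python sketch below uses made-up numbers in FO4 units (300 FO4 of total logic, 3 FO4 of overhead per stage); these are assumptions for illustration, not figures for any real design.

```python
def cycle_time_fo4(total_logic_fo4: float, stages: int, overhead_fo4: float) -> float:
    """Cycle time when logic is split evenly and each stage pays a fixed overhead."""
    return total_logic_fo4 / stages + overhead_fo4


TOTAL_LOGIC_FO4 = 300.0  # hypothetical total logic depth
OVERHEAD_FO4 = 3.0       # hypothetical flop + margin cost per stage

t15 = cycle_time_fo4(TOTAL_LOGIC_FO4, 15, OVERHEAD_FO4)  # 20 + 3 FO4
t30 = cycle_time_fo4(TOTAL_LOGIC_FO4, 30, OVERHEAD_FO4)  # 10 + 3 FO4
speedup = t15 / t30

print(f"15 stages: {t15:.0f} FO4 per cycle")
print(f"30 stages: {t30:.0f} FO4 per cycle")
print(f"Frequency gain from doubling stages: {speedup:.2f}x")
```

With these numbers, doubling the stage count buys about 1.77x the frequency rather than 2x, and the gap widens as the per-stage overhead becomes a larger share of the cycle.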

If asynchronous CPUs ever became a thing then this wouldn't be a problem, because you wouldn't have that wasted time in each cycle, and without a clock network you'd save that power too (though that's probably largely paid back, or even more than paid back, by the latching network that would replace it).
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.
To be honest a 5% per core would translate to 5% per cluster.

But the issue is the E-core's design. They are simply not designed for either HT or AVX-512; as you have seen from the Meteor Lake and Arrow Lake diagrams, they follow the same design philosophy.
 
Jul 27, 2020
27,991
19,121
146
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
To be honest a 5% per core would translate to 5% per cluster.
I wanted to avoid any dependency on module overhead, and also show where that "20%" could mistakenly come from.
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?
Say you have a 100mm2 core, just to make the math prettier. 5% HT overhead on top would be 0.05 * 100mm2 = 5mm2. If you have four cores, you have 4 x 5mm2 = 20mm2 for HT, but that's on top of 4 x 100mm2 = 400mm2 baseline. 20mm2/400mm2 = 5mm2/100mm2 = 5%.

Or perhaps more intuitively, if you increase the area of part of the die by 5%, you'll always get ≤5% for the die as a whole.
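
The same reasoning as a short Python sketch. The helper name and the `shared_area_mm2` parameter (standing in for anything in the cluster that does not grow with HT, such as shared cache) are hypothetical, added only to illustrate the "≤5%" point.

```python
def cluster_overhead(core_area_mm2: float, cores: int,
                     per_core_overhead: float,
                     shared_area_mm2: float = 0.0) -> float:
    """Fractional growth of a cluster when each core grows by per_core_overhead."""
    baseline = cores * core_area_mm2 + shared_area_mm2
    added = cores * core_area_mm2 * per_core_overhead
    return added / baseline


# Four 100 mm2 cores, 5% HT overhead each: 20 mm2 on top of 400 mm2.
print(f"{cluster_overhead(100.0, 4, 0.05):.1%}")

# Add 100 mm2 of shared area that does not grow: 20 mm2 on top of 500 mm2.
print(f"{cluster_overhead(100.0, 4, 0.05, shared_area_mm2=100.0):.1%}")
```

The first case gives 5% and the second 4%: the cluster-level figure can only equal the per-core figure or fall below it.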
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?

Let's use MTL Crestmont e core as an example.

1 E-core is 1.046 mm^2; let's round that to 1 mm^2.
1 quad cluster is 5.907 mm^2, but for illustration purposes we will say 4 mm^2 to keep the numbers even.

E-core die area is about 1 mm^2, and a 5% increase on that is 1.05 mm^2, right? So 1.05 x 4 = 4.2 mm^2, and 4.2/4 is 1.05, which is a 5% increase...

So at worst a 5% increase in die area per core would translate to 5% die area per cluster.
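
For reference, here is the same calculation with the unrounded figures quoted above, as a quick Python sketch. It assumes the non-core part of the cluster (the area beyond the four cores) does not grow when HT is added, which is my assumption rather than anything stated in the post.

```python
CORE_AREA_MM2 = 1.046     # single E-core, as quoted above
CLUSTER_AREA_MM2 = 5.907  # quad cluster, as quoted above
CORES_PER_CLUSTER = 4
HT_OVERHEAD = 0.05        # 5% per core, taken at face value

# Extra area if each of the four cores grows by 5%.
added_mm2 = CORES_PER_CLUSTER * CORE_AREA_MM2 * HT_OVERHEAD

# Growth relative to the whole cluster, shared area included.
cluster_growth = added_mm2 / CLUSTER_AREA_MM2

print(f"Added area per cluster: {added_mm2:.3f} mm^2")
print(f"Cluster growth: {cluster_growth:.1%}")
```

That works out to roughly 0.21 mm^2 per cluster, or about 3.5%, so 5% really is the worst case.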
 
Jul 27, 2020
27,991
19,121
146
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.
Ah! I understand it now. Thanks!

5% of single core IS NOT 5% of cluster. DOH! :flushed:
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Ah! I understand it now. Thanks!

5% of single core IS NOT 5% of cluster. DOH! :flushed:
Well.... 5% die area is still 5% die area regardless of how many clusters they put in. As far as I am aware, MTL-S will have 4 quad clusters at the very top of the SKU stack (14900K). Still, 5% is really nothing when compared to the theoretical maximum MT performance boost, which is 15%-30%...

Except those e cores were not designed with HT and AVX-512 in mind.. Intel pulled a 6% IPC boost from Gracemont to Raptormont just by doubling the L2(I don't have the exact die area size so it could be more than 5%) I suspect that the MTL-S will have 6MiB per cluster, faster internal ring bus and we could see double digit IPC boosts on those e cores.
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
The latest rumor is 6+16, fwiw.

6+16 along with the rumors that while the P's are lightly upgraded the E's are significantly enhanced makes sense.
First, Intel is probably noticing that many apps that only really rely on 8 cores can do as well with 6, and if those 6 cores have better IPC than Raptor then it's all the better.
Second, as we move into the future, applications are getting better at MT, so moving some more compute to the E's makes sense. 16 greatly enhanced E's would be very beneficial to having ML surpass RPL. Currently 1 P is worth about 2 E's based on IPC alone, more when you figure in the increased clocks on the P's. If they could reduce that disparity by 15 or 20%.. well, there you go. At this point in the hybrid Big.Little development I would think that there is lower hanging fruit on the E trees compared to the P trees;)
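
A rough way to put numbers on this, as a Python sketch. It is a toy model under loud assumptions: throughput adds up linearly across cores, 1 E counts as 0.5 P as stated above, clocks and HT are ignored, and "reducing the disparity by 15 or 20%" is read as the E-cores getting 15% or 20% faster. The function name is hypothetical.

```python
E_PER_P = 0.5  # "1 P is worth about 2 E's"


def required_p_gain(e_gain: float, old_p: int = 8, new_p: int = 6,
                    e_cores: int = 16) -> float:
    """Per-core P gain a new_p + e_cores part needs to match old_p + e_cores."""
    old_total = old_p + e_cores * E_PER_P
    new_e_total = e_cores * E_PER_P * (1 + e_gain)
    return (old_total - new_e_total) / new_p - 1


for e_gain in (0.0, 0.15, 0.20):
    print(f"E-cores +{e_gain:.0%}: P-cores need +{required_p_gain(e_gain):.1%}")
```

In this model a 6+16 part with unchanged E-cores needs about 33% more from each P-core to match 8+16, but only about 13% if the E-cores improve by 15%, and about 7% if they improve by 20%.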
 

poke01

Diamond Member
Mar 8, 2022
4,198
5,544
106
Arrow Lake will be Intel's M1; that is, it will be a wide arch like Firestorm. Guess what: Firestorm clocks at 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks but still very powerful. I never liked high-clock archs or, as some people say, speed demons. These sorts of archs, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers that will have an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Arrow Lake will be Intel's M1; that is, it will be a wide arch like Firestorm. Guess what: Firestorm clocks at 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks but still very powerful. I never liked high-clock archs or, as some people say, speed demons. These sorts of archs, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers that will have an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.
Well slow down there a second. I really don't know what IPC gains we'll see with Lion Cove. Certainly bigger than RWC, but beyond that, not sure. I do expect it to look a whole lot better from a PPA standpoint, but how all that shakes out is very tbd. I would be surprised if it's <6GHz on 20A, however.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Meteor Lake on Top of Raptor Lake

1667779067032.png


Alder Lake On Top of Raptor Lake
1667779140360.png


Meteor Lake and Alder Lake appear to be very similar in design layout (at the same size for comparison).
 

Geddagod

Golden Member
Dec 28, 2021
1,524
1,620
106
Arrow Lake will be Intel's M1; that is, it will be a wide arch like Firestorm. Guess what: Firestorm clocks at 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks but still very powerful. I never liked high-clock archs or, as some people say, speed demons. These sorts of archs, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers that will have an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.
I think Arrow Lake will still have high clocks. When Intel went wider with Golden Cove, they still managed to keep 5 GHz, same as Willow Cove. They also used a better process, and Arrow Lake will likewise use a better process compared to Meteor Lake.
Apple and Intel being 8-wide is nice, but you gotta wonder about Zen 5. Decode width isn't everything, but I think Zen 5 is going to end up being less wide than Lion Cove. I really don't think AMD is going to double their width in one generation with Zen 5.