Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Page 967

Tigerick

Senior member
Apr 1, 2022
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first for Intel. Launch is expected in Q1 2026.

| | Intel Raptor Lake U | Intel Wildcat Lake 15W? | Intel Lunar Lake | Intel Panther Lake 4+0+4 |
|---|---|---|---|---|
| Launch Date | Q1-2024 | Q2-2026 | Q3-2024 | Q1-2026 |
| Model | Intel 150U | Intel Core 7 | Core Ultra 7 268V | Core Ultra 7 365 |
| Dies | 2 | 2 | 2 | 3 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | Intel 18A + Intel 3 + TSMC N6 |
| CPU | 2 P-cores + 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores |
| Threads | 12 | 6 | 8 | 8 |
| CPU Max Clock | 5.4 GHz | ? | 5 GHz | 4.8 GHz |
| L3 Cache | 12 MB | | 12 MB | 12 MB |
| TDP | 15 - 55 W | 15 W ? | 17 - 37 W | 25 - 55 W |
| Memory | 128-bit LPDDR5-5200 | 64-bit LPDDR5 | 128-bit LPDDR5x-8533 | 128-bit LPDDR5x-7467 |
| Max Memory Size | 96 GB | | 32 GB | 128 GB |
| Bandwidth | | | 136 GB/s | |
| GPU | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics |
| RT | No | No | Yes | Yes |
| EU / Xe | 96 EU | 2 Xe | 8 Xe | 4 Xe |
| GPU Max Clock | 1.3 GHz | ? | 2 GHz | 2.5 GHz |
| NPU | GNA 3.0 | 18 TOPS | 48 TOPS | 49 TOPS |
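The 136 GB/s bandwidth figure matches the theoretical peak of a 128-bit LPDDR5x-8533 interface (Lunar Lake's). A minimal sketch of the usual peak-bandwidth arithmetic (bus width in bytes times transfer rate), assuming the nominal MT/s ratings listed in the spec table and ignoring real-world efficiency; the function name is illustrative. WCL is left out because its memory speed isn't listed.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal, 1 GB = 1e9 bytes).

    bytes per transfer  = bus width in bits / 8
    transfers per second = MT/s * 1e6
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Memory configurations as listed in the spec table.
configs = {
    "Raptor Lake-U (128-bit LPDDR5-5200)": (128, 5200),
    "Lunar Lake (128-bit LPDDR5x-8533)": (128, 8533),
    "Panther Lake (128-bit LPDDR5x-7467)": (128, 7467),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

This prints roughly 83.2, 136.5 and 119.5 GB/s; sustained bandwidth in practice is lower.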






PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



LNL-MX.png
 

Attachments: PantherLake.png, LNL.png, INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg, Clockspeed.png

itsmydamnation

Diamond Member
Feb 6, 2011
You've said that about 22nm before, and I still disagree. Sure, it didn't clock as high, but it clocked high enough and at less power.

Also, Nehalem might work for a basic office PC, but it is dated. The L3 was serviceable but not great. More importantly, there was no AVX or newer, and the iGPU most likely couldn't decode anything in use today besides AVC.
22nm was Moore's Law dying; we just didn't know it yet. It was late, "slow", and needed features to make up for the lack of feature-size scaling.

We would kill for what we got from 32nm to 22nm today... rofl :)
 

511

Diamond Member
Jul 12, 2024
32nm to 22nm was something like a 2X density improvement IIRC, which would be more than N5 -> N2.
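For scale, an idealized check of the ~2X figure, assuming the node names scaled like a real linear dimension (only approximately true even at 32nm/22nm, and not at all for names like N5 and N2): ideal area scales with the square of the feature size, so

$$
\frac{A_{32\,\mathrm{nm}}}{A_{22\,\mathrm{nm}}} = \left(\frac{32}{22}\right)^2 \approx 2.12
$$

which is in line with the ~2X recalled above. N5 -> N2 can't be checked this way, since those names no longer map to a physical dimension.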
 

DavidC1

Platinum Member
Dec 29, 2023
By the way, it seems Intel allows parts to run out of spec:

The 358H allows SO-DIMMs up to DDR5-7200, while the spec sheet mentions LPDDR5 only.
Oh, I remember the reaction when Ivy Bridge came out. It wasn't until Devil's Canyon (the Haswell refresh) that they got them clocking at Sandy Bridge levels again. Honestly, 4.4-4.5GHz was fine, especially considering the competition at the time. I would say the fact that the 4C/4T i5s didn't age well was a far bigger factor.
So I'm saying 18A or its variants can eventually get there; it's just not there right now. It's the same as with both 22nm and Ice Lake. Neither was impressive. Now, Panther Lake's "faults" could also be down to the design, or to Intel being conservative because they are unsure of the process maturity. Regardless, the results are what they are.

Ice Lake disappointed me on the iGPU too. For all the hoopla around it, with Intel even calling it "Tick+" because of the iGPU, Sandy Bridge had a much bigger gain percentage-wise. SNB got 2-2.5x, while IVB stayed at 1.5-1.7x. IVB went from 8 to 10 EUs for sure, but each unit's FLOPS was 2x its predecessor's.
32nm to 22nm was something like a 2X density improvement IIRC, which would be more than N5 -> N2.
Moore's Law started getting sick well before 22nm. Up until 0.18u, it was a straight shrink.

0.13u - Copper interconnects
90nm - Strained Silicon
45nm - High-K Metal Gate
22nm - FinFET
10nm - EUV/Multi patterning DUV
5nm - Double patterning DUV/GAA

More work for less gains.
Nehalem was a beast. Here are my notes from the release.

"Macro-op fusion enhancement, on-die memory controller, added L3 cache shared among all cores, improved (2nd-level) branch prediction and a better Loop Stream Detector, increased buffers, registers, and scheduler entries. Hyper-Threading is back. All Nehalem parts are 4-core/8-thread; Turbo Boost 1.0 can increase the clock by 2 multiplier steps. Bloomfield - performance desktop; Lynnfield - value desktop, much slower memory subsystem; Clarksfield - mobile; all 4 cores on one die; start of the Core i3/i5/i7 designations, 8xx, 9xx"
Nehalem was not a big advancement in single-thread performance if we set aside Turbo mode. Except for applications that were very bandwidth-bound, the gains were in the 5-7% range. On the RealWorldTech forums, they noted that Nehalem even regressed in some ST tests. What it was appreciated for was Hyper-Threading, the integrated memory controller, and the QuickPath Interconnect, along with a Turbo mode that actually worked.
Yes. Sandy Bridge was beloved, especially the 2500K, for being affordable, performant, and great for overclocking. Generally, whatever follows that type of part is not going to live up to expectations. The fact that Ivy Bridge was on a new node and didn't clock well sealed the deal on it being a "loser" of sorts for the Sandy Bridge faithful.
Sandy Bridge was the first generation that made Ultrabooks viable. The U series gained 30-40%. Penryn U chips ran at 1.2GHz, barely faster than an Atom.* Arrandale-U was quite a bit better, but only because its working Turbo allowed it to clock much higher in ST. In MT it was OK, but battery life regressed. Sandy Bridge's Turbo 2.0 made boosting dynamic and aware of heatsink headroom, meaning that, depending on how heavy the workload is, it can clock quite high even in MT. And it actually improved battery life somewhat over Penryn.

*This is especially why I'm so hard on the x86 side. There was a time when a low-power laptop chip getting anywhere near a high-power laptop chip in ST seemed impossible, never mind a desktop chip. But we're living in a reality where the M-series chips completely spank any desktop chip in existence, like Core 2 spanking the Athlon X2, but with an efficiency advantage many times larger than Core 2 ever had over the AX2. Sometimes doing better does mean starting completely fresh, and the smartphone did just that, which the PC and PC vendors were, and still are, not willing to do.
 

511

Diamond Member
Jul 12, 2024
Ice Lake on 10nm and 18A are two different things: Ice Lake had horrible D0 yields as well, and some Ice Lake parts were a regression vs. Skylake in ST/MT. You can't really say that about PTL. Also, Sunny Cove was the first proper tock, with 15%+ IPC improvements, whereas Cougar Cove is Lion Cove+.
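D0 here is defect density. To see why a bad D0 hurts so much, the simplest textbook yield model is Poisson: Y = exp(-A * D0), where A is die area. A minimal sketch with a hypothetical 100 mm^2 die and made-up D0 values (not Ice Lake's or any 18A product's real numbers); real fabs use more elaborate models such as negative binomial.

```python
import math


def poisson_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model.

    Y = exp(-A * D0), with A converted from mm^2 to cm^2 so the units
    match a defect density D0 quoted in defects per cm^2.
    """
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * d0_per_cm2)


# Hypothetical values: a 100 mm^2 die at a few defect densities.
for d0 in (0.1, 0.3, 0.5):
    print(f"D0={d0}/cm^2 -> yield {poisson_yield(100.0, d0):.1%}")
```

Yield falls exponentially with both D0 and die area, which is why a high-D0 node is especially painful for large dies.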

i7

i5

Reference for N3B vs 18A:
LNL
ARL
PTL
 

DavidC1

Platinum Member
Dec 29, 2023
And the end result? Nearly identical. Neither chip was known for high performance on the chip side. Both were known for recovering from previous process issues. Both were known for their graphics.

That's why the expression is: "History doesn't repeat, but it rhymes".
 

511

Diamond Member
Jul 12, 2024
And the end result? Nearly identical. Neither chip was known for high performance on the chip side. Both were known for recovering from previous process issues. Both were known for their graphics.
I don't see "nearly identical" in this case. As for recovering from process issues, 18A is still the first process of its kind in HVM; you can't say that about 10nm and Ice Lake, with their 3 years of delays.
That's why the expression is: "History doesn't repeat, but it rhymes".
not always but yeah
 

Magio

Senior member
May 13, 2024
I don't see "nearly identical" in this case. As for recovering from process issues, 18A is still the first process of its kind in HVM; you can't say that about 10nm and Ice Lake, with their 3 years of delays.

Definitely agree with this. Is 18A a smashing home run that yields perfectly and matches N2 in density/performance/...? Absolutely not, but it's still the first-ever BSPDN node in HVM, and for that matter it's also, to this point in time, the first GAAFET node to actually end up in notable products.

These are things that will shape the future of chipmaking, and Intel is getting worthwhile experience with them early on, which is a really positive sign for future nodes like 18A-P and 14A.

And 18A is also easily the closest Intel has been to the leading edge since their 10nm woes.
 

Thunder 57

Diamond Member
Aug 19, 2007

Slightly OT, but I always found it ironic that AMD had copper interconnects at 180nm, while the P3 Coppermine on 180nm did not. I wonder if Intel intended to use copper at some point? Also, I remember reading at the time that FinFET did more for Intel than the die shrink in terms of performance. That sounds right, considering TSMC 20nm was limited and crap, and GloFo outright cancelled it IIRC. And I believe TSMC 16nm was basically 20nm with FinFETs, which was far better.
 

Hulk

Diamond Member
Oct 9, 1999
You take notes on release? That's dedication. I'd just reference the reviews. Macro-op fusion was 32-bit only on Core 2 IIRC, and Nehalem extended it to 64-bit; I assume that is what you are talking about? On-die memory controller: about time, and it finally ended Opteron's reign, along with Q
Yes, fusion of 64-bit x86 instructions with Nehalem.

As far as taking notes goes, it's worse than that. I have put together "notes" on architectures, with an in-depth section going back to the 4004. Besides being interested in this stuff, I find that studying technical things calms my mind. I'm reading a great book on Quantum Field Theory right now that is excellent brain exercise for me: https://www.amazon.com/dp/0691174296?ref=ppx_yo2ov_dt_b_fed_asin_title

I have a 39-page Excel doc summarizing all of Intel's client architectures going back to the 4004. If you want to see this madness, DM me.
Here are my detailed notes on Nehalem.

Power Efficiency Improvements:
Nehalem quad-core chips contain all four CPU cores on one die, unlike Kentsfield and Yorkfield, which were two dual-core dies on one package.
Ability to power down cores by saving state data in the uncore (L3).
1 million transistors dedicated to the Power Control Unit (PCU). Each core can be clocked independently, but with the same voltage across all cores.
Core Improvements:
Increased the pipeline to 16 stages from 14 in Conroe/Penryn.
Front End - Macro-op (x86 instruction) fusion enhancement. Now 64-bit x86 instructions can be fused together, instead of only 32-bit instructions.
Front End - Improved Loop Stream Detection - If a software loop is detected, the branch prediction, fetch, and decode hardware can be powered down, and the Loop Stream Detector (LSD) can stream directly to the re-order buffer since it stores uops. The LSD can cache 28 uops (similar to the P4's trace cache).
Front End - Branch Prediction Enhancements - New 2nd-level branch predictor (acts like an L2 cache) and renamed return stack buffer (prevents stack corruption).
Reorder buffer increased from 96 to 128 uops. Nehalem can keep 128 uops in flight, as opposed to 96 in Conroe/Merom/Penryn.
New cache structure. L1 and the new L2 cache are per core; the new L3 cache is shared among all cores.
Faster "unaligned" cache accesses, faster synchronization primitives.
The stack is a data structure that keeps track of where in memory the CPU should resume executing after working on a function.
Hyper-Threading returns; it was last seen in the P4. Nehalem is much wider and has more bandwidth than the P4 for HT.
On-die memory controller (finally).
Turbo can add up to 2 clock multiplier steps (2 x 133 = 266MHz). Cores can be turned off completely.
ISA Improvements:
128-bit-wide SSE4.1 instructions and SSE4.2 instructions, new to Nehalem.
Lynnfield is basically Bloomfield with one memory channel removed and the QPI links replaced with slower DMI.
The Lynnfield die is larger than Bloomfield's because of the added on-die PCIe controller, which provides very fast GPU communication.
Bloomfield has three DDR3-1066 memory channels on die. Lynnfield/Clarksfield supports two DDR3-1333 channels on die.
New Bloomfields were launched alongside Lynnfield to preserve the high end.
Nehalem is 5-10% faster than Penryn in general applications and 20-40% faster in video and 3D applications.
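The "renamed return stack buffer" entry and the note on the stack are both about predicting where a RET will jump. A toy sketch, assuming a plain circular return stack buffer with 16 entries (an assumed size) and not modeling the corruption protection the "renamed" part provides: calls push return addresses, returns pop them as predictions, and nesting deeper than the buffer overwrites the oldest entries.

```python
class ReturnStackBuffer:
    """Toy return stack buffer: a small circular stack of predicted return
    addresses. CALL pushes the address to return to; RET pops it as the
    prediction. Deeper nesting than the buffer size overwrites the oldest
    entries, so the outermost returns get mispredicted."""

    def __init__(self, entries=16):   # 16 entries is an assumption
        self.entries = entries
        self.buf = [None] * entries
        self.top = 0                  # next free slot; wraps around

    def on_call(self, return_addr):
        self.buf[self.top] = return_addr
        self.top = (self.top + 1) % self.entries

    def predict_return(self):
        self.top = (self.top - 1) % self.entries
        return self.buf[self.top]


def unwind_demo(depth, entries=16):
    """Nest `depth` calls, then return from all of them; count how many
    RET targets the RSB predicted correctly vs. mispredicted."""
    rsb = ReturnStackBuffer(entries)
    real_stack = []                        # architectural return addresses
    for i in range(depth):
        ret_addr = 0x1000 + i              # hypothetical return addresses
        real_stack.append(ret_addr)
        rsb.on_call(ret_addr)
    correct = mispredicted = 0
    while real_stack:
        actual = real_stack.pop()
        if rsb.predict_return() == actual:
            correct += 1
        else:
            mispredicted += 1
    return correct, mispredicted


print(unwind_demo(depth=10))  # -> (10, 0)
print(unwind_demo(depth=20))  # -> (16, 4)
```

With 20 nested calls and 16 entries, the 4 outermost returns are mispredicted because their entries were overwritten.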
 

Thunder 57

Diamond Member
Aug 19, 2007

Intel should hire you as a historian. I'd love to see the notes on the P4. Pentium Pro too, if you have any. Hell, this is the kind of stuff I love, and why I liked AnandTech and still like Chips and Cheese.
 

Hulk

Diamond Member
Oct 9, 1999
Cougar Cove vs Zen 5 back end reflections.

The way I understand it, Zen 5 has 18 execution slots that can issue uops and 15 execution units. Uops are issued from the execution slots and "directed" to the proper execution unit (ALU, AGU, FP pipes, etc.). Two things I've been wondering:

How does this back end compare to Cougar Cove? Are there any changes from Lion Cove, which I believe has about 16-18 execution slots and about 15 execution units, depending on how you count them?

I know much earlier designs had more execution units than slots. What in the architecture changed to reverse this trend? I'm sure there are a number of factors, but I'm not sure exactly how this "bottleneck" reversed itself as the architectures got wider and more OoO.
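To make the "slots vs. units" idea concrete, here is a toy issue model: each cycle, every issue port can send at most one waiting uop to an execution unit it is wired to. The port map is entirely made up (not Zen 5's, Lion Cove's, or Cougar Cove's real layout), and dependencies, latencies, and scheduler sizes are ignored.

```python
# Hypothetical port map (NOT Zen 5's or Lion/Cougar Cove's real layout):
# each issue port lists the uop types its execution units can accept.
PORTS = {
    "p0": {"alu", "branch"},
    "p1": {"alu"},
    "p2": {"alu", "mul"},
    "p3": {"load"},
    "p4": {"load", "store"},
    "p5": {"fp"},
}


def cycles_to_issue(uops):
    """Greedy toy scheduler: each cycle, every port issues at most one waiting
    uop that it can execute. Dependencies and latencies are ignored."""
    unknown = {u for u in uops if not any(u in kinds for kinds in PORTS.values())}
    if unknown:
        raise ValueError(f"no port can execute: {sorted(unknown)}")
    pending = list(uops)
    cycles = 0
    while pending:
        cycles += 1
        busy = set()
        still_waiting = []
        for uop in pending:
            # Oldest-first: take the first free port wired to this uop type.
            port = next((p for p, kinds in PORTS.items()
                         if uop in kinds and p not in busy), None)
            if port is None:
                still_waiting.append(uop)
            else:
                busy.add(port)
        pending = still_waiting
    return cycles


print(cycles_to_issue(["alu"] * 6 + ["load"] * 4))  # -> 2
print(cycles_to_issue(["fp"] * 4))                  # -> 4
```

In this toy, 10 mixed uops drain in 2 cycles while 4 FP uops take 4, because only one port reaches the FP unit; how ports map onto unit types matters as much as the raw count of either.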
 

Hulk

Diamond Member
Oct 9, 1999
Intel should hire you as a historian. I'd love to see the notes on the P4. Pentium Pro too, if you have any. Hell, this is the kind of stuff I love, and why I liked AnandTech and still like Chips and Cheese.
I'm no expert by a long shot! Just a fan.
I take no credit for this. I gathered info over the years from many places and compiled it.

P4.jpg
 

Hulk

Diamond Member
Oct 9, 1999
This is what I have on the P6 architecture. I do DO things outside, like run and swim and play, but this is a pet hobby of mine. I also enjoy helping my daughter with her AP Calc homework. She hates it. As far as this stuff goes, I'm just a "wanna be." Probably should have been an EE instead of an ME looking back. But that train has long since left the station...
P6.jpg
 

511

Diamond Member
Jul 12, 2024
Cougar Cove is the same as Lion Cove, with branch prediction changes.
 

Thunder 57

Diamond Member
Aug 19, 2007
I haven't read all that yet as it's a bit much, but in your P4 section you missed the L1D increase to 16KB in Prescott. I look forward to reading more.
 

oak8292

Senior member
Sep 14, 2016
Definitely agree with this. Is 18A a smashing home run that yields perfectly and matches N2 in density/performance/...? Absolutely not, but it's still the first-ever BSPDN node in HVM, and for that matter it's also, to this point in time, the first GAAFET node to actually end up in notable products.
One difference between 10nm and 18A is the existential need for 18A. When 10nm was going into production, the yield requirements at Intel to meet corporate margins were high. Based on the current losses reported for Foundry, the target yields to start HVM may be lower, with a need to work through issues while 'making' at least some money.

Apple and TSMC may have pushed through lower yields in HVM than Intel was willing to accept to meet deadlines in that earlier timeframe. It seems Samsung has tried to push through some fairly low yields, selling known good die versus wafers.
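On the "known good die versus wafers" point: a customer buying whole wafers pays for the defective dies too, so low yield raises its cost per working die, while a known-good-die deal shifts that risk to the foundry. A minimal sketch of the arithmetic, with a hypothetical wafer price and die count (not actual Samsung, TSMC, or Intel pricing):

```python
def cost_per_good_die(wafer_price, gross_dies_per_wafer, yield_fraction):
    """Effective cost of one working die when whole wafers are bought,
    so the buyer also pays for every defective die on the wafer."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return wafer_price / (gross_dies_per_wafer * yield_fraction)


# Hypothetical wafer price and die count, for illustration only.
WAFER_PRICE = 20_000.0
GROSS_DIES = 600

for y in (0.9, 0.7, 0.5):
    print(f"yield {y:.0%}: ${cost_per_good_die(WAFER_PRICE, GROSS_DIES, y):.2f} per good die")
```

Under a fixed price per known good die, the gap between the 90% and 50% rows roughly becomes the foundry's cost rather than the customer's.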
 

jpiniero

Lifer
Oct 1, 2010
When 10nm was going into production, the yield requirements at Intel to meet corporate margins were high.

Not really... well, eventually. They would have gone ahead with the Cannon Lake i3 if the yields had been doable.

The difference now is that they have Arrow and Raptor and the part that is 18A is much more cuttable than back then.
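One reading of "cuttable": a die whose defect lands in a block that can be fused off (a CPU core or GPU slice, for example) can still be sold as a lower SKU. A rough sketch under a simple Poisson defect model, assuming only single-defect dies are salvageable and using a hypothetical die area, D0, and cuttable-area fraction:

```python
import math


def harvest_yield(die_area_mm2, d0_per_cm2, cuttable_fraction):
    """Return (fully working fraction, sellable fraction) under a Poisson
    defect model, assuming a die with exactly one defect is still sellable
    as a cut-down SKU if that defect lands in a block that can be fused off.
    Dies with two or more defects are conservatively counted as scrap."""
    lam = die_area_mm2 / 100.0 * d0_per_cm2      # expected defects per die
    perfect = math.exp(-lam)                      # P(0 defects)
    one_defect = lam * math.exp(-lam)             # P(exactly 1 defect)
    return perfect, perfect + one_defect * cuttable_fraction


# Hypothetical die: 100 mm^2, D0 = 0.5 /cm^2, 60% of area in cuttable blocks.
full, sellable = harvest_yield(100.0, 0.5, 0.6)
print(f"fully working: {full:.1%}, sellable incl. cut-down SKUs: {sellable:.1%}")
```

With these made-up inputs, salvage lifts sellable dies from about 61% to about 79%; the more of the die sits in fuse-off-able blocks, the bigger that lift.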
 

oak8292

Senior member
Sep 14, 2016
Not really... well, eventually. They would have gone ahead with the Cannon Lake i3 if the yields had been doable.

The difference now is that they have Arrow and Raptor and the part that is 18A is much more cuttable than back then.
Another difference is that they have partners in both Ireland and Arizona who need revenue from wafers.
 

OneEng2

Golden Member
Sep 19, 2022
I'm just a "wanna be." Probably should have been an EE instead of an ME looking back. But that train has long since left the station...
I was a nuke MM on a submarine. Got out and figured it would be silly to get a degree in ME, considering how much I already knew and was qualified to do in that area. I decided I wanted a job where I would always be in air conditioning (power plant work is noisy, dirty, and hot)... so I got a degree in EE and did lots of work in CS and embedded design, followed by lots of IT stuff (reporting systems, plant interfaces, etc.). Sort of fell into the whole computer thing by accident.

It's never too late to learn a new trick, though!