Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Page 968

Tigerick

Senior member
Apr 1, 2022
942
857
106
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing CPU, GPU, and NPU, fabbed on the Intel 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first from Intel. Expecting a launch in Q1 2026.

Spec | Intel Raptor Lake-U | Intel Wildcat Lake (15W?) | Intel Lunar Lake | Intel Panther Lake 4+0+4
Launch date | Q1 2024 | Q2 2026 | Q3 2024 | Q1 2026
Model | Intel 150U | Intel Core 7 | Core Ultra 7 268V | Core Ultra 7 365
Dies | 2 | 2 | 2 | 3
Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | Intel 18A + Intel 3 + TSMC N6
CPU | 2 P-cores + 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores
Threads | 12 | 6 | 8 | 8
Max CPU clock | 5.4 GHz | ? | 5 GHz | 4.8 GHz
L3 cache | 12 MB | ? | 12 MB | 12 MB
TDP | 15-55 W | 15 W? | 17-37 W | 25-55 W
Memory | 128-bit LPDDR5-5200 | 64-bit LPDDR5 | 128-bit LPDDR5X-8533 | 128-bit LPDDR5X-7467
Max capacity | 96 GB | ? | 32 GB | 128 GB
Bandwidth | ? | ? | 136 GB/s | ?
GPU | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics
Ray tracing | No | No | Yes | Yes
EU / Xe cores | 96 EU | 2 Xe | 8 Xe | 4 Xe
Max GPU clock | 1.3 GHz | ? | 2 GHz | 2.5 GHz
NPU | GNA 3.0 | 18 TOPS | 48 TOPS | 49 TOPS
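To sanity-check the bandwidth row: peak DRAM bandwidth is just bus width times transfer rate. A minimal sketch in Python (the table gives no transfer rate for WCL, so 5200 MT/s there is purely an assumption):

```python
# Peak DRAM bandwidth = (bus width in bytes) x (transfer rate in MT/s).
def peak_bw_gbs(bus_bits: int, mts: int) -> float:
    return bus_bits / 8 * mts / 1000  # GB/s

configs = {
    "Raptor Lake-U, 128-bit LPDDR5-5200":         (128, 5200),
    "Wildcat Lake, 64-bit LPDDR5 (5200 assumed)": (64, 5200),
    "Lunar Lake, 128-bit LPDDR5X-8533":           (128, 8533),
    "Panther Lake, 128-bit LPDDR5X-7467":         (128, 7467),
}
for name, (bits, mts) in configs.items():
    print(f"{name}: {peak_bw_gbs(bits, mts):.1f} GB/s")
```

The Lunar Lake line comes out to 136.5 GB/s, matching the 136 GB/s in the table; the same formula fills in the blank cells (83.2, 41.6, and 119.5 GB/s).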









With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what the Intel roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.




Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png
Last edited:

Hulk

Diamond Member
Oct 9, 1999
5,383
4,096
136
I was a nuke MM on a submarine. Got out and figured it would be silly to get a degree in ME considering how much I already knew and was qualified to do in that area. Decided I wanted a job where I would always be in air conditioning (power plant work is noisy, dirty, and hot) .... so I got a degree in EE and did lots of work in CS and embedded design, followed by lots of IT stuff (reporting systems, plant interfaces, etc.). Sorta fell into the whole computer thing by accident.

It's never too late to learn a new trick though!
When I was in school in the late '80s, the Navy came to Rutgers and told the MEs they would pay $50k/year to work on a nuclear sub. That was A LOT of money for a starting engineer back then. I started at $30k in a civil firm. Anyway, being locked under the polar ice for 6 months at a time didn't appeal to me. One of my friends did sign up though. We used to joke, because he was a big guy, that he wouldn't fit through the hatch!
 

eek2121

Diamond Member
Aug 2, 2005
3,478
5,162
136
3K hours in a single game is insane! In-game HUD items are something that no burn-in protection can deal with, and they are the primary cause of burn-in in plasma and OLED devices. What do you do to mitigate that, or does that game not have any stat/lifebar overlays?
My OLED monitor uses things like pixel shifting: https://www.dell.com/support/kbdoc/...ed-Monitor-Aw3423dw-Pixel-Shift-Pixel-Orbiter along with a full pixel refresh every few hours.

Factorio’s UI elements are static.

This is what I have on the P6 architecture. I do DO things outside like run and swim and play but this is a pet hobby of mine. I also enjoy helping my daughter with her AP Calc. homework. She hates it. As far as this stuff goes, I'm just a "wanna be." Probably should have been an EE instead of an ME looking back. But that train has long since left the station...
View attachment 137787
Oh my god, I’ve been out-nerded.
 
  • Haha
Reactions: OneEng2 and Elfear

Hulk

Diamond Member
Oct 9, 1999
5,383
4,096
136
I haven't read all that yet as it's a bit much, but in your P4 section you missed the L1D increase in Prescott to 16KB. I look forward to reading more.
Thanks. Updated. OCD doesn't allow me to leave something I know is incorrect! The Trace Cache was quite inventive. I think today's big cores do so much OoO that a trace cache wouldn't make sense.

P4.jpg
 
  • Like
Reactions: Sgraffite

Thunder 57

Diamond Member
Aug 19, 2007
4,277
7,088
136
Thanks. Updated. OCD doesn't allow me to leave something I know is incorrect! The Trace Cache was quite inventive. I think today's big cores do so much OoO that a trace cache wouldn't make sense.

View attachment 137807

I think it needs another revision ;). I always found the trace cache interesting, and it took chipsandcheese to help me figure it out.
 

Hulk

Diamond Member
Oct 9, 1999
5,383
4,096
136
With the P4, aka "Netburst," Intel got seriously into the weeds.

The Intel gang is driving along Tualatin highway...
"Hey Willamette Road! That road has an 1800mph speed limit!"
"Okay, actually the maps says it's longer but the speed limit is higher, let try it!"
"I think this was a wrong turn. This is taking forever."
"Hey look, Core Blvd, it's got some lights but looks to be much shorter."
"No. Speed limit on Northwood Ave. is 3000mph! Take that one."
"Uh, I think we should go back to Core Blvd. Look, there's a spot to make a U-Turn."
"Yeah okay. But wait! Prescott freeway has a 4000mph speed limit! This is the way to go FOR SURE."
 
  • Like
Reactions: OneEng2

DavidC1

Platinum Member
Dec 29, 2023
2,181
3,328
106
One thing I wanted to do was collect a bunch of different CPUs, note the year of introduction, basic name, and features, and frame all of them as a display or a table with a glass covering.
Thanks. Updated. OCD doesn't allow me to leave something I know is incorrect! The Trace Cache was quite inventive. I think today's big cores do so much OoO that a trace cache wouldn't make sense.
The Trace Cache took too much die area and power for its hit rate. It was redone in Sandy Bridge as the uop cache, which does the same job at a fraction of the size.
According to Intel, the uop cache performs like a 6KB instruction cache and has a roughly 80% hit rate. By comparison, the 12K uop trace cache of the P4 was supposed to have similar performance to a 8KB-16KB instruction cache. The 4-8X difference in effective storage between the two designs is driven by Sandy Bridge’s more powerful uops and eliminating duplicate entries, which plagued the P4’s trace cache.
Sandy Bridge's uop cache could hold 1.5K uops, by the way, compared to 12K uops on the P4. So 1.5K uops ≈ 6KB while 12K uops ≈ 8-16KB. Not even Zen 5, with 6.7K uop entries, is remotely close to the size of the Pentium 4's Trace Cache, and they did that on 0.18u!
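Putting the quoted figures side by side, here's one naive way to read them (a rough sketch; the quoted 4-8X figure presumably weights the ranges differently, so treat these as ballpark):

```python
# Effective I-cache bytes represented per stored uop, from the figures above.
snb = 6 * 1024 / 1500      # Sandy Bridge: ~6KB-equivalent over 1.5K uops
p4_lo = 8 * 1024 / 12000   # P4 trace cache: 8KB-equivalent, low estimate
p4_hi = 16 * 1024 / 12000  # P4 trace cache: 16KB-equivalent, high estimate
print(f"SNB: {snb:.2f} B/uop; P4: {p4_lo:.2f}-{p4_hi:.2f} B/uop")
print(f"density advantage: {snb / p4_hi:.1f}x to {snb / p4_lo:.1f}x")
```

That lands at roughly 3x-6x per stored uop, the same ballpark as the quoted 4-8X difference in effective storage.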

By the way, Celeron D was a great advancement on its own. Northwood-128 was way too bottlenecked, so moving to a 256KB cache plus Prescott's branch prediction and bandwidth-saving capabilities really helped that chip. It was 25-30% faster per clock than its predecessor. The Northwood Celeron sucked, even though its big brother was held in high regard.
One of the critical differences between the uop cache in Sandy Bridge and the P4’s trace cache is that the uop cache is fundamentally meant to augment a traditional front-end. In contrast, the Pentium 4 attempted to use the trace cache to replace the front-end and relied on vastly slower fetch and decode mechanisms for any workloads which did not fit in the trace cache. The uop cache was carefully designed so that it is an enhancement for Sandy Bridge and does not penalize any workloads.
The primary driver for both the Trace Cache and the uop cache was to reduce the impact of implementing decoders on an x86 uarch. x86 instructions are variable length, so widening the decoders potentially costs far more transistors than on fixed-length ISAs like ARM and RISC-V. Clustered decode "solves" that problem, and in some cases actually does better than a monolithic decoder.
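A toy illustration of why variable-length decode is the expensive part (hypothetical byte lengths, nothing x86-exact): with fixed 4-byte instructions, decoder N can blindly start at offset 4N, while with variable lengths every start offset depends serially on all prior lengths, which is exactly what speculative boundary-marking or clustered decoders have to work around:

```python
from itertools import accumulate

lengths = [1, 3, 2, 5, 1, 4]  # hypothetical variable instruction lengths
starts = [0] + list(accumulate(lengths))[:-1]

print("fixed 4B starts:", [4 * i for i in range(len(lengths))])  # independent
print("variable starts:", starts)  # each depends on every earlier length
```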

@Thunder 57
Intel was pretty much always better at cache until Zen. K7-K10 L1 being a clear exception.
SRAM structure-wise, Intel was way ahead of AMD all the way until 2016, when the design pipeline got stuck due to 10nm issues. L1 was better on AMD's K7 because that is largely a design choice, and AMD had a superior core design, while Intel didn't really change the Pentium Pro core that came out in late 1995. Pentium II made it cheaper and improved 16-bit performance, while Pentium III just added SSE instructions and on-die cache. But they pretty much stopped changing it until Pentium 4 and Pentium M. It wasn't just their process propping up the Pentium III and 4 to be relatively close to AMD's chips; their spectacular memory controller and cache implementation was another factor.

Intel absolutely was the best in memory and memory-controller-related items (maybe due to their memory-focused past) until 2016. Their caches were quite amazing, and they supported the latest memory standards until then as well. And then they slowly but decisively lost the lead due to brain drain and bad management.
Slight OT, but I always found it ironic that AMD had copper interconnects at 180nm while the P3 Coppermine, also on 180nm, did not. I wonder if they intended to at some point? Also, I remember reading at the time that FinFET did more for Intel than the die shrink in terms of performance. Sounds right considering TSMC 20nm was limited and crap, and GloFo outright cancelled theirs IIRC. And I believe TSMC 16nm was just 20nm with FinFET, which was far better.
Intel's 0.18u still kicked AMD's version. I managed to find the Idsat of AMD's 0.18u, and Intel's number was 25-30% better. AMD would have needed 0.13u to even think of matching that. They hid it deep somewhere, as it would have been embarrassing to show those numbers after AMD-Motorola claimed their copper interconnect was some magical thing, plus SOI.

While Coppermine is literally a small town in Northern Canada, maybe they also chose that name as a way of telling AMD: we can put your work to shame without using copper. Many Intel names have a modest background. Ivy Bridge is the name of a real bridge somewhere in Europe; when I say "bridge" I mean the small stone ones over a river near your park, built more for design than functionality.
 
Last edited:

DavidC1

Platinum Member
Dec 29, 2023
2,181
3,328
106
This is what I have on the P6 architecture. I do DO things outside like run and swim and play but this is a pet hobby of mine. I also enjoy helping my daughter with her AP Calc. homework. She hates it. As far as this stuff goes, I'm just a "wanna be." Probably should have been an EE instead of an ME looking back. But that train has long since left the station...
I think gathering that information and putting it up on a website would be good. I don't know if @ashFTW still has her website going. Perhaps you two could cooperate or something.

When I want to refer back to tech information, the best is still the work of Anand Lal Shimpi, the founder. But it's gone now; such a big loss. OK, I guess there's the Internet Archive, but that isn't always reliable and doesn't have everything.
 

ashFTW

Senior member
Sep 21, 2020
331
252
146
I think gathering that information and putting it up on a website would be good. I don't know if @ashFTW still has her website going. Perhaps you two could cooperate or something.

When I want to refer back to tech information, the best is still the work of Anand Lal Shimpi, the founder. But it's gone now; such a big loss. OK, I guess there's the Internet Archive, but that isn't always reliable and doesn't have everything.
You might be confusing me with someone else.
 

511

Diamond Member
Jul 12, 2024
5,442
4,869
106
Yea look how much better Tigerlake CPU was compared to Icelake.
+900 MHz vs Ice Lake, and better performance and power characteristics, after like 3 attempts, but yes.
@511 Panther Lake has memory compression capabilities, which is partly why it benches so high.
It surely has that
Intel's 0.18u still kicked AMD's version. I managed to find the Idsat of AMD's 0.18u, and Intel's number was 25-30% better. AMD would have needed 0.13u to even think of matching that. They hid it deep somewhere, as it would have been embarrassing to show those numbers after AMD-Motorola claimed their copper interconnect was some magical thing, plus SOI.
Intel's fabs were the best for 2-3+ decades. It's absolutely hilarious how you can blow that up.
 

OneEng2

Golden Member
Sep 19, 2022
1,007
1,210
106
When I was in school in the late '80's the Navy came to Rutgers and told the ME's they would pay $50k/year to work on a nuclear sub. That was A LOT of money for a starting engineer back then. I started at 30k in a civil firm. Anyway being locked under the polar ice for 6 months at a time didn't appeal to me. One of my friends did sign up though. We used to joke because he was a big guy that we wouldn't fit through the hatch!
The hatch isn't a problem width-wise, but I am 6' tall, and whacking your head on protruding steel things (and low hatches) is a real problem! About the 3rd time you whack your head on the same lump it really gets painful .... and frustrating ;).

I truly believe that Netburst was a pure marketing move. At that time, MHz was the measuring bar that consumers had been trained to recognize. It was a stupid idea for certain, but I can't imagine that many good and competent engineers did not point this out ..... and simply got overruled.
 
  • Like
Reactions: Hulk

Josh128

Golden Member
Oct 14, 2022
1,537
2,290
106
My OLED monitor uses things like pixel shifting: https://www.dell.com/support/kbdoc/...ed-Monitor-Aw3423dw-Pixel-Shift-Pixel-Orbiter along with a full pixel refresh every few hours.

Factorio’s UI elements are static.


Oh my god, I’ve been out-nerded.
Yeah, but pixel shifting has been a thing for a long time. It's pretty pointless, IMO. It doesn't shift enough pixels to do anything but create a smeared burn-in of any HUD element larger than 8x8 pixels. Pixel refresh might help, I suppose.
 

Hulk

Diamond Member
Oct 9, 1999
5,383
4,096
136
Trip down memory lane...

The Pentium back end.
U and V integer pipes and an FP pipe. No scheduler, no OoO, but those clever engineers came up with a "controller" to send a pair of instructions, in program order, down the U and V pipes to be executed at the same time... boom! Superscalar with no OoO!

On top of that, while the FP pipe could not be sent an instruction during the same cycle, it could be sent one a cycle later, so instruction execution "overlapped." It was the perfect middle ground between the 1-wide 486 before it and the wider P6, with its scheduler and OoO engine, that came after.

I was looking over my old notes. I probably don't understand fully so feel free to jump in and clarify!
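To make the pairing idea concrete, here's a toy Python sketch of the U/V issue rule under very simplified assumptions (the real pairing rules, e.g. exactly which instructions are U-pipe-only or unpairable, fill pages):

```python
# Toy Pentium-style U/V pairing: issue two adjacent instructions per cycle
# when the second can use the V pipe and doesn't read what the first writes.
def schedule(instrs):
    """instrs: list of (text, writes, reads, v_pairable) tuples."""
    cycles, i = [], 0
    while i < len(instrs):
        u = instrs[i]
        v = instrs[i + 1] if i + 1 < len(instrs) else None
        if v and v[3] and not (u[1] & v[2]):  # V-capable, no RAW dependency
            cycles.append((u[0], v[0]))       # dual issue: U + V
            i += 2
        else:
            cycles.append((u[0], None))       # single issue: U only
            i += 1
    return cycles

prog = [("mov eax,[mem]", {"eax"}, set(),          True),
        ("add ebx,1",     {"ebx"}, {"ebx"},        True),   # pairs with the mov
        ("add ecx,eax",   {"ecx"}, {"ecx", "eax"}, True),   # depends on eax
        ("shl ecx,2",     {"ecx"}, {"ecx"},        False)]  # shifts: U-pipe only
for n, (u, v) in enumerate(schedule(prog)):
    print(f"cycle {n}: U={u:<15} V={v}")
```

Three cycles instead of four: the first two pair up, while the dependent add and the U-pipe-only shift each issue alone.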
 

Sgraffite

Senior member
Jul 4, 2001
207
145
116

Thunder 57

Diamond Member
Aug 19, 2007
4,277
7,088
136
One thing I wanted to do was collect a bunch of different CPUs, note the year of introduction, basic name, and features, and frame all of them as a display or a table with a glass covering.

The Trace Cache took too much die area and power for its hit rate. It was redone in Sandy Bridge as the uop cache, which does the same job at a fraction of the size.

Sandy Bridge's uop cache could hold 1.5K uops, by the way, compared to 12K uops on the P4. So 1.5K uops ≈ 6KB while 12K uops ≈ 8-16KB. Not even Zen 5, with 6.7K uop entries, is remotely close to the size of the Pentium 4's Trace Cache, and they did that on 0.18u!

By the way, Celeron D was a great advancement on its own. Northwood-128 was way too bottlenecked, so moving to a 256KB cache plus Prescott's branch prediction and bandwidth-saving capabilities really helped that chip. It was 25-30% faster per clock than its predecessor. The Northwood Celeron sucked, even though its big brother was held in high regard.

The primary driver for both the Trace Cache and the uop cache was to reduce the impact of implementing decoders on an x86 uarch. x86 instructions are variable length, so widening the decoders potentially costs far more transistors than on fixed-length ISAs like ARM and RISC-V. Clustered decode "solves" that problem, and in some cases actually does better than a monolithic decoder.

@Thunder 57

SRAM structure-wise, Intel was way ahead of AMD all the way until 2016, when the design pipeline got stuck due to 10nm issues. L1 was better on AMD's K7 because that is largely a design choice, and AMD had a superior core design, while Intel didn't really change the Pentium Pro core that came out in late 1995. Pentium II made it cheaper and improved 16-bit performance, while Pentium III just added SSE instructions and on-die cache. But they pretty much stopped changing it until Pentium 4 and Pentium M. It wasn't just their process propping up the Pentium III and 4 to be relatively close to AMD's chips; their spectacular memory controller and cache implementation was another factor.

Intel absolutely was the best in memory and memory-controller-related items (maybe due to their memory-focused past) until 2016. Their caches were quite amazing, and they supported the latest memory standards until then as well. And then they slowly but decisively lost the lead due to brain drain and bad management.

Intel's 0.18u still kicked AMD's version. I managed to find the Idsat of AMD's 0.18u, and Intel's number was 25-30% better. AMD would have needed 0.13u to even think of matching that. They hid it deep somewhere, as it would have been embarrassing to show those numbers after AMD-Motorola claimed their copper interconnect was some magical thing, plus SOI.

While Coppermine is literally a small town in Northern Canada, maybe they also chose that name as a way of telling AMD: we can put your work to shame without using copper. Many Intel names have a modest background. Ivy Bridge is the name of a real bridge somewhere in Europe; when I say "bridge" I mean the small stone ones over a river near your park, built more for design than functionality.

K7 ended up better, but it had flaws like the "glued" ALU/AGU blocks and its L2 cache. But its L1 cache was large and low latency, so I have to disagree that it was due to core design. As far as .18u goes, AMD did OK with it. It was the crappy SSE performance that probably hurt it more. I would sooner say it was their initial .13u process (T-bred A) that sucked.

The hatch isn't a problem width-wise, but I am 6' tall, and whacking your head on protruding steel things (and low hatches) is a real problem! About the 3rd time you whack your head on the same lump it really gets painful .... and frustrating ;).

I truly believe that Netburst was a pure marketing move. At that time, MHz was the measuring bar that consumers had been trained to recognize. It was a stupid idea for certain, but I can't imagine that many good and competent engineers did not point this out ..... and simply got overruled.

Nah. No one knew that Dennard Scaling would end or that Moore's Law would end. The excellent branch predictor helped make it viable, even with the crappy trace cache. Marketing was a nice side effect and it probably worked well.

Editing this because I just realized the thread I'm posting in. Way off topic, won't do it again mods.
 

OneEng2

Golden Member
Sep 19, 2022
1,007
1,210
106
Nah. No one knew that Dennard Scaling would end or that Moore's Law would end. The excellent branch predictor helped make it viable, even with the crappy trace cache. Marketing was a nice side effect and it probably worked well.
To keep this on track somewhat: the window of time where leakage and power density started showing up conclusively was around 90nm to 65nm.
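Back-of-envelope on why that window: under classic Dennard scaling the supply voltage shrinks along with feature size and power density stays flat, but once Vdd effectively stopped scaling (roughly the 90nm era), density grows about k^2 per shrink. A sketch with the idealized exponents:

```python
# Power per transistor ~ C * f * V^2; area per transistor ~ (1/k)^2.
k = 1.4  # linear scale factor per full node shrink

def power_density_ratio(v_scale: float) -> float:
    power = (1 / k) * k * v_scale ** 2  # C down 1/k, f up k, times V^2
    area = (1 / k) ** 2
    return power / area

print(f"ideal Dennard (V ~ 1/k): x{power_density_ratio(1 / k):.2f} per shrink")
print(f"fixed Vdd     (V ~ 1):   x{power_density_ratio(1.0):.2f} per shrink")
```

The first prints x1.00, the second x1.96: hold voltage constant and every shrink roughly doubles power density, which is what started biting around 90nm.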

One could argue that Intel has relied heavily on its foundry to provide their designs with a distinct (up to 2 node shrinks) advantage over the competition.

The current Intel investment in 18A and BSPDN to try to leapfrog TSMC's process advantage (at a huge risk IMO) appears to provide evidence that Intel is still following the same basic playbook.

This concerns me (for Intel) much more than some of the questionable architecture decisions made with the latest Intel designs (lack of SMT, loss of AVX512, late to move to chiplets/tiles, etc).

If N2 ends up being a better node than 18A (I'd be willing to bet a coke on that one btw ;) ), one has to wonder even further about this direction.

For laptops, however, from a performance standpoint I see nothing to quibble about with PTL. Price? Yes. Profit? We will see in a few quarters how this works out.

From Q4 earnings, AMD +32% YOY, Intel -4% YOY.
 

Hulk

Diamond Member
Oct 9, 1999
5,383
4,096
136
To keep this on track somewhat: the window of time where leakage and power density started showing up conclusively was around 90nm to 65nm.

One could argue that Intel has relied heavily on its foundry to provide their designs with a distinct (up to 2 node shrinks) advantage over the competition.

The current Intel investment in 18A and BSPDN to try to leapfrog TSMC's process advantage (at a huge risk IMO) appears to provide evidence that Intel is still following the same basic playbook.

This concerns me (for Intel) much more than some of the questionable architecture decisions made with the latest Intel designs (lack of SMT, loss of AVX512, late to move to chiplets/tiles, etc).

If N2 ends up being a better node than 18A (I'd be willing to bet a coke on that one btw ;) ), one has to wonder even further about this direction.

For laptops, however, from a performance standpoint I see nothing to quibble about with PTL. Price? Yes. Profit? We will see in a few quarters how this works out.

From Q4 earnings, AMD +32% YOY, Intel -4% YOY.
18A, I think, only needs to be competitive with N2 for Intel to really stay in the game. Let me put some numbers to what I'm thinking.
If Zen 6 does indeed hit 6.4GHz on a single core and Nova Lake is stuck at 5.7GHz ST, then that is not competitive.
Or if Zen 6 is hitting 65,000 in CB R23 at 200W and Nova Lake requires 275W, then that is not competitive.
They need to be within about 5% in the important metrics.
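Putting rough numbers on those hypotheticals (the figures are the assumed ones from the two examples above, not measurements):

```python
clock_gap = 1 - 5.7 / 6.4  # ST frequency deficit in the first example
power_gap = 275 / 200 - 1  # extra power at equal MT score in the second
print(f"ST clock deficit: {clock_gap:.1%}")  # ~10.9%, well past a 5% bar
print(f"power premium:    {power_gap:.1%}")  # ~37.5% at iso-performance
```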

Now things get "fuzzy" with process and architecture because they are inexorably intertwined. A really great architecture can "cover" for a not-so-good node, and vice versa. If 18A has great density and they can get all of those cores crammed into a cost-effective tile, then Nova Lake can do well in MT based on just having a transistor advantage.

Max ST frequency, max MT frequency/power/heat, ST IPC... lots of unknowns still. AMD has been firing on all cylinders since they moved to TSMC, because for the first time they had the process advantage, which Intel relied on previously. AMD really only has one ball in the air, architecture; TSMC handles the other one for them. Intel has two balls in the air. It can be a great show IF you don't drop a ball.

I think Panther Lake is looking to be a rather big technical success for Intel. 5.1GHz on 18A ain't bad for mobile. Efficiency looks to be very good if not great. The iGPU is very, very good with no reports (so far) of major driver issues (Alchemist R&D paying dividends), and CPU performance did creep ahead of Lion Cove, which has the advantage of being a desktop part in terms of memory subsystem.

If they can get the clocks competitive with Zen 6 (whatever that will be), pull another 5% out of Cougar Cove, and catch up in gaming, this could be fun. Yeah, I know, lots of "ifs."
 
  • Like
Reactions: eek2121 and OneEng2