Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
686
574
106
PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

 | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Arrow Lake Refresh (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U Only | Desktop Only | Desktop & Mobile H&HX | Desktop Only | Mobile U Only | Mobile H
Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Q1 2025 ? | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2025 ? | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 8P + 32E | 4P + 4E | 4P + 8E
LLC | 24 MB | 24 MB ? | 36 MB ? | ? | 8 MB | ?
tCPU | 66.48 | | | | |
tGPU | 44.45 | | | | |
SoC | 96.77 | | | | |
IOE | 44.45 | | | | |
Total | 252.15 | | | | |



Intel Core Ultra 100 - Meteor Lake

INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

Clockspeed.png
 


Saylick

Diamond Member
Sep 10, 2012
3,361
7,056
136
C&C is continuing their MTL investigation, this time with an article on the NPU:

In summary, it looks like the NPU is of limited use because of the data types it supports (or rather, doesn't support). And of the cases it does support, the NPU offers lower power but not necessarily higher performance. The author seems to think that the iGPU is the better approach here because it's more powerful and more flexible, even at the cost of higher power, because there are many situations where you can plug in a laptop these days. That is, until Intel develops an NPU which does cover more use cases with higher performance while continuing to use lower power.
Meteor Lake’s NPU is a fascinating accelerator. But it has narrow use cases and benefits. If I used AI day to day, I would run off-the-shelf models on the iGPU and enjoy better performance while spending less time getting the damn thing running. It probably makes sense when trying to stretch battery life, but I find myself almost never running off battery power. Even economy class plane seats have power outlets these days. Hopefully Intel will iterate on both hardware and software to expand NPU use cases going forward. GPU compute evolved over the past 15 years to reach a reasonably usable state today. There’s something magical about seeing work offloaded to a low power block, and I hope the same evolution happens with NPUs.
 

Khato

Golden Member
Jul 15, 2001
1,221
275
136
Oh no! Why did he stop at 150W???

@Khato , did you do any further testing at higher PL1 values?
Negative, didn't see a need to as it'd just continue the same trend. 18W per core is already well into the realm of diminishing efficiency for P cores, and well beyond the maximum the E cores will take.

With respect to SMT, in the 'perfect' workload for it, which CB23 is, it's of greatest benefit at the highest power levels. Maybe I'll do a fixed-frequency test to confirm the actual performance and power increase, but let's assume 1.5x perf for 1.5x power to keep it simple for now. With a fixed 18W per core power limit, that means non-SMT runs at the 18W frequency (say 5GHz) while SMT runs at the 12W frequency (say 4.4GHz). That gives SMT an effective 6.6GHz (4.4GHz x 1.5), right around 30% above non-SMT. But as power per core goes down to something more like the 4W seen in mobile and servers, the V/F curve is nowhere near so punishing, so the benefit of SMT drops.
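
That back-of-the-envelope estimate can be written out as a short sketch. It only restates the illustrative figures above (1.5x throughput for 1.5x power, 5GHz at 18W, 4.4GHz at 12W); the variable names and the idea of "effective GHz" as clock times the SMT throughput factor are assumptions for illustration, not measurements.

```python
# Illustrative figures taken from the post above; none of them are measurements.
CORE_POWER_LIMIT_W = 18.0   # fixed per-core power limit
SMT_POWER_FACTOR = 1.5      # assumed: SMT costs 1.5x power at a given clock
SMT_PERF_FACTOR = 1.5       # assumed: SMT gives 1.5x throughput at a given clock
NON_SMT_FREQ_GHZ = 5.0      # "say 5GHz" at 18 W without SMT
SMT_FREQ_GHZ = 4.4          # "say 4.4GHz", the clock a non-SMT core reaches at 12 W

# With SMT on, the core has to drop to the clock that would cost
# 18 / 1.5 = 12 W without SMT, so that SMT brings it back up to 18 W.
equivalent_non_smt_power_w = CORE_POWER_LIMIT_W / SMT_POWER_FACTOR

# Throughput expressed as "effective GHz": clock times the SMT throughput factor.
smt_effective_ghz = SMT_FREQ_GHZ * SMT_PERF_FACTOR

# Relative gain of SMT over non-SMT at the same 18 W limit.
smt_gain = smt_effective_ghz / NON_SMT_FREQ_GHZ - 1.0

print(f"non-SMT power equivalent: {equivalent_non_smt_power_w:.0f} W")
print(f"SMT effective clock: {smt_effective_ghz:.1f} GHz")
print(f"SMT gain at 18 W per core: {smt_gain:.0%}")
```

With these inputs the gain works out to about 32%, which is the "right around 30%" figure.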

Basically, my guess is that Intel dropping SMT won't have much of an impact. Improvements in the E-cores are going to negate the losses on the high power desktop side where removal of SMT would otherwise have the greatest impact. Mobile is probably going to come out ahead since the E-cores should be doing most of the multithreading work there anyway. And in server it just draws a clearer line in the segmentation between P-core and E-core based designs.

Oh, and for those concerned about 2 P-cores with ample E-core backup being inadequate for gaming? Don't be. While it's definitely true that many modern games benefit from having 8+ cores they need not be equal in their capabilities to extract that benefit. Most that I've seen still only have 2 'heavy' threads while the remainder are 'light'.
 

Panino Manino

Senior member
Jan 28, 2017
846
1,061
136
C&C is continuing their MTL investigation, this time with an article on the NPU:

In summary, it looks like the NPU is of limited use because of the data types it supports (or rather, doesn't support). And of the cases it does support, the NPU offers lower power but not necessarily higher performance. The author seems to think that the iGPU is the better approach here because it's more powerful and more flexible, even at the cost of higher power, because there are many situations where you can plug in a laptop these days. That is, until Intel develops an NPU which does cover more use cases with higher performance while continuing to use lower power.

Isn't it enough for Windows tasks, CoPilot, etc?
In theory the NPU may be disappointing, but in practice may give the average user some relevant extra battery time.
 

Ghostsonplanets

Senior member
Mar 1, 2024
500
890
96
Isn't it enough for Windows tasks, CoPilot, etc?
In theory the NPU may be disappointing, but in practice may give the average user some relevant extra battery time.
Considering it doesn’t support future AI PC Windows features, it doesn’t seem so for MS.

Intel is scaling up both NPU and GPU MatMul throughput in the next generations, especially on the GPU side with the debut of XMX on iGPUs starting with Lunar Lake and Arrow Lake.
 

DavidC1

Senior member
Dec 29, 2023
322
521
96
Raptor Cove has about 45% or so better IPC than Gracemont when it comes to single core. So while the increased size of Skymont is enticing I find it hard to believe Skymont will approach Raptor Cove IPC, which would be needed for Arrow Lake to compete with Raptor Lake in MT.
Raptor Cove is 40-45% ahead overall, with a smaller integer gap and a larger FP gap. It's something like 25% Int and 50-60% FP.

"Raptormont" gets a 1-3% gain while Crestmont gets a 4-6% gain. So a 30% gain with Skymont, as happened with its Atom-based predecessors, gets us to the "aiming for ADL" claim on Twitter. I think Skymont (SKT) will be able to reach Golden Cove, similar to Gracemont (GMT) reaching Skylake.

That means 10-15% faster than Golden Cove in Int while being 10-15% slower in FP. Consequently, it means Sierra Glen (which is Crestmont without the 6-wide retire/allocate, in other words Gracemont) is anywhere from similar to 10-15% faster per core than Skylake and is an excellent cloud core.
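
The chain of estimates above can be sketched as follows. This is a rough illustration only: it uses midpoints of the quoted ranges, assumes the generational gains compound multiplicatively, treats Golden Cove and Raptor Cove as roughly equal in IPC, and leaves out the 1-3% "Raptormont" figure. All names are hypothetical.

```python
# Midpoints of the ranges quoted in the post; Gracemont is the 1.0 baseline.
CRESTMONT_GAIN = 0.05     # "4-6% gain" over Gracemont
SKYMONT_GAIN = 0.30       # hoped-for "30% gain" over Crestmont
P_CORE_INT_LEAD = 0.25    # P-core lead over Gracemont in integer ("25% Int")
P_CORE_FP_LEAD = 0.55     # P-core lead over Gracemont in FP ("50-60% FP")


def skymont_vs_p_core(p_core_lead: float) -> float:
    """Return Skymont IPC relative to the P-core (1.0 means parity)."""
    # Assumption: the two E-core generational gains compound multiplicatively.
    skymont = (1.0 + CRESTMONT_GAIN) * (1.0 + SKYMONT_GAIN)
    p_core = 1.0 + p_core_lead
    return skymont / p_core


int_ratio = skymont_vs_p_core(P_CORE_INT_LEAD)
fp_ratio = skymont_vs_p_core(P_CORE_FP_LEAD)
print(f"Int: {int_ratio - 1:+.0%} vs P-core")
print(f"FP:  {fp_ratio - 1:+.0%} vs P-core")
```

With these midpoints the sketch lands at roughly +9% in integer and -12% in FP, close to the 10-15% figures in the post; the exact result shifts with which end of each range is used.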

If we extrapolate that to Darkmont-based Clearwater Forest, you essentially have an 18A 144-288 core better-than Golden Cove core chip.

On a side note, I suspect they aren't backing down on clocks with Skymont, hence the larger-than-expected core size, while they are with Lion Cove.
 
Apr 23, 2024
3
0
6
P core: 4.55mm²
E core cluster: 8.1mm²
E core (without L2): 1.52mm²
_________________________________
If that is true, the die size of ARL compute tile would be much smaller than the die size of RPL CPU part, is that right?
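
As a rough aid for that question, the core area implied by these figures can be summed. The sketch below is only arithmetic on the numbers above; the 8P + 16E configuration is taken from the Arrow Lake (N3B) column of the table in the first post, and 4 E-cores per cluster is an assumption not stated here. It covers cores only (no L3, ring, or other tile logic), and no Raptor Lake figure appears in this thread, so the comparison itself is left open.

```python
P_CORE_MM2 = 4.55          # per P core, from the post
E_CLUSTER_MM2 = 8.1        # per E-core cluster, from the post
E_CORE_NO_L2_MM2 = 1.52    # per E core without L2, from the post
E_CORES_PER_CLUSTER = 4    # assumption: not stated in the post


def core_area_mm2(p_cores: int, e_cores: int) -> float:
    """Sum P-core and E-cluster area; excludes L3, ring and everything else."""
    if e_cores % E_CORES_PER_CLUSTER != 0:
        raise ValueError("e_cores must be a whole number of clusters")
    clusters = e_cores // E_CORES_PER_CLUSTER
    return p_cores * P_CORE_MM2 + clusters * E_CLUSTER_MM2


# Area per cluster that is not the four bare cores (L2 and shared logic).
cluster_shared_mm2 = E_CLUSTER_MM2 - E_CORES_PER_CLUSTER * E_CORE_NO_L2_MM2

arl_cores_mm2 = core_area_mm2(p_cores=8, e_cores=16)
print(f"8P + 16E cores only: {arl_cores_mm2:.1f} mm^2")
print(f"L2 + shared logic per cluster: {cluster_shared_mm2:.2f} mm^2")
```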
 

dullard

Elite Member
May 21, 2001
25,165
3,586
126
Isn't it enough for Windows tasks, CoPilot, etc?
In theory the NPU may be disappointing, but in practice may give the average user some relevant extra battery time.
Windows wants 40 TOPS of AI performance as a minimum. Meteor Lake's NPU is 10 TOPS. It is just too slow for what Microsoft wants as a minimum. That is something that won't be available (from any CPU supplier) until the next generation of CPUs later this year.

I appreciate Chips and Cheese going into these tests. But insisting on 64-bit data for AI seems like totally the wrong approach. AI doesn't usually require that level of precision. AI is all about lots of variables with lots of math at low precision. Even 4-bit and 8-bit math is plenty for many AI applications. For the same memory and compute budget, 64-bit data allows 8x fewer variables in memory and 8x fewer calculations than 8-bit data. That said, the new drivers for the NPU do include FP64 as of last week, but the drivers now have a significant performance issue. AI, software, drivers, and hardware are still a work in progress. See this post and the one below it that would address Chips and Cheese's FP64 problem: https://github.com/openvinotoolkit/openvino/issues/22846#issuecomment-2056100285
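
The 8x figure is just the ratio of the bit widths. A minimal sketch, assuming a hypothetical 1 GiB memory budget (the budget and the function name are illustrative only):

```python
def values_that_fit(memory_bytes: int, bits_per_value: int) -> int:
    """Number of values of the given width that fit in a memory budget."""
    return (memory_bytes * 8) // bits_per_value


BUDGET_BYTES = 1024**3  # hypothetical 1 GiB budget

for bits in (4, 8, 16, 32, 64):
    count = values_that_fit(BUDGET_BYTES, bits)
    print(f"{bits:>2}-bit: {count:,} values")

ratio_8_vs_64 = values_that_fit(BUDGET_BYTES, 8) / values_that_fit(BUDGET_BYTES, 64)
print(f"8-bit holds {ratio_8_vs_64:.0f}x as many values as 64-bit")
```

The same ratio applies to memory bandwidth: moving a fixed number of bytes per second delivers 8x as many 8-bit values as 64-bit ones.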
 
Jul 27, 2020
17,503
11,286
106
Windows wants 40 TOPS of AI performance as a minimum. Meteor Lake's NPU is 10 TOPS.
The real question is, what do users of existing systems do? Are dedicated NPU PCIe cards coming out soon or do users have to buy expensive AI-accelerated cards from AMD/Nvidia with no video outputs?

What about millions of laptops with no easy way to upgrade the GPU? Will they use USB 3.0 AI acceleration devices? Or maybe tethering a mobile device with an NPU to the laptop and offloading AI calculations to that?
 

Ghostsonplanets

Senior member
Mar 1, 2024
500
890
96
The real question is, what do users of existing systems do? Are dedicated NPU PCIe cards coming out soon or do users have to buy expensive AI-accelerated cards from AMD/Nvidia with no video outputs?
Desktop users buy a GPU or buy Arrow Lake.
What about millions of laptops with no easy way to upgrade the GPU? Will they use USB 3.0 AI acceleration devices? Or maybe tethering a mobile device with an NPU to the laptop and offloading AI calculations to that?
They won't get the features, at all. This also includes the latest and greatest from the x86 duo: MTL and HWK.

AI PC is partly a push towards AI, as MS doesn’t want to miss the next big thing, and partly a push to phase out laptops sold during COVID and earlier and move consumers towards new machines. Just consider that the AI PC push is coming on the heels of W10 EoL. Millions of corporate and consumer users will want to replace their old machines.
 

Gideon

Golden Member
Nov 27, 2007
1,697
3,891
136
Finally a Notebookcheck review with a decent gen-on-gen battery life improvement (though the last gen had terrible scores):


Even a slight win over average 7840U laptops with similar battery, but only just. The full-load battery life is impressively (40%) better, but only because the performance is similarly throttled. The sustained Cinebench score is about the same as the aforementioned Ryzen Lenovo in "balanced" mode between 19W and 15W CPU load (compared to 30-25W in high-performance mode). So the actual perf / watt is probably quite similar.

Still a decent generational uplift.
 

Ghostsonplanets

Senior member
Mar 1, 2024
500
890
96
Finally a Notebookcheck review with a decent gen-on-gen battery life improvement (though the last gen had terrible scores):


Even a slight win over average 7840U laptops with similar battery, but only just. The full-load battery life is impressively (40%) better, but only because the performance is similarly throttled. The sustained Cinebench score is about the same as the aforementioned Ryzen Lenovo in "balanced" mode between 19W and 15W CPU load (compared to 30-25W in high-performance mode). So the actual perf / watt is probably quite similar.

Still a decent generational uplift.
One sore point is that idle power, once again, regressed gen-on-gen. Tiger Lake, my beloved, had much better idle power.

But, yes, MTL-U is a fairly impressive jump in efficiency. In this specific model, it basically doubled battery life at load, and WLAN testing also had a healthy jump.

Another good point for Intel is that Arc Graphics 64 EUs are basically matching Iris Xe 96EU while also drawing much less power. Good PPA improvement over Iris Xe (Granted, N5 is basically a node ahead of Intel 7).
 

Ghostsonplanets

Senior member
Mar 1, 2024
500
890
96
The biggest question mark about Meteor Lake-U is availability and pricing. So far, there are few models to choose from, and they're priced above similar AMD options. Also, I saw some models where the difference between the U SKU and the H SKU was only $50 - $100.

IMO Intel needs to drop prices on the U SKUs and improve availability, especially on the low-end side of things (Core Ultra 5 125U and Core Ultra 5 115U (the "i3 1215U of this gen")).

Next year, ARL-U should bring some efficiency gains due to the Intel 3 CPU and N4 iGPU, and also some small performance gains on the P core. But I don't think that's enough to stave off AMD/QCOM offerings. Lower prices and higher availability will be key for Intel.

But there's not much to worry about, as Intel is the king of volume. Its market reach worldwide far outpaces the other two.
 

dullard

Elite Member
May 21, 2001
25,165
3,586
126
The real question is, what do users of existing systems do? Are dedicated NPU PCIe cards coming out soon or do users have to buy expensive AI-accelerated cards from AMD/Nvidia with no video outputs?

What about millions of laptops with no easy way to upgrade the GPU? Will they use USB 3.0 AI acceleration devices? Or maybe tethering a mobile device with an NPU to the laptop and offloading AI calculations to that?
I believe that right now a lot of it (such as Microsoft Copilot) is being run through online servers. Of course with business data, that could be a security risk. So, it would be best to get off of the cloud. And who knows how long these companies will be willing to run their servers for free, so you might want off the cloud eventually anyways.
 

Gideon

Golden Member
Nov 27, 2007
1,697
3,891
136
I believe that right now a lot of it (such as Microsoft Copilot) is being run through online servers. Of course with business data, that could be a security risk. So, it would be best to get off of the cloud. And who knows how long these companies will be willing to run their servers for free, so you might want off the cloud eventually anyways.
Businesses also run AI on local servers.

With ollama it's dead easy, and now with Llama 3 released, it's very competitive with commercial offerings. The biggest downside is that the 70B parameter model requires at least 40GB of VRAM, so you can't run it on any gaming GPU (the 8B model is fine though).
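
A rough way to see where a figure like 40GB comes from is to multiply parameter count by bits per weight. The sketch below assumes roughly 4-bit quantized weights, which is an assumption and not something stated above, and it counts weights only (no KV cache, activations, or runtime overhead). The function name is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes); weights only."""
    return params_billions * bits_per_weight / 8


for name, params_b in (("Llama 3 8B", 8), ("Llama 3 70B", 70)):
    for bits in (4, 8, 16):
        gb = weight_memory_gb(params_b, bits)
        print(f"{name} at {bits}-bit: ~{gb:.0f} GB of weights")
```

At 4 bits per weight the 70B model needs about 35GB for weights alone, so "at least 40GB" is plausible once overhead is added, while the 8B model needs about 4GB and fits on typical gaming cards.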

Anyway, this also goes to show why client NPUs are a total meme for chatbot tasks. Meteor Lake's NPU has 11 TOPS. The Windows AI requirement will be 40 TOPS.

Well, RTX 4060 has 242 TOPS, RTX 4090 has 1321 TOPS (and is decent but far from fast enough) and most top end models require 64+ GB of memory (with loads of bandwidth to boot).

Those NPUs have some uses, but "running chatbots locally, thus replacing cloud" ain't it.
 

dullard

Elite Member
May 21, 2001
25,165
3,586
126
Businesses also run AI on local servers.
...
Those NPUs have some uses, but "running chatbots locally, thus replacing cloud" ain't it.
NPUs locally certainly aren't going to be running large language model chatbots any time soon. I, however, see no reason that I'd ever want to run my own chatbot either. Cloud-based chatbots are just fine with me.

Tasks like these would run just fine locally with a bit more NPU power (no one in my work has a GPU to run them on any of our computers, all iGPUs here over hundreds of computers) and I would like to use them regularly :
  • What did I miss in that meeting that I had to miss because I was double booked?
  • What are my assignments from the last month?
  • Summarize this document.
  • Put all my emails from project ABC into a new folder.
  • Is there any correlation between these two data sets?
  • Highlight this data set in the chart better.
  • Does my data have an unusual/suspicious pattern indicating fraud?
  • Is there a pattern of issues on the production line indicating part EFG is out of alignment?
  • Are any of the bottles on the conveyor belt cracked?
  • Etc.
AI is more than chat.
 

Gideon

Golden Member
Nov 27, 2007
1,697
3,891
136
I, however, see no reason that I'd ever want to run my own chatbot either. Cloud-based chatbots are just fine with me.

Yeah, they are fine for many, but plenty of companies have rules/contracts strict enough to disallow sharing any company (or client) data with these platforms.

AI is more than chat.

Agreed, that's why I mentioned they have some uses.

I'll end the OT talk now.
 

dullard

Elite Member
May 21, 2001
25,165
3,586
126
Yeah, they are fine for many, but plenty of companies have rules/contracts strict enough to disallow sharing any company (or client) data with these platforms.
I guess my point didn't get through. I think AI as a chatbot has very limited business usage. AI in general has lots of business use cases. Your posts made it sound like other things were just "some usage" and that chatbots were where it's at. Maybe it was your 4 paragraphs talking about chatbots followed by one line of "have some uses" that made me think you decided the other AI uses were not very important.

Any way you look at it though, Meteor Lake's NPU isn't sufficient for most uses.
 

dullard

Elite Member
May 21, 2001
25,165
3,586
126
With ollama it's dead easy, and now with Llama 3 released, it's very competitive with commercial offerings. The biggest downside is that the 70B parameter model requires at least 40GB of VRAM, so you can't run it on any gaming GPU (the 8B model is fine though).
Good timing for this discussion. Microsoft just announced Phi-3. https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/
  • Phi-3-mini uses 3.8 billion parameters, half as much as Llama-3-8B, yet slightly higher quality.
  • Phi-3-small uses 7 billion parameters
  • Phi-3-medium uses 14 billion parameters--nothing like that 70 billion parameter model you discussed.
This is why I mentioned limited use for large language models on individual CPUs. Who really needs a local chatbot that can discuss everything? If you are a medical clinic, just train on medical texts and skip training on the fire codes of Zimbabwe. If you are selling widgets at a hardware store, just train on plumbing and electrical and construction text and skip texts on the history of 11th century Asia. These mini to medium models can be run locally on many of the new CPUs coming out later this year.
 

Gideon

Golden Member
Nov 27, 2007
1,697
3,891
136
Good timing for this discussion. Microsoft just announced Phi-3.
Yeah, I actually wanted to discuss it, as it was released yesterday, but I said I'd end the OT.

Phi-3's smaller versions are probably something they will embed in Windows, but tuned for OS assistance rather than chat.

The 14B medium model truly looks excellent, but I'd wait for more benchmarks before claiming it's better than Llama 3 across the board. MS probably cherry-picked the benchmarks to show it in the strongest light.
 

AMDK11

Senior member
Jul 15, 2019
325
219
116
I made a discovery about the LionCove core!

Apparently, the LionCove core graphics do not represent anything specific. Only apparently!

Enlarge the LionCove graphic and compare it to the Redwood Cove diagram!
Looks familiar? Yes!

LunarLake art includes an unlabeled diagram of the LionCove core!!!

From what I've read so far from LionCove's diagram:
8-Way Dispatch/Rename (GoldenCove 6-Way)
6x AGU + 2x Store/Data (GoldenCove 5x AGU + 2x SD)
6x ALU + 4x ALU-FP or 6x ALU + 4x FPU!!! (GoldenCove 3x ALU-FP + 2x ALU)

If these 10 execution ports are 4x ALU-FP + 6x ALU, this gives a gigantic amount of 10x ALU! Which, together with AGU and SD, gives 18 execution ports compared to 12 ports for GoldenCove.

Edit:
It seems that Skymont has a 3x 3-Way decoder (Gracemont and Crestmont 2x 3-Way).
 

Cheesecake16

Junior Member
Aug 5, 2020
5
26
61
I made a discovery about the LionCove core!

Apparently, the LionCove core graphics do not represent anything specific. Only apparently!

Enlarge the LionCove graphic and compare it to the Redwood Cove diagram!
Looks familiar? Yes!

LunarLake art includes an unlabeled diagram of the LionCove core!!!

From what I've read so far from LionCove's diagram:
8-Way Dispatch/Rename (GoldenCove 6-Way)
6x AGU + 2x Store/Data (GoldenCove 5x AGU + 2x SD)
6x ALU + 4x ALU-FP or 6x ALU + 4x FPU!!! (GoldenCove 3x ALU-FP + 2x ALU)

If these 10 execution ports are 4x ALU-FP + 6x ALU, this gives a gigantic amount of 10x ALU! Which, together with AGU and SD, gives 18 execution ports compared to 12 ports for GoldenCove.

Edit:
It seems that Skymont has a 3x 3-Way decoder (Gracemont and Crestmont 2x 3-Way).
I think you are reading way too much into this diagram.....
8-way Dispatch and Rename seems reasonable, as do the 6 AGUs and 2 Store Data ports, likely split evenly between a separate 4-port load scheduler and a 4-port store scheduler like Golden Cove.

But the assumption that you are going to have a single 10 port scheduler for the Unified Math Scheduler is insane when you consider the amount of routing that a 10 port scheduler would require.....
It's much more likely that what it will be is a 6 port scheduler where all 6 ports have integer ALUs and 4 of the ports also have floating point ALUs, again just like Golden Cove, which gives a total of 14 ports (6 for the unified math scheduler, 4 for the load scheduler, and 4 for the store scheduler).
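
The two readings of the diagram differ only in how the math ports are counted, which a small tally makes explicit. The sketch below just adds up the port counts quoted in the two posts; the dictionary layout and labels are illustrative, and none of the Lion Cove numbers are confirmed.

```python
# Port counts as quoted in the two posts above (speculative for Lion Cove).
PORT_COUNTS = {
    # 3x ALU-FP + 2x ALU, 5x AGU, 2x store data
    "Golden Cove": {"math": 3 + 2, "AGU": 5, "store data": 2},
    # 6x ALU and 4x ALU-FP counted as ten separate ports
    "Lion Cove (10 math ports)": {"math": 6 + 4, "AGU": 6, "store data": 2},
    # 6 math ports, 4 of which also carry FP units
    "Lion Cove (6 math ports)": {"math": 6, "AGU": 6, "store data": 2},
}

totals = {name: sum(ports.values()) for name, ports in PORT_COUNTS.items()}

for name, total in totals.items():
    print(f"{name}: {total} execution ports")
```

This reproduces the 12, 18 and 14 port totals discussed above; in the 14-port reading the 8 memory ports are the 4-port load scheduler plus the 4-port store scheduler.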