Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. They are connected through UCIe, not D2D; a first from Intel. Expecting a launch in Q1 2026.

| | Intel Raptor Lake U | Intel Wildcat Lake 15W? | Intel Lunar Lake | Intel Panther Lake 4+4+4 |
| --- | --- | --- | --- | --- |
| Launch Date | Q1-2024 | Q2-2026 | Q3-2024 | Q1-2026 |
| Model | Intel 150U | Intel Core 7 | Core Ultra 7 268V | Core Ultra 7 365 |
| Dies | 2 | 2 | 2 | 3 |
| Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | Intel 18-A + Intel 3 + TSMC N6 |
| CPU | 2 P-core + 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores |
| Threads | 12 | 6 | 8 | 8 |
| Max CPU Clock | 5.4 GHz | ? | 5 GHz | 4.8 GHz |
| L3 Cache | 12 MB | ? | 12 MB | 12 MB |
| TDP | 15 - 55 W | 15 W? | 17 - 37 W | 25 - 55 W |
| Memory | 128-bit LPDDR5-5200 | 64-bit LPDDR5 | 128-bit LPDDR5x-8533 | 128-bit LPDDR5x-7467 |
| Max Size | 96 GB | ? | 32 GB | 128 GB |
| Bandwidth | ? | ? | 136 GB/s | ? |
| GPU | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics |
| RT | No | No | Yes | Yes |
| EU / Xe | 96 EU | 2 Xe | 8 Xe | 4 Xe |
| Max GPU Clock | 1.3 GHz | ? | 2 GHz | 2.5 GHz |
| NPU | GNA 3.0 | 18 TOPS | 48 TOPS | 49 TOPS |
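
For reference, the Bandwidth row follows directly from bus width and transfer rate; a minimal sketch of the arithmetic (the 136 GB/s figure in the table is Lunar Lake's, and the other values are implied by the Memory row above):

```python
# Peak DRAM bandwidth (GB/s) = bus width in bytes * transfer rate in MT/s / 1000
def peak_bandwidth(bus_width_bits: int, mt_per_s: int) -> float:
    return bus_width_bits / 8 * mt_per_s / 1000

print(peak_bandwidth(128, 8533))  # Lunar Lake, 128-bit LPDDR5x-8533 -> ~136.5 GB/s
print(peak_bandwidth(128, 7467))  # Panther Lake, 128-bit LPDDR5x-7467 -> ~119.5 GB/s
print(peak_bandwidth(128, 5200))  # Raptor Lake-U, 128-bit LPDDR5-5200 -> ~83.2 GB/s
```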









With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



 

Attachments: PantherLake.png, LNL.png, INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg, Clockspeed.png
Last edited:

gdansk

Diamond Member
Feb 8, 2011
It absolutely did work for AMD. Ryzen trailed in single core performance for Zen, Zen+, and Zen 2, while leading in core counts.
It was a different era. The transition from anemic to excessive multithread for consumer CPUs has already occurred. The number of people who can make good use of 12 cores is relatively high. The number of people who can make good use of 48 or 52 cores is much lower.

It won't work as a part for Intel to make a lot of money. Too much cost and too much price pressure from 48T Zen 6 parts. Nor will it work to gain many customers. It's a small group of people who might have bought into their Xeon W series anyway. But better that you cannibalize yourself than let AMD do it.

And depending on price, power, performance and instruction set compatibility I may be interested myself.
 
  • Like
Reactions: Tlh97 and inquiss

coercitiv

Diamond Member
Jan 24, 2014
AMD’s memory controller is also on the SoC (IO die).
Yes it is, but AMD is comfortable with this trade-off as their product is server first, consumer second. Arrow Lake is consumer first, and their compute tile is almost 120mm2 so they're not taking full advantage of the modular approach (cost efficiency) while still paying the biggest cost (mem. latency).

The stacked cache will obviously help, but the current design sure looks like it had the cache in mind from the get-go.
 

OneEng2

Senior member
Sep 19, 2022
It will sell, as Intel needs to make it work, so the price will be very competitive. The other unknown is how good the new process will be; I am mostly interested in perf/watt in multithreaded use + AVX10

We've lost any hope that a new Intel node will just work without a lot of issues; they still have a dozen technologies that are leading edge, and they sure need some luck and to get their fabs in order
Intel can no longer afford to be inefficient. Putting the IMC on the compute tile is yet ANOTHER sign that Intel is NOT thinking differently than they have in the past, IMO. By having a larger die size than AMD, they have guaranteed another generation of a non-competitive part from a business perspective.

I am intrigued. If you are interested in perf/watt in MT with AVX10, why aren't you interested in Zen 5 / Zen 6, which both have full 512-bit AVX data paths?

I agree with you that Intel is behind on fab tech. It isn't just the trace width any more. Packaging and a compute tile that is designed to work with the added latency of an off-tile memory controller are vitally important (as is 3D cache for gaming niche).
If Ultra 300S has 2 compute tiles, it will have 4-channel RAM! The IMC is on the compute tile!
LOL. Yea, because consumers aren't at all sensitive to higher prices.
Meanwhile Intel shaved $100 from the price of Ultra 7:

This is what happens when you have extra cores but not the consistent ST performance uplift that users were expecting.
Agree. I would add that it wasn't JUST ST though. Core Ultra also did not manage to best Zen 5 in MT either (just CB24 where I think they only win because of bandwidth). In most apps, ARL was significantly slower than both 14K and Zen 5.
Intel is bleeding cash, we'll see if competitive pricing is enough.
Not a good sign IMO. The term "Between a rock and a hard place" comes to mind.
Quad channel RAM and distributed memory controller sounds like a very complex and expensive solution for a niche consumer product. Makes very little sense to me, unless this is meant for HEDT / workstation and not consumer.
Yep. Can't see anyone in the consumer market shelling out the cash for this when a less expensive system will do what they need, have better battery life, and be smaller and lighter.... and still get the job done.
It absolutely did work for AMD. Ryzen trailed in single core performance for Zen, Zen+, and Zen 2, while leading in core counts. Zen only became a single core beast with Zen 3 and X3D.
IIRC, on desktop and laptop AMD didn't regain much of anything with Zen, Zen+ and Zen 2 (in fact Zen 2 suffered a bit of a back track because of the latency penalty associated with moving to chiplets). It wasn't until Zen 3 that ST on Zen became competitive IIRC (please correct me if I am off base here).

Where AMD REALLY made headway with Zen was in DC IMO. While Intel was busy cranking up clock speeds and heat to stay ahead in the desktop market, AMD ate their lunch in the VERY profitable DC market. Bad product management on Intel's part IMO.
It was a different era. The transition from anemic to excessive multithread for consumer CPUs has already occurred. The number of people who can make good use of 12 cores is relatively high. The number of people who can make good use of 48 or 52 cores is much lower.

It won't work as a part for Intel to make a lot of money. Too much cost and too much price pressure from 48T Zen 6 parts. Nor will it work to gain many customers. It's a small group of people who might have bought into their Xeon W series anyway. But better that you cannibalize yourself than let AMD do it.

And depending on price, power, performance and instruction set compatibility I may be interested myself.
True. Moving from 2-4 cores .... VERY big deal. Moving from 4 to 8 cores, decent. Moving from 8 to 16 .... more "eh" I think. Moving from 16 to 52? Why?

It's a simple example of the rule of diminishing returns.
 
  • Like
Reactions: Tlh97 and inquiss

coercitiv

Diamond Member
Jan 24, 2014
Seems that Intel's implementation for Arrow Lake is also affecting SSD performance. If it were just sequential PCIe 5.0 speed then I wouldn't care at all, but it's also affecting random read/write performance (relative to RPL in Z790):

some snippets from the article:
We have consistently received substandard results of 12GB/s sequential read data throughput in our Z890 motherboards, and have had peers test the same scenario and obtain the same results. We have also found countless published reports where the same issue is in print, always with a Gen5 14GB/s capable SSD within a Z890 motherboard. In fact, we have been unable to confirm ANY Z890 motherboard Gen 5 M.2 slots providing a result of 14GB/s sequential read from a Gen5 SSD.
random write performance is also substandard to what we have seen in our Z790 Test Bench
Intel can confirm that the PCIe Lanes 21 to 24 Gen5 root port on Intel Core Ultra 200S series processors may exhibit increased latencies compared to the PCIe Lanes 1 to 16 Gen5 root ports, owing to a longer die-to-die data path. However, any variations are contingent upon the specific workload and the capabilities of the PCIe endpoint device.

I hope they address some of the latency issues with an Arrow Lake refresh, that would give more confidence in NVL.
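
For scale, a rough sketch of where the ~14GB/s expectation comes from (assuming a Gen5 x4 link at 32 GT/s per lane with 128b/130b encoding; real drives lose a bit more to protocol overhead):

```python
# Theoretical ceiling of a PCIe 5.0 x4 link
lanes = 4
gt_per_s = 32          # PCIe 5.0 signalling rate per lane
encoding = 128 / 130   # 128b/130b line encoding

ceiling_gb_s = lanes * gt_per_s * encoding / 8
print(round(ceiling_gb_s, 2))  # ~15.75 GB/s before protocol overhead; fast Gen5 SSDs
                               # advertise ~14 GB/s, so 12 GB/s on Z890 is a real shortfall
```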
 

OneEng2

Senior member
Sep 19, 2022
Seems that Intel's implementation for Arrow Lake is also affecting SSD performance. If it were just sequential PCIe 5.0 speed then I wouldn't care at all, but it's also affecting random read/write performance (relative to RPL in Z790):

some snippets from the article:




I hope they address some of the latency issues with an Arrow Lake refresh, that would give more confidence in NVL.
It is BY FAR the biggest flaw in ARL architecture IMO. The good news is that we can only imagine how good the performance would be once this horrendous flaw is removed for Nova Lake.
 

511

Diamond Member
Jul 12, 2024
Any task, multiple programs at the same time, etc., that can use 16 cores will use 64 and finish the process way faster; if code can scale to 16 threads it can scale to 64 or more
Amdahl's law: the parallel speedup of a task is limited by the part of the task that can't be parallelized (this is not the full definition, but a grossly oversimplified version of it).
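
A quick numeric sketch of the point (Amdahl's law with an assumed 95% parallel fraction, purely for illustration):

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), p = parallelizable fraction
def speedup(p: float, n: int) -> float:
    return 1 / ((1 - p) + p / n)

p = 0.95  # assumed parallel fraction, for illustration only
for n in (16, 64):
    print(n, round(speedup(p, n), 1))
# 16 cores -> ~9.1x, 64 cores -> ~15.4x: 4x the cores, far from 4x the speedup
```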
 

511

Diamond Member
Jul 12, 2024
Yeah, kind of.
CWF is a year late, which stinks.
It's the same kind of tech, though; they are just using it differently, like AMD is doing in the MI series. NVL is using a somewhat different approach, from what I heard.
Seems that Intel's implementation for Arrow Lake is also affecting SSD performance. If it were just sequential PCIe 5.0 speed then I wouldn't care at all, but it's also affecting random read/write performance (relative to RPL in Z790):

some snippets from the article:




I hope they address some of the latency issues with an Arrow Lake refresh, that would give more confidence in NVL.
Yeah, I saw that it is affecting read/write speed; they are capping out at 12 GB/s instead of 14 GB/s. But I doubt this would affect anyone: at that speed your cache will fill first, and that is sequential. The problem is the drop in random 4K performance, which is more important. NVL is not a badly defined product like ARL/MTL. It's incredibly based from what I have heard, with all the variants it has, from measly CPUs to NVL Mobile Halo and HEDT to P-only and E-only SKUs.
 

LightningZ71

Platinum Member
Mar 10, 2017
Intel can no longer afford to be inefficient. Putting the IMC on the compute tile is yet ANOTHER sign that Intel is NOT thinking differently than they have in the past, IMO. By having a larger die size than AMD, they have guaranteed another generation of a non-competitive part from a business perspective.

I am intrigued. If you are interested in perf/watt in MT with AVX10, why aren't you interested in Zen 5 / Zen 6, which both have full 512-bit AVX data paths?

I agree with you that Intel is behind on fab tech. It isn't just the trace width any more. Packaging and a compute tile that is designed to work with the added latency of an off-tile memory controller are vitally important (as is 3D cache for gaming niche).

LOL. Yea, because consumers aren't at all sensitive to higher prices.

Agree. I would add that it wasn't JUST ST though. Core Ultra also did not manage to best Zen 5 in MT either (just CB24 where I think they only win because of bandwidth). In most apps, ARL was significantly slower than both 14K and Zen 5.

Not a good sign IMO. The term "Between a rock and a hard place" comes to mind.

Yep. Can't see anyone in the consumer market shelling out the cash for this when a less expensive system will do what they need, have better battery life, and be smaller and lighter.... and still get the job done.

IIRC, on desktop and laptop AMD didn't regain much of anything with Zen, Zen+ and Zen 2 (in fact Zen 2 suffered a bit of a back track because of the latency penalty associated with moving to chiplets). It wasn't until Zen 3 that ST on Zen became competitive IIRC (please correct me if I am off base here).

Where AMD REALLY made headway with Zen was in DC IMO. While Intel was busy cranking up clock speeds and heat to stay ahead in the desktop market, AMD ate their lunch in the VERY profitable DC market. Bad product management on Intel's part IMO.

True. Moving from 2-4 cores .... VERY big deal. Moving from 4 to 8 cores, decent. Moving from 8 to 16 .... more "eh" I think. Moving from 16 to 52? Why?

It's a simple example of the rule of diminishing returns.
Moving to 52 cores isn't exactly the target. It's more that Intel seems to have run into a brick wall with how large they can ECONOMICALLY make their CCDs. It could be any number of reasons, from ring stops to total die size to thermal limits, but no matter what, they had reason to believe that AMD was improving on their two biggest limitations: memory bandwidth to the CCDs and the total number of cores on their products. 8+16 wasn't going to get it done against the expected Zen 6 stack, and they needed more. What was cheaper to do: make another CCD that was larger and would have limited volume, make a second CCD that was bespoke for their high-end parts with just E-cores, or just reuse their existing CCD as a second CCD in the same package, like AMD does? The result is that they now have a 52-core beast that is likely not optimal.

I also think they are running up against 18A node thermal issues when trying to clock up their E-cores to provide the desired throughput. Keeping clocks restrained near the efficiency peak while doubling their number was likely a far better bet, especially when those cores do their best on highly parallelizable threads, where they already show good performance. This also lets them spread the heat around from their P-cores: instead of having to try to clock all 8 of them to the moon on one tight CCD, they can aim to clock 4 per CCD as high as they can, better distributing the thermal load, while keeping the other 8 cores several hundred MHz slower.

52 wasn't necessarily the target number, just a result of the tools they had in their toolbox. I'm also interested in seeing how the "die recovery" product, AKA the 12P+24e+4LPE 40-core part, will do. It may be very close to the 52-core part in performance across the board, save for very specific benchmarks.
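
For reference, the core math implied above works out as follows (a minimal sketch, assuming the rumored two reused 8P+16E compute tiles plus 4 LP E-cores, and the 12P+24E+4LPE salvage configuration mentioned in the post):

```python
# Rumored Nova Lake desktop configurations (assumptions taken from the discussion above)
def total_cores(p: int, e: int, lpe: int) -> int:
    return p + e + lpe

full_part    = total_cores(p=2 * 8, e=2 * 16, lpe=4)  # two full 8P+16E tiles + 4 LPE
salvage_part = total_cores(p=12, e=24, lpe=4)         # the "die recovery" part

print(full_part, salvage_part)  # 52 40
```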
 

511

Diamond Member
Jul 12, 2024
Moving to 52 cores isn't exactly the target. It's more that Intel seems to have run into a brick wall with how large they can ECONOMICALLY make their CCDs. It could be any number of reasons, from ring stops to total die size to thermal limits, but no matter what, they had reason to believe that AMD was improving on their two biggest limitations: memory bandwidth to the CCDs and the total number of cores on their products. 8+16 wasn't going to get it done against the expected Zen 6 stack, and they needed more. What was cheaper to do: make another CCD that was larger and would have limited volume, make a second CCD that was bespoke for their high-end parts with just E-cores, or just reuse their existing CCD as a second CCD in the same package, like AMD does? The result is that they now have a 52-core beast that is likely not optimal.

I also think they are running up against 18A node thermal issues when trying to clock up their E-cores to provide the desired throughput. Keeping clocks restrained near the efficiency peak while doubling their number was likely a far better bet, especially when those cores do their best on highly parallelizable threads, where they already show good performance. This also lets them spread the heat around from their P-cores: instead of having to try to clock all 8 of them to the moon on one tight CCD, they can aim to clock 4 per CCD as high as they can, better distributing the thermal load, while keeping the other 8 cores several hundred MHz slower.

52 wasn't necessarily the target number, just a result of the tools they had in their toolbox. I'm also interested in seeing how the "die recovery" product, AKA the 12P+24e+4LPE 40-core part, will do. It may be very close to the 52-core part in performance across the board, save for very specific benchmarks.
Bruh, there are only 3 dies; they can just mix and match 8+16, 4+8, and 4+0, AND the 4 LP E-cores. Raichu leaked how they are going to do it eons ago.

At least some parts of Intel are doing good stuff
 

LightningZ71

Platinum Member
Mar 10, 2017
But that's my point, they have just 3 CCDs to choose from. Adding either of the smaller dies doesn't justify the expense. Go big or go home, so just use another 8+16. 52 cores wasn't necessarily the target number, just a function of what they had to work with. It's not bad, it just IS.
 

511

Diamond Member
Jul 12, 2024
But that's my point, they have just 3 CCDs to choose from. Adding either of the smaller dies doesn't justify the expense. Go big or go home, so just use another 8+16. 52 cores wasn't necessarily the target number, just a function of what they had to work with. It's not bad, it just IS.
If they improve upon their physical design, coupled with the density gains from N2, they can make the 8+16 die at around 100mm2 (~120/1.2 = 100mm2 from the node alone), with the rest coming from improving their design.
 

LightningZ71

Platinum Member
Mar 10, 2017
You are assuming that their cores don't further bloat transistor count as they evolve. I don't doubt that their cores are going to gain transistors with each generation.
 

511

Diamond Member
Jul 12, 2024
You are assuming that their cores don't further bloat transistor count as they evolve. I don't doubt that their cores are going to gain transistors with each generation.
I am assuming they fix their mess-up with the LNC physical design, and that N2 is denser as well versus the 3-2 fin library they are using for Lion Cove.
 

Fjodor2001

Diamond Member
Feb 6, 2010
diminishing returns
As in, frequency and IPC increases no longer improving much from one CPU generation to the next.

To get any substantial performance increase, adding more cores is then an option. E.g. 16C->32C is a 100% increase. Even if you don't get a 100% perf increase in all workloads, it's still a massive increase. Compare that to the measly 5-7% IPC increases from one CPU generation to the next.
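
To put rough numbers on that trade-off (a sketch with assumed values: a 6% generational IPC gain versus doubling cores on a workload that is, say, 80% parallel):

```python
# Amdahl-style estimate of doubling cores vs. a single-generation IPC bump
def core_scaling_speedup(parallel_fraction: float, core_ratio: float) -> float:
    return 1 / ((1 - parallel_fraction) + parallel_fraction / core_ratio)

ipc_gain = 1.06                             # a "measly" generational IPC uplift
doubling = core_scaling_speedup(0.80, 2.0)  # e.g. 16C -> 32C on an 80%-parallel workload

print(ipc_gain, round(doubling, 2))  # 1.06 vs ~1.67 -- cores still win when the work scales
```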
 

coercitiv

Diamond Member
Jan 24, 2014
To get any substantial performance increase, adding more cores is then an option. E.g. 16C->32C is a 100% increase. Even if you don't get a 100% perf increase in all workloads, it's still a massive increase.
So if I upgrade from 8 to 16 cores, will my browser be 50% faster or even more? Surely a 100% increase in core count will be massive for all mainstream consumer workloads, right?
 

Fjodor2001

Diamond Member
Feb 6, 2010
So if I upgrade from 8 to 16 cores, will my browser be 50% faster or even more? Surely a 100% increase in core count will be massive for all mainstream consumer workloads, right?
As I wrote, not in all workloads. And some workloads will benefit more from it than others. But over time, more applications will be adapted to make better use of more cores, the more cores are commonly available.

We've already seen this trend over a long time. Compare the applications and games from 10+ years ago to what we have today and see how much they have improved in this regard already.

There's really not much option other than increasing performance via core count currently, since we're not getting much of a perf increase from IPC or frequency anymore.
 
Last edited:

OneEng2

Senior member
Sep 19, 2022
You are assuming that their cores don't further bloat transistor count as they evolve. I don't doubt that their cores are going to gain transistors with each generation.
It is my opinion that nearly ALL IPC improvements and ALL higher core count improvements are done ONLY when a new process node allows higher density within the same die size.

If process shrinks become increasingly far apart in time AND deliver progressively smaller density improvements, then the time between new generations of processors will also become increasingly longer.

Sure, there will be incremental improvements between generations (certainly Nova Lake can improve on Arrow Lake by simply fixing some of those God awful latency problems), but generally these are smaller incremental improvements.
As in, frequency and IPC increases no longer improving much from one CPU generation to the next.

To get any substantial performance increase, adding more cores is then an option. E.g. 16C->32C is a 100% increase. Even if you don't get a 100% perf increase in all workloads, it's still a massive increase. Compare that to the measly 5-7% IPC increases from one CPU generation to the next.
I was specifically talking about increasing core count beyond the current 16c/32t for AMD and 24c/24t for Intel.