Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Tigerick

Senior member
Apr 1, 2022
942
857
106
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. The tiles are connected through UCIe, not D2D; a first from Intel. Launch is expected in Q1 2026.

Spec | Intel Raptor Lake U | Intel Wildcat Lake 15W? | Intel Lunar Lake | Intel Panther Lake 4+0+4
Launch Date | Q1-2024 | Q2-2026 | Q3-2024 | Q1-2026
Model | Intel 150U | Intel Core 7 | Core Ultra 7 268V | Core Ultra 7 365
Dies | 2 | 2 | 2 | 3
Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | Intel 18-A + Intel 3 + TSMC N6
CPU | 2 P-core + 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores
Threads | 12 | 6 | 8 | 8
CPU Max Clock | 5.4 GHz | ? | 5 GHz | 4.8 GHz
L3 Cache | 12 MB | | 12 MB | 12 MB
TDP | 15 - 55 W | 15 W ? | 17 - 37 W | 25 - 55 W
Memory | 128-bit LPDDR5-5200 | 64-bit LPDDR5 | 128-bit LPDDR5x-8533 | 128-bit LPDDR5x-7467
Memory Size | 96 GB | | 32 GB | 128 GB
Memory Bandwidth | | | 136 GB/s |
GPU | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics
RT | No | No | YES | YES
EU / Xe | 96 EU | 2 Xe | 8 Xe | 4 Xe
GPU Max Clock | 1.3 GHz | ? | 2 GHz | 2.5 GHz
NPU | GNA 3.0 | 18 TOPS | 48 TOPS | 49 TOPS
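
As a sanity check on the memory rows, theoretical peak bandwidth is just bus width in bytes times transfer rate. The sketch below is only that arithmetic; the function name and the dictionary layout are made up for illustration, and Wildcat Lake is left out because the table gives no transfer rate for it. It shows that the 136 GB/s figure matches a 128-bit LPDDR5x-8533 interface.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s, with 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Bus width (bits) and transfer rate (MT/s) as listed in the table above.
configs = {
    "Raptor Lake U (128-bit LPDDR5-5200)": (128, 5200),
    "Lunar Lake (128-bit LPDDR5x-8533)": (128, 8533),
    "Panther Lake 4+0+4 (128-bit LPDDR5x-7467)": (128, 7467),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

These are peak numbers only; sustained bandwidth in real workloads will be lower.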






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.




TESKATLIPOKA

Platinum Member
May 1, 2020
2,696
3,260
136
Apple Silicon has only one thread per core. Hyper threading isn't a necessity for IPC uplift in LNC I guess. New architecture, new paradigm.
HT increases nT performance by 25-30%. It's certainly not a small amount, but it also increases power consumption by a similar amount, as far as I remember from Skylake reviews.
So let's say 4 P-cores consume 20W at 2.5GHz; enabling HT would then increase it to 25W.
In this case, by getting rid of HT, you save 5W, which can be used for higher clockspeed.
The problem is that this 5W, or 25% higher power, won't allow you to clock these 4 P-cores 25% higher, so you either lose performance or efficiency.
Intel has E-cores, but you would need 2 E-cores at 2.2-2.5GHz, which can't consume more than 5W in total, to compensate for the missing HT.
I think by removing HT, they can dedicate more resources in the core to uplift ST performance.
HT uses very little core space, <10%, as far as I remember.
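
To put rough numbers on the 20W vs 25W example, here is a minimal sketch. The big assumption is that power scales with roughly the cube of frequency (dynamic power goes as f times V squared, with voltage rising about linearly with clock); that exponent is a rule of thumb, not a measured value for any of these cores, and all constant names are made up for illustration.

```python
BASE_POWER_W = 20.0     # 4 P-cores without HT, from the example above
BASE_CLOCK_GHZ = 2.5    # clock at the base power, from the example above
HT_POWER_W = 25.0       # same cores with HT enabled
HT_SPEEDUP = 1.25       # low end of the 25-30% nT gain
POWER_EXPONENT = 3.0    # ASSUMPTION: power ~ f**3, a rough rule of thumb


def clock_at_power(power_w: float) -> float:
    """Clock reachable at power_w if power scales as f**POWER_EXPONENT."""
    return BASE_CLOCK_GHZ * (power_w / BASE_POWER_W) ** (1.0 / POWER_EXPONENT)


# Spend the 5W saved by dropping HT on clock speed instead.
boosted_clock = clock_at_power(HT_POWER_W)
no_ht_speedup = boosted_clock / BASE_CLOCK_GHZ

print(f"No HT at 25W: {boosted_clock:.2f} GHz, {no_ht_speedup:.3f}x nT performance")
print(f"HT at 25W:    {BASE_CLOCK_GHZ:.2f} GHz, {HT_SPEEDUP:.3f}x nT performance")
print(f"perf/W, no HT at 25W: {no_ht_speedup / HT_POWER_W:.4f}")
print(f"perf/W, HT at 25W:    {HT_SPEEDUP / HT_POWER_W:.4f}")
```

Under that assumption, 25% more power buys well under 10% more clock, which is the gap the E-cores would have to fill.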
 

SiliconFly

Golden Member
Mar 10, 2023
1,924
1,284
106
I have to admit I do listen to MLID, it's a guilty pleasure. Every now and then while I'm working I put him on in the background. You know what I noticed? He's not really a tech guy. Seems like his interest/knowledge in the tech is superficial. Where he is encyclopedic is in his marketing knowledge. He seems to know when every CPU and GPU product was released, how much it cost, and general performance.

He was explaining to the other guy in the latest video how the E cores are for efficiency but they're really not all that efficient. So many people still don't understand that Intel told us they were for area efficiency, not power efficiency. MLID was saying the P cores are actually more efficient than the E cores! Yeah, that's right, that's because they should be given the die space they consume for the compute they produce. Argh!
I'm sure he's directly sponsored by the same company he promotes all the time. He may lack technical knowledge, but he has one of the best sources (inside AMD, I presume).
 

CouncilorIrissa

Senior member
Jul 28, 2023
788
2,855
106
I don't think he promotes AMD, he just throws shit at the wall and then cleans up that which didn't stick.
Didn't he suggest that RWC would bring double-digit IPC gains?
 

mikk

Diamond Member
May 15, 2012
4,333
2,413
136
Seems like Acer lowered the sustained power quite a bit in retail devices on the Acer Swift Go; it's 27W in this retail review:

From the pre-production test from Notebookcheck:

As mentioned at the beginning, the Acer Swift Go 14 we are dealing with is a device that corresponds to the standard spec hardware. Our review machine's software and firmware aren't yet quite perfected. For example, in our Swift Go 14, the values for the boost performance were initially configured somewhat too high. In the course of the test, we also had problems with the preinstalled AlterView which creates visually enticing 3D backgrounds. After a lively exchange with Acer, we decided to remove the software. We were also able to lower the PL2 to 55 watts while leaving the PL1 at 45 watts, all with the aid of TechPowerUp's Throttle Stop. This helped the laptop run considerably better and more stable. Acer will undertake some significantly more detailed fine-tuning when it comes the final performance management. This should result in the laptop enjoying better performance than it currently does.
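
For anyone wondering why short and long benchmark runs can land so far apart with settings like PL1 = 45 W and PL2 = 55 W, here is a simplified sketch. It assumes PL1 is enforced against an exponentially weighted moving average of package power with a time constant tau, and that the chip draws PL2 whenever that average is below PL1. The 28 s tau, the idle starting point and the function name are hypothetical, and real firmware behaviour differs in the details.

```python
import math


def simulate_boost(pl1_w, pl2_w, tau_s, idle_w=0.0, dt_s=0.1, duration_s=120.0):
    """Return (seconds spent at PL2, list of package power samples in W).

    Simplified model: the package draws PL2 while the running average of
    power is below PL1, and drops to PL1 once the average reaches it.
    """
    alpha = 1.0 - math.exp(-dt_s / tau_s)   # EWMA weight per time step
    avg_w = idle_w                          # ASSUMPTION: start from idle
    boost_time_s = 0.0
    samples = []
    for _ in range(int(round(duration_s / dt_s))):
        power_w = pl2_w if avg_w < pl1_w else pl1_w
        avg_w += alpha * (power_w - avg_w)
        if power_w == pl2_w:
            boost_time_s += dt_s
        samples.append(power_w)
    return boost_time_s, samples


# PL1/PL2 from the quoted review; tau_s is a hypothetical value.
boost_s, trace = simulate_boost(pl1_w=45.0, pl2_w=55.0, tau_s=28.0)
print(f"Time at PL2 before dropping to PL1: {boost_s:.1f} s")
print(f"Average power over a 120 s run: {sum(trace) / len(trace):.1f} W")
```

A benchmark that finishes inside the boost window runs almost entirely at PL2, while a long loop mostly sees PL1, so scores are hard to compare unless the power level is held static.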
 

DavidC1

Platinum Member
Dec 29, 2023
2,182
3,329
106
I disagree that the P cores are all that matters. It’s entirely possible that Skymont nearly has a 12-14% IPC increase over Gracemont. This gets Skymont pretty close to Zen 3 IPC. So ARL will basically have 8 pcores with 16 ecores that are basically equivalent to a 5950X without SMT.
Never mind Skymont. Crestmont* in Meteor Lake improves it by 4-6% according to Intel, but one Chinese review measured nearly a 7.5% improvement.

*Sierra Glen gets zero pretty much. Opposite on the server, where the Granite Rapids core gets decent improvements but Redwood Cove in MTL gets almost nothing.
So they are preparing atom core for closing the gap with big cores -> introducing the 3rd cluster, having 3 fetch queues and chewing 36-48 bytes per clock from L1I
Chipsandcheese has got one thing wrong about Gracemont.

Gracemont is fed by the L1i cache at a 2x32B rate, which is double Golden Cove's and also double the rate fed by its own OD-ILD (2x16B).

Regards,
formerly IU2K
 

SiliconFly

HT increases nT performance by 25-30%. It's certainly not a small amount, but it also increases power consumption by a similar amount, as far as I remember from Skylake reviews.
So let's say 4 P-cores consume 20W at 2.5GHz; enabling HT would then increase it to 25W.
In this case, by getting rid of HT, you save 5W, which can be used for higher clockspeed.
The problem is that this 5W, or 25% higher power, won't allow you to clock these 4 P-cores 25% higher, so you either lose performance or efficiency.
Intel has E-cores, but you would need 2 E-cores at 2.2-2.5GHz, which can't consume more than 5W in total, to compensate for the missing HT.

HT uses very little core space, <10%, as far as I remember.
One of the key reasons Intel might have ditched Hyper-threading in LNC is cos it's power hungry. One of the stated drawbacks of HT:

"HT was criticized for energy inefficiency. ARM stated SMT can use up to 46% more power than ordinary dual-core designs (and also increases cache thrashing)."

And with LNC being a ground-up power-efficient design, HT may not be a good fit. The most important reason, I think, is that Hyper-threading does not contribute to ST performance in any way. And having HT in this age, with each CPU having multiple cores, doesn't make much sense (cos HT kicks in only when all the physical cores are running full steam already).

Note: HT is actually very good for servers though.
 

DavidC1

SMT uses less than 5% die area of a core, probably 2-3%.

The problem with SMT is that, even before the potential security flaws, it increases validation time for the design. Back in the low-core-count days it made much sense, but it seems we're in an era where the tradeoff isn't as worth it.

Better execution over many generations may end up being better than having HT.

Remember, their own Atom team, which has a consistent track record of execution, also does not use HT; it was abandoned when they moved to OoOE back in the second Atom.

@adroc_thurston GNR gets Improved FP, OoOE units, and improved branch predictor over just the L1i doubling present in client Redwood Cove. Decent meaning few % not zero.
 

SiliconFly

Agree. Actually, recent implementations may take more than 5% due to further optimizations and security mitigations.

And the added complexity is just not worth it as it gets in the way of ST performance design/optimizations.
 

DavidC1

Agree. Actually, recent implementations may take more than 5% due to further optimizations and security mitigations.
It doesn't matter. Extra space taken up by SMT is still 2-3%.

SMT is actually pretty power efficient too, when the tasks are well threaded. Hence why some call it "poor man's SMP". But now even pocket computers have 4+ cores.

But making it difficult to validate matters, because people always forget that it's the guys working on the product who make it work, and any theoretical gains are nullified by increased risks. Every generation that gets delayed feeds into the successors. Every generation with SMT increases the potential for delay.
 

SiliconFly

I'll wait for better sources as they had two distinct presentations and we know Redwood Cove on MTL is a 0% gain.
RWC is exactly the same as the previous gen, clock frequencies are similar to the previous gen too, and there are no known significant performance optimizations either. They played it too safe. So we can't expect much in the way of performance gains with RWC at this point, I guess.

But Intel 7 to Intel 4 combined with DLVR should have provided at least 15% to 20% efficiency gains for RWC alone. But the power efficiency results with pre-production laptops are all over the place and it's a bit confusing at the moment. Hopefully, newer tests with updated pcode should give clearer results.
 

ondma

Diamond Member
Mar 18, 2018
3,320
1,709
136
One of the key reasons Intel might have ditched Hyper-threading in LNC is cos it's power hungry. One of the stated drawbacks of HT:

"HT was criticized for energy inefficiency. ARM stated SMT can use up to 46% more power than ordinary dual-core designs (and also increases cache thrashing)."

And with LNC being a ground-up power-efficient design, HT may not be a good fit. The most important reason, I think, is that Hyper-threading does not contribute to ST performance in any way. And having HT in this age, with each CPU having multiple cores, doesn't make much sense (cos HT kicks in only when all the physical cores are running full steam already).

Note: HT is actually very good for servers though.
I thought it was because they were supposed to go to "rentable units" and they could not get them working.
 

SiliconFly

Rentable Units is a relatively new concept. And I believe, at this point, it's just a concept and may not make it into end products any time soon. It's extremely complex. If it was doable, other companies like AMD, Apple and Qualcomm would have picked up on it already. So it's safe to say we shouldn't give it too much thought until (or unless) they announce it.

The other reason is that it sounds too good to be true. The golden rule is that, on a given (existing) system, a thread's performance cannot exceed the performance of a single core. Whereas in a Rentable Unit system, the thread performance can casually exceed that hard limit without breaking a sweat. Sounds like a dream, cos it probably is.

Another way to look at it: on an 8-core system with a full implementation of Rentable Units, a single thread can run 8X faster compared to running on an 8-core computer without Rentable Units. A single thread's speed is not limited by a single core's ST performance anymore!

In short, under very ideal circumstances, ST & MT performance will become the same with Rentable Units. Sounds way too good to be true. So, meh!
 

controlflow

Member
Feb 17, 2015
198
348
136
Seems like Acer lowered the sustained power quite a bit in retail devices on the Acer Swift Go; it's 27W in this retail review:

From the pre-production test from Notebookcheck:

They show 2 different Cinebench R23 MT scores. 15,047 and 13,446. Is this an error or is the 15k score at 45W+ and the lower score at 28W? He mentions "28W" for the 13.4k score but it is not clear if he is saying the test was truly capped at 28W or if it was boosting above it for a while.

These numbers seem much higher than the Zenbook on R23.
 

ondma

that's nuts.
IDK, I am not a computer engineer, but I read an article comparing rentable units and hyperthreading, and I didn't see them come to that conclusion.
The surprising thing about ARL is that if it truly does not have hyperthreading, the leaks I saw didn't say anything about increasing E cores either.

Seems like the worst of both worlds. Loss of single thread performance due to clock regression, and loss of ultimate MT performance due to no HT.
My feeling is that Zen 5 will dominate in both.
 

Saylick

Diamond Member
Sep 10, 2012
4,121
9,641
136
Another way to look at it: on an 8-core system with a full implementation of Rentable Units, a single thread can run 8X faster compared to running on an 8-core computer without Rentable Units. A single thread's speed is not limited by a single core's ST performance anymore!
Pretty sure you cannot scale ST performance by X amount just by scaling up a theoretical core's resources by the same factor. A single thread will never fully saturate the width of a core at all times because of instruction dependencies, which is exactly why going wider doesn't give you proportional IPC uplift. It's also why SMT was created, so that you get more throughput out of a given core, but not more ST performance. Rentable Units, if it's actually possible, likely means better utilization of silicon area since you don't need separate big cores, which are not efficient from a perf/mm2 point of view since ST performance has diminishing returns with core area.
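
A toy model makes the dependency argument concrete. The sketch below schedules a single thread on an idealized out-of-order core with unit-latency instructions and an unlimited window, issuing up to `width` ready instructions per cycle, oldest first. The dependency graph is random and purely hypothetical (each instruction depends on one of the previous 8), so this is not a model of any real core, just an illustration of why IPC stops scaling with width.

```python
import random


def build_dependencies(n_instr, max_distance=8, seed=0):
    """Each instruction depends on one earlier instruction (hypothetical workload)."""
    rng = random.Random(seed)
    deps = [[]]
    for i in range(1, n_instr):
        deps.append([rng.randrange(max(0, i - max_distance), i)])
    return deps


def critical_path(deps):
    """Length, in instructions, of the longest dependency chain."""
    depth = []
    for producers in deps:
        depth.append(1 + max((depth[p] for p in producers), default=0))
    return max(depth)


def simulate_ipc(deps, width):
    """Greedy oldest-first issue with unit latency; returns instructions per cycle."""
    done_cycle = [None] * len(deps)
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for i in remaining:
            if len(issued) == width:
                break
            # Ready only if every producer finished in an earlier cycle.
            if all(done_cycle[p] is not None and done_cycle[p] < cycle for p in deps[i]):
                issued.append(i)
        for i in issued:
            done_cycle[i] = cycle
        issued_set = set(issued)
        remaining = [i for i in remaining if i not in issued_set]
    return len(deps) / cycle


deps = build_dependencies(2000)
limit = len(deps) / critical_path(deps)
for width in (1, 2, 4, 8, 16):
    print(f"width {width:2d}: IPC {simulate_ipc(deps, width):.2f} (dependency limit {limit:.2f})")
```

However wide the machine gets, IPC cannot exceed the instruction count divided by the longest dependency chain, so pooling 8 cores' worth of execution resources behind one thread would not make it 8X faster.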
 

Khato

Golden Member
Jul 15, 2001
1,379
487
136
Back on MTL, another video has come along comparing the Asus new and old BIOS:

The benchmarks themselves have the usual problem of not being run at a static power level, so I wouldn't say they're of much interest. What is nice is that, starting around 3 minutes in, there are graphs of average temperature, frequency, and power versus time on a Prime95 run with both new and old BIOS. While the power graph clearly indicates that the new BIOS does allow the CPU to consume more power for a time, by the end of the graph the power consumption is equivalent between new and old BIOS... but the clock speed with the new BIOS at that steady state is about 10% higher than with the old BIOS. Also, the new BIOS shows a markedly more consistent clock speed in general.
 

cebri1

Senior member
Jun 13, 2019
373
405
136
That is pretty much in line with other results that showed a 10-12% increase in performance at different power levels.
 

Khato

Regarding rentable units... Sadly the reality is quite boring, especially compared to the fanciful fiction. I bet that the term was included without context in some presentation that a non-technical 'leaker' received. So clearly some explanation for the term needed to be created in order to be able to 'leak' it.
 