Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe, not D2D; a first for Intel. Launch is expected in Q2 2026 / at Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am also throwing in LNL and the upcoming MediaTek D9500 SoC.

Spec | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500
Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025
Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G
Dies | 2 | 2 | 2 | 1
Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P
CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4
Threads | 8 | 6 | 8 | 8
CPU Max Clock | 3.8 GHz | ? | 5 GHz | 
L3 Cache | 6 MB | ? | 12 MB | 
TDP | 7 W | Fanless ? | 17 W | Fanless
Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667
Memory Size | 16 GB | ? | 32 GB | 24 GB ?
Bandwidth |  | ~55 GB/s | 136 GB/s | 85.6 GB/s
GPU | UHD Graphics |  | Arc 140V | G1 Ultra
GPU EU / Xe | 32 EU | 2 Xe | 8 Xe | 12
GPU Max Clock | 1.25 GHz |  | 2 GHz | 
NPU | N/A | 18 TOPS | 48 TOPS | 100 TOPS ?
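The bandwidth row follows from bus width times transfer rate. As a quick sanity check, here is a minimal Python sketch of that arithmetic; it gives theoretical peak figures only (sustained bandwidth is lower), and the WCL memory configuration is the rumoured one from the table, not a confirmed spec.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal, 1 GB = 1e9 bytes).

    Each transfer moves bus_width_bits / 8 bytes, and MT/s means
    millions of transfers per second on every data pin.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Memory configurations from the table (the WCL one is unconfirmed).
configs = {
    "ADL-N, 64-bit LPDDR5-4800": (64, 4800),
    "WCL, 64-bit LPDDR5-6800 (rumoured)": (64, 6800),
    "LNL, 128-bit LPDDR5X-8533": (128, 8533),
    "D9500, 64-bit LPDDR5X-10667": (64, 10667),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

This works out to roughly 38.4, 54.4, 136.5 and 85.3 GB/s, so the ~55 GB/s figure corresponds to the rumoured WCL configuration, and the D9500 result lands slightly below the 85.6 GB/s listed.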






PPT1.jpg
PPT2.jpg
PPT3.jpg



As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should ship it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



LNL-MX.png
 

Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png

Hulk

Diamond Member
Oct 9, 1999
The issue I'm trying to wrap my head around regarding increased memory performance from Lunar Lake to ARL is that people seem to be comparing the memory structure of MTL to LNL. They are quite different, with LNL being much faster, and this will make a Lion Cove IPC increase from LNL to ARL difficult.

For one thing, LNL uses 16 GB or 32 GB of on-package LPDDR5X-8533 (low-power) DRAM, running at up to 8.5 GT/s per pin. I don't know about latency, but that's pretty good for mobile main-memory access. I would think the fact that it is on package bodes well for latency, not to mention that Intel can tighten up the guard band on timings since it controls all the memory settings, i.e. it doesn't have to allow for the lowest common denominator.

The other thing is the additional cache level for the Lion Cove cores, which will help mitigate memory subsystem performance loss in mobile as compared to desktop.

Intel really seems to have "thought out" LNL, perhaps they did with ARL as well and I'll be surprised, but I'm thinking maybe 3% better IPC for ARL P cores over LNL.
 

DavidC1

Golden Member
Dec 29, 2023
We have mostly the same configuration as Meteor Lake with the NOC fabric joining LP E-cores cluster, P-cores cluster, NPU, Media, GPU, IO-tile (at the bottom), and Side Cache as a buffer for the memory controller.
The difference might seem subtle, but it's big.

(continued...)
AFAIK in MTL the CPU cores are connected via a ring bus, with the SoC tile having a ring stop to collect traffic that is then passed to the SoC NOC fabric. If this is true, then calling it "mostly the same configuration" is a bit forced.
The thing about buses is that if ANY core needs access, the bus has to be on. Therefore, Lunarlake SEPARATES the E-core bus from the P-core bus, so each can be throttled independently.

Compared to on-die, even Foveros Omni is a negative, with increased power and latency requirements. Direct is worse, and vanilla Foveros even more so. They need to mitigate the Foveros loss FIRST, and there LNL has an advantage over ARL because it has fewer tiles.
For one thing, LNL uses 16 GB or 32 GB of on-package LPDDR5X-8533 (low-power) DRAM, running at up to 8.5 GT/s per pin. I don't know about latency, but that's pretty good for mobile main-memory access. I would think the fact that it is on package bodes well for latency, not to mention that Intel can tighten up the guard band on timings since it controls all the memory settings, i.e. it doesn't have to allow for the lowest common denominator.
They could, but since they only talked about PHY power reduction, it's safe to assume that's the focus for Lunarlake.

Past a certain point, bandwidth only benefits a subset of workloads. It is latency that benefits general-purpose compute. I doubt latency improved on LNL.

It's the classic car example. Increasing max speed benefits EVERYONE. Increasing lanes or number of cars owned does not always benefit YOU the individual.
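One way to make the latency-versus-bandwidth point concrete is Little's law: a core's sustained memory throughput is bounded by how many cache-line misses it keeps in flight divided by the memory latency. The sketch below assumes 64-byte cache lines and uses made-up latencies purely for illustration, not measurements of LNL or any other chip.

```python
LINE_BYTES = 64  # assumed cache-line size


def sustained_bw_gbs(outstanding_misses: int, latency_ns: float) -> float:
    """Little's law: throughput = data in flight / latency.

    Bytes per nanosecond is numerically equal to GB/s.
    """
    bytes_in_flight = outstanding_misses * LINE_BYTES
    return bytes_in_flight / latency_ns


def time_for_dependent_loads(n_loads: int, latency_ns: float) -> float:
    """Pointer chase: each load needs the previous result, so latencies add up."""
    return n_loads * latency_ns


# Hypothetical latencies, not measured values for any specific chip.
for latency in (80.0, 120.0):
    chase = sustained_bw_gbs(1, latency)
    stream = sustained_bw_gbs(16, latency)
    chase_ms = time_for_dependent_loads(1_000_000, latency) / 1e6
    print(f"{latency:.0f} ns: pointer chase {chase:.2f} GB/s, "
          f"16 misses in flight {stream:.2f} GB/s, 1M dependent loads {chase_ms:.0f} ms")
```

A dependent pointer chase keeps only one miss in flight, so its speed is set entirely by latency; a higher DRAM peak only pays off once enough misses are in flight (from one core or many) to approach it.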
The other thing is the additional cache level for the Lion Cove cores, which will help mitigate memory subsystem performance loss in mobile as compared to desktop.
This is a good point too. Oftentimes the effort taken to improve performance by a few %, or just to mitigate losses, is a bigger deal for low-end parts.

Prescott didn't improve, but the Prescott-based Celeron improved greatly. Unlike on the high end, the Prescott-based Celeron was more than 20% better per clock than the Northwood-based Celeron. Northwood was OK but its Celeron sucked; with Prescott, that reversed.
Intel really seems to have "thought out" LNL, perhaps they did with ARL as well and I'll be surprised, but I'm thinking maybe 3% better IPC for ARL P cores over LNL.
I can believe this, but no more. Remember the "big jump" only brought us 14%. And Lunarlake has the advantage of only needing two tiles versus Arrowlake's three.

Look how much better Emerald Rapids got by moving to a two-tile setup for compute. Memory subsystem performance improved substantially. It is not so much bandwidth as latency that improved.
 

TwistedAndy

Member
May 23, 2024
Maybe worth a mention: Lunar Lake has a more advanced NoC fabric than MTL. ARL should inherit the newer one. Not many details yet, but it should improve interconnect bandwidth & latency.

Maybe there will be some adjustments, but I don't think they will be so noticeable.

AFAIK in MTL the CPU cores are connected via a ring bus, with the SoC tile having a ring stop to collect traffic that is then passed to the SoC NOC fabric. If this is true, then calling it "mostly the same configuration" is a bit forced.

Yes, correct. The P- and regular E-cores in MTL are connected using a ring bus, which has one connection to NOC (compute tile NOC agent). The same approach will probably be used in Arrow Lake.

In Lunar Lake, there's an LLC connecting P-cores and the NOC agent to connect to the NOC fabric:

LUNAR-LAKE-0160.jpg

On this slide, Intel labeled NOC as "North Fabric".

So, in terms of internal organization, Arrow Lake is expected to be closer to Meteor Lake than to Lunar Lake, but the difference is mostly in the connection to NOC.

Also, it's probable that for ARL-S and HX Intel will decide to use the old approach with one ring for all the cores, GPU, memory controller, etc.

Intel really seems to have "thought out" LNL, perhaps they did with ARL as well and I'll be surprised, but I'm thinking maybe 3% better IPC for ARL P cores over LNL.

Yes, the IPC difference between ARL and LNL is expected to be pretty small (~5%). LPDDR memory, in general, has higher latencies.
 

DavidC1

Golden Member
Dec 29, 2023
@TwistedAndy How can you claim they are the same, when they are clearly more different the more you look at them?

-P cores and E cores are entirely separate on LNL, not on ARL. The NOC will have to be low bandwidth to save power. The important stuff like the P/GPU/NPU is on the same ringbus anyway.
-SLC for power saving on LNL. Where would Intel put the 8 MB system cache? It is by no means big enough to benefit an 8+16 CPU as an L4. Lunarlake has 10 MB of L2 across its 4 P-cores, already exceeding the size of the SLC. For ARL it would have to be much bigger: assuming 4 MB of L3 per E-core cluster and 3 MB of L3 per P-core, it will have 40 MB of L3, so we're talking 64 MB+, preferably 128 MB minimum, for the L4 to be of any benefit. In fact, we should not expect an SLC in Arrowlake.
-Two tiles versus four: CPU, GPU, SoC, and IO for ARL versus Compute + Platform for LNL. A much saner approach, needing far fewer connections, with lower latency and faster communication at lower power.
-Memory controller on the same die for LNL. So in essence, Lunarlake is a natural progression of the on-package CPU+PCH setup that -U chips have used forever, but with a more advanced Foveros interconnect, versus MTL doing YOU GET A TILE! YOU GET A TILE! YOU ALSO GET A TILE!!*

*References to Oprah totally intentional.
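The L3 figure in the SLC bullet is straightforward arithmetic. A minimal sketch using the post's own assumptions (8 P-cores with 3 MB of L3 each, 16 E-cores in four clusters with 4 MB each, an 8 MB SLC, and 10 MB of L2 across LNL's 4 P-cores); none of these are confirmed Arrow Lake specs:

```python
# Assumptions taken from the post, not confirmed Arrow Lake specs.
ARL_P_CORES = 8
ARL_E_CORES = 16
E_CORES_PER_CLUSTER = 4
L3_PER_P_CORE_MB = 3
L3_PER_E_CLUSTER_MB = 4
SLC_MB = 8                   # Lunar Lake's side cache

LNL_P_CORES = 4
LNL_L2_PER_P_CORE_MB = 2.5   # 10 MB total spread over 4 P-cores

arl_e_clusters = ARL_E_CORES // E_CORES_PER_CLUSTER
arl_l3_mb = ARL_P_CORES * L3_PER_P_CORE_MB + arl_e_clusters * L3_PER_E_CLUSTER_MB
lnl_l2_mb = LNL_P_CORES * LNL_L2_PER_P_CORE_MB

print(f"ARL L3 under these assumptions: {arl_l3_mb} MB")
print(f"An {SLC_MB} MB SLC would be only {SLC_MB / arl_l3_mb:.0%} of that L3")
print(f"LNL P-core L2 total: {lnl_l2_mb} MB vs the {SLC_MB} MB SLC")
```

An extra cache level that small relative to the one above it is unlikely to catch many L3 misses, which is the reasoning behind the post's 64 MB+ figure.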
 

DavidC1

Golden Member
Dec 29, 2023
My guess is if Lunarlake is more popular than Intel expects, then they'll make a derivative of Pantherlake to be a direct successor.*

That's because Pantherlake is using the P+E+LPE setup again. It is possible that we need a THIRD, wildly different SoC: one for servers, one for desktops and high-performance laptops, and one for ultra-low-power laptops.

*Lunarlake might be a one-off thing just to fend off ARM. If Intel stops ARM's momentum, it'll take time for the ARM camp to lick its wounds and come back. Then Intel can go back to the less optimal but cheaper MTL/PTL derivatives.

If that's the case, I hope Gelsinger is the CEO who breaks this mentality and realizes that they should do it first. As Andy Grove put it, "Only the paranoid survive."

If Intel had made low-power Atom cores without being forced to by ARM, then they wouldn't have had to worry about competition. This company has always been reactionary, never a true leader.
 

TwistedAndy

Member
May 23, 2024
How can you claim they are the same, when they are clearly more different the more you look at them?

I claimed the structure is similar, not the same. In the previous messages, I described the differences ;)

-SLC for power saving on LNL. Where would Intel put the 8 MB system cache? It is by no means big enough to benefit an 8+16 CPU as an L4. Lunarlake has 10 MB of L2 across its 4 P-cores, already exceeding the size of the SLC. For ARL it would have to be much bigger: assuming 4 MB of L3 per E-core cluster and 3 MB of L3 per P-core, it will have 40 MB of L3, so we're talking 64 MB+, preferably 128 MB minimum, for the L4 to be of any benefit.

SLC cache is required to alleviate the missing L3 cache for LP E-cores in the first place.

I'm not entirely sure which approach Intel will decide to use for Arrow Lake: the old one without NOC and the memory controller on the ring or the newer one with NOC and a separate memory controller. Technically, the new approach with NOC is more flexible, but the additional NOC latency costs 1-3% IPC (Meteor Lake vs Raptor Lake).
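How much a few extra nanoseconds of fabric latency costs depends on how memory-bound the workload is. A textbook-style stall model (CPI = core CPI + misses per instruction × exposed miss latency) shows the shape of it; every number below (5 GHz clock, 100 ns memory latency, 5 ns of extra fabric latency, half of each miss exposed) is a hypothetical input, not a measured MTL or RPL figure.

```python
def ipc(base_cpi: float, mpki: float, mem_latency_cycles: float,
        exposed_fraction: float) -> float:
    """Simple stall model: CPI = core CPI + memory stall CPI.

    mpki: last-level-cache misses per 1000 instructions.
    exposed_fraction: share of each miss's latency the out-of-order core
    cannot hide (0 = fully hidden, 1 = fully exposed).
    """
    stall_cpi = (mpki / 1000.0) * mem_latency_cycles * exposed_fraction
    return 1.0 / (base_cpi + stall_cpi)


def ipc_loss(base_cpi: float, mpki: float, latency_cycles: float,
             extra_cycles: float, exposed_fraction: float) -> float:
    """Fractional IPC drop when every miss gets extra_cycles slower."""
    before = ipc(base_cpi, mpki, latency_cycles, exposed_fraction)
    after = ipc(base_cpi, mpki, latency_cycles + extra_cycles, exposed_fraction)
    return 1.0 - after / before


# Hypothetical inputs: 100 ns at 5 GHz = 500 cycles, +5 ns = +25 cycles.
for mpki in (0.2, 1.0, 3.0):
    loss = ipc_loss(base_cpi=0.4, mpki=mpki, latency_cycles=500,
                    extra_cycles=25, exposed_fraction=0.5)
    print(f"MPKI {mpki}: IPC loss {loss:.1%}")
```

Under these made-up inputs the loss ranges from well under 1% for a cache-friendly workload to around 3% for a memory-heavy one, the same order as the 1-3% quoted, but the model is far too crude to confirm that figure.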
 

AcrosTinus

Senior member
Jun 23, 2024
I think it will. Everyone seems to agree, without evidence, that Arrow Lake's tile generation will still be similar to Meteor Lake's.
If Intel wants this to be a low-latency gaming and productivity SKU, there might be some changes to the active substrate, Foveros and packaging; it is not the same.

Hence, I believe the memory controller is on the ring or they found a way to compensate for the latency.
 

DavidC1

Golden Member
Dec 29, 2023
SLC cache is required to alleviate the missing L3 cache for LP E-cores in the first place.

I'm not entirely sure which approach Intel will decide to use for Arrow Lake: the old one without NOC and the memory controller on the ring or the newer one with NOC and a separate memory controller. Technically, the new approach with NOC is more flexible, but the additional NOC latency costs 1-3% IPC (Meteor Lake vs Raptor Lake).
SLC is the same approach Apple uses in its parts, to lower power. Their problem was that their IO (the chipset) had higher power. Keeping the uncore/IO data, which is trivial compared to compute requirements, in SRAM saves an enormous amount of power. Power savings first, then performance.

I am not sure why you think Arrowlake will use a different approach when it's basically Meteorlake with different tiles. The GPU being based on ACM is proof of that. So you are saying that ARL will have an IMC on the compute tile AND a disabled one on the SoC tile?
Hence, I believe the memory controller is on the ring or they found a way to compensate for the latency.
Cause Intel always made decisions that made sense right? And they never stumbled?:rolleyes:
 

AcrosTinus

Senior member
Jun 23, 2024
There is no evidence that Arrow Lake is the same tile generation as MTL. It cannot be: different node, different Foveros gen, different everything. I think they learned a lesson from MTL and implemented some changes to compensate for the shortcomings.
 

DavidC1

Golden Member
Dec 29, 2023
There is no evidence that Arrow Lake is the same tile generation as MTL. It cannot be: different node, different Foveros gen, different everything. I think they learned a lesson from MTL and implemented some changes to compensate for the shortcomings.
It might be faster, but it's MTL with minimal changes except in the compute tile. I agree it can be better, but MTL's deficiencies need to be overcome FIRST before they can even imagine being faster than the sane design called Lunarlake.

Let me put it this way. Both the Intel and AMD camps are disappointed, and people are still in denial. One HAS to be better than the other, right? No?

Sure, the ARM camp is doing better. Neither x86 vendor is doing that well.
 

AcrosTinus

Senior member
Jun 23, 2024
SLC is the same approach Apple uses in its parts, to lower power. Their problem was that their IO (the chipset) had higher power. Keeping the uncore/IO data, which is trivial compared to compute requirements, in SRAM saves an enormous amount of power. Power savings first, then performance.

I am not sure why you think Arrowlake will use a different approach when it's basically Meteorlake with different tiles. The GPU being based on ACM is proof of that. So you are saying that ARL will have an IMC on the compute tile AND a disabled one on the SoC tile?

Cause Intel always made decisions that made sense right? And they never stumbled?:rolleyes:
I mean they have deep insight into their product and iterate on it. Do you really believe that the tiles stay the same? Look at Gaudi, Xeon and the current Lunar Lake. They went all in on tiles, learned a lesson, and are now reducing their number and using them more efficiently.
 

DavidC1

Golden Member
Dec 29, 2023
I mean they have deep insight into their product and iterate on it. Do you really believe that the tiles stay the same? Look at Gaudi, Xeon and the current Lunar Lake. They went all in on tiles, learned a lesson, and are now reducing their number and using them more efficiently.
Which isn't Arrowlake. Yeah, they can do better, but only within the insanity of having four basically arbitrarily separated tiles.

Let me tell you what the modern "Conroe" and "Athlon 64" are. It's Apple's M1.
 

AcrosTinus

Senior member
Jun 23, 2024
It might be faster, but it's MTL with minimal changes except in the compute tile. I agree it can be better, but MTL's deficiencies need to be overcome FIRST before they can even imagine being faster than the sane design called Lunarlake.
Where do you get this from? I found nothing that even relates MTL to ARL.
If I look at the current development, since the introduction of the tiles, they never really stayed the same.
 

DavidC1

Golden Member
Dec 29, 2023
Where do you get this from? I found nothing that even relates MTL to ARL.
If I look at the current development, since the introduction of the tiles, they never really stayed the same, be it on server or Gaudi.
If you don't know this, then the discussion is pretty much over. I suggest you look at leaks and past Intel presentations. No more "Oh I think", which for a newcomer essentially means "in my head".
 

AcrosTinus

Senior member
Jun 23, 2024
If you don't know this, then the discussion is pretty much over. I suggest you look at leaks and past Intel presentations. No more "Oh I think", which for a newcomer essentially means "in my head".
Very friendly reply. I'll read up on it.
The official reveal of the product is quite near and we will see :)
 

TwistedAndy

Member
May 23, 2024
I am not sure why you think Arrowlake will use a different approach when it's basically Meteorlake with different tiles. The GPU being based on ACM is proof of that. So you are saying that ARL will have an IMC on the compute tile AND a disabled one on the SoC tile?

I have found a slide from Intel:

22_intel_arrow_lake_s_ma_podobno_powstac_w_litografii_tsmc_n3_podczas_gdy_arrow_lake_p_skorzys...png

It looks like Arrow Lake will have a similar structure to Meteor Lake, with an NOC, a separated memory controller, and other stuff (aka the new approach).

SLC is the same approach for Apple parts, to lower power. Their problem was their IO(the chipset) had higher power. By having trivial data(compared to compute requirements) in SRAM, it saves an enormous amount of power. Power savings first, then performance.

Yes. Meteor Lake suffered a problem when two LP E-cores weren't powerful enough to perform most of the light tasks. As a result, the CPU tile was used much more frequently. Having that in place decreases the power consumption.
 

DavidC1

Golden Member
Dec 29, 2023
I have found a slide from Intel:


It looks like Arrow Lake will have a similar structure to Meteor Lake, with an NOC, a separated memory controller, and other stuff (aka the new approach).
There is no indication that it uses a separate memory controller. This is just a hopeful guess, right? The whole point of the hodgepodge four-tile config is that they can change one tile at a time, as needed. If they change where the IMC is, then they have to change the SoC tile as well, and how all the blocks within it work and communicate with each other.

The presentation shows that they aren't even changing the Foveros bump pitch for Arrowlake; it remains at 36 µm. Meteorlake is the System Change and Arrowlake is the Core Change.
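Bump pitch matters because it sets how many die-to-die connections fit in a given area. Assuming an idealised square bump grid (real layouts differ), density scales with the inverse square of the pitch; the 25 µm value below is only a hypothetical comparison point, not a claim about any particular Foveros generation.

```python
def bump_density_per_mm2(pitch_um: float) -> float:
    """Bumps per mm^2 on an idealised square grid with the given pitch in um."""
    bumps_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 um
    return bumps_per_mm ** 2


for pitch_um in (36.0, 25.0):
    print(f"{pitch_um:.0f} um pitch: ~{bump_density_per_mm2(pitch_um):.0f} bumps/mm^2")
```

Staying at 36 µm (about 770 bumps/mm²) keeps the tile-to-tile connection budget per unit area unchanged, whereas a 25 µm pitch would roughly double it (1600/mm²).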

The GPU using ACM++ is an example. Based on the results, it apparently isn't working as well as they expected. And what are they going to do with four different tiles? Do a 3-month refresh? They aren't going to do that. It's still a yearly cadence.
Yes. Meteor Lake suffered a problem when two LP E-cores weren't powerful enough to perform most of the light tasks. As a result, the CPU tile was used much more frequently. Having that in place decreases the power consumption.
Not just that. The E-cores are on a very high-performance ring bus. Buses are the LAST agents to go idle, because they are shared by all the cores. If one measly E-core wants to communicate, the ring has to be on. This means buses and IO set the floor for how low you can go in power.

For the same reason, Lunarlake puts them on a separate, much slower bus optimized for power efficiency. Thus Lunarlake simplifies the setup, reduces complexity (and with it execution issues), improves performance, saves space, and lowers power.
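The point that a shared bus stays on while any attached agent needs it can be illustrated with a toy activity trace. The sketch below assumes a bus can only power-gate in time slots where none of its agents is active; the trace is invented, and it models only on/off time, not the actual power states or how LNL implements them.

```python
from typing import Sequence


def awake_fraction(activity: Sequence[Sequence[bool]]) -> float:
    """Fraction of time slots a shared bus must stay powered.

    activity[i][t] is True when agent i needs the bus in slot t; the bus
    can only gate off in slots where no attached agent is active.
    """
    slots = len(activity[0])
    busy = sum(1 for t in range(slots) if any(agent[t] for agent in activity))
    return busy / slots


# Hypothetical 10-slot trace: four mostly idle P-cores, one E-core doing
# light periodic background work.
p_cores = [[t in (2, 3) for t in range(10)] for _ in range(4)]
e_core = [[t % 2 == 0 for t in range(10)]]

shared = awake_fraction(p_cores + e_core)  # one ring for everything
p_bus = awake_fraction(p_cores)            # split design: P-side bus
e_bus = awake_fraction(e_core)             # split design: E-side bus
print(f"shared ring awake {shared:.0%}; split: P bus {p_bus:.0%}, E bus {e_bus:.0%}")
```

With one shared ring, the light E-core activity keeps the big ring awake 60% of the time; split in two, the high-performance P-side bus is awake only 20% of the time while the E-core traffic runs on its own, presumably cheaper, bus.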
 

TwistedAndy

Member
May 23, 2024
There is no indication that it uses a separate memory controller. This is just a hopeful guess, right? The whole point of the hodgepodge four-tile config is that they can change one tile at a time, as needed. If they change where the IMC is, then they have to change the SoC tile as well, and how all the blocks within it work and communicate with each other.

With the tiled approach, Intel has to separate the memory controller and put it into the SoC tile, as we have in Meteor Lake. There are no other options. Intel will probably also put some hidden LP E-cores in the Arrow Lake SoC tile. At least it makes sense for ARL-H and even HX.

I was considering the option where the whole CPU is monolithic. In this case, it makes sense to use the old approach with the ring and a memory controller on it.

The E-cores are on a very high-performance ring bus. Buses are the LAST agents to go idle, because they are shared by all the cores. If one measly E-core wants to communicate, the ring has to be on. This means buses and IO set the floor for how low you can go in power.

Yep, and that's why we have those cores in the SoC tile :)
 

DavidC1

Golden Member
Dec 29, 2023
With the tiled approach, Intel has to separate the memory controller and put it into the SoC tile, as we have in Meteor Lake. There are no other options. Intel will probably also put some hidden LP E-cores in the Arrow Lake SoC tile. At least it makes sense for ARL-H and even HX.
They aren't doing that in Arrowlake, which is the point. It's still four tiles. Pantherlake, Novalake, Anandlake, or Twisted Andylake can do whatever, but not Arrowlake, not without changing both tiles significantly.
Yep, and that's why we have those cores in the SoC tile :)
No, on Lunarlake it's on the compute tile. On Meteorlake it basically doesn't work. It looks pretty on the die shot, but that's about it. It literally just sits there looking pretty. Intel themselves said it saves a mere 150 mW! My efficient Kabylake Yoga uses 4 W on video playback. I wouldn't care about 0.15 W unless I could get it from a driver update. It's a margin-of-error difference.
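To put the 150 mW figure in perspective against the 4 W playback draw quoted above: battery life scales inversely with average power. A minimal sketch; the 50 Wh capacity is a hypothetical value and cancels out of the percentage anyway.

```python
def battery_hours(capacity_wh: float, avg_power_w: float) -> float:
    """Runtime at a constant average power draw."""
    return capacity_wh / avg_power_w


CAPACITY_WH = 50.0  # hypothetical battery size
BASELINE_W = 4.0    # video-playback draw quoted in the post
SAVING_W = 0.15     # the 150 mW saving quoted in the post

before = battery_hours(CAPACITY_WH, BASELINE_W)
after = battery_hours(CAPACITY_WH, BASELINE_W - SAVING_W)
print(f"{before:.2f} h -> {after:.2f} h ({after / before - 1:.1%} longer)")
```

That is roughly a 4% runtime gain at a 4 W baseline; the same saving would be proportionally larger on a platform drawing less power to begin with.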
 

TwistedAndy

Member
May 23, 2024
No, on Lunarlake it's on the compute tile. On Meteorlake it basically doesn't work. It looks pretty on the die shot, but that's about it. It literally just sits there looking pretty. Intel themselves said it saves a mere 150 mW! My efficient Kabylake Yoga uses 4 W on video playback. I wouldn't care about 0.15 W unless I could get it from a driver update. It's a margin-of-error difference.

Lunar Lake is similar to Meteor Lake (with the differences I described earlier), but with three tiles fused in one to save power.

Lion Cove cores in Arrow Lake can offer a higher IPC because of the bigger L2 cache and lower memory latency. But the difference is pretty small. I expect it to be nearly 5% with the fast memory.
 

DavidC1

Golden Member
Dec 29, 2023
Lunar Lake is similar to Meteor Lake (with the differences I described earlier), but with three tiles fused in one to save power.

Lion Cove cores in Arrow Lake can offer a higher IPC because of the bigger L2 cache and lower memory latency. But the difference is pretty small. I expect it to be nearly 5% with the fast memory.
Lunarlake is not similar to Meteorlake. Details are what count. Why do you keep saying it's similar? The differences are what enable Lunarlake to deliver high battery life, while Meteorlake is mediocre and basically zero advancement over the 2016 Kabylake Yoga I am using right now.
 

TwistedAndy

Member
May 23, 2024
Why do you keep saying it's similar?
Lunar Lake has mostly the same structure, including the NOC fabric, LP E-core island, separate GPU, IO controller, separate memory controller, etc. The most notable difference is the way the P-core cluster and the Side Cache are organized.

At the same time, the physical implementation is different. Three tiles in Meteor Lake become fused into one in Lunar Lake to save power, but this does not significantly affect performance.
 

Hulk

Diamond Member
Oct 9, 1999
Regarding the 14% Lion Cove IPC increase despite the relatively massive architectural changes, I'm wondering if either of the following could be a logical explanation:

1. Is it possible we are starting to approach an IPC limit of some sort due to the maximum amount of parallelism that can be extracted from x86 code? What I'm asking is whether there is a point where, no matter how much wider you make the design and how much smarter the OoO engine becomes, there will still be unused structures due to the inherent serial nature of the code. If we plotted x86 IPC from its inception (IPC vs. time), would the plot be linear or exponential? I realize this depends very much on the software used to gather the data.

2. If #1 is not the case (the sequential nature of the code isn't the main bottleneck), then could it be that the current P-core architecture has maxed out from an IPC point of view and a completely new and different direction is required, something more along the lines of Skymont?
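Question 1 is essentially about the dataflow limit: even with unlimited width, perfect branch prediction and unlimited registers, a core cannot retire instructions faster than the longest chain of dependent instructions allows. A minimal sketch of that bound, assuming every instruction has a fixed latency and ignoring memory, branches and window size (real cores usually hit those limits first):

```python
from functools import lru_cache


def dataflow_ipc_limit(deps: dict[int, list[int]], latency: dict[int, int]) -> float:
    """Upper bound on IPC for an idealised core with unlimited width.

    deps[i] lists the instructions whose results instruction i consumes.
    The bound is instruction count divided by the critical-path length in cycles.
    """
    @lru_cache(maxsize=None)
    def finish(i: int) -> int:
        # An instruction can start only once all of its inputs have finished.
        start = max((finish(d) for d in deps[i]), default=0)
        return start + latency[i]

    critical_path = max(finish(i) for i in deps)
    return len(deps) / critical_path


N = 8
unit_latency = {i: 1 for i in range(N)}
serial = {i: ([i - 1] if i else []) for i in range(N)}  # acc = acc + x[i]
independent = {i: [] for i in range(N)}                  # y[i] = x[i] + 1

print(dataflow_ipc_limit(serial, unit_latency))       # limited by the chain
print(dataflow_ipc_limit(independent, unit_latency))  # limited only by width
```

The serial accumulation stays at an IPC of 1 however wide the machine is, while the independent version scales with width; real code sits somewhere in between, which is one reason widening a core yields diminishing returns on code with long dependency chains.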
 

Henry swagger

Senior member
Feb 9, 2022
Desktop Lion Cove will have higher IPC than Lunar Lake Lion Cove. Wait for Intel to reveal it.
 

ondma

Diamond Member
Mar 18, 2018
What are the prospects for Nova Lake? Is it still a thing? I thought it (Nova Lake link) was supposed to be the biggest architectural change in Intel's history, and expected to bring a huge IPC gain (up to 50%)? Intel definitely needs to step up their game on the P core front.