Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. They are connected through UCIe, not D2D; a first from Intel. I expect a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
| --- | --- | --- | --- | --- |
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
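For anyone checking the bandwidth row, the peak figures follow directly from bus width × transfer rate. Here is a quick sketch of the arithmetic (Python; the specs plugged in are just the nominal values from the table above, with the WCL entry still a rumor):

```python
# Peak memory bandwidth (GB/s) = bus width in bytes * transfer rate in MT/s / 1000.
def peak_bw_gbs(bus_bits: int, mt_per_s: int) -> float:
    return (bus_bits / 8) * mt_per_s / 1000

configs = {
    "ADL-N   64-bit LPDDR5-4800":   (64, 4800),
    "WCL     64-bit LPDDR5-6800 ?": (64, 6800),
    "LNL    128-bit LPDDR5X-8533":  (128, 8533),
    "D9500   64-bit LPDDR5X-10667": (64, 10667),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bw_gbs(bits, rate):6.1f} GB/s")
# Prints ~38.4, ~54.4, ~136.5 and ~85.3 GB/s, matching the table's
# ~55 / 136 / 85.6 GB/s entries for WCL, LNL and the D9500.
```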









With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, called RibbonFET.



 

Attachments: PantherLake.png, LNL.png, INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg, Clockspeed.png

511

Diamond Member
Jul 12, 2024
I don't think 18A is cheaper than Intel 3, but Intel 7 really pushed multi-patterning, while 18A uses more EUV steps than Intel 3, which cancels out some of the cost increases due to PowerVia.

18A is basically Intel 3 with PowerVia and RibbonFET. I'm expecting almost no optical shrinks.
They answered it in the earnings call, and from that we can estimate the 18A wafer price: they said 3x the ASP of an Intel 7 wafer, which they report as being about the same as a TSMC N7 wafer.

You are wrong about the optical shrink though; their own data says 1.3x chip density, the same shrink TSMC got out of N5 vs N3E.

 

DavidC1

Golden Member
Dec 29, 2023
You are wrong about the optical shrink though; their own data says 1.3x chip density, the same shrink TSMC got out of N5 vs N3E.
That's exactly what they are referring to. They are getting 1.3x density with PowerVia. There's no real optical shrinks coming, hence why they are thinking of complementary FETs and stacking dies and wafers.

If they got more than 1.3x they would say it. It's a marketing presentation.

@Hitman928 The whole point was that Intel 7 to 18A wouldn't save costs, and I'm saying that's not the case as Intel 7 uses DUV with multi-patterning and 18A saves some by using EUV. Thus "countering costs of PowerVia".
 

OneEng2

Senior member
Sep 19, 2022
That's exactly what they are referring to. They are getting 1.3x density with PowerVia. There's no real optical shrinks coming, hence why they are thinking of complementary FETs and stacking dies and wafers.

If they got more than 1.3x they would say it. It's a marketing presentation.

@Hitman928 The whole point was that Intel 7 to 18A wouldn't save costs, and I'm saying that's not the case as Intel 7 uses DUV with multi-patterning and 18A saves some by using EUV. Thus "countering costs of PowerVia".
I suspect you are right about BSPDN being the cause of the 1.3x number.

Until Intel moves to High NA, this will likely be the case. So with that in mind, TSMC's A16 (N2 with BSPDN) should provide a healthy density improvement as well ..... correct?

Still, Intel can make more density by doing more EUV multi-pattern passes. Of course, this lowers yield and increases process time (throughput goes down) .... which makes the process EXPENSIVE.

You never get something for nothing in engineering ;).
 

DavidC1

Golden Member
Dec 29, 2023
I suspect you are right about BSPDN being the cause of the 1.3x number.

Until Intel moves to High NA, this will likely be the case. So with that in mind, TSMC's A16 (N2 with BSPDN) should provide a healthy density improvement as well ..... correct?
With 14A, which is coming 2 years after 18A, they said another 20% density gain, with 15% perf, so High NA is further proof of more work for less gain. Future process gains are absolutely crashing to the floor.

There never was any real chance of the general transistor structure reaching 1 nm in size, never mind one made of several atoms. The real limit is around the 10 nm range, and even that comes only with an extreme amount of effort and money.

A hypothetical future computer that is as revolutionary an advancement as the first computer was will likely be a great shift, similar to going from x86 PCs to ARM smartphones. Software will likely not be directly portable between the two.
 

OneEng2

Senior member
Sep 19, 2022
With 14A, which is coming 2 years after 18A, they said another 20% density gain, with 15% perf, so High NA is further proof of more work for less gain. Future process gains are absolutely crashing to the floor.

There never was any real chance of the general transistor structure reaching 1 nm in size, never mind one made of several atoms. The real limit is around the 10 nm range, and even that comes only with an extreme amount of effort and money.

A hypothetical future computer that is as revolutionary an advancement as the first computer was will likely be a great shift, similar to going from x86 PCs to ARM smartphones. Software will likely not be directly portable between the two.
Agree. With each new node we will continue to see less advantage, longer and lower-yielding processes, and more expensive development costs.

As for the discussion that CWF is delayed due to packaging, packaging may well be the next frontier!

It could easily be a bigger differentiator than the lithography process moving forward.
 

jdubs03

Golden Member
Oct 1, 2013
  • Like
Reactions: 511

511

Diamond Member
Jul 12, 2024
That's exactly what they are referring to. They are getting 1.3x density with PowerVia. There's no real optical shrinks coming, hence why they are thinking of complementary FETs and stacking dies and wafers.
If you mean the pitches, then I can't say for sure because the data on 18A has not been released yet.
If they got more than 1.3x they would say it. It's a marketing presentation.
Yes
@Hitman928 The whole point was that Intel 7 to 18A wouldn't save costs, and I'm saying that's not the case as Intel 7 uses DUV with multi-patterning and 18A saves some by using EUV. Thus "countering costs of PowerVia".
While 18A may be costlier than Intel 7, the ASP difference is 3x, so it more than offsets the cost.
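Just to put the offset argument in numbers, here is a back-of-the-envelope sketch (Python). The wafer-price ratio is the 3x earnings-call estimate discussed above; the cumulative Intel 7 -> 18A density gain is a placeholder assumption, not a disclosed figure:

```python
# Rough per-transistor comparison: pricier wafer vs denser node.
# Only the 3x price ratio comes from the thread; the density number is a guess.
wafer_price_ratio = 3.0   # 18A wafer ASP vs Intel 7 wafer ASP (earnings-call estimate)
density_ratio = 2.6       # assumed cumulative logic-density gain, Intel 7 -> 18A (placeholder)

relative_cost_per_transistor = wafer_price_ratio / density_ratio
print(f"18A cost per transistor vs Intel 7: {relative_cost_per_transistor:.2f}x")
# With these placeholder inputs it lands around 1.15x, i.e. the wafer price
# alone doesn't settle whether the node is cheaper or pricier per transistor.
```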
 

511

Diamond Member
Jul 12, 2024
I suspect you are right about BSPDN being the cause of the 1.3x number.
N3E -> N2 is a 1.15x density improvement, brought mainly by GAAFET, and N2 -> A16 is a 1.07-1.1x density increase due to BSPDN; combined, that is near the 1.3x Intel gained with GAAFET + BSPDN together.
Until Intel moves to High NA, this will likely be the case. So with that in mind, TSMC's A16 (N2 with BSPDN) should provide a healthy density improvement as well ..... correct?

Still, Intel can make more density by doing more EUV multi-pattern passes. Of course, this lowers yield and increases process time (throughput goes down) .... which makes the process EXPENSIVE.

You never get something for nothing in engineering ;).
N2 uses EUV multi-patterning as well, though.
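Those steps compound multiplicatively; a quick sanity check of the arithmetic (Python, using only the ratios quoted above):

```python
# Compound TSMC's two steps and compare against Intel's single claimed jump.
n3e_to_n2 = 1.15            # density gain from the GAAFET step
n2_to_a16 = (1.07, 1.10)    # density gain from the BSPDN step, quoted as a range
intel_18a_claim = 1.30      # Intel's claimed gain from GAAFET + BSPDN together

low, high = (n3e_to_n2 * step for step in n2_to_a16)
print(f"N3E -> A16 combined: {low:.2f}x - {high:.2f}x")
print(f"Intel 18A claim:     {intel_18a_claim:.2f}x")
# ~1.23x-1.27x vs 1.30x, i.e. the two roadmaps end up in the same ballpark.
```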
 
  • Like
Reactions: OneEng2

Geddagod

Golden Member
Dec 28, 2021
We should have a better picture from Intel's 18A presentation later today (at least on performance, V/F, etc.). It starts at 11:25am EST. I'm sure the press will be on it quickly.
I'm not sure whether the presentations will be immediately available to the public or only a while after the conference ends. ISSCC even uploads some presentations to YT eventually, but I think they wait a bit after the conference to let the people who attended get their money's worth lol.
Could be on a case-by-case basis, idk, because I have not seen any of the previous presentations online publicly, or even on education-paywalled websites, yet either. The Zen 5, Intel GNR IO die, and Intel's Intel 4 SRAM + TSMC chiplets presentations all concluded like 2 days ago, and there's nothing there yet.
I'd like to believe that they are doing two middle-of-the-road gains in a row rather than 0% one year and 20% a year later.
I would like to believe that too, but idk I lowkey don't see it happening. Who knows though ¯\_(ツ)_/¯
Plus if Cougar Cove gets nothing it starts getting real bad for P cores real quick, especially if the P core successor to Panther Lake is getting only 8-10% again. That means Arctic Wolf may potentially only be ~10% behind the P core including the clock differences, why have them at all then?
I don't think the next P-core in NVL is going to be that bad. I fully expect Intel to blow up core structure capacities again if it means hitting their IPC goals, they lowkey held off in LNC, maybe because the core was more "reworked" than before. Hitting 5.8GHz on 18A-P doesn't sound all that unambitious, and I don't think there will be any negative unexpected surprises.
I'm very curious what the L2 changes are going to look like though, if the "cache sharing" thing causes a drastic increase in L2 cycles for the increased capacity, maybe that might be the unexpected surprise of the NVL P-core, but other than that...
18A is able to choose leaving out PowerVia.
Really? Source?
It matched Zen 5 in David Huang's testing, with LNC in LNL vs Zen 5 in Strix Point; with the horrible uncore of MTL/ARL the core suffered quite a lot.
The core has like the same IPC in ARL-H as in LNL. And even when the tests fit essentially entirely in the uOP cache or core-private caches, LNC's curve in LNL is better than in ARL.
It's not the uncore, it's gotta be something else.
Also, Zen 5's DT skus are a good 10-15% better than mobile Zen 5 in the exact same tests (where the uncore doesn't matter much), I'm not sure if Intel's stuff follows the same rule.
 

jdubs03

Golden Member
Oct 1, 2013
I'm not sure whether the presentations will be immediately available to the public or only a while after the conference ends. ISSCC even uploads some presentations to YT eventually, but I think they wait a bit after the conference to let the people who attended get their money's worth lol.
Could be on a case-by-case basis, idk, because I have not seen any of the previous presentations online publicly, or even on education-paywalled websites, yet either. The Zen 5, Intel GNR IO die, and Intel's Intel 4 SRAM + TSMC chiplets presentations all concluded like 2 days ago, and there's nothing there yet.
You're probably right. There is just so much going on about Intel's future, and 18A is a massive part of it, that I'm thinking there would be a decent number of eyes on it looking for info on yields, perf, etc.
 

Geddagod

Golden Member
Dec 28, 2021
You're probably right. There is just so much going on about Intel's future, and 18A is a massive part of it, that I'm thinking there would be a decent number of eyes on it looking for info on yields, perf, etc.
I definitely agree, which is why I mentioned the part about the case by case basis.
More microarchitectural and physical design details of Zen 5 are nice and all, but not nearly as much of a hot topic, IMO, as 18A and also N2 tbh. I'm not sure whether, if you publish a presentation at ISSCC, you are also allowed to share that presentation publicly at the same time, or anything like that.
I checked on IEEE Xplore: the 2024 conference was Feb 18-22, and the forums and individual articles were published on that website like 3 weeks later.
 

DavidC1

Golden Member
Dec 29, 2023
I don't think the next P-core in NVL is going to be that bad. I fully expect Intel to blow up core structure capacities again if it means hitting their IPC goals, they lowkey held off in LNC, maybe because the core was more "reworked" than before. Hitting 5.8GHz on 18A-P doesn't sound all that unambitious, and I don't think there will be any negative unexpected surprises.
If they have the same design philosophy, and team as Lion Cove, then I don't expect much for NVL. They need the Unified Core to replace it ASAP.

The P core team at their peak during the PM/C2D days would have struggled mightily to compete with the E core team, never mind now. I watched them for years make tiny little changes, with big changes and new ideas coming only when their life was at stake, while the E cores expanded substantially, introduced new ideas, and upended some of them while somehow making it all work. It's stupid to say the E cores caught up because they were behind, when they achieved that performance with new ideas, and efficiently.
The core has like the same IPC in ARL-H as in LNL. And even when the tests fit essentially entirely in the uOP cache or core-private caches, LNC's curve in LNL is better than in ARL.
It's not the uncore, it's gotta be something else.
Simply bad execution. We don't know all the details. There are things like internal firmware optimization and others that matter.

By the way, Robert Hallock is as shady as he has ever been. The Arrow Lake FW update is basically zero gains unless you were testing in a non-consumer-facing scenario as a reviewer and had a pre-BIOS.
Also, Zen 5's DT skus are a good 10-15% better than mobile Zen 5 in the exact same tests (where the uncore doesn't matter much), I'm not sure if Intel's stuff follows the same rule.
How is the curve better in LNL? You mean perf/W where it's affected by a lower power idle? That will affect the curve, plus LNL would be ideally more optimized for lower power over peak clocks. So if you have 10W TDP and one uses 0.5W less idle, then that'll drop to 9.5W.

It's not big like 10-15% but it exists. It's more like 5-7% on average from memory, and it's been like that forever. Saying Redwood Cove is slower than Raptor Cove based on Meteor Lake was misleading. There's some evidence of slowdown but it's not as great as people think because of the DT vs MB difference.

It's not just memory latency. It's all the power management quirks throughout the system that slows it down. NVMe power management, graphics power management, CPU power management, etc.
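As a toy illustration of the idle-power point above (Python; every number here is invented purely for illustration), a small fixed difference in the platform's idle/uncore floor visibly shifts a package-level perf/W comparison even when the core itself is identical:

```python
# Same hypothetical core, two platforms that differ only in a fixed idle/uncore floor.
core_points = [(2, 1000), (4, 1400), (6, 1700), (8, 1900), (10, 2050)]  # (core W, score)

for label, idle_w in [("higher-idle platform", 1.5), ("lower-idle platform", 1.0)]:
    print(label)
    for core_w, score in core_points:
        package_w = core_w + idle_w
        print(f"  {package_w:4.1f} W package -> {score / package_w:6.1f} pts/W")
# The lower-idle platform wins at every point, and by the largest margin at the
# low-power end of the curve, with zero difference at the core level.
```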
 

Geddagod

Golden Member
Dec 28, 2021
If they have the same design philosophy, and team as Lion Cove, then I don't expect much for NVL. They need the Unified Core to replace it ASAP.
I think their design philosophy changes/has changed. SNC and GLC were huge tock cores that increased internal structure capacity at a much higher % in most aspects than LNC did vs GLC. I kinda expect PTC or whatever the name for the P core in NVL is to be more like that than another LNC.
I just really don't see how Intel expects to be competitive if they don't do that. It may be at the expense of area, and power at the low end of the curve, but I still expect them to do it, just like they did with SNC.
Simply bad execution. We don't know all the details. There are things like internal firmware optimization and others that matter.
LNC in LNL apparently has a slightly tweaked physical layout but even that doesn't explain the magnitude of the difference I think.
By the way, Robert Hallock is as shady as he has ever been. The Arrow Lake FW update is basically zero gains unless you were testing in a non-consumer-facing scenario as a reviewer and had a pre-BIOS.
SMH
How is the curve better in LNL? You mean perf/W where it's affected by a lower power idle? That will affect the curve, plus LNL would be ideally more optimized for lower power over peak clocks. So if you have 10W TDP and one uses 0.5W less idle, then that'll drop to 9.5W.
This shouldn't affect core-only power measurements though. Total package power, sure.
It's not big like 10-15% but it exists. It's more like 5-7% on average from memory, and it's been like that forever. Saying Redwood Cove is slower than Raptor Cove based on Meteor Lake was misleading. There's some evidence of slowdown but it's not as great as people think because of the DT vs MB difference.
The problem is that when isolating it to tests that run mostly from core private caches, the different memory configs or different L3 setups in mobile platforms vs desktop platforms shouldn't matter.
It's not just memory latency. It's all the power management quirks throughout the system that slows it down. NVMe power management, graphics power management, CPU power management, etc.
None of this should matter for measurements of just core power though. Maybe core only firmware immaturity may be a reason for ARL-S's curves, but Intel had plenty of time to get it right by ARL-H.
 


OneEng2

Senior member
Sep 19, 2022
I think their design philosophy changes/has changed. SNC and GLC were huge tock cores that increased internal structure capacity at a much higher % in most aspects than LNC did vs GLC. I kinda expect PTC or whatever the name for the P core in NVL is to be more like that than another LNC.
I just really don't see how Intel expects to be competitive if they don't do that. It may be at the expense of area, and power at the low end of the curve, but I still expect them to do it, just like they did with SNC.
I think that the days of buying their way out of design problems are over at Intel. Intel can't afford to raise the area. Arguably, they can't afford to make ARL and LNL profitably as it is.

I agree, 20 years ago, Intel would have thrown caution (and profit) to the wind and simply reduced their profit margin in the interest of maintaining market share. I don't think they can do that any longer.

In fact, I think that the true price of all that equipment is starting to be more and more visible to the many entities that are now looking closely at Intel (both internal and external).

You can't spend the equivalent of a US Ford aircraft carrier developing a new process node every 18 months without accounting for that cost in your product.
 

511

Diamond Member
Jul 12, 2024
I think that the days of buying their way out of design problems are over at Intel. Intel can't afford to raise the area. Arguably, they can't afford to make ARL and LNL profitably as it is.

I agree, 20 years ago, Intel would have thrown caution (and profit) to the wind and simply reduced their profit margin in the interest of maintaining market share. I don't think they can do that any longer.

In fact, I think that the true price of all that equipment is starting to be more and more visible to the many entities that are now looking closely at Intel (both internal and external).

You can't spend the equivalent of a US Ford aircraft carrier developing a new process node every 18 months without accounting for that cost in your product.
Yes, the area efficiency is lacking in the P cores and Arc GPUs. For P cores of this size, they should have the best 1T PPA to justify the size.
 

Geddagod

Golden Member
Dec 28, 2021
I think that the days of buying their way out of design problems are over at Intel. Intel can't afford to raise the area. Arguably, they can't afford to make ARL and LNL profitably as it is.

I agree, 20 years ago, Intel would have thrown caution (and profit) to the wind and simply reduced their profit margin in the interest of maintaining market share. I don't think they can do that any longer.

In fact, I think that the true price of all that equipment is starting to be more and more visible to the many entities that are now looking closely at Intel (both internal and external).

You can't spend the equivalent of a US Ford aircraft carrier developing a new process node every 18 months without accounting for that cost in your product.
I think even if they increase core area a decent bit, the fact that so much of the total chiplet is still SRAM means that looking at the total CCD area %, it wouldn't be that bad for margins.
Plus, apparently NVL's P cores cut down the L2 cache capacity a bit since they are doing the shared P-core L2 clusters, so that might give them some leeway in increasing core area again.
Also lol are you OneEng on semiwiki forums?
 

511

Diamond Member
Jul 12, 2024
I think even if they increase core area a decent bit, the fact that so much of the total chiplet is still SRAM means that looking at the total CCD area %, it wouldn't be that bad for margins.
Plus, apparently NVL's P cores cut down the L2 cache capacity a bit since they are doing the shared P-core L2 clusters, so that might give them some leeway in increasing core area again.
Where was it leaked that they cut SRAM ?
Also lol are you OneEng on semiwiki forums?
I think yes lol
 

DavidC1

Golden Member
Dec 29, 2023
Plus, apparently NVL's P cores cut down the L2 cache capacity a bit since they are doing the shared P-core L2 clusters, so that might give them some leeway in increasing core area again.
Sounds like they are readying for the Unified Core led by the E core team, because that's what they've been doing. Shared L2 is for reducing size, hence why they used dual setups since Silvermont and then moved to quad setups with Goldmont.
I kinda expect PTC or whatever the name for the P core in NVL is to be more like that than another LNC.
Are you saying you expect core size increase to be quite a bit greater to get more performance?
The problem is that when isolating it to tests that run mostly from core private caches, the different memory configs or different L3 setups in mobile platforms vs desktop platforms shouldn't matter.

None of this should matter for measurements of just core power though. Maybe core only firmware immaturity may be a reason for ARL-S's curves, but Intel had plenty of time to get it right by ARL-H.
There are details that we don't know. Even back in the Core 2 days they were talking about having different internal firmware optimized for the segment.

If the differences are that big, then it can only be explained by that they screwed up something in the implementation in Arrowlake, because Lunarlake was basically a year away from it yet Arrowlake actually arrived later.

How do you explain utterly strange results in gaming, where uarch differences are essentially amplified, with a 1P+16E configuration being faster than 8P? The P core has a ~10% perf/clock advantage plus quite a bit higher clocks. There is no way it should lose in gaming to the E cores with their cluster-shared L2, which are quite slow too.
 

OneEng2

Senior member
Sep 19, 2022
I think even if they increase core area a decent bit, the fact that so much of the total chiplet is still SRAM means that looking at the total CCD area %, it wouldn't be that bad for margins.
Plus, apparently NVL's P cores cut down the L2 cache capacity a bit since they are doing the shared P-core L2 clusters, so that might give them some leeway in increasing core area again.
Also lol are you OneEng on semiwiki forums?
Guilty ;)

This is the fundamental issue facing Intel (and others).

Process tech changes have become prohibitively expensive and as a result, will occur less and less frequently.

Each successive process tech change is providing smaller and smaller density and power improvements.

The market price for consumer products is stagnant ... or decreasing.

These factors create a huge issue. CPU manufacturers can't raise performance very quickly. Performance through innovative and clever engineering is also reaching the point of diminishing returns (or beyond), e.g. if you go from no AVX to AVX256 you get a big bump, but going from 256 to 512 you get less. Moving to 1024 would likely be much less, yet cost much, much more in die size.
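That diminishing-returns pattern falls straight out of an Amdahl's-law view of vector width; a small sketch (Python, with an assumed 80% vectorizable fraction, purely illustrative):

```python
# Amdahl-style model: only the vectorizable fraction of the work scales with SIMD width.
VECTOR_FRACTION = 0.80  # assumed fraction of runtime that vectorizes (illustrative)

def speedup(width_factor: float, frac: float = VECTOR_FRACTION) -> float:
    return 1.0 / ((1.0 - frac) + frac / width_factor)

prev = speedup(1)
for width in (2, 4, 8):  # e.g. 128 -> 256 -> 512 -> 1024-bit relative widths
    s = speedup(width)
    print(f"{width}x wider SIMD: {s:.2f}x total, only {s / prev:.2f}x over the previous step")
    prev = s
# Each doubling buys a smaller increment (1.67x, then 1.50x, then 1.33x here),
# while the vector datapath area roughly doubles every time.
```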

IMO, packaging and chiplet tech is one of the most promising paths to future upgrades, as different chiplets can easily be combined for different product requirements, but improvements in lithography? Not so much.
 

DavidC1

Golden Member
Dec 29, 2023
If you go from NO AVX, to AVX256 you get a big bump, but going from 256 to 512 you get less. Moving to 1024 would likely be much less, yet cost much much more in die size.
The primary limiter is power, rather than die size. That's why it took 10nm Ice Lake before they could make a chip without much clock speed degradation. AVX2 was already large, so unlike preceding generations, one process gen wasn't enough to make up for the power increase when you doubled it up again, which is basically AVX512.

AVX1024 would have basically continued the insanity of Intel trying to stem Nvidia GPUs from encroaching on their space, while performing much less on the applications that GPUs are better at anyway.

At some point they have to ask themselves: "Do we want to make a general purpose CPU or a move to a more specialized one?"

I'd say based on how successful Nvidia is and how long it took for them to make AVX512 work, even the 512-bit extension was a waste. AVX512 should have been AVX3-256, basically half the width but the same instructions.

I have to wonder how much of Intel's past ISA extensions like MMX, SSE, and AVX were an arbitrary thing to keep AMD struggling all these years versus actual innovation. From a general perspective, I know only SSE4 and below are widely used. Had they made an ISA with actually good instructions from the beginning, they could have just doubled the number of FP units and benefited EVERYBODY.
 

MS_AT

Senior member
Jul 15, 2024
Moving to 1024 would likely be much less, yet cost much much more in die size.
That would also require wider cache lines; otherwise you would need 2 loads to fill a register (I mean 2 bus transfers), which would cause a ripple across the whole core design rather than just the FPU.
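A quick check of the numbers behind that (Python; the 64-byte line size is just the current x86 convention):

```python
# How many 64-byte cache lines does a single full-register load touch at each width?
CACHE_LINE_BYTES = 64

for reg_bits in (256, 512, 1024):
    reg_bytes = reg_bits // 8
    lines = -(-reg_bytes // CACHE_LINE_BYTES)  # ceiling division
    print(f"{reg_bits:4d}-bit register = {reg_bytes:3d} B -> {lines} cache line(s) per load")
# 256-bit (32 B) and 512-bit (64 B) registers fit in one line; a 1024-bit (128 B)
# register needs two, hence wider lines or two transfers per register fill.
```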
 
  • Like
Reactions: 511

OneEng2

Senior member
Sep 19, 2022
The primary limiter is power, rather than die size. That's why it took 10nm Ice Lake before they could make a chip without much clock speed degradation. AVX2 was already large, so unlike preceding generations, one process gen wasn't enough to make up for the power increase when you doubled it up again, which is basically AVX512.

AVX1024 would have basically continued the insanity of Intel trying to stem Nvidia GPUs from encroaching on their space, while performing much less on the applications that GPUs are better at anyway.

At some point they have to ask themselves: "Do we want to make a general purpose CPU or a move to a more specialized one?"

I'd say based on how successful Nvidia is and how long it took for them to make AVX512 work, even the 512-bit extension was a waste. AVX512 should have been AVX3-256, basically half the width but the same instructions.

I have to wonder how much of Intel's past ISA extensions like MMX, SSE, and AVX were an arbitrary thing to keep AMD struggling all these years versus actual innovation. From a general perspective, I know only SSE4 and below are widely used. Had they made an ISA with actually good instructions from the beginning, they could have just doubled the number of FP units and benefited EVERYBODY.
I have thought this as well over the decades Intel has been doing it.

I think the future is more chiplets! Specialized processing units combined for different tasks, all well integrated and optimized for a more specific purpose.

The chiplet sizes must be kept small for yield/cost, and latency must be handled well with design and packaging techniques.

Intel is obviously very late to this realization and has some catching up to do.
 
  • Like
Reactions: igor_kavinski