Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing CPU, GPU and NPU, fabbed on the Intel 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first for Intel. I expect a launch at Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

Spec          | Intel Alder Lake-N | Intel Wildcat Lake       | Intel Lunar Lake         | Mediatek D9500
Launch Date   | Q1-2023            | Q2-2026 ?                | Q3-2024                  | Q3-2025
Model         | Intel N300         | ?                        | Core Ultra 7 268V        | Dimensity 9500 5G
Dies          | 2                  | 2                        | 2                        | 1
Node          | Intel 7 + ?        | Intel 18A + TSMC N6      | TSMC N3B + N6            | TSMC N3P
CPU           | 8 E-cores          | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4
Threads       | 8                  | 6                        | 8                        | 8
Max CPU Clock | 3.8 GHz            | ?                        | 5 GHz                    | -
L3 Cache      | 6 MB               | ?                        | 12 MB                    | -
TDP           | 7 W                | Fanless ?                | 17 W                     | Fanless
Memory        | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ?     | 128-bit LPDDR5X-8533     | 64-bit LPDDR5X-10667
Size          | 16 GB              | ?                        | 32 GB                    | 24 GB ?
Bandwidth     | -                  | ~55 GB/s                 | 136 GB/s                 | 85.6 GB/s
GPU           | UHD Graphics       | ?                        | Arc 140V                 | G1 Ultra
EU / Xe       | 32 EU              | 2 Xe                     | 8 Xe                     | 12
Max GPU Clock | 1.25 GHz           | ?                        | 2 GHz                    | -
NPU           | NA                 | 18 TOPS                  | 48 TOPS                  | 100 TOPS ?
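The bandwidth rows follow directly from bus width × transfer rate (a 64-bit bus moves 8 bytes per transfer). A quick Python sanity check against the table's figures; note that 64-bit LPDDR5-6800 works out to ~54.4 GB/s, which matches the ~55 GB/s entry:

```python
def peak_bandwidth_gbs(bus_bits: int, mega_transfers: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: (bus width in bytes) * MT/s / 1000."""
    return bus_bits / 8 * mega_transfers / 1000

# Lunar Lake: 128-bit LPDDR5X-8533 -> ~136.5 GB/s (table: 136 GB/s)
print(round(peak_bandwidth_gbs(128, 8533), 1))
# Dimensity 9500: 64-bit LPDDR5X-10667 -> ~85.3 GB/s (table: 85.6 GB/s)
print(round(peak_bandwidth_gbs(64, 10667), 1))
# WCL rumor: 64-bit LPDDR5-6800 -> ~54.4 GB/s (table: ~55 GB/s)
print(round(peak_bandwidth_gbs(64, 6800), 1))
```

These are theoretical peaks; sustained bandwidth in practice lands well below them.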









With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, called RibbonFET.





dullard

Elite Member
May 21, 2001
Cyberpunk, TLOU Part 1, Starfield, Star Citizen, Spiderman Remastered, Dragon's Dogma 2.

12 P cores: no cross-latency penalty and no big.little scheduling quirks.

Intel had it with Comet Lake.
Really, you think Comet Lake did well with more cores? Let's look at just one of your own examples: Starfield.
https://www.pcgameshardware.de/Star...benchmark-requirements-anforderungen-1428119/
  • 10 P core Comet Lake 10900K: 49.2 FPS.
  • The much dissed 8 P core Rocket Lake 11900K: 62.6 FPS. 27% faster on the same node with 2 fewer P cores.
  • Or a more modern CPU: 6 P core Raptor Lake 13600K: 93.3 FPS. 89% faster with 4 fewer P cores.
Your idea of more P cores doesn't even work with your own chosen example.
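For what it's worth, the percentages quoted above are plain ratios of the listed FPS figures; a two-line check:

```python
def pct_faster(new_fps: float, old_fps: float) -> float:
    """How much faster new_fps is than old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0

print(round(pct_faster(62.6, 49.2), 1))  # 11900K vs 10900K -> 27.2 (quoted as 27%)
print(round(pct_faster(93.3, 49.2), 1))  # 13600K vs 10900K -> 89.6 (quoted as 89%)
```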
 

Abwx

Lifer
Apr 2, 2011
Really, you think Comet Lake did well with more cores? Let's look at just one of your own examples: Starfield.
https://www.pcgameshardware.de/Star...benchmark-requirements-anforderungen-1428119/
  • 10 P core Comet Lake 10900K: 49.2 FPS.
  • The much dissed 8 P core Rocket Lake 11900K: 62.6 FPS. 27% faster on the same node with 2 fewer P cores.
  • Or a more modern CPU: 6 P core Raptor Lake 13600K: 93.3 FPS. 89% faster with 4 fewer P cores.
Your idea of more P cores doesn't even work with your own chosen example.
But as time goes by and games become friendlier to more cores, things can change unexpectedly, and that's at 720p, which artificially increases the differences:


 


Wolverine2349

Senior member
Oct 9, 2022
Really, you think Comet Lake did well with more cores? Let's look at just one of your own examples: Starfield.
https://www.pcgameshardware.de/Star...benchmark-requirements-anforderungen-1428119/
  • 10 P core Comet Lake 10900K: 49.2 FPS.
  • The much dissed 8 P core Rocket Lake 11900K: 62.6 FPS. 27% faster on the same node with fewer cores.
  • Or a more modern CPU: 6 P core Raptor Lake 13600K: 93.3 FPS. 89% faster with 4 fewer P cores.
Your idea of more P cores doesn't even work with your own chosen example.

Well, of course it does worse; Comet Lake's IPC is far behind any modern CPU. My Comet Lake example was just to point out that more than 8 cores can be put on a single ring bus: Comet Lake had 10, and it is the last CPU to do so, since unfortunately no modern arch has had that since, and IPC has jumped massively beyond that Skylake-based derivative.

Fewer but newer, faster cores (at least 6 of them) will beat a larger number of slower cores, especially if the latter are much slower.

But bring up the IPC and modernize things, and the picture changes a lot: Raptor Cove and Golden Cove IPC is something like 50% better than Comet Lake's.

For games that can benefit a little from more cores, a few more fast ones will be of some help, possibly a big help in rare cases, and will likely help more down the line. But not Comet Lake: its IPC is slow even compared to vanilla Zen 3, let alone vanilla Zen 4, Golden Cove and Raptor Cove. Comet Lake just proves Intel can easily put more than 8 cores on a single ring bus; its IPC and PCIe generation are so far behind that it should not itself be chosen over modern 8-core counterparts, though it even beats modern 4-core and sometimes 6-core parts in some thread-heavy games.

 

Wolverine2349

Senior member
Oct 9, 2022
But as time goes by and games become friendlier to more cores, things can change unexpectedly, and that's at 720p, which artificially increases the differences:



Yes, exactly, good point. No game gets a massive benefit from more than 8 cores today, and 8 cores is by no means unplayable; not yet. But some games are starting to see marginal or even modest benefit in some situations from more than 8 cores, though they need to be modern CPUs, not old ones.
 

dullard

Elite Member
May 21, 2001
But as time goes by and games become friendlier to more cores, things can change unexpectedly, and that's at 720p, which artificially increases the differences:


https://www.computerbase.de/2024-03...eistungsaufnahme_in_spielen_rekordverdaechtig
Even in that example with higher resolution, the 6 P-core chip (14500) wins over the 10 P-core chip. Core counts over 8 just don't matter for very many games. The more-cores-for-gaming debate has been going on for what, ~10 years? And we still don't see these future games that really need more than 8 cores.

I have multiple relatives in the gaming business (yes, you have heard of the games and most gamers have probably played them; no, I will not mention names). They always design for the bare minimum core count of all current consoles. Right now, that is 8 cores.

Yes, you could find a rare PC-focused game that could use more cores. But until both the XBox and Playstation have more than 8 cores, you won't see very many games using more cores very well. And that is even if they can figure out how to use more cores (many types of games are already reaching the end of easy ability to utilize more cores). Simulation games might be an exception going into the future--where you can add more things to simulate in bigger worlds.
 

Wolverine2349

Senior member
Oct 9, 2022
Even in that example with higher resolution, the 6 P-core chip (14500) wins over the 10 P-core chip. Core counts over 8 just don't matter for very many games. The more-cores-for-gaming debate has been going on for what, ~10 years? And we still don't see these future games that really need more than 8 cores.

I have multiple relatives in the gaming business (yes you have heard of the games and most gamers probably played them, no I will not mention names). They always design for the bare minimum core count of all current consoles. Right now, that is 8 cores.

Yes, you could find a rare PC-focused game that could use more cores. But until both the XBox and Playstation have more than 8 cores, you won't see very many games using more cores very well. And that is even if they can figure out how to use more cores (many types of games are already reaching the end of easy ability to utilize more cores). Simulation games might be an exception going into the future--where you can add more things to simulate in bigger worlds.

You have some good points. However, 10 years ago is not a great reference point. There were zero games 10 years ago that benefited at all from more than 4 cores; I mean zero. As time went on, a few games in the mid-2010s started to benefit a little, but still not a lot. By the late 2010s, 6 cores started to benefit a lot, and 4 cores is a struggle and a stutter-fest now.

No, 8 cores will not be obsolete anytime soon. But that does not mean more cores will not benefit some games, more and more over time. Consoles are not always the only factor; they also have far less overhead running and render games at reduced settings, since they are limited.

Games like Starfield, TLOU Part 1, Spiderman Remastered and Cyberpunk benefit in some situations from more than 8 cores; just turning off the E-cores hurts in those games. Yes, E-cores help in games simply by being more cores, and the cross-latency is not nearly as bad as AMD's. Still, for set-and-forget use (no APO, no Process Lasso, no scheduling gimmicks, and P cores with better latency) it would be so nice to have a 12 P-core, one-ring-bus CPU. Hopefully Intel delivers it, but if they do not, oh well, it is what it is.
 

Abwx

Lifer
Apr 2, 2011
Even in that example with higher resolution, the 6 P-core chip (14500) wins over the 10 P-core chip. Core counts over 8 just don't matter for very many games. The more-cores-for-gaming debate has been going on for what, ~10 years? And we still don't see these future games that really need more than 8 cores.

I have multiple relatives in the gaming business (yes you have heard of the games and most gamers probably played them, no I will not mention names). They always design for the bare minimum core count of all current consoles. Right now, that is 8 cores.

Yes, you could find a rare PC-focused game that could use more cores. But until both the XBox and Playstation have more than 8 cores, you won't see very many games using more cores very well. And that is even if they can figure out how to use more cores (many types of games are already reaching the end of easy ability to utilize more cores). Simulation games might be an exception going into the future--where you can add more things to simulate in bigger worlds.

Looking at the charts, it's obvious that there's more than core count at play here: we can see that the 6C 5600 almost matches the 8C 5800X, but both have the same 32 MB cache.

On the other hand, the 6C 10600K is not as close to the 10900K; in that case it's surely the 12 MB vs 20 MB cache that makes the difference and gets the latter close to the 16 MB-equipped 11900K, since both are clocked identically.

Apparently more cache can compensate for some IPC difference when it comes to games, much less so in applications.
 

Hulk

Diamond Member
Oct 9, 1999
Well, as long as it's stable and does not degrade (decent chance with a new stepping and die), it will be a gaming monster: desirable for thread-heavy and lightly threaded games alike, a single set-and-forget solution for all game types for those who hate APO and Process Lasso and want all cores on one die.

Especially given Zen 5 is an underwhelming flop with at best a 5% gaming uplift over vanilla Zen 4, while current Raptor Lake already trades blows with, or sits only slightly behind, the 7800X3D in gaming. And the 7800X3D and vanilla Zen 4 and 5 have only 8 cores on one CCX within a CCD. Yes, current RPL has bad stability and degradation, but if that is fixed with a 12+0 Bartlett Lake die, it's game on.
It will be interesting to see if a 12 Raptor Cove core processor will be more performant than ARL in games, keeping in mind that ARL will have 8 cores that are more performant than Raptor Cove, plus 16 additional cores, which may have the IPC of Raptor Cove with a 20% clock deficit.

I am in the camp that sees no use for a 12 Raptor Cove core part (vs. 8+16 ARL) but am willing to be proven wrong.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Even if a game is using 8 threads or more, it's not like every thread loads its assigned core the same way, so the fps should depend on the most heavily used core.

@DavidC1 Yeah, exactly. So in theory even E-cores should be good enough for the weaker threads, although in reality it's the opposite.
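The point that fps tracks the most heavily loaded thread can be sketched as a toy model: frame time is bounded by the slowest (critical) thread, so speeding up the light threads barely moves the result. The per-thread workloads below are invented purely for illustration.

```python
# Toy model: per-frame CPU work (ms) for each game thread.
# The frame cannot finish before its longest-running thread does.
frame_work_ms = {"render": 10.0, "game_logic": 6.0, "audio": 1.0, "streaming": 2.0}

def fps(work_ms: dict) -> float:
    """Frames per second when frame time equals the critical thread's work."""
    return 1000.0 / max(work_ms.values())

print(fps(frame_work_ms))  # bounded by the 10 ms render thread -> 100.0 fps

# Halving every *light* thread changes nothing; only the critical thread matters.
lighter = {k: (v if k == "render" else v / 2) for k, v in frame_work_ms.items()}
print(fps(lighter))        # still 100.0 fps
```

This is why, in theory, weak cores suffice for the light threads: they only hurt if one of them ends up holding the critical thread.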
 

DavidC1

Golden Member
Dec 29, 2023
It will be interesting to see if a 12 Raptor Cove core processor will be more performant than ARL in games keeping in mind that ARL will have 8 cores that are more performant than Raptor Cove, and 16 additional cores, which may have the IPC of Raptor Cove with a 20% clock deficit.

I am in the camp that sees no use for a 12 Raptor Cove core part (vs. 8+16 ARL) but am willing to be proven wrong.
In the Alder/Raptor generation, the P cores were on average 40-45% faster than the E cores, plus they had the clock speed advantage, meaning ST differences were roughly 2x. Games are also one of the usage scenarios where the SIMD units get used, which Gracemont was really weak at.

With the top-bin Arrow Lake it's 5.7 GHz vs 4.6 GHz in 1T, but all-core is 5.4 vs 4.6 GHz, with the gap in per-thread and clock performance only about 11-12%.

Going from a 100% gap to 30% is going to make a big change; the worst-case scenarios will end up better. I actually think this is why they waited until Gracemont for a full hybrid CPU: Tremont was too far from being viable.
Even if a game is using 8 threads or more, it's not like every thread loads its assigned core the same way.
This is the real reason why uarch is still king. You are talking 1, 2, maybe up to 4 threads for the big jobs, and the rest are small.
 

Wolverine2349

Senior member
Oct 9, 2022
It will be interesting to see if a 12 Raptor Cove core processor will be more performant than ARL in games keeping in mind that ARL will have 8 cores that are more performant than Raptor Cove, and 16 additional cores, which may have the IPC of Raptor Cove with a 20% clock deficit.

I am in the camp that sees no use for a 12 Raptor Cove core part (vs. 8+16 ARL) but am willing to be proven wrong.

It may not be, but it will be the best set-and-forget option for all types of games, with no APO or Process Lasso needed for the big.little heterogeneous arch in the games that do not like it. For games with no big.little issues Arrow Lake will perform better, but the gap will not be nearly as huge as from Zen 3 to Zen 3 X3D, or vanilla Zen 4 to Zen 4 X3D, let alone Zen 2 to Zen 3; probably more like the gap from Alder Lake to Raptor Lake.

I would rather have something that just works perfectly for all games and is marginally worse at gaming than deal with big.little and APO in the cases where it may be needed.

Also keep in mind that the E-cores, even with IPC on par with Raptor Cove, have worse latency and sit in clusters of 4, adding a high-latency hop to reach them, rather than every core sitting directly on the ring bus.
 

Wolverine2349

Senior member
Oct 9, 2022
And I thought it was my turn! ;)


True. Some games may work better with 12P than 8P+16E. But buying a 12-core processor over a 24-core processor just for a small subset of games may not be worth it for most.

For me it is, especially since I have wanted it for so long and IPC is not improving much, if Zen 5 is any indication. And I want to stay on Win10 as long as possible. I hate Win11 and all the forced stuff they shove down your throat, like the Copilot AI spy garbage.
 

Hulk

Diamond Member
Oct 9, 1999
It may not be, but it will be the best set-and-forget option for all types of games, with no APO or Process Lasso needed for the big.little heterogeneous arch in the games that do not like it. For games with no big.little issues Arrow Lake will perform better, but the gap will not be nearly as huge as from Zen 3 to Zen 3 X3D, or vanilla Zen 4 to Zen 4 X3D, let alone Zen 2 to Zen 3; probably more like the gap from Alder Lake to Raptor Lake.

I would rather have something that just works perfectly for all games and is marginally worse at gaming than deal with big.little and APO in the cases where it may be needed.

Also keep in mind that the E-cores, even with IPC on par with Raptor Cove, have worse latency and sit in clusters of 4, adding a high-latency hop to reach them, rather than every core sitting directly on the ring bus.
Admittedly I don't really game, but if you have, say, a 14900K, what is the difference in fps or the gaming experience if you run stock vs using Process Lasso to shut down the E's for some games?
 

Wolverine2349

Senior member
Oct 9, 2022
Admittedly I don't really game, but if you have, say, a 14900K, what is the difference in fps or the gaming experience if you run stock vs using Process Lasso to shut down the E's for some games?

I threw in the towel and sold or returned every 13th/14th Gen part because of the stability and degradation issues, so I never really tested it.
I have a 7800X3D right now.

But if they ship a 12 P-core single-ring-bus Raptor Lake with the stability issues fixed (I would think so, given the new die; otherwise Intel would not make it), I am for sure a buyer. If it has the stability and degradation issues of the 8+16 RPL die, it's a hard pass, despite me wanting such a product.

I wish, as a compromise, for a stronger 8-core IPC uplift. Zen 5 was supposed to have that, but it is a flop with only a 5% or even smaller gaming uplift over vanilla Zen 4, so Zen 5 X3D will hardly be better than Zen 4 X3D. And there is still a max of 8 cores per CCX within a CCD, so a much stronger 8 cores is needed to compensate for not having more, for better future-proofing against ever more thread-demanding games.
Hopefully, if the 12 P-core Bartlett never comes, games do not start scaling to many more cores than they already use, or we will have to deal with dual-CCD or big.little scheduling issues.
 

ondma

Diamond Member
Mar 18, 2018
I threw in the towel and sold or returned every 13th/14th Gen part because of the stability and degradation issues, so I never really tested it.
I have a 7800X3D right now.

But if they ship a 12 P-core single-ring-bus Raptor Lake with the stability issues fixed (I would think so, given the new die; otherwise Intel would not make it), I am for sure a buyer. If it has the stability and degradation issues of the 8+16 RPL die, it's a hard pass, despite me wanting such a product.

I wish, as a compromise, for a stronger 8-core IPC uplift. Zen 5 was supposed to have that, but it is a flop with only a 5% or even smaller gaming uplift over vanilla Zen 4, so Zen 5 X3D will hardly be better than Zen 4 X3D. And there is still a max of 8 cores per CCX within a CCD, so a much stronger 8 cores is needed to compensate for not having more, for better future-proofing against ever more thread-demanding games.
Hopefully, if the 12 P-core Bartlett never comes, games do not start scaling to many more cores than they already use, or we will have to deal with dual-CCD or big.little scheduling issues.
This is an interesting point. You seem to be extremely concerned about the cross-CCX penalty for AMD, but I don't really see any data on what the penalty actually is. Do you have actual personal experience with this, or any data to support it? I mean, isn't the 7950X at least as fast as the vanilla 7800X? So where is the penalty?

Edit: I seem to remember AMD saying the 9xxx X3D would have some improvements over previous versions. Maybe it will show more uplift over Zen 4 than the vanilla versions did. Of course, Zen 5 didn't come close to the gaming uplift AMD claimed, so???
 

Wolverine2349

Senior member
Oct 9, 2022
This is an interesting point. You seem to be extremely concerned about the cross-CCX penalty for AMD, but I don't really see any data on what the penalty actually is. Do you have actual personal experience with this, or any data to support it? I mean, isn't the 7950X at least as fast as the vanilla 7800X? So where is the penalty?

Look at the benchmarks. The 7700X was much better than the 7950X in gaming in many games; the 1% lows tank a lot with the cross-CCX/CCD penalty. It's like a 400% penalty. I mean, the 7950X is really just two 7700X CPUs glued together, the same way the 5950X is really just two 5800Xs glued together: kind of like a dual-socket system. The scheduling issues are a mess too; just look at the 9950X vs the 9700X, much worse in gaming.

There is a reason why the 5900X and 7900X are considered worse for gaming than the 7700X and 5800X.

Plus, with 12 P cores on a single die you can shut off HT and get the best possible latency and clocks. Try doing that when traffic has to cross a CCX/CCD boundary: it has to cross even more and performance tanks. On a single ring with enough cores, HT off causes no such problem.
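The penalty being argued about here is a latency cliff, which a toy core-to-core model makes concrete. The 20 ns and 80 ns figures below are illustrative round numbers, roughly in line with published core-to-core latency plots for dual-CCD Ryzen parts; exact values vary by platform and clocks.

```python
# Toy core-to-core latency model for a 2-CCD part (8 cores per CCD).
# The latency constants are illustrative, not measured values.
INTRA_CCD_NS = 20.0   # same-CCD hop over the local fabric
CROSS_CCD_NS = 80.0   # hop through the IO die between CCDs

def c2c_latency_ns(core_a: int, core_b: int, ccd_size: int = 8) -> float:
    """Latency for core_a to reach core_b's cache line, in nanoseconds."""
    if core_a == core_b:
        return 0.0
    same_ccd = core_a // ccd_size == core_b // ccd_size
    return INTRA_CCD_NS if same_ccd else CROSS_CCD_NS

print(c2c_latency_ns(0, 7))   # same CCD  -> 20.0
print(c2c_latency_ns(0, 8))   # cross CCD -> 80.0, the roughly 4x cliff
```

On a single-ring design every pair of cores stays on the cheap side of this cliff, which is the whole appeal of a 12 P-core, one-ring part.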
 

ondma

Diamond Member
Mar 18, 2018
Look at the benchmarks. The 7700X was much better than the 7950X in gaming in many games; the 1% lows tank a lot with the cross-CCX/CCD penalty. It's like a 400% penalty. I mean, the 7950X is really just two 7700X CPUs glued together, the same way the 5950X is really just two 5800Xs glued together: kind of like a dual-socket system. The scheduling issues are a mess too; just look at the 9950X vs the 9700X, much worse in gaming.

There is a reason why the 5900X and 7900X are considered worse for gaming than the 7700X and 5800X.

Plus, with 12 P cores on a single die you can shut off HT and get the best possible latency and clocks. Try doing that when traffic has to cross a CCX/CCD boundary: it has to cross even more and performance tanks. On a single ring with enough cores, HT off causes no such problem.
Here is a recent 13-game average from Techspot. I just don't see the results you are claiming. Yes, for the X3D chips the single CCD is best because you don't have to worry about the gaming threads landing on the wrong CCD. However, for vanilla Zen 4 the best framerate belongs to the 7950X, and its 1% lows don't suffer either. The 7700X is actually slower than the 7900X as well (admittedly there is a slight clockspeed advantage for the 79xx chips), but I don't really see any penalty for the dual-CCD configuration.
Edit: although this topic is related to Intel architecture and possible new chips, it is getting off topic in an Intel thread, so I am not going to post any more about it.
 

DavidC1

Golden Member
Dec 29, 2023
Why clustered decode is the future of x86: contrary to expectations, the clustered decode setup used in the Mont and Zen 5 cores isn't just a "cheap way to decode"; in some cases it is more advantageous than traditional decode.

Here's a tidbit from C&C:
To programs, Gracemont’s decoder should function just like a traditional 6-wide decoder, fed by a 32 byte/cycle fetch port which goes both ways and compared to old school linear decoders, a clustered out-of-order decoder should lose less throughput around taken branches.
Skymont doesn't even have a Loop Stream Detector; the clustered decode handles loops. The chief architect of the Monts, Stephen Robinson, says the 3-wide decode cluster is used partly because it is easier to fill. Since Gracemont, it is no longer just about branches: the load balancer inserts fake branches as needed so the clusters can keep executing in parallel.

The clustered decode also allows neat tricks such as Skymont's "Nanocode" feature, which replicates the ucode instruction per cluster, so parallel execution isn't limited by the ucode being used. This is a cheap, power-efficient compromise compared to Fast Path: if a particular instruction were, say, 50% slower than an average instruction, why use Fast Path to make it 3x faster? Just make it roughly equal.

Because 3-wide decode is already reaching the limits of ILP, 6- and 9-wide decode is really only for peaks. Using Nanocode to reduce the cases where the clusters can't be parallelized is therefore an innovative idea: you shave the worst-case scenarios just enough that they are no longer a bottleneck.
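The "loses less throughput around taken branches" claim can be illustrated with a toy model (emphatically not a description of Skymont's actual decoder): a linear decoder ends its decode group at each taken branch, so trailing slots in that cycle are wasted, while two clusters decode alternate basic blocks in parallel and the total time is set by the busiest cluster. Block sizes are invented.

```python
import math

# Toy comparison: one linear 6-wide decoder vs two 3-wide clusters that
# split the stream at (taken-)branch boundaries and work in parallel.
blocks = [4, 2, 5, 3, 2, 4]  # instructions per basic block (made up)

def linear_cycles(blocks: list, width: int = 6) -> int:
    # A taken branch ends the decode group, wasting the remaining slots,
    # so each block costs ceil(len/width) cycles on its own.
    return sum(math.ceil(b / width) for b in blocks)

def clustered_cycles(blocks: list, width: int = 3, clusters: int = 2) -> int:
    # Blocks are handed out round-robin; each cluster decodes its blocks
    # back-to-back, and finish time is set by the busiest cluster.
    load = [0] * clusters
    for i, b in enumerate(blocks):
        load[i % clusters] += math.ceil(b / width)
    return max(load)

print(linear_cycles(blocks))     # 6 cycles: one decode group per block
print(clustered_cycles(blocks))  # 5 cycles: the clusters overlap blocks
```

Despite each cluster being half as wide, overlapping short branchy blocks lets the pair finish sooner than the single wide decoder in this (contrived) stream.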
 

alcoholbob

Diamond Member
May 24, 2005
Here is a recent 13-game average from Techspot. I just don't see the results you are claiming. Yes, for the X3D chips the single CCD is best because you don't have to worry about the gaming threads landing on the wrong CCD. However, for vanilla Zen 4 the best framerate belongs to the 7950X, and its 1% lows don't suffer either. The 7700X is actually slower than the 7900X as well (admittedly there is a slight clockspeed advantage for the 79xx chips), but I don't really see any penalty for the dual-CCD configuration.
Edit: although this topic is related to Intel architecture and possible new chips, it is getting off topic in an Intel thread, so I am not going to post any more about it.

There's no penalty because reviewers are manually doing core parking in their gaming benchmarks. A lot of casuals probably don't even know to do that and get worse performance.
 

dttprofessor

Member
Jun 16, 2022
Why clustered decode is the future of x86: contrary to expectations, the clustered decode setup used in the Mont and Zen 5 cores isn't just a "cheap way to decode"; in some cases it is more advantageous than traditional decode.

Here's a tidbit from C&C:

Skymont doesn't even have a Loop Stream Detector; the clustered decode handles loops. The chief architect of the Monts, Stephen Robinson, says the 3-wide decode cluster is used partly because it is easier to fill. Since Gracemont, it is no longer just about branches: the load balancer inserts fake branches as needed so the clusters can keep executing in parallel.

The clustered decode also allows neat tricks such as Skymont's "Nanocode" feature, which replicates the ucode instruction per cluster, so parallel execution isn't limited by the ucode being used. This is a cheap, power-efficient compromise compared to Fast Path: if a particular instruction were, say, 50% slower than an average instruction, why use Fast Path to make it 3x faster? Just make it roughly equal.

Because 3-wide decode is already reaching the limits of ILP, 6- and 9-wide decode is really only for peaks. Using Nanocode to reduce the cases where the clusters can't be parallelized is therefore an innovative idea: you shave the worst-case scenarios just enough that they are no longer a bottleneck.
Stephen Robinson says the major obstacle for x86 is over.
 

Hitman928

Diamond Member
Apr 15, 2012
There's no penalty because reviewers are manually doing core parking in their gaming benchmarks. A lot of casuals probably don't even know to do that and get worse performance.

I don't believe that's true for HWUB; at least, I haven't seen them say that they test this way.