Also, people say the E core has been gaining so much because it started from the lower end of the scale. That's a lazy explanation.
Well, what do you say about the different approach the E core team uses? There haven't been any new ideas on the P core team since Sandy Bridge in 2011, while the E core team has been bringing new ones every 1-2 generations. And every generation has been a sweeping change:
Atom Bonnell - Macro Op execution:
https://www.anandtech.com/show/2493/9
Atom Silvermont - First OoOE, proper memory subsystem
Atom Goldmont - OoOE FP, 3-way decode, 16KB pre-decode L2 cache
Atom Goldmont Plus - 4-way backend, 64KB predecode L2 cache
Atom Tremont - Clustered decode, 128KB predecode L2
Gracemont - Improved clustered 2x3 decode with an auto-balancer. The OD-ILD (on-demand instruction length decoder) replaces the 128KB predecode L2 and can handle large code footprints, so now clustered decode works on all code.
Skymont - 3x3 decode, with commonly used ucode instructions handled per cluster to improve parallelism. This is probably an area/power-efficient way of having a Fast Path for instructions. Doubled FP/Vector units, which benefits all existing code out there, not just a few workloads.
They improved branch prediction, BTB size, and backend width on Crestmont, a mere Tick!
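To make the clustered-decode idea concrete, here is a toy model. All the numbers and names below are made up for illustration, not real Gracemont parameters: the point is only that round-robining basic blocks across two narrow decoders can finish a stream of blocks in fewer cycles than one decoder of the same width working strictly in order.

```python
# Toy model of clustered instruction decode (conceptual sketch only,
# NOT the actual Gracemont implementation). Baseline: one 3-wide
# decoder works through basic blocks in program order. Clustered:
# basic blocks are round-robined to two independent 3-wide decoders
# that run in parallel.

def cycles_single_decoder(blocks, width=3):
    """One `width`-wide decoder processes basic blocks in order.
    Simplification: a taken branch ends the fetch group, so each
    basic block starts on a fresh cycle."""
    return sum(-(-n // width) for n in blocks)  # ceil(n / width)

def cycles_clustered(blocks, clusters=2, width=3):
    """Basic blocks are round-robined to `clusters` independent
    width-wide decoders; total time is the busiest cluster."""
    busy = [0] * clusters
    for i, n in enumerate(blocks):
        busy[i % clusters] += -(-n // width)
    return max(busy)

blocks = [5, 2, 7, 3, 4, 6]  # instruction counts per basic block
print(cycles_single_decoder(blocks))  # 11 cycles in order
print(cycles_clustered(blocks))       # 7 cycles with two clusters
```

The same trick is why the approach scales: adding a third cluster (as on Skymont) grows decode width without building one enormous serial decoder.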
Willing to try new things
-Macro-op execution, rather than cracking everything into micro-ops
-Clustered decode
-Taking out 128KB L2 Predecode and replacing it with OD-ILD
-A very wide 16-wide retire on Skymont to save resources elsewhere. Probably also benefits clustered decode
-More Stores than Loads
Different ideas at a fundamental level
-Clustered decode can handle loops directly, so there is no Loop Stream Buffer
-Rather than shared buffers, everything is independent
-Many, many simple ports over few very powerful ones
-Doubling FP units benefits everyone, versus AVX needing a recompile every time
This is a truly inspired and dynamic team, and this is why the future is likely with them. AMD is using clustered decode on Zen 5, and so far David Huang isn't seeing it work for single-threaded code. Tremont did it better 4 years ago.