Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
917
834
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die consisting of CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. The tiles are connected through UCIe, not D2D; a first from Intel. Expected to launch in Q2/Computex 2026. In case people don't remember AlderLake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Memory Size | 16 GB | ? | 32 GB | 24 GB ? |
| Memory Bandwidth | | ~ 55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
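
For anyone who wants to sanity-check the bandwidth figures, here is a minimal sketch of the usual peak-bandwidth arithmetic: bus width in bytes times transfer rate. The function name `peak_bandwidth_mbs` is made up for illustration, and the numbers are theoretical peaks in decimal units, not measured results.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak memory bandwidth in MB/s (decimal, 1 GB/s = 1000 MB/s).
 * bus_width_bits: total memory bus width, e.g. 64 or 128.
 * mega_transfers: data rate in MT/s, e.g. 6800 for LPDDR5-6800.
 * Hypothetical helper, named here only for illustration. */
uint64_t peak_bandwidth_mbs(uint32_t bus_width_bits, uint32_t mega_transfers)
{
    assert(bus_width_bits % 8 == 0);  /* whole bytes per transfer */
    return (uint64_t)(bus_width_bits / 8) * mega_transfers;
}
```

With these inputs, 64-bit LPDDR5-6800 works out to 54,400 MB/s (in line with the ~55 GB/s figure), 128-bit LPDDR5X-8533 to 136,528 MB/s, and 64-bit LPDDR5X-10667 to 85,336 MB/s, which is slightly below the 85.6 GB/s listed. By the same formula 64-bit LPDDR5-4800 would be 38,400 MB/s.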






PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what the Intel roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



LNL-MX.png
 


MS_AT

Senior member
Jul 15, 2024
913
1,827
96
But asking how big a register is, isn't exactly cut and dry these days anyways.
I think this is where we disagree. While the physical entry size in the register file is up to the implementation, the SSE architectural register size is defined to be 128b (the xmm register), and AVX512 with the VL extension supports xmm, ymm (256b) and zmm (512b) registers, each with a well-defined bit width. How this is implemented in HW is another matter :)

They just allocate another virtual register from the physical file making sure writes to memory are made in order.
The compiler cannot allocate more registers than there are architectural registers. It will spill to the stack as soon as it thinks it has already used all architectural registers. Renaming is opaque to the compiler and is used by the OoO engine to solve other problems, like write-after-write or write-after-read hazards. And it's the OoO engine that makes sure the results are observable in program order, not the compiler.
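
To make the distinction concrete, here is a toy sketch of a rename table, assuming 16 architectural registers and a made-up physical file of 64 entries. The names `RenameTable`, `rename_read` and `rename_write` are hypothetical, and a real core also recycles physical registers at retirement, which this sketch skips.

```c
#include <assert.h>

#define NUM_ARCH_REGS 16  /* x86-64 general-purpose registers */
#define NUM_PHYS_REGS 64  /* assumed size, purely for illustration */

typedef struct {
    int map[NUM_ARCH_REGS]; /* architectural -> current physical register */
    int next_free;          /* next never-used physical register */
} RenameTable;

void rename_init(RenameTable *rt)
{
    for (int i = 0; i < NUM_ARCH_REGS; i++)
        rt->map[i] = i;
    rt->next_free = NUM_ARCH_REGS;
}

/* A read uses whichever physical register currently holds the value. */
int rename_read(const RenameTable *rt, int arch_reg)
{
    assert(arch_reg >= 0 && arch_reg < NUM_ARCH_REGS);
    return rt->map[arch_reg];
}

/* A write gets a fresh physical register, so a later write to the same
 * architectural name does not have to wait for earlier readers.
 * Returns -1 when the toy physical file is exhausted. */
int rename_write(RenameTable *rt, int arch_reg)
{
    assert(arch_reg >= 0 && arch_reg < NUM_ARCH_REGS);
    if (rt->next_free >= NUM_PHYS_REGS)
        return -1;
    rt->map[arch_reg] = rt->next_free++;
    return rt->map[arch_reg];
}
```

The compiler only ever sees the 16 architectural names. Two back-to-back writes to the same name land in two different physical entries, which is how the hardware removes the false dependency without the compiler knowing anything about it.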
 

Win2012R2

Golden Member
Dec 5, 2024
1,246
1,287
96
How big is a SSE register.
128 bits as per Intel who invented it and documented a few aeons ago

What's your number?

Am I right?

Wrong, because you are still confusing hardware implementation with ISA spec. Sure, there are lots of registers now, and obviously they have to be contiguous, so a 512-bit register can be used for 256-bit purposes. So what? Does it mean SSE is suddenly not 128-bit but 512-bit, because modern CPU hardware decided to use one of its 512-bit registers?

No - it's still 128 bit, as per bloody spec.

On Zen5 it can be 512b with a mask set to 128b. On Zen5 mobile it can be both that and a 256b

No, AVX-512 registers are 512 bits wide - on Zen 4 or Zen 5 or Zen 5C and on Zen 6 they will also be 512 bits wide. In fact even in Zen 1000 they will have to be 512 bits wide, because that's how they are specified!
 

Schmide

Diamond Member
Mar 7, 2002
5,753
1,046
126
If you look at the opcodes spit out by compilers they continuously reuse the same register references over and over rarely stacking more than a couple depths. Does this reuse block execution? No. They just allocate another virtual register from the physical file making sure writes to memory are made in order.

Compiler cannot allocate more than there are architectural registers. It will spill to stack as soon as it thinks it already used all architectural registers. Renaming is opaque to the compiler and is used by OoO engine to solve other problems like write after write or write after read. And its OoO engine that makes sure the results are observable in program order, not the compiler.

You took this out of context. I said the compiler is reusing the same registers over and over.

example code

Code:
    int xvals[8] = {0,1,2,3,4,5,6,7};
    int end = sizeof(xvals) / sizeof(int);
    int i = 0;
    do {
        xvals[i]++;
    } while (++i < end);

the loop minus the array setup

Code:
.L2:
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     eax, DWORD PTR [rbp-48+rax*4]
        lea     edx, [rax+1]
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     DWORD PTR [rbp-48+rax*4], edx
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        setl    al
        test    al, al
        jne     .L2

The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out of order engine. My point was compilers generally don't use more than a few named registers.
 

MS_AT

Senior member
Jul 15, 2024
913
1,827
96
The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out of order engine
Nope, they are not spilled to the stack because the compiler is already storing results on the stack, since the array is kept there ;) In other words, the compiler is loading a value from the stack, incrementing it by one and storing it back to the stack. There is no reason it would need to use more registers than EAX/RAX, since x64 allows memory operands. The OoO engine will rename whatever false dependencies it spots in this code, but the reason you are seeing only EAX/RAX is that your code does not need more active registers. In other words, the compiler does not have to keep the value "alive" in registers for the whole duration of your program.

Out of Order engine is a hardware property. Compiler does not know on what hardware your code will run [most of the time] so it will not assume you have X size of register file to know how well it can do register renaming. The whole magic of OoO is that it is opaque to the compiler and the programmer.
 

Schmide

Diamond Member
Mar 7, 2002
5,753
1,046
126
Nope, they are not spilled over to the stack because compiler is storing results on the stack because array is kept there;) So in other words compiler is loading value from the stack, incrementing it by one and storing it to stack. There is no reason it would need to use more registers than EAX/RAX since x64 allows memory operands. OoO engine will rename whatever false dependencies it will spot in this code but the reason you are seeing EAX/RAX only is because your code does not need more active registers. In other words compiler does not have to keep the value "alive" in registers for the whole duration of your program.

Out of Order engine is a hardware property. Compiler does not know on what hardware your code will run [most of the time] so it will not assume you have X size of register file to know how well it can do register renaming. The whole magic of OoO is that it is opaque to the compiler and the programmer.

The OoO may be opaque to the compiler. Compilers still produce code in ways that help the OoO engine take advantage of register renaming.

I've looked at a lot of code spit out by the compilers. Rarely do you see more than 3 or 4 registers. I don't think I've ever seen a numbered register.

As for referenced values. That's the whole x86 thing. RISC requires you to load everything then operate on it. Part of what I was saying previously was labeled memory is more like named registers.
 

eek2121

Diamond Member
Aug 2, 2005
3,458
5,104
136
I personally cannot fathom a single scenario where Intel sells 100 million of the 52-core variant of Nova Lake. AMD + Intel combined often sell 70 million to 80 million desktops / workstations CPU per year. Intel's portion would be smaller and the portion of just one rumored top-end desktop CPU would be even smaller than that. Divide your number by 10 at least (probably more) to get a much more realistic value. Plus, again, Intel isn't the one buying the memory AND the new memory premium fades quickly after a few months.

There never was a rumor for only a 52-core variant. Here are rumors for 16 core and 28 core versions.

Here is another rumor from our own source that adds 24 core, 12 core, and 4 core variants.

Plus, I can't think of a recent time where Intel sold only one desktop variant without lower end models with fewer cores. Take Arrow Lake for example. There is Ultra 9 with 8P + 16E, Ultra 7 with 8P + 12E, and Ultra 5 that is either 6P + 8E or 6P + 4E. The rumored 52-core, if it exists, is only for the very top SK processor (or maybe X if Intel brings that back).
Maybe I missed it, however I don’t think anyone was ever claiming there would only be a 52-core variant.

FWIW I have a hard time believing such a part exists at all. N2 and 18A cost significantly more than previous processes, so mastering PPA is a must for both Intel and AMD. The 285K is on N3, and Intel’s core sizes will eat up any improvements that N2/18A provide.

If anything, they will probably have a 12/24 + 4 part at the top end, with two 6 + 12 core CCDs. I have seen Intel do crazier things, however.

Just recently, they announced another round of layoffs, so I definitely wouldn’t hold my breath.
 

Win2012R2

Golden Member
Dec 5, 2024
1,246
1,287
96
That was all theoretical because the data path was still 64b, so only half of the SSE throughput was actually possible
The size of SSE registers, however, was, is and will always be - 128 bits, even if there is no single data type that is 128 bits, which obviously is the case because it's called SIMD for a reason.

Register renaming, compilers, spilling stack, not full data path, dinosaurs roaming the Earth, all that are entirely different matters.
 

Abwx

Lifer
Apr 2, 2011
11,935
4,909
136
The size of SSE registers, however, was, is and will always be - 128 bits, even if there is no single data type that is 128 bits, which obviously is the case because it's called SIMD for a reason.

Register renaming, compilers, spilling stack, not full data path, dinosaurs roaming the Earth, all that are entirely different matters.
That was 128b on paper and 64b in real world, the same way as a tank with 2x the volume but with an unchanged output pipe diameter and flow speed.
 

Win2012R2

Golden Member
Dec 5, 2024
1,246
1,287
96
That was 128b on paper and 64b in real world, the same way as a tank with 2x the volume but with an unchanged output pipe diameter and flow speed.
So then are AVX-512 registers in Zen 4 also 512 bit only on paper because execution is double pumped and data path can't load whole register in one go, yes or no?
 

Abwx

Lifer
Apr 2, 2011
11,935
4,909
136
So then are AVX-512 registers in Zen 4 also 512 bit only on paper because execution is double pumped and data path can't load whole register in one go, yes or no?
Not the same thing, as the Pentium 3 also lacked the necessary execution resources, which is not the case for Zen 4.

I put it again since you have trouble grasping all the info :

To compensate partially for implementing only half of SSE's architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together, bringing the peak throughput back to four floating point operations per cycle, at least for code with an even distribution of multiplies and adds.

The issue was that Katmai's hardware implementation contradicted the parallelism model implied by the SSE instruction set. Programmers faced a code-scheduling dilemma: "Should the SSE code be tuned for Katmai's limited execution resources, or should it be tuned for a future processor with more resources?"
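
The arithmetic behind that quote can be sketched as follows. This is only an illustration: the function names are made up, and the 256-bit datapath used for Zen 4 in the note below is the commonly reported figure, treated here as an assumption.

```c
#include <assert.h>

/* Number of passes needed to execute one vector instruction when the
 * execution datapath is narrower than the architectural register.
 * Hypothetical helper for illustration. */
unsigned passes_per_instruction(unsigned reg_bits, unsigned datapath_bits)
{
    assert(datapath_bits > 0);
    return (reg_bits + datapath_bits - 1) / datapath_bits; /* ceiling */
}

/* Peak single-precision FLOPs per cycle: 32-bit lanes per datapath pass
 * times the number of FP units that can issue in the same cycle. */
unsigned peak_sp_flops_per_cycle(unsigned datapath_bits, unsigned units_per_cycle)
{
    return (datapath_bits / 32) * units_per_cycle;
}
```

For Katmai, `passes_per_instruction(128, 64)` gives 2 halves per SSE instruction, and `peak_sp_flops_per_cycle(64, 2)` gives the four operations per cycle mentioned above (one half multiply plus one half add). Under the 256-bit assumption, `passes_per_instruction(512, 256)` also gives 2 for a 512-bit operation on Zen 4. In both cases the architectural register width passed in as `reg_bits` is unchanged by how many passes the hardware takes.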

 

Win2012R2

Golden Member
Dec 5, 2024
1,246
1,287
96
Not the same thing, as the Pentium 3 also lacked the necessary execution resources, which is not the case for Zen 4.
Zen 4 totally lacks necessary resources for AVX 512 which is why it's "double pumped", you really have double standards here.

Even Zen 5 can't load 64 bytes from L3 in one go (only L1 and L2), does it make Zen 5's registers half size? No!

Anyway, SSE is 128 bit, as per Intel spec, as per physical implementation, actual execution how fast or slow it is, whether it's microcode even does not matter - spec for size is spec for size, end of.
 

Abwx

Lifer
Apr 2, 2011
11,935
4,909
136
Zen 4 totally lacks necessary resources for AVX 512 which is why it's "double pumped", you really have double standards here.

Anyway, SSE is 128 bit, as per Intel spec, as per physical implementation, actual execution how fast or slow it is, whether it's microcode even does not matter - spec for size is spec for size, end of.

A baby needs to be fed with a spoon.
Zen 4 has no such limitation, read more carefully:

This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together bringing the peak throughput back to four floating point operations per cycle — at least for code with an even distribution of multiplies and adds.
 

Win2012R2

Golden Member
Dec 5, 2024
1,246
1,287
96
You are deflecting the subject.
I am not deflecting anything. It is you who is bringing up all sorts of historic limitations in execution which have nothing to do with register size. All CPUs have limitations of different sorts, many of which get relaxed later; that does not affect register size as it is defined in the ISA.

Anyway, I am done here, you are clearly operating on a different plane than me on this one.

P.S. Why won't you bring 64-bit memory addressing, which in reality is not 64-bit because CPUs use less real bits for memory addressing, yet the registers used for pointers are 64-bit... or are they now? Rhetorical question.
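
To illustrate the P.S.: on x86-64 with 4-level paging, only 48 virtual address bits are implemented, and the upper bits of a 64-bit pointer must all copy bit 47 (a "canonical" address). A minimal sketch of that check, with a made-up function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is canonical for the given number of implemented virtual
 * address bits, i.e. bits 63 down to (va_bits - 1) are all 0 or all 1.
 * Hypothetical helper for illustration; va_bits is 48 with 4-level paging. */
bool is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits >= 1 && va_bits <= 64);
    if (va_bits == 64)
        return true;
    uint64_t upper = addr >> (va_bits - 1);         /* top implemented bit and above */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}
```

The register holding the pointer is 64 bits wide either way. Fewer bits being usable for addressing is an implementation limit on the values, not on the register size.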
 

OneEng2

Senior member
Sep 19, 2022
957
1,169
106
Not according to this a few pages back in this thread.
Current CB24 scores appear to be quite bandwidth limited. Certainly even from that post, it is clear that bandwidth has a significant effect on performance across a wide range of benchmarks.

Assuming that Intel will shell out the dough for state-of-the-art new memory for high volume products is a fallacy IMO.

Assuming that average desktop and laptop users utilize higher core counts than today's processors provide is another fallacy IMO.

Finally, assuming that HPC and DC workloads where higher core counts are justified by the applications will not be bandwidth starved with only 2 channels is equally hard to imagine.

A 52 core Nova Lake (let's just call it a 48 core since I doubt those LP cores are worth the die space anyway) with higher IPC P and E cores like everyone is expecting will crave even more bandwidth than the current Arrow Lake per core.

I see neither the market for a 52 core desktop/laptop processor nor the technical merit of pairing such a beast with only 2 channels of DDR5.

As this is only my opinion, I suspect time will tell.
 

dullard

Elite Member
May 21, 2001
26,185
4,841
126
Maybe I missed it, however I don’t think anyone was ever claiming there would only be a 52-core variant.
FWIW I have a hard time believing such a part exists at all. N2 and 18A cost significantly more than previous processes, mastering PPA is a must for both Intel and AMD. The 285k is on N3, and Intel’s core sizes will eat up any improvements that N2/18A provide.

If anything, they will probably have a 12/24 + 4 part at the top end, with 2 6 + 12 core CCDs.
That was implied by the "multiply by 100 million" statement which was in reference to the rumored 52 core variant. The need for DDR6 is only for the 52 core variant. There is absolutely no need for the rest of the Nova Lake chips to have DDR6. They may have it, but don't need it. I can certainly see a desktop CPU with one tile of 26 or fewer cores on DDR5 and a dual tile workstation CPU with 52 cores and DDR6 (remember rumors state that Nova Lake separates the memory tile from the CPU tile). Will it happen? I have no idea. But it could.

Take the expense and time to design just one tile with 26 cores for the desktop crowd. Mass produce it for yield and cost savings. Put two of those tiles together for the workstation crowd with a different memory controller. Apple put two M1 tiles together with the M1 Ultra. AMD did it with Threadripper. Heck, Intel did it (poorly) back with the Pentium D which started the whole glue it together meme. Plus, this is exactly what the rumors state: 2x26 cores.
 

OneEng2

Senior member
Sep 19, 2022
957
1,169
106
That was implied by the "multiply by 100 million" statement which was in reference to the rumored 52 core variant. The need for DDR6 is only for the 52 core variant. There is absolutely no need for the rest of the Nova Lake chips to have DDR6. They may have it, but don't need it. I can certainly see a desktop CPU with one tile of 26 or fewer cores on DDR5 and a dual tile workstation CPU with 52 cores and DDR6 (remember rumors state that Nova Lake separates the memory tile from the CPU tile). Will it happen? I have no idea. But it could.

Take the expense and time to design just one tile with 26 cores for the desktop crowd. Mass produce it for yield and cost savings. Put two of those tiles together for the workstation crowd with a different memory controller. Apple put two M1 tiles together with the M1 Ultra. AMD did it with Threadripper. Heck, Intel did it (poorly) back with the Pentium D which started the whole glue it together meme. Plus, this is exactly what the rumors state: 2x26 cores.
Seeing how they put 24 cores on N3B @ 114mm2, seems logical that 26 cores would be doable on 18A (or N2). They should be able to keep the die down to around 100mm2 or somewhere near that I would think as well. Doubling this and expecting the RAM to keep up with just 2 channels? That seems a bit optimistic IMO.

I believe I saw a rumor that AMD's 12c CCD would be around 75mm2 on N2. This next round is going to be interesting for sure.

Currently the 16c/32t Zen 5 generally bests Arrow Lake 24c/24t by 5% in multi-threaded loads (pretty close though). So essentially, 1 Zen 5 core = 1.5 Arrow Lake cores overall in MT. So if everything stays in the same proportion next generation, 24 Zen 6 cores would be equivalent to 36 Nova Lake cores in MT.
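
The back-of-the-envelope math above can be written out as a small sketch. The function names are made up, and the inputs are just the figures from this post (16 vs 24 cores, 5% overall lead), not benchmark data.

```c
#include <assert.h>

/* Per-core multi-threaded ratio of chip A over chip B, scaled by 1000.
 * advantage_pct: how much faster chip A is overall, in percent.
 * Hypothetical helper for illustration. */
unsigned per_core_ratio_x1000(unsigned cores_a, unsigned cores_b, unsigned advantage_pct)
{
    assert(cores_a > 0);
    return cores_b * (100 + advantage_pct) * 10 / cores_a;
}

/* Number of B-type cores matching cores_a A-type cores at that ratio
 * (integer division, so the result is truncated). */
unsigned equivalent_cores(unsigned cores_a, unsigned ratio_x1000)
{
    return cores_a * ratio_x1000 / 1000;
}
```

With the 5% lead included, the per-core ratio comes out at 1.575 rather than a flat 1.5. At 1.5, 24 cores map to 36; at 1.575 it is closer to 37.8 (37 after truncation).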

1745635580487.png

I believe we are entering an interesting time with the next generation of processors though. What percentage of the consumer market needs more than a 26 core Nova Lake (or 12c/24t Zen 6)? Sure, you can make one with a dual CCD and a decent memory controller (and potentially faster memory on dual channel), but how many consumers need it?

For the HPC/Workstation work, wouldn't something like Threadripper be better? Much more memory bandwidth and way more threads?

Perhaps this is what the 52 core Nova Lake will be targeting?
 

511

Diamond Member
Jul 12, 2024
5,118
4,605
106
I'm really surprised people fight over SSE vs 3dnow, some depending on their preference for Intel or AMD. What I will always remember is 64-bit x86, which was designed by AMD; this had a much higher impact on x86 than any SIMD extension.

Also register banks exist. They can even be spotted on floor plans. They are obviously much larger than what the ISA requires due to renaming.
Intel had it designed as well. They were super focused on Itanium, and they didn't want AMD to have their money-making x86 license.

Screenshot_20250426-125137.png
 

511

Diamond Member
Jul 12, 2024
5,118
4,605
106
You took this out of context. I said the compiler is reusing the same registers over and over.

example code

Code:
    int xvals[8] = {0,1,2,3,4,5,6,7};
    int end = sizeof(xvals) / sizeof(int);
    int i = 0;
    do {
        xvals[i]++;
    } while (++i < end);

the loop minus the array setup

Code:
.L2:
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     eax, DWORD PTR [rbp-48+rax*4]
        lea     edx, [rax+1]
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     DWORD PTR [rbp-48+rax*4], edx
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        setl    al
        test    al, al
        jne     .L2

The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out of order engine. My point was compilers generally don't use more than a few named registers.
There are only 16 general-purpose registers (GPRs) available in x86_64 to begin with.

Seeing how they put 24 cores on N3B @ 114mm2, seems logical that 26 cores would be doable on 18A (or N2). They should be able to keep the die down to around 100mm2 or somewhere near that I would think as well. Doubling this and expecting the RAM to keep up with just 2 channels? That seems a bit optimistic IMO.

I believe I saw a rumor that AMD's 12c CCD would be around 75mm2 on N2. This next round is going to be interesting for sure.
Where did it leak? I had guessed around 80mm2, so I was close :rofl:
Currently the 16c/32t Zen 5 generally bests Arrow Lake 24c/24t by 5% in multi-threaded loads (pretty close though). So essentially, 1 Zen 5 core = 1.5 Arrow Lake cores overall in MT. So if everything stays in the same proportion next generation, 24 Zen 6 cores would be equivalent to 36 Nova Lake cores in MT.

It has y-cruncher in the mix, which uses AVX-512. When NVL gets AVX-512 that would change. If they are using SVT-AV1 in HandBrake, that also supports AVX-512.
I believe we are entering an interesting time with the next generation of processors though. What percentage of the consumer market needs more than a 26 core Nova Lake (or 12c/24t Zen 6)? Sure, you can make one with a dual CCD and a decent memory controller (and potentially faster memory on dual channel), but how many consumers need it?
For this, the tiles that were leaked were 8+16 / 4+8 / 4+0, and the SoC tile contains 4 LPE cores no matter the config. I think a 4+4 (4 cores disabled)+4 i3 would be ridiculous.
The SKUs can be, but are not limited to:
  • 8+16+4
  • 2*(8+16)+4
  • 2*(4+8)+4
  • 4+8+4
  • 4+0+4
The 8+16 tile is N2, the 4+8+4 / 4+0 tile is 18AP, and the common SoC tile is shared across all the SKUs.
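
For reference, the total core count of each configuration in that list can be tallied with a small sketch. The `Sku` struct and `total_cores` function are made up for illustration, and the configurations are just the rumored ones listed above.

```c
#include <assert.h>

/* One rumored configuration: number of compute tiles, P and E cores per
 * compute tile, and LPE cores on the shared SoC tile.
 * Hypothetical type for illustration. */
typedef struct {
    unsigned tiles;
    unsigned p_cores;
    unsigned e_cores;
    unsigned lpe_cores;
} Sku;

/* Total cores = tiles * (P + E) + LPE cores on the SoC tile. */
unsigned total_cores(Sku s)
{
    return s.tiles * (s.p_cores + s.e_cores) + s.lpe_cores;
}
```

That gives 28 cores for 8+16+4, 52 for 2*(8+16)+4, 28 for 2*(4+8)+4, 16 for 4+8+4, and 8 for 4+0+4.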

For the HPC/Workstation work, wouldn't something like Threadripper be better? Much more memory bandwidth and way more threads?

Perhaps this is what the 52 core Nova Lake will be targeting?
Probably around $1000. This should be a great buy for people looking at ST/MT performance who don't need a ton of PCIe. As for RAM, I think 64 GB DIMMs should be more available by then, so 256 GB should be enough for most enthusiasts.
 

Thunder 57

Diamond Member
Aug 19, 2007
4,198
6,987
136
Intel had it designed as well. They were super focused on Itanium, and they didn't want AMD to have their money-making x86 license.


100% fake. There was absolutely no 64 bit in the P4 until it needed it. I cannot believe I read such stupidity. His 64 bit just happened to be the same as AMD's? What a fool. Totally worthless fake chump. Have I made my opinion clear?
 

511

Diamond Member
Jul 12, 2024
5,118
4,605
106
100% fake. There was absolutely no 64 bit in the P4 until it needed it. I cannot believe I read such stupidity. His 64 bit just happened to be the same as AMD's? What a fool. Totally worthless fake chump. Have I made my opinion clear?
OK, so you are saying one of the chief architects of x86 of his time is lying? That's pretty bold of you.
 