
Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first for Intel. Expect a launch in Q1 2026.

| | Intel Raptor Lake U | Intel Wildcat Lake 15W | Intel Lunar Lake | Intel Panther Lake 4+0+4 |
|---|---|---|---|---|
| Launch Date | Q1 2024 | Q2 2026 | Q3 2024 | Q1 2026 |
| Model | Intel 150U | Intel Core 7 360 | Core Ultra 7 268V | Core Ultra 7 365 |
| Dies | 2 | 2 | 2 | 3 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | Intel 18A + Intel 3 + TSMC N6 |
| CPU | 2 P-cores + 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores |
| Threads | 12 | 6 | 8 | 8 |
| CPU Max Clock | 5.4 GHz | 4.8 GHz | 5 GHz | 4.8 GHz |
| L3 Cache | 12 MB | 6 MB | 12 MB | 12 MB |
| TDP | 15 - 55 W | 15 - 35 W | 17 - 37 W | 25 - 55 W |
| Memory | 128-bit LPDDR5-5200 | 64-bit LPDDR5X-7467 | 128-bit LPDDR5X-8533 | 128-bit LPDDR5X-7467 |
| Max Memory | 96 GB | 48 GB | 32 GB | 128 GB |
| Bandwidth | 83 GB/s | 60 GB/s | 136 GB/s | 120 GB/s |
| GPU | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics |
| Ray Tracing | No | No | Yes | Yes |
| EU / Xe | 96 EU | 2 Xe | 8 Xe | 4 Xe |
| GPU Max Clock | 1.3 GHz | 2.6 GHz | 2 GHz | 2.5 GHz |
| NPU | GNA 3.0 | 17 TOPS | 48 TOPS | 49 TOPS |









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.




But asking how big a register is isn't exactly cut-and-dried these days anyway.
This, I think, is where we disagree. While the physical entry size in the register file is up to the implementation, the SSE architectural register size is defined to be 128 bits (the xmm registers); AVX-512 with the VL extension supports xmm, ymm (256-bit) and zmm (512-bit) registers, each with a well-defined bit width. How this is implemented in hardware is another matter 🙂

They just allocate another virtual register from the physical file making sure writes to memory are made in order.
The compiler cannot allocate more registers than the architecture provides. It will spill to the stack as soon as it thinks it has used all the architectural registers. Renaming is opaque to the compiler; it is used by the OoO engine to solve other problems, like write-after-write and write-after-read hazards. And it is the OoO engine, not the compiler, that makes sure the results are observable in program order.
 
How big is an SSE register?
128 bits, as per Intel, who invented it and documented it a few aeons ago.

What's your number?

Am I right?

Wrong, because you are still confusing hardware implementation with the ISA spec. Sure, there are lots of physical registers now, and obviously they have to be contiguous, so a 512-bit register can be used for 256-bit purposes. So what? Does it mean SSE is suddenly not 128-bit but 512-bit, because in a modern CPU the hardware decided to use one of the 512-bit registers?

No, it's still 128-bit, as per the bloody spec.

On Zen 5 it can be 512 bits with a mask set to 128 bits. On Zen 5 mobile it can be both that and 256 bits.

No, AVX-512 registers are 512 bits wide on Zen 4, Zen 5 and Zen 5c, and on Zen 6 they will also be 512 bits wide. In fact, even in Zen 1000 they will have to be 512 bits wide, because that's how they are specified!
 
If you look at the opcodes spit out by compilers, they continuously reuse the same register references over and over, rarely stacking more than a couple deep. Does this reuse block execution? No. They just allocate another virtual register from the physical file, making sure writes to memory are made in order.

The compiler cannot allocate more registers than the architecture provides. It will spill to the stack as soon as it thinks it has used all the architectural registers. Renaming is opaque to the compiler; it is used by the OoO engine to solve other problems, like write-after-write and write-after-read hazards. And it is the OoO engine, not the compiler, that makes sure the results are observable in program order.

You took this out of context. I said the compiler is reusing the same registers over and over.

example code

Code:
    int xvals[8] = {0,1,2,3,4,5,6,7};
    int end = sizeof(xvals) / sizeof(int);
    int i = 0;
    do {
        xvals[i]++;
    } while (++i < end);

the loop minus the array setup

Code:
.L2:
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     eax, DWORD PTR [rbp-48+rax*4]
        lea     edx, [rax+1]
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     DWORD PTR [rbp-48+rax*4], edx
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        setl    al
        test    al, al
        jne     .L2

The same four registers are used over and over. It is not going to spill to the stack, because they are renamed by the out-of-order engine. My point was that compilers generally don't use more than a few named registers.
 
The same 4 registers are used over and over. It is not going to spill to the stack because they are renamed by the out of order engine
Nope, they are not spilled to the stack; the compiler is storing results on the stack because the array is kept there 😉 So, in other words, the compiler is loading a value from the stack, incrementing it by one and storing it back to the stack. There is no reason it would need more registers than EAX/RAX, since x64 allows memory operands. The OoO engine will rename whatever false dependencies it spots in this code, but the reason you are seeing only EAX/RAX is that your code does not need more live registers. In other words, the compiler does not have to keep the values "alive" in registers for the whole duration of your program.

The out-of-order engine is a hardware property. The compiler does not know what hardware your code will run on [most of the time], so it will not assume a register file of size X and count on renaming. The whole magic of OoO is that it is opaque to the compiler and the programmer.
 
Nope, they are not spilled to the stack; the compiler is storing results on the stack because the array is kept there 😉 So, in other words, the compiler is loading a value from the stack, incrementing it by one and storing it back to the stack. There is no reason it would need more registers than EAX/RAX, since x64 allows memory operands. The OoO engine will rename whatever false dependencies it spots in this code, but the reason you are seeing only EAX/RAX is that your code does not need more live registers. In other words, the compiler does not have to keep the values "alive" in registers for the whole duration of your program.

The out-of-order engine is a hardware property. The compiler does not know what hardware your code will run on [most of the time], so it will not assume a register file of size X and count on renaming. The whole magic of OoO is that it is opaque to the compiler and the programmer.

The OoO engine may be opaque to the compiler, but compilers still produce code in ways that help the OoO engine take advantage of register renaming.

I've looked at a lot of code spit out by compilers. Rarely do you see more than three or four registers. I don't think I've ever seen a numbered register used.

As for referenced values, that's the whole x86 thing. RISC requires you to load everything into registers and then operate on it. Part of what I was saying previously was that labeled memory acts more like named registers.
 
I personally cannot fathom a single scenario where Intel sells 100 million units of the 52-core variant of Nova Lake. AMD and Intel combined often sell 70 to 80 million desktop/workstation CPUs per year. Intel's portion would be smaller, and the portion for just one rumored top-end desktop CPU would be even smaller than that. Divide your number by at least 10 (probably more) to get a much more realistic value. Plus, again, Intel isn't the one buying the memory, AND the new-memory premium fades quickly after a few months.

There never was a rumor of only a 52-core variant. Here are rumors for 16-core and 28-core versions.

Here is another rumor, from our own source, that adds 24-core, 12-core, and 4-core variants.

Plus, I can't think of a recent time when Intel sold only one desktop variant without lower-end models with fewer cores. Take Arrow Lake, for example. There is the Ultra 9 with 8P + 16E, the Ultra 7 with 8P + 12E, and the Ultra 5 with either 6P + 8E or 6P + 4E. The rumored 52-core, if it exists, is only for the very top SKU (or maybe an X part, if Intel brings that back).
Maybe I missed it; however, I don't think anyone was ever claiming there would only be a 52-core variant.

FWIW, I have a hard time believing such a part exists at all. N2 and 18A cost significantly more than previous processes, so mastering PPA is a must for both Intel and AMD. The 285K is on N3, and Intel's core sizes will eat up any improvements that N2/18A provide.

If anything, they will probably have a 12/24 + 4 part at the top end, with two 6 + 12 core CCDs. I have seen Intel do crazier things, however.

Just recently, they announced another round of layoffs, so I definitely wouldn’t hold my breath.
 
That was all theoretical, because the data path was still 64 bits wide, so only half of the SSE throughput was actually possible.
The size of SSE registers, however, was, is and will always be 128 bits, even if there is no single data type that is 128 bits, which is obviously the case because it's called SIMD for a reason.

Register renaming, compilers, stack spilling, a less-than-full data path, dinosaurs roaming the Earth: all of those are entirely different matters.
 
The size of SSE registers, however, was, is and will always be 128 bits, even if there is no single data type that is 128 bits, which is obviously the case because it's called SIMD for a reason.

Register renaming, compilers, stack spilling, a less-than-full data path, dinosaurs roaming the Earth: all of those are entirely different matters.
That was 128 bits on paper and 64 bits in the real world, just like a tank with twice the volume but an unchanged outlet pipe diameter and flow speed.
 
That was 128 bits on paper and 64 bits in the real world, just like a tank with twice the volume but an unchanged outlet pipe diameter and flow speed.
So then, are AVX-512 registers in Zen 4 also 512-bit only on paper, because execution is double-pumped and the data path can't load the whole register in one go? Yes or no?
 
So then, are AVX-512 registers in Zen 4 also 512-bit only on paper, because execution is double-pumped and the data path can't load the whole register in one go? Yes or no?
Not the same thing, as the Pentium III also lacked the necessary execution resources, which is not the case for Zen 4.

I'll put it here again, since you have trouble grasping all the info:

To compensate partially for implementing only half of SSE's architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together, bringing the peak throughput back to four floating-point operations per cycle, at least for code with an even distribution of multiplies and adds. The issue was that Katmai's hardware implementation contradicted the parallelism model implied by the SSE instruction set. Programmers faced a code-scheduling dilemma: "Should the SSE code be tuned for Katmai's limited execution resources, or should it be tuned for a future processor with more resources?"

 
Not the same thing, as the Pentium III also lacked the necessary execution resources, which is not the case for Zen 4.
Zen 4 totally lacks the necessary resources for AVX-512, which is why it's "double pumped"; you really have double standards here.

Even Zen 5 can't load 64 bytes from L3 in one go (only from L1 and L2). Does that make Zen 5's registers half size? No!

Anyway, SSE is 128-bit, as per the Intel spec and as per the physical implementation. The actual execution, how fast or slow it is, whether it's even microcoded, does not matter: the spec for the size is the spec for the size, end of.
 
Zen 4 totally lacks the necessary resources for AVX-512, which is why it's "double pumped"; you really have double standards here.

Anyway, SSE is 128-bit, as per the Intel spec and as per the physical implementation. The actual execution, how fast or slow it is, whether it's even microcoded, does not matter: the spec for the size is the spec for the size, end of.

A baby needs to be fed with a spoon.
Zen 4 has no such limitation; read better:

This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together, bringing the peak throughput back to four floating-point operations per cycle, at least for code with an even distribution of multiplies and adds.
 
You are deflecting the subject.
I am not deflecting anything; it is you who keeps bringing in all sorts of historic execution limitations that have nothing to do with register size. All CPUs have limitations of different sorts, many of which get relaxed later; that does not affect the register size as defined in the architecture.

Anyway, I am done here; you are clearly operating on a different plane than me on this one.

P.S. Why don't you bring up 64-bit memory addressing, which in reality is not 64-bit because CPUs use fewer real bits for memory addressing, yet the registers used for pointers are 64-bit... or are they now? Rhetorical question.
 
Not according to this a few pages back in this thread.
Current CB24 scores appear to be quite bandwidth-limited. Certainly, even from that post, it is clear that bandwidth has a significant effect on performance across a wide range of benchmarks.

Assuming that Intel will shell out the dough for state-of-the-art new memory in high-volume products is a fallacy, IMO.

Assuming that average desktop and laptop users utilize higher core counts than today's processors provide is another fallacy, IMO.

Finally, assuming that HPC and DC workloads, where higher core counts are justified by the applications, will not be bandwidth-starved with only 2 channels is equally hard to imagine.

A 52-core Nova Lake (let's just call it a 48-core, since I doubt those LP cores are worth the die space anyway) with higher-IPC P and E cores, like everyone is expecting, will crave even more bandwidth per core than the current Arrow Lake.

I see neither the market for a 52-core desktop/laptop processor nor the technical merit of pairing such a beast with only two channels of DDR5.

As this is only my opinion, I suspect time will tell.
 
Maybe I missed it; however, I don't think anyone was ever claiming there would only be a 52-core variant.
FWIW, I have a hard time believing such a part exists at all. N2 and 18A cost significantly more than previous processes, so mastering PPA is a must for both Intel and AMD. The 285K is on N3, and Intel's core sizes will eat up any improvements that N2/18A provide.

If anything, they will probably have a 12/24 + 4 part at the top end, with two 6 + 12 core CCDs.
That was implied by the "multiply by 100 million" statement, which was in reference to the rumored 52-core variant. The need for DDR6 applies only to the 52-core variant. There is absolutely no need for the rest of the Nova Lake chips to have DDR6. They may have it, but they don't need it. I can certainly see a desktop CPU with one tile of 26 or fewer cores on DDR5 and a dual-tile workstation CPU with 52 cores and DDR6 (remember, rumors state that Nova Lake separates the memory tile from the CPU tile). Will it happen? I have no idea. But it could.

Take the expense and time to design just one tile with 26 cores for the desktop crowd. Mass-produce it for yield and cost savings. Put two of those tiles together for the workstation crowd with a different memory controller. Apple put two M1 tiles together with the M1 Ultra. AMD did it with Threadripper. Heck, Intel did it (poorly) back with the Pentium D, which started the whole glue-it-together meme. Plus, this is exactly what the rumors state: 2x26 cores.
 
That was implied by the "multiply by 100 million" statement, which was in reference to the rumored 52-core variant. The need for DDR6 applies only to the 52-core variant. There is absolutely no need for the rest of the Nova Lake chips to have DDR6. They may have it, but they don't need it. I can certainly see a desktop CPU with one tile of 26 or fewer cores on DDR5 and a dual-tile workstation CPU with 52 cores and DDR6 (remember, rumors state that Nova Lake separates the memory tile from the CPU tile). Will it happen? I have no idea. But it could.

Take the expense and time to design just one tile with 26 cores for the desktop crowd. Mass-produce it for yield and cost savings. Put two of those tiles together for the workstation crowd with a different memory controller. Apple put two M1 tiles together with the M1 Ultra. AMD did it with Threadripper. Heck, Intel did it (poorly) back with the Pentium D, which started the whole glue-it-together meme. Plus, this is exactly what the rumors state: 2x26 cores.
Seeing how they put 24 cores on N3B at 114 mm², it seems logical that 26 cores would be doable on 18A (or N2). They should be able to keep the die down to around 100 mm² or somewhere near that, I would think. But doubling this and expecting the RAM to keep up with just 2 channels? That seems a bit optimistic, IMO.

I believe I saw a rumor that AMD's 12-core CCD would be around 75 mm² on N2. This next round is going to be interesting for sure.

Currently, the 16c/32t Zen 5 generally bests the 24c/24t Arrow Lake by 5% in multi-threaded loads (pretty close, though). So essentially, 1 Zen 5 core = 1.5 Arrow Lake cores overall in MT. So, if everything stays in the same proportion next generation, 24 Zen 6 cores would be equivalent to 36 Nova Lake cores in MT.


I believe we are entering an interesting time with the next generation of processors, though. What percentage of the consumer market needs more than a 26-core Nova Lake (or a 12c/24t Zen 6)? Sure, you can make one with a dual CCD and a decent memory controller (and potentially faster memory on dual channel), but how many consumers need it?

For HPC/workstation work, wouldn't something like Threadripper be better? Much more memory bandwidth and way more threads?

Perhaps this is what the 52-core Nova Lake will be targeting?
 
I'm really surprised people fight over SSE vs. 3DNow!, some depending on their preference for Intel or AMD. What I will always remember is 64-bit x86, which was designed by AMD; it had a much higher impact on x86 than any SIMD extension.

Also, register banks exist. They can even be spotted on floor plans. They are obviously much larger than what the ISA requires, due to renaming.
Intel had it designed as well; they were so focused on Itanium that they didn't want AMD to have their money-making x86 license.

Screenshot_20250426-125137.png
 
You took this out of context. I said the compiler is reusing the same registers over and over.

example code

Code:
    int xvals[8] = {0,1,2,3,4,5,6,7};
    int end = sizeof(xvals) / sizeof(int);
    int i = 0;
    do {
        xvals[i]++;
    } while (++i < end);

the loop minus the array setup

Code:
.L2:
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     eax, DWORD PTR [rbp-48+rax*4]
        lea     edx, [rax+1]
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        mov     DWORD PTR [rbp-48+rax*4], edx
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        setl    al
        test    al, al
        jne     .L2

The same four registers are used over and over. It is not going to spill to the stack, because they are renamed by the out-of-order engine. My point was that compilers generally don't use more than a few named registers.
There are only 16 GPRs available in x86-64 to begin with.

Seeing how they put 24 cores on N3B at 114 mm², it seems logical that 26 cores would be doable on 18A (or N2). They should be able to keep the die down to around 100 mm² or somewhere near that, I would think. But doubling this and expecting the RAM to keep up with just 2 channels? That seems a bit optimistic, IMO.

I believe I saw a rumor that AMD's 12-core CCD would be around 75 mm² on N2. This next round is going to be interesting for sure.
Where did it leak? I had guessed around 80 mm²; I was close :rofl:
Currently, the 16c/32t Zen 5 generally bests the 24c/24t Arrow Lake by 5% in multi-threaded loads (pretty close, though). So essentially, 1 Zen 5 core = 1.5 Arrow Lake cores overall in MT. So, if everything stays in the same proportion next generation, 24 Zen 6 cores would be equivalent to 36 Nova Lake cores in MT.

View attachment 122734
It has y-cruncher in the mix, which uses AVX-512; when NVL gets AVX-512, there would be a change in it. And if in HandBrake they are using SVT-AV1, that also supports AVX-512.
I believe we are entering an interesting time with the next generation of processors though. What percentage of the consumer market needs more than a 26 core Nova Lake (or 12c/24t Zen 6)? Sure, you can make one with a dual CCD and a decent memory controller (and potentially faster memory on dual channel), but how many consumers need it?
For this, the tiles that were leaked were 8+16 / 4+8 / 4+0, and the SoC tile contains 4 LPE cores no matter the config. I think a 4+4 (4 cores disabled) + 4 i3 would be ridiculous.
The SKUs can be, but are not limited to:
  • 8+16+4
  • 2*(8+16)+4
  • 2*(4+8)+4
  • 4+8+4
  • 4+0+4
The 8+16 tile is N2, the 4+8 and 4+0 tiles are 18AP, and the common SoC tile is shared across all the SKUs.

For HPC/workstation work, wouldn't something like Threadripper be better? Much more memory bandwidth and way more threads?

Perhaps this is what the 52-core Nova Lake will be targeting?
Probably around $1,000, this should be a great buy for people looking at ST/MT performance who don't need a ton of PCIe lanes. As for RAM, I think 64 GB DIMMs should be more available by then, so 256 GB should be enough for most enthusiasts.
 
Intel had it designed as well; they were so focused on Itanium that they didn't want AMD to have their money-making x86 license.

View attachment 122737

100% fake. There was absolutely no 64-bit in the P4 until it needed it. I cannot believe I read such stupidity. His 64-bit just happened to be the same as AMD's? What a fool. Totally worthless fake chump. Have I made my opinion clear?
 
100% fake. There was absolutely no 64-bit in the P4 until it needed it. I cannot believe I read such stupidity. His 64-bit just happened to be the same as AMD's? What a fool. Totally worthless fake chump. Have I made my opinion clear?
OK, so you are saying that one of the chief x86 architects of his time is lying? That's pretty bold of you.
 