Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
781
748
106
With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.




Intel Core Ultra 100 - Meteor Lake


As reported by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Last edited:

dttprofessor

Member
Jun 16, 2022
161
43
71
ARL uses the same SoC tile as MTL. Intel knew they had made a mistake before MTL's launch, but according to the leaked figures, they decided not to change it.
 

OneEng2

Senior member
Sep 19, 2022
673
921
106
The effect is minimal. You are talking 5% at best.

Under sustained load you saturate the TDP, so you have poor battery life anyway, and people don't care about extending that by 10-20% over better performance. Bursty workloads come nowhere near saturating the TDP even at the peak, and the peak is only there for 2-3% of the time.

AMD had copper interconnects at 0.18u, but Intel's transistor performance was 20-30% ahead. Checking boxes is one thing, but you still need to do the work to do better.

Regarding 18A: roughly equal density to N3, and it beats N3 on performance in chips that need high performance. In low-power environments it's probably close.
Seems to me that battery life is pretty important in thin and light laptops.

18A may turn out better than you think with the inclusion of BSPD.
Right, and Intel revised its Intel 18A performance estimate to 15% faster than Intel 3. Previously, Intel 20A was 15% faster than Intel 3, with Intel 18A an additional 10% over Intel 20A, if I recall correctly.
... and this may be a problem for Intel as they will be pitching an 18A that is roughly on par with TSMC's last generation N3X while TSMC's premium process will be N2.

Still, 18A will have BSPD which provides some pretty good chip wide efficiencies that are not reflected in the transistor specifications.
Density isn't the top target; efficiency and performance should be.
Agree; however, this philosophy hurts the desktop market in both overall performance, and die size cost. Still, it provides the greatest scalability for DC where the profit and growth are.
Depends on the software, some are licensed based on “core” count and don’t distinguish between physical or logical cores for licensing purposes.
Threads are not "cores". I have read the licenses of many software packages carefully (although it has been a few years).
Correct me if I'm wrong, but isn't calling Netburst a failure mostly hindsight being 20/20? This was years before I started following hardware, but I've heard that when the Netburst design was first started, many believed clock speeds would just keep increasing, so focusing on clock speed at the expense of IPC sounded like a good idea at the time development began. Only later was it discovered to be a terrible idea. Going from 2 GHz to 4 GHz is 2x performance, without any change to IPC. Intel was publicly predicting they'd be hitting 10 GHz within a few generations.
Possibly. Still, I am an EE who graduated college in the 80's. There was a time when all the transistors in a chip turned full ON and full OFF and leakage was minimal when they weren't in use. Also, non-linear gate effects were so small that they were ignored in most calculations. As clock speeds got higher and lithography got smaller and transistors never really "turned all the way off and on" anymore, it was abundantly clear that tricks like raising clock frequency, raising core voltage, making transistors more leaky, etc in the interest of higher clocks would incur an exponential cost in power. I think Intel simply ignored the many "voices of reason" that tried to tell them.
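The clock-versus-IPC tradeoff in this exchange reduces to simple arithmetic: single-thread throughput scales as IPC times clock. A toy sketch with made-up numbers (these are illustrative figures, not actual Netburst data):

```python
# Toy model: relative performance = IPC x clock (hypothetical numbers).
def relative_perf(ipc: float, clock_ghz: float) -> float:
    """Single-thread throughput in arbitrary units."""
    return ipc * clock_ghz

base = relative_perf(ipc=1.0, clock_ghz=2.0)   # baseline design
fast = relative_perf(ipc=1.0, clock_ghz=4.0)   # same IPC, doubled clock
deep = relative_perf(ipc=0.7, clock_ghz=4.0)   # deeper pipeline trades IPC for clock

print(fast / base)  # 2.0: doubling the clock doubles performance at equal IPC
print(deep / base)  # 1.4: the clock win shrinks if the deeper pipeline costs IPC
```

The second case is the Netburst trap: the clock headline doubles, but if the deeper pipeline (and, as noted above, leakier transistors) costs enough IPC and power, the net gain evaporates.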
Until Intel has an answer for X3D I don’t see them competing with AMD for the gaming crown anytime soon.
Not this go round anyway.
I'd say most of the extra performance of 13th/14th gen over 12th gen was due to the extra cache added, not memory performance. Also, you can't really compare Intel and AMD RAM controller 'quality', as they work in completely different modes: AMD can run DDR5 in both gear 1 and gear 2, while Intel only does G2 or G4 even with Arrow Lake. Intel's controller is (or rather was) integrated into the ring bus, while AMD's is connected via an interface that has inherent tradeoffs to allow it to scale better with more core clusters. Also, if your software is coded in a NUMA-aware fashion, you can utilize the bandwidth properly in 1:2 mode with Zen 4/5, contrary to the stigma of "AMD DDR5 bad".
AMD focusing on high core count memory controller design is a good plan IMO. Data Center high core count processors are where the highest margin and most growth are projected.

Of course, it may hurt them in the Laptop market, but we will see.
 

GTracing

Senior member
Aug 6, 2021
478
1,112
106
I haven't seen this mentioned yet in the memory controller discussion, but more cores and cache increase memory latency. Back in the Zen 2 days, Intel had a better memory controller, in the sense that the 9900K had lower memory latency and could clock higher. (I believe that's what Flametail was thinking of in his post.) According to AnandTech's reviews, Alder Lake increased memory latency from ~60ns to ~90ns, putting it more or less on par with desktop Ryzen CPUs.

Also, everyone here (myself included) is using "memory controller" to refer to the overall memory latency. To be pedantic, the memory controller is just one part of the overall latency. Latency is also affected by the sizes of the caches, the cache architecture, memory clocks and timings, and, as Det0x pointed out, whether or not chiplets are used.
 
  • Like
Reactions: Tlh97 and Joe NYC

511

Platinum Member
Jul 12, 2024
2,898
2,901
106
I haven't seen this mentioned yet with the memory controller discussion, but more cores and cache increase memory latency. Back in the Zen2 days, Intel had a better memory controller, in the sense that the 9900k has lower memory latency and could clock higher. (I believe that's what Flametail was thinking of with his post). According to anandtech's reviews, Alder Lake increased the memory latency from ~60ns to ~90ns, putting it more or less on par with desktop Ryzen CPUs.
DDR4->DDR5 may have something to do with it
Also, everyone here (myself included) is using "memory controller" to refer to the overall memory latency. To be pendantic, the memory controller is just one part of the overall latency. It can also be affected by size of caches, cache architecture, memory clocks and timings, and as Det0x pointed out, whether or not chiplets are used.
And chiplets included
 
  • Like
Reactions: Tlh97 and GTracing

Hulk

Diamond Member
Oct 9, 1999
5,118
3,660
136
Regarding the emulation of yet unbuilt silicon...

I would assume at the early design stages the emulation is more of a "rough sketch" of a design so that too much time isn't wasted on laying out design details that may change dramatically during the design process.

Along this line I would think that as the design becomes more finalized the emulation also becomes more accurate.

Finally, when the design is finalized wouldn't the emulation be finalized as well and represent very accurately how the final silicon would perform?

As an ME I have very limited understanding of how this works. In fact, most of what I know about CPU design and fabrication I learned here. I'm just trying to think of it like designing a mechanical part using computer-aided design, where you start with a rough idea and refine the model as you move along: finding stress concentrations, redesigning in the computer, checking stress and strain again, and so on. I'm just thinking that you can be even more accurate with CPUs, because every single transistor is put down in the computer design beforehand.

Also, I know AI is becoming a big part of CPU design. Does it also help with emulation of performance of the design?
 

DrMrLordX

Lifer
Apr 27, 2000
22,700
12,651
136
But personally after Willamette I think they should have realized some problems but they doubled down with Prescott.

Northwood-B and -C worked out well enough that they thought they could continue getting away with it. Plus keep in mind that originally Netburst was a big scheme to justify RDRAM in desktops. They had good reasons to push such a design.

+20% ST performance over the 14900KS was always fanfiction, but this seems below even my expectations, although I did get some of my predictions right, even if I took some flak for it at the time.

I don't think anyone could have predicted an actual performance regression in ST out of Arrow Lake. Even Intel's most pessimistic slides showed +5%. That might have been based on a fictional version of 20A that we never saw. N3B does not seem to be the right node for Arrow Lake. Also keep in mind that Intel may not have as much experience or skill in utilizing TSMC nodes as its primary competitor...
 

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
Still, 18A will have BSPD which provides some pretty good chip wide efficiencies that are not reflected in the transistor specifications.

BSPD gives a density advantage, allows for lower losses (thus improved efficiency), and provides better power delivery, which enables slightly higher frequencies. The main downsides of BSPD are increased costs and significantly worse thermals which, if unsolved, limit performance due to temperatures. Intel says that they've solved the thermal issue but has not yet given any indication of how.

Threads are not "cores". I have read the licenses to many softwares over carefully (although it has been a few years).

SMT "threads" don't really exist in that sense. When a CPU physical core has 2-way SMT, it is presented as 2 logical cores, and most modern software that I am aware of counts logical cores for licensing purposes, which is why many engineering/high-performance software vendors that license on a per-core basis recommend turning off SMT to get the most performance per core/license possible.
 

511

Platinum Member
Jul 12, 2024
2,898
2,901
106
BSPD gives a density advantage, allows for lower losses (thus improved efficiency), and provides better power delivery, which enables slightly higher frequencies. The main downsides of BSPD are increased costs and significantly worse thermals which, if unsolved, limit performance due to temperatures. Intel says that they've solved the thermal issue but has not yet given any indication of how.
If they have solved it, why would they say so in public? Either they don't have it or it's a trade secret 🤣
SMT "threads" don't really exist in that sense. When a CPU physical core has 2 way SMT, it is presented as 2 logical cores and most modern software that I am aware counts logical cores for licensing purposes, which is why many engineering/high performance software vendors that license on a per core basis recommend turning off SMT to get the most performance per core/license possible.
Yes, it is basically 1.3x the performance but the licence price is 2x, unless someone is giving discounts
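Taking the rough figures from this exchange (licenses counted per logical core, SMT adding ~1.3x throughput while doubling the licensed core count), the perf-per-license math works out like this; a sketch using only the thread's assumed numbers:

```python
# Per-license throughput under per-logical-core licensing. Figures from the
# discussion above: SMT gives ~1.3x throughput but doubles the counted cores.
def perf_per_license(physical_cores: int, smt: bool,
                     smt_speedup: float = 1.3) -> float:
    logical = physical_cores * (2 if smt else 1)   # what the license counts
    perf = physical_cores * (smt_speedup if smt else 1.0)
    return perf / logical

print(perf_per_license(16, smt=False))  # 1.0 unit of work per license
print(perf_per_license(16, smt=True))   # 0.65: SMT costs 35% perf-per-license
```

Which is exactly why vendors that charge per logical core effectively push customers to disable SMT: each license buys 35% less work with it on.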
 

Hulk

Diamond Member
Oct 9, 1999
5,118
3,660
136
I don't think anyone could have predicted an actual performance regression in ST out of Arrow Lake. Even Intel's most pessimistic slides showed +5%. That might have been on a fictional version of 20a that we never saw. N3B does not seem to be the right node for Arrow Lake. Also keep in mind that Intel may not have as much experience or skill in utilizing TSMC nodes as their primary competitor...

I am not doubting your assessment that Arrow Lake may show a performance regression from Raptor Lake in ST. What I am wondering is what metrics led you to this assessment, because as yet I don't have a clear indication of where this ST performance will land.

For example, are you thinking, "In Geekbench 6 a 285K operating at 5.7GHz scores xxxx while a 14900K operating at 6GHz scores xxxx, therefore it looks like there is an overall ST performance regression moving from Raptor to Arrow."

As I wrote above I'm not challenging your assessment, I know you are very intelligent and well thought-out in your responses, so I'm thinking you have some metrics in your head and I'm curious about them?

I'm not sure about ST yet because I just haven't "nailed down" how Arrow Lake will do in ST at a specific frequency in a couple of ST benches that I can compare to Raptor Lake at its stock ST frequency. We also have to specify if we are talking about 5.7GHz Arrow Lake vs 6 GHz (K) or 6.2GHz (KS). The KS has almost a 9% frequency advantage while the K has just over 5%. So a tie with the K (both stock) would mean +5% IPC for Lion Cove. A tie with the KS would mean +9% for Lion Cove.
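The frequency arithmetic in that last paragraph can be checked directly: if two chips tie in an ST benchmark, the implied IPC gain of the slower-clocked chip equals its clock deficit. A quick sketch with the frequencies mentioned above:

```python
# If two chips tie in an ST benchmark, the slower-clocked one needs an IPC
# gain equal to its clock deficit. Frequencies taken from the post above.
def implied_ipc_gain(new_ghz: float, old_ghz: float) -> float:
    """IPC uplift (as a fraction) needed for the new chip to tie at a lower clock."""
    return old_ghz / new_ghz - 1.0

print(f"{implied_ipc_gain(5.7, 6.0):.1%}")  # vs 14900K (6.0 GHz):  ~5.3%
print(f"{implied_ipc_gain(5.7, 6.2):.1%}")  # vs 14900KS (6.2 GHz): ~8.8%
```

So a stock-for-stock tie against the K implies just over +5% IPC for Lion Cove, and a tie against the KS implies nearly +9%, matching the figures in the post.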
 

OneEng2

Senior member
Sep 19, 2022
673
921
106
I don't think anyone could have predicted an actual performance regression in ST out of Arrow Lake. Even Intel's most pessimistic slides showed +5%. That might have been on a fictional version of 20a that we never saw. N3B does not seem to be the right node for Arrow Lake. Also keep in mind that Intel may not have as much experience or skill in utilizing TSMC nodes as their primary competitor...
I think that these are good guesses as to where Intel went off the rails with their earlier estimates. In addition to not being skilled in utilizing TSMC's nodes, tools, and processes, my guess is that they were also not well versed in cross-company development... and probably not as open with TSMC about their design and its worry points as AMD likely is.
SMT "threads" don't really exist in that sense. When a CPU physical core has 2 way SMT, it is presented as 2 logical cores and most modern software that I am aware counts logical cores for licensing purposes, which is why many engineering/high performance software vendors that license on a per core basis recommend turning off SMT to get the most performance per core/license possible.

Ok, the first one I looked up (as I said, it's been a while since I worked closely with IT infrastructure managers at larger OEMs) is pretty darned explicit: PHYSICAL CORES, not logical ones. SMT does not count.

If you have other examples, please let me know.
 
Last edited:

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
I think that these are good guesses of where Intel went off the rails with their earlier estimates. In addition to not being skilled in utilizing TSMC nodes, tools, and processes, my guess is that they were also not well versed in cross-company development.... and likely were not as open as AMD likely is with TSMC about their design and its worry points.


Ok, the first one I looked up (as I said, it's been a while since I worked closely with IT infrastructure managers at larger OEMs) is pretty darned explicit: PHYSICAL CORES, not logical ones. SMT does not count.

If you have other examples, please let me know.

Your link is wrong so I don’t know what software you are referring to.

Cadence (one of the biggest EDA software providers) uses logical cores for licensing. So do SonnetEM, the Calibre tools, HFSS, and many more.
 

511

Platinum Member
Jul 12, 2024
2,898
2,901
106
I think that these are good guesses of where Intel went off the rails with their earlier estimates. In addition to not being skilled in utilizing TSMC nodes, tools, and processes, my guess is that they were also not well versed in cross-company development.... and likely were not as open as AMD likely is with TSMC about their design and its worry points.


Ok, the first one I looked up (as I said, it's been a while since I worked closely with IT infrastructure managers at larger OEMs) is pretty darned explicit: PHYSICAL CORES, not logical ones. SMT does not count.

If you have other examples, please let me know.
why is voting information showing here can non Americans vote?🤣
 

OneEng2

Senior member
Sep 19, 2022
673
921
106
Your link is wrong so I don’t know what software you are referring to.

Cadence design (one of the biggest EDA software providers) use logical cores for licensing. SonnetEM, Calibre tools, HFSS, many more.

Not sure how that happened. I went back and fixed the link.
 

OneEng2

Senior member
Sep 19, 2022
673
921
106
Your link is wrong so I don’t know what software you are referring to.

Cadence design (one of the biggest EDA software providers) use logical cores for licensing. SonnetEM, Calibre tools, HFSS, many more.
Looked up Cadence. As an aside, last time I used orCAD it was free ;). I use Altium these days.

I have literally never heard of a CAD program being licensed by any other measure than "seats". You can get a fixed seat, or floating seats. Both limit access to the program by the number of people using it at the same time. I am intrigued by the "token" concept for analysis software though. Seems like a "per task" kind of charge.

Please provide a link showing a per thread licensing model. Cadence is the 2nd one I looked up and the 2nd one that has not provided proof to your claims.
 

OneEng2

Senior member
Sep 19, 2022
673
921
106
why is voting information showing here can non Americans vote?🤣
My bad! Was looking up information for my daughter who is in college about early voting in Michigan. Somehow this got caught up in my quote here (not sure how that happened).
 
  • Like
Reactions: 511

Hitman928

Diamond Member
Apr 15, 2012
6,642
12,245
136
Looked up Cadence. As an aside, last time I used orCAD it was free ;). I use Altium these days.

I have literally never heard of a CAD program being licensed by any other measure than "seats". You can get a fixed seat, or floating seats. Both limit access to the program by the number of people using it at the same time. I am intrigued by the "token" concept for analysis software though. Seems like a "per task" kind of charge.

Please provide a link showing a per thread licensing model. Cadence is the 2nd one I looked up and the 2nd one that has not provided proof to your claims.

Cadence Virtuoso/schematic/layout-edit licenses are per seat because they don't need more than one core/thread to run. The actual simulation/verification/extraction software packages, though, are multi-threaded. Their licensing structure is a confusing mess and works off a token system that is broken into "tiers"; each tier depends on the exact functions used and how many logical cores are being used. I can't link directly to the guide because you have to be licensed to have access to it. I worked at a startup for a while and was heavily involved in the license purchase negotiations, though, and I know exactly how it works. Cadence does not distinguish between physical and logical cores.

Edit:

I found a link from their forum that talks about it. It's a little dated now but nothing has changed in regards to the answer.

Question said:
Is the terminology of "multithread" (i.e. +mt = x) in APS/IRUN means number of CPU core to use?
Answer said:
Yes, that's right. The tool won't allow you to use more threads than "cores" that are available. Note that I put "cores" in quotation marks because if you have hyperthreading enabled, that will increase the number available to use - although in general we don't recommend using hyperthreading with APS because you don't typically get good scaling with a heavily floating point application like APS. . .
Answer said:
APS uses 2 tokens for 1 core, and 4 tokens for 2-4 cores, and 6 tokens for 5-16 cores.




SonnetEM, Calibre tools, etc., work similarly though their licensing schemes aren't as convoluted as Cadence's.
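The APS token tiers quoted above form a simple step function. A sketch covering only the 1-16 core tiers given in the quote (behavior above 16 cores isn't specified there):

```python
# APS token cost per the quoted tiers: 2 tokens for 1 core,
# 4 tokens for 2-4 cores, 6 tokens for 5-16 cores.
def aps_tokens(cores: int) -> int:
    if not 1 <= cores <= 16:
        raise ValueError("tiers above 16 cores aren't given in the quote")
    if cores == 1:
        return 2
    if cores <= 4:
        return 4
    return 6

print([aps_tokens(n) for n in (1, 2, 4, 5, 16)])  # [2, 4, 4, 6, 6]
```

Note that "cores" here means logical cores, per the quoted answer, so enabling hyperthreading can push a run into a higher token tier.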
 
Last edited: