Discussion Intel current and future Lakes & Rapids thread

Mopetar · Jan 7, 2022

dullard said:
A lot of people here will not accept that Intel will do rapid releases of chips. Historically Intel has averaged about 12 months per generation. But that has a wide range (19.3 months for Comet Lake -> Rocket Lake and 7.2 months for Rocket Lake -> Alder Lake).

Makes sense that if there's a long period it will be followed by a short one. Intel has these planned out years in advance and a delay to Rocket Lake need not affect Alder Lake. Add the two together and average and it looks like business as usual.
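As a rough check of the "add the two together and average" point, here is a minimal Python sketch using only the two gaps quoted above. The variable names are just for illustration.

```python
# Gaps between launches as quoted by dullard, in months.
gaps_months = {
    "Comet Lake -> Rocket Lake": 19.3,
    "Rocket Lake -> Alder Lake": 7.2,
}

# Average of one long gap followed by one short gap.
average_gap = sum(gaps_months.values()) / len(gaps_months)
print(f"Average gap: {average_gap:.2f} months")
```

That works out to a bit over 13 months, which is not far from the roughly 12 months per generation quoted as the historical average.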

Rocket Lake being pretty awful as well would hardly make Intel want to hold off with their next generation CPUs either. No need to stall the rest of the pipeline just to wait on Rocket Lake.

Khato · Jan 7, 2022

Fun discussion regarding MTL/foveros which I have a few thoughts on to throw into the ring.

Regarding usage of foveros versus EMIB, might the usage of foveros be a combination of superior ubump density and the small size of the silicon in question? Does EMIB work as well when a higher percentage of the active silicon is on the EMIB versus substrate? Also not sure if such would have implications for power delivery to the areas above the EMIB connection, unless power was also sent through the bridge. My basic thought here is that EMIB is far superior for connecting large dice together, but once below a certain total size the traditional silicon interposer wins out.

Another interesting observation from the breakdown of different tile sizes on MTL - that main SOC tile is pretty close to half the total area, which could indicate that things were designed for the possibility of it being an active base foveros tile? Now obviously that's not happening, and I can think of a few reasons why. First, the base tile needs to be different for each configuration - if there are 2 different CPU and GPU tiles then you're up to 4 base tile configurations. The active logic would need to be sized for the minimum configuration, resulting in 'wasted' area on the larger configurations potentially. (Though that sounds like great area for L4 cache or some such.) That ties into the second problem of going with an active base die - available fab capacity. A passive base die should be extremely cheap to produce, potentially to the extent of a 200mm^2 passive base tile plus 90mm^2 SoC tile being cheaper than a 100mm^2 SoC base tile. So with the passive scheme the part that needs 4 different designs becomes cheap and simple to manufacture.
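To make the "4 base tile configurations" count concrete, here is a small Python sketch. The tile names are hypothetical placeholders; the only assumption carried over from the post is that there are 2 CPU tiles and 2 GPU tiles.

```python
from itertools import product

# Hypothetical tile variants; the post only assumes "2 different CPU and GPU tiles".
cpu_tiles = ["cpu_small", "cpu_large"]
gpu_tiles = ["gpu_small", "gpu_large"]

# With an active base tile, each CPU/GPU pairing would need its own base design.
base_tile_configs = list(product(cpu_tiles, gpu_tiles))
for cpu, gpu in base_tile_configs:
    print(cpu, "+", gpu)
print(len(base_tile_configs), "base tile configurations")
```

The count grows multiplicatively, so a third CPU tile would already push it to 6 base tile designs under the same assumption.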

Asterox · Jan 7, 2022

Delete, wrong topic.

jpiniero · Jan 7, 2022

dullard said:
A lot of people here will not accept that Intel will do rapid releases of chips. Historically Intel has averaged about 12 months per generation. But that has a wide range (19.3 months for Comet Lake -> Rocket Lake and 7.2 months for Rocket Lake -> Alder Lake).

IIRC Rocket got delayed because of chipset bugs and each time you need to respin that can eat up months easily.

Exist50 · Jan 7, 2022

One more thing to consider for the SoC.

See bullet 6.

dullard · Jan 7, 2022

Exist50 said:
One more thing to consider for the SoC.

See bullet 6.

Now that is something that I certainly hadn't considered. I just assumed the CPU tile would have all the cores. If the CPU tile only has the P cores and the SOC tile has the E cores, then the SOC tile could certainly be the biggest tile.

Exist50 · Jan 7, 2022

dullard said:
Now that is something that I certainly hadn't considered. I just assumed the CPU tile would have all the cores. If the CPU tile only has the P cores and the SOC tile has the E cores, then the SOC tile could certainly be the biggest tile.

That tile is still 2+8. Just saying Meteor Lake still has some mysteries.

tomatosummit · Jan 7, 2022

dullard said:
Now that is something that I certainly hadn't considered. I just assumed the CPU tile would have all the cores. If the CPU tile only has the P cores and the SOC tile has the E cores, then the SOC tile could certainly be the biggest tile.

This was the kind of thing I've been guessing about in my own head for a while.
To me (not a CPU engineer), having performance cores on the same silicon chiplet as the E-cores/GPU seems like a poor idea; Ryzen 7000 has its rumoured iGPU on the SoC, for example.
I could see:
Various compute chiplets for 2/4/8/12/16 p-cores on leading node for different skus
SOC chiplet with 4/8 e-cores on trailing node
iGPU chiplet on tsmc

I do know the core numbers are off in relation to what's been leaked but I'm just playing with the idea that intel can mix and match chiplets to vary p/e core counts for product lines.

Exist50 said:
That tile is still 2+8. Just saying Meteor Lake still has some mysteries.

What's the source on the 2+8 chiplet thing? I can't find anything online. I pray it's not mlid.

vstar · Jan 7, 2022

tomatosummit said:
This was the kind of thing I've been guessing about in my own head for a while.
To me (not a CPU engineer), having performance cores on the same silicon chiplet as the E-cores/GPU seems like a poor idea; Ryzen 7000 has its rumoured iGPU on the SoC, for example.
I could see:
Various compute chiplets for 2/4/8/12/16 p-cores on leading node for different skus
SOC chiplet with 4/8 e-cores on trailing node
iGPU chiplet on tsmc

I do know the core numbers are off in relation to what's been leaked but I'm just playing with the idea that intel can mix and match chiplets to vary p/e core counts for product lines.

What's the source on the 2+8 chiplet thing? I can't find anything online. I pray it's not mlid.

There were some rumors of the compute chiplet being 6+8 instead

https://twitter.com/x/status/1461826065828036610

lobz · Jan 7, 2022

dmens said:
(...) Just give up already. Go ask your friends for more leak scraps so they can laugh at you more.
(...)

Regardless of what I think about the discussion itself, these remarks are 100% unnecessary: not only do they derail the whole discussion, they also cast your arguments in a bad light...

IntelUser2000 · Jan 7, 2022

tomatosummit said:
Various compute chiplets for 2/4/8/12/16 p-cores on leading node for different skus
SOC chiplet with 4/8 e-cores on trailing node

The E cores are there to increase efficiency and performance so being on a trailing edge is not a good idea. Previous node would result in 2x the core size and decreased perf/watt efficiency.

Having some on the SoC tile is indicative of some unknown plans by Intel.

tomatosummit said:
What's the source on the 2+8 chiplet thing? I can't find anything online. I pray it's not mlid.

Point 1: MLiD has been spot on
Point 2: Exist50 has his own sources.

Khato · Jan 7, 2022

IntelUser2000 said:
Having some on the SoC tile is indicative of some unknown plans by Intel.

Testing of ADL E-core power efficiency at low thread counts is indicative of one possibility. The four-core cluster and associated L3/ring is fine for actual compute loads, but complete overkill for near-idle through standby. Adding in one or two power/leakage optimized E-cores to the SoC sounds like a great way to reduce power consumption in the same fashion as ARM little cores. (Despite comparisons, Intel's E-cores serve a notably different purpose than ARM little cores.) Might at long last result in a barely tolerable modern standby implementation...

IntelUser2000 · Jan 7, 2022

Khato said:
Might at long last result in a barely tolerable modern standby implementation...

Seems your idea is sound, if that's indeed the idle power bottleneck.

Intel's problem seems to be the off-die PCH, and Foveros should achieve results close to an on-die implementation (read: should; as Lakefield demonstrates, not necessarily). That's likely the reason they won't catch up to Rembrandt in battery life until Meteor Lake at least, because AMD already has enough I/O on-die to be PCH-lite.

So I don't know the details required, but having an efficient E core running at hyper optimized setups and process may be the way of doing it, I don't know. Windows is "notorious" for being hard to achieve good battery life here. It's because it's quite open to various developers, and it's a real time OS. Perhaps the IO die needs that extra compute power to rein things in, and it needs to be something that can run Windows by itself so it can easily synchronize with the main cores.*

*It's already possible to get the package power down to 0.2W. But it requires doing things that most off-the-shelf implementations will never resort to as it's unrealistic.

repoman27 · Jan 8, 2022

Khato said:
Fun discussion regarding MTL/foveros which I have a few thoughts on to throw into the ring.

Regarding usage of foveros versus EMIB, might the usage of foveros be a combination of superior ubump density and the small size of the silicon in question? Does EMIB work as well when a higher percentage of the active silicon is on the EMIB versus substrate? Also not sure if such would have implications for power delivery to the areas above the EMIB connection, unless power was also sent through the bridge. My basic thought here is that EMIB is far superior for connecting large dice together, but once below a certain total size the traditional silicon interposer wins out.

Another interesting observation from the breakdown of different tile sizes on MTL - that main SOC tile is pretty close to half the total area, which could indicate that things were designed for the possibility of it being an active base foveros tile? Now obviously that's not happening, and I can think of a few reasons why. First, the base tile needs to be different for each configuration - if there are 2 different CPU and GPU tiles then you're up to 4 base tile configurations. The active logic would need to be sized for the minimum configuration, resulting in 'wasted' area on the larger configurations potentially. (Though that sounds like great area for L4 cache or some such.) That ties into the second problem of going with an active base die - available fab capacity. A passive base die should be extremely cheap to produce, potentially to the extent of a 200mm^2 passive base tile plus 90mm^2 SoC tile being cheaper than a 100mm^2 SoC base tile. So with the passive scheme the part that needs 4 different designs becomes cheap and simple to manufacture.

I don't think there will be more than one GPU tile, but it looks like there will be at least two (or possibly three) CPU tiles as well as LP and HP versions of the SoC tile. Even so, I think two base tiles could cover all of the planned configurations for MTL. But even if Intel keeps the number of metal layers on the base tile low enough to avoid any SAQP, there are still going to be lots of TSVs, so I'm not sure it will ever qualify as "cheap and simple to manufacture".

Exist50 said:
One more thing to consider for the SoC.

See bullet 6.

That hadn't occurred to me either, and it is interesting to consider. Being from the Architecture Day 2020 presentation though, the context was probably more Lakefield and less Meteor Lake.

And while KEI is clearly a buzzword bingo TLA that only a marketing department could embrace, if you look at the stated goals for the Evo platform, battery life is going to be a lot better if the SoC tile includes basic GPU functionality, media engine, display engine, and display I/O.

igor_kavinski · Jan 8, 2022

Intel's Linux OS Shows The Importance Of Software Optimizations, Further Optimized Xeon "Ice Lake" In 2021 - Phoronix

Would be interesting to see how the Ice Lake Xeon fares now against its Epyc counterpart.

IntelUser2000 · Jan 8, 2022

repoman27 said:
And while KEI is clearly a buzzword bingo TLA that only a marketing department could embrace, if you look at the stated goals for the Evo platform, battery life is going to be a lot better if the SoC tile includes basic GPU functionality, media engine, display engine, and display I/O.

How do you figure it'll be better when all the blocks you are talking about are on-die already?

The problem is the PCH is not on-die, but it contains none of the blocks you talk about.

repoman27 · Jan 8, 2022

vstar said:
There were some rumors of the compute chiplet being 6+8 instead

The packages in the CNET photos were almost certainly the MTL-M (U9) 2+8+2 configuration, but MTL-P (P28/H45) will most likely use a 6+8 CPU tile.

During the Intel Accelerated event, Intel showed off a test wafer of Meteor Lake compute tiles that measure 4.8 mm x 7.9 mm. The Meteor Lake test chips that CNET photographed during their Fab 42 tour contain a top tile that also measures 4.8 mm x 7.9 mm, which strikes me as being somewhat beyond coincidental. Not locating the SoC tile in between the CPU and GPU tiles seems like a bold strategy, as it would make interconnect routing a nightmare. So I think @wild_cracks and @Locuza_ might need to reassess.
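For anyone who wants to redo the comparison with their own measurements, here is a minimal Python sketch. The dimensions are the ones quoted above; the 0.1 mm tolerance is an assumption, since measurements taken from photos will be approximate.

```python
# Dimensions quoted in the post, in millimetres.
wafer_tile_mm = (4.8, 7.9)    # compute tile on the Intel Accelerated test wafer
package_tile_mm = (4.8, 7.9)  # top tile on the package in the CNET photos

def area_mm2(dims):
    """Return the area of a rectangular tile in mm^2."""
    width, height = dims
    return width * height

def same_footprint(a, b, tolerance_mm=0.1):
    """True if both dimensions agree within an assumed measurement tolerance."""
    return all(abs(x - y) <= tolerance_mm for x, y in zip(a, b))

print(f"Tile area: {area_mm2(wafer_tile_mm):.2f} mm^2")
print("Same footprint:", same_footprint(wafer_tile_mm, package_tile_mm))
```

Both measurements give a tile of just under 38 mm^2.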

repoman27 · Jan 8, 2022

IntelUser2000 said:
How do you figure it'll be better when all the blocks you are talking about are on-die already?

The problem is the PCH is not on-die, but it contains none of the blocks you talk about.

It should be better from a power standpoint versus disaggregating some of them to a separate GPU tile. If the only thing the GPU tile contains is additional EU slices, then you only have to light it up when you're doing something that's actually demanding.

How is the latter related? I don't follow...

biostud · Jan 8, 2022

Some RP-L ES rumors

Intel's 13th Gen Raptor Lake Core i9-13900K Early CPU Sample Spotted: Up To 32 Threads, 1.8 GHz Clocks, No AVX-512 Support

Another Intel 13th Gen Raptor Lake Core i9-13900K CPU has been spotted within the Intel GFX CI, offering up to 32 threads.

wccftech.com

DrMrLordX · Jan 8, 2022

igor_kavinski said:
Would be interesting to see how the Ice Lake Xeon fares now against its Epyc counterpart.

Doesn't make a lot of sense without some commentary as to WHY performance would improve by that much, just going with an Intel-provided distro. If I had to guess, Intel provided some in-house compiled versions of FOSS applications with better AVX512 optimization.

igor_kavinski · Jan 8, 2022

DrMrLordX said:
If I had to guess, Intel provided some in-house compiled versions of FOSS applications with better AVX512 optimization.

Can't be just AVX-512. PHPbench probably doesn't use or need AVX-512 and the performance improvement seems quite significant.

lobz · Jan 8, 2022

IntelUser2000 said:
The E cores are there to increase efficiency and performance so being on a trailing edge is not a good idea. Previous node would result in 2x the core size and decreased perf/watt efficiency.

Having some on the SoC tile is indicative of some unknown plans by Intel.

Point 1: MLiD has been spot on
Point 2: Exist50 has his own sources.

Yeah, say what you want about MLiD @tomatosummit , he's had a very clean track record recently, especially with information regarding Intel, which is unusual among YouTube leakers. Personally I'm there for the ads, that's when I get to see his dog 😀🙂🙂

lobz · Jan 8, 2022

wccftech's opening remarks in an article about the 12900HK - 'Well, it will definitely blow away Zen 3 and Zen 3+ if it doesn't blow itself up first'

Being wccf, an unhealthy amount of salt is needed and it probably means nothing, but I found it funny anyway 🙂

The content itself that spawned this quote: 'The CPU hit a max temp of 99C while the average temperature was reported at 69C for the core and 76C for the package. The power consumption peaked at 113W but averages around 63W. (...) Thankfully, Lab501 also has some power and thermal numbers of two AMD Ryzen 9 5900HX APU laptops that we can compare these against. First is an ASUS ROG STRIX SCAR 17 (G733QS) laptop which has a peak temperature rating of 94C and an average temperature rating of 70C. The laptop had a max power consumption of 65W and averaged at around 30W. The second laptop is a Lenovo Legion 7 (16ACHG6) which has a peak temperature rating of 88C and averaged at around 69C. The power rating is however higher than the ASUS variant with an average of 86W peak package power and 45W on average.'

That is a massive difference in power consumption, though I'm aware that the HK can be configured to be constrained as well, just saying.
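To put rough numbers on that difference, here is a small Python sketch using only the figures from the quoted article. It assumes the Lenovo's "86W peak package power" is its peak figure, since the article's wording there is a bit muddled.

```python
# Package power figures from the quoted article, in watts: (peak, average).
power_w = {
    "Core i9-12900HK": (113, 63),
    "5900HX, ASUS ROG STRIX SCAR 17": (65, 30),
    "5900HX, Lenovo Legion 7": (86, 45),  # assumes 86W is the peak value
}

intel_peak, intel_avg = power_w["Core i9-12900HK"]
ratios = {}
for name, (peak, avg) in power_w.items():
    if name == "Core i9-12900HK":
        continue
    # Ratio of the 12900HK figure to the Ryzen laptop's figure.
    ratios[name] = (intel_peak / peak, intel_avg / avg)
    print(f"12900HK vs {name}: {intel_peak / peak:.2f}x peak, {intel_avg / avg:.2f}x average")
```

On these numbers the 12900HK averages about 2.1x the package power of the ASUS laptop and 1.4x that of the Lenovo, so how big the gap looks depends a lot on how each laptop is configured.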

igor_kavinski · Jan 8, 2022

Turning heat into electricity | MIT News | Massachusetts Institute of Technology

It's time they design a mini/micro/nano-thermoelectric generator to convert all that heat into electricity.

NTMBK · Jan 8, 2022

igor_kavinski said:
Can't be just AVX-512. PHPbench probably doesn't use or need AVX-512 and the performance improvement seems quite significant.

AVX-512 makes certain problems more amenable to vectorization with things like scatter and vector masking, so it may be that even PHPBench has certain parts of it that can be sped up with AVX-512.
