Discussion: Intel current and future Lakes & Rapids thread


IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
The 12900K lists 6 MB of L3 cache for the efficiency cores, which makes me think it's a typo, but if it weren't, that would further suggest the cache is the culprit.

It shouldn't matter that much on the ring bus. It should have access to the entire 25MB pool, unless an extra cycle or two of latency affects performance that much.

The L2 cache being the bottleneck makes more sense. There is a reason multi-core-oriented CPUs moved to private L2 caches and a shared L3.

In a heavy workload, the cache is getting thrashed really badly.
 

tomatosummit

Member
Mar 21, 2019
184
177
116
It shouldn't matter that much on the ring bus. It should have access to the entire 25MB pool, unless an extra cycle or two of latency affects performance that much.

The L2 cache being the bottleneck makes more sense. There is a reason multi-core-oriented CPUs moved to private L2 caches and a shared L3.
While I also think the L2 is the big bottleneck, the shared access to the L3 might be additionally restrictive. It's still 4 cores on one ring stop.
Or am I overthinking it because the access goes through the L2 module anyway?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
While I also think the L2 is the big bottleneck, the shared access to the L3 might be additionally restrictive. It's still 4 cores on one ring stop.
Or am I overthinking it because the access goes through the L2 module anyway?

It might be worth testing on the 12900K or the 12600K then. According to wikichip's logic, the 12600K will have 2MB for the Gracemont module.


Remember that for Gracemont the L2 is a shared cache, meaning for 4 cores it has 1/4 the capacity per core compared to a single thread, in addition to potential contention. With a 2MB module, that works out to just 512KB per core. Contention will be an issue for the L3 as well, but capacity-wise the ring will allow access to all 25MB (unless it has arbitrary restrictions).

Of course, we are assuming the caches are the culprit here, not something else.
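If someone wants to poke at it, a pointer-chase sweep makes the cache boundaries visible: pin to an E-core, grow the working set, and watch where ns/load jumps. A rough Linux-only sketch; the default core ID (16, often the first E-core on a 12900K), the sweep range, and the iteration count are my assumptions, so adjust for your chip:

```c
/* Pointer-chase latency sweep, a rough Linux-only sketch. Past ~2 MB on an
 * E-core you'd fall out of the shared L2 and onto the ring/L3.
 * Build: gcc -O2 chase.c -o chase   Run: ./chase [core] */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    int core = argc > 1 ? atoi(argv[1]) : 16;   /* assumed E-core ID */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    sched_setaffinity(0, sizeof(set), &set);

    const size_t line = 64 / sizeof(void *);    /* pointers per cache line */

    for (size_t ws = 256u << 10; ws <= 64u << 20; ws <<= 1) {
        size_t slots = ws / 64;                 /* one pointer per line */
        void **buf = malloc(ws);
        size_t *order = malloc(slots * sizeof(size_t));

        /* Random permutation of the line slots, linked into one big cycle
         * so the prefetcher can't guess the next address. */
        for (size_t i = 0; i < slots; i++) order[i] = i;
        for (size_t i = slots - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i < slots; i++)
            buf[order[i] * line] = &buf[order[(i + 1) % slots] * line];

        /* Dependent loads: each one must wait for the previous to finish. */
        void **p = &buf[order[0] * line];
        const size_t iters = 20 * 1000 * 1000;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < iters; i++) p = (void **)*p;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%6zu KB: %5.2f ns/load\n", ws >> 10, ns / iters);
        volatile void *sink = p; (void)sink;    /* keep the chase alive */
        free(order);
        free(buf);
    }
    return 0;
}
```

Run it once pinned to a P-core and once to an E-core; a step around 2MB on the E-core side would be the shared L2 giving out.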
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
It might be worth testing on the 12900K or the 12600K then. According to wikichip's logic, the 12600K will have 2MB for the Gracemont module.


Remember that for Gracemont the L2 is a shared cache, meaning for 4 cores it has 1/4 the capacity per core compared to a single thread, in addition to potential contention. With a 2MB module, that works out to just 512KB per core. Contention will be an issue for the L3 as well, but capacity-wise the ring will allow access to all 25MB (unless it has arbitrary restrictions).

Of course, we are assuming the caches are the culprit here, not something else.

Peanut gallery chiming in here with a question.

Does the fact that Gracemont does not have HT somewhat alleviate the stress on the L2 as compared to an HT CPU? Or, stated another way, when designing a CPU that is not going to be hyperthreaded, would the L2 be sized smaller than for an HT CPU?
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Peanut gallery chiming in here with a question.

Does the fact that Gracemont does not have HT somewhat alleviate the stress on the L2 as compared to an HT CPU? Or, stated another way, when designing a CPU that is not going to be hyperthreaded, would the L2 be sized smaller than for an HT CPU?

The working set is larger when doing fine-grained thread switching, so ideally you want the caches to be larger, starting from the L1$.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Remember that for Gracemont the L2 is a shared cache, meaning for 4 cores it has 1/4 the capacity per core compared to a single thread, in addition to potential contention.

This is typically not the case, because there is sharing. On one hand there is code sharing, which is pretty obvious. But shared data structures also require only a single copy in a shared cache, as opposed to multiple copies in private caches. Of course, the amount of sharing is larger when all cores are running the same application (e.g. multithreaded workloads like Cinebench).
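As a toy illustration (not Gracemont-specific; the table size, thread count, and pass count are made up for the example): four threads scanning one shared 1MB table fit comfortably in a 2MB shared L2, while giving each thread a private copy quadruples the combined footprint:

```c
/* Toy example of shared vs. private data footprint. Four threads scan one
 * shared 1 MB table: a shared L2 holds it once. Pass any argument and each
 * thread instead gets a private copy: 4 MB combined, which would blow out
 * a 2 MB shared L2.
 * Build: gcc -O2 -pthread share.c -o share
 * Run pinned to one module, e.g.: taskset -c 16-19 ./share  (core IDs assumed) */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TABLE_BYTES (1u << 20)   /* 1 MB lookup table */
#define THREADS 4
#define PASSES 2000

static unsigned char shared_table[TABLE_BYTES];

static void *reader(void *arg)
{
    const unsigned char *t = arg;       /* shared table or a private copy */
    unsigned long sum = 0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < TABLE_BYTES; i += 64)   /* one read per line */
            sum += t[i];
    return (void *)sum;                 /* keep the loads from being elided */
}

int main(int argc, char **argv)
{
    int use_private = argc > 1;
    pthread_t th[THREADS];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < THREADS; i++) {
        void *t = shared_table;
        if (use_private) {              /* 4 private copies, 4x the footprint */
            t = malloc(TABLE_BYTES);
            memcpy(t, shared_table, TABLE_BYTES);
        }
        pthread_create(&th[i], NULL, reader, t);
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%s: %.2f s\n",
           use_private ? "private copies (~4 MB total)" : "shared table (~1 MB total)",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}
```

Timing both modes while pinned to one module, and comparing L2 misses with perf stat, would show the effect.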
 

Mopetar

Diamond Member
Jan 31, 2011
8,487
7,726
136
This is typically not the case, because there is sharing. On one hand there is code sharing, which is pretty obvious. But shared data structures also require only a single copy in a shared cache, as opposed to multiple copies in private caches. Of course, the amount of sharing is larger when all cores are running the same application (e.g. multithreaded workloads like Cinebench).

It probably depends on whether the scheduler is doing a good job of assigning threads based on those shared resources. If it puts something random, like a background process from an unrelated application, on the same module, it's unlikely to be accessing the same memory.
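In other words, the difference between the good and bad case is roughly this (a sketch; core IDs 16 and 17 are assumed to be two cores on the same Gracemont module, since on a 12900K the E-cores typically show up as CPUs 16-23; check lscpu -e for the real topology):

```c
/* Rough sketch of what good placement buys: pin two threads that read the
 * same data onto one E-core module so they share a single L2.
 * Build: gcc -O2 -pthread pin.c -o pin */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int shared_data[1 << 18];        /* 1 MB: fits in a 2 MB module L2 */

static void *worker(void *arg)
{
    int core = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    long sum = 0;
    for (int p = 0; p < 1000; p++)
        for (size_t i = 0; i < sizeof(shared_data) / sizeof(int); i++)
            sum += shared_data[i];
    printf("core %d done (sum %ld)\n", core, sum);
    return NULL;
}

int main(void)
{
    /* Same module: one copy of shared_data in the shared L2. Point one ID
     * at a core on a different module and each L2 caches its own copy. */
    int cores[2] = {16, 17};
    pthread_t th[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&th[i], NULL, worker, &cores[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return 0;
}
```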
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Does the fact that Gracemont does not have HT somewhat alleviate the stress on the L2 as compared to an HT CPU? Or, stated another way, when designing a CPU that is not going to be hyperthreaded, would the L2 be sized smaller than for an HT CPU?

So with Hyperthreading you have to deal with sharing of resources, right?

Then if you have the L2 cache being shared as well, that's a further complication. Overall, for scaling, separate L2s and a large shared L3 for all cores is best. You lose a bit on single thread, but the scaling is better, so you quickly win out. It's mostly about multi-core scaling, though; the Hyperthreading benefit is just a side thing.

A mid-level L2 also has the advantage that it fills the large capacity and latency gap between the L1 and the last-level cache, to better fit all scenarios, since we're talking about a general-purpose CPU. You have to cover all bases, and then some, because you don't know what people will use it for.

This is typically not the case, because there is sharing. On one hand there is code sharing, which is pretty obvious.

You are right, but a significant amount won't be shared. So when you are talking big numbers, such as having 4x the cores, the per-core amount will be substantially smaller. In this case, it might as well be 1/4.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Sapphire Rapids on eBay? Someone's in trouble.
Makes me wonder if Intel even tracks trays of CPUs that are sent to OEMs for testing (guessing that's where this came from). Looks like a pull from something, a bit scratched up. Maybe even a non-working sample that got tossed into a plastic bin on a shelf? Grabbed by an 'enterprising' tech or janitor and sold on eBay under a fake name and address? Intel, one would think, has thousands of these (or more) floating around at various OEMs/ODMs testing servers, peripherals, and enterprise software for the upcoming SPR release.
 
Last edited:

repoman27

Senior member
Dec 17, 2018
384
540
136
Well, Intel sent YuuKi_AnS an SPR engineering sample nearly a year ago, which he went ahead and delidded.

 

DrMrLordX

Lifer
Apr 27, 2000
22,901
12,970
136
Makes me wonder if Intel even tracks trays of CPUs that are sent to OEMs for testing (guessing that's where this came from). Looks like a pull from something, a bit scratched up. Maybe even a non-working sample that got tossed into a plastic bin on a shelf? Grabbed by an 'enterprising' tech or janitor and sold on eBay under a fake name and address? Intel, one would think, has thousands of these (or more) floating around at various OEMs/ODMs testing servers, peripherals, and enterprise software for the upcoming SPR release.

They probably track it by the markings on the lid.
 

repoman27

Senior member
Dec 17, 2018
384
540
136
I'm curious what people would think of 4+16 for the P die.
It would net the most threads for the least area / power. I don't see it happening for Meteor Lake though. Looking closely at the ADL-P and MTL CPU tile wafer shots, I've come around to agreeing with you that the MTL-M compute die is a 2P+8E design. I get the feeling Intel isn't going to reuse CPU tiles at all for Meteor Lake. (I mean, why bother understanding your own strategy, right?) So I think we'll see 2+8 and 6+8 LP tiles, and probably an 8+16 HP tile.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
It would net the most threads for the least area / power. I don't see it happening for Meteor Lake though. Looking closely at the ADL-P and MTL CPU tile wafer shots, I've come around to agreeing with you that the MTL-M compute die is a 2P+8E design. I get the feeling Intel isn't going to reuse CPU tiles at all for Meteor Lake. (I mean, why bother understanding your own strategy, right?) So I think we'll see 2+8 and 6+8 LP tiles, and probably an 8+16 HP tile.
Oh, it's not a possibility for Meteor Lake. But beyond that? Who can say...

As for tile reuse, I think it'll be quite interesting to see what gets reused, and where. Mixing and matching across the MTL lineup should be expected, but what about other product categories? Networking? Graphics?
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
What are the advantages of using compute tiles for desktop parts whose lower core counts would fit on a monolithic chip?

If Raptor desktop is 8+16 and Meteor is 8+16, what's the advantage of the compute tiles? Is it simply production cost?
 

LightningZ71

Platinum Member
Mar 10, 2017
2,508
3,190
136
Ask AMD: its APUs have been sporting core counts equivalent to the desktop X-series processors for years. Breaking the various SoC functions up into chiplets/tiles made on processes tailored to each function can yield smaller individual ICs that have higher yields per wafer and perform better in their totality, since no single process has to compromise for every function.
 

dullard

Elite Member
May 21, 2001
25,994
4,608
126
What are the advantages of using compute tiles for desktop parts whose lower core counts would fit on a monolithic chip?

If Raptor desktop is 8+16 and Meteor is 8+16, what's the advantage of the compute tiles? Is it simply production cost?
1) Higher yields with tiles. Instead of needing to throw away an entire monolithic chip (total loss of that silicon area), you might have several good tiles and one bad tile in the same area (mostly usable silicon area). A quick back-of-envelope yield model is sketched after this list.

2) One design problem doesn't hold up the whole generation. Suppose there is a problem with a new iGPU but the new CPU is performing great. With a monolithic chip, the thing can't ship. With tiles, you can use a previous iGPU combined with your new CPU. This eliminates the need for long delays and the expenses of backporting (like the 26.5 months between Comet Lake and Alder Lake and the necessary costs of creating Rocket Lake).

3) Flexibility. This is related to #2, but you essentially can ship products incrementally when they are ready rather than waiting for all new concepts to be perfected. The CPU team doesn't need to wait for the GPU team to be ready (and vice versa). They can launch what they have and move on to the next project. Being able to have your people work more independently gives your designs, planning, etc far more flexibility. Heck, Intel can even outsource some of the tiles to other companies for even more flexibility.

The drawbacks are of course higher latency, higher power, and/or more costly packaging.
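To put rough numbers on point #1: with a simple Poisson defect model, yield is Y = exp(-D * A). The defect density and die areas below are made-up illustration values, not anything Intel has published:

```c
/* Back-of-envelope yield comparison, Y = exp(-D * A), with assumed numbers:
 * D = 0.1 defects/cm^2 and 2 cm^2 of total silicon, split into 4 tiles.
 * Build: gcc yield.c -lm -o yield */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double d0 = 0.1;     /* defects per cm^2 (assumed) */
    const double area = 2.0;   /* total silicon in cm^2 (assumed) */
    const int tiles = 4;

    double mono = exp(-d0 * area);           /* the whole die must be clean */
    double tile = exp(-d0 * area / tiles);   /* each tile is binned alone */

    printf("monolithic %.1f cm^2 die: %4.1f%% yield\n", area, 100 * mono);
    printf("each %.2f cm^2 tile:     %4.1f%% yield\n", area / tiles, 100 * tile);
    /* ~82% vs ~95%: a defect now scraps 0.5 cm^2 instead of the full 2 cm^2. */
    return 0;
}
```

Real models (defect clustering, known-good-die test costs) are messier, but that's the gist of the yield argument.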
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
1) Higher yields with tiles. Instead of needing to throw away an entire monolithic chip (total loss of that silicon area), you might have several good tiles and one bad tile in the same area (mostly usable silicon area).

2) One design problem doesn't hold up the whole generation. Suppose there is a problem with a new iGPU but the new CPU is performing great. With a monolithic chip, the thing can't ship. With tiles, you can use a previous iGPU combined with your new CPU. This eliminates the need for long delays and the expenses of backporting (like the 26.5 months between Comet Lake and Alder Lake and the necessary costs of creating Rocket Lake).

3) Flexibility. This is related to #2, but you essentially can ship products incrementally when they are ready rather than waiting for all new concepts to be perfected. The CPU team doesn't need to wait for the GPU team to be ready (and vice versa). They can launch what they have and move on to the next project. Being able to have your people work more independently gives your designs, planning, etc far more flexibility. Heck, Intel can even outsource some of the tiles to other companies for even more flexibility.

The drawbacks are, of course, higher latency and/or more costly packaging.
It's also somewhat worse from a power perspective. Supposedly one of the first Meteor Lake proposals from the design team, in a "best we can do" kind of spirit, was monolithic on N3. The current topology is very much a compromise dictated from on high, and very last minute.
 

dullard

Elite Member
May 21, 2001
25,994
4,608
126
It's also somewhat worse from a power perspective. Supposedly one of the first Meteor Lake proposals from the design team, in a "best we can do" kind of spirit, was monolithic on N3. The current topology is very much a compromise dictated from on high, and very last minute.
Noted. I added that to the drawback list above.
 

DrMrLordX

Lifer
Apr 27, 2000
22,901
12,970
136
Supposedly one of the first Meteor Lake proposals from the design team, in a "best we can do" kind of spirit, was monolithic on N3. The current topology is very much a compromise dictated from on high, and very last minute.

. . . uh oh.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
It's also somewhat worse from a power perspective. Supposedly one of the first Meteor Lake proposals from the design team, in a "best we can do" kind of spirit, was monolithic on N3. The current topology is very much a compromise dictated from on high, and very last minute.
Hmm, kind of weird since Intel has been pushing multiple dice on package (EMIB, FOVEROS) for quite a while. What the heck is going on with that company?! Gee, let's NOT use the packaging tech that WE developed :rolleyes:
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
. . . uh oh.
Hmm, kind of weird since Intel has been pushing multiple dice on package (EMIB, FOVEROS) for quite a while. What the heck is going on with that company?! Gee, let's NOT use the packaging tech that WE developed :rolleyes:
The story I've heard is that everyone was just called into the main conference room and shown a presentation of basically "This is Meteor Lake now". Supposedly it elicited a mixture of silence and chuckling.

Between this, the COVID-instigated hiring freezes, the addition of Arrow Lake, and Microsoft poaching half of Intel's best talent in Oregon, I consider myself a Meteor Lake sceptic even independent of any process issues.

And Ajay, right tool for the job. Advanced packaging is a useful tool, but it doesn't beat monolithic in performance or power. And IIRC, at the time, MTL was still more low-power targeted. Though with the process issues and a more flexible lineup, it might not be a bad idea in retrospect. But IDC will probably do things differently for Lunar Lake and whatever else they own.