Discussion Intel current and future Lakes & Rapids thread


Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Do you think that EMR will be a big departure from the SPR design? I don’t think so; they only have a year. It will still be an NxM grid of cores. All my reasoning above will still apply.
A big departure? No, definitely not. But I do think they have room to play with the number of chiplets, in addition to the core arrangement within them.

Like, how much area would a 5x7 array die take up? That would be 33 cores (accounting for memory controllers), and two of those dies would make 66 cores total, or maybe 64 with one spare on each. Would match the rumors, at least.
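Napkin math on that, with the memory-controller slot count just being my guess at how the mesh gets carved up:

Code:
# Hypothetical 5x7 EMR tile -- back-of-envelope only, not a leak
rows, cols = 5, 7
mesh_slots = rows * cols                  # 35 mesh stops per tile
mem_ctrl_slots = 2                        # assumed stops taken by memory controllers
core_sites = mesh_slots - mem_ctrl_slots  # 33 core sites per tile
tiles = 2
total_sites = core_sites * tiles          # 66
spares = 1 * tiles                        # one spare core per tile
print(core_sites, total_sites, total_sites - spares)  # 33 66 64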

I'm just looking at those huge EMIB blocks and thinking that they're really quite a lot of overhead.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
A big departure? No, definitely not. But I do think they have room to play with the number of chiplets, in addition to the core arrangement within them.

Like, how much area would a 5x7 array die take up? That would be 33 cores (accounting for memory controllers), and two of those dies would make 66 cores total, or maybe 64 with one spare on each. Would match the rumors, at least.

I'm just looking at those huge EMIB blocks and thinking that they're really quite a lot of overhead.
On Intel 7, that would be twice the SPR chiplet size, since everything is doubled. SPR XCC chiplet is ~400 mm^2. Your EMR chiplet will be 800 mm^2!! I don’t think that will yield very well. It will also be bumping against the reticle limit. Not going to happen!
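Rough per-site scaling to show where that lands (400 mm^2 over 15 core sites is only an estimate, and it lumps each site's share of mesh/uncore in with the core):

Code:
# First-order area scaling on the same node (Intel 7); all figures approximate
spr_tile_mm2 = 400                            # ~SPR XCC tile area
spr_core_sites = 15                           # core sites per SPR XCC tile (4 tiles, 60 sites)
mm2_per_site = spr_tile_mm2 / spr_core_sites  # ~27 mm^2 incl. share of mesh/uncore
emr_sites = 33                                # the hypothetical 5x7-minus-IMC tile above
print(f"~{emr_sites * mm2_per_site:.0f} mm^2 per tile vs ~850 mm^2 reticle limit")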

The simplest path to EMR is reusing the SPR layout design.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
On Intel 7, that would be twice the SPR chiplet size, since everything is doubled. SPR XCC chiplet is ~400 mm^2. Your EMR chiplet will be 800 mm^2!! I don’t think that will yield very well. It will also be bumping against the reticle limit. Not going to happen!

The simplest path to EMR is reusing the SPR layout design.
It would be pretty much in line with their past die sizes for top end Xeons. And this would be in '23. Intel 7 should be quite mature. It's not like every core needs to work either.
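A toy Poisson yield model shows why core harvesting matters so much here; the defect density and core-area fraction below are placeholders I made up, not real Intel numbers:

Code:
import math

# Toy Poisson yield model -- purely illustrative, D0 and core fraction are invented
d0 = 0.10             # defects per cm^2 (assumed)
area_cm2 = 8.0        # ~800 mm^2 tile
core_fraction = 0.7   # assumed share of area that is harvestable cores/cache

perfect = math.exp(-d0 * area_cm2)                         # dies with zero defects
sellable = math.exp(-d0 * area_cm2 * (1 - core_fraction))  # only defects outside cores kill the die
print(f"fully working: {perfect:.0%}, sellable after fusing off bad cores: {sellable:.0%}")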

Idk, do you have any better idea for something that would match the rumor?
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Idk, do you have any better idea for something that would match the rumor?
I already explained the rumor. Intel being defensive wrt yields, and/or holding cards close to their chest. I bet they will announce 68 or 72 cores when the time comes. The design will use 4 chiplets just like SPR, with each chiplet being around 460 mm^2.
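Using the same rough ~27 mm^2 per core site as the SPR estimate above (a guess, not a confirmed figure), the numbers line up:

Code:
# Rough scaling of an SPR-style 4-tile layout to 68/72 cores (estimates only)
mm2_per_site = 400 / 15          # ~27 mm^2 per core site, derived from SPR XCC
for cores in (68, 72):
    sites_per_tile = cores / 4   # 17 or 18 core sites per tile
    print(f"{cores} cores -> ~{sites_per_tile * mm2_per_site:.0f} mm^2 per tile")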

Granite Rapids and Sierra Forest will both be on Intel 3 and will benefit from two node shrinks. They also seem to be planning to disaggregate cores from uncore. There, two core-only chiplets will make much more sense. In fact, the old design preview that Intel showed for Granite Rapids indeed showed two core-only chiplets. These chiplets will be a much more reasonable size, probably around 500 mm^2.


Falcon Shores will have Xeon and Xe cores in custom ratios. Multiple smaller chiplets of each kind on the package will again make more sense there, so that custom ratios can be created.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Intel 3 is using denser libraries compared to Intel 4.
I.e. it actually includes a dense library, while Intel 4 lacks one entirely. Only Intel could afford not having a dense library in the first place.

If you're actually expecting Intel 3 to compete with N3 in density, you're in for a rough time.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
I.e. it actually includes a dense library, while Intel 4 lacks one entirely. Only Intel could afford not having a dense library in the first place.
What?? Intel 4 is a node shrink compared to Intel 7. There is always a library; what do you mean it lacks one entirely??

If you're actually expecting Intel 3 to compete with N3 in density, you're in for a rough time.
I only compared Intel 4 to Intel 3. I think Intel 20A is meant to compete favorably with TSMC N3. It is the first Intel node with GAA, while N3 will still be FinFET. Intel will also be using high-NA EUV starting with 20A.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
What?? Intel 4 is a node shrink compared to Intel 7.
Yes. That's the shrink. But it sounds like they will only have a limited selection of libraries available. Probably something like HP to start, with the density-focused libraries only arriving with Intel 3. But at the end of the day, Intel 7 to Intel 3 is still only one node shrink for density. Will probably be competing with TSMC's 5nm family.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Yes. That's the shrink. But it sounds like they will only have a limited selection of libraries available. Probably something like HP to start, with the density-focused libraries only arriving with Intel 3. But at the end of the day, Intel 7 to Intel 3 is still only one node shrink for density. Will probably be competing with TSMC's 5nm family.
The HP library on Intel 3 will be denser than the HP library on Intel 4.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
According to what? At best we're looking at an N6 kind of deal.
According to Intel. I’m not sure if they have given any specific details on density improvement, only that Intel 3 will have ~18% performance per watt improvement.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Where did Intel say that there would be density improvements for the same library?
When they first announced the change in process node nomenclature. See the WikiChip article by David Schor.


“Intel 3 will offer a new denser high-performance (HP) standard library that will offer greater area scaling.”
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
When they first announced the change in process node nomenclature. See the WikiChip article by David Schor.


“Intel 3 will offer a new denser high-performance (HP) standard library that will offer greater area scaling.”
Great, thanks for the link. Guess they're going the N6/N4 route with it.
 

Asterox

Golden Member
May 15, 2012
1,042
1,837
136
Are there any plans to patch this at the hardware level in future Lakes?


"The new exploit impacts all Intel processors released in the last several years and specific Arm core processors. Intel processors affected include the most recent 12th Gen Core Alder Lake CPUs. Surprisingly, AMD chips have shown no effect from the vulnerability at this time. "


Hm, maybe, or we will see. :mask:

 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Diamond Rapids

Granite Rapids, not Diamond.

I already explained the rumor. Intel being defensive wrt yields, and/or holding cards close to their chest. I bet they will announce 68 or 72 cores when the time comes. The design will use 4 chiplets just like SPR, with each chiplet being around 460 mm^2.

800mm2 is certainly within reach. Nvidia V100 exceeds that at 815mm2.

Even if Intel 3 offers a significant shrink, moving from 4 to 2 tiles and at least doubling the core count, we might end up with each tile being greater than 500 mm2.
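Rough numbers, assuming a generous ~2x density gain from Intel 7 to Intel 3 for this kind of logic (that factor is my guess, as is reusing the ~27 mm^2/site SPR estimate from earlier):

Code:
# Guesswork: 2 tiles, double SPR's 60 core sites, ~2x density over Intel 7
mm2_per_site_intel7 = 400 / 15   # ~27 mm^2 per site on Intel 7 (SPR estimate)
density_gain = 2.0               # assumed Intel 7 -> Intel 3 shrink for this logic
sites_per_tile = 120 / 2         # double the cores, spread over 2 tiles
print(f"~{sites_per_tile * mm2_per_site_intel7 / density_gain:.0f} mm^2 per tile")  # ~800 mm^2, still well over 500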

Though you have a good point as well. If they go your route, then it might be the rumored 72-core Sapphire Rapids that couldn't be built that's moving to the Emerald Rapids generation. If @Exist50 is right, then that sounds like Intel preparing for the core-only tiles that Granite Rapids will use.

I bet they always wanted to go this route and never planned for the many-smaller-chiplets approach that AMD is pursuing. They wanted EMIB or Foveros badly to get a monolithic-like package before moving to the tile approach.

From the samples out there, it's working fairly well, with the L3 cache latency pretty close to Icelake-SP Xeon despite not being monolithic.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Granite Rapids, not Diamond.
Yup, sorry, typo.

800mm2 is certainly within reach. Nvidia V100 exceeds that at 815mm2.
There is a lot more redundancy in GPU designs, so the risk in getting a functional huge die is much lower.

Even if Intel 3 offers a significant shrink, moving from 4 to 2 tiles and at least doubling the core count, we might end up with each tile being greater than 500 mm2.
I think Intel will go with smaller dies, and use more of them per package, with advanced packaging, to get the desired number of cores for each product. This way the Xeon tiles can also be shared with Falcon Shores to offer varying Xeon:Xe ratios (for example, 1:3, 2:2, 3:1). And don’t forget that soon, with high-NA EUV, the reticle limit is going to be halved. It’s good to keep the max chiplet size around 400 ± 100 mm^2.
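The reticle math behind that: the standard scanner field is 26 mm x 33 mm, and high-NA EUV halves it in one direction:

Code:
# Standard EUV field vs high-NA EUV half field
full_field_mm2 = 26 * 33         # ~858 mm^2 today
high_na_field_mm2 = 26 * 16.5    # ~429 mm^2 with high-NA
print(full_field_mm2, high_na_field_mm2)  # 858 429.0 -> hence ~400 +/- 100 mm^2 chiplets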

From the samples out there, it's working fairly well, with the L3 cache latency pretty close to Icelake-SP Xeon despite not being monolithic.
That's good to know. Thanks!

Architecturally, we can already deduce that, given that each EMIB chiplet just bridges two mesh points from one SPR chiplet to the next. Logically, the EMIB chiplet can be viewed as hosting those two mesh points itself.
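A crude way to see why that costs so little: if the EMIB stops behave like ordinary mesh stops, the stitched mesh is topologically just a wider mesh. Toy hop-count comparison, reusing the hypothetical 5x7 tiles from earlier purely for illustration (real SPR routing and latencies will differ):

Code:
# Average mesh hops: monolithic 5x14 vs two 5x7 tiles stitched by a 2-stop bridge
def avg_hops(rows, cols, seam=None, penalty=0):
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    total = 0
    for r1, c1 in nodes:
        for r2, c2 in nodes:
            d = abs(r1 - r2) + abs(c1 - c2)
            if seam is not None and (c1 < seam) != (c2 < seam):
                d += penalty          # crossing the EMIB bridge = 2 extra mesh hops
            total += d
    return total / len(nodes) ** 2

print(f"monolithic 5x14 mesh: {avg_hops(5, 14):.2f} avg hops")
print(f"stitched 2x(5x7):     {avg_hops(5, 14, seam=7, penalty=2):.2f} avg hops")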

 
Last edited:

repoman27

Senior member
Dec 17, 2018
384
540
136
There must be a 2+1 as well. The Celeron and/or Pentium uses GT1 graphics.
The GT1 (and non-ULT dual-core) products were made the same way they always are, by blowing fuses and salvaging dies.
Anyway, the point was that even as far back as then they had more configurations despite the potentially better yield. They also did that with Atoms, having six or so separate dies.

If they were able to do that back then, why did they regress in that department with Tigerlake/Alderlake having "horrible yield" as some like to think?
Intel may have had 8 designs queued up and ready to go, but they only ever produced 4 of them. That's the exact same number of client dies as they are currently producing for Alder Lake. There has to be sufficient volume to justify taping out, qualifying, and proceeding with a volume ramp of a new layout. Intel had to do 4+3e and 2+3 ULT because they had customers (primarily Apple) that wanted them. 4+2 and 2+2 ULT were the mainstream parts that everyone wanted and therefore clearly justified. Non-ULT 2+3 didn't have enough takers to bother with. The GT1 and non-ULT 2+2 dies would only get green-lit at the point where yields were good enough and demand strong enough that Intel was leaving money on the table by partially disabling a significant percentage of perfectly good dies just to fill customer orders. Intel never got to that point on 22nm, even with the insertion of Haswell Refresh.
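A stripped-down version of that break-even logic; every number here is a placeholder chosen only to show the shape of the decision, not an actual Intel figure:

Code:
# Toy tape-out break-even: salvage a bigger die vs tape out a purpose-built smaller one
nre_cost = 50e6             # assumed cost to lay out, tape out, and qualify a new die
wafer_cost = 7000           # assumed wafer cost
big_die_per_wafer = 200     # salvaged larger die (area wasted on fused-off blocks)
small_die_per_wafer = 260   # purpose-built smaller die

for units in (2e6, 20e6):   # assumed lifetime volume for the SKU
    salvage = units / big_die_per_wafer * wafer_cost
    new_die = nre_cost + units / small_die_per_wafer * wafer_cost
    print(f"{units/1e6:.0f}M units -> salvage ${salvage/1e6:.0f}M vs new die ${new_die/1e6:.0f}M")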

Intel hasn't shared how many designs they might have had in the pipeline for Alder Lake, but from what they have shared regarding the 22nm, 14nm, and 10nm ramps, they would have been delusional to think that wasting time on additional layouts was merited.

The heavy use of multi-patterning and SAQP with 10nm necessitates more manufacturing steps than 14nm, which in turn makes it effectively impossible for them to ever match the defect densities or cycle times of 14nm. We're looking at a process that technically first achieved PRQ in Q4'17, but didn't exceed 14nm in terms of WSPM until 3.5 years later in Q2'21. Intel hasn't attempted to copy 10nm to one of their four major leading-edge manufacturing sites, and continues to maintain significant 14nm capacity at the other three. Intel has also been capacity constrained for the better part of the last 3.5 years, much of the capital investment for 10nm has been fully depreciated at this point, and they've had enough wafer starts to climb the yield curve. Even considering all that, Intel isn't in a hurry to convert the remainder of their 14nm lines to Intel 7, which means it probably doesn't make economic sense for them to do so. Yet Intel 7 offers up to a 2.7x density increase and ~26% better perf/W compared to the latest version of 14nm. This would all seem to point to 10nm yields not being awesome and cycle times being brutal.
 

repoman27

Senior member
Dec 17, 2018
384
540
136
Great, thanks for the link. Guess they're going the N6/N4 route with it.
TSMC left a little headroom with N7 and N5 so they could increase the early yields and do a 6% optical shrink down the road with N6 / N4. Intel has variously stated that Intel 3 will offer a higher performance library and a denser standard HP library (as well as an ~18% increase in perf / W, increased use of EUV, optimized metal stack, and increased intrinsic drive current). AFAIK, they have never suggested there would be any type of optical shrink between Intel 4 and Intel 3. Transistor density (MTr / mm²) should remain the same between the two nodes, just as it did for 10nm, 10SF, and Intel 7.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Even considering all that, Intel isn't in a hurry to convert the remainder of their 14nm lines to Intel 7, which means it probably doesn't make economic sense for them to do so. Yet Intel 7 offers up to a 2.7x density increase and ~26% better perf/W compared to the latest version of 14nm. This would all seem to point to 10nm yields not being awesome and cycle times being brutal.

Server is already on 10nm, and all mobile and desktop parts are on Intel 7. There's no major 14nm product left to port to Intel 7. Good point on the rest.

Yea, it's harder and more expensive for sure, but I don't believe in pointing to yields as the single overriding problem either.

There is a lot more redundancy in GPU designs, so the risk in getting a functional huge die is much lower.

I don't really believe this is the reason. Their execution faltered during that time. Server chips have a lot more redundancy than client chips as well. Nvidia simply executed better, even though making an 815mm2 chip may be borderline insane.

Architecturally, we can already deduce that, given that each EMIB chiplet just bridges two mesh points from one SPR chiplet to the next. Logically, the EMIB chiplet can be viewed as hosting those two mesh points itself.

There's theory and then there's practice. In theory, Lakefield should have achieved records for low-power x86 using Foveros, but in practice it didn't. So actually achieving it in a volume product is something to be celebrated.
 
Last edited: