Discussion Intel current and future Lakes & Rapids thread


Timmah!

Senior member
Jul 24, 2010
883
108
106
As for SPR, IIRC Intel last said the launch would be in 1H. Which of course ends in 9 days. They could still do it but if they aren't even going to ramp for another 6+ months... that feels wrong. I suspect they will make a delay announcement this week but they will just have to pick the right day.

If they're mostly getting only 4-6 cores per tile, I'd be tempted to go ahead with the W launch anyway, given that the volume is low and Icelake-W never got any traction with the big OEMs.
I doubt the issue is low yields on those 15-core tiles; if it is, and they can't consistently produce more than 4-6 functional cores per tile (despite shipping 18-core or larger chips since around 2014), then they are doing it wrong. I think the issues more likely lie in the stacking part of the chip.
If they have already named SKUs for those chips, like the 16-core part leaked yesterday on wccftech, perhaps they are going ahead with that W launch.
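A quick back-of-envelope check on that intuition, using a simple independent-defect model (the per-core yields here are made-up numbers, just to show the scale):

```python
def expected_good_cores(n_cores, core_yield):
    """Expected functional cores per tile, assuming independent per-core defects."""
    return n_cores * core_yield

# Even a poor 80% per-core yield averages ~12 good cores on a 15-core tile;
# consistently binning at 4-6 good cores would imply roughly 30% per-core yield.
print(f"{expected_good_cores(15, 0.80):.1f}")  # -> 12.0
print(f"{expected_good_cores(15, 0.30):.1f}")  # -> 4.5
```

Which is why per-core yield alone is such an implausible explanation, and a packaging/stacking problem fits better.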
 

ashFTW

Senior member
Sep 21, 2020
220
158
76
I have been thinking about @Exist50 's comment regarding how much silicon goes into the bottom tiles in a 4-stack design. I have a modified block diagram using Foveros Omni with a single base tile. Of course, the cache will now be much smaller. The CPU tiles overhang the single base tile, such that HBM versions can still be produced. The non-HBM versions can be a "chop" of the ones shown below (à la the M1 Pro and Max).

I would still like the Granite, Diamond and Sierra CPU tiles to be produced such that they can work with Falcon Shores base dies, by having them all preserve the FDI (Foveros Die Interconnect) interfaces.
[Image: GR_SF_ODI.jpg]
 
Last edited:

Exist50

Senior member
Aug 18, 2016
757
680
136
I have been thinking about @Exist50 's comment regarding how much silicon goes into the bottom tiles in a 4-stack design. I have a modified block diagram using Foveros Omni with a single base tile. Of course, the cache will now be much smaller. The CPU tiles overhang the single base tile, such that HBM versions can still be produced. The non-HBM versions can be a "chop" of the ones shown below (à la the M1 Pro and Max).

I would still like the Granite, Diamond and Sierra CPU tiles to be produced such that they can work with Falcon Shores base dies, by having them all preserve the FDI (Foveros Die Interconnect) interfaces.
Ok, so I think this is a rather interesting design in a couple of ways. One thing that you should keep in mind though is that you generally want to avoid routing high speed IO (PCIe/CXL, UPI/Xe Link, DDR) under the compute tiles. Congested and noisy.
 

ashFTW

Senior member
Sep 21, 2020
220
158
76
Ok, so I think this is a rather interesting design in a couple of ways. One thing that you should keep in mind though is that you generally want to avoid routing high speed IO (PCIe/CXL, UPI/Xe Link, DDR) under the compute tiles. Congested and noisy.
Ponte Vecchio does it, I believe. I’ll check tonight.

Edit: You are right! PVC compute tiles stay clear of the high-speed I/O PHYs, and those areas are covered with thermal tiles on top. There is also area lost to the Foveros Die Interconnect under the shadows of the compute tiles. I wonder what percentage of the area is actually available to the Xe cores?

[Image: PVC_stack.jpg]

The Rialto Bridge seems to follow the same design with respect to the PHYs. The only difference I see is that the RAMBO tiles (15 MB each) are gone, and there are now 4 bigger compute tiles, compared to 8 smaller ones before. There should still be 144 MB of L3 on the base die, assuming it's the same tile as PVC.

[Image: RB.jpg]

I think this will have to be addressed by future Foveros technology.
 
Last edited:

ashFTW

Senior member
Sep 21, 2020
220
158
76
Ok, so I think this is a rather interesting design in a couple of ways. One thing that you should keep in mind though is that you generally want to avoid routing high speed IO (PCIe/CXL, UPI/Xe Link, DDR) under the compute tiles. Congested and noisy.
Ok, I think I fixed it :), based on your input. Thanks!

The compute tiles are no longer on top of the I/O. Thermal tiles would cover the HBM PHYs and the I/O area in the base tile's center.

There are two levers for the number of cores: 1) the CPU tiles have one degree of freedom using Foveros Omni, so they can be variable-sized; 2) the number of CPU tiles can be varied from 1 to 4. It should be possible to support at least 32 to 128 total P-cores, and 3-4 times as many E-cores.

Again, to minimize rework, the CPU tiles should be built in such a way as to preserve the FDI interfaces for compatibility with Falcon Shores.

[Image: GR_SF_ODI2.jpg]
FDI = Foveros Die Interconnect
 
Last edited:

ashFTW

Senior member
Sep 21, 2020
220
158
76
And here is a double-sized, 2-stack version. It could support as many as 256 P-cores, and 3-4 times as many E-cores. The size and number of the top tiles can be adjusted to support smaller configurations and/or to limit the TDP.
[Image: GR_SF_OD12_2stack.jpg]
 
Last edited:

trivik12

Member
Jan 26, 2006
89
36
91
While it's possible, wccftech is trash. They just recirculate whatever Raichu or Greymon55 tweets. I would not link their articles.
 

nicalandia

Golden Member
Jan 10, 2019
1,394
1,615
106
6ghz for the i9 😏
They have no choice, as that is what's required to be competitive with Zen 4. If you thought Alder Lake was pushed beyond sense, just wait for Raptor Lake.
 

Exist50

Senior member
Aug 18, 2016
757
680
136
Ok, I think I fixed it :), based on your input. Thanks!

The compute tiles are no longer on top of the I/O. Thermal tiles would cover the HBM PHYs and the I/O area in the base tile's center.

There are two levers for the number of cores: 1) the CPU tiles have one degree of freedom using Foveros Omni, so they can be variable-sized; 2) the number of CPU tiles can be varied from 1 to 4. It should be possible to support at least 32 to 128 total P-cores, and 3-4 times as many E-cores.

Again, to minimize rework, the CPU tiles should be built in such a way as to preserve the FDI interfaces for compatibility with Falcon Shores.

FDI = Foveros Die Interconnect
Pardon, I was somewhat ambiguous. The problem I had in mind was package-level trace routing, i.e. you want an unobstructed path from the PHY to the package pins. So the HBM in your schematic is good, but everything else will be competing with compute tile power delivery.
 

nicalandia

Golden Member
Jan 10, 2019
1,394
1,615
106
Just when we all thought that chiplets were the future, these guys go and analyze this:

[Image: cost comparison from the SemiAnalysis article]

The decision of chiplet vs monolithic becomes a lot more difficult now. Once you account for packaging costs, it is very likely the monolithic die is cheaper to fabricate. Furthermore, there are some power costs with the chiplet design. In this case, it is absolutely better to build a large monolithic die instead of going chiplet/MCM.

 

Exist50

Senior member
Aug 18, 2016
757
680
136
Just when we all thought that Chiplets were the future these guys go and analyze this..


The decision of chiplet vs monolithic becomes a lot more difficult now. Once you account for packaging costs, it is very likely the monolithic die is cheaper to fabricate. Furthermore, there are some power costs with the chiplet design. In this case, it is absolutely better to build a large monolithic die instead of going chiplet/MCM.

Seems rather contrived. Obviously all these companies know the benefits and tradeoffs.
 

ashFTW

Senior member
Sep 21, 2020
220
158
76
Pardon, I was clearly somewhat ambiguous. The problem I had in mind was package-level trace routing. I.e. you want an unobstructed path from the PHY to the package pins. So the HBM in your schematic is good, but everything else will be competing with compute tile power delivery.
DDR5, PCIe, etc. connect directly down to the package unobstructed; the compute tiles only overlap the FDI region of the base tile, and from there the signals are routed to the package pins.

Can't the I/O and memory pins be in the center of the socket? Motherboards have a lot of routing layers, right? If those pins are indeed required (or desired) to be at the periphery, is the power delivery (Cu pillars) from the package to the overhanging part of the compute tile so dense that traces to the package pins can't be easily and reliably made? The Cu pillars should be sparse, since they connect to the power delivery network on the top metal layer of the top die.

Edit: The only reason the HBM PHY is placed at the base tile's edges is the on-package connectivity to HBM3's 1024-pin interface via EMIB. I/O and memory don't have such requirements; they can happily be passed straight down through the substrate to the package pins and routed from there to the rest of the motherboard.

Edit 2: The package substrate can also have several layers for routing I/O, managing any congestion.
 
Last edited:

Hulk

Diamond Member
Oct 9, 1999
3,375
874
136
Just when we all thought that Chiplets were the future these guys go and analyze this..


The decision of chiplet vs monolithic becomes a lot more difficult now. Once you account for packaging costs, it is very likely the monolithic die is cheaper to fabricate. Furthermore, there are some power costs with the chiplet design. In this case, it is absolutely better to build a large monolithic die instead of going chiplet/MCM.

This comparison seems to assume equal yields for both designs, which I think is a big advantage for chiplets?
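To make the yield point concrete, here is a minimal sketch using a Poisson defect model; the defect density and die areas are illustrative assumptions, not the article's numbers:

```python
import math

D0 = 0.1  # assumed defect density, defects per cm^2

def die_yield(area_mm2):
    """Fraction of defect-free dies at a given die area (Poisson model)."""
    return math.exp(-D0 * area_mm2 / 100.0)

mono_area = 800.0     # monolithic die, mm^2
chiplet_area = 430.0  # each of two chiplets, incl. ~8% interconnect overhead

# Silicon area consumed per *good* product (ignores packaging cost and yield).
mono_cost = mono_area / die_yield(mono_area)
chiplet_cost = 2 * chiplet_area / die_yield(chiplet_area)

print(f"monolithic: {mono_cost:.0f} mm^2 of wafer per good product")
print(f"2x chiplet: {chiplet_cost:.0f} mm^2 of wafer per good product")
```

Under these assumptions the two-chiplet version consumes roughly a quarter less silicon per good product; that is the yield advantage the article's packaging and reticle costs would have to overcome.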
 

nicalandia

Golden Member
Jan 10, 2019
1,394
1,615
106
This comparison seems to assume equal yields for both designs, which I think is a big advantage for chiplets?
Clearly an editorial mistake, but this is what they are getting at.

This is what they published: "The cost increase on underutilizing the slit on the reticle means that the foundry would not sell these wafers for $17,000 to sustain 50.2% gross margins. Instead, they would be selling these wafers for $21,364. The defect free silicon cost for the monolithic product still sits at $567. The defect free silicon cost per die isn’t $215, but instead $270. Per product, it is no longer $430, but instead $541. "

This is what they were trying to say: "The cost increase on underutilizing the slit on the reticle means that the foundry would not sell these wafers for $17,000 to sustain 50.2% gross margins. Instead, they would be selling these wafers for $21,364. The defect-free silicon cost for the monolithic CPU still sits at $567. The defect-free silicon cost per die (chiplet MCM) isn't $215 anymore, but instead $270, and per final product, it is no longer $430, but instead $541."


This is a very simplistic take, since they are comparing a monolithic die with a simple MCM (which has been with us for at least two decades, and whose issues are very well known) instead of a well-thought-out, well-executed chiplet/3D-chiplet design where the I/O, SoC and compute tiles can be used very modularly.


So for SemiAnalysis, a 16-core monolithic CPU would cost less than an 8C + 8C MCM if all else is equal (I/O, logic). But things are getting more complex, and I find that a very simplistic view knowing how complex upcoming CPUs are going to get.
 
Last edited:

JasonLD

Senior member
Aug 22, 2017
451
412
136
When the reticle limit goes down to 429 mm² with High-NA EUV, they will have no choice but to go with chiplets on the high end.
 

Saylick

Golden Member
Sep 10, 2012
1,655
2,091
136
Just when we all thought that Chiplets were the future these guys go and analyze this..


The decision of chiplet vs monolithic becomes a lot more difficult now. Once you account for packaging costs, it is very likely the monolithic die is cheaper to fabricate. Furthermore, there are some power costs with the chiplet design. In this case, it is absolutely better to build a large monolithic die instead of going chiplet/MCM.

I'll have to dig into the article when I get home, but it's clear that the "Litho Scanner" cost is the line item that drives their message home (not saying their message is correct).

Upon a quick read of the article, I think I found the caveat:
The standard photomask is 104mm by 132mm. The lithography tool then exposes through the photomask to print features on the wafer at 4x magnification. That field is 26mm by 33mm. Most designs do not line up perfectly with 26mm by 33mm.

In comes the concept of reticle utilization rates.

Generally, chip designs are smaller, so the photomask can contain multiple identical designs as with the picture above. Even then, most designs will not fit perfectly onto that 26mm by 33mm field, so generally a portion of that photomask is also not exposed.

If a die was 12mm by 16mm, we could fit 4 dies per reticle. The reticle utilization rate is quite high here, as only a tiny sliver of the reticle is not exposed. With a monolithic die which is 25mm by 32mm, we do not utilize 1mm in the slit and scan directions. That reticle utilization rate is likewise quite high. In the case of our chiplets, which are 13.5mm by 32mm, the die is too large to fit 2 side by side on the reticle, so there can only be 1 die per reticle. Some visualizations of the examples described above are shown in the graphic below.
[Image: reticle utilization examples]
Long story short, I think their selected fictitious die size of 800mm2 just happens to be the perfect example where splitting it into two 430mm2 dies (they allowed 8% extra die area for interconnects) no longer fits within a single reticle pass, while the 800mm2 monolithic die does. This results in slower processing on the litho scanner.

If they used an example where you took a 600mm2 monolithic die and split it into two 320mm2 dies, both chiplets would fit within a single reticle pass, thereby not incurring any extra cost. If this understanding is correct (please correct me if I am wrong), then the article is BS.

Edit: BS was perhaps too strong of a word. I think the conclusion applies just to the case where the monolithic die is right at the reticle limit. With much smaller chiplets, this shouldn't be a problem, and you should reap the benefits of the improved yield. Furthermore, one of the benefits of chiplets beyond cost is that you can mix and match the best chiplets into the highest-performing products. With a mono design, even with harvesting, the die is only as fast as the slowest critical block. With chiplets, even if the cost to manufacture were identical to a mono design, you could likely get a higher-performing part and thus sell it for more revenue.
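The fitting argument above is easy to check numerically. A quick sketch using the article's 26 mm x 33 mm exposure field; the die dimensions are illustrative:

```python
FIELD_W, FIELD_H = 26.0, 33.0  # single exposure field (slit x scan), mm

def dies_per_field(die_w, die_h):
    """Whole dies per exposure, packed on a grid with no rotation."""
    return int(FIELD_W // die_w) * int(FIELD_H // die_h)

# 800 mm^2 monolithic die (25 x 32 mm): one exposure per die.
print(dies_per_field(25.0, 32.0))   # -> 1

# ~430 mm^2 chiplet (13.5 x 32 mm): too wide to pair up across the slit,
# so each exposure still yields only one die -- two exposures per product.
print(dies_per_field(13.5, 32.0))   # -> 1

# ~320 mm^2 chiplet (12.8 x 25 mm): two fit per exposure, so splitting
# a 600 mm^2 die costs no extra scanner time.
print(dies_per_field(12.8, 25.0))   # -> 2
```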
 
Last edited:

nicalandia

Golden Member
Jan 10, 2019
1,394
1,615
106
If they used an example where you took a 600mm2 monolithic die and split it into two 320mm2 dies, both chiplets would fit within a single reticle pass, thereby not incurring any extra cost. If this understanding is correct (please correct me if I am wrong), then the article is BS.

Edit: BS was perhaps too strong of a word. I think the conclusion applies to just the case where the monolithic die is right at the reticle limit. At much smaller chiplets, this shouldn't be a problem and you should reap the benefits of the improved yield.
You can also add that ASML has yet to build a production 450mm machine, and neither Intel nor TSMC will be fielding one before the end of the decade... so their 25 mm x 32 mm, 800 mm^2 monolithic die is just a pipe dream.


"While we haven’t moved to 450mm wafers yet (and there are doubts we will any time in the next decade)"
 
Last edited:

Doug S

Golden Member
Feb 8, 2020
1,198
1,734
106
I'll have to dig into the article when I get home, but it's clear that the "Litho Scanner" cost is the line item that drives their message home (not saying their message is correct).

Upon a quick read of the article, I think I found the caveat:

Long story short, I think their selected fictitious die size of 800mm2 just happens to be the perfect example where splitting it into two 430mm2 dies (they gave 8% extra die for interconnects) no longer fits within a single reticle pass while the 800mm2 monolithic die does. This results in slower processing of the litho scanner.

If they used an example where you took a 600mm2 monolithic die and split it into two 320mm2 dies, both chiplets would fit within a single reticle pass, thereby not incurring any extra cost. If this understanding is correct (please correct me if I am wrong), then the article is BS.

Edit: BS was perhaps too strong of a word. I think the conclusion applies just to the case where the monolithic die is right at the reticle limit. With much smaller chiplets, this shouldn't be a problem, and you should reap the benefits of the improved yield. Furthermore, one of the benefits of chiplets beyond cost is that you can mix and match the best chiplets into the highest-performing products. With a mono design, even with harvesting, the die is only as fast as the slowest critical block. With chiplets, even if the cost to manufacture were identical to a mono design, you could likely get a higher-performing part and thus sell it for more revenue.

Even with smaller chiplets it is still possible to 'underutilize the slit', depending on their width. One would assume, however, that designers know about this and will choose a floorplan that minimizes the issue by coming as close as possible to a width that (roughly) evenly divides the slit width.

When the reticle is halved with high-NA, this can potentially be more of a problem, since there are fewer options for evenly dividing the field unless you use 'taller' (more rectangular) chiplets - though I'm not aware of any reason why that would be a problem.
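A small sketch of that field-division arithmetic, assuming high-NA halves the scan direction of the field from 33 mm to 16.5 mm (the ~429 mm^2 half field mentioned earlier); the die heights are illustrative:

```python
FULL_SCAN, HALF_SCAN = 33.0, 16.5  # scan-direction field length, mm

def utilization(die_len, field_len):
    """Fraction of the field covered by as many whole dies as fit."""
    n = int(field_len // die_len)
    return n * die_len / field_len

# An 11 mm die divides the full field perfectly (3 per exposure)
# but wastes a third of the high-NA half field (only 1 fits):
print(utilization(11.0, FULL_SCAN))   # -> 1.0
print(utilization(11.0, HALF_SCAN))   # -> ~0.667

# Resizing the die to 8.25 mm divides the half field evenly again (2 fit):
print(utilization(8.25, HALF_SCAN))   # -> 1.0
```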
 

Doug S

Golden Member
Feb 8, 2020
1,198
1,734
106
You can also add that ASML has yet to build a production 450mm machine, and neither Intel nor TSMC will be fielding one before the end of the decade... so their 25 mm x 32 mm, 800 mm^2 monolithic die is just a pipe dream.


"While we haven’t moved to 450mm wafers yet (and there are doubts we will any time in the next decade)"

Huh? Wafer size has nothing to do with reticle size. It is possible to make dies up to 858 mm^2 on current EUV processes using 300mm wafers, but I'm not sure about the largest N7+ or N5 dies that have actually shipped. Given that everyone knows high NA is on the horizon, there isn't much of a future in designing something that big.

450mm has been dead for a decade and will never happen. The EUV scanners are just a tiny piece of it; all the wafer handling, cleaning, testing, dicing, etc. machines from dozens of suppliers would need to handle the larger wafers. There aren't enough customers to pay for all those "smaller" companies - which can't charge $100+ million per unit like ASML can - to do the required R&D.
 

nicalandia

Golden Member
Jan 10, 2019
1,394
1,615
106
Huh? Wafer size has nothing to do with reticle size. It is possible to make dies up to 858 mm^2 on current EUV processes using 300mm wafers, but I'm not sure about the largest N7+ or N5 dies that have actually shipped. Given that everyone knows high NA is on the horizon, there isn't much of a future in designing something that big.

450mm has been dead for a decade and will never happen. The EUV scanners are just a tiny piece of it; all the wafer handling, cleaning, testing, dicing, etc. machines from dozens of suppliers would need to handle the larger wafers. There aren't enough customers to pay for all those "smaller" companies - which can't charge $100+ million per unit like ASML can - to do the required R&D.
I was referencing it (the size) for the expected price. I mean, just 30 CPUs on an $18,000 300mm wafer? That's $600 per CPU. They would need larger wafers to justify that price.
 

leoneazzurro

Senior member
Jul 26, 2016
622
893
136
There are also other advantages to having smaller dies, namely higher utilization of the wafer area: with a chiplet half the size of a monolithic die, you can fit more than double the number of chiplets on the same wafer.
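This follows from edge loss: a round wafer wastes proportionally less area on partial dies when the dies are small. A sketch using the common dies-per-wafer approximation (the die areas are illustrative):

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Gross dies per wafer via the standard approximation:
    wafer area / die area, minus an edge-loss term."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

big = dies_per_wafer(800.0)    # monolithic die
small = dies_per_wafer(400.0)  # half-size chiplet

print(big, small)   # -> 64 143, i.e. more than double the candidates
```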
 

eek2121

Golden Member
Aug 2, 2005
1,850
2,093
136
IMO the article is garbage anyway, because they are attempting to frame this as MCM vs. monolithic, but it isn't; it is purely about die size and die characteristics. Do you think that if I take my custom design to TSMC they are going to ask me whether it is MCM or not? No, they are going to give me a quote and call it a day.

Intel and AMD both know the cost + benefits of both approaches. We see the direction they are going in.
 
