Discussion Intel Meteor Lake & Arrow Lake Discussion Threads

May 1, 2020
As always, a very nicely written article with a lot of thought, though nothing groundbreaking. But it puts an emphasis on the battleground of the future: the die-to-die interconnect.

Some interesting points and additional thoughts:
  • AMD and Intel have different priorities.
  • AMD wants to be independent of the Reticle Limit in order to scale almost indefinitely. They pay for this with energy.
  • Intel wants to minimize the energy overhead for mobile usage. They are constrained by the Reticle Limit and pay for it with higher production costs.
  • Which of them will be the first to achieve the best of both worlds? For AMD this could be EFB coupled with some crazy chiplet geometries.
  • Intel seems to be introducing something along those lines with Granite Rapids/Sierra Forest. They are also introducing a central IOD. Will they use some kind of EMIB for the interconnect?
  • IMHO AMD will have to catch up a bit. Nothing impossible. But Intel have made inroads due to their experience with SPR and MTL.
 

bsp2020

Member
Dec 29, 2015
  • IMHO AMD will have to catch up a bit. Nothing impossible. But Intel have made inroads due to their experience with SPR and MTL.
Not sure how you figured this.

SPR is not shipping till next year and will only match Milan, if that. All indications are pointing to Genoa shipping before SPR. In what way do you think SPR is ahead of Genoa? Both are 2023 products.

MTL is due out next year. AMD Phoenix is coming out 1H 23, before MTL. In what way do you think Phoenix will be behind MTL? Not much is known about Phoenix yet. But I'm curious what you think it will be lacking.
 
May 1, 2020
@bsp2020
I am not talking about the end products in general. SPR will have a hard time against Genoa and no one knows how MTL and Phoenix will compare.
In terms of advanced packaging Intel already had KBL-G, Lakefield and now SPR and MTL. 2 years ago I had hoped that Zen4 would switch to something advanced as well - but that did not happen. Of course, maybe AMD will surprise me with Phoenix going chiplet and not using IFOP. Other than that we will have to wait till at least Zen5. Maybe they will at some point decide to change their strategy and go for something like MTL for Client/Mobile and keep IFOP for Server/HPC.
 

bsp2020

Member
Dec 29, 2015
@bsp2020
I am not talking about the end products in general. SPR will have a hard time against Genoa and no one knows how MTL and Phoenix will compare.
In terms of advanced packaging Intel already had KBL-G, Lakefield and now SPR and MTL. 2 years ago I had hoped that Zen4 would switch to something advanced as well - but that did not happen. Of course, maybe AMD will surprise me with Phoenix going chiplet and not using IFOP. Other than that we will have to wait till at least Zen5. Maybe they will at some point decide to change their strategy and go for something like MTL for Client/Mobile and keep IFOP for Server/HPC.
AMD knows how to use an interposer to build chiplets/tiles/whatever. It just does not make economic sense to use one, because they can build chips that perform just as well more cheaply without an interposer. SPR using EMIB does not perform as well as Genoa, which is built without a silicon interposer/EMIB. What matters is the performance/cost of the products. Otherwise, AMD's Fiji GPU would have been heralded as a better product than Nvidia's Maxwell.

AMD & TSMC know how to build interposers/EFP (a better solution than Intel's EMIB). They are not using them on mainstream desktop products because they do not need to.

Not sure why you want AMD to use advanced packaging for the sake of using advanced packaging...
 

poke01

Member
Mar 8, 2022
AMD & TSMC know how to build interposers/EFP (a better solution than Intel's EMIB). They are not using them on mainstream desktop products because they do not need to.

Not sure why you want AMD to use advanced packaging for the sake of using advanced packaging...
TSMC's interposer is already shipping with the M1 Ultra in the Mac Studio. I can't see Intel's EMIB out in the market yet.

Let's face the facts: TSMC is clearly the foundry leader.
 

coercitiv

Diamond Member
Jan 24, 2014
In terms of advanced packaging Intel already had KBL-G, Lakefield and now SPR and MTL.
Intel always looked good on paper with their advanced packaging. Their problem was not being able (or willing) to use it in a commercially successful product. So before we conclude that AMD needs to catch up on something, let's have Intel show their hand.

Of course, maybe AMD will surprise me with Phoenix going chiplet and not using IFOP.
AMD better keep Phoenix monolithic. You were probably thinking of Dragon Range, which does not need the same type of power savings.
 

moinmoin

Diamond Member
Jun 1, 2017
Of course, maybe AMD will surprise me with Phoenix going chiplet
AMD better keep Phoenix monolithic.
Somewhere the announced "AMD chiplet architecture" has to fit in.

"“Phoenix Point” innovations include the AIE inference accelerator, image signal processor, advanced display for refresh and response, AMD chiplet architecture, and extreme power management."

 
May 1, 2020
AMD & TSMC know how to build interposers/EFP (a better solution than Intel's EMIB). They are not using them on mainstream desktop products because they do not need to.
You mean EFB (Elevated Fanout Bridge, which I already mentioned).
And yes, TSMC has the IP for it. And surely AMD will have been working on these things for years. Maybe I am just a little disappointed that on the CPU side of things there is still little known as to when, how and in which product they might introduce such a solution.

Not sure why you want AMD to use advanced packaging for the sake of using advanced packaging...
To do what Intel does with MTL: disaggregate the functional blocks, choosing the best/cheapest process and library for each, and gaining the flexibility to mix & match different functional blocks (small iGPU/big iGPU, small IO/big IO, etc.). I hope it is commonly accepted that, with the current and expected growth of costs per mm²/transistor, this is a must to stay cost competitive.
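To make the mix & match point concrete, here is a small Python sketch. The tile names and the two-variants-per-tile setup are hypothetical and purely illustrative, not Intel's actual MTL tile lineup. It counts how many package configurations a handful of tile designs can cover.

```python
from itertools import product

# Hypothetical tile variants, for illustration only (not an actual MTL lineup).
TILE_VARIANTS = {
    "cpu": ["cpu_small", "cpu_big"],
    "gpu": ["igpu_small", "igpu_big"],
    "io": ["io_small", "io_big"],
}

def tile_designs(variants):
    """Number of distinct tile designs that have to be taped out."""
    return sum(len(options) for options in variants.values())

def package_configs(variants):
    """Every package buildable by picking one variant per tile type."""
    kinds = list(variants)
    return [dict(zip(kinds, combo)) for combo in product(*(variants[k] for k in kinds))]

configs = package_configs(TILE_VARIANTS)
print(f"{tile_designs(TILE_VARIANTS)} tile designs -> {len(configs)} package configurations")
for cfg in configs[:3]:
    print(cfg)
```

With two variants for each of three tile types, six tile designs cover eight package configurations, whereas a monolithic approach needs a separate die for every configuration that actually ships. Whether each combination is worth selling is a product decision the sketch ignores.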

Having an interconnect via the package is fine for a desktop CPU of 65 W or more with only one connection between IOD and CCD. But from an energy PoV this will not work in mobile (read: not desktop replacements), nor with an ever-increasing number of dies to connect.
IFoP consumes at least 1 pJ/bit. With a bandwidth of 160 GB/s (Zen4 has two ports on each CCD with 80 GB/s each), that comes to up to 1.28 W (160 GB/s × 8 bits/byte × 1 pJ/bit) just for transferring data between these two dies. Intel's interposer needs only 0.2-0.3 pJ/bit. AFAIR EFB is in the same ballpark, maybe even lower.
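For anyone who wants to redo that arithmetic, here is a minimal Python sketch of the energy-per-bit calculation. It assumes the link is saturated at the full 160 GB/s the whole time, so the results are upper bounds. The pJ/bit figures are the ones quoted in the post.

```python
def link_power_watts(bandwidth_gb_per_s, energy_pj_per_bit):
    """Power spent moving data over a die-to-die link.

    Bandwidth is in gigabytes/s (converted to bits/s), energy in picojoules/bit.
    """
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * 1e-12

# Figures quoted in the post; assumes the link runs at full bandwidth.
LINKS = {"IFoP": 1.0, "interposer (low)": 0.2, "interposer (high)": 0.3}
for name, pj_per_bit in LINKS.items():
    print(f"{name:18s} {link_power_watts(160, pj_per_bit):.2f} W")
```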

AMD better keep Phoenix monolithic. You were probably thinking of Dragon Range, which does not need the same type of power savings.
No, as @moinmoin already mentioned, there was a slide at AMD's Financial Analyst Day (FAD). But I would be rather surprised if it is more than just connecting some AI processor (or similar) to the main SoC.
 

Doug S

Golden Member
Feb 8, 2020
To do what Intel does with MTL: disaggregate the functional blocks, choosing the best/cheapest process and library for each, and gaining the flexibility to mix & match different functional blocks (small iGPU/big iGPU, small IO/big IO, etc.). I hope it is commonly accepted that, with the current and expected growth of costs per mm²/transistor, this is a must to stay cost competitive.
I wonder whether that makes sense in the volume market. In lower-volume, higher-end markets, sure. But I can't see "small GPU, small I/O, small CPU" as a group being worth using specialized packaging when you can put them all together on the same die (which may be on an N+1 process except for low-power variants) for less money. And that's the combination the majority of PC shipments will use (with maybe a couple of different dies to go really small for entry level and embedded, for stuff like POS systems).
 

Exist50

Golden Member
Aug 18, 2016
I wonder whether that makes sense in the volume market. In lower-volume, higher-end markets, sure. But I can't see "small GPU, small I/O, small CPU" as a group being worth using specialized packaging when you can put them all together on the same die (which may be on an N+1 process except for low-power variants) for less money. And that's the combination the majority of PC shipments will use (with maybe a couple of different dies to go really small for entry level and embedded, for stuff like POS systems).
I think the crossover there depends on volume. Is it worth taping out a new die and porting everything to the same process just for the lower-end market? Surely it'll be cheaper in terms of variable manufacturing cost, but the NRE might not be so favorable.
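Here is a rough Python sketch of that crossover. All dollar figures are hypothetical placeholders, not real NRE or unit costs. The point is only that the answer flips with volume: a dedicated monolithic die carries extra NRE but a lower unit cost, while reusing existing tiles is assumed to add no new NRE.

```python
import math

def total_costs(volume, nre_new_die, unit_cost_new_die, unit_cost_reuse):
    """Total cost of (a) taping out a dedicated die vs (b) reusing existing tiles.

    Option (b) is assumed to carry no extra NRE because the tiles already exist.
    """
    new_die = nre_new_die + volume * unit_cost_new_die
    reuse = volume * unit_cost_reuse
    return new_die, reuse

def cheaper_option(volume, nre_new_die, unit_cost_new_die, unit_cost_reuse):
    new_die, reuse = total_costs(volume, nre_new_die, unit_cost_new_die, unit_cost_reuse)
    return "new monolithic die" if new_die < reuse else "reuse existing tiles"

def breakeven_volume(nre_new_die, unit_cost_new_die, unit_cost_reuse):
    """Volume at which the per-unit saving of the new die repays its NRE."""
    saving = unit_cost_reuse - unit_cost_new_die
    return math.inf if saving <= 0 else nre_new_die / saving

# Hypothetical numbers: $50M extra NRE, $20 vs $25 per unit.
NRE, NEW, REUSE = 50e6, 20.0, 25.0
print(f"break-even at {breakeven_volume(NRE, NEW, REUSE):,.0f} units")
for volume in (5_000_000, 20_000_000):
    print(f"{volume:>12,} units -> {cheaper_option(volume, NRE, NEW, REUSE)}")
```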
 

bsp2020

Member
Dec 29, 2015
And yes, TSMC has the IP for it. And surely AMD will have been working on these things for years. Maybe I am just a little disappointed that on the CPU side of things there is still little known as to when, how and in which product they might introduce such a solution.

To do what Intel does with MTL: disaggregate the functional blocks, choosing the best/cheapest process and library for each, and gaining the flexibility to mix & match different functional blocks (small iGPU/big iGPU, small IO/big IO, etc.). I hope it is commonly accepted that, with the current and expected growth of costs per mm²/transistor, this is a must to stay cost competitive.
Engineering is always a balancing act. Using technology for the sake of using it makes no sense. A large silicon interposer costs money, which is one of the reasons MTL is rumored to be high-end only (can't seem to find the link at the moment). You won't necessarily reduce costs by breaking chips up into smaller pieces. It seems to me that Intel all of a sudden decided to break up their chip designs into smaller pieces without thinking through all the consequences. Ponte Vecchio obviously was broken up into too many pieces and has suffered major delays because of it (my speculation). SPR seems to have been broken up into pieces without properly designing separate modules and has also suffered major delays. Both of them relied heavily on EMIB and have suffered major delays. Then, Intel will use a full silicon interposer for their next mainstream products (MTL) even though they have heralded EMIB as a superior solution to silicon interposers for years.

Sure, it costs less, in terms of energy, to use a silicon bridge or interposer. But there are other costs that increase when you break chips up into smaller functional units. After all, you don't expect Intel/AMD to build a single-core chiplet/tile and put them on a silicon interposer, do you? Building a small single-core chiplet on an advanced node would surely increase its yield and reduce cost, right? Then why is Intel not doing it?

Also, there are more things to consider than joules per bit transferred, even when you are looking at power cost only. You also need to look at how much data you are actually transferring. Using EMIB/a silicon interposer will increase monetary costs and reduce power costs, but it does nothing for performance directly. Adding a 3D cache, on the other hand, reduces power (by reducing the data traffic) and boosts performance. I believe (again my opinion) that the reason AMD is focusing heavily on caching technology for both CPU (3D cache) & GPU (Infinity Cache) is that they understand they must reduce off-die traffic in order to build high-performance, power-efficient products going forward. Putting everything on a silicon interposer may work for high-end mobile or desktop products. But servers will need something better, and so will mobile.

I have no idea how much 3D cache or EMIB/silicon interposers cost. All I'm saying is that using advanced packaging is not necessarily better, and you want to look at the system as a whole and decide how to partition/break up functional units to optimize cost and performance. Intel seems to be having a knee-jerk reaction in their chiplet/tile design. First, they want to break things up into really small pieces and connect them using EMIB (Ponte Vecchio). Then they try fewer pieces without properly separating functional modules (4 tiles for SPR) using EMIB. And now they want to just put everything on a silicon interposer and hope that works better.

I like AMD's way of gradually increasing the number of chiplets as they build next-generation products, going from 4 dies in Zen1 EPYC to 9 chiplets in Zen2/Zen3 to 17 chiplets in Zen3D, etc.

Having an interconnect via the package is fine for a desktop CPU of 65 W or more with only one connection between IOD and CCD. But from an energy PoV this will not work in mobile (read: not desktop replacements), nor with an ever-increasing number of dies to connect.
Such simple blanket statements won't work. We do not know how much traffic there is between the IOD and CCD. Rather than using EMIB or a silicon interposer, using 3D cache may allow AMD to improve performance and reduce power usage enough to create a higher-tier product and sell it at a higher margin. For really low power, both AMD & Intel will have to create a monolithic die. After all, you don't expect Apple to start using a chiplet approach for their phone SoC any time soon, do you?

IFoP consumes at least 1 pJ/bit. With a bandwidth of 160 GB/s (Zen4 has two ports on each CCD with 80 GB/s each), that comes to up to 1.28 W (160 GB/s × 8 bits/byte × 1 pJ/bit) just for transferring data between these two dies. Intel's interposer needs only 0.2-0.3 pJ/bit. AFAIR EFB is in the same ballpark, maybe even lower.
Again, you have to look at how much data traffic there really is. You can spend money and silicon to reduce the power consumption of each transfer, or you can spend money and silicon to reduce traffic using a bigger cache. The latter approach has the additional benefit of improving performance. In the market MTL and Dragon Range are targeting (high-performance mobile), the jury is still out as to which approach is better.
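A toy Python comparison of the two levers described above: making each bit cheaper versus sending fewer bits. The demand, hit-rate and pJ/bit numbers are hypothetical. The model also ignores the power the extra cache itself consumes, so it only shows that the outcome depends on those numbers, not which approach wins in practice.

```python
def link_power_watts(traffic_gb_per_s, energy_pj_per_bit):
    """Link power in watts: GB/s -> bits/s, times pJ/bit -> J/bit."""
    return traffic_gb_per_s * 1e9 * 8 * energy_pj_per_bit * 1e-12

def off_die_traffic(demand_gb_per_s, extra_hit_rate):
    """Traffic still crossing the link after a larger cache serves a fraction of it."""
    return demand_gb_per_s * (1.0 - extra_hit_rate)

# Hypothetical scenario: 100 GB/s of off-die demand over a 1 pJ/bit package link.
DEMAND, PACKAGE_PJ, INTERPOSER_PJ, EXTRA_HIT_RATE = 100.0, 1.0, 0.25, 0.5

scenarios = {
    "baseline (package link)": link_power_watts(DEMAND, PACKAGE_PJ),
    "cheaper bits (interposer)": link_power_watts(DEMAND, INTERPOSER_PJ),
    "fewer bits (bigger cache)": link_power_watts(
        off_die_traffic(DEMAND, EXTRA_HIT_RATE), PACKAGE_PJ),
    "both": link_power_watts(off_die_traffic(DEMAND, EXTRA_HIT_RATE), INTERPOSER_PJ),
}
for name, watts in scenarios.items():
    print(f"{name:28s} {watts:.2f} W")
```

Changing the hypothetical hit rate or pJ/bit values easily reorders the two middle scenarios, which is the point being argued: the levers are not mutually exclusive, and their relative value depends on the actual traffic.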
 

jpiniero

Lifer
Oct 1, 2010
PV's delays were/are 100% because they used 7 nm. It takes time to port it to nodes that actually work (i.e. TSMC).

Meteor's interposer costs should be quite low if it is indeed using 22FFL for it.

Then they try fewer pieces without properly separating functional modules (4 tiles for SPR) using EMIB. And now they want to just put everything on a silicon interposer and hope that works better.
Keep in mind Sapphire was intended to be released like 2 years ago. PV was/is meant to be more of a proof of concept of doing a multiple chiplet product.
 

Exist50

Golden Member
Aug 18, 2016
For me, the most interesting point from Chips&Cheese is that MTL will no longer share its L3 cache with the GPU. All AMD APUs already work this way. So much for an SLC.
In theory, they could put an SLC in the SoC die in addition to the compute die's L3, but whether MTLA actually has that is a very different matter.
 
May 1, 2020
You won't necessarily reduce costs by breaking chips up into smaller pieces. It seems to me that Intel all of a sudden decided to break up their chip designs into smaller pieces without thinking through all the consequences.
Interesting. Saving costs is the main reason AMD, Intel and TSMC are doing chiplets, and all of them are pretty vocal about that.

The second reason is being able to make products such as SPR, Ponte Vecchio and a 64c to 96c EPYC at all. Just use any of the publicly available yield calculators (like this one) and compare the good-die percentage at a given defect rate for 400 mm² vs. 1600 mm², 70 mm² vs. 560 mm², or 40 mm² vs. 640 mm². Either the Reticle Limit strikes or the cost of bad yields does. And we are not talking about 20% more or less.
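A minimal Python sketch of what such a yield calculator does, using Murphy's yield model (a classic model that these calculators commonly offer) and an assumed defect density of 0.1 defects/cm². Both the model choice and the D0 value are assumptions for illustration, not figures from the post. It also flags dies larger than the 26 mm × 33 mm (858 mm²) scanner exposure field, i.e. the Reticle Limit.

```python
import math

# Full exposure field of current lithography scanners: 26 mm x 33 mm.
RETICLE_LIMIT_MM2 = 26 * 33  # 858 mm^2

def murphy_yield(area_mm2, d0_per_cm2):
    """Murphy's model: Y = ((1 - exp(-A*D0)) / (A*D0))^2, with A in cm^2."""
    ad = (area_mm2 / 100.0) * d0_per_cm2  # 100 mm^2 = 1 cm^2
    if ad == 0:
        return 1.0
    return ((1.0 - math.exp(-ad)) / ad) ** 2

D0 = 0.1  # assumed defect density (defects/cm^2), illustrative only

PAIRS = [(400, 1600), (70, 560), (40, 640)]  # (small die, large die) in mm^2
for small, big in PAIRS:
    flag = "  <- exceeds reticle limit" if big > RETICLE_LIMIT_MM2 else ""
    print(f"{small:>4} mm^2: {murphy_yield(small, D0):6.1%}   "
          f"{big:>4} mm^2: {murphy_yield(big, D0):6.1%}{flag}")
```

With chiplets, a defect only scraps one small die (assuming dies are tested before packaging), so the relevant yield for the split product is roughly the small die's yield rather than the large one's.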
Furthermore, you can use cheaper processes for things that scale badly (IO, cache). And for Intel it will hopefully speed up time to market for Intel 4, as they only need libraries for the compute dies and not the full feature set. But even setting that aside, a monolithic die several times the size of the compute die on an early process such as Intel 4, where yields might be rather mediocre, would quite likely be more expensive than the tiled approach. Example calculation for MTL:
  • Let's say Intel can produce the 41 mm² compute die on Intel 4 with a yield of 80% (which would be quite good in the beginning).
  • The whole monolithic MTL might be around 150 mm² due to the limited scaling of IO and other blocks.
  • They would get 1181 good dies from a 300 mm wafer for the compute die alone.
  • For the monolithic die, the yield would drop to only 46%, giving only 179 good dies.
  • So in this case a good monolithic die costs them almost 7x (1181/179 ≈ 6.6) as much as a good compute die, although the size difference is only 3.7x (150/41). So if the total cost of the other dies plus packaging is less than that huge difference, the tiled approach is worth it. And as the other dies are small as well and/or on mature processes, this seems highly likely to me.
  • Needless to say, capacity for Intel 4 might be severely constrained in the beginning.
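A Python sketch that reproduces this example with Murphy's yield model. Calibrating the defect density so that a 41 mm² die yields 80% gives roughly 46% at 150 mm², matching the figure above. That suggests, but does not confirm, that a Murphy-style calculator was used. The dies-per-wafer formula is a common approximation that ignores scribe lines and edge exclusion, so its counts differ somewhat from the 1181/179 quoted. The cost ratio assumes both wafers cost the same, since both dies would be on the same process.

```python
import math

def murphy_yield(area_mm2, d_per_mm2):
    """Murphy's model with area in mm^2 and defect density in defects/mm^2."""
    ad = area_mm2 * d_per_mm2
    return 1.0 if ad == 0 else ((1.0 - math.exp(-ad)) / ad) ** 2

def calibrate_defect_density(area_mm2, target_yield, lo=0.0, hi=1.0, iters=100):
    """Bisection for the defect density that gives target_yield at area_mm2.

    Yield falls monotonically as defect density rises, so bisection converges.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if murphy_yield(area_mm2, mid) > target_yield:
            lo = mid  # yield still too high: the true density is larger
        else:
            hi = mid
    return (lo + hi) / 2

def gross_dies_per_wafer(area_mm2, wafer_diameter_mm=300.0):
    """Common approximation; ignores scribe lines and edge exclusion."""
    r = wafer_diameter_mm / 2
    return math.floor(math.pi * r * r / area_mm2
                      - math.pi * wafer_diameter_mm / math.sqrt(2 * area_mm2))

d = calibrate_defect_density(41, 0.80)      # 41 mm^2 compute tile at 80% yield
mono_yield = murphy_yield(150, d)           # ~150 mm^2 monolithic MTL
tile_good = gross_dies_per_wafer(41) * murphy_yield(41, d)
mono_good = gross_dies_per_wafer(150) * mono_yield

print(f"monolithic yield:     {mono_yield:.1%}")
print(f"good dies per wafer:  tile {tile_good:.0f}, monolith {mono_good:.0f}")
# Same process, so same (unknown) wafer cost: cost per good die scales with 1 / good dies.
print(f"monolith/tile cost per good die: {tile_good / mono_good:.1f}x")
```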

Ponte Vecchio obviously was broken up into too many pieces and has suffered major delays because of it (my speculation). SPR seems to have been broken up into pieces without properly designing separate modules and has also suffered major delays. Both of them relied heavily on EMIB and have suffered major delays. Then, Intel will use a full silicon interposer for their next mainstream products (MTL) even though they have heralded EMIB as a superior solution to silicon interposers for years.
There are several theories about the problems making the rounds, but to my knowledge packaging has so far not been a major one.
I was also quite surprised they would not use EMIB. But 22nm seems to make this very cheap and maybe their older factories would otherwise face low demand.

Adding a 3D cache, on the other hand, reduces power (by reducing the data traffic) and boosts performance.
Right, and where is the problem with having both: EFB for chiplet connections and SoIC for cache?

For really low power, both AMD & Intel will have to create a monolithic die. After all, you don't expect Apple to start using a chiplet approach for their phone SoC any time soon, do you?
MTL starts at 10w which already is low power. Apple also already uses Chiplets for Mobile (edit: Actually this is wrong.). Smart Phone SoCs are THE extreme. These surely will be the last ones to migrate - but in the long term also in this segment doing monoliths might become cost-prohibitive.

All in all, I think you overestimate the costs of advanced packaging; this is a business with an astonishing growth rate and a lot of economies of scale to be gained. On the other hand, you seem to underestimate the rising costs of monolithic dies on advanced processes with less and less scaling for everything that is not logic. But time will tell, and I think these will be interesting times from a technological perspective ;)
 

bsp2020

Member
Dec 29, 2015
The second reason is being able to make products such as SPR, Ponte Vecchio and a 64c to 96c EPYC at all. Just use any of the publicly available yield calculators (like this one) and compare the good-die percentage at a given defect rate for 400 mm² vs. 1600 mm², 70 mm² vs. 560 mm², or 40 mm² vs. 640 mm². Either the Reticle Limit strikes or the cost of bad yields does. And we are not talking about 20% more or less.
Furthermore, you can use cheaper processes for things that scale badly (IO, cache). And for Intel it will hopefully speed up time to market for Intel 4, as they only need libraries for the compute dies and not the full feature set. But even setting that aside, a monolithic die several times the size of the compute die on an early process such as Intel 4, where yields might be rather mediocre, would quite likely be more expensive than the tiled approach. Example calculation for MTL:
  • Let's say Intel can produce the 41 mm² compute die on Intel 4 with a yield of 80% (which would be quite good in the beginning).
  • The whole monolithic MTL might be around 150 mm² due to the limited scaling of IO and other blocks.
  • They would get 1181 good dies from a 300 mm wafer for the compute die alone.
  • For the monolithic die, the yield would drop to only 46%, giving only 179 good dies.
  • So in this case a good monolithic die costs them almost 7x (1181/179 ≈ 6.6) as much as a good compute die, although the size difference is only 3.7x (150/41). So if the total cost of the other dies plus packaging is less than that huge difference, the tiled approach is worth it. And as the other dies are small as well and/or on mature processes, this seems highly likely to me.
  • Needless to say, capacity for Intel 4 might be severely constrained in the beginning.
I don't know enough about the detailed cost breakdown of monolithic vs. big chiplet vs. small chiplet. Common sense tells me that there is always a sweet spot and it is never at an extreme. As I said earlier, I don't think anyone in their right mind would build a single-core chiplet, put it on an interposer and expect better economics and/or performance.

What I'm saying is that, from where I'm standing (10,000 km above sea level), SPR & PV seem to be a knee-jerk reaction to a chiplet revolution that came too quickly for Intel. SPR seems to me like a rushed design. They just broke their big monolithic design up into 4 pieces and hoped EMIB would save them. PV went even further and was broken up into more pieces, even though GPUs are even harder to split into chiplets due to their high bandwidth requirements. There is a reason why Nvidia builds the biggest chips in the world for its high-end GPUs. Even AMD's chiplet-based Navi 31 is rumored to be 1 main die + memory controller die. No one has ever developed a GPU with a chiplet approach, and Intel thought they could be the first to build one and succeed...

MTL is better than SPR & PV in terms of how the design is broken up into functional units. It seems they did proper partitioning this time. I'm sure it will be a good product for the intended market. Of the three main markets AMD & Intel compete in (mobile/desktop/server), MTL will be the most competitive against AMD's products in its target market. But you have to realize that AMD is not building a product specifically for that market. AMD's desktop products have been derivatives of their server & mobile products. Quite frankly, I'm surprised that AMD competes as well as it does in the desktop market when the products they introduce there are derivatives of their server products.

The most important point I'm learning from MTL is that they chose to use a silicon interposer. That tells me they are giving up on EMIB and that EMIB has contributed to SPR & PV's troubles in a major way. Of course, that's my speculation, but I think it's a reasonable one. I don't expect to see EMIB in Intel's high-end products again anytime soon. Emerald Rapids, which keeps SPR's tile approach, will not be competitive against AMD's products either. I don't know much about products after EMR, but Intel won't be competitive against AMD for at least 2 years. Their CEO said so himself, and I believe him this time.

There are several theories about the problems making the rounds, but to my knowledge packaging has so far not been a major one.
I was also quite surprised they would not use EMIB. But 22nm seems to make this very cheap and maybe their older factories would otherwise face low demand.
You can build EMIB using the same 22nm silicon and use less of it. MTL proves, at least to me, that they are giving up on EMIB and that most of their products that use EMIB heavily (SPR/EMR/PV etc.) won't be competitive.

Right, and where is the problem with having both: EFB for chiplet connections and SoIC for cache?
Cost and complexity. Transistor scaling has done wonders for the semiconductor industry because the cost & complexity of process technology increase linearly while the benefits increase exponentially. When you are manufacturing at the scale the semiconductor industry does, the exponential benefit will always win out. Developments in packaging technology, however, yield only linear benefits in terms of production scale. So you can't scale it as fast as fabrication technology, and you need to use it only when it makes sense.

In due time, once they have built enough packaging facilities, we will see packaging technology used more widely. But it won't scale like fab technology, which increases transistor counts exponentially every few years.

MTL starts at 10w which already is low power. Apple also already uses Chiplets for Mobile (edit: Actually this is wrong.). Smart Phone SoCs are THE extreme. These surely will be the last ones to migrate - but in the long term also in this segment doing monoliths might become cost-prohibitive.

All in all, I think you overestimate the costs of advanced packaging; this is a business with an astonishing growth rate and a lot of economies of scale to be gained. On the other hand, you seem to underestimate the rising costs of monolithic dies on advanced processes with less and less scaling for everything that is not logic. But time will tell, and I think these will be interesting times from a technological perspective ;)
I have to say that I disagree and, hopefully, I have explained my reasons well enough above. Advanced packaging technology will be used to build high-end products that are not possible to build using the traditional approach, not to build mainstream products more cheaply. At least for a while.
 
