Oh, it would. Even if it was just a single 16C chiplet for Zen 5 and we had to wait for the dual-chiplet version until Zen 6. I don't intend to upgrade to Zen 5 anyway.
That certainly would be exciting. So Zen 5 is expected to go up to 64 threads?
Yeah, having to cut down a 16 core CCD to 4, 6, 8 core CPUs would be pretty wasteful.
That depends. I'd expect for the consumer market, there will always be at least one big 8c CCD with Zen 5. The second CCD could be a 16c Zen 5c CCD or another 8c - giving you up to 48 hybrid threads.
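A quick sanity check of that "48 hybrid threads" figure, assuming SMT2 on both the 8c Zen 5 CCD and a hypothetical 16c Zen 5c CCD (just the arithmetic, not a confirmed configuration):

```python
# Hybrid thread count for an assumed 8c Zen 5 + 16c Zen 5c layout, both with SMT2.
big_cores, dense_cores, smt = 8, 16, 2
threads = (big_cores + dense_cores) * smt
print(f"{big_cores}c + {dense_cores}c, SMT{smt} -> {threads} threads")  # 48 threads
```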
Not only that: IMHO for the client market, competitive ST performance will remain a significant factor even in the long run. The server market OTOH is already on the verge.
Yeah, having to cut down a 16 core CCD to 4, 6, 8 core CPUs would be pretty wasteful.
I had initially thought that we would get almost everything stacked in the Zen 5 generation, but that just doesn't seem to be the case. The stacked silicon packages are more expensive and they add some limitations. They generally require that chips be directly adjacent. The infinity fabric fan-out used for the MCD/GCD connection seems to be an in-between tech (almost 900 GB/s), but it also has the adjacency limitation. You can't easily build something like Genoa with either tech since it does not place the chips adjacent to each other. I thought about daisy chaining chiplets, but routing high-speed signals over that distance is also a problem. They might be able to place 4 along each edge and 2 on the top and bottom, but that would require a lot of IO die design work and the chips may be too large.
I expect some interesting things from Zen 5 considering AMD has not been developing cores on a shoestring budget for a couple of years now.
Zen 3 was developed pretty much during the years of austerity at AMD. Zen 4 slightly less so and Zen 5 should see the first fruits of R&D under better days.
But more interesting for me is indeed packaging and SoC architecture. MI300 is almost here (next week?) to give us a glimpse of next gen packaging.
Curious to see whether InFO-R will replace substrate based PHY for 2.5D packaging on the Zen 5 family. Bergamo seems to have demonstrated the limits of routing with the substrate based interconnects and a likely way forward is fanout based RDLs at a minimum if not active bridges.
Besides the issue of practically no more space for traces coming out from the IOD to the CCDs, there is also the problem of the next-gen IF, which as per employee LinkedIn profiles can hit up to 64 Gbps compared to the current 36 Gbps.
I think InFO-3D could be a wildcard to enable lower cost 3D packaging. InFO-3D fits nicely here, enabling lower interconnect density than FE packaging like SoIC, but dense enough for SoC-level interconnects when stacking on top of the IOD. There is a big concern at the moment with F15 and F14 being underutilized, and TSMC is pushing customers from 16FF and older to the N7 family and ramping down those fabs (commodity process nodes, you might say). Having any customer generously making use of N7/6 besides the leading node would be a win-win.
Regarding the core perf gains, they have more transistors and a more efficient process to work with, so at the very least just throwing more transistors at the problem should bring decent gains if their ~6 years (2018-2023) of 'ground-up design' of Zen 5 is to be worthwhile. Zen 4 is behind in capacity in almost all key resources of a typical OoO machine compared to key contemporaries. Pretty good (but not surprising given other factors) that it can even keep up.
Nevertheless, a few AMD patents regarding core architecture that I have been reading strike me as intriguing, and I wonder if they will make it into Zen 5 in some form.
Not coincidentally, all these patents are about increasing resources without drastically increasing transistor usage (a toy sketch of the compression idea follows after the list).
- Dual fetch/Decode and op-cache pipelines.
- This seems like something that would be very interesting for mobile: power gate the second pipeline during less demanding loads.
- Remove the secondary decode pipeline for a Zen 5c variant? Let's say 2x 4-wide decode for Zen 5 and 4-wide for Zen 5c.
- Retire queue compression
- op-cache compression
- Cache compression
- Master-Shadow PRF
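None of these patents spell out an implementation we can reproduce, but the basic appeal of compression is easy to illustrate. Below is a toy sketch of base+delta compression of a 64-byte cache line, loosely in the spirit of published academic schemes (e.g. BDI); it only illustrates the idea of getting more effective capacity out of the same storage, and is not what AMD's patents actually describe.

```python
# Toy illustration of cache-line compression (base + delta), loosely in the spirit
# of academic schemes like BDI. Conceptual only -- NOT what AMD's patents describe.

def compress_line(words, delta_bytes=2):
    """Try to store a line of 8-byte words as one base value plus small deltas."""
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)          # signed range for each delta
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        # 8-byte base + one small delta per word, instead of 8 bytes per word
        return "base+delta", 8 + delta_bytes * len(words)
    return "uncompressed", 8 * len(words)

# Example: eight 8-byte pointers into the same region compress well.
line = [0x7f_0000_1000 + i * 16 for i in range(8)]        # one 64-byte line
kind, size = compress_line(line)
print(kind, f"{size} bytes instead of {8 * len(line)}")   # base+delta 24 bytes instead of 64
```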
But that 16 cores would be only the dense cores as far as I understand. They would not be too exciting for desktop customers, since the ST performance might be weak.
That certainly would be exciting. So Zen 5 is expected to go up to 64 threads?
We have seen some changes to the caches with different Zen generations, but I think Zen 5 is going to bring changes to the whole cache hierarchy. This may be a significantly more radical change than the Zen 2 to Zen 3 transition. I don't know whether that will result in big improvements though. Pushing single thread performance is obviously getting harder and harder, so I am keeping my expectations low. They can much more easily push FP performance, so I am expecting a significant increase there.
It's not only about the improvements directly achieved but also the new technologies introduced (which can then be refined) and future improvements enabled by the changes (the usual even Zen gen).
Also the excitement may be not only about the Zen cores but also the package layout with CCDs and one IOD that with Zen 4 was still essentially unchanged since Zen 2.
I had initially thought that we would get almost everything stacked in the Zen 5 generation, but that just doesn't seem to be the case. The stacked silicon packages are more expensive and they add some limitations. They generally require that chips be directly adjacent. The infinity fabric fan-out used for the MCD/GCD connection seems to be an in-between tech (almost 900 GB/s), but it also has the adjacency limitation. You can't easily build something like Genoa with either tech since it does not place the chips adjacent to each other. I thought about daisy chaining chiplets, but routing high-speed signals over that distance is also a problem. They might be able to place 4 along each edge and 2 on the top and bottom, but that would require a lot of IO die design work and the chips may be too large.
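To put very rough numbers on the adjacency problem, here is a sketch that counts how many CCD edges fit along an IOD's perimeter. All dimensions below are placeholders I chose for illustration, not measured die sizes.

```python
# Rough feasibility check for placing CCDs directly adjacent to an IOD.
# All dimensions are illustrative placeholders, not measured die sizes.

def ccds_around_iod(iod_w_mm, iod_h_mm, ccd_edge_mm, gap_mm=0.5):
    """How many CCDs of a given edge length fit along the four IOD sides?"""
    pitch = ccd_edge_mm + gap_mm
    return 2 * int(iod_w_mm // pitch) + 2 * int(iod_h_mm // pitch)

# Placeholder numbers: ~8 mm CCD edge, ~26 mm x 16 mm IOD.
print(ccds_around_iod(26, 16, 8))   # 2*3 + 2*1 = 8, well short of Genoa's 12 CCDs
```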
Also, CPUs just do not really need that much bandwidth. Stacked devices would be lower power, but that seems to be one of the few advantages of stacked silicon for CPUs. I suspect that MCM with infinity fabric connected chips (technically not chiplets) is going to stay with us for quite a while yet. They can continue to make them very cheaply since the same chiplet is used for a huge number of products. Intel, with an expensive stacked silicon package, will likely have trouble competing with this. Intel has their own fabs, so I guess they don't take as big of a hit from having everything on the same tile/chiplet and made on the most advanced process. AMD has been splitting everything out to allow them to make IO, cache, and logic all on different processes, which should allow them to better compete on price. This is in addition to having a cheaper MCM package.
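The cost argument for reusing one small chiplet everywhere can be made concrete with a standard yield model; the defect density and the hypothetical monolithic die size below are my assumptions for illustration, not AMD figures.

```python
# Why small chiplets are cheap to make: a classic Poisson yield model.
# Defect density and die areas are illustrative assumptions, not AMD data.
import math

def die_yield(area_mm2, defects_per_cm2=0.1):
    """Fraction of dies expected to have zero defects."""
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)

print(f"66 mm^2 CCD-sized die:          {die_yield(66):.1%}")    # ~93.6%
print(f"600 mm^2 hypothetical monolith: {die_yield(600):.1%}")   # ~54.9%
```

On top of yield, the same small die amortizes its design and mask costs across desktop, workstation, and server parts, which is the reuse point made above.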
For GPUs, 2.5D or 3D stacked silicon or interconnect makes sense due to the bandwidth requirements, but AMD isn't even using stacking for consumer level GPU devices. They are using infinity fabric fan-out to connect the MCDs to the GCD. The infinity cache also allows the use of cheaper memory, rather than HBM. Stacking seems to be reserved for the very high end like MI300. Since they probably have to use EFB to connect to the HBM, I suspect that the base dies are connected together with EFB, which is also a cost saving packaging tech used in place of a full interposer. It would be great to be able to get HBM in consumer products though. An APU with a single HBM stack for mobile would be a powerful device. This also leads me to wonder what AMD could possibly still be making at GlobalFoundries. It looks like GF has made HBM in the past, so I was actually wondering if it is plausible that AMD would make a specialized version of HBM at GF using infinity fabric fan-out links rather than 2.5D connections. The PC market is going to need something to compete with Apple's M-series chips. This may require some add-on accelerator chips for video editing and such. Perhaps such things could be connected with infinity fabric fan-out.
I share your feelings. I was also more positive about the adoption of advanced packaging in the CPU space by AMD.
Yesn't. AMD has been all about cheap-to-make products using a sensible tech.
The MI300 is a premium product with the price tag surely sitting well above 128c server chips or the previous accelerators.
They can get an extra $100 for it, so the cost makes sense there. The big question is how the incremental cost of more advanced packaging compares to its product benefits. And more likely than not, that tradeoff will change over time.
3D V-Cache isn't exactly cheap either considering the cache chiplets are only 1 node behind the main processor die.
Yep and "isn't cheap" is not an issue isThey can get an extra $100 for it, so the cost makes sense there. The big question is how the incremental cost of more advanced packaging compares to its product benefits. And more likely than not, that tradeoff will change over time.
I noticed one patent which attempted to address this issue by using a normal fanout when the CCDs are in a single column on each side of the IOD, and adding bridges in the fanout when there are multiple columns of CCDs. So basically the same CCD/IOD can be used but packaged differently for different configs.

Is 36 Gbps IFOP from the Zen 2 generation?
Bandwidth was not an issue with earlier cores, but Zen 4 showed signs of bandwidth deficiency in many workloads with the 36 Gbps IFOP.
It depends on FCLK. Earlier Zen generations have slightly lower FCLK. Zen 2 was stable at 1600 MHz FCLK. Zen 4 is around 2000 MHz.
Is 36 Gbps IFOP from the Zen 2 generation?
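For reference, the per-CCD link bandwidth scales directly with FCLK. A small calculation assuming the commonly cited 32 B/cycle read and 16 B/cycle write width of the IFOP (GMI) link; treat those widths as an assumption here.

```python
# CCD <-> IOD bandwidth as a function of FCLK, assuming the commonly cited
# 32 B/clk read and 16 B/clk write width per GMI link (assumption).

def gmi_bw_gbs(fclk_mhz, read_bytes=32, write_bytes=16):
    to_gbs = fclk_mhz * 1e6 / 1e9
    return read_bytes * to_gbs, write_bytes * to_gbs

for fclk in (1600, 2000):            # roughly Zen 2 vs Zen 4 territory
    read, write = gmi_bw_gbs(fclk)
    print(f"FCLK {fclk} MHz -> {read:.0f} GB/s read, {write:.0f} GB/s write per CCD")
```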
Zen 5 won't change anything in this regard, but AFAIK Zen 6 and onwards will use MI300/400 style packaging.
When Zen 2 came out in 2019, it was a radical change when it comes to CPU packaging as well as core scaling with SCF/SDF.
5 years later in 2024, we would hope something new will come up to address the shortcomings of this technology for the next-gen CPUs.
AMD will have to address the above problems with a new interconnect; even their competitor is using much more exotic BE packaging in current-gen products. But I wouldn't hold my breath; next-gen CPUs might be stuck on the same tech.
- Latency is a known issue with AMD's IFOP, and to address that, a few LinkedIn posts put the next-gen IF at up to 64 Gbps. This is a big jump and could have a major efficiency impact if the same IFOP is used going forward.
- Bandwidth was not an issue with earlier cores, but Zen 4 showed signs of bandwidth deficiency in many workloads with the 36 Gbps IFOP.
- Power: 0.4 pJ/bit for the MCD links vs 2 pJ/bit speaks for itself. GLink is being advertised at 0.25 pJ/bit (see the worked numbers after this list).
- Die area consumed for IFOP on Zen 4 is ~7 mm² of a 66 mm² chip (excluding the SDF/SCF that is part of L3); that is ~10% of a potential N4P and soon N3E die. The GCD-MCD links have demonstrated a smaller beachfront for higher bandwidth density. GUC's GLink, for instance, needs 3 mm of beachfront to provide 7.5 Tbps of BW.
- Trace density from the IOD to the CCDs and IO. It seems a limit has already been reached on how much space is available to route the signals from the IOD to the CCDs, considering space is also needed for IO/memory etc. traces.
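Putting the figures from the list together (the pJ/bit, Tbps/mm, and die-area numbers are the ones quoted above; the 512 Gbps example load is mine):

```python
# Worked numbers from the figures quoted in the list above.

def link_power_w(bandwidth_gbps, pj_per_bit):
    return bandwidth_gbps * 1e9 * pj_per_bit * 1e-12      # bits/s * J/bit = W

# Example load: 512 Gbps (~64 GB/s, i.e. one fully used read link)
for pj in (2.0, 0.4, 0.25):          # IFOP vs MCD fanout vs advertised GLink
    print(f"{pj} pJ/bit at 512 Gbps -> {link_power_w(512, pj):.2f} W")

print(f"GLink beachfront density: {7.5 / 3:.1f} Tbps per mm of die edge")
print(f"IFOP die area on Zen 4:   {7 / 66:.1%} of a 66 mm^2 CCD")
```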
As for the costs, AMD is doing fanout links on a 750 USD GPU product in which the actual chip should sell at half of that if not less, and with 6 of these links. And SoIC parts like the 5800X3D are selling for less than 300 USD.
I noticed one patent which attempted to address this issue by using a normal fanout when the CCDs are in a single column on each side of the IOD, and adding bridges in the fanout when there are multiple columns of CCDs. So basically the same CCD/IOD can be used but packaged differently for different configs.
https://www.freepatentsonline.com/11469183.html
View attachment 81610
Rumors of MI300C have been floating around; let's see if this is real in a couple of days. It could be a precursor.
That seems like the most logical progression.
Zen 5 won't change anything in this regard, but AFAIK Zen 6 and onwards will use MI300/400 style packaging.
With the MI300 approach about to be released this year (from GPU and APU to CPU), I don't think AMD is going to expend any money or effort on any half measure between the current Zen 4 (Genoa) and Zen 5 (Turin) on the SP5 socket and the "nirvana" of MI300/400.
Rumors of MI300C have been floating around; let's see if this is real in a couple of days. It could be a precursor.
Do you mean the RAM approach or the packaging?
I wonder if there is a way to retrofit the AM5 socket to move to the MI300 approach on the client side. Maybe it is not worth it, and maybe AMD will offer an alternative to the AM5 socket sometime between Zen 5 and Zen 6. Strix Halo may potentially be the harbinger of this.
Do you mean the RAM approach or the packaging?
Regarding packaging: Maybe I am oversimplifying things, but anything that happens "above the pins" should be socket agnostic, as long as the socket provides enough signal and power traces as well as enough surface area. So I see no reason why they should not be able to change the packaging on AM5.
Regarding memory: I expect them to go the Apple route for future mobile SoCs and that is what we hear already. And yes, that won't work for the current socket IMHO. But that's only a problem for OEMs and not so much for customers.
Yep, exactly my line of thinking.
The rumor has it that MI300 does not have any local motherboard memory, only on-package HBM memory and CXL links for more external memory.
Remember when Lisa Su said that the package would be the new motherboard? The biggest part of this is to eliminate memory from the motherboard and move it to the package.
Apple client computers have made this move without suffering any adverse effects. The ability of OEMs to pair AMD CPUs with garbage memory is only to AMD's detriment, so having fast enough memory inside the package would be to AMD's benefit.
The memory can be LPDDR5 at the outset, for the lowest cost and biggest bang for the buck, and later possibly HBM.
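For a sense of the bandwidth step between those options, a quick comparison; the bus width and data rates below are one plausible configuration each, chosen by me for illustration, not a known AMD spec.

```python
# Peak bandwidth: one plausible on-package LPDDR5X setup vs a single HBM3 stack.
# Bus widths and data rates are illustrative choices, not a known AMD configuration.

def peak_bw_gbs(bus_bits, mega_transfers_per_s):
    return bus_bits / 8 * mega_transfers_per_s / 1000     # bytes/transfer * GT/s

print(f"256-bit LPDDR5X-8533:          {peak_bw_gbs(256, 8533):.0f} GB/s")   # ~273 GB/s
print(f"1024-bit HBM3 at 6.4 Gbps/pin: {peak_bw_gbs(1024, 6400):.0f} GB/s")  # ~819 GB/s
```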
So, hopefully, AMD is already planning on this approach for the future generations. We will see if Strix Halo (as was leaked by MLID) will have this sort of socket, or if it is only a notebook chip.
You mean this?
That could be likely; any changes would likely be triggered by DC and AI first.
Any idea about this snippet below that I screenshotted from LinkedIn?
View attachment 81628

I doubt that will ever happen. Latency is a thing; CXL will mostly be used where large pools of memory are needed, not to replace the local memory. That will migrate to the package.
That seems like the most logical progression.
MI400 will get Zen 5 cores. Cloud providers / hyperscalers will start migrating from local motherboard memory to pooled CXL memory.
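To illustrate the latency point with ballpark figures (the numbers below are rough, generic estimates for illustration, not measurements of any specific platform):

```python
# Ballpark load-to-use latency comparison; rough illustrative numbers only.
latency_ns = {
    "local DDR5, direct-attached": 90,     # rough estimate
    "CXL-attached memory expander": 250,   # rough estimate: extra controller + PHY hop
}
local = latency_ns["local DDR5, direct-attached"]
for name, ns in latency_ns.items():
    print(f"{name}: ~{ns} ns ({ns / local:.1f}x local)")
```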
