Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 21

leoneazzurro

Senior member
Jul 26, 2016
912
1,440
136
It was just developed; it's only in sample production for customers.
It will take time until it's available in first products, maybe in Q4 2023?
When Strix Point arrives, the price could be lower, true, but it will still be higher than that of lower-clocked ones.
Strix with this memory would once more not be for cheaper laptops but for premium ones.
I always thought an APU was meant as a cheaper alternative to a CPU+dGPU combo, yet it's not.

There are different markets for APUs, as they are used in everything from mainstream to very expensive thin-and-light laptops. So while it's unlikely to find this memory in very low-end machines, it would not be strange to find it in something like this:


which normally costs around $2,000 and more. Also, new GPUs and CPUs will cost more and more, so even low-end discrete GPU and CPU combos will not be extremely cheap if one wants even entry-level performance (that is, light gaming and streaming). Of course, for surfing and office work even the iGPU on the 7000 desktop series will be enough.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,700
4,545
136
In the coming years, it will be impossible to build a DIY PC with entry-level to mainstream performance for the same or less money than you would pay for a mini-PC or APU-based system.

It's not a problem with the dies and their prices. It's a problem of the prices of everything around them: DRAM, GDDR memory, PCBs, controllers, power delivery, and the manufacturing costs of separate components.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,845
106
In the coming years, it will be impossible to build a DIY PC with entry-level to mainstream performance for the same or less money than you would pay for a mini-PC or APU-based system.

It's not a problem with the dies and their prices. It's a problem of the prices of everything around them: DRAM, GDDR memory, PCBs, controllers, power delivery, and the manufacturing costs of separate components.
I am talking about laptops here.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Didn't you notice how much they ask for Rembrandt laptops? Phoenix will end up the same and Strix too. Too expensive for that performance.
I haven't kept up on the mobile stuff, but I would expect PC laptop makers will eventually need something to compete with the Apple M2 systems, so high-end APUs may become a thing. A regular processor with a discrete GPU will not be able to compete on power consumption with a high-end APU.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,599
5,767
136
Hmmm, if we know when the Zen 4 kernel patches came out, can we then get a feel for the rough potential launch window of Zen 5?
Zen 4 patches started appearing at the end of 2021.

Zen 1 and Zen 2 developed by first team
Zen 3 and Zen 4 developed by second team
AMD's newish cadence should be around 18 months and Zen 4 should have launched earlier, but Norrod postponed it by 2Q to add CXL.

I would say early 2Q24 should be the launch window considering the team developing Zen 5 is working in parallel.
Which makes it 18 months after Zen 4 (and if you add the additional two quarters of Zen 4 push-back, that makes it two years from the originally planned Zen 4 launch, per Forrest Norrod's statement).
So, I would not be surprised if they launch Zen 5 at CES24
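
A quick sanity check on that cadence math (just a rough sketch; the ~September 2022 Zen 4 desktop launch date and a flat 18-month cadence are assumptions on my part):

# Rough Zen 5 window from the cadence argument above.
zen4_year, zen4_month = 2022, 9     # assumed Zen 4 desktop launch
cadence_months = 18                 # assumed cadence
total = zen4_year * 12 + (zen4_month - 1) + cadence_months
zen5_year, zen5_month = total // 12, total % 12 + 1
print(zen5_year, zen5_month)        # -> 2024 3, i.e. late Q1 / early Q2 2024; a CES24 reveal would fit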
 

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
Zen 4 patches started appearing at the end of 2021.

Zen 1 and Zen 2 developed by first team
Zen 3 and Zen 4 developed by second team
AMD's newish cadence should be around 18 months and Zen 4 should have launched earlier, but Norrod postponed it by 2Q to add CXL.

I would say early 2Q24 should be the launch window considering the team developing Zen 5 is working in parallel.
Which makes it 18 months after Zen 4 (and if you add the additional two quarters of Zen 4 push-back, that makes it two years from the originally planned Zen 4 launch, per Forrest Norrod's statement).
So, I would not be surprised if they launch Zen 5 at CES24
They more or less need to, if they want to keep the Mobile cadence with Strix Point.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
It looks like my speculation that the MCD could have v-cache stacked on top might be correct; don't know anything about this person though:



It makes it a lot simpler to manufacture if the base die (MCD) is the same size as the v-cache die. They can just do wafer on wafer without dicing and making a reconstituted wafer with cache die and small pieces of filler silicon. If something goes wrong, it is only an MCD die, not a more expensive cpu or gpu die/wafer. That should make it a lot cheaper, so I would expect MCDs to be used other places. MI300 doesn't seem to use them unless they are embedded under the compute die in some manner. I haven't seen anything about off package memory channels on MI300/SH5, but it seems like they would have extra memory somehow, unless they are depending on CXL with essentially HBM cache. HBM still has rather high latency though, since it is still DRAM, so I would expect cache to be under there somewhere. A single silicon interposer under the entire thing still seems too expensive and unnecessary. It seems like it would be smaller embedded die and/or EFB bridge chips.

I don't know how MCDs would work for a Ryzen or Epyc CPU though. If these were used in Epyc, how many DDR5 controllers would fit on an MCD die? I would expect GDDR6 to take proportionately more die area than DDR5 controllers. Fitting the memory channels for a whole quadrant (192-bit) on 1 MCD seems too large, but the Genoa IO die isn't really that big. It is only 24.79 x 16 mm, so if the memory controllers are along the top and bottom edges, as they are on the Rome/Milan IO die, then they may be able to fit a 192-bit controller. That is 384-bit DDR5 (really 12x32 rather than 6x64, right?) in a 16 mm wide area for the Genoa IO die. That would lead to a design with just 4 MCDs, so adding cache by stacking would be good to have, especially if the CPU dies only have a shared L2 and no L3 or a small L3. Perhaps Turin will look like MI300, except with 4 CPU chiplets per quadrant.
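
To put rough numbers on those widths (a quick sketch; the 12-channel, 64-bits-per-channel Genoa figures are the commonly cited ones, ECC bits ignored):

# DDR5 width check for the 192-bit / 384-bit figures above.
channels = 12
bits_per_channel = 2 * 32                  # two 32-bit subchannels per DDR5 channel
total_bits = channels * bits_per_channel   # 768 bits across the whole IO die
per_quadrant = total_bits // 4             # 192 bits -> what one MCD-like die per quadrant would carry
per_half = total_bits // 2                 # 384 bits -> 12x32 (or 6x64) along one edge
print(total_bits, per_quadrant, per_half)  # 768 192 384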
 
  • Like
Reactions: lightmanek

Joe NYC

Golden Member
Jun 26, 2021
1,928
2,269
106
I suspect Zen5 will use some of the stacking and connectivity tech used for RDNA3 and MI300, so it is kind of relevant. The things you have labeled as Zen4 cores look more like infinity cache, or maybe L2 cache, or something like that. I have seen the small chips between the HBM3 referred to as structural silicon (semiaccurate, I think). The chiplet you have labeled as "adaptive chiplet" looks exactly like a Zen 4 chiplet with 8 cores. The thing you have labeled "AI chiplet" may be partially FPGA. FPGAs have large arrays, so it could look like cache. It could also just be all AI hardware. That would have large, regular, arrays of things in addition to possible caches. It would be easier to tell if I knew the die size of HBM3. I didn't find it in a quick search and I don't have time to search more today. I thought HBM2 was around 100 mm2. The rendering may be completely inaccurate, but if the "AI chiplets" are actually cpu cores, then where do the 24 cores come from? There are essentially 3 GPUs (2 chiplets each), so having 3x8-cores would make sense. I don't know where the other 8 cores would be hiding unless there is something weird like 2 low power cores in each base die.

The ballpark die sizes are 300-350 mm2 for base die and ~150 mm2 for each of the 2 compute dies stacked on top.

In the case of the 2 chiplets that must add up to 24 Zen 4(d?) cores, they must be 80-100 mm2 each, since they cover somewhere between 25% and 33% of the base die.
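
Checking that fraction with the ballpark figures above (a rough sketch, nothing more):

# Area-fraction check for the chiplet-size estimate above.
base_die_mm2 = (300.0, 350.0)            # ballpark base die size from above
coverage = (0.25, 0.33)                  # assumed share of the base die covered by one chiplet
low = base_die_mm2[0] * coverage[0]      # 75 mm2
high = base_die_mm2[1] * coverage[1]     # ~115 mm2
print(low, high)                         # brackets the 80-100 mm2 per-chiplet ballpark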

I think these CPU dies may be 16 core Zen 4d chiplets with 2 cores disabled on each.

In any case, the Mi300 implementation, where CPU CCDs are stacked on top of the base die, means that there will be core re-use, but not CCD reuse. Mi300-style CCDs will be different.

Turin will continue to use the Genoa SP5 (and Siena SP6) sockets for mainstream server implementations.

Zen 5 cores will likely eventually go to the SH5 (Mi300, Mi400) socket as well.

Whether there will be some convergence of the architectures, with Turin adopting large base cache + I/O dies, remains to be seen. Reusability of CCDs would be a huge asset, which would be the argument "for" convergence to an Mi300-style architecture.
 
  • Like
Reactions: Vattila

Joe NYC

Golden Member
Jun 26, 2021
1,928
2,269
106
Replying to myself...

I am wondering if the layout pictured just isn't the 24 core device. Perhaps it is a 16 core with an FPGA or other accelerator. They apparently can put more than one type of chip on top of the base die. The base die looks like it might be able to fit 4 cpu chiplets, so I am wondering if the 24 core variant is really the top end. This seems like a small number of cores compared to what Nvidia will have with each Grace Hopper package (144?), although that may have a more powerful gpu.

I was just thinking the same, but the 2 dies of the same kind may in fact be 2 x 16 cores with 4 disabled cores each. It seems they may be too big to fit 4 of them on top of the base die. 3 definitely, 4 maybe.

In the core-count comparison against Grace + Hopper, Grace has a lot more cores, but Grace probably needs to be able to stand on its own against leading x86 CPUs.

In the case of Mi300, it does not need to satisfy that requirement; it just needs to be optimal for the likely tasks.

Also, MLID mentioned there would be some interchangeability between compute chiplets stacked on top of the base die, which could allow different configurations.
 
  • Like
Reactions: Vattila

Joe NYC

Golden Member
Jun 26, 2021
1,928
2,269
106
It looks like my speculation that the MCD could have v-cache stacked on top might be correct; don't know anything about this person though:



It makes it a lot simpler to manufacture if the base die (MCD) is the same size as the v-cache die. They can just do wafer on wafer without dicing and making a reconstituted wafer with cache die and small pieces of filler silicon. If something goes wrong, it is only an MCD die, not a more expensive cpu or gpu die/wafer. That should make it a lot cheaper, so I would expect MCDs to be used other places. MI300 doesn't seem to use them unless they are embedded under the compute die in some manner. I haven't seen anything about off package memory channels on MI300/SH5, but it seems like they would have extra memory somehow, unless they are depending on CXL with essentially HBM cache. HBM still has rather high latency though, since it is still DRAM, so I would expect cache to be under there somewhere. A single silicon interposer under the entire thing still seems too expensive and unnecessary. It seems like it would be smaller embedded die and/or EFB bridge chips.

Navi 31 and 32 use the MCD chiplet to save on N5 die area and as a means to make memory bandwidth a building block, adding more or less bandwidth as needed per GPU model.

On Mi300, there is already a base die of less expensive N6, and the base die will always interface with 2 HBM stacks, so the memory controller will be a fixed implementation that is part of the base N6 die. And the N6 die will likely have a lot of L3 SRAM for cache.

So not really any commonality / similarity between the RDNA 3 and CDNA 3 packaging of components, IMO.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Navi 31 and 32 use the MCD chiplet to save on N5 die area and as a means to make memory bandwidth a building block, adding more or less bandwidth as needed per GPU model.

On Mi300, there is already a base die of less expensive N6, and the base die will always interface with 2 HBM stacks, so the memory controller will be a fixed implementation that is part of the base N6 die. And the N6 die will likely have a lot of L3 SRAM for cache.

So not really any commonality / similarity between the RDNA 3 and CDNA 3 packaging of components, IMO.

It isn't necessarily implemented as a single interposer under each group of 2 gpu chiplets. That would be quite large and expensive.

They have shown images kind of indicating infinity cache die embedded under the compute die:

[attached image]

This could just be illustrative; I can't rule out a giant interposer, but it seems like it would be something on the order of 200 to 300 mm2 for each pair of GPU chiplets? Does it need that much for an infinity fabric switch, caches, and whatever IO it has? Even at 200 mm2, that is more than 2x the size of an Epyc Genoa IO die if all 4 of them are considered. I think they are bigger than 200 mm2; the HBM stacks are over 100 mm2. The actual fabric switches are very small. They need wider connections for GPUs, but the PHY is minimal for anything stacked.
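
Rough numbers behind that comparison (a sketch; the 24.79 x 16 mm Genoa IO die figure is the one quoted earlier in the thread):

# Area comparison for the "more than 2x a Genoa IO die" point above.
genoa_io_mm2 = 24.79 * 16.0               # ~397 mm2
four_base_low = 4 * 200.0                 # 800 mm2 if each base die is 200 mm2
four_base_high = 4 * 300.0                # 1200 mm2 if each is 300 mm2
print(four_base_low / genoa_io_mm2, four_base_high / genoa_io_mm2)   # ~2.0x and ~3.0x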

If they can use the base die in other places then it may make more sense, like if they could use the base die across the product lines with many different combinations of chiplets (CPU, RDNA, CDNA, FPGA, etc.). It would still be quite wasteful for a lot of other products, so it still seems more likely that there are smaller pieces of bridge silicon (EFB connections) and possibly infinity-fabric-fanout (MCD <-> GCD) connected chips under the compute chiplets. If the MCDs are used for stacking, then they would be thinned down to a few tens of microns thick, so embedding them under the other chiplets seems doable, even if it is multiple stacked die. IO, cache, and logic are scaling differently, so perhaps there are completely separate IO die and MCD. I wonder if it would be reasonable to use GF for just the IO die component.
 

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
@jamescox
With the little knowledge we have, GF is entirely possible. Intel uses some 14/22nm process for their MTL interposer. Some 16-10nm TSMC legacy process is a possibility as well. It all depends on what is integrated into the base-dies.
I am definitely putting my eggs into the "base dies are connected with something EFBish" basket.
 
  • Like
Reactions: Kaluan and Vattila

Joe NYC

Golden Member
Jun 26, 2021
1,928
2,269
106
It isn't necessarily implemented as a single interposer under each group of 2 gpu chiplets. That would be quite large and expensive.

The interposer on Mi300 is going to be very large, a multiple of the reticle limit, and quite expensive. I think in the 2,000-2,500 mm2 range.
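
For scale (a rough sketch; ~858 mm2 is the usual 26 x 33 mm reticle limit):

# How many reticles a 2,000-2,500 mm2 interposer works out to.
reticle_mm2 = 26.0 * 33.0           # ~858 mm2 lithography reticle limit
for area_mm2 in (2000.0, 2500.0):
    print(area_mm2 / reticle_mm2)   # ~2.3x and ~2.9x the reticle limit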
 

Joe NYC

Golden Member
Jun 26, 2021
1,928
2,269
106
They have shown images kind of indicating infinity cache die embedded under the compute die:



This could just be illustrative; I can't rule out a giant interposer, but it seems like it would be something on the order of 200 to 300 mm2 per each 2 gpu chiplets? Does it need that much for an infinity fabric switch, caches, and whatever IO it has? Even at 200 mm2, that is more than 2x the size of an epyc genoa IO die if all 4 of them are considered. I think they are bigger than 200 mm2; the HBM stacks are over 100 mm2. The actual fabric switches are very small. They need wider connections for gpus, but the PHY is minimal for anything stacked.

The bulk of the base die of Mi300 under each pair of GPUs is in the 300+ mm2 range, and in addition to the things you mentioned, the caches have no upper limit. Whatever area is left over will be SRAM.

I think as much as 512 MB of SRAM per base die, for a total of as much as 2 GB for the Mi300; probably in the 1 GB to 2 GB range for the full Mi300.
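
The arithmetic behind that guess, plus a rough density sanity check (a sketch; the ~64 MB in ~36 mm2 V-Cache density is the Zen 3 N7 figure and only a loose proxy for an N6 base die):

# Capacity and area arithmetic for the SRAM guess above.
base_dies = 4
sram_per_die_mb = 512                              # upper-end guess per base die
total_gb = base_dies * sram_per_die_mb / 1024      # -> 2.0 GB, the top of the 1-2 GB range
area_per_die_mm2 = sram_per_die_mb / 64 * 36       # ~288 mm2 at V-Cache-like density
print(total_gb, area_per_die_mm2)                  # a big chunk of a ~300+ mm2 base die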

As an aside, I was looking at the 7950 picture:

[attached image]

and it seems it would not be a big stretch to increase the size of the I/O die, give it some cache (which would be a system-level cache), and stack the CCDs on top of it (in Zen 5).

Let's say the base die of Zen 5 would be ~300 mm2 N6 and could accommodate 3 blocks to stack on it. The blocks, around 100mm2, would be:
- CCD with 8 CPU cores
- GPU
- SRAM for extra cache
- blank (for lower end implementations)

In the client implementation, the base die would sit directly on top of the organic substrate to connect to the pins. The base die would provide all the routing and connect via hybrid bonding, so minimal power and latency losses and maximum bandwidth.
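
Purely as an illustration of that idea (everything here is speculative: the block names, the ~300 mm2 budget, and the ~100 mm2 per block are just the figures from this post, and the combinations are made up):

# Hypothetical Zen 5 base-die stacking budget from the speculation above - not a known AMD design.
BASE_DIE_MM2 = 300                    # ~300 mm2 N6 base die with cache + I/O
BLOCK_MM2 = 100                       # each stacked block ~100 mm2
configs = {
    "high end":   ["CCD", "CCD", "GPU"],
    "mainstream": ["CCD", "GPU", "SRAM"],
    "low end":    ["CCD", "GPU", "blank"],
}
for name, blocks in configs.items():
    used = len(blocks) * BLOCK_MM2
    print(f"{name}: {blocks} -> {used}/{BASE_DIE_MM2} mm2 of stacking area")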
 

  • Like
Reactions: Vattila

Joe NYC

Golden Member
Jun 26, 2021
1,928
2,269
106
@jamescox
With the little knowledge we have, GF is entirely possible. Intel uses some 14/22nm process for their MTL interposer. Some 16-10nm TSMC legacy process is a possibility as well. It all depends on what is integrated into the base-dies.
I am definitely putting my eggs into the "base dies are connected with something EFBish" basket.

MLID said it is a giant interposer.

Also, I don't know if this is true, but AMD may have dumped EFB (for now).


OTOH, AMD's Papermaster mentioned that there may be a future EFB with hybrid bond.