Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
820
1,456
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

DrMrLordX

Lifer
Apr 27, 2000
22,761
12,772
136
Raphael is also a guaranteed launch in 2022. I'd bet money on Q3 or earlier rather than Q4 like some leaks have suggested. My suspicion is that Zen 4 won't replace Zen3D, it will complement it.

Might be Raphael-H first though. It's the only product we're getting any information on thus far.
 

soresu

Diamond Member
Dec 19, 2014
3,947
3,392
136
BTW, in the latest Investor Presentation PDF from AMD, desktop Raphael is still not on the roadmap.

I heard a rumor that these days, AMD only puts those products on the roadmap that have already taped out. So this could mean that AMD is still fiddling with the design, maybe with the packaging / interconnect...
Yes but Genoa IS on their roadmap, and unless Raphael is based on Zen4c instead of the standard Zen4 core in Genoa then there is nothing to worry about there.

Well, nothing but a potential mid term console wave on the same process node :eek:
 

Joe NYC

Diamond Member
Jun 26, 2021
3,387
4,988
136
Yes but Genoa IS on their roadmap, and unless Raphael is based on Zen4c instead of the standard Zen4 core in Genoa then there is nothing to worry about there.

Well, nothing but a potential mid term console wave on the same process node :eek:

Both Raphael and Genoa will most likely share the same Zen 4 core.

But that does not automatically mean that they will share the CCD, as Milan and Vermeer did.
 

DrMrLordX

Lifer
Apr 27, 2000
22,761
12,772
136
Both Raphael and Genoa will most likely share the same Zen 4 core.

But that does not automatically mean that they will share the CCD, as Milan and Vermeer did.

In past products, AMD recycled EPYC dice for consumer Zen products because of economy. The only dice not to come from EPYC went into Pinnacle Ridge, and the changes there weren't so drastic that they amounted to a redesign on the level of Zen2 or Zen3. It would cost AMD a lot of money not to use Genoa CCDs in Raphael.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,387
4,988
136
In past products, AMD recycled EPYC dice for consumer Zen products because of economy. The only dice not to come from EPYC went into Pinnacle Ridge, and the changes there weren't so drastic that they amounted to a redesign on the level of Zen2 or Zen3. It would cost AMD a lot of money not to use Genoa CCDs in Raphael.

That's the most economical way to go about it, to share the CCD.

But there will be a day when the current IFoP will be the bottleneck in the design and AMD will move to newer, more efficient technologies. When it happens, it could happen simultaneously, that both desktop and server parts will change to something new, and they will still share the same CCD.

But since Zen 4 Raphael desktop seems to be lagging behind Genoa by potentially 9 months, AMD could have used these 9 months to advance the architecture. Just speculation.

A much more likely reason why AMD is going to switch the order of launches, starting with Genoa, is capacity and perhaps DDR5 availability in the consumer market.

BTW, if you look strictly at the launch dates, Zen 3 Vermeer launched in Q4 2020 and Zen 4 Raphael is, according to the rumors, likely going to be Q4 2022 (a time span of 8 quarters).

In the meantime, Zen 3 Milan launched late Q1 2021 and Zen 4 Genoa is likely to launch by mid-year 2022 (a time span of 4 to 5 quarters). So, my speculation is that perhaps AMD did not just put Raphael on the sideline, that maybe AMD used the time for something.
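To make the cadence comparison above concrete, here is a rough sketch of the quarter arithmetic. The 2022 dates are the rumoured ones from the post, and reading "mid-year 2022" as Q2 is an assumption made only for this illustration.

```python
def quarters_between(start, end):
    """Number of calendar quarters from start to end, each given as (year, quarter)."""
    (y0, q0), (y1, q1) = start, end
    return (y1 - y0) * 4 + (q1 - q0)

# Launch quarters as quoted in the post; the 2022 entries are rumours, not confirmed dates.
vermeer = (2020, 4)
raphael_rumoured = (2022, 4)
milan = (2021, 1)
genoa_rumoured = (2022, 2)  # assumption: "mid-year 2022" read as Q2

desktop_gap = quarters_between(vermeer, raphael_rumoured)
server_gap = quarters_between(milan, genoa_rumoured)
print(f"desktop gap: {desktop_gap} quarters, server gap: {server_gap} quarters")
```

If Genoa were to slip to Q3 instead, the server gap would grow by one quarter, which still leaves it well short of the desktop gap.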
 

DrMrLordX

Lifer
Apr 27, 2000
22,761
12,772
136
But there will be a day when the current IFoP will be the bottleneck in the design and AMD will move to newer, more efficient technologies. When it happens, it could happen simultaneously, that both desktop and server parts will change to something new, and they will still share the same CCD.

AMD needs to switch EPYC over to EFB just as badly as they need to switch Ryzen over to EFB. It would be logical to switch both at once.
 

soresu

Diamond Member
Dec 19, 2014
3,947
3,392
136
Both Raphael and Genoa will most likely share the same Zen 4 core.

But that does not automatically mean that they will share the CCD, as Milan and Vermeer did.
Given the cost of design at 5nm I think it unlikely that they will do 3 separate designs for the CCD.

As it is they must have determined serious (and clearly lucrative) demand in the cloud market to authorize even Zen4c as a separate design.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,387
4,988
136
AMD needs to switch EPYC over to EFB just as badly as they need to switch Ryzen over to EFB. It would be logical to switch both at once.

But they are not being released at the same time, Genoa is ahead of Raphael and Bergamo.

Which is why I am wondering if there is something in Raphael and Bergamo that could not make it to Genoa, since there was extra time...
 

Joe NYC

Diamond Member
Jun 26, 2021
3,387
4,988
136
Given the cost of design at 5nm I think it unlikely that they will do 3 separate designs for the CCD.

As it is they must have determined serious (and clearly lucrative) demand in the cloud market to authorize even Zen4c as a separate design.

The cheapest way to make Raphael is to make it a fraction of Genoa after Genoa is done. As almost an afterthought. Like Ryzen3D after Milan X.

We don't know if AMD has appetite to invest into a desktop gaming CPU design any more than that.
 

BorisTheBlade82

Senior member
May 1, 2020
703
1,122
136
My speculation is the following:
  • Raphael as well as Genoa will stay on IFoP because of the same-die strategy. IMHO it is technically not possible to have the same die use IFoP and EFB as well.
  • Bergamo will be the first to use EFB.
  • Zen5 will also use EFB - for Server and Desktop.
  • This will allow using Bergamo as small cores together with Zen5 via EFB in Desktop and Mobile for Big.little chiplet solutions.
 

soresu

Diamond Member
Dec 19, 2014
3,947
3,392
136
The cheapest way to make Raphael is to make it a fraction of Genoa after Genoa is done. As almost an afterthought. Like Ryzen3D after Milan X.
This has been AMD's strategy in essence since long before Zen.

Server first is their design philosophy, even when server launches later the die itself is designed with server in mind - with the obvious modern exception of the IOD since Rome launched with a different IOD to Matisse.

In fact I'm pretty sure I remember people talking about server first design in reference to K8 during the 2000s.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,787
136
  • Bergamo will be the first to use EFB.
So you suggest a new sIOD for Bergamo?
Or do you think IFOP SerDes and another kind of interconnect (e.g. HBI with repeaters) will coexist? I doubt that; they would need to add a lot more pads or do an RDL.
If Genoa does not have a denser interconnect, then it is very likely that Bergamo will not have one either.
Genoa will be the more mainstream part addressing much more diverse deployment scenarios than Bergamo.

Genoa, Bergamo and Turin will share the same socket SP5 and it is ~0.96 mm thinner than SP3 and ~0.45 mm thinner for the package substrate.
1638109014092.png
Turin might have a revised sIOD though because it will need to support CXL 2.0 which is a big deal.

FWIW, the capture from last presentation from AMD
1638110473336.png

And the diagram from leaked SP5 manual, looks like a match to me
1638110509308.png


with the obvious modern exception of the IOD since Rome launched with a different IOD to Matisse.
Even cIOD is essentially a cut-up sIOD, so basically everything is from Server. But perhaps this time around Raphael will have a purpose-designed cIOD due to the iGPU.

1638107737532.png
 

LightningZ71

Platinum Member
Mar 10, 2017
2,393
3,038
136
From a financial standpoint, AMD is far more able to fund the development of unique IODs and CCDs for different markets than it was as recently as three years ago. I do not find it hard to believe that AMD would develop a desktop-only focused IOD, especially if it allowed them more dies per wafer. I do not find it hard to believe that AMD would develop two unique Zen4 CCDs that have cores mixed and matched internally for different focuses (all-out throughput vs density and efficiency). I do still think it will be a while before anything outside of APUs and a desktop IOD is designed without the primary purpose of being in a server, with reuse in other markets being a "planned afterthought".
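As a rough illustration of the "more dies per wafer" point, the sketch below uses a common first-order gross-die estimate. The 300 mm wafer diameter is an assumption, and the two areas are simply the IO-die figures floated elsewhere in this thread (about 400 mm2 and 260 mm2), used here as illustrative inputs only. The estimate ignores yield, scribe lines and edge exclusion, so treat the numbers as ballpark.

```python
import math

def gross_die_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order estimate of gross (not yielded) dies per wafer."""
    radius = wafer_diameter_mm / 2.0
    wafer_area = math.pi * radius ** 2
    # The second term approximates partial dies lost around the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

# Illustrative die areas in mm2; neither is a confirmed product figure.
for area in (400.0, 260.0):
    print(f"{area:.0f} mm2 -> roughly {gross_die_per_wafer(area)} gross dies per wafer")
```

The smaller die gains more than the simple area ratio suggests, because less of the wafer edge is wasted on partial dies.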
 

BorisTheBlade82

Senior member
May 1, 2020
703
1,122
136
@DisEnchantment
Yes, I indeed believe in a new and different IOD for Bergamo compared to Genoa. The former would employ EFB connections with the main (and maybe sole) purpose of increasing energy efficiency. Both would be socket compatible.
Bergamo will aim for the highest MPP power efficiency possible for hyperscalers to tackle Ampere et al. And for that they will need a much more efficient on-package interconnect.

Like @LightningZ71, I am also under the impression that producing different IODs for different markets is much less taxing than producing different CCDs for Server and Desktop. Even more so now that it is more or less a given that they will employ a small iGPU in the Desktop space.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
The monolithic chips and interposers are dependent on the reticle size. So, say Mi100 had an interposer, so it was limited to the reticle size. Mi200 drops the interposer, replaces it with EFB, and is no longer constrained by the reticle size (which the Mi250 exceeded by a good margin).



If it is 400 mm2 for the IO die rather than the 260 mm2 that AdoredTV mentioned, then Genoa probably does not have to make any revolutionary changes; 400 mm2 can probably accommodate incremental IO increases.



The reticle size is about 800+ mm2, and it is unlikely that AMD will get anywhere close to it with any of the chips, including IO die.

As far as future packaging technologies, I don't know if either EMIB or EFB can reduce the die size if the die is pad limited. Maybe, I just don't know if there are any density increases.

EFB could make routing easier, since the bridge is elevated, above the substrate, and other traces could be going through the substrate underneath. But I don't know how big a difference that could make.

If AMD is able to use stacked connection with TSV, hybrid bond connection, those have extremely high density, and there could still be metal layers under the TSVs, unrelated to TSVs, that can still be connecting to IO. It could result in 25% to 33% die size reduction and tiny savings on the CCD.

As far as keeping the IFoP just to keep the chips spread out, AMD can just use a longer bridge, leave space between the chips and fill it with mold similar to one in the EFB. In fact, if they use active bridge with L3 die, it could use some length to accommodate the SRAM (if it is, say 80 mm2) and overlap the 2 dies being connected just on the edges, not covering a big area of the chips below. It could solve concern about thermals while stacking...

Anyway, there have been just about no leaks about Raphael. We just have a vague idea that it will come about a year from now. Roughly the same time frame of RDNA3. So, if AMD is going to use some advance packaging technologies for RDNA3, why not Raphael?

It's not that AMD has very high concern about desktop and time to market in desktop space (judging by Zen3D being MIA in desktop). If it takes an extra quarter and AMD could move past IFoP on Rembrandt, maybe use the same chiplet interconnect on Rembrandt as Bergamo, that could turn out valuable in the long run...
I think several things here are probably not correct, but I don’t have much time to research in detail. The reticle size is probably still a limitation and long silicon bridges are likely not something they would want to do.

Most, if not all, of the advanced stacking technologies are still dependent on reticle size. Some of them apply packaging component to the whole wafer before it is diced. Others use a carrier wafer or a so-called “reconstituted wafer”. They place chiplets into the carrier wafer and then do more photolithography based processes on top that are dependent on reticle size. If you look at the TSMC 3D stacking overview I linked, they list reticle size limitations for all of them, I think.

It gets significantly more complicated to exceed reticle size. 2x may not be as difficult since they may be able to design it in such a way that all masks are the same. Just some connections would not connect to anything depending on how the carrier wafer is finally diced. The 1.5x (3/2) would be more difficult since it may require a base mask with 2/3 of the intended interconnect and then another mask with two mirrored halves to get the extra 1/3 of the final package. They may, in some cases, be able to make something with a repeated pattern such that it is still a single mask. Anything that isn’t a half multiplier likely gets even more difficult to design. I think they listed a 1.2 or 1.3x possibility for one of the stacking types, and that was coming later than the half multiple types.

I suspect that Bergamo may actually be a single reticle sized device. With stacking, if they can put cache into the embedded bridge silicon, then it overlaps (or “underlaps”) the other chips and doesn’t add to the 2D package area at all. In that case, the cpu chiplets may be very small, even with 16 relatively powerful cpu cores since most of the cache would be stacked. The IO die may also be significantly smaller since it replaces 12 serdes-based IFOP with stacked connections. The stacked connections should take very little die area, so the much smaller number that AdoredTV has could be the Bergamo IO die, while the Genoa IO die is still a giant serdes based device close to 400 mm2. There may be some special sauce in Genoa also. They might use some advanced packaging technology to do the really dense routing required with 12x DDR5, 12 serdes-based IFOP, and 8 x16 pci-express / IFIS. Although, I doubt that it has any embedded silicon, unless it is a simple LSI device to get the signals out of the IO die without being pad limited.

I forget what interview or article it was, but I remember someone from AMD talking about the decision to use serdes-based IFOP over embedded silicon. I remember them talking about the difficulties of doing long runs in embedded silicon bridges. They were staying with serdes-based IFOP because of those limitations. You might be able to pull it off with an active interposer, but it would take a lot of silicon, approaching full silicon interposer levels. I have considered the possibility of daisy chaining chiplets with small pieces of embedded silicon, but that seems unlikely. You would have to route signals long distances across multiple chiplets and bridges. That may take a lot of power and add latency.

My current idea of Bergamo is that it may take advantage of the area savings by stacking bridge chiplets with cache to pull off a single reticle sized device. The IO die and chiplets will need to be directly adjacent. They probably will have almost no space between them. There may be an opportunity to take advantage of the stacking technology to possibly move other components into stacked chiplets. I have wondered if it might make sense to put the memory controller in the bridge with the cache, or at least the unified memory controller to remain memory type agnostic. The IO die would likely be oddly shaped; perhaps long and narrow to accommodate 4 chiplets along each side. It may mostly contain physical interfaces with the other components being in the stacked silicon. We have had rectangular cpu die before with some of them being a relatively large aspect ratio rectangle rather than close to square.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
Disagree. The DDR5 market a year from now (when AM5 launches) will look very different from the DDR5 market today (when ADL-S sales caused DDR5 shortages).
There will be a massive market for DDR5 server memory at some point. I don’t know if those are going to be the same memory die due to possible ECC differences. I know that DDR5 has some extra ECC built into all chips, but I didn’t think the ECC was equivalent to server memory ECC. It has been a while since I watched the video about that (Ian Cutress?). It seems like it would be the same power management chip for both and that is what is currently in short supply and might continue to be in short supply for many months.
 

Makaveli

Diamond Member
Feb 8, 2002
4,968
1,563
136
There will be a massive market for DDR5 server memory at some point. I don’t know if those are going to be the same memory die due to possible ECC differences. I know that DDR5 has some extra ECC built into all chips, but I didn’t think the ECC was equivalent to server memory ECC. It has been a while since I watched the video about that (Ian Cutress?). It seems like it would be the same power management chip for both and that is what is currently in short supply and might continue to be in short supply for many months.

You are correct: the on-die ECC of DDR5 is not the same as full system ECC on a server, as the former doesn't protect data in transit.
 

Doug S

Diamond Member
Feb 8, 2020
3,384
5,999
136
You are correct: the on-die ECC of DDR5 is not the same as full system ECC on a server, as the former doesn't protect data in transit.

Not only that, the on chip ECC in DDR5 is single error correct only. It can't even detect double bit errors. Even if it offered data protection in transit that wouldn't be sufficient for servers.
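To show why "single error correct only" matters, here is a toy sketch using a Hamming(7,4) code. This is an illustration of a pure SEC code in general, not the code DDR5 actually uses on die (the real one is much wider, and its construction is not covered here). The function names are made up for this example.

```python
from itertools import combinations

def encode(data):
    """Hamming(7,4): 4 data bits -> 7-bit codeword with parity at positions 1, 2, 4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    # The syndrome is the XOR of the 1-based positions of all set bits;
    # it is 0 for a valid codeword and equals the position of a single flipped bit.
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def flip(codeword, positions):
    """Return a copy of codeword with the given 0-based bit positions inverted."""
    c = list(codeword)
    for p in positions:
        c[p] ^= 1
    return c

data = [1, 0, 1, 1]
cw = encode(data)

single_ok = all(decode(flip(cw, [i])) == data for i in range(7))
double_wrong = all(decode(flip(cw, pair)) != data for pair in combinations(range(7), 2))
print("every single-bit error corrected:", single_ok)
print("every double-bit error silently miscorrected:", double_wrong)
```

With two flipped bits the decoder sees a non-zero syndrome, assumes a single error, "fixes" a third bit and hands back wrong data with no warning. Server-style SECDED adds an extra parity bit precisely so that this case can at least be flagged.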

It is really odd that DDR5 got a weak form of built in ECC that can only correct single bit errors but without any link protection, and LPDDR5 does not appear (from all available info I've seen) to include that built in ECC but it does include link ECC (though maybe that's optional) which DDR5 does not.

Given that ECC in DDR5's half-width 32-bit channels doubles the number of extra bits you'd need per DIMM for full ECC (total width goes from 72 to 80 bits), ECC is going to at least double the cost penalty for ECC DIMMs. Probably more than that, for the same reasons the cost penalty of DDR4 ECC DIMMs is greater than the 12.5% bit penalty. Some of the lower-end, more cost-sensitive users of ECC in e.g. embedded markets like POS terminals will probably abandon it and decide the built-in ECC in standard DDR5 is "good enough".
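The bit penalty above works out as follows. This is only a sketch of the arithmetic, assuming 8 ECC bits per channel as the 72 and 80 bit totals in the post imply; the dictionary layout is just for illustration.

```python
# Per-DIMM layout implied by the post: one 64-bit channel for DDR4,
# two 32-bit channels for DDR5, each channel carrying 8 ECC bits (assumption).
ddr4 = {"channels": 1, "data_bits": 64, "ecc_bits": 8}
ddr5 = {"channels": 2, "data_bits": 32, "ecc_bits": 8}

def total_width(cfg):
    """Total bus width per DIMM including ECC bits."""
    return cfg["channels"] * (cfg["data_bits"] + cfg["ecc_bits"])

def ecc_overhead(cfg):
    """Extra bits as a fraction of data bits."""
    return cfg["ecc_bits"] / cfg["data_bits"]

for name, cfg in (("DDR4", ddr4), ("DDR5", ddr5)):
    print(f"{name}: {total_width(cfg)} bits wide, {ecc_overhead(cfg):.1%} bit penalty")
```

So the bit penalty goes from 12.5% to 25%, which is the "at least double" in the post before any other cost factors are counted.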
 

LightningZ71

Platinum Member
Mar 10, 2017
2,393
3,038
136
But the ECC in "standard" DDR5 is only there to allow it to meet the standards of generic DDR reliability. It's there to keep the internal data pathways in the DIMM and individual modules consistent as they expect the higher and higher operating frequencies to introduce too many random bit errors to make normal operations reliable enough for general usage. Effectively, all it does is keep the DIMM up to the standards expected of generic DDR4 with higher frequencies and higher data densities. It's best not to even consider its existence.
 

Doug S

Diamond Member
Feb 8, 2020
3,384
5,999
136
But the ECC in "standard" DDR5 is only there to allow it to meet the standards of generic DDR reliability. It's there to keep the internal data pathways in the DIMM and individual modules consistent as they expect the higher and higher operating frequencies to introduce too many random bit errors to make normal operations reliable enough for general usage. Effectively, all it does is keep the DIMM up to the standards expected of generic DDR4 with higher frequencies and higher data densities. It's best not to even consider its existence.

I don't think it is the higher operating frequencies; it is the smaller capacitors. It isn't needed for DDR5 today but DDR5's roadmap extends all the way to 64 Gb chips - 4x more dense than today's 16 Gb DDR5 chips.

Seems like it would make sense to pursue multilayer designs like NAND did when the cells got too small, which allowed them to use much bigger cells and avoid the issues. I don't know enough about how DRAM is produced to know how feasible that is, obviously if it was easy they would already be doing it...
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
There will be a massive market for DDR5 server memory at some point. I don’t know if those are going to be the same memory die due to possible ECC differences. I know that DDR5 has some extra ECC built into all chips, but I didn’t think the ECC was equivalent to server memory ECC. It has been a while since I watched the video about that (Ian Cutress?). It seems like it would be the same power management chip for both and that is what is currently in short supply and might continue to be in short supply for many months.
Current projections still suggest DDR5 shipping volume will only overtake that of DDR4 in late 2023. We're a while away from DDR5 taking over I'm afraid.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,112
136
Current projections still suggest DDR5 shipping volume will only overtake that of DDR4 in late 2023. We're a while away from DDR5 taking over I'm afraid.
So, a more or less typical change over period for new DRAM technologies.