Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging looks to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

leoneazzurro · Nov 27, 2021

eek2121 said:
Rembrandt is launching in January 2022. Just because AMD doesn't confirm it, doesn't mean it isn't happening.

That was exactly my point.

DrMrLordX · Nov 27, 2021

eek2121 said:
Raphael is also a guaranteed launch in 2022. I'd bet money on Q3 or earlier rather than Q4 like some leaks have suggested. My suspicion is that Zen 4 won't replace Zen3D, it will complement it.

Might be Raphael-H first though. It's the only product we're getting any information on thus far.

eek2121 · Nov 27, 2021

Kepler_L2 said:
The more important thing is having both a DDR4 CPU/Platform and a DDR5 one.

Disagree. The DDR5 market a year from now (when AM5 launches) will look very different from the DDR5 market today (when ADL-S sales caused DDR5 shortages).

soresu · Nov 27, 2021

Joe NYC said:
BTW, in the latest Investor Presentation PDF from AMD, desktop Raphael is still not on the roadmap.

I heard a rumor that these days, AMD only puts those products on the roadmap that have already taped out. So this could mean that AMD is still fiddling with the design, maybe with the packaging / interconnect...

Yes but Genoa IS on their roadmap, and unless Raphael is based on Zen4c instead of the standard Zen4 core in Genoa then there is nothing to worry about there.

Well, nothing but a potential mid term console wave on the same process node

Joe NYC · Nov 28, 2021

soresu said:
Yes but Genoa IS on their roadmap, and unless Raphael is based on Zen4c instead of the standard Zen4 core in Genoa then there is nothing to worry about there.

Well, nothing but a potential mid term console wave on the same process node

Both Raphael and Genoa will most likely share the same Zen 4 core.

But that does not automatically mean that they will share the CCD, as Milan and Vermeer did.

DrMrLordX · Nov 28, 2021

Joe NYC said:
Both Raphael and Genoa will most likely share the same Zen 4 core.

But that does not automatically mean that they will share the CCD, as Milan and Vermeer did.

In past products, AMD recycled EPYC dice for consumer Zen products because of economy. The only dice not to come from EPYC went into Pinnacle Ridge, and the changes there weren't so drastic that they amounted to a redesign on the level of Zen2 or Zen3. It would cost AMD a lot of money not to use Genoa CCDs in Raphael.

Joe NYC · Nov 28, 2021

DrMrLordX said:
In past products, AMD recycled EPYC dice for consumer Zen products because of economy. The only dice not to come from EPYC went into Pinnacle Ridge, and the changes there weren't so drastic that they amounted to a redesign on the level of Zen2 or Zen3. It would cost AMD a lot of money not to use Genoa CCDs in Raphael.

That's the most economical way to go about it, to share the CCD.

But there will be a day when the current IFoP will be the bottleneck in the design and AMD will move to newer, more efficient technologies. When it happens, it could happen simultaneously, so that both desktop and server parts change to something new and still share the same CCD.

But since Zen 4 Raphael desktop seems to be lagging behind Genoa by potentially 9 months, AMD could have used these 9 months to advance the architecture. Just a speculation.

A much more likely reason why AMD is going to switch the order of launches, starting with Genoa, is capacity and perhaps DDR5 availability in the consumer market.

BTW, if you look strictly at the launch dates, Zen 3 Vermeer launched in Q4 2020 and Zen 4 Raphael is, according to the rumors, likely going to be Q4 2022 (a time span of 8 quarters).

In the meantime, Zen 3 Milan launched late Q1 2021 and Zen 4 Genoa is likely to launch by mid-year 2022 (time span of 4 to 5 quarters). So, my speculation is that perhaps AMD did not just put Raphael on the side line, that maybe AMD used the time for something.
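
The quarter counts above can be checked with a small sketch. The launch windows are the ones stated or rumored in this post; reading "by mid-year 2022" as Q1 or Q2 2022 is an assumption, and the helper names are made up for illustration.

```python
def quarter_index(year: int, quarter: int) -> int:
    """Map a (year, quarter) pair to a running count of quarters."""
    return year * 4 + (quarter - 1)


def quarters_between(start: tuple, end: tuple) -> int:
    """Number of quarters from start to end, both given as (year, quarter)."""
    return quarter_index(*end) - quarter_index(*start)


# Launch windows as stated or rumored in the post above.
vermeer = (2020, 4)
raphael_rumored = (2022, 4)
milan = (2021, 1)
# Assumption: "by mid-year 2022" is read as Q1 or Q2 2022.
genoa_rumored = [(2022, 1), (2022, 2)]

desktop_gap = quarters_between(vermeer, raphael_rumored)
server_gaps = [quarters_between(milan, g) for g in genoa_rumored]

print(desktop_gap)  # 8
print(server_gaps)  # [4, 5]
```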

Kepler_L2 · Nov 28, 2021

eek2121 said:
Disagree. The DDR5 market a year from now (when AM5 launches) will look very different from the DDR5 market today (when ADL-S sales caused DDR5 shortages).

Majority of production will still be DDR4 next year https://www.tomshardware.com/news/ddr5-adoption-huge-by-2023

DrMrLordX · Nov 28, 2021

Joe NYC said:
But there will be a day when the current IFoP will be the bottleneck in the design and AMD will move to newer, more efficient technologies. When it happens, it could happen simultaneously, so that both desktop and server parts change to something new and still share the same CCD.

AMD needs to switch EPYC over to EFB just as badly as they need to switch Ryzen over to EFB. It would be logical to switch both at once.

soresu · Nov 28, 2021

Joe NYC said:
Both Raphael and Genoa will most likely share the same Zen 4 core.

But that does not automatically mean that they will share the CCD, as Milan and Vermeer did.

Given the cost of design at 5nm I think it unlikely that they will do 3 separate designs for the CCD.

As it is they must have determined serious (and clearly lucrative) demand in the cloud market to authorize even Zen4c as a separate design.

Joe NYC · Nov 28, 2021

DrMrLordX said:
AMD needs to switch EPYC over to EFB just as badly as they need to switch Ryzen over to EFB. It would be logical to switch both at once.

But they are not being released at the same time, Genoa is ahead of Raphael and Bergamo.

Which is why I am wondering if there is something in Raphael and Bergamo that could not make it into Genoa, since there was extra time...

Joe NYC · Nov 28, 2021

soresu said:
Given the cost of design at 5nm I think it unlikely that they will do 3 separate designs for the CCD.

As it is they must have determined serious (and clearly lucrative) demand in the cloud market to authorize even Zen4c as a separate design.

The cheapest way to make Raphael is to make it a fraction of Genoa after Genoa is done. As almost an afterthought. Like Ryzen3D after Milan X.

We don't know if AMD has appetite to invest into a desktop gaming CPU design any more than that.

BorisTheBlade82 · Nov 28, 2021

My speculation is the following:

Raphael as well as Genoa will stay on IFoP because of the same-die strategy. IMHO it is technically not possible to have the same die use IFoP and EFB as well.
Bergamo will be the first to use EFB.
Zen5 will also use EFB - for Server and Desktop.
This will allow using Bergamo as small cores together with Zen5 via EFB in Desktop and Mobile for Big.little chiplet solutions.

soresu · Nov 28, 2021

Joe NYC said:
The cheapest way to make Raphael is to make it a fraction of Genoa after Genoa is done. As almost an afterthought. Like Ryzen3D after Milan X.

This has been AMD's strategy in essence since long before Zen.

Server first is their design philosophy, even when server launches later the die itself is designed with server in mind - with the obvious modern exception of the IOD since Rome launched with a different IOD to Matisse.

In fact I'm pretty sure I remember people talking about server first design in reference to K8 during the 2000s.

DisEnchantment · Nov 28, 2021

BorisTheBlade82 said:
Bergamo will be the first to use EFB.

So you suggest new sIOD for Bergamo?
Or do you think IFOP SerDes and another kind of interconnect (e.g. HBI with repeaters) will coexist? I doubt that; they would need to add a lot more pads or do an RDL.
If Genoa does not have a denser interconnect then very likely that Bergamo will not have either.
Genoa will be the more mainstream part addressing much more diverse deployment scenarios than Bergamo.

Genoa, Bergamo and Turin will share the same socket SP5 and it is ~0.96 mm thinner than SP3 and ~0.45 mm thinner for the package substrate.

Turin might have a revised sIOD though because it will need to support CXL 2.0 which is a big deal.

FWIW, the capture from last presentation from AMD

And the diagram from leaked SP5 manual, looks like a match to me

soresu said:
with the obvious modern exception of the IOD since Rome launched with a different IOD to Matisse.

Even cIOD is essentially a cut up sIOD so basically everything is from Server. But perhaps this time around Raphael will have a purpose designed cIOD due to the IGPU

LightningZ71 · Nov 28, 2021

From a financial standpoint, AMD is far more able to fund the development of unique IODs and CCDs for different markets than it was as recently as three years ago. I do not find it hard to believe that AMD would develop a desktop-only IOD, especially if it allowed them more die per wafer. I do not find it hard to believe that AMD would develop two unique Zen4 CCDs that have cores mixed and matched internally for different focuses (all-out throughput vs density and efficiency). I do still think that, for a while yet, everything outside of APUs and a desktop IOD will be designed with the purpose of being in a server, with reuse in other markets being a "planned afterthought".
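
To put a rough number on "more die per wafer", here is a minimal sketch using a common gross-dies-per-wafer approximation. Everything in it is an assumption for illustration: a 300 mm wafer, no scribe lines, no yield loss, and the two die areas (400 mm2 and 260 mm2) are just the rumored IO die sizes mentioned elsewhere in this thread, not confirmed figures.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross die count: wafer area / die area, minus an edge-loss term.

    This is a textbook approximation that ignores scribe lines, edge exclusion
    and defects, so it only gives a ballpark figure.
    """
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


# Rumored IO die areas from the thread, used purely as example inputs.
large_iod = gross_dies_per_wafer(400)
small_iod = gross_dies_per_wafer(260)

print(large_iod, small_iod)
print(f"smaller die yields about {small_iod / large_iod:.2f}x the gross dies")
```

Under these assumptions the smaller die gives well over 1.5x the gross candidates per wafer, which is the kind of saving a desktop-only IOD would be chasing.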

BorisTheBlade82 · Nov 28, 2021

@DisEnchantment
Yes, I indeed believe in a new and different IOD for Bergamo compared to Genoa. The former would employ EFB connections with the main (and maybe sole) purpose of increasing energy efficiency. Both would be socket compatible.
Bergamo will aim for the highest MPP power efficiency possible for hyperscalers to tackle Ampere et al. And for that they will need a much more efficient on-package interconnect.

Like @LightningZ71, I am also under the impression that producing different IODs for different markets is much less taxing than producing different CCDs for Server and Desktop. Even more so now that it is more or less a given that they will employ a small iGPU in the Desktop space.

jamescox · Nov 28, 2021

Joe NYC said:
The monolithic chips and interposers are dependent on the reticle size. So, say Mi100 had an interposer, so it was limited to the reticle size. Mi200 drops the interposer, replaces it with EFB, and is no longer constrained by the reticle size (which the Mi250 exceeded by a good margin).

If it is 400 mm2 for IO die rather than 260 mm2 that Adore TV mentioned, then Genoa probably does not have to make any revolutionary changes, 400 mm2 can probably accommodate incremental IO increases.

The reticle size is about 800+ mm2, and it is unlikely that AMD will get anywhere close to it with any of the chips, including IO die.

As far as future packaging technologies, I don't know if either EMIB or EFB can reduce the die size if the die is pad limited. Maybe, I just don't know if there are any density increases.

EFB could make routing easier, since the bridge is elevated, above the substrate, and other traces could be going through the substrate underneath. But I don't know how big a difference that could make.

If AMD is able to use stacked connections with TSVs and hybrid bonding, those have extremely high density, and there could still be metal layers under the TSVs, unrelated to the TSVs, that can still be connecting to IO. It could result in a 25% to 33% die size reduction and tiny savings on the CCD.

As far as keeping the IFoP just to keep the chips spread out, AMD can just use a longer bridge, leave space between the chips and fill it with mold similar to one in the EFB. In fact, if they use active bridge with L3 die, it could use some length to accommodate the SRAM (if it is, say 80 mm2) and overlap the 2 dies being connected just on the edges, not covering a big area of the chips below. It could solve concern about thermals while stacking...

Anyway, there have been just about no leaks about Raphael. We just have a vague idea that it will come about a year from now. Roughly the same time frame as RDNA3. So, if AMD is going to use some advanced packaging technologies for RDNA3, why not Raphael?

It's not that AMD has very high concern about desktop and time to market in desktop space (judging by Zen3D being MIA in desktop). If it takes an extra quarter and AMD could move past IFoP on Rembrandt, maybe use the same chiplet interconnect on Rembrandt as Bergamo, that could turn out valuable in the long run...

I think several things here are probably not correct, but I don’t have much time to research in detail. The reticle size is probably still a limitation and long silicon bridges are likely not something they would want to do.

Most, if not all, of the advanced stacking technologies are still dependent on reticle size. Some of them apply packaging component to the whole wafer before it is diced. Others use a carrier wafer or a so-called “reconstituted wafer”. They place chiplets into the carrier wafer and then do more photolithography based processes on top that are dependent on reticle size. If you look at the TSMC 3D stacking overview I linked, they list reticle size limitations for all of them, I think.

It gets significantly more complicated to exceed reticle size. 2x may not be as difficult since they may be able to design it in such a way that all masks are the same. Just some connections would not connect to anything depending on how the carrier wafer is finally diced. The 1.5x (3/2) would be more difficult since it may require a base mask with 2/3 of the intended interconnect and then another mask with two mirrored halves to get the extra 1/3 of the final package. They may, in some cases, be able to make something with a repeated pattern such that it is still a single mask. Anything that isn’t a half multiplier likely gets even more difficult to design. I think they listed a 1.2 or 1.3x possibility for one of the stacking types, and that was coming later than the half multiple types.

I suspect that Bergamo may actually be a single reticle sized device. With stacking, if they can put cache into the embedded bridge silicon, then it overlaps (or “underlaps”) the other chips and doesn’t add to the 2D package area at all. In that case, the cpu chiplets may be very small, even with 16 relatively powerful cpu cores since most of the cache would be stacked. The IO die may also be significantly smaller since it replaces 12 serdes-based IFOP with stacked connections. The stacked connections should take very little die area, so the much smaller number that AdoredTV has could be the Bergamo IO die, while the Genoa IO die is still a giant serdes based device close to 400 mm2. There may be some special sauce in Genoa also. They might use some advanced packaging technology to do the really dense routing required with 12x DDR5, 12 serdes-based IFOP, and 8 x16 pci-express / IFIS. Although, I doubt that it has any embedded silicon, unless it is a simple LSI device to get the signals out of the IO die without being pad limited.

I forget what interview or article it was, but I remember someone from AMD talking about the decision to use serdes-based IFOP over embedded silicon. I remember them talking about the difficulties of doing long runs in embedded silicon bridges. They were staying with serdes-based IFOP because of those limitations. You might be able to pull it off with an active interposer, but it would take a lot of silicon, approaching full silicon interposer levels. I have considered the possibility of daisy chaining chiplets with small pieces of embedded silicon, but that seems unlikely. You would have to route signals long distances across multiple chiplets and bridges. That may take a lot of power and add latency.

My current idea of Bergamo is that it may take advantage of the area savings by stacking bridge chiplets with cache to pull off a single reticle sized device. The IO die and chiplets will need to be directly adjacent. They probably will have almost no space between them. There may be an opportunity to take advantage of the stacking technology to possibly move other components into stacked chiplets. I have wondered if it might make sense to put the memory controller in the bridge with the cache, or at least the unified memory controller to remain memory type agnostic. The IO die would likely be oddly shaped; perhaps long and narrow to accommodate 4 chiplets along each side. It may mostly contain physical interfaces with the other components being in the stacked silicon. We have had rectangular cpu die before with some of them being a relatively large aspect ratio rectangle rather than close to square.

jamescox · Nov 28, 2021

eek2121 said:
Disagree. The DDR5 market a year from now (when AM5 launches) will look very different from the DDR5 market today (when ADL-S sales caused DDR5 shortages).

There will be a massive market for DDR5 server memory at some point. I don’t know if those are going to be the same memory die due to possible ECC differences. I know that DDR5 has some extra ECC built into all chips, but I didn’t think the ECC was equivalent to server memory ECC. It has been a while since I watched the video about that (Ian Cutress?). It seems like it would be the same power management chip for both and that is what is currently in short supply and might continue to be in short supply for many months.

Makaveli · Nov 28, 2021

jamescox said:
There will be a massive market for DDR5 server memory at some point. I don’t know if those are going to be the same memory die due to possible ECC differences. I know that DDR5 has some extra ECC built into all chips, but I didn’t think the ECC was equivalent to server memory ECC. It has been a while since I watched the video about that (Ian Cutress?). It seems like it would be the same power management chip for both and that is what is currently in short supply and might continue to be in short supply for many months.

You are correct: the on-die ECC of DDR5 is not the same as full system ECC on a server. The former doesn't protect data in transit.

Doug S · Nov 29, 2021

Makaveli said:
You are correct: the on-die ECC of DDR5 is not the same as full system ECC on a server. The former doesn't protect data in transit.

Not only that, the on chip ECC in DDR5 is single error correct only. It can't even detect double bit errors. Even if it offered data protection in transit that wouldn't be sufficient for servers.
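
The difference between "single error correct" and "can detect double bit errors" can be shown with a toy example. The sketch below uses a textbook Hamming(7,4) code, which is not the code used on DDR5 dies (an assumption made purely to keep the example small); the function names are made up for illustration. A correct-only decoder fixes any single flipped bit, but when two bits flip it "corrects" a third bit and silently returns wrong data.

```python
from itertools import combinations


def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (bit positions 1..7)."""
    code = [0] * 8  # index 0 unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):  # parity bits sit at the power-of-two positions
        for i in range(1, 8):
            if i != p and i & p:
                code[p] ^= code[i]
    return code[1:]


def syndrome(word):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s


def decode(word):
    """Single-error-correcting decode: flip the bit the syndrome points at."""
    word = list(word)
    s = syndrome(word)
    if s:
        word[s - 1] ^= 1
    return [word[2], word[4], word[5], word[6]], s


data = [1, 0, 1, 1]
codeword = encode(data)

# One flipped bit: the decoder recovers the original data.
single = list(codeword)
single[4] ^= 1
single_result, _ = decode(single)

# Two flipped bits: the decoder sees a nonzero syndrome, flips a third bit,
# and hands back wrong data without any indication of failure.
double = list(codeword)
double[1] ^= 1
double[5] ^= 1
double_result, double_syndrome = decode(double)

print(single_result == data)  # True
print(double_result == data)  # False

# The same holds for every possible pair of flipped bits in this toy code.
miscorrected = 0
for a, b in combinations(range(7), 2):
    corrupted = list(codeword)
    corrupted[a] ^= 1
    corrupted[b] ^= 1
    if decode(corrupted)[0] != data:
        miscorrected += 1
print(miscorrected)  # 21
```

Server-style SECDED codes add one more parity bit so that a double error is at least flagged as uncorrectable instead of being miscorrected.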

It is really odd that DDR5 got a weak form of built in ECC that can only correct single bit errors but without any link protection, and LPDDR5 does not appear (from all available info I've seen) to include that built in ECC but it does include link ECC (though maybe that's optional) which DDR5 does not.

Given that DDR5's half-width 32-bit channels each need their own ECC bits, the number of bits you'd need per DIMM for full ECC goes from 72 to 80, which doubles the ECC bits from 8 to 16. So ECC is going to at least double the cost penalty for ECC DIMMs. Probably more than that, for the same reasons the cost penalty of DDR4 ECC DIMMs is greater than the 12.5% bit penalty. Some of the lower-end, more cost-sensitive users of ECC, e.g. in embedded markets like POS terminals, will probably abandon it and decide the built-in ECC in standard DDR5 is "good enough".
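
The 72 vs 80 bit figures work out as follows. This is only a sketch of the bit arithmetic, assuming 8 ECC bits per channel on both generations (one 64-bit channel per DDR4 DIMM, two 32-bit channels per DDR5 DIMM); the function name is made up for illustration.

```python
def dimm_ecc_bits(data_bits_per_channel: int, ecc_bits_per_channel: int, channels: int):
    """Return (total width, ECC bits, ECC overhead relative to data bits)."""
    data_bits = data_bits_per_channel * channels
    ecc_bits = ecc_bits_per_channel * channels
    return data_bits + ecc_bits, ecc_bits, ecc_bits / data_bits


# DDR4 ECC DIMM: one 64-bit channel plus 8 ECC bits.
ddr4_width, ddr4_ecc, ddr4_overhead = dimm_ecc_bits(64, 8, 1)
# DDR5 ECC DIMM: two 32-bit channels, each with its own 8 ECC bits.
ddr5_width, ddr5_ecc, ddr5_overhead = dimm_ecc_bits(32, 8, 2)

print(ddr4_width, ddr4_ecc, ddr4_overhead)  # 72 8 0.125
print(ddr5_width, ddr5_ecc, ddr5_overhead)  # 80 16 0.25
```

So the bit penalty goes from 12.5% to 25% for the same 64 data bits, before any of the other cost factors mentioned above.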

LightningZ71 · Nov 29, 2021

But the ECC in "standard" DDR5 is only there to allow it to meet the standards of generic DDR reliability. It's there to keep the internal data pathways in the DIMM and individual modules consistent as they expect the higher and higher operating frequencies to introduce too many random bit errors to make normal operations reliable enough for general usage. Effectively, all it does is keep the DIMM up to the standards expected of generic DDR4 with higher frequencies and higher data densities. It's best not to even consider its existence.

Doug S · Nov 29, 2021

LightningZ71 said:
But the ECC in "standard" DDR5 is only there to allow it to meet the standards of generic DDR reliability. It's there to keep the internal data pathways in the DIMM and individual modules consistent as they expect the higher and higher operating frequencies to introduce too many random bit errors to make normal operations reliable enough for general usage. Effectively, all it does is keep the DIMM up to the standards expected of generic DDR4 with higher frequencies and higher data densities. It's best not to even consider its existence.

I don't think it is the higher operating frequencies; it is the smaller capacitors. It isn't needed for DDR5 today, but DDR5's roadmap extends all the way to 64 Gb chips, 4x more dense than today's 16 Gb DDR5 chips.

Seems like it would make sense to pursue multilayer designs like NAND did when the cells got too small, which allowed them to use much bigger cells and avoid the issues. I don't know enough about how DRAM is produced to know how feasible that is, obviously if it was easy they would already be doing it...

uzzi38 · Nov 29, 2021

jamescox said:
There will be a massive market for DDR5 server memory at some point. I don’t know if those are going to be the same memory die due to possible ECC differences. I know that DDR5 has some extra ECC built into all chips, but I didn’t think the ECC was equivalent to server memory ECC. It has been a while since I watched the video about that (Ian Cutress?). It seems like it would be the same power management chip for both and that is what is currently in short supply and might continue to be in short supply for many months.

Current projections still suggest DDR5 shipping volume will only overtake that of DDR4 in late 2023. We're a while away from DDR5 taking over I'm afraid.

Ajay · Nov 29, 2021

uzzi38 said:
Current projections still suggest DDR5 shipping volume will only overtake that of DDR4 in late 2023. We're a while away from DDR5 taking over I'm afraid.

So, a more or less typical change over period for new DRAM technologies.
