Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).
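A quick back-of-envelope on what that complex enlargement means for cache reach per core (the 64 MB option is just my own speculation, as noted above; only the "32+ MB" figure comes from the slides):

```python
# Rough L3 "reach" per core for the old vs. new compute complex.
# The 64 MB option is speculation, not something the leak confirms.
configs = {
    "Zen 2 CCX: 4 cores, 16 MB shared":              (4, 16),
    "Zen 3 CCX: 8 cores, 32 MB shared (leaked)":     (8, 32),
    "Zen 3 CCX: 8 cores, 64 MB shared (if doubled)": (8, 64),
}

for name, (cores, l3_mb) in configs.items():
    # Every core can hit the whole shared pool, so the interesting numbers
    # are the total reachable L3 and the average share per core.
    print(f"{name}: {l3_mb} MB reachable, {l3_mb / cores:.0f} MB per core on average")
```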

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) and new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
A 2-core Zen4-lite APU on 12nm would be quite enough horsepower for value segments. If they stick with SMT2, that's four big threads. If they go beyond that, we can hope for an arrangement like four big threads plus four background threads, each able to execute without speculation, or even in-order.

Noting Nosta's quote above and yours, you'd have to compare the cost of producing a 4C + iGPU die at TSMC on N7 vs. producing the same die at GF on 12LP+ (or, if it actually produces something, 12FDX). While 12LP+ may indeed be more expensive than 12LP, it may be that the deal worked out with AMD (should this thing ever see the light of day) was good enough to bring both a profit AND cover the opportunity cost of making it at GF instead of TSMC.

I suspect that a 4C+RDNA2 small APU on some 12nm node is going to be less expensive to produce and package at GF than at TSMC on N7/N6. Even if it's less than a dollar a die, that's still a savings, and extra volume, which AMD is likely desperate for.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Maybe not just cost, but availability. Depending on how long the fab shortages continue, they would have had to foresee the production problems over a year ago for it to make a big difference. What we are seeing, weirdly, is that erring on the side of an abundance of designs also lowers risk, by providing more security through alternative supply.

The traditional common-sense view is that you don't backport new designs to older nodes. That is turning out to be incorrect.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Something I came across:

X3D is how AMD often describes 3D stacked cache, which in this arrangement also acts as a bridge between GCDs. MCD has been the term used for this die.

It is just a patent, not a road map, but some of the speculation around RDNA3 is based on this sort of arrangement.

[Attachment: figure from the patent]

I don’t have time to study the patent, but if they can use SoIC stacking (no micro-solder bumps; direct copper to copper bond), then there would be almost zero penalty for off die cache. If they make such infinity cache devices for GPUs, then I would think they would reuse them in Epyc CPUs also.

We still don't know what the cache hierarchy in Bergamo will look like. The TSV area may be relatively small, so perhaps they do not have to worry about the cache stacking on top of the cores. If they can use SoIC, then the CCD may not have any on-die L3 at all, just the L3 cache chip. It seems to be one large die in this case, so for CPU usage it seems like they might need two of them, one on each side, unless they handle IO differently. That could mean 1 GB of L3 cache, more than current Milan-X at 768 MB. Genoa-X, if it exists, could be up to 1152 MB though, with single stacks.
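For reference, the capacities quoted above work out like this (a rough sketch; the 32 MB base + 64 MB stacked split per CCD is the Milan-X arrangement, while the 12-CCD Genoa-X configuration is speculation):

```python
# Back-of-envelope check of the L3 totals mentioned above.
BASE_L3_MB = 32   # on-die L3 per Zen 3 CCD
VCACHE_MB  = 64   # one stacked cache die per CCD (single stack)

def total_l3_mb(ccds: int) -> int:
    """Total package L3 with one cache stack on every CCD."""
    return ccds * (BASE_L3_MB + VCACHE_MB)

print("Milan-X,  8 CCDs:", total_l3_mb(8),  "MB")   # 768 MB
print("Genoa-X, 12 CCDs:", total_l3_mb(12), "MB")   # 1152 MB, if it exists
```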
 
  • Like
Reactions: Tlh97 and Joe NYC

yuri69

Senior member
Jul 16, 2013
373
573
136
AMD summed their reasoning behind using a traditional package substrate solution for Zen 2/3 in their "Pioneering Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families" paper.

The main points are:
* even the top server configs (8-CCD SKUs) do not require extensive bandwidth (compared to GPUs/accelerators) and therefore extensive signal routing
* bridges and interposers feature a limited signal reach which would not permit an effective 8 CCD + IOD layout
* the required area of the interposer would hit the reticle limit, which would require working around
* interposers ain't cheap

Genoa upped the max number of CCDs and added DDR5 bandwidth requirements, but is that enough for a new, expensive tech?
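As a rough back-of-envelope on how much the bandwidth requirement actually grows (assuming the rumoured 12-channel DDR5-4800 for Genoa against Milan's 8-channel DDR4-3200; the Genoa figures are not confirmed):

```python
# Peak theoretical DRAM bandwidth: channels * MT/s * 8 bytes per transfer.
def peak_gb_s(channels: int, mt_s: int) -> float:
    return channels * mt_s * 8 / 1000  # GB/s

milan = peak_gb_s(8, 3200)    # 204.8 GB/s
genoa = peak_gb_s(12, 4800)   # 460.8 GB/s, assuming rumoured 12ch DDR5-4800
print(f"Milan {milan:.1f} GB/s -> Genoa {genoa:.1f} GB/s ({genoa / milan:.2f}x)")
```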
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
Sorry, I am a bit behind since I am recovering from surgery, but I wanted to reply to these gems.

Zen4 mobile 65W category is MCM, the rest is monolithic. Zen5 will be MCM across the stack.
65W != mobile. AMD will likely be at 45-54W cTDP, with a possible 35W (more on that in a sec…)
Wondering, though, to what extent. E.g. even with earlier Zen parts we see the desktop parts in some "desktop replacement" laptops. The question is whether the Raphael mobile chips would fit in there or more in the 35/45 W TDP range. I'm particularly worried about idle power usage.
Why does this keep getting brought up? AMD already knows how to lower power usage:
  1. Die shrink
  2. Dynamic IF clock
  3. Dynamic memory speeds
Once they implement those things, idle power will be near the same levels as monolithic.
Yeah, sadly and inevitably, Ryzen desktop was mainly a way to build the Zen ecosystem and reputation. Server is where the big $$s are; all the more so since AMD is capacity constrained.

Why does this keep getting brought up? Not only has AMD refuted it, but DIY is their bread and butter. Server and mobile could not survive without DIY. AMD needs the entire stack, for more reasons than 1.
 
  • Like
Reactions: lightmanek

andermans

Member
Sep 11, 2020
151
153
76
Why does this keep getting brought up? AMD already knows how to lower power usage:
  1. Die shrink
  2. Dynamic IF clock
  3. Dynamic memory speeds
Once they implement those things, idle power will be near the same levels as monolithic.

Mostly because I don't share the same confidence that (a) AMD will do those things for Raphael-H (I mean, 2 out of 3 would have been possibilities in the Renoir/Cezanne era) and (b) that these things will bring it down to near monolithic levels. We're talking going from ~15W to ~3W avg. on light workloads (e.g. light web browsing), and I don't even see dynamic memory speeds closing the gap, considering that a monolithic die can benefit from them as well.
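To illustrate the argument with a toy model (every number below is a placeholder I made up purely for the sake of argument, not a measurement): a lever that helps both designs scales both down, but the MCM-only overhead stays.

```python
# Toy model: why a power lever that benefits both designs doesn't close the gap.
# All figures are made-up placeholders for illustration only.
def idle_watts(mcm: bool, dynamic_clocks: bool) -> float:
    soc_base = 2.0                                 # shared baseline, both designs
    mem_fabric = 1.0 if dynamic_clocks else 3.0    # memory/fabric clocks scaled down
    mcm_overhead = (6.0 + 4.0) if mcm else 0.0     # IOD static + IFOP SerDes (MCM only)
    return soc_base + mem_fabric + mcm_overhead

for dyn in (False, True):
    mono, mcm = idle_watts(False, dyn), idle_watts(True, dyn)
    print(f"dynamic clocks={dyn}: monolithic {mono:.0f} W, MCM {mcm:.0f} W, gap {mcm - mono:.0f} W")
```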
 

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
@eek2121
Sorry, but most of your post is bollocks. Servers and also mobile are much more important to AMD than DIY. Everything else is just wishful thinking on your side.
Furthermore, it is a FACT that the IFOP interconnect needs much more energy than monolithic solutions. This is why everyone and their dog is talking about silicon interconnects.
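To put rough numbers on it (the pJ/bit figures below are ballpark assumptions in the range commonly quoted for these link types, not datasheet values):

```python
# Rough off-die transfer power at an assumed sustained bandwidth,
# using assumed energy-per-bit figures for each link type.
PJ_PER_BIT = {
    "IFOP over organic substrate": 2.0,    # rough published ballpark
    "silicon bridge (EMIB/EFB)":   0.5,    # assumed
    "SoIC-style 3D stacking":      0.05,   # assumed
}

GB_PER_S = 50                       # arbitrary example of sustained traffic
bits_per_s = GB_PER_S * 1e9 * 8

for link, pj in PJ_PER_BIT.items():
    print(f"{link}: ~{bits_per_s * pj * 1e-12:.2f} W at {GB_PER_S} GB/s")
```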
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I expect CCIX and Gen-Z to be supported, but CXL supersedes both, so using either for new projects is not recommended.
Update on this very topic: CXL is about to absorb everything Gen-Z:

"Looking to the future, the CXL Consortium and Gen-Z Consortium have identified synergies between the two consortia that resulted in the signing of a Letter of Intent which, if passed and agreed upon by all parties, would transfer the Gen-Z Specifications and all Gen-Z assets to the CXL Consortium."
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Why does this keep getting brought up? Not only has AMD refuted it, but DIY is their bread and butter. Server and mobile could not survive without DIY. AMD needs the entire stack, for more reasons than 1.

One look at their upcoming product stack should tell you everything you need to know. Genoa is already shipping in limited quantities through ODM channels, Milan-X is already in the hands of hyperscalers (and has been for a little while), but where is Raphael? Or Zen3D?


Allegedly we get Zen3D in January. But as of yet that hasn't been officially announced, and it might just be a Vermeer refresh. We don't know, and AMD isn't saying anything.
 

Joe NYC

Golden Member
Jun 26, 2021
1,898
2,192
106
I don’t have time to study the patent, but if they can use SoIC stacking (no micro-solder bumps; direct copper to copper bond), then there would be almost zero penalty for off die cache. If they make such infinity cache devices for GPUs, then I would think they would reuse them in Epyc CPUs also.

Hopefully we will get a better sense from leaks, over time, of whether the SoIC stacked bridges make it to RDNA3, and in what form. It seems that the N5 RDNA3 parts (Navi31 and Navi32) would be the leading vehicles for this, and that can give us a hint of what Bergamo will use.

The goal is for the whole Zen X CPU to act as one massive monolithic die.
If IFOP brought us 33% of the way there and EFB advanced it to 66%, SoIC can get us to nearly 100%.

We still don't know what the cache hierarchy in Bergamo will look like. The TSV area may be relatively small, so perhaps they do not have to worry about the cache stacking on top of the cores. If they can use SoIC, then the CCD may not have any on-die L3 at all, just the L3 cache chip. It seems to be one large die in this case, so for CPU usage it seems like they might need two of them, one on each side, unless they handle IO differently. That could mean 1 GB of L3 cache, more than current Milan-X at 768 MB. Genoa-X, if it exists, could be up to 1152 MB though, with single stacks.

If there were to be an SoIC stacked bridge, they could just as well use some of that silicon for L3 and take the L3 completely out of the CCD. That way the N5 CCD would be utilized most efficiently for logic, with the cheaper N6 process used for cache.

As opposed to adding another, separate level in the cache hierarchy, AMD could use the fast, high-bandwidth, low-latency connection to facilitate sharing of L3, something along the lines of what IBM demonstrated, to get the maximum usage out of the silicon invested in L3.

Did IBM Just Preview The Future of Caches? (anandtech.com)

For example if the L3 that's adjacent to one CCD is mostly unused, another CCD could use this L3 as a victim cache, and still be able to retrieve the content extremely quickly.
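A toy sketch of that spill policy (loosely inspired by the IBM approach linked above; this is purely illustrative, not anything AMD has described):

```python
# Toy "spill to a neighbouring, underused L3 slice" policy.
from collections import OrderedDict

class L3Slice:
    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()            # address -> data, kept in LRU order

    def has_room(self) -> bool:
        return len(self.lines) < self.capacity

    def insert(self, addr, data):
        """Insert a line; return the evicted (addr, data) pair if any."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            return self.lines.popitem(last=False)   # evict the LRU line
        return None

def fill_with_spill(local: L3Slice, neighbours: list, addr, data) -> str:
    """Fill a line locally; if that evicts a victim, keep it on-package by
    spilling into a neighbouring slice with spare capacity instead of DRAM."""
    victim = local.insert(addr, data)
    if victim is None:
        return "kept locally"
    v_addr, v_data = victim
    for remote in neighbours:
        if remote.has_room():                 # remote slice is mostly unused
            remote.insert(v_addr, v_data)     # it acts as a victim cache
            return "spilled to neighbour"
    return "written back to DRAM"
```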
 

Joe NYC

Golden Member
Jun 26, 2021
1,898
2,192
106
AMD summed their reasoning behind using a traditional package substrate solution for Zen 2/3 in their "Pioneering Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families" paper.

The main points are:
* even the top server configs (8-CCD SKUs) do not require extensive bandwidth (compared to GPUs/accelerators) and therefore extensive signal routing

Adding massive L3 and sharing that L3 among the CCDs will become a priority, and the content from that shared L3 could use massive bandwidth.

* bridges and interposers feature a limited signal reach which would not permit an effective 8 CCD + IOD layout
* the required area of the interposer would hit the reticle limit, which would require working around
* interposers ain't cheap

An interposer spanning the entire chip has probably been obsoleted by advances in bridging technologies. Sapphire Rapids opted for EMIB, MI200 opted for EFB bridges. The giant interposer may be as good as dead.

At the time of Zen 2 release, various bridging technologies being discussed were not ready, or had problems.

Bergamo will be released nearly 4 years after Zen 2, so what was true in 2019 will not necessarily still be true in 2023.
 
  • Like
Reactions: Tlh97

Jwilliams01207

Junior Member
Dec 6, 2013
24
2
71
Thermal density is already a problem now, and AMD is already tackling it in several ways. Will it increasingly be a problem with denser nodes? Yes. Is it an unsolvable problem? As Zen 2 has shown, no.
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
AMD summed their reasoning behind using a traditional package substrate solution for Zen 2/3 in their "Pioneering Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families" paper.

The main points are:
* even the top server configs (8-CCD SKUs) do not require extensive bandwidth (compared to GPUs/accelerators) and therefore extensive signal routing
* bridges and interposers feature a limited signal reach which would not permit an effective 8 CCD + IOD layout
* the required area of the interposer would hit the reticle limit, which would require working around
* interposers ain't cheap

Genoa upped the max number of CCDs and added DDR5 bandwidth requirements, but is that enough for a new, expensive tech?
Well, the biggest selling point of FOEB/EFB is that it's cheaper than EMIB. In all of the press releases they never mention any sort of pJ/bit figures (even when referencing EMIB shortly beforehand), only the fact that it's cheaper (and higher yielding) than EMIB.

That's a pretty important point, given that EMIB's biggest selling point over other 2.5D solutions is that it was also cheaper than those before it that utilised a full-sized interposer.

Besides, Intel will also start using EMIB in SPR and MTL, so it's not like Intel has a cost advantage. Rather, given that Genoa has a significantly higher performance ceiling than SPR, AMD likely has more wiggle-room for fancier techs like this than Intel does.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Adding massive L3 and sharing that L3 among the CCDs will become a priority, and the content from that shared L3 could use massive bandwidth.



An interposer spanning the entire chip has probably been obsoleted by advances in bridging technologies. Sapphire Rapids opted for EMIB, MI200 opted for EFB bridges. The giant interposer may be as good as dead.

At the time of Zen 2 release, various bridging technologies being discussed were not ready, or had problems.

Bergamo will be released nearly 4 years after Zen 2, so what was true in 2019 will not necessarily still be true in 2023.
TSMC already has the ability to do stacked packaging at least 2x reticle size, possibly larger, for some packaging types. I don’t know what the penalty is for exceeding the reticle size. There is probably some added cost for going larger.

I don't know if Bergamo can be done in a single reticle-sized package. Current Epyc is around 1000 mm² total with the IO die plus 8 CPU dies, so it seems like it should be at least 1.25x. The IO die might be smaller since it doesn't need 8 or 12 SerDes-based IFOP links. The stacked connections would use very little die area compared to PCIe 5 or 6 speed physical links. They could also take more advantage of the stacking in some manner to get it down to one reticle. I have wondered if they would separate the unified memory controllers from the physical interfaces somehow. Stacking may allow that, so it may not be a single-chip IO die. It would be quite a high thermal density with 128 cores packed into a one-reticle-sized area.
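Rough numbers behind that guess, using approximate Milan-era die sizes (the exact figures are assumptions):

```python
# Sanity check on the "at least 1.25x reticle" estimate with rough die sizes.
CCD_MM2     = 81    # Zen 3 CCD, approximately
IOD_MM2     = 416   # Rome/Milan server IO die, approximately
RETICLE_MM2 = 858   # ~26 mm x 33 mm lithography field

total = 8 * CCD_MM2 + IOD_MM2
print(f"~{total} mm^2 of silicon = {total / RETICLE_MM2:.2f}x the reticle area")
# -> roughly 1.24x, so ">= 1.25x" packaging looks like the right ballpark
```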
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
One look at their upcoming product stack should tell you everything you need to know. Genoa is already shipping in limited quantities through ODM channels, Milan-X is already in the hands of hyperscalers (and has been for a little while), but where is Raphael? Or Zen3D?


Allegedly we get Zen3D in January. But as of yet that hasn't been officially announced, and it might just be a Vermeer refresh. We don't know, and AMD isn't saying anything.
Yep, Ryzen showed the world that AMD was back, but the total market value of server CPUs these days is insane. AMD are just going where the money is. So: talky, talky about the server stuff, shut up about DIY. Or so it seems.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,075
136
I think you guys are overvaluing the margin on server products (when I look at the discounts we get from HPE/Dell etc. on the CPUs) and undervaluing DIY, especially on the higher-end parts. I'm sure it will come down to competitive position vs. revenue vs. margin, especially given AMD will be wafer limited.

Grow market share with 5nm, hold market share in existing areas with 7nm. That's an interesting consideration with V-Cache as well.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
I think you guys are overvaluing the margin on server products (when I look at the discounts we get from HPE/Dell etc. on the CPUs) and undervaluing DIY, especially on the higher-end parts. I'm sure it will come down to competitive position vs. revenue vs. margin, especially given AMD will be wafer limited.

Grow market share with 5nm, hold market share in existing areas with 7nm. That's an interesting consideration with V-Cache as well.

There's no denying that high end desktop, even the non-Threadripper parts, can make a lot of money. The problem is that server can make even more and when wafers are limited you need to prioritize. Notice we still don't have a bottom half of the product stack for Zen 3 and it's possible we never will.
 
  • Like
Reactions: Tlh97

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,075
136
There's no denying that high end desktop, even the non-Threadripper parts, can make a lot of money. The problem is that server can make even more and when wafers are limited you need to prioritize. Notice we still don't have a bottom half of the product stack for Zen 3 and it's possible we never will.
You're ignoring the part where I told you I know what my company's buy price is for Zen 2/3 CPUs (32, 48, 64 core configs) from the likes of Dell/HPE; that's why I said I think some of you are overestimating how much margin AMD makes from server CPUs relative to desktop CPUs.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
You're ignoring the part where I told you I know what my company's buy price is for Zen 2/3 CPUs (32, 48, 64 core configs) from the likes of Dell/HPE; that's why I said I think some of you are overestimating how much margin AMD makes from server CPUs relative to desktop CPUs.
Milan seems to be hard to get right now. AMD will try to fill demand for server parts before releasing Threadripper based on Zen 3. Regardless of how much money they make in the server market, it is really bad when companies come up with a specification using AMD parts and then cannot get supply of those parts to build the machines. Margins on Epyc should still be quite high and volume should be significantly higher than Threadripper.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
You're ignoring the part where I told you I know what my company's buy price is for Zen 2/3 CPUs (32, 48, 64 core configs) from the likes of Dell/HPE; that's why I said I think some of you are overestimating how much margin AMD makes from server CPUs relative to desktop CPUs.

Unless it's ridiculously inexpensive, the cost for AMD to make an Epyc or Threadripper part isn't that much greater than one of their desktop CPUs. They also use some of the chiplets (i.e., any that only have four working cores) that aren't usable in any desktop parts because they haven't released the bottom of their product stack there.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,075
136
Unless it's ridiculously inexpensive, the cost for AMD to make an Epyc or Threadripper part isn't that much greater than one of their desktop CPUs. They also use some of the chiplets (i.e., any that only have four working cores) that aren't usable in any desktop parts because they haven't released the bottom of their product stack there.
Every 32/48/64-core Epyc we buy has the full L3 cache, i.e. 8 CCDs, and the uncore is also way bigger.

A good 4x in manufacturing cost compared to a 2-CCD Ryzen part.
 

tomatosummit

Member
Mar 21, 2019
184
177
116
Every 32/48/64-core Epyc we buy has the full L3 cache, i.e. 8 CCDs, and the uncore is also way bigger.

A good 4x in manufacturing cost compared to a 2-CCD Ryzen part.
You're not wrong, but I think people are just underestimating the margins on desktop CPUs, especially with Zen 3 parts.
Compare the Zen 2 desktop CPUs, where the 3300X was $130 and the 3600 was easily available for ~$160, to the cheapest 5600X, which is still $300. Almost the same BOM, yet over twice the price of the cheapest parts. We know from AMD's own reveals and slides that dual-CCD CPUs are only 60% more expensive to manufacture but are again selling for more than double the price of the cheapest parts.
The 5000X Ryzens have extreme margins no matter how you look at it.
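As a rough illustration of how those margins compare (the $50 manufacturing cost for a single-CCD part is a number I made up; the prices and the 60% figure are the ones quoted above):

```python
# Rough gross-margin comparison using the prices quoted above.
# The $50 single-CCD manufacturing cost is an assumption for illustration.
single_cost = 50.0
dual_cost   = single_cost * 1.6     # dual-CCD "only 60% more expensive to make"

def gross_margin(price: float, cost: float) -> float:
    return (price - cost) / price

print(f"3600 at $160:         ~{gross_margin(160, single_cost):.0%}")
print(f"5600X at $300:        ~{gross_margin(300, single_cost):.0%}")
print(f"dual-CCD at 2x $300:  ~{gross_margin(600, dual_cost):.0%}")
```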

On the other hand, a distributor (Dell/HP) of server CPUs is going to be purchasing from AMD in much higher volumes, and consistently as well. AMD can afford to reduce margins there, and aren't Dell's asking prices above the RRP anyway, so their discounts aren't as kind as they seem?
This is again coupled with the large increases in asking price for Epyc CPUs as each Zen generation launched. Milan-X is going to be eye-watering.

I'm aware I've left out various things like distribution and support that will chew through margins.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Besides, Intel will also start using EMIB in SPR and MTL, so it's not like Intel has a cost advantage. Rather, given that Genoa has a significantly higher performance ceiling than SPR, AMD likely has more wiggle-room for fancier techs like this than Intel does.
SPR w/ HBM will have 14 EMIB connections. On first chiplet attempt. Kudos to Intel.
Remains to be seen what AMD has in store in this regard.

In other news ... Zen4 DIY postponed. Hopefully not :|

Trento + Genoa