Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Heat dissipation method will be interesting, though.
I wonder if solid thermal vias are a possible partial solution to it.

I mean, they are already using TSVs for power and data transfer through stacks, so it stands to reason that thermal vias are viable - perhaps even using the power/data vias if there are enough of them.

Add that to greater perf/watt from the N5/N5P node change, plus the Zen3 and Zen4 uArch perf/watt improvements, plus a conservative clock frequency, and I imagine that 2-high stacks are definitely viable.

Likely there will also be some sort of thermal management structure embedded into the interposer too, to conduct as much heat as possible in the reverse direction, towards the reverse side of the socket and its backplate, which would likely be more than just a solid chunk of metal for HS/F-related structural support.
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
Thinking about it, it would likely be better for AMD to increase the CCD die size if needed to get more cores in while still upping the # of xtors per core for higher throughput per clock. This assumes that they can get enough wafers from TSMC to match their sales expectations (which clearly are going in the right direction).
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
I wonder if solid thermal vias are a possible partial solution to it.

I mean, they are already using TSVs for power and data transfer through stacks, so it stands to reason that thermal vias are viable - perhaps even using the power/data vias if there are enough of them.

Add that to greater perf/watt from the N5/N5P node change, plus the Zen3 and Zen4 uArch perf/watt improvements, plus a conservative clock frequency, and I imagine that 2-high stacks are definitely viable.

Likely there will also be some sort of thermal management structure embedded into the interposer too, to conduct as much heat as possible in the reverse direction, towards the reverse side of the socket and its backplate, which would likely be more than just a solid chunk of metal for HS/F-related structural support.
No idea. I only remember 2 things from their press event in October. They could have increased core counts, but decided to save it for Zen 4, and they could have extracted even more performance but decided to space it out. I wouldn't be surprised if Zen 4 increases core counts and raises IPC by an average of 30% over Zen 3. Current prices should hold for Zen 4, but maybe we will see another notable increase for Zen 5 in 2022?
 

Cardyak

Member
Sep 12, 2018
72
159
106
30% IPC increase seems way too generous. I don’t think we’ve ever seen an increase that large outside of a full redesign from an old failing architecture (akin to NetBurst -> Core).

Given how large the leap was in Zen 3, I’m going to hedge my bets on the conservative side of things and stake a claim that Zen 4 will have an IPC increase of between 12% and 15%.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
30% IPC increase seems way too generous. I don’t think we’ve ever seen an increase that large outside of a full redesign from an old failing architecture (akin to NetBurst -> Core).

Given how large the leap was in Zen 3, I’m going to hedge my bets on the conservative side of things and stake a claim that Zen 4 will have an IPC increase of between 12% and 15%.

DDR5 and an interposer-based package with a new IOD plus on-package memory/L4 might yield 15% or more performance improvement by itself, without even touching the fundamental architecture.

Zen3 is a lot more impressive than the 19% IPC uplift would make it seem, because the IOD which ended up totally unchanged was already something of a bottleneck for Zen 2.

It's doubtful that Zen 4's microarchitectural uplift will be anywhere near as impressive as Zen 3's... but add on package, process, and platform-level improvements and I think we're looking at an absolute monster.
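
A rough way to see how these would stack, assuming (and it is only an assumption) that the platform-level gain and the core IPC gain are independent and simply multiply. The `combined_uplift` helper is made up for illustration; the 15% platform figure and the 12-15% IPC range are the guesses quoted in this thread, not measurements.

```python
def combined_uplift(*gains):
    """Compound fractional gains multiplicatively (assumes they are independent)."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Platform-level guess from this post; an assumption, not a measured value.
platform_gain = 0.15

# Low and high end of the IPC range quoted at the top of this post.
for ipc_gain in (0.12, 0.15):
    total = combined_uplift(ipc_gain, platform_gain)
    print(f"IPC +{ipc_gain:.0%} with platform +{platform_gain:.0%}: about +{total:.0%} overall")
```

Under those assumptions the total lands at roughly 29% to 32%. Clock changes are not included, so this says nothing about final single-thread performance.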
 
  • Like
Reactions: Tlh97

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
DDR5 and an interposer-based package with a new IOD plus on-package memory/L4 might yield 15% or more performance improvement by itself, without even touching the fundamental architecture.

Zen3 is a lot more impressive than the 19% IPC uplift would make it seem, because the IOD which ended up totally unchanged was already something of a bottleneck for Zen 2.

It's doubtful that Zen 4's microarchitectural uplift will be anywhere near as impressive as Zen 3's... but add on package, process, and platform-level improvements and I think we're looking at an absolute monster.

How much of an uplift did we see in systems when the transition from DDR3 to DDR4 happened?
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
The uplift depended largely on how memory-bandwidth-constrained the benchmark was, and on whether the DDR3 system was using highly overclocked RAM. Remember that, at the bitter end of the DDR3 era, the high-performance DIMMs were pushing transfer rates equivalent to the entry-level DDR4 DIMMs at that time.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
As for the uplift expected for Zen4, if Nosta is to be believed at all, there were two designs in the pipe for Zen3, one for N7, and one for N5. The N5 was supposed to be more ambitious. N5 brings a significant increase in density, allowing far more transistors to be thrown at performance for the same die size. While x86 isn't particularly friendly for going wider, there are other things that can be touched.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
As for the uplift expected for Zen4, if Nosta is to be believed at all, there were two designs in the pipe for Zen3, one for N7, and one for N5. The N5 was supposed to be more ambitious. N5 brings a significant increase in density, allowing far more transistors to be thrown at performance for the same die size. While x86 isn't particularly friendly for going wider, there are other things that can be touched.

Reminds me of the famous quote from Tropic Thunder.

Just because Intel has been stuck on 14nm forever, I really, really doubt that AMD are going to produce a 3rd version of Zen, on 7nm.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
The uplift depended largely on how memory-bandwidth-constrained the benchmark was, and on whether the DDR3 system was using highly overclocked RAM. Remember that, at the bitter end of the DDR3 era, the high-performance DIMMs were pushing transfer rates equivalent to the entry-level DDR4 DIMMs at that time.

And for a lot of tasks with Ryzen, it seems like having a high Infinity Fabric clock (at a 1:1 ratio) is just as important as actual memory bandwidth. Zen 4 should support a much more aggressive Infinity Fabric to keep up with DDR5, and outside of the high-core-count parts, that might actually be the more important consideration.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
Reminds me of the famous quote from Tropic Thunder.

Just because Intel has been stuck on 14nm forever, I really, really doubt that AMD are going to produce a 3rd version of Zen, on 7nm.
Oh heavens no! No way do they do a third design targeted at N7. I think that a Zen3+/Zen4 on N5 is up next. The N5 design may be just Zen3 expanded instead of being another major retooling of the core.
 
  • Like
Reactions: Tlh97 and CHADBOGA

Kedas

Senior member
Dec 6, 2018
355
339
136
If DDR5 doesn't pick up very fast do you think AMD will replace the Zen 3 dies with Zen4 dies to fill the gap?
The AM5 Zen4 CPUs would still have some big advantages. Since Zen4 is 5nm I don't think die size will change much; it may even be smaller.
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,933
136
if Nosta is to be believed at all
Name one process & architecture combo "leak" that Nosta talked about in the past and turned out to be true. The amount of fantasy nodes and architectures is staggering, and yet people still eat this crap with a spoon.

You would have better chances at predicting the future of AMD products by tossing a coin. My cat would have better chances of predicting AMD product & node mix, and I don't have a cat. It's still better than what Nosta predicts because me getting a cat and using it to make predictions is still within the realm of possibility in this universe.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
If DDR5 doesn't pick up very fast do you think AMD will replace the Zen 3 dies with Zen4 dies to fill the gap?

They'll produce and sell both at the same time. Zen3 should still be pretty competitive with Alder Lake, and compared to Zen 4 is on a different node, and uses a different memory standard and platform. As the last AM4 and 7nm CPU (and maybe even the last generation that uses GF for the IOD) it wouldn't surprise me if AMD ends up still producing Zen 3 in good numbers even during the Zen 5 generation.
 
  • Like
Reactions: Martimus

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
They'll produce and sell both at the same time. Zen3 should still be pretty competitive with Alder Lake, and compared to Zen 4 is on a different node, and uses a different memory standard and platform. As the last AM4 and 7nm CPU (and maybe even the last generation that uses GF for the IOD) it wouldn't surprise me if AMD ends up still producing Zen 3 in good numbers even during the Zen 5 generation.
I do wonder if AMD may produce an AM4+ platform that supports some AM4 SoCs (maybe just Vermeer+) for retaining DDR4 memory compatibility.

If the Raphael IOD supports both DDR4 and DDR5 then they can target AM4+ as a budget platform for Zen4 until DDR5 becomes moderately affordable.

I think that's less likely though; they may just push AM5/DDR5 regardless. I do hope that this isn't the case, as it will definitely put a dent in Zen4 sales if that happens.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,353
1,547
136
If the Raphael IOD supports both DDR4 and DDR5 then they can target AM4+ as a budget platform for Zen4 until DDR5 becomes moderately affordable.

I don't believe that this is feasible. DDR5 is electrically much, much more different from DDR4 than basically any DDR standard has been from its predecessor. To be compatible with both basically means shipping two separate memory controllers.

I think it's much more likely that they make use of the fact that the IO die is separate and ship the same CCD with two different IO die chiplets as different SKUs to have compatibility with both DDR4 and 5.
 
  • Like
Reactions: Tlh97 and CHADBOGA

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
I think it's much more likely that they make use of the fact that the IO die is separate and ship the same CCD with two different IO die chiplets as different SKUs to have compatibility with both DDR4 and 5.
And that's what the rumored Warhol seems to be about, just that that's Zen 3 still.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
And for a lot of tasks with Ryzen, it seems like having a high Infinity Fabric clock (at a 1:1 ratio) is just as important as actual memory bandwidth. Zen 4 should support a much more aggressive Infinity Fabric to keep up with DDR5, and outside of the high-core-count parts, that might actually be the more important consideration.
Just rambling here from me:

DDR5 technically starts at 3200 MT/s (1600 MHz) but Hynix have taken the stance that they'll start their range at 4800 MT/s (2400 MHz). There's no easy way to divide that into a current reasonable IF speed. (1:1 IF is 2400 MHz and 1:2 IF is 1200 MHz).
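
The arithmetic behind those clocks, as a minimal sketch: DDR moves data twice per clock, so the memory clock is half the MT/s rating, and the fabric clock is that divided by the ratio. The function names are mine, purely for illustration.

```python
def memclk_mhz(transfer_rate_mts):
    """DDR transfers data on both clock edges, so MEMCLK is half the MT/s rate."""
    return transfer_rate_mts / 2


def fabric_clock_mhz(transfer_rate_mts, divider=1):
    """Fabric clock for a 1:divider ratio against the memory clock."""
    return memclk_mhz(transfer_rate_mts) / divider


for rate in (3200, 4800):
    print(
        f"DDR5-{rate}: MEMCLK {memclk_mhz(rate):.0f} MHz, "
        f"1:1 IF {fabric_clock_mhz(rate, 1):.0f} MHz, "
        f"1:2 IF {fabric_clock_mhz(rate, 2):.0f} MHz"
    )
```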

Speeding up the IF and moving to DDR5 and the interposer work -- you start to shed some of the latency problems. And it may make an L4$ that some have mentioned meaningless... but let me explain my thinking.

I think using the Firestorm core as a bellwether for high efficiency and high performance is reasonable. Anything bolded is from actual reported figures in AT articles, the ~ designates my rough estimate from the charts included.

| Full Random Test Depth | Firestorm (M1) | Zen 3 (5950X) | Zen 2 (EPYC 7742) | Zen 2 (3950X) |
| --- | --- | --- | --- | --- |
| 1MB | 18 cycles (~5.5 ns) | 42 cycles (~8.3 ns) | ? | 36 cycles (~7.9 ns) |
| 2MB | 18 cycles (~5.5 ns) | 48 cycles (~9.5 ns) | ? | 42 cycles (~9.2 ns) |
| 4MB | 20 cycles (~6.25 ns) | 51 cycles (10.141 ns) | 34 cycles (10.27 ns) | 43 cycles (9.267 ns) |
| 8MB | 27 cycles (~8.5 ns) | 53 cycles (~10.5 ns) | ? | 44 cycles (~9.5 ns) |
| 16MB | 83 cycles (~26 ns) | 68 cycles (~13.5 ns) | ? | 115 cycles (~25 ns) |
| 32MB | 269 cycles (~84 ns) | 177 cycles (~35 ns) | ? | 299 cycles (~65 ns) |
| 64MB | 298 cycles (~93 ns) | 313 cycles (~62 ns) | ? | 345 cycles (~75 ns) |
| 128MB | 310 cycles (96.834 ns) | 398 cycles (78.827 ns) (160MB) | 384 cycles (113 ns) (NPS4) | 396 cycles (86.09 ns) (160MB) |
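
For anyone who wants to sanity-check the cycles vs. ns pairs, a small sketch that backs out the core clock implied by the non-estimated entries in the table above and reuses it to convert another cycle count. The helper names are mine; the only inputs are the table figures, and the clock is simply cycles divided by nanoseconds.

```python
# (cycles, nanoseconds) pairs copied from the table entries that are not "~" estimates.
reported = {
    "Firestorm (M1), 128MB": (310, 96.834),
    "Zen 3 (5950X), 4MB": (51, 10.141),
    "Zen 2 (EPYC 7742), 4MB": (34, 10.27),
    "Zen 2 (3950X), 4MB": (43, 9.267),
}


def implied_clock_ghz(cycles, nanoseconds):
    """Cycles per nanosecond is the same number as the clock in GHz."""
    return cycles / nanoseconds


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at a given clock."""
    return cycles / clock_ghz


for label, (cycles, ns) in reported.items():
    print(f"{label}: implied clock ~{implied_clock_ghz(cycles, ns):.2f} GHz")

# Cross-check one estimated entry: 42 cycles on Zen 3 at the clock implied by the 4MB row.
zen3_clock = implied_clock_ghz(51, 10.141)
print(f"Zen 3, 42 cycles: ~{cycles_to_ns(42, zen3_clock):.2f} ns")
```

The implied clocks come out around 3.2 GHz for the M1 and around 5.0 GHz for the 5950X, which is why the same cycle count means noticeably different wall-clock latency on the two chips.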

But I think the key thing we have to consider for Zen 4 is the type of workload they are optimizing for. Obviously there are big differences in how the cache hierarchy is set up between M1 and Zen 2 and Zen 3. It's obviously been tailored nicely to the workload. But the workload considerations for Zen 3 are probably a lot different from those on M1.

Note the Zen 3 penalty on L3$ size - you're seeing a creep in latency from 16 -> 32MB test depth. One would imagine that taking the L3$ size to 48 or 64MB is only going to hurt latency more. Nothing you do to the IF is going to help that. If we presume the IF speed will rise from 1600 MHz to 2400 MHz, and we can cut 33% off the latency (it won't be that perfect) then you have a situation where adding another 16MB L3$ to the existing 32MB might make latency on a 48MB load as high as latency of just sending it to system memory, despite it "technically" fitting into L3$. So optimizing the IF is definitely going to help in these larger test depths with mitigating cache usage on the CCD.
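
To make that 33% concrete, a minimal sketch under the most optimistic assumption possible: that every nanosecond of a beyond-L3 access scales inversely with the fabric clock (as said above, it won't be that perfect). The Zen 3 latencies are the rough figures from my table; the function name is just for illustration.

```python
def scaled_latency_ns(latency_ns, old_clock_mhz, new_clock_mhz):
    """Best-case latency if all of it scaled inversely with the fabric clock."""
    return latency_ns * old_clock_mhz / new_clock_mhz


OLD_IF_MHZ, NEW_IF_MHZ = 1600, 2400

# Fraction of latency removed in the best case.
reduction = 1 - OLD_IF_MHZ / NEW_IF_MHZ
print(f"Best-case latency reduction: {reduction:.1%}")

# Zen 3 (5950X) figures from the table: depths that spill out of the 32MB L3.
zen3_beyond_l3_ns = {"64MB": 62.0, "128MB (160MB)": 78.827}
# Zen 3 latency at a 32MB test depth, which a faster fabric would not improve.
zen3_32mb_ns = 35.0

for depth, ns in zen3_beyond_l3_ns.items():
    best_case = scaled_latency_ns(ns, OLD_IF_MHZ, NEW_IF_MHZ)
    print(f"{depth}: {ns:.1f} ns -> {best_case:.1f} ns best case (32MB depth today: {zen3_32mb_ns:.0f} ns)")
```

On these numbers a best-case 64MB access lands at about 41 ns, only a little above the ~35 ns Zen 3 already shows at a 32MB depth. That is the squeeze I mean: a bigger, slower L3 would not have much headroom left over simply going to memory.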

The way I see it, a faster IF and faster DDR5 memory may let AMD reconfigure their chiplet design a bit. One could consider dropping L3$ to 16MB again, making it open to all 8 cores still, but opening up the L2$ to 1MB/core with the extra space. One could consider using good yields to allow a larger 16-core CCD with reduced per-core but same total 32MB shared L3$ on the CCD. One could just keep 8-core CCDs and reduce L3$ size and make the CCD smaller, allowing you to shove more CCD chiplets on each chip. You could add an L4$ chiplet with the freed-up space, but that seems messy. Or you just leave it as-is, and take the L3$ miss benefit. Who knows?! There would just be so much room for activities! But I'd like to see a focus on the L1/L2 cache portion of the memory subsystem instead of just continuing to focus on the big, slow stuff.
 

Cardyak

Member
Sep 12, 2018
72
159
106
But opening up the L2$ to 1MB/core with the extra space. One could consider using good yields to allow a larger 16-core CCD with reduced per-core but same total 32MB shared L3$ on the CCD.

Not sure about the L3 hypothesis, but the L2 doubling from 512KB to 1MB in Zen 4 has been rumoured for a while now. I’ll be very surprised if it doesn’t happen.
 
  • Like
Reactions: lightmanek

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
DDR5 technically starts at 3200 MT/s (1600 MHz) but Hynix have taken the stance that they'll start their range at 4800 MT/s (2400 MHz). There's no easy way to divide that into a current reasonable IF speed. (1:1 IF is 2400 MHz and 1:2 IF is 1200 MHz).
Renoir already moved toward async IF, clocking it on demand depending on bandwidth and latency requirements. I'd expect the new desktop/server IMC to make use of a similar approach.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Renoir already moved toward async IF, clocking it on demand depending on bandwidth and latency requirements. I'd expect the new desktop/server IMC to make use of a similar approach.
It also seems to have had no effect on cycles of latency for 160MB chunks (comparing 4900H to 3950X), which indicates to me that despite the 70-80% power savings the asynchronous IF gives, it does not hurt performance.

Can they achieve the same performance with a separate IOD as well? What would go into that engineering project? Would it be a function of interposer, or IOD node advancement, or both, or more stuff I am not thinking of now?
 
  • Like
Reactions: lightmanek

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
Can they achieve the same performance with a separate IOD as well? What would go into that engineering project? Would it be a function of interposer, or IOD node advancement, or both, or more stuff I am not thinking of now?
Possibly latency is worse with async IF compared to sync, just that it adds roughly the same amount as the separate IOD does?

Regarding what IOD or similar design AMD would go for on the next platform, there are several goals they would want to achieve:
  • more bits per joule (this one is rather easy, as the currently used SerDes over MCM package substrate is already one of the worst choices in that regard).
  • reach as many chiplets as possible (this is why SerDes over MCM package substrate was retained for Zen 2 while silicon interposers and bridges were under consideration). Not sure what possibilities there are now for using all the package space with at least the same number of chiplets. For classic interposers in server chips, it would take either one that is too big or too many of them to be really feasible.
  • low latency seems to be of relatively secondary importance, with other design choices working on improving it (L3$ in general, Infinity Cache).
I'm not seeing any obvious upgrade to the status quo there.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
Not sure about the L3 hypothesis, but the L2 doubling from 512KB to 1MB in Zen 4 has been rumoured for a while now. I’ll be very surprised if it doesn’t happen.

I would be shocked if that didn't happen. I was going to post this but you beat me to it. Other than cutting L1 in favor of uop cache, I don't think there have been any changes. With Intel moving to a 1.25MB L2, a 1MB L2 seems like a no-brainer. The L3 seems to be doing a fine job as is. I think it's time to work on the L2 in particular. If they can increase the L1 modestly without a latency bump, that could be worth it as well.
 

randomhero

Member
Apr 28, 2020
181
247
86
Several pages ago I speculated about the ST performance of Zen4.
I changed my mind after reading some of the posts. I believe that clocks will go down with 5 nm. What I think I got wrong was assuming that IPC would not compensate enough.
What I had forgotten is that new processes could cut down latencies CCD-wide, and what I had forgotten even more is advanced packaging. They can also get higher inter-CCD bandwidth, from 32 bits to 64 bits per lane per cycle. They could get rid of the SerDes links on package and go wide with some sort of silicon bridge (full interposer or silicon interconnect bridge), further improving on-package bandwidth and possibly latencies as well.
After seeing what they have accomplished with Zen3, how they optimised the design to extract as much as possible from limited resources (transistor performance, execution width, etc.), Zen4 on 5nm could come as quite a shock to the industry regarding gen-to-gen uplift in performance.
I have definitely missed a metric ton of things that could be done to improve the design from Zen3 to Zen4, since my knowledge of the subject is as shallow as a puddle. Thankfully, that's where you guys (and gals!) come in! 🙂
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136
The outlook now though does not seem good with N5 for AMD.
The wafer availability situation as it is right now would make AMD think hard about relying solely on N5 for their next-gen products.
An additional problem is that AMD would have to be very conservative with die size to maximize products from each wafer.
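
To get a feel for how much die size matters here, a minimal sketch using the common gross-dies-per-wafer approximation. It assumes a 300 mm wafer and ignores yield, scribe lines and edge exclusion, so treat the results as ballpark only. The function name is mine, and the 70 and 100 mm2 areas are the example die sizes I use further down in this post.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: wafer area over die area, minus an edge-loss term.

    Ignores defect yield, scribe lines and edge exclusion (assumptions).
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


for area in (70, 100):
    print(f"{area} mm2 die: roughly {gross_dies_per_wafer(area)} gross dies per 300 mm wafer")
```

Every extra mm2 costs candidate dies on a wafer supply that is already tight, before yield is even considered.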

Therefore, I think that AMD will continue to use N7 until 2022 and probably beyond if they want to sell anything in volume.
If N6 is what TSMC promised it to be, then it makes business sense that AMD capitalize on improved capacity of N7 lines next year, which by extension can be used to fab N6 wafers.
N7 is going to be in its 3rd year next year, and capacity should be able to reach 180K-220K wpm, from current 140-150K wpm.

I am kind of hoping they would do both N5 and N6 designs next refresh and I think most of their products would continue to be on N7/N6.
They will not be able to sell anything in substantial volumes if they go all N5 for their next products, even in 2022.


This leads me to believe there is an even more critical need for a better physical interconnect to continue scaling performance. And interposers and stacked devices should happen to continue the momentum.
I suppose next year, doing 100mm2+ N7/N6 chips should be feasible for AMD's yield and cost targets. Would a 70mm2 N5 chip net more transistors than a 100mm2 N6 chip?

A rough calculation using TSMC's numbers says 100mm2 N6 dies would have a roughly similar xtor count to 70mm2 N5 dies (~40% bigger but ~45% less dense), the advantage being that there are a lot more N7/N6 wafers, around 200K wpm let's say.
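
The "~45%" can be read two ways, so here is a quick sketch of both. It only uses the die areas and the percentage from this post; the relative density values are my interpretation of that percentage, not official TSMC figures, and the helper name is made up.

```python
def relative_transistors(area_mm2, relative_density):
    """Transistor budget in arbitrary units: area times relative density."""
    return area_mm2 * relative_density


N6_AREA_MM2, N5_AREA_MM2 = 100.0, 70.0

# Reading A (assumption): N5 is ~45% denser than N6, so N6 = 1.00 and N5 = 1.45.
n6_a = relative_transistors(N6_AREA_MM2, 1.00)
n5_a = relative_transistors(N5_AREA_MM2, 1.45)

# Reading B (assumption): N6 has ~45% lower density than N5, so N5 = 1.00 and N6 = 0.55.
n6_b = relative_transistors(N6_AREA_MM2, 0.55)
n5_b = relative_transistors(N5_AREA_MM2, 1.00)

print(f"Area ratio N6/N5: {N6_AREA_MM2 / N5_AREA_MM2:.2f}x")
print(f"Reading A: N6 {n6_a:.1f} vs N5 {n5_a:.1f}")
print(f"Reading B: N6 {n6_b:.1f} vs N5 {n5_b:.1f}")
```

Only reading A (N5 about 45% denser than N6) reproduces the "roughly similar" transistor count, so that is the reading the rough calculation relies on.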

Allocation of transistor budget per core has to go up to allow engineers more room for a more comprehensive design uplift. At the same time newer interconnects should help with doing away with the SerDes on package.
Additional uplifts in efficiency would help too.
Looking at N5 availability next year, it is hard to say how many products will move to N5: probably mobile only, and probably a few halo products.
 
  • Like
Reactions: Tlh97