Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) and new memory support (likely DDR5).


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

Gideon

Golden Member
Nov 27, 2007
1,618
3,635
136
If it's anything like what has been previously rumoured, then Zen4D is just a modified Zen 4 core. We don't know how deep those modifications go, but cache sizes are probably among them.
I wouldn't be surprised if they do to the FPU something similar to what was done with the PS5 CPU (halving the FP execution units), as there is no need to spend that much die space on it for an efficiency core.
 

Thibsie

Senior member
Apr 25, 2017
738
777
136
I wouldn't be surprised if they do to the FPU something similar to what was done with the PS5 CPU (halving the FP execution units), as there is no need to spend that much die space on it for an efficiency core.

If they do as in one of their patents, where an illegal (unsupported) instruction triggers migration of the thread to the high-performance core, then the efficiency core would not support every instruction the other core supports.

No AVX-512? No AVX2? No AVX? Something else?

Indeed, they could also offer reduced performance for some of these...
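
To make the trap-and-migrate idea concrete, here is a toy model in Python. It is only a sketch of the general mechanism described above, not the patent's actual design: the `Core` class, the `run` helper and especially the feature split between the two cores are assumptions made up for illustration.

```python
from dataclasses import dataclass


class IllegalInstruction(Exception):
    """Raised when a core meets an instruction it does not implement."""


@dataclass(frozen=True)
class Core:
    name: str
    isa: frozenset  # instruction-set extensions this core implements

    def execute(self, ext: str) -> None:
        if ext not in self.isa:
            raise IllegalInstruction(ext)


# Hypothetical feature split, chosen only for illustration.
BIG = Core("big", frozenset({"x86-64", "AVX", "AVX2", "AVX-512"}))
LITTLE = Core("little", frozenset({"x86-64", "AVX"}))


def run(thread, start=LITTLE, fallback=BIG):
    """Run a list of instruction extensions, migrating on the first fault.

    Returns (extension, core name) pairs showing where each one ran.
    """
    core = start
    trace = []
    for ext in thread:
        try:
            core.execute(ext)
        except IllegalInstruction:
            core = fallback      # migrate the thread, then retry there
            core.execute(ext)
        trace.append((ext, core.name))
    return trace


trace = run(["x86-64", "AVX", "AVX2", "x86-64"])
for ext, core_name in trace:
    print(f"{ext:8s} ran on {core_name}")
```

In this toy version the thread never moves back to the small core once it has faulted. A real scheduler would presumably need some policy for that, which is one more reason a reduced-throughput implementation of the same instructions is the simpler option.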
 

Gideon

Golden Member
Nov 27, 2007
1,618
3,635
136
If they do as in one of their patents, where an illegal (unsupported) instruction triggers migration of the thread to the high-performance core, then the efficiency core would not support every instruction the other core supports.

No AVX-512? No AVX2? No AVX? Something else?

Indeed, they could also offer reduced performance for some of these...

I hope they do not go the route of unsupported instructions if they use a core as recent as Zen 4. I'd much prefer that those instructions simply take more clock cycles (hence the PS5 Zen 2 example, which does support AVX2 AFAIK, just slower).

Regarding that patent: I hope it's talking about a third, extra-ultra-lightweight dummy core that can do the simplest tasks (e.g. display refresh) without waking up either the "big" or the "medium" core.
 
  • Like
Reactions: Tlh97 and uzzi38

DisEnchantment

Golden Member
Mar 3, 2017
1,599
5,762
136
If it's anything like what has been previously rumoured, then Zen4D is just a modified Zen 4 core. We don't know how deep those modifications go, but cache sizes are probably among them.
Using a Zen 4 core sounds like overkill IMO; Zen 3 is already fairly big compared to Gracemont or A55 cores.
If Zen 4 is the LITTLE core, then I imagine Zen 5 would be a giant. But we are talking about 2023+ and N3, so who knows. Maybe strip off some cache and make the chip narrower from front to back.

Also, there are too few details at the moment. We don't know whether this sits within the CCD and is basically a heavily power-gated Zen 5, or whether it is a straight-up different core (even on a different die using a different low-power process).
AMD will have looked into those options for sure.

Regarding that patent: I hope it's talking about a third, extra-ultra-lightweight dummy core that can do the simplest tasks (e.g. display refresh) without waking up either the "big" or the "medium" core.
In the mobile arena this is called an AOP (Always On Processor). You can power down all core clusters and wake them only when there is a new event to process.
This is critical for giving cellphones long battery life short of flat-out hibernating.
I agree that any mobile x86 SoC has to have one.

Another important piece of the puzzle in the mobile space (besides big.LITTLE and AOP) is the DSP.
The good thing is that CVML blocks seem to be confirmed. You can offload all camera, image, and audio processing to the DSP at a fraction of the power it takes in SW.
The funny thing is that Android already provides both SW and HW paths for media, audio, and image processing; from what I heard, the MS audio stack is mostly SW, with only the codec/DAC in HW.
Google Android provides HAL layers that allow offloading media and audio processing to HW if present.
The power savings are massive. MS needs to rethink.
 
  • Like
Reactions: RnR_au and Tlh97

DisEnchantment

Golden Member
Mar 3, 2017
1,599
5,762
136
New EDAC/MCA patches seem to indicate Trento is DF 3.5 (compared to Milan, which is DF 3.0).
A future version of the fabric (DF 4.0?) is expected to use multiple DF block instances.
Patches 31-32 prep for future systems including, but not limited to,
heterogeneous CPU+GPU systems.

Patch 33 adds support for systems with Data Fabric version 3.5
(heterogeneous CPU+GPU systems).

Replace watchdog cd6h/cd7h port I/O accesses with MMIO accesses

p-state driver additions for upcoming processor
Additional patches probably Rembrandt
 

andermans

Member
Sep 11, 2020
151
153
76
Kinda surprised he isn't indicating Zen5 chiplets to be 16 cores too. 32 chiplets for the 256 core Turin SKU sounds like a ton.
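
For reference, the chiplet arithmetic behind that remark can be checked with a few lines of Python. The 256-core figure and the 8 vs. 16 cores per chiplet are the rumoured numbers from this discussion, not confirmed specifications, and `chiplets_needed` is just an illustrative helper.

```python
def chiplets_needed(total_cores: int, cores_per_chiplet: int) -> int:
    """Number of chiplets required to reach a core count (ceiling division)."""
    return -(-total_cores // cores_per_chiplet)


# Rumoured 256-core SKU, with 8-core vs. 16-core chiplets.
counts = {n: chiplets_needed(256, n) for n in (8, 16)}
for cores_per_chiplet, chiplets in counts.items():
    print(f"{cores_per_chiplet} cores/chiplet -> {chiplets} chiplets")
```

With 8-core chiplets the package needs 32 of them; 16-core chiplets would halve that to 16.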

Also a complication wrt Zen4 IPC increases is that we don't know which SKUs (if any) will have V-Cache on top. If not, that would probably greatly diminish gains vs. Zen3D.
 

uzzi38

Platinum Member
Oct 16, 2019
2,595
5,766
146
Very interesting leaks. Zen 4D is expected if it will be used as "little cores" in various combinations.


David's Tweet has the stuff that's actually correct picked out from that video.

It's probably best to think of Zen 4D as a rebalanced Zen 4 core rather than a true little core, because from the looks of it most of the perf/core at a given clock will still be there, and ISA support is also the same. There's actually an improvement vs the regular Zen 4 core that's mentioned in neither the video nor the Tweet.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,599
5,762
136

David's Tweet has the stuff that's actually correct picked out from that video.

It's probably best to think of Zen 4D as a rebalanced Zen 4 core rather than a true little core, because from the looks of it most of the perf/core at a given clock will still be there, and ISA support is also the same. There's actually an improvement vs the regular Zen 4 core that's mentioned in neither the video nor the Tweet.
With V-Cache plus the frequency and efficiency improvements on N5, it makes less sense to go for the highest perf at the cost of density on the server, imo.
V-Cache can easily recover the cut in cache on Z4D, and it aligns with the timelines too. One downside, though, is that too many cores sharing cache in one CCX will increase latency a bit.
Zen 3's L3 on N5 would consume almost 40% of the die and already sticks out like a sore thumb. If you share the GMI and the debug circuitry, packing 16 more cores does not seem like stretching it too far.
Charlie seems to be talking about the differentiated cores too.
But hey, that guy seems to be improving. I watched him, out of sheer curiosity, after 8 months or so.
 

uzzi38

Platinum Member
Oct 16, 2019
2,595
5,766
146
With V-Cache plus the frequency and efficiency improvements on N5, it makes less sense to go for the highest perf at the cost of density on the server, imo.
V-Cache can easily recover the cut in cache, and it aligns with the timelines too. One downside, though, is that too many cores sharing cache in one CCX will increase latency a bit.
Zen 3's L3 on N5 would consume almost 40% of the die and already sticks out like a sore thumb. If you share the GMI and the debug circuitry, packing 16 more cores does not seem like stretching it too far.
Charlie seems to be talking about the differentiated cores too.
But hey, that guy seems to be improving. I watched him, out of sheer curiosity, after 8 months or so.

Tbh I think Bergamo will have 2 CCXes per CCD as well, no need to stretch things further there
 

Saylick

Diamond Member
Sep 10, 2012
3,114
6,260
136

David's Tweet has the stuff that's actually correct picked out from that video.

It's probably best to think of Zen 4D as a rebalanced Zen 4 core rather than a true little core, because from the looks of it most of the perf/core at a given clock will still be there, and ISA support is also the same. There's actually an improvement vs the regular Zen 4 core that's mentioned in neither the video nor the Tweet.
Seems like what we currently have, with the mobile Zen cores having half the cache of the desktop/server variant, is analogous to what they are doing here with Zen 4D, except that Zen 4D appears to take it to the next level by halving the cache and also doubling the number of cores, so I am assuming that means 16 cores sharing 16 MB of L3. I can't help but look back on Hans' comment and conceptual take on a 16-core chiplet where there's barely any LLC, because he assumed that the L3 cache sits on a vertically stacked cache die. I suppose if Zen 4D is intended to run at low clocks, i.e. well within the efficient part of the freq-voltage curve, the thermal density of such an approach might be comparable to a traditionally configured Zen 5 CCD, even with a full cache die stacked above it.
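
As a quick sanity check on what that would mean per core, here is the arithmetic in Python. The baseline is Zen 3's 32 MB of L3 per 8-core CCX; the 16 cores sharing 16 MB figure is only the assumption stated above, and `l3_per_core` is an illustrative helper.

```python
def l3_per_core(l3_mb: float, cores: int) -> float:
    """Shared L3 capacity divided evenly across the cores, in MB."""
    return l3_mb / cores


zen3 = l3_per_core(32, 8)     # Zen 3 CCX: 32 MB shared by 8 cores
zen4d = l3_per_core(16, 16)   # assumed Zen 4D: 16 MB shared by 16 cores
ratio = zen3 / zen4d

print(f"Zen 3:  {zen3} MB of L3 per core")
print(f"Zen 4D: {zen4d} MB of L3 per core (assumed configuration)")
print(f"Reduction: {ratio}x")
```

Halving the cache while doubling the cores works out to a quarter of the L3 per core, which is why a larger private L2 or a stacked cache die would matter so much for such a design.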
 

uzzi38

Platinum Member
Oct 16, 2019
2,595
5,766
146
Seems like what we currently have, with the mobile Zen cores having half the cache of the desktop/server variant, is analogous to what they are doing here with Zen 4D, except that Zen 4D appears to take it to the next level by halving the cache and also doubling the number of cores, so I am assuming that means 16 cores sharing 16 MB of L3. I can't help but look back on Hans' comment and conceptual take on a 16-core chiplet where there's barely any LLC, because he assumed that the L3 cache sits on a vertically stacked cache die. I suppose if Zen 4D is intended to run at low clocks, i.e. well within the efficient part of the freq-voltage curve, the thermal density of such an approach might be comparable to a traditionally configured Zen 5 CCD, even with a full cache die stacked above it.
I highly, highly, highly doubt there are any TSV pads in the cache at all.

Sounds like another potential die area saving they could be making instead.
 
  • Like
Reactions: BorisTheBlade82

DisEnchantment

Golden Member
Mar 3, 2017
1,599
5,762
136
Tbh I think Bergamo will have 2 CCXes per CCD as well, no need to stretch things further there
Yeah, makes a lot of sense, especially as the 2 SDPs per CCD already seem to be planned in advance.
L3 is chopped but L2 gets beefed up.
If that is the case it would hardly lose any perf at all; it would just maybe clock much lower, which is OK for such high core counts anyway.
One issue with current chiplets is that they are shared between desktop and server, and therefore use the same tradeoffs and device characteristics, which is not the case between RDNA2 and CDNA, for example.
 
  • Like
Reactions: Tlh97 and uzzi38

yuri69

Senior member
Jul 16, 2013
386
613
136
Does it really make sense to fork the big core line for a "true big core" and a "slightly less big core"?

I mean, trading the ST for MT has traditionally been a niche - look at UltraSPARC T-line, Bulldozer, etc. Will it really take off now, in the blooming cloud era?

TBH the Zen 4 core is starting to look less exciting. It's apparently large even at 5nm (hence Zen 4D). It looks like a simple evolution of Zen 3 (doubled FPU, doubled L2). It is going to be short-lived (~12 months till Zen 5). And there is the Osborne effect (Zen 5 being a major redesign).
 

uzzi38

Platinum Member
Oct 16, 2019
2,595
5,766
146
Does it really make sense to fork the big core line for a "true big core" and a "slightly less big core"?

I mean, trading the ST for MT has traditionally been a niche - look at UltraSPARC T-line, Bulldozer, etc. Will it really take off now, in the blooming cloud era?

TBH the Zen 4 core is starting to look less exciting. It's apparently large even at 5nm (hence Zen 4D). It looks like a simple evolution of Zen 3 (doubled FPU, doubled L2). It is going to be short-lived (~12 months till Zen 5). And there is the Osborne effect (Zen 5 being a major redesign).

If anything the booming cloud era is probably why it exists. Upping private L2 at the cost of public L3 definitely seems like an optimisation for server workloads if you ask me.

Also, whether or not it makes sense to use Zen 4 as the base for the moderately smaller core, that depends on how big Zen 5 is, doesn't it?
 

yuri69

Senior member
Jul 16, 2013
386
613
136
Also, whether or not it makes sense to use Zen 4 as the base for the moderately smaller core, that depends on how big Zen 5 is, doesn't it?
Well, Zen 5 still sounds like a Zen core: scalable and balanced, to be used from top to bottom with minor changes. AMD still has to be able to cater to the 15W ultrabook market even in 2023/2024. There are two sane possibilities: either AMD forks the Zen core into another "mobile-optimized branch", or Zen 5 can't be such a big monster.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
To me, The 6xxx series should consist of Rembrandt and Zen3D. 7xxx should be Zen 4 parts. As cute/awesome as some of the comments here are, that move is the most sensible one.

From AMD’s public statements thus far, I wouldn’t expect larger CCX units with Zen 4 (before whatever the refresh is at least)

He is charging too much. Don’t let him fool you. One of my top subscription sites charges $5/mo and I have x,xxx subscribers. My niche is much smaller than his and I invest < 30 hours a month in that project.

He needs to come back down to earth.
I wasn’t talking about larger CCX with Zen 4. If that happens, it will likely be with Zen 5 and could come in several different forms. The current rumors say that the 128-core Bergamo will be made from 16-core CCDs. That makes a lot of sense if it is using smaller, cut-down cores. Likely that is just 2 CCX on one chip, unless they do something weird like connecting 2 smaller cores to each L2 cache to double the core count without really changing the floor plan much. Zen 5 will likely use real chiplets (stacked silicon) rather than just regular chips in an MCM.
 

leoneazzurro

Senior member
Jul 26, 2016
909
1,434
136
If anything the booming cloud era is probably why it exists. Upping private L2 at the cost of public L3 definitely seems like an optimisation for server workloads if you ask me.

Also, whether or not it makes sense to use Zen 4 as the base for the moderately smaller core, that depends on how big Zen 5 is, doesn't it?

Well, AMD already told us they are also going wide, so I think it's safe to assume Zen 5 is the point where they go wide. How they go wide is another question, as they seem to have a different approach from Intel, who went for brute force.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
If it's anything like what has been previously rumoured, then Zen4D is just a modified Zen 4 core. We don't know how deep those modifications go, but cache sizes are probably among them.
The AVX512 units are going to be huge, so just removing those and perhaps skewing the process tech more towards power efficiency may be all that is required. The amount of software that makes use of AVX512 is still presumably small and a lot of server applications don’t use much of any vector floating point, so it is a waste of die area for a lot of the market. Integer units are still very small, so perhaps they will still support the instruction set, but the floating point instructions might be executed on much smaller hardware. I don’t really expect any massive changes until Zen 5, so I am thinking that a possible 16-core CCD will still be two 8-core CCX, just using much smaller cores. This would not be a big.LITTLE type implementation since it would all be big cores or all small cores. With zen 5 using stacked die, all kinds of things are possible. Zen 4 is likely still all serdes except for possible stacked cache die.
 

CakeMonster

Golden Member
Nov 22, 2012
1,389
494
136
I have a hard time imagining Zen5 being ready in 2023. So if Zen5 is 2024, then even 8 'fat' and amazing Z5 cores + 16 'slim' and still quite powerful Zen4* cores will look a bit meager compared to what Intel is projected to have with Raptor Lake already in 2022.

This is all speculation, and AMD seem confident enough given their 5 year video and Ian's interview with their guy, so even if Intel pulls ahead with higher primary and secondary core counts, AMD could very well keep up on performance. But if we are to put any trust in these rumors, it looks quite obvious that it's Intel that will pull ahead in core and thread count.

My personal pet peeve, fueled by speculation and ignorance of course, is that I don't like the idea of being stuck on 8 big cores for so long. In a few years, I want more than 8 big cores no matter how good the small ones are. I can easily imagine games and applications evolving with the influx of cores and I want a better baseline.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Why can't AMD use a desktop chip that has one CCD on a performance optimized process with a stacked L3 die, a second CCD with a core and process optimized for efficiency and go for a hybrid approach that seems tailor made for their approach?
 
  • Like
Reactions: BorisTheBlade82