Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
807
1,411
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

leoneazzurro

Golden Member
Jul 26, 2016
1,051
1,711
136
The opportunity was there and they took it. They could add more transistors in Zen2 without increasing power drastically.

Some Perspective
Zen --> Zen2
2x CCX for 8Cores + 2x 8MBL3 @2800 MTr (Entire Zeppelin is 4800 MTr) --> 1CCD/CCX for 8Core + 1x 32MBL3 3800MTr
That's 1.35x MTr gain for 15% IPC and total TDP 95W-->105W
More than half of the 1.35x gain over 2x CCX of Zen1 is due to doubling of L3, addition of 256 bit FP units and addition of GMI and SMU in Zen2 CCD
Real Core+L2 is only ~15% more MTr

Zen2 --> Zen3
~1.1x MTr gain for 19% IPC and total TDP 105W-->105W

Zen3-->Zen4
Between 25 to 40% MTr gain, IPC?
L3/SMU largely similar MTr count
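
As a rough check on the ratios quoted above, the arithmetic can be sketched in a few lines of Python. The transistor counts and IPC figures are the ones from this post; the function names are made up for illustration, and "IPC per unit of transistor growth" is just one hypothetical way to compare generations, not a figure AMD publishes.

```python
# Figures quoted in the post (MTr = millions of transistors).
ZEN1_2CCX_MTR = 2800  # 2x CCX: 8 cores + 2x 8 MB L3
ZEN2_CCD_MTR = 3800   # 1 CCD: 8 cores + 32 MB L3


def mtr_gain(old_mtr: float, new_mtr: float) -> float:
    """Transistor-count growth factor between two designs."""
    return new_mtr / old_mtr


def ipc_per_mtr_growth(ipc_uplift: float, mtr_growth: float) -> float:
    """Hypothetical metric: IPC factor divided by transistor growth factor."""
    return (1.0 + ipc_uplift) / mtr_growth


zen1_to_zen2 = mtr_gain(ZEN1_2CCX_MTR, ZEN2_CCD_MTR)
zen2_to_zen3 = 1.1  # "~1.1x" as stated in the post, not measured here

print(f"Zen -> Zen2 MTr gain: {zen1_to_zen2:.2f}x")
print(f"Zen -> Zen2 IPC per MTr growth: {ipc_per_mtr_growth(0.15, zen1_to_zen2):.2f}")
print(f"Zen2 -> Zen3 IPC per MTr growth: {ipc_per_mtr_growth(0.19, zen2_to_zen3):.2f}")

# Zen3 -> Zen4: the post only gives a 25-40% MTr range and leaves IPC open.
for growth in (1.25, 1.40):
    print(f"Zen3 -> Zen4 at {growth:.2f}x MTr: IPC uplift unknown")
```

3800 / 2800 comes out at roughly 1.36x, in line with the 1.35x quoted above. By this crude measure Zen2 --> Zen3 bought more IPC per added transistor than Zen --> Zen2, which fits the point that much of the Zen2 budget went to L3, FP width, GMI and SMU rather than the core itself.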

Something to put in the equation is that the I/O die will likely change process, thus going to N7/N6. While the new I/O die will quite probably support PCI-E 5 and DDR5 is a given, the new process should anyway allow substantial power savings, that is, more power can go to the CCDs at the same power envelope.
About the cores, we don't know how much of the transistor budget went to AVX512, but it is safe to think that doubling the L2 and increasing the IF link frequency (and having 2 IF links per CCD) alone will easily bring a good increase in IPC. Add to this a possible clock frequency increase due to the new process.
 

remsplease

Junior Member
Oct 22, 2021
16
3
41
Yeah, that's not the way it works. The SoC can support PCIe 5.0 with the chipset supporting PCIe 4.0 (or 3.0).
DRAM support has nothing to do with the chipset.

The PCIe standard has to be supported by the CPU, chipset, connectors, etc. for PCIe 5 speeds to work. Yes, you can mix and match. You're using SoC loosely. The CPU is an SoC. The chipset is an SoC. The video card GPU is an SoC.

The chipsets on modern boards, which accompany a line/family of CPUs, absolutely know what memory type and speed is in use as a function of their normal operating parameters.
 

remsplease

Junior Member
Oct 22, 2021
16
3
41
You claim Zen4 is a die shrink of Zen3, but that is not possible. Otherwise it would not support AVX-512.

I claimed Zen 4 is a Zen 3 die shrink with some modifications. It is. Example: Updated mem controller to support DDR 5.

A die shrink of an existing, well performing design is a good thing. Expecting a completely new taped out, tested, yielding design every cycle is unrealistic.

AVX512 could have been supported on Zen3 if AMD chose to dedicate x number of transistors for that purpose and make it work. I don't have details on whether or not Zen4 includes AVX512 support.
 

gdansk

Platinum Member
Feb 8, 2011
2,894
4,381
136
There are significant enough changes that people find 'die shrink' an uncharitable characterization. That's the basis of it, but AMD has extra transistors and is using them on new features and more cache. A simple die shrink sounds like they shrink the core and use the smaller size.
 

exquisitechar

Senior member
Apr 18, 2017
683
940
136
A die shrink of an existing, well performing design is a good thing. Expecting a completely new taped out, tested, yielding design every cycle is unrealistic.
That's exactly what is necessary and has/will be done for Zen 4. It's simply not a die shrink no matter how much one tries to stretch the meaning of the term, there are significant microarchitectural changes.

Not sure why Zen 4 is being underestimated so much after that leak. Sure, it's not as significant of a change from a core oriented perspective as some were hoping, but that doesn't mean it's a Zen 3 die shrink. It's simply more of an evolution like Zen 2, and Zen 5 is the big new core update as per Mike Clark's interview.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
Puts new light on this quote from Mike Clark: "and as we continue to go forward, getting more cores, and getting more cores in a sharing L3 environment, we’ll still try to manage that latency so that when there are lower thread counts in the system, you still getting good latency out of that L3. Then the L2 - if your L2 is bigger then you can cut back some on your L3 as well."

Good memory. More cores sharing L3$ sure sounds like a larger CCX (Zen5) - 12 cores is the general limit for a single ring bus, unless AMD is using double rings or mesh.
bigger load store

Load store ports?

at least DDR5-4800 and tho higher IF-speed (2400MHz), probably 5200/2600

Okay, didn't think about this. IF frequency is going to hamstring higher speed DDR5 modules. DDR5-6400 would require a 4/3 ratio if max IF is 2.4 GHz :confused:.
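
To spell out where the 4/3 comes from, here is a minimal sketch. It assumes the memory clock is half the DDR transfer rate and that the Infinity Fabric clock would ideally run 1:1 with the memory clock, as on Zen 2/3; whether Zen 4 keeps that coupling is not known here. The function names are made up for illustration.

```python
from fractions import Fraction


def mclk_mhz(transfer_rate_mts: int) -> int:
    """Memory clock in MHz: DDR moves two transfers per clock."""
    return transfer_rate_mts // 2


def mclk_to_fclk_ratio(transfer_rate_mts: int, fclk_max_mhz: int) -> Fraction:
    """Ratio of memory clock to fabric clock once the fabric hits its cap.

    Assumption: the fabric runs 1:1 with the memory clock whenever the
    memory clock does not exceed fclk_max_mhz.
    """
    mclk = mclk_mhz(transfer_rate_mts)
    if mclk <= fclk_max_mhz:
        return Fraction(1, 1)
    return Fraction(mclk, fclk_max_mhz)


for rate in (4800, 5200, 6400):
    ratio = mclk_to_fclk_ratio(rate, fclk_max_mhz=2400)
    print(f"DDR5-{rate}: MCLK {mclk_mhz(rate)} MHz, MCLK:FCLK = {ratio}")
```

With a 2400 MHz cap, DDR5-4800 stays at 1:1, while DDR5-6400 (3200 MHz memory clock) lands on 3200/2400 = 4/3.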
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
The PCIe standard has to be supported by the CPU, chipset, connectors, etc. for PCIe 5 speeds to work. Yes, you can mix and match. You're using SoC loosely. The CPU is an SoC. The chipset is an SoC. The video card GPU is an SoC.

The chipsets on modern boards, which accompany a line/family of CPUs, absolutely know what memory type and speed is in use as a function of their normal operating parameters.

A System on Chip has at its core a processor of some sort, otherwise it's not a system. Chipsets are dumb by contrast. They are crossbar switches, lane splitters, encoders/decoders and buffers. All they know about the upstream is that they have X lanes of PCIe to the SoC (in AMD's case, the IOD). If a device requests a chunk of memory that has been allocated to it, the request just gets forwarded up the line, independent of the DRAM size, speed, channels, etc.

I claimed Zen 4 is a Zen 3 die shrink with some modifications. It is. Example: Updated mem controller to support DDR 5.

A die shrink of an existing, well performing design is a good thing. Expecting a completely new taped out, tested, yielding design every cycle is unrealistic.

AVX512 could have been supported on Zen3 if AMD chose to dedicate x number of transistors for that purpose and make it work. I don't have details on whether or not Zen4 includes AVX512 support.

No one is doing 'optical' shrinks to EUV. The RTL is different - cells need to be redesigned, go/no go regions for layout from the PDK are different, the power distribution loads and capacitance are different and heat load/unit area are different. At least the xtors are still FinFET. This is part of why Zen4 is taking a long time.
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
A System on Chip has at its core a processor of some sort, otherwise it's not a system. Chipsets are dumb by contrast. They are crossbar switches, lane splitters, encoders/decoders and buffers. All they know about the upstream is that they have X lanes of PCIe to the SoC (in AMD's case, the IOD). If a device requests a chunk of memory that has been allocated to it, the request just gets forwarded up the line, independent of the DRAM size, speed, channels, etc.



No one is doing 'optical' shrinks to EUV. The RTL is different - cells need to be redesigned, go/no go regions for layout from the PDK are different, the power distribution loads and capacitance are different and heat load/unit area are different. At least the xtors are still FinFET. This is part of why Zen4 is taking a long time.
Some nitpicking for a slow Sunday here.

A lawyer might argue that except for the iGPU processors, AMD has no SoC designs. System on package (SoP :) ) would be more accurate. We know from old patents that they have been looking, for a long time, to dis-aggregate SoCs and make chiplet designs using optimally designed & fabricated chiplets.

Zen was allowing a high core count CPU to be assembled from lower core count CPU chips.
Zen 2 had the SOC split into cores and IO.
Zen 3D has the Core die itself split off most of the L3 into a separate chiplet.

So 3 separate sections here, each optimized for the desired speed, density, power & cost parameters.

I suppose the best way to predict how this develops is to assume that any sub-component large enough in area to support a unique chiplet would be the target of dis-aggregation, especially if you can use a different process or library to obtain an advantage in pursuing your performance targets.
 

DrMrLordX

Lifer
Apr 27, 2000
22,028
11,609
136
Interestingly enough Centaur's CNS apparently does just that for its AVX-512 support.

Hmm, interesting. Looks like they've had to add more logic to their design than a simple 3x256b design though, in order to bring it up to the same AVX-512 compliance level as Cannonlake.

Wonder how they pulled it off otherwise? I was told you couldn't split AVX-512 like that due to bit alignment issues or something . . . ?

I claimed Zen 4 is a Zen 3 die shrink with some modifications. It is. Example: Updated mem controller to support DDR 5.

"some modifications" is going to mean a lot more work than just an optical shrink. Plus as @Ajay pointed out, moving to an EUV node forces a lot of changes elsewhere.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
A lawyer might argue that except for the iGPU processors, AMD has no SoC designs. System on package (SoP :) ) would be more accurate. We know from old patents that they have been looking, for a long time, to dis-aggregate SoCs and make chiplet designs using optimally designed & fabricated chiplets.
Okay fancy pants, you win :p
 

gdansk

Platinum Member
Feb 8, 2011
2,894
4,381
136
Lisa Su confirms:
96 Zen4 cores in Genoa for 2022 (sampling now)
128 Zen4C with different cache hierarchy in Bergamo for 2023

Will they offer chips with a mixture of Zen4 and Zen4C chiplets? And I wonder if Bergamo is using the same IOd.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
Genoa: 2x density over Milan, 96 cores per socket, DDR5 (ofc) and confirmed PCI-E 5.
They speak of a >1.25x speedup over Milan, which is probably per core.
AT said:
The Zen 4c chiplet, according to AMD, is built on an HPC variant of TSMC N5. This aims at denser logic and denser cache, likely at the expense of high-end frequency. AMD says that this process offers 2x density, 2x power efficiency, and >1.25x silicon performance over the regular N7 it uses. When asked if this was a specific statement about core performance, AMD said that it wasn’t, and just a comment on the process node technologies. It is worth noting that 2x efficiency is quite a substantial claim based on metrics provided by TSMC on its N7 -> N5 disclosures.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
Most important update there

View attachment 52621


2x density, Zen4 will be a much much bigger core than Zen3. More than 50% higher MTr
And they have the new Bridge interconnect too. I am guessing this will go to Zen4 products too.


And the leaks are on the money lol
Hmm, looking at the latter (EFB), maybe the 16 core CCDs needed for the upcoming very high core count Epyc processors could just be packaged like this still using 8 core CCDs.