Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

eek2121

Diamond Member
Aug 2, 2005
Moving from N7 (w/secret sauce!) to N5 will bring with it some performance improvement @ isopower. Not sure if that will be the only improvement, but it will count for something. I doubt it'll be +29% though.

I can guarantee it’ll be at least 50% faster.



It has 50% more cores! 🤣

In all seriousness, I think you are downplaying the dramatic part that DDR5 will play. 12 channels of DDR5 drastically increases the amount of memory bandwidth available. Previous generations can be bandwidth starved in certain scenarios.
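A rough, theoretical sketch of what the extra channels buy. It assumes DDR5-4800 for the new platform, DDR4-3200 for the previous 8-channel one, 64-bit channels, and a 64 → 96 core jump matching the "50% more cores" remark; the memory speeds are assumptions, not figures from the leak.

```python
# Back-of-the-envelope peak DRAM bandwidth (theoretical ceiling, no efficiency losses).
# Assumed: DDR5-4800 on the new platform, DDR4-3200 on the old 8-channel platform,
# 64-bit (8-byte) data path per channel (a DDR5 DIMM's two 32-bit subchannels together).

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s = channels * MT/s * bytes per transfer / 1000."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


old = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)   # 8ch DDR4-3200
new = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)  # 12ch DDR5-4800

# Per-core share, assuming 64 cores before and 96 cores after.
print(f"old: {old:.1f} GB/s total, {old / 64:.1f} GB/s per core")
print(f"new: {new:.1f} GB/s total, {new / 96:.1f} GB/s per core")
print(f"total ratio: {new / old:.2f}x")
```

These are ceilings; delivered bandwidth is lower, and under these assumptions the per-core gain (about 1.5x) is much smaller than the total gain (2.25x).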

AVX-512 is going to play a considerable part as well. I imagine the chip has new AVX units.

The TDP has also jumped, though there is the question of whether that is only due to the core count increase or a frequency increase as well.

29% sounds a bit conservative if you ask me.
 

Joe NYC

Platinum Member
Jun 26, 2021

The problem is that, unless there was a major improvement in Infinity Fabric, each CCD is limited to 2 DDR4 channels' worth of bandwidth down (reads) and 1 channel's worth up (writes).

So a single CCD, or a single core within it, can never get at all the bandwidth unless the interconnect is majorly upgraded.
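To make the "2 channels down, 1 channel up" point concrete, here is a sketch assuming the commonly cited Zen 2/Zen 3 CCD link widths of 32 bytes/cycle read and 16 bytes/cycle write, an FCLK of 1600 MHz, and DDR4-3200 channels. These values are assumptions for illustration, not figures from the leak.

```python
# Per-CCD Infinity Fabric bandwidth vs. DDR4 channel bandwidth (theoretical).
# Assumed: 32 B/cycle read, 16 B/cycle write per CCD link at FCLK = 1600 MHz,
# DDR4-3200 with a 64-bit (8-byte) channel.

FCLK_MHZ = 1600
READ_BYTES_PER_CYCLE = 32
WRITE_BYTES_PER_CYCLE = 16
DDR4_3200_CHANNEL_GBS = 3200 * 8 / 1000  # GB/s per DDR4-3200 channel

read_gbs = FCLK_MHZ * READ_BYTES_PER_CYCLE / 1000
write_gbs = FCLK_MHZ * WRITE_BYTES_PER_CYCLE / 1000

print(f"CCD read:  {read_gbs:.1f} GB/s = {read_gbs / DDR4_3200_CHANNEL_GBS:.0f} DDR4-3200 channels")
print(f"CCD write: {write_gbs:.1f} GB/s = {write_gbs / DDR4_3200_CHANNEL_GBS:.0f} DDR4-3200 channel")
```

Under those assumptions a single CCD tops out around 51 GB/s of reads regardless of how many memory channels sit behind the IO die, which is why the link, not the DRAM, would be the limit for one CCD.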
 

eek2121

Diamond Member
Aug 2, 2005

Given it is a brand new IO die, that basically guarantees there is a new version of IF.
 

eek2121

Diamond Member
Aug 2, 2005
Has it been confirmed they're going to TSMC for the IO die anywhere? It's just as likely they're using the updated 12nm Global Foundries node that has better power characteristics. There was certainly room in the old IO for a more compact design. Maybe not quite as much if they're adding more overall IO, but if it weren't otherwise changed I'd make a strong bet on it not being TSMC given the size.

Either way the lack of size reduction isn't surprising. I don't know how many posts I made trying to tell people that IO doesn't benefit from node shrinks, but I think that there's no real denying it at this point unless you don't believe the leaks.

I don’t believe so, but it is definitely on a smaller process.
 

Mopetar

Diamond Member
Jan 31, 2011

It could still be GlobalFoundries 12LP+, which has a 15% area reduction over the older 12LP node that AMD used previously.

The only other good explanations I've heard for AMD committing to buy as many wafers as they did are a new Athlon line on that node or the possibility of them using it for massive amounts of HBM2. The first seems more likely, but it would constitute such a massive number of chips that I find it hard to believe that would be the only use.
 

coercitiv

Diamond Member
Jan 24, 2014
The '170' is suspicious. It is not listed as 170W - so we don’t know the actual power output, could be unobtainium units :p
Not suspicious at all, it's Watts. Even with the typo the data is clear, since we have all the variables required for TDP calculation right there in the table:

AMD's TDP formula is as follows:
TDP (Watts) = (tCase°C - tAmbient°C)/(HSF θca)
where HSF θca (°C/W) is defined as the minimum °C per Watt rating of the heatsink to achieve rated performance

and the numbers fit
169.56=(46.7-35)/0.069

The problem here is that tCase is very low, and one doesn't choose a much lower die/heatspreader junction temperature unless it is actually needed. I reckon the 170W TDP SKU(s) will have very aggressive boosting, probably the highest in the entire lineup.
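The formula is easy to check. A minimal Python sketch using only the three values quoted from the table (the function name is just for illustration):

```python
# AMD TDP formula as quoted above:
#   TDP (W) = (tCase - tAmbient) / theta_ca
# theta_ca (°C/W) is the heatsink's case-to-ambient thermal resistance.

def amd_tdp_watts(t_case_c: float, t_ambient_c: float, theta_ca_c_per_w: float) -> float:
    """Return TDP in watts from case temperature, ambient temperature and theta_ca."""
    if theta_ca_c_per_w <= 0:
        raise ValueError("theta_ca must be positive")
    return (t_case_c - t_ambient_c) / theta_ca_c_per_w


# Table values: tCase 46.7 °C, tAmbient 35 °C, theta_ca 0.069 °C/W.
tdp = amd_tdp_watts(46.7, 35.0, 0.069)
print(f"{tdp:.1f} W")  # 169.6 W, i.e. the "170" entry
```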
 

Ajay

Lifer
Jan 8, 2001
Uhm, the emoji :p was, I thought, a clear indication that my post was sarcasm.
 

Magic Carpet

Diamond Member
Oct 2, 2011
The problem here is tCase is very low
I wonder how easy it's going to be to keep it cool. I remember Thuban also had a relatively low tCase (62°C), but it ran quite cool even with the stock cooler at stock clocks. But 46°C is even lower than that; the die must be big enough to remove heat effectively.
 

Magic Carpet

Diamond Member
Oct 2, 2011
I don't understand why the ambient temp is rated at only 35 degrees, though. The 125W Thuban TDP was specified in a similar way, but with an ambient of 44 degrees instead (more realistic). It's a way to lower the potential TDP (power goes up as the temperature increases). Liquid cooling is suggested for a reason, imo.
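Rearranging the same formula shows what the ambient assumption does to the cooler requirement. This sketch reuses the table's 46.7 °C tCase and 170 W, and compares the 35 °C ambient with the 44 °C ambient mentioned for Thuban; applying the Thuban ambient to this part is purely a what-if.

```python
# TDP formula rearranged for the heatsink rating:
#   theta_ca = (tCase - tAmbient) / TDP
# Inputs: tCase 46.7 °C and 170 W from the table; ambients of 35 °C (table)
# and 44 °C (the Thuban-style spec), as a hypothetical comparison.

def required_theta_ca(t_case_c: float, t_ambient_c: float, tdp_w: float) -> float:
    """Case-to-ambient resistance (°C/W) needed to hold tCase while dissipating tdp_w."""
    headroom = t_case_c - t_ambient_c
    if headroom <= 0:
        raise ValueError("ambient must be below tCase")
    return headroom / tdp_w


for ambient in (35.0, 44.0):
    theta = required_theta_ca(46.7, ambient, 170.0)
    print(f"ambient {ambient:.0f} °C -> theta_ca <= {theta:.4f} °C/W")
```

With only 2.7 °C of headroom at a 44 °C ambient, the cooler would need roughly a four times lower θca, which is one way to read the liquid-cooling recommendation.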


Abwx

Lifer
Apr 2, 2011
Moving from N7 (w/secret sauce!) to N5 will bring with it some performance improvement @ isopower. Not sure if that will be the only improvement, but it will count for something. I doubt it'll be +29% though.

If those leaks are accurate area-wise, and given TSMC's density at 5nm vs 7nm, then Zen 4 has 60% more transistors than Zen 3. Whether this is due to increased caches or anything else, I don't think that this amount is there just for fun.
 

DisEnchantment

Golden Member
Mar 3, 2017
They won't get that much density gain, especially not on a cache-heavy design like Zen.
Apple managed 1.49x with relatively far more logic for the die area than Zen.
AMD will get around a 1.35x-1.4x density gain at best, which would translate to around 30% more MTr.
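Most of the gap between the ~60% and ~30% transistor estimates comes from which density factor is assumed. A minimal sketch of the arithmetic: relative transistor count is the area ratio times the density gain. The 1.49x and 1.35x-1.4x figures come from the posts; the 1.8x headline logic figure is an assumption, and the 0.9 area ratio is a hypothetical placeholder, not the leaked CCD size.

```python
# Relative transistor count after a node move:
#   new_MTr / old_MTr = (new_area / old_area) * density_gain
# AREA_RATIO is hypothetical; substitute the real die areas if known.

AREA_RATIO = 0.9  # hypothetical: new die about 10% smaller than the old one

density_scenarios = {
    "headline logic density (~1.8x, assumed)": 1.8,
    "Apple-like whole-die scaling (1.49x)": 1.49,
    "cache-heavy estimate, high (1.4x)": 1.4,
    "cache-heavy estimate, low (1.35x)": 1.35,
}

for label, gain in density_scenarios.items():
    ratio = AREA_RATIO * gain
    print(f"{label}: {ratio:.2f}x transistors ({(ratio - 1) * 100:+.0f}%)")
```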

But a 30% MTr gain would be huge for Zen4 if it materializes, especially if the caches remain more or less the same.
Zen bottlenecks are listed here; I'm not sure about the methodology, but there is some merit to the analysis.

Regarding the area, it is directly from the design guide from AMD, so it is pretty much a given.
I have to say, ExecuFix has some deep moles.
It's pretty interesting to get tidbits from such a leaker, but professionally, as someone responsible for providing design info to our suppliers, I dread it very much.
 

yuri69

Senior member
Jul 16, 2013
Zen 4 belonging to Family 19h has been known for quite a long time. Family changes seem to be tied to changes in cache topology: K8 to K10, Bobcat to Jaguar, Zen 2 to Zen 3.
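For context, "Family 19h" is the value software computes from CPUID leaf 1: the 4-bit base family field tops out at 0xF, so later families add an extended-family field on top (0Fh for K8, 10h for K10, 17h for Zen/Zen 2, 19h for Zen 3). A minimal decoding sketch; the EAX values are illustrative signatures, not a lookup of specific SKUs.

```python
# Decode family/model/stepping from CPUID leaf 1 EAX using AMD's convention:
# if the base family field is 0xF, the reported family is base + extended family,
# and the extended-model nibble is prepended to the model.

def decode_cpuid_signature(eax: int) -> tuple[int, int, int]:
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return family, model, stepping


# Illustrative signatures with extended family 0x0, 0x1, 0x8 and 0xA on top of base 0xF.
for eax in (0x00000F00, 0x00100F00, 0x00870F10, 0x00A20F10):
    fam, mod, step = decode_cpuid_signature(eax)
    print(f"EAX={eax:#010x} -> family {fam:02X}h, model {mod:02X}h, stepping {step}")
```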
 

Abwx

Lifer
Apr 2, 2011

TSMC claims a 45% die reduction; they are talking about a whole SoC, not a specific circuit like a memory cell.

Apple's chip is not comparable since it includes the IMC and PCH.


Edit: Assuming square-root scaling of performance with transistor count, this points to 27% better MT performance at the same frequency.
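The square-root rule of thumb (often called Pollack's rule) is quick to apply. This sketch plugs in the 1.6x transistor figure from the post and, for comparison, the 1.3x figure from the other estimate; it is a heuristic, not a prediction for any specific core.

```python
import math

# Square-root scaling heuristic: performance ~ sqrt(transistor budget).

def sqrt_scaling_perf_gain(transistor_ratio: float) -> float:
    """Relative performance gain (as a fraction) for a given transistor ratio."""
    if transistor_ratio <= 0:
        raise ValueError("transistor ratio must be positive")
    return math.sqrt(transistor_ratio) - 1.0


for ratio in (1.3, 1.6):
    print(f"{ratio:.1f}x transistors -> ~{sqrt_scaling_perf_gain(ratio) * 100:.1f}% faster")
```

sqrt(1.6) is about 1.265, i.e. roughly +26.5%, in line with the figure above; a 1.3x transistor gain gives about +14% under the same rule.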
 

AMDK11

Senior member
Jul 15, 2019
None of that should be a surprise really. AMD appears to apply a tick tock cadence of its own to the Zen family:
  • Zen 1: new core
  • Zen 2: same core on new node with increased FPU capability
  • Zen 3: new core on same node
  • Zen 4 by all indications so far: same core on new node with increased FPU capability
  • Zen 5: new core on same node?
I dare say that each Zen generation, i.e. Zen, Zen2, Zen3 and Zen4, is a new x86 core. The fact that Zen3 is an almost completely new design from scratch does not mean that Zen2 and Zen are the same apart from the FPU block. There were changes between Zen and Zen2 not only in the FPU but also in the front-end, back-end, execution units and load-store subsystem. So Zen, Zen2, Zen3 and the future Zen4 are all new x86 cores.
 

DisEnchantment

Golden Member
Mar 3, 2017
TSMC claim 45% die reduction, they are talking of a whole SoC not of a specific circuit like a memory cell.
This is what they said
[Image: TSMC's N5 density claim]

At IEDM, Geoffrey Yeap gave a little more color to that density by reporting that for a typical mobile SoC which consists of 60% logic, 30% SRAM, and 10% analog/IO, their 5 nm technology scaling was projected to reduce chip size by 35%-40%.
But Apple managed 1.49x scaling, or a 33% die size reduction. If you look at the numbers closely and compare an Apple SoC with the Zen3 die, the Zen3 die has a huge percentage of cache, which biases it even more towards the 1.35x scaling (in ideal conditions).
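The effect of the logic/SRAM mix can be sketched as a weighted shrink. The per-block density gains below are assumed round numbers close to TSMC's publicly quoted N5-vs-N7 figures, and the cache-heavy mix is hypothetical, not a measured Zen3 floorplan.

```python
# Blended die-area scaling on a node shrink: each block type shrinks by its own
# density factor, so new_area / old_area = sum(fraction / density_gain).
# Assumed density gains (N5 vs N7): logic ~1.84x, SRAM ~1.35x, analog/IO ~1.2x.

GAINS = {"logic": 1.84, "sram": 1.35, "analog_io": 1.2}


def area_ratio(mix: dict[str, float]) -> float:
    """Return new/old die area for a die whose area fractions are given in mix."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(frac / GAINS[block] for block, frac in mix.items())


# Mix from the IEDM example quoted above.
mobile_soc = {"logic": 0.6, "sram": 0.3, "analog_io": 0.1}
# Hypothetical cache-heavy mix, loosely in the spirit of a CPU chiplet.
cache_heavy = {"logic": 0.4, "sram": 0.5, "analog_io": 0.1}

for name, mix in (("mobile SoC", mobile_soc), ("cache-heavy", cache_heavy)):
    r = area_ratio(mix)
    print(f"{name}: {(1 - r) * 100:.0f}% smaller, {1 / r:.2f}x density")
```

Under these assumptions the 60/30/10 mix lands at roughly a 37% reduction, inside the quoted 35%-40% range, and shifting area from logic to SRAM pulls the overall density gain down.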
But it matters less if the cache remains more or less the same: with a 9% MTr gain from Zen2 to Zen3 they got 19% IPC; now consider a 30% MTr gain.
I am also hoping, as @uzzi38 is saying, that L1/L3 will remain largely the same.
From the papers I've read, the bottlenecks are in many places, like the retire buffer, the OoO window, etc.
Some of these can be solved by throwing more register-file silicon at them.
 

JoeRambo

Golden Member
Jun 13, 2013
AVX-512 support will burn a ton of area: 4x the area is required for the FP register file alone, and the execution units need to be widened to 512 bits as well. There is also the question of widening the load/store datapaths: do they need to handle at least one of each at 512 bits to get decent performance?
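The "4x" figure lines up with the architectural state: AVX-512 doubles both the register width and the number of vector registers. A quick tally under the standard x86-64 register counts (architectural state only; the physical register file's entry count is a design choice, so the real area cost may differ):

```python
# Architectural vector register state, AVX2 vs AVX-512 (x86-64).
# AVX2: 16 YMM registers x 256 bits.
# AVX-512: 32 ZMM registers x 512 bits, plus 8 opmask registers (k0-k7) x 64 bits.

def reg_bytes(count: int, bits: int) -> int:
    """Total bytes of state for `count` registers of `bits` width."""
    return count * bits // 8


avx2 = reg_bytes(16, 256)
avx512 = reg_bytes(32, 512)
avx512_masks = reg_bytes(8, 64)

print(f"AVX2 vector state:    {avx2} B")
print(f"AVX-512 vector state: {avx512} B (+{avx512_masks} B of mask registers)")
print(f"ratio (vectors only): {avx512 // avx2}x")
```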

I think Zen 4 is gonna be like the Zen 1->2 transition: Zen 3 made more capable in the FP department, with a lot of those increased resources benefiting IPC all around, but I don't expect more execution ports or a wider core.