Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Yes but peak throughput means very little without utilization. Intel went to 356 for the regfile and IPC stayed lower than Zen3, so...
If anything, the fact that AMD stayed conservative with Zen3 while achieving a significant IPC increase points to plenty of low-hanging fruit that can be improved in Zen4...
I mean... that's exactly what he's talking about?
 

naukkis

Senior member
Jun 5, 2002
701
569
136
IIRC it was Papermaster who mentioned that AMD was not throwing silicon at the problem and was being conservative with die-size increases for Zen3 to achieve the 19% IPC gain, which is why I doubt AMD would throw tens of mm² of die area at cache for a questionable lead across the board, save for enterprise loads.

Point was that increasing structure sizes will increase performance but with reduced perf/watt. Increasing cache sizes won't decrease but increase perf/watt.

Intel is at the opposite end now. They also didn't employ structures that decreased perf/watt previously, but they seem to have lost it now. They make everything as big as they can, and the end product is fast at ridiculous power levels; it's pretty much Netburst part two now.
 

AMDK11

Senior member
Jul 15, 2019
205
136
116
I didn't even say that, but look at what they changed and how they changed it. Did:

peak L/S bandwidth increase? No
peak ALU throughput increase? No
peak FPU throughput increase? No
peak dispatch/issue/retire increase? No
peak decode throughput increase? No
the number of register file ports increase? No
the number of FP register file ports increase? No
the floor plan change? No
internal structures increase massively in size (reg, L/S queue, dispatch queue, etc.)? No
the pipeline change? No


Get the point? Zen3 is bound by the scope of Zen1, and Zen1 was designed with a big enough fundamental base through the core that large gen-on-gen improvements could happen.

Now compare that list with Willow Cove to Golden Cove, or Tremont to Gracemont.

I'm saying I think Zen4 won't be bound by the scope of Zen3, in the same way that Zen wasn't bound by Bulldozer, yet lots of things came straight from it (like the FPU still supporting XOP). So I expect fundamental changes in fetch/decode, execution, and retirement/memory access for Zen 4.

To go to your metaphor, go stick a 560HP 6L engine from an Enzo in a Golf GTI and see how well it works; they're both cars, they both have 4 wheels.
The key part of the microarchitecture is the logic and the algorithms implemented in it that control the core resources. This key part is a closely guarded secret at both AMD and Intel. In Zen 3, the logic and algorithms controlling the core resources have largely been replaced by completely new, more complex ones. Zen 2 already brought a new predictor, and Zen 3 has a completely new predictor again, among other changes.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Point was that increasing structure sizes will increase performance but with reduced perf/watt. Increasing cache sizes won't decrease but increase perf/watt.

Intel is at the opposite end now. They also didn't employ structures that decreased perf/watt previously, but they seem to have lost it now. They make everything as big as they can, and the end product is fast at ridiculous power levels; it's pretty much Netburst part two now.

Having the smaller, high efficiency cores present, Intel doesn't NEED to avoid structures that decrease the performance per watt. If they need to be efficient, they move threads to the Monts. If they need maximum performance, they throw the threads at the Coves, power be darned!
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Not sure if there were any rumors/confirmation of which team is working on Zen 4, but given that Zen 3 was a ground-up rebuild and Zen 4 is also likely to be a heavy lift, I would imagine that Zen 3 and Zen 4 are designed by two different teams.
You would be correct with this assumption.
 

naukkis

Senior member
Jun 5, 2002
701
569
136
Having the smaller, high efficiency cores present, Intel doesn't NEED to avoid structures that decrease the performance per watt. If they need to be efficient, they move threads to the Monts. If they need maximum performance, they throw the threads at the Coves, power be darned!

But what about consequences? They produce a huge die which needs expensive motherboards and power supplies to power it, and for what? Maybe 10% or less more performance than rival designs. That's pure insanity.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
But what about consequences? They produce a huge die which needs expensive motherboards and power supplies to power it, and for what? Maybe 10% or less more performance than rival designs. That's pure insanity.
It is insanity. But it's also the corner Intel itself backed into by having to match the frequencies it achieved on 14nm as well as trying to match the core count of the competition. We need to remember Alder Lake is just the first real reaction to the renewed competitive market environment, still heavily influenced by the years of BK and RS.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
But what about consequences? They produce a huge die which needs expensive motherboards and power supplies to power it, and for what? Maybe 10% or less more performance than rival designs. That's pure insanity.
What the above poster said, combined with their marketing need to win both single threaded benchmarks AND multi-threaded benchmarks, while having a process and core combination that doesn't allow them to do both of those things with just one core type. It's even harder for them now without the crutch of having an "exclusive" instruction set that enables a handful of big, easy wins to lean on for their composite scores.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
Some of that like core-on-core stacking almost seems like it would be infeasible due to issues with heat dissipation. I suppose it could work if you had a very clever layout and employed rotation of some sort so that the problem areas don't overlap or if the clock speeds were intentionally kept low enough where nothing gets too hot, but that does potentially preclude it from some products.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
Some of that like core-on-core stacking almost seems like it would be infeasible due to issues with heat dissipation. I suppose it could work if you had a very clever layout and employed rotation of some sort so that the problem areas don't overlap or if the clock speeds were intentionally kept low enough where nothing gets too hot, but that does potentially preclude it from some products.

You're already getting some of that with GAA.
 

zir_blazer

Golden Member
Jun 6, 2013
1,160
400
136
All of the biggliest details from the Gigabyte leak for Zen 4:

The last table, where they are talking about PCIe Lanes per Processor, mentions that Type 1 Model 44h has 2 USB 4 Ports built-in. Type 3 Model 70h too, which I didn't notice before. I don't recall that being mentioned elsewhere.

I wasn't expecting built-in USB 4 support at all. Intel disappointed me because it will only be included for certain Mobile dies, since the Desktop ones seem to require an external controller. It seems that both AMD and Intel are now at feature parity.
 

Thibsie

Senior member
Apr 25, 2017
727
752
136
Some of that like core-on-core stacking almost seems like it would be infeasible due to issues with heat dissipation. I suppose it could work if you had a very clever layout and employed rotation of some sort so that the problem areas don't overlap or if the clock speeds were intentionally kept low enough where nothing gets too hot, but that does potentially preclude it from some products.

Stack big cores on top of more low-power utilitarian cores?
 

Joe NYC

Golden Member
Jun 26, 2021
1,893
2,191
106
Stack big cores on top of more low-power utilitarian cores?

I wonder if a more fruitful approach would be just a fast interconnect at the intersection of the dies.

Suppose there are 2 rectangular dies, and instead of stacking them on top of each other, the dies instead form an "L".

This way, nearly all of the surface area is eligible for contact with heatsink.

Looking at a package such as Milan, the area of the MCM is perhaps more than 3x the combined area of the dies. So there is plenty of space, even in Ryzen desktop chips.

In your example, the low power and high power don't have to be on top of each other for better performance (unlike, say L3). They just benefit from a fast link. So the small dies could be the horizontal part of the "L" and big cores the vertical part of the "L" and both would have heatsink on top of most of their area.
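
To put rough numbers on the "L" idea, here is a minimal Python sketch. It only compares how much of the total die area can touch the heatsink directly when one die fully covers the other versus when they overlap in a narrow strip. The die sizes and the 2 mm overlap are made-up values for illustration, not figures for any real product, and `exposed_fraction` is a hypothetical helper.

```python
def exposed_fraction(die_a, die_b, overlap_area):
    """Fraction of total die area that can touch the heatsink directly.

    die_a, die_b: (width_mm, height_mm) tuples; hypothetical sizes.
    overlap_area: area in mm^2 where one die sits on top of the other.
    The covered part of the lower die is counted as not exposed.
    """
    area_a = die_a[0] * die_a[1]
    area_b = die_b[0] * die_b[1]
    if overlap_area < 0 or overlap_area > min(area_a, area_b):
        raise ValueError("overlap must be between 0 and the smaller die area")
    total = area_a + area_b
    return (total - overlap_area) / total


big = (10.0, 8.0)    # hypothetical big-core die, mm
small = (10.0, 8.0)  # hypothetical small-core die, mm

# Full stacking: the upper die covers the whole lower die (80 mm^2).
full_stack = exposed_fraction(big, small, overlap_area=80.0)

# "L" arrangement: the dies overlap only in an assumed 2 mm x 8 mm strip.
l_shape = exposed_fraction(big, small, overlap_area=2.0 * 8.0)

print(f"fully stacked: {full_stack:.0%} of die area exposed")
print(f"L arrangement: {l_shape:.0%} of die area exposed")
```

With these assumed numbers, full stacking leaves half of the silicon area under another die, while the "L" keeps most of it in direct contact with the heatsink. It says nothing about link bandwidth or latency, which is the other half of the trade-off.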
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
All of the biggliest details from the Gigabyte leak for Zen 4:

GenZ and CXL in one go, awesome. SIs will be mighty pleased. NVDIMM-P seems supported as was in the original roadmap a long time ago.
The next gen servers from Intel and AMD would be transformative

Very sad that info has to come this way instead of being willingly shared by AMD (like, for example, Intel Architecture Day or previous Horizon events).
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Seems the Zen4 APUs will support DP2.0, as per the block diagram for AM5 IO, which is not currently supported by RDNA2.
It could be a new update of DCN for RDNA2 iGPUs for which the support was planned to be added to Linux in this patch

 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
There are solutions being studied for cooling 3D stacked logic, i.e. by using a fluid inside the silicon stack.

That sounds interesting but difficult in reality, since you need to use something that isn't going to expand too much at the sizes being used, on top of a whole lot of other issues you'd run into at that level.

If you've got bottled anything it wants to expand as it heats up and that exerts more pressure on the container which needs to be strong enough to withstand that but also small enough to fit in a chip between the layers without ballooning the size too much or causing damage to the other layers.
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
That sounds interesting but difficult in reality, since you need to use something that isn't going to expand too much at the sizes being used, on top of a whole lot of other issues you'd run into at that level.

If you've got bottled anything it wants to expand as it heats up and that exerts more pressure on the container which needs to be strong enough to withstand that but also small enough to fit in a chip between the layers without ballooning the size too much or causing damage to the other layers.

Well, this is why it is still under study and not already implemented (btw, if you keep the fluid flowing and without changing state, the fluid itself is not a problem; it is more a problem of possible CTE mismatches between the silicon and the material used for containing the fluid).
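
As a rough illustration of what a CTE mismatch means, here is a small Python sketch using the first-order estimate strain = (alpha_container - alpha_silicon) x delta_T. The silicon value of about 2.6 ppm/K is the commonly quoted room-temperature figure; the container value and the 60 K temperature swing are assumptions picked for the example, and `mismatch_strain` is a hypothetical helper, not something from an actual cooling study.

```python
def mismatch_strain(alpha_a_per_k, alpha_b_per_k, delta_t_k):
    """First-order thermal mismatch strain between two bonded materials.

    alpha_a_per_k, alpha_b_per_k: linear CTEs in 1/K.
    delta_t_k: temperature change in kelvin.
    Ignores stiffness, geometry, and temperature dependence of the CTEs.
    """
    return (alpha_a_per_k - alpha_b_per_k) * delta_t_k


ALPHA_SILICON = 2.6e-6     # 1/K, commonly quoted value near room temperature
ALPHA_CONTAINER = 17.0e-6  # 1/K, assumed value for a metal-like channel liner
DELTA_T = 60.0             # K, assumed swing between idle and load

strain = mismatch_strain(ALPHA_CONTAINER, ALPHA_SILICON, DELTA_T)
print(f"mismatch strain: {strain * 1e6:.0f} microstrain")
```

The closer the two coefficients are, the smaller the strain for the same temperature swing, which is why the choice of containment material matters more than the fluid itself in this simplified picture.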
 