Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).

[Attached image: Untitled2.png]


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
Huh. If that's true then that would explain why AMD has delayed Raphael for so long. They're not sharing CCDs with Genoa anymore. They're possibly sharing a CCD layout with Bergamo.
And also why Threadripper is moving to Threadripper Pro, if we can expect 32 cores on a mainstream platform.
 

deasd

Senior member
Dec 31, 2013
520
761
136
Just wow. That being said, consumer Zen4 = Zen4C? With V-Cache? 16C per CCD? Now I'm sure low-end Zen4 won't come out for a long time... It will also be interesting to see how the Windows scheduler recognizes these low-TDP cores, or whether it just treats them like SMT, since the P-cores and low-TDP cores are the same Zen4 architecture.

edit: alright, it's WCCFtech... I'll take it with a grain of salt
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Just wow. That being said, consumer Zen4 = Zen4C? With V-Cache? 16C per CCD? Now I'm sure low-end Zen4 won't come out for a long time... It will also be interesting to see how the Windows scheduler recognizes these low-TDP cores, or whether it just treats them like SMT, since the P-cores and low-TDP cores are the same Zen4 architecture.

edit: alright, it's WCCFtech... I'll take it with a grain of salt
Huh. If that's true then that would explain why AMD has delayed Raphael for so long. They're not sharing CCDs with Genoa anymore. They're possibly sharing a CCD layout with Bergamo.
It could be Genoa CCDs, Bergamo CCDs, or its own CCDs for Raphael. However, it is definitely feasible to have a 16-core die.

Windows would probably schedule the same way as before. It is basically no different from the half-die boosting on Hydra (6-core 10h) and Orochi (8-core 15h), or from the assorted boost fluctuation between cores caused by CPPC2 and the latest P-state driver.

Rembrandt: 8 cores at a 2.7 GHz base clock for 15-28 W.
~1.25x * 2.7 GHz = ~3.375 GHz

So all 16 cores would have a guaranteed P1 state (2x <30 W) of ~3.375 GHz when all are active, while the 8 cores at the edges of the die would have an extended-duration P0 state.

When TDP is flatly scaled across all cores, i.e. constrained TDP (ECO mode or whatever):
If low-task (desktop, standard power): burst work can be put on the edge cores since they boost very high, and lengthy work can be put on the center cores since they can't boost as high. Lengthy work is usually I/O- or memory-intensive, so boosting only generates heat.
If high-task (server, power virus): all cores would run at an average of the P1 state anyway.
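
The placement idea in the low-task case can be sketched as a toy model. Everything here is hypothetical: the core numbering, the `EDGE_CORES`/`CENTER_CORES` split and the `place` function are illustrative names only, and this is not how the Windows scheduler or the CPPC2 driver is actually implemented.

```python
# Toy model of the placement idea above; not real scheduler behaviour.
# Assumption: a 16-core die where cores 0-7 are "edge" cores (can hold P0
# for longer) and cores 8-15 are "center" cores (capped near the P1 state).
EDGE_CORES = list(range(0, 8))
CENTER_CORES = list(range(8, 16))


def place(tasks):
    """Assign each (name, kind) task to a core.

    kind is "burst" (short, boost-sensitive) or "lengthy" (long-running,
    typically I/O- or memory-bound). Burst work prefers edge cores and
    lengthy work prefers center cores; either kind spills over to the
    other group once its preferred group is full.
    """
    free_edge = list(EDGE_CORES)
    free_center = list(CENTER_CORES)
    placement = {}
    for name, kind in tasks:
        if kind == "burst":
            preferred, fallback = free_edge, free_center
        else:
            preferred, fallback = free_center, free_edge
        pool = preferred if preferred else fallback
        if not pool:
            raise RuntimeError("more tasks than cores in this toy model")
        placement[name] = pool.pop(0)
    return placement


if __name__ == "__main__":
    demo = [("compile-step", "burst"), ("backup", "lengthy"), ("ui-event", "burst")]
    print(place(demo))
```

In the high-task case every core is busy, so the preference stops mattering and the model degenerates to filling all 16 cores.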

I believe this was standard in the Windows Scheduler for AMD and Intel back when boosting was only applied to a few cores.

There is also the fact that GMI3 links have a higher effective transfer rate than GMI2:
GMI1 = 10.6 GT/s
GMI2 = 25 GT/s
GMI3 = 32 GT/s to 64+ GT/s
That does allow for another doubling of the core count, like Zen2/3 did.
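
As a rough check on the link rates above, the generation-to-generation ratios can be worked out directly. The `per_core_scaling` helper is an invented name, and treating link rate per core as a proxy for whether a CCD can double its core count is an assumption made for this sketch, not something the post states.

```python
# Transfer rates quoted in the post above, in GT/s.
GMI1 = 10.6
GMI2 = 25.0
GMI3_LOW, GMI3_HIGH = 32.0, 64.0

gmi2_over_gmi1 = GMI2 / GMI1
gmi3_low_over_gmi2 = GMI3_LOW / GMI2
gmi3_high_over_gmi2 = GMI3_HIGH / GMI2


def per_core_scaling(link_ratio, core_ratio=2.0):
    """Link rate per core relative to the previous generation.

    Assumption: the link is shared evenly by all cores on the CCD, so
    doubling the core count halves each core's share.
    """
    return link_ratio / core_ratio


print(f"GMI2 vs GMI1: {gmi2_over_gmi1:.2f}x")
print(f"GMI3 vs GMI2: {gmi3_low_over_gmi2:.2f}x to {gmi3_high_over_gmi2:.2f}x")
print(
    "Per core after doubling cores: "
    f"{per_core_scaling(gmi3_low_over_gmi2):.2f}x to "
    f"{per_core_scaling(gmi3_high_over_gmi2):.2f}x"
)
```

On these numbers, only the upper end of the GMI3 range keeps the per-core link rate at or above the GMI2 level after a doubling.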
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,773
3,152
136
I see a potential problem with hotspots if the chiplet is only 16 cores with no L3 between them.
Why? Hotspots are normally subcomponents within each core, e.g. FPU execution ports. How does having no L3 make any difference? Those 4 cores that are next to each other on a Zen 3 CCD would already be having problems if that were the case.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,361
2,848
106
Why? Hotspots are normally subcomponents within each core, e.g. FPU execution ports. How does having no L3 make any difference? Those 4 cores that are next to each other on a Zen 3 CCD would already be having problems if that were the case.
With Zen3, one core has at most two cores as neighbors; in Zen4 you have up to 4. Then there is also the smaller process: more heat generated over the same area.
This is just my uneducated opinion, I could be wrong.

If the efficiency cores have a combined TDP of 30 W and are placed centrally with the V-Cache on top, that should alleviate your concern.
Zen4 based Raphael shouldn't have two different cores, or did something change?
 

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,855
136
Zen4 based Raphael shouldn't have two different cores, or did something change?

They're the same cores, but with different voltage/clockspeed/temperature curves. The block of 8 interior cores is not meant to have a TDP higher than 30 W. Following AMD's usual TDP rules, that would mean the max core power draw for those 8 cores would be ~41 W.
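
The ~41 W figure is consistent with a 1.35x ratio between rated TDP and maximum package power, which is the ratio seen on AM4 desktop parts (65 W TDP with an 88 W limit, 105 W TDP with a 142 W limit). Reading "AMD's usual TDP rules" as that ratio, and applying it to a block of cores rather than a whole package, are both assumptions in this small sketch; `power_limit_watts` is a name made up here.

```python
# Assumption: the "usual TDP rules" mean power limit ~= 1.35 x TDP,
# as on AM4 desktop parts (65 W -> 88 W, 105 W -> 142 W).
PPT_RATIO = 1.35


def power_limit_watts(tdp_watts, ratio=PPT_RATIO):
    """Estimated maximum sustained power for a given rated TDP."""
    return tdp_watts * ratio


for tdp in (30, 65, 105):
    print(f"{tdp} W TDP -> about {power_limit_watts(tdp):.1f} W")
```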
 

jpiniero

Lifer
Oct 1, 2010
14,605
5,225
136
Huh. If that's true then that would explain why AMD has delayed Raphael for so long. They're not sharing CCDs with Genoa anymore. They're possibly sharing a CCD layout with Bergamo.

Nah, it's using the Zen 4 die. However, they might release a product which includes a Zen 4c die later in the lifecycle.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Huh. If that's true then that would explain why AMD has delayed Raphael for so long. They're not sharing CCDs with Genoa anymore. They're possibly sharing a CCD layout with Bergamo.

I mean, after certain AMD statements, it would be rather funny if they went with the hybrid approach Intel is going with. Not funny for Intel, however. 32C/64T on the high-end parts would make Raptor Lake DOA.

That approach also makes more sense than Intel's 'big core' 'small core' approach.

I will believe it when I see it, however. Last I heard, the tooling was not yet available for 5nm 3D V-Cache.
 

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,855
136
Nah, it's using the Zen 4 die. However, they might release a product which includes a Zen 4c die later in the lifecycle.

Guess we'll find out in 4-5 months or so.

I mean, after certain AMD statements, it would be rather funny if they went with the hybrid approach Intel is going with. Not funny for Intel, however. 32C/64T on the high-end parts would make Raptor Lake DOA.

That approach also makes more sense than Intel's 'big core' 'small core' approach.

I will believe it when I see it, however. Last I heard, the tooling was not yet available for 5nm 3D V-Cache.

It's just a rumour at this point. We don't know that AMD is really going to release such a product in Q3 2022 (July/August). Though it's not really a "hybrid" die if all the cores are identical. It's just a peculiar power management scheme.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Huh. If that's true then that would explain why AMD has delayed Raphael for so long. They're not sharing CCDs with Genoa anymore. They're possibly sharing a CCD layout with Bergamo.
Zen4 will have two types of chiplets: the 2D high-performance chiplets used in Genoa and some desktop models (the ones aimed at gaming, where 8 cores are enough), and the Bergamo/high-density 3D chiplets.

 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Zen4 will have two types of chiplets: the 2D high-performance chiplets used in Genoa and some desktop models (the ones aimed at gaming, where 8 cores are enough), and the Bergamo/high-density 3D chiplets.

Why do you think 3D V-Cache would be paired with Zen 4c, much less exclusively?
 

dnavas

Senior member
Feb 25, 2017
355
190
116
And also why Threadripper is moving to Threadripper Pro, if we can expect 32 cores on a mainstream platform.

HEDT needs more than just cores; there's the question of connectivity and memory bandwidth as well. Different use cases will require different memory bandwidth, but the move to DDR5 only increases bandwidth by ~50% until faster sticks come out, and Zen4 cores are getting faster (IPC and clockspeed). The result is that just to keep up with the existing 5950X, you'd need a 3rd primary memory slot (trying to keep away from "channel" because DDR5 is complicating that terminology).
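
The "3rd primary memory slot" estimate can be reproduced with a back-of-the-envelope calculation. This sketch assumes the comparison is a 16-core 5950X on two DDR4 memory slots against a hypothetical 32-core part on DDR5 at ~1.5x the bandwidth per slot, and it ignores the IPC and clockspeed gains mentioned above, which would push the requirement higher. `slots_needed` is an illustrative name.

```python
import math


def slots_needed(old_slots, old_cores, new_cores, per_slot_gain):
    """Primary memory slots needed to keep bandwidth per core unchanged.

    Bandwidth is measured in units of one old (DDR4) slot, so the result
    is old_slots scaled by the core-count ratio and divided by the
    per-slot bandwidth gain of the new memory.
    """
    old_bw_per_core = old_slots / old_cores
    required_bw = old_bw_per_core * new_cores
    return required_bw / per_slot_gain


exact = slots_needed(old_slots=2, old_cores=16, new_cores=32, per_slot_gain=1.5)
print(f"{exact:.2f} slots -> round up to {math.ceil(exact)}")
```

Two DDR5 slots would deliver only 3.0 units against the 4.0 needed, or 75% of the 5950X's bandwidth per core under these assumptions.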
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Wait, but how does that match with 3d v-cache in particular?


Where are you getting that Zen 4c is twice the density?

3D V-Cache/TSVs would work as a ring bus to connect all 16 cores as a single chiplet.


Zen4D was rumored by Moore's Law Is Dead to be a dense version of Zen4. Zen4D might just be a client version of Zen4c: a 16-core chiplet in the same die area as Zen4.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
3D V-Cache/TSVs would work as a ring bus to connect all 16 cores as a single chiplet.
Moving the fabric to the cache chiplet would be well beyond the scope of what AMD's done thus far. I'm not seeing the connection you're trying to make here.

Zen4D was rumored by Moore's Law Is Dead to be a dense version of Zen4. Zen4D might just be a client version of Zen4c: a 16-core chiplet in the same die area as Zen4.
Whatever you call it, a resynth of Zen 4 isn't going to halve the size.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
I don't believe that at all, because how does cache coherency work?
L3 Control + L2 Shadow Tags would still be on-die.

5nm goals are 2x density, 2x power efficiency, and >1.25x perf via:
- Process(+Standard Cells) from 7nm to 5nm
- Architectural improvements Zen3 to Zen4
- Power design improvements Zen3 to Zen4
etc.

How Zen4 can achieve the shrink:
6T 2-fin on 7nm, to 6T 2-fin and ~5T 1-fin (or just ~5T) on 5nm
6T 2-fin = 1.8x logic density
5T 1-fin = 3.2x logic density
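
One way to read these density figures against the 2x density goal: 2.0x sits between 1.8x and 3.2x, so a mix of the two cell libraries could in principle reach it. The sketch below assumes area scales inversely with logic density and that the core's logic can be split cleanly between the two libraries. Both are simplifications, and `dense_fraction` is a name invented for illustration.

```python
def dense_fraction(target_density, low_density=1.8, high_density=3.2):
    """Fraction of the original (7nm) logic area that would have to move
    to the denser library for the overall shrink to hit target_density.

    Model: new_area = A * (f / high + (1 - f) / low) = A / target,
    solved for f.
    """
    inv_low = 1 / low_density
    inv_high = 1 / high_density
    inv_target = 1 / target_density
    return (inv_low - inv_target) / (inv_low - inv_high)


f = dense_fraction(2.0)
print(f"{f:.1%} of the 7nm logic area in the 3.2x library")
```

Under this model roughly a quarter of the logic would need the denser cells to reach a 2x shrink overall.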

Zen3 = ~3.25 mm2
Zen4 = ~1.625 mm2
The biggest area increases would come from a 1 MB or 2 MB L2.
Zen3 L2 (512 KB) = 0.8 mm2; x2 and x4 give 1.6 mm2 for 1 MB L2 and 3.2 mm2 for 2 MB L2
With the TSMC 5nm SRAM shrink (x0.8): 1.28 mm2 for 1 MB L2 and 2.56 mm2 for 2 MB L2

Zen3 Core+L2 = 3.25+0.8 = 4.05 mm2
Zen4 Core+L2 = 1.625+1.28 = 2.905 mm2 or 1.625+2.56 = 4.185 mm2
If someone wants to do L3 control and L2 shadow tags to compare, I hand off the rest to you.

2.46 mm2 is the area that the cores have to fit in, which only a half-sized core + a 0.8x-scaled 512 KB L2 can technically fit: 1.625 + 0.8*0.8 = 2.265 mm2.
That is if it is using the N7/N6 SRAM TSV placement. However, if Bergamo with N5 SRAM has a different TSV placement, then it can fit more.

Or, if it is a differently TSV-placed N7/N6 die:
[Attached image: sram.jpg]
Moving the CCD signal interfaces to the edges, rather than having them awkwardly in the middle, gives ~5.125 mm2, allowing a 2 MB L2 + half-sized core at 4.185 mm2 to fit.
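
The area arithmetic in this post can be re-run in a few lines. All figures are the approximate mm2 values quoted above; the "half-sized core" is the post's own assumption, and the function and dictionary names are made up for this sketch.

```python
# Approximate figures from the post above, all in mm^2.
ZEN3_CORE = 3.25
ZEN3_L2_512K = 0.8
SRAM_SHRINK = 0.8           # 5nm SRAM area relative to 7nm, per the post
ZEN4_CORE = ZEN3_CORE / 2   # the post's "half-sized core" assumption

# Area each core plus its L2 has to fit in, for the two layouts discussed.
BUDGETS = {
    "current TSV placement": 2.46,
    "interfaces moved to edges": 5.125,
}


def zen4_core_plus_l2(l2_kib):
    """Estimated Zen4 core + L2 area for a given L2 size in KB."""
    l2_area = ZEN3_L2_512K * (l2_kib / 512) * SRAM_SHRINK
    return ZEN4_CORE + l2_area


def fits(l2_kib, budget):
    return zen4_core_plus_l2(l2_kib) <= budget


for label, budget in BUDGETS.items():
    for l2_kib in (512, 1024, 2048):
        verdict = "fits" if fits(l2_kib, budget) else "does not fit"
        area = zen4_core_plus_l2(l2_kib)
        print(f"{label}, {l2_kib} KB L2: {area:.3f} mm^2 -> {verdict} in {budget} mm^2")
```

This reproduces the 2.265, 2.905 and 4.185 mm2 figures, with only the 512 KB case fitting in 2.46 mm2 and all three fitting in ~5.125 mm2.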

Core <-> L2 <-> L3 Ctl&L2St(both cores) <-> L3D Interface <-> L2 <-> Core && Reflection

Looking at LinkedIn:
7nm-X3D = 32B/c
5nm-X3D = 64B/c (2x32B)
3nm-X3D = 96B/c (2x48B/3x32B)
for a single X3D interface. So there appear to be changes each generation: 7nm -> 5nm -> 3nm.
 