Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", which I think will likely double to 64 MB).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

Thibsie

Senior member
Apr 25, 2017
751
807
136
Was watching Gabe Loh's talk:

It's mostly just a recap of the decisions behind going chiplets (nothing new, really), but what stood out to me regarding our current discussion was this particular slide:
View attachment 55776

The same numbers as are being used now for 5nm! And it's an "or", which looks to me like an internal target to hit before picking up a new node in HVM.

I guess the middle road is possible too?
 

mv2devnull

Golden Member
Apr 13, 2010
1,498
144
106
It would certainly make sense in cooler climates to use that energy for heating in the winter, if you have a way to get it there (i.e., steam tunnels or the like). If you could site a datacenter near something that needs huge quantities of hot water (imagine the world's biggest laundromat), that would be perfect.
EuroHPC system "LUMI" in Kajaani, Finland (AMD Milan/Trento and MI250X GPUs) is supposed to heat 20% of the town. https://www.lumi-supercomputer.eu/t...sc-and-loiste-lampo-have-signed-an-agreement/
(The population of that area is under 40k, so not that many houses to heat.) The yearly average temperature in that region is +4 °C (about 39 °F).
 

MadRat

Lifer
Oct 14, 1999
11,910
239
106
Would they use non-power-of-two groups of memory channels for some ulterior reason? When addressing memory-channel calls, perhaps channels 0-3 are mapped out internally while channels 4-15 are the actual memory channels?
 
Last edited:

Abwx

Lifer
Apr 2, 2011
10,971
3,526
136
Was watching Gabe Loh's talk:

It's mostly just a recap of the decisions behind going chiplets (nothing new, really), but what stood out to me regarding our current discussion was this particular slide:
View attachment 55776

The same numbers as are being used now for 5nm! And it's an "or", which looks to me like an internal target to hit before picking up a new node in HVM.

Actually this even gives us (roughly) the power-vs-frequency curve of the process.

If 0.5x power at iso-frequency and 1x power at 1.25x frequency are points on the same curve, then raising frequency by 1.25x doubles the power.

The power-vs-frequency curve is then of the form:

P(f) ∝ f^(ln 2 / ln 1.25) ≈ f^3.11

That is, power increases roughly as the cube of frequency, at least in the curve segment considered by TSMC, which is surely at the frequencies of interest, so quite high.

At lower frequencies the process's power-vs-frequency curve will, as usual, be closer to a square law.
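A quick sanity check of that exponent in Python (a minimal sketch; the 2x/1.25x figures come from the slide discussed above, the rest is just arithmetic):

```python
import math

# Exponent k such that a 1.25x frequency increase doubles power: 1.25^k = 2.
k = math.log(2) / math.log(1.25)
print(f"k = {k:.2f}")  # ~3.11 -> power grows roughly as the cube of frequency

def relative_power(f_ratio: float) -> float:
    """Relative power draw for a relative frequency, assuming P ∝ f^k."""
    return f_ratio ** k

# TSMC's two claims as points on the same curve (relative to the old node):
print(relative_power(1.25))        # ~2.0: +25% frequency costs 2x power...
print(0.5 * relative_power(1.25))  # ~1.0: ...which the 2x efficiency gain cancels out
```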
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
I guess that is about all we are gonna get. Thanks for the link, mate.

But this is the main takeaway.


AMD's N7 → N5 transition is not the same as advertised by TSMC. They are coming from a much worse efficiency and density standpoint compared to vanilla N7 (theirs being, of course, highly optimized for perf). AMD's N7 as used by them is not the same as TSMC's stock N7 offered to others. Neither is the N5 as they use it.

This is the reason I have interpreted Lisa's sentences exactly the way she said them: 2x density, 2x efficiency and 1.25x perf.
However, the 2x efficiency gain will get nullified by almost 2x more active silicon per CCD, so it will be a wash. But as said, we shall find out.
They shared some bits and pieces about their N7 optimizations when Zen 2/3 launched, and I guess this time they will for Zen 4 too.
I believe that most folks here (including me) were implying that.

AMD likely made several changes to N7 in order to hit high clock speeds. Some of those changes are likely unnecessary on N5 due to its lower power consumption, and as their engineers have had more time to work with TSMC, they have likely iterated on the others.

For this reason, I would be surprised and disappointed if we do not see a significant performance uplift with Zen 4.
 

moinmoin

Diamond Member
Jun 1, 2017
4,956
7,676
136
I guess the middle road is possible too?
Referring to the "or" statement, you mean? I think both still apply, just not at the same time but at different points of the frequency curve: at the most efficient point it's the same performance at half the power consumption, and at the highest possible frequency it's +25% performance (both assuming the same core/IPC spec). Or something like that.
 
  • Love
  • Like
Reactions: Joe NYC and Thibsie

Doug S

Platinum Member
Feb 8, 2020
2,269
3,523
136
If you mean offshore wind then yeah, I suppose - but you still need to connect to comms, so infrastructure is lacking regardless, unless it's all done by point-to-point microwave at ground level, or ground to satellite.

There's also tidal-based and high-altitude wind power.


Uh... you already have an armored undersea cable running to shore to get the power off the turbine, and I would guess there's approximately 0% chance the cabling doesn't already include a few fibers for monitoring, and in case someone wants to pay them to install a 5G antenna halfway up the mast. Networking is the easiest part of this scheme, because it's already there. At most they'd need to install a DWDM module at each end to upgrade the bandwidth to meet datacenter-level requirements.

The networking is definitely much easier than installing, servicing, or upgrading the datacenter, or even making changes to the power delivery to allow power feeding from shore to the windmill when it is calm - unless you only want your datacenter to operate when the wind blows!
 

Mopetar

Diamond Member
Jan 31, 2011
7,850
6,015
136
It would appear to be. I don't know if the Zen 4 chiplet die size was ever confirmed, but it's probably around the same size as that of Zen 3.

Of course it can be misleading, since Zen chiplets can be paired and they need to attach to an IO die regardless of whether it's a one- or two-chiplet CPU.
 

Saylick

Diamond Member
Sep 10, 2012
3,172
6,414
136
Size comparison per SoC/CCD is not meaningful. Do it per core.
An SoC has way too many things inside besides the cores.
Unfortunately, I don't think anyone here knows the exact size of Zen 4 cores...

Given the following:
1) The L3 cache is the same size (32 MB)
2) Cache and I/O don't shrink as well as logic
3) The Zen 4 CCD is smaller than the Zen 3 CCD
it's likely that Zen 4 cores take up less area than Zen 3 cores (rough sketch below).
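To put numbers on that reasoning, a back-of-envelope in Python (the CCD areas are die-shot estimates quoted later in this thread; the L3 area and its shrink factor are guesses, so treat this as illustrative only):

```python
# Back-of-envelope: if the 32 MB L3 barely shrinks, the rest of the CCD
# (cores included) must shrink for the whole die to get smaller.

zen3_ccd = 83.7   # mm^2, Zen 3 CCD (die-shot estimate from this thread)
zen4_ccd = 72.0   # mm^2, Zen 4 CCD (die-shot estimate from this thread)

zen3_l3   = 16.0  # mm^2 for the 32 MB L3 on N7 -- a guess, not a measurement
l3_shrink = 0.8   # cache shrinks worse than logic; 20% is an assumption

zen4_l3   = zen3_l3 * l3_shrink
zen3_rest = zen3_ccd - zen3_l3   # cores + everything else, per CCD
zen4_rest = zen4_ccd - zen4_l3

print(f"Zen 3 non-L3 area: {zen3_rest:.1f} mm^2")  # ~67.7
print(f"Zen 4 non-L3 area: {zen4_rest:.1f} mm^2")  # ~59.2
print(f"ratio: {zen4_rest / zen3_rest:.2f}")       # <1.0 -> cores got smaller
```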
 

Ajay

Lifer
Jan 8, 2001
15,469
7,875
136
Difficult to say much about those without running some simulations for these cutting-edge processes/SoCs, and I bet nobody is going to share such things, for obvious reasons.
Historically you'd be calculating power as P ∝ C·V²·f. But f is also a nonlinear function of V. Reducing V sounds great, except you have to raise I to drive higher switching frequency, leading to higher I²R losses.
So efficiency at the device level is not going to show the complete picture.
Optimization of the metal layers and PD network is more and more critical. Accurate parasitic extraction would be hell for these processes.
At the SoC level, things like interconnect efficiency play a big role in compute efficiency due to the energy used during data movement.

Great response. This addresses my essential point (poorly made): that we don't distinguish between device physics and semiconductor component performance like we used to. Some of this is because the fabs themselves appear to be publicly defining process metrics based on standard test sleds, which are ARM SoCs these days. The increasing dominance of I²R power losses (high current, increased resistance) is really pumping the brakes on realized density as the metal channel cross-sections get smaller and smaller. Expertise in tools for waveguide design and simulation must be a great-paying job in cutting-edge semi right now!

We haven't seen much in the way of 'deep dives' into xtor device physics since Intel's 22nm FinFET, which set the direction for xtor design for a decade. I hope we get more info as GAA designs come out - Intel is more likely to make disclosures public (at least to sites like semiengineering.com) on some of the physics behind their new xtors. That'll be a bit of a wait for Intel's 20A node, though.
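To make the quoted P ∝ C·V²·f point concrete, here's a toy model in Python (the alpha-power law for f(V) is a standard textbook approximation, and every parameter value below is made up for illustration, not anything AMD or TSMC has published):

```python
# Toy dynamic-power model: P = C * V^2 * f, where f itself depends on V
# via the alpha-power law, f ∝ (V - Vt)^alpha / V. All values are made up.

C = 1.0       # normalized switched capacitance
VT = 0.35     # threshold voltage in volts -- an illustrative guess
ALPHA = 1.3   # velocity-saturation exponent, typically ~1.2-1.5 -- a guess

def max_freq(v: float) -> float:
    """Highest sustainable frequency at supply voltage v (normalized units)."""
    return (v - VT) ** ALPHA / v

def dynamic_power(v: float) -> float:
    """Dynamic power when running at the maximum frequency for voltage v."""
    return C * v ** 2 * max_freq(v)

for v in (0.7, 0.9, 1.1):
    f, p = max_freq(v), dynamic_power(v)
    print(f"V={v:.1f}  f={f:.3f}  P={p:.3f}  energy/cycle={p / f:.3f}")
# Energy per cycle (P/f) scales as C*V^2, so frequency gained by raising the
# supply voltage costs disproportionately more power -- the nonlinearity the
# quoted post is pointing at.
```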
 

Ajay

Lifer
Jan 8, 2001
15,469
7,875
136
Is this size comparison accurate?

View attachment 55790
As @DisEnchantment pointed out, this is an apples-to-oranges comparison.
In Alder Lake, you have an SoC with integrated graphics on board, compute cores, and the uncore (I/O, memory interface, etc.).
In Zen, you have chiplets (CCDs) containing the compute cores, almost all of the uncore on a separate chip (the IOD), and no integrated graphics with Zen 3.
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,019
136
Great response. This addresses my essential point (poorly made): that we don't distinguish between device physics and semiconductor component performance like we used to. Some of this is because the fabs themselves appear to be publicly defining process metrics based on standard test sleds, which are ARM SoCs these days. The increasing dominance of I²R power losses (high current, increased resistance) is really pumping the brakes on realized density as the metal channel cross-sections get smaller and smaller. Expertise in tools for waveguide design and simulation must be a great-paying job in cutting-edge semi right now!

We haven't seen much in the way of 'deep dives' into xtor device physics since Intel's 22nm FinFET, which set the direction for xtor design for a decade. I hope we get more info as GAA designs come out - Intel is more likely to make disclosures public (at least to sites like semiengineering.com) on some of the physics behind their new xtors. That'll be a bit of a wait for Intel's 20A node, though.

The interconnects in these digital circuits are so short that you don't consider them waveguides. Maybe when you get to the periphery, with longer connection lines and having to go off-chip, you start to consider this, but the IO frequencies are much lower than inside the cores themselves. Increased interconnect resistance and coupling capacitance in high-density circuits is certainly a concern, though, and is a big part of the R&D that goes into each node.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Size comparison per SoC/CCD is not meaningful. Do it per core.
An SoC has way too many things inside besides the cores.


AMD Zen 3 core on the Cezanne SoC: 6.4 mm2 with L3$, or 4.2 mm2 with L2$ only.





Intel Golden Cove core on the Alder Lake SoC: 7.04 mm2 with L2$, or 9.4 mm2 with L3$. The ring bus is huge compared to Zen 3's.


 

Ajay

Lifer
Jan 8, 2001
15,469
7,875
136
The interconnects for these digital circuits is so short that you don't consider them wave guides. Maybe when you get to the periphery with longer connection lines and having to go off chip you start to consider this but the IO frequencies are much lower than inside the cores themselves. Increased interconnect resistance and coupling capacitance of high density circuits is certainly a concern though and is a big part of the R&D that goes into each node.
Okay, my bad. Interesting that this is the case at such high frequencies. Yes, off-chip high-speed I/O is being designed with the help of E/M modeling and simulation software. I can't remember who is using what, but Ansys HFSS is an example of one such tool.
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,019
136
Okay, my bad. Interesting that this is the case at such high frequencies. Yes, off-chip high-speed I/O is being designed with the help of E/M modeling and simulation software. I can't remember who is using what, but Ansys HFSS is an example of one such tool.

HFSS is a full 3D EM simulator; you'll need that for modeling off-chip connections. For strictly on-chip or planar board-level simulations, you can use what is referred to as a 2.5D simulator.

There are several companies that offer 2.5D and/or full 3D simulators - Cadence, Siemens (Mentor), Microwave Office (bought by Cadence), ADS (bought by Keysight), and Sonnet, for example, all have EM simulators with varying degrees of integration with other tools/packages.
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,019
136
Okay, my bad. Interesting that this is the case at such high frequencies. Yes, off-chip high-speed I/O is being designed with the help of E/M modeling and simulation software. I can't remember who is using what, but Ansys HFSS is an example of one such tool.

Also, on the frequency comment, you have to consider how short these interconnects are compared to the wavelength of the signal. Even at 5 GHz, the wavelength in silicon is going to be ~18 mm. Now, to get a clean digital signal you need multiple odd harmonics to form the square wave; at those frequencies I'm sure the edges aren't perfectly square, but I don't know what their tolerances are. Anyway, even if we assume they need to capture the 9th harmonic, the wavelength is still ~2 mm. That is orders of magnitude longer than the interconnect length, so there's really no need to try to model the interconnects as waveguides - and thank goodness we don't have to.
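Checking that arithmetic (a minimal sketch; εr ≈ 11.7 for silicon is a standard textbook value, and λ = c / (f·√εr)):

```python
import math

C0 = 3e8       # speed of light in vacuum, m/s
EPS_R = 11.7   # relative permittivity of silicon (textbook value)

def wavelength_mm(freq_hz: float) -> float:
    """Signal wavelength inside a dielectric, in millimeters."""
    return C0 / (freq_hz * math.sqrt(EPS_R)) * 1e3

print(wavelength_mm(5e9))      # ~17.5 mm at the 5 GHz fundamental
print(wavelength_mm(9 * 5e9))  # ~1.9 mm at the 9th harmonic (45 GHz)
# Core-level interconnects are micrometers long -- orders of magnitude shorter
# than even the 9th-harmonic wavelength -- so treating them as lumped elements
# rather than waveguides is justified.
```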
 


nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Is this size comparison accurate?

View attachment 55790
Your numbers are off...

Zen 3's 83.736 mm2 is for the 8-core cluster with full-fat L3$ (the CCD die), but it's only 51.32 mm2 with half the L3$ on Cezanne (SoC), and the Alder Lake compute cluster (P cores) is 84.23 mm2 with full L3$. And according to the latest info we have (a die shot), full-fat Zen 4 with full L3$ is 72 mm2.
 
Last edited: