Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

tamz_msc

Diamond Member
Jan 5, 2017
Eh, gotta save that for later. Same with not announcing IPC or other performance-related metrics about the core itself. Can't cannibalize Milan-X sales.
Still, some info would have been nice. Workloads that benefit from AVX-512 tend to be bound by system memory bandwidth more than by cache, so Milan-X and Genoa/Bergamo target different applications.
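The memory-bandwidth point can be sketched with a simple roofline model: attainable throughput is the lower of the compute peak and (arithmetic intensity x memory bandwidth). A minimal Python sketch; the 4000 GFLOP/s peak and the stencil/GEMM intensities are hypothetical placeholders, not Zen 4 or Milan-X figures, and 350 GB/s is just the HBv3 STREAM TRIAD number from the Microsoft list quoted in this thread.

```python
def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable GFLOP/s = min(compute peak, arithmetic intensity [FLOP/byte] * bandwidth [GB/s])."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


PEAK_GFLOPS = 4000.0  # hypothetical socket vector peak
DRAM_GBS = 350.0      # STREAM TRIAD figure from the HBv3 list

kernels = {
    "triad a=b+s*c": 2 / 24,  # 2 FLOPs per 24 bytes of FP64 traffic (write-allocate ignored)
    "stencil (hypothetical)": 0.5,
    "dense GEMM (hypothetical)": 50.0,
}
ridge = ridge_point(PEAK_GFLOPS, DRAM_GBS)
for name, ai in kernels.items():
    bound = "memory" if ai < ridge else "compute"
    attainable = roofline_gflops(PEAK_GFLOPS, DRAM_GBS, ai)
    print(f"{name:26s} AI={ai:6.3f} -> {attainable:7.1f} GFLOP/s ({bound}-bound)")
```

Widening the vector units only raises the compute roof, so kernels to the left of the ridge point gain nothing from it; a cache-aware roofline would add extra bandwidth ceilings for L3, which is where a bigger cache like Milan-X's comes into play.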
 

BorisTheBlade82

Senior member
May 1, 2020
12 dies for 96 cores?
Yes, for sure. 8 cores per CCD has been pretty certain for a long time.
But regarding Bergamo: 8 CCDs x 16c (2 CCXs)?
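For reference, the core counts being discussed are simple products. A minimal sketch; the Genoa layout is the 12 dies x 8 cores stated above, the Bergamo layout is the speculated 8 CCDs with two 8-core CCXs each, and 2-way SMT is assumed for both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketConfig:
    name: str
    ccds: int
    ccx_per_ccd: int
    cores_per_ccx: int
    smt: int = 2  # assumption: 2-way SMT, as on current Zen parts

    @property
    def cores(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.smt


genoa = SocketConfig("Genoa", ccds=12, ccx_per_ccd=1, cores_per_ccx=8)
bergamo = SocketConfig("Bergamo (speculated)", ccds=8, ccx_per_ccd=2, cores_per_ccx=8)
for cfg in (genoa, bergamo):
    print(f"{cfg.name}: {cfg.cores} cores / {cfg.threads} threads")
```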

Most important update there


And they have the new Bridge interconnect too. I am guessing this will go to Zen4 products too.
I fear this is still only for GPUs. Also, there is no way the Genoa 12-CCD layout could be connected via silicon bridges, geometrically speaking 😉
Or were you talking about desktop/mobile SKUs?
 

leoneazzurro

Senior member
Jul 26, 2016
@Ajay

Yep, but to be fair, I wrote that before the AT article was out, just during the conference.
Btw, it must be said that these are nowhere near the claims TSMC itself makes about the jump from N7 to N5 (-30% power at iso-performance or +15% performance at iso-power IIRC, and 1.8x density instead of 2x). Sooo... yes, interesting times. If Zen4 manages to beat Golden Cove while still being 4-wide, it would be really funny...
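To compare process figures like these with an "x times power efficiency" claim: a power cut r at iso-performance corresponds to a perf/W gain of 1/(1 - r), while a speedup s at iso-power gives 1 + s. A minimal sketch; both a 30% and a 40% cut are evaluated, and the last line only shows what doubling efficiency would require, not any vendor's figure.

```python
def perf_per_watt_gain_iso_perf(power_reduction: float) -> float:
    """Perf/W multiplier when the same performance needs (1 - power_reduction) of the power."""
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    return 1.0 / (1.0 - power_reduction)


def perf_per_watt_gain_iso_power(speedup: float) -> float:
    """Perf/W multiplier when performance rises by `speedup` at the same power."""
    return 1.0 + speedup


for r in (0.30, 0.40):
    print(f"-{r:.0%} power at iso-performance -> {perf_per_watt_gain_iso_perf(r):.2f}x perf/W")
print(f"+15% performance at iso-power -> {perf_per_watt_gain_iso_power(0.15):.2f}x perf/W")

# Power cut needed at iso-performance to double perf/W:
print(f"2x perf/W at iso-performance needs a {1 - 1 / 2.0:.0%} power cut")
```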
 

leoneazzurro

Senior member
Jul 26, 2016
I'm wondering what you lose with Bergamo versus Genoa. If it's just fmax that's livable in a lot of cases given that all core turbo is probably not that high anyway.

Cache per core, and it will probably have stripped-down FP units, no AVX512, and a lower top frequency (these are targeted at cloud, where those matter much less than aggregate integer performance).
 

Kedas

Senior member
Dec 6, 2018
I'm wondering what you lose with Bergamo versus Genoa. If it's just fmax that's livable in a lot of cases given that all core turbo is probably not that high anyway.
The "c" stands for cache, I assume; maybe there is no L3 on the core die, only cache stacked on top of the L2.
Or maybe they are pulling some sort of two-cores-in-one trick again for 'high thread density': almost like SMT-4, but really more like 2x SMT-2 per 'core'.
 

leoneazzurro

Senior member
Jul 26, 2016
Once again, same ISA support.

Well, Golden Cove and Gracemont nominally have the same ISA too, except for the AVX512 part...
Where is it stated that Bergamo will support AVX512? Honest question: I just got home and am reading the AT piece, and there is only a slide with "Zen4 ISA", which could mean just about anything. Moreover, AVX512 is a fairly area- and power-hungry feature, and, as said, for cloud tasks IIRC it is mostly useless.
 

Abwx

Lifer
Apr 2, 2011
@Ajay

Yep, but to be fair, I wrote that before the AT article was out, just during the conference.
Btw, it must be said that these are nowhere near the claims TSMC itself makes about the jump from N7 to N5 (-30% power at iso-performance or +15% performance at iso-power IIRC, and 1.8x density instead of 2x). Sooo... yes, interesting times. If Zen4 manages to beat Golden Cove while still being 4-wide, it would be really funny...

Process used is N5P, so a little better than N5.

 

DisEnchantment

Golden Member
Mar 3, 2017
No confirmation of Gen-Z either
Gen-Z is in the PPR, so pretty much confirmed.

She did name-drop CXL though. RIP CCIX
CCIX lost when AMD joined CXL and the CXL and Gen-Z consortia signed an MoU.

I fear this is still only for GPUs. Also, there is no way the Genoa 12-CCD layout could be connected via silicon bridges, geometrically speaking
EFB is inside a fan-out package.
Some folks are saying the EFB bridges are derivatives of ASE/SPIL FOEB bridges. They look nothing like TSMC LSI bridges (which combine the bridge with polymer and RDL interconnect layers), which adds context to AMD saying they invested heavily in bringing up the packaging supply chain (from the last earnings-report Q&A).
That would greatly help on cost and capacity.

My guess here:
If they were to use the EFB for Genoa, the routing Milan does within the 14 layers of its substrate could be moved to the EFB/fan-out package.
So the 12 CCDs would connect the same way they connect to the substrate in Milan, except that the bumps make contact with the fan-out package instead.
Cu pillars route directly to the substrate, and the EFB could carry the chiplet-to-chiplet connections.

This should be much cheaper than a polymer-based RDL interposer with LSI, because the fan-out package can make use of really cheap filler material.
The EFB is scalable, as AMD said. AMD keeps its own OSAT supply chain and, most importantly, maintains its chip-last packaging strategy, working with KGDs (known good dies) only.
They could fab the EFB on some older lithography process.
The nice thing here is that the EFB package can be manufactured en masse using wafer-level fan-out, chip-first (but packaged with the CCDs chip-last).
Another important thing is that the fan-out is necessary for high thermal loads: it provides a middle layer to manage differing CTEs and prevent warping, which is what the pressure points on the SP3 carrier do.
 

turtile

Senior member
Aug 19, 2014
@Ajay

Yep, but to be fair, I wrote that before the AT article was out, just during the conference.
Btw, it must be said that these are nowhere near the claims TSMC itself makes about the jump from N7 to N5 (-30% power at iso-performance or +15% performance at iso-power IIRC, and 1.8x density instead of 2x). Sooo... yes, interesting times. If Zen4 manages to beat Golden Cove while still being 4-wide, it would be really funny...

There were rumors floating around a while ago that AMD was working with TSMC on a special 5nm process. If these numbers are true, it looks like the rumor was true.
 

leoneazzurro

Senior member
Jul 26, 2016
893
1,414
136
There are actually several other instructions that Gracemont doesn't support. Don't think you can interpret this as anything other than Bergamo having AVX-512.

And those instructions are quite probably disabled and reserved for Sapphire Rapids, since for the workload to be distributed properly, the cores' capabilities must be the same.

In fact, to enable AVX512 support on Alder Lake, the E-cores had to be disabled in the AT test.
Look, I am not saying for sure that Bergamo will not have AVX512, but it is very unlikely that a dense, cloud-optimized design would use such an area- and power-hungry feature when it is basically unused in the workloads those CPUs are optimized for.
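The ISA-parity argument boils down to a set intersection: if the OS may schedule any thread on any core, only the extensions every core supports can be exposed safely. A minimal sketch with made-up, heavily simplified feature lists (not real CPUID data); it shows why a mixed P/E package loses AVX-512, whereas a package of identical cores keeps whatever those cores implement.

```python
from functools import reduce


def common_isa(core_feature_sets: list[set[str]]) -> set[str]:
    """Extensions safe to expose when threads may migrate across all cores: the intersection."""
    if not core_feature_sets:
        return set()
    return reduce(lambda a, b: a & b, core_feature_sets)


# Hypothetical, simplified feature lists for illustration only.
P_CORE = {"SSE4.2", "AVX2", "FMA3", "AVX-512F"}
E_CORE = {"SSE4.2", "AVX2", "FMA3"}

hybrid = common_isa([P_CORE, E_CORE])   # mixed P/E package
homogeneous = common_isa([P_CORE] * 4)  # every core identical
print("hybrid:     ", sorted(hybrid))
print("homogeneous:", sorted(homogeneous))
```

Whether the identical cores implement AVX-512 in the first place is the separate area/power question being debated here; the intersection only explains the Alder Lake behaviour.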
 

yuri69

Senior member
Jul 16, 2013
330
473
136
I'm wondering what you lose with Bergamo versus Genoa. If it's just fmax that's livable in a lot of cases given that all core turbo is probably not that high anyway.
The Zen 4c slide mentioned "density-optimized cache hierarchy" and "significantly improved power efficiency" separately. This suggests the power efficiency being a big deal and separated from the cache.

What kind of tricks can you do to improve power efficiency for a cloud-oriented CPU? Adding moar cores (2x CCX)? Lowering clocks to fit the efficiency curve? Axing FP resources as much as possible, since cloud workloads are INTish? Aggressively optimizing DVFS based on workload profile?
 

moinmoin

Diamond Member
Jun 1, 2017
No confirmation of Gen-Z either :(
I expect CCIX and Gen-Z to be supported, but CXL supersedes both, so building new projects on either of them is not recommended.

Well, Golden Cove and Gracemont nominally have the same ISA too, except for the AVX512 part...
Where is it stated that Bergamo will support AVX512? Honest question: I just got home and am reading the AT piece, and there is only a slide with "Zen4 ISA", which could mean just about anything. Moreover, AVX512 is a fairly area- and power-hungry feature, and, as said, for cloud tasks IIRC it is mostly useless.
"Same "Zen 4" ISA."

Bergamo is not a hybrid package, so there is no need for "Zen 4c lacks blah, so Zen 4 has to have some parts disabled for ISA parity" shenanigans like the ones Intel pulled with Alder Lake.
 

Abwx

Lifer
Apr 2, 2011
10,589
3,058
136
Some info:

Microsoft has issued documentation for the Milan-X HBv3 VMs with the following performance projections and VM size details and technical overview:

  • Up to 80% higher performance for CFD workloads
  • Up to 60% higher performance for EDA RTL simulation workloads
  • Up to 50% higher performance for explicit finite element analysis workloads
  • Up to 120 AMD EPYC 7V73X CPU cores (EPYC with 3D V-cache, “Milan-X”)
  • Up to 96 MB L3 cache per CCD (3x larger than standard Milan CPUs, and 6x larger than “Rome” CPUs)
  • 350 GB/s DRAM bandwidth (STREAM TRIAD), up to 1.8x amplification (~630 GB/s effective bandwidth)
  • 448 GB RAM
  • 200 Gbps HDR InfiniBand (SRIOV), Mellanox ConnectX-6 NIC with Adaptive Routing
  • 2 x 900 GB NVMe SSD (3.5 GB/s (reads) and 1.5 GB/s (writes) per SSD, large block IO)
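The ratios in the list only check out when the 96 MB is counted per CCD: standard Milan has 32 MB of L3 per 8-core CCD, and Rome has 16 MB per 4-core CCX. A minimal sketch of that arithmetic plus the quoted bandwidth amplification; the 8-CCDs-per-socket figure is an assumption for a full 64-core part.

```python
MILAN_X_L3_PER_CCD_MB = 96  # 32 MB on-die + 64 MB stacked V-Cache
MILAN_L3_PER_CCD_MB = 32
ROME_L3_PER_CCX_MB = 16     # Rome shares L3 per 4-core CCX
CCDS_PER_SOCKET = 8         # assumption: full 64-core part


def effective_bandwidth_gbs(measured_gbs: float, amplification: float) -> float:
    """Effective bandwidth implied by the quoted 'amplification' factor over measured DRAM bandwidth."""
    return measured_gbs * amplification


print(f"vs Milan: {MILAN_X_L3_PER_CCD_MB / MILAN_L3_PER_CCD_MB:.0f}x")
print(f"vs Rome:  {MILAN_X_L3_PER_CCD_MB / ROME_L3_PER_CCX_MB:.0f}x")
print(f"L3 per socket: {MILAN_X_L3_PER_CCD_MB * CCDS_PER_SOCKET} MB")
print(f"Effective bandwidth: ~{effective_bandwidth_gbs(350.0, 1.8):.0f} GB/s")
```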

 

DisEnchantment

Golden Member
Mar 3, 2017
Sooo... yes, Interesting times. If Zen4 manages to beat Golden Cove when still being 4wide it would be really funny...
4-wide front end (but the uop cache helps a lot here).
Zen3's OoO back end is quite wide: 10-wide for int and 6-wide for FP.
From Chips&Cheese tests, the bottlenecks are the ROB, load/store, and rename register file, before front-end limits are reached.
Those are not challenging to address if they have the transistor budget.
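The "narrowest stage wins" reasoning can be sketched as a toy upper bound on sustained ops per cycle. Assumptions, labeled as such: the 8-op/cycle op-cache path and the 6-wide dispatch are illustrative values; only the 4-wide decode and the 10-wide integer back end come from the post. Ops are blended harmonically between the op-cache and decoder paths, because each path delivers at its own rate.

```python
def blended_frontend_width(op_cache_hit_rate: float, op_cache_width: float, decode_width: float) -> float:
    """Average ops/cycle when a fraction of ops comes from the op cache and the rest from the decoders.

    Cycles per op are averaged (harmonic blend), since each path delivers ops at its own rate.
    """
    cycles_per_op = op_cache_hit_rate / op_cache_width + (1 - op_cache_hit_rate) / decode_width
    return 1.0 / cycles_per_op


def throughput_bound(stage_widths: dict[str, float]) -> tuple[str, float]:
    """The narrowest stage caps sustained ops/cycle; return its name and width."""
    name = min(stage_widths, key=stage_widths.get)
    return name, stage_widths[name]


for hit_rate in (0.0, 0.5, 0.9):
    stages = {
        "front end": blended_frontend_width(hit_rate, op_cache_width=8, decode_width=4),
        "dispatch": 6,           # assumption
        "integer back end": 10,  # from the post
    }
    stage, width = throughput_bound(stages)
    print(f"op-cache hit rate {hit_rate:.0%}: bound {width:.2f} ops/cycle ({stage})")
```

Capacity limits such as ROB, load/store queue, and rename register file size, which the post names as the actual bottlenecks, are not widths and are not captured by this model.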

Btw, it must be said that these are nowhere near the claims TSMC itself makes about the jump from N7 to N5 (-30% power at iso-performance or +15% performance at iso-power IIRC, and 1.8x density instead of 2x).
TSMC's figures are correct for the N7-to-N5 transition, and I suppose AMD is not lying either.
AMD ended up with these numbers because they qualified the statement: "over the 7nm process we're using in today's products."
So it comes down to AMD having traded away too much density and efficiency on N7 to meet its frequency target.
Now, since their target (my guess is that AMD will target Zen4 around 5GHz or even less) is well within the optimal range on the shmoo plot, they suddenly don't have to trade away as much density and efficiency for frequency.
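The frequency-target argument can be illustrated with the classic dynamic-power model P = a * C * V^2 * f: if performance scales with clock, perf/W goes as 1/V^2, so backing off to the flatter part of the voltage/frequency curve pays off more than the clock difference alone suggests. A minimal sketch; the voltage/frequency pairs are hypothetical, and leakage and the density/library trade-offs mentioned in the post are ignored.

```python
def dynamic_power(c_eff: float, voltage: float, freq_ghz: float, activity: float = 1.0) -> float:
    """Classic CMOS dynamic power: P = activity * C * V^2 * f (arbitrary units)."""
    return activity * c_eff * voltage ** 2 * freq_ghz


# Hypothetical V/f points: each extra clock step needs a bigger voltage bump.
VF_POINTS = [(4.0, 0.95), (4.5, 1.05), (5.0, 1.20), (5.5, 1.40)]
C_EFF = 1.0  # normalized switched capacitance

base_f, base_v = VF_POINTS[0]
base_perf_per_watt = base_f / dynamic_power(C_EFF, base_v, base_f)
for f, v in VF_POINTS:
    power = dynamic_power(C_EFF, v, f)
    perf_per_watt = f / power  # assumes performance scales linearly with clock
    print(f"{f:.1f} GHz @ {v:.2f} V: power {power:5.2f}, "
          f"perf/W {perf_per_watt / base_perf_per_watt:.2f}x of the {base_f:.1f} GHz point")
```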

I think a single characteristic, higher device performance, lets them scale back all the other trade-offs. Things look saner now; Zen4 will be able to stretch its legs.
Looking at that process node slide, I am convinced they will beat both the Zen2 and Zen3 gains.
The MTr gain is likely to be spent wisely, given the general sentiment from AMD, Papermaster, and Clark.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Some info:

Microsoft has issued documentation for the Milan-X HBv3 VMs with the following performance projections and VM size details and technical overview:

  • Up to 80% higher performance for CFD workloads
  • Up to 60% higher performance for EDA RTL simulation workloads
  • Up to 50% higher performance for explicit finite element analysis workloads
  • Up to 120 AMD EPYC 7V73X CPU cores (EPYC with 3D V-cache, “Milan-X”)
  • Up to 96 MB L3 cache per CCD (3x larger than standard Milan CPUs, and 6x larger than “Rome” CPUs)
  • 350 GB/s DRAM bandwidth (STREAM TRIAD), up to 1.8x amplification (~630 GB/s effective bandwidth)
  • 448 GB RAM
  • 200 Gbps HDR InfiniBand (SRIOV), Mellanox ConnectX-6 NIC with Adaptive Routing
  • 2 x 900 GB NVMe SSD (3.5 GB/s (reads) and 1.5 GB/s (writes) per SSD, large block IO)

OMG... I would say that Intel has a real problem in the server area. No wonder Facebook (Meta) contracted for a bunch of these.