Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
820
1,456
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Joe NYC

Diamond Member
Jun 26, 2021
3,375
4,948
136
To combat the SPR CPUs with HBM, Genoa will very likely have multiple stacks of V-Cache.
4 stacks would be enough to match the on-package HBM in cache mode.
Power, BW and latency (roughly 18-20 ns, using Zen3 estimates and a lower frequency) should be way better for V-Cache, though.
But since only N5 on N5 is seen on TSMC's stacking roadmap, I'm not sure how they will leverage the older N7 node.

It could very well be that Genoa X will just lag the Genoa intro by 1-2 quarters and be aligned with TSMC's schedule for when N5 becomes eligible for stacking as the bottom die.

Sapphire Rapids will likely be released before Genoa. And if Intel includes SPR HBM in the intro, even though it will lag regular SPR by several quarters, AMD can do the same, and release Genoa X benchmarks in the intro of Genoa.

I guess this is the time to add SLC on top of IOD. N7 on N7
The IOD is a real mystery.

Eligibility for stacking must have played a big role in AMD going with TSMC N6 (or N7) for IOD.

Possibilities are wide open how AMD could use it.


But I am rather more excited about the Zen4c cores; these should go in a laptop.
Putting high-power Zen4 cores in one is not good if you are already constrained in frequency and thermals.
Might as well target 3.8 GHz max to begin with and get all that efficiency and density, instead of targeting 4.4 GHz but sustaining it for only 10 seconds, for example.
I will see if Rembrandt is efficient enough; otherwise I'll wait for a Zen4 laptop.

Yeah, I agree with that.
 

Bigos

Member
Jun 2, 2019
199
515
136
3.8GHz Zen 4 would still be somewhere in the region of ~1400pts on GB5 I believe (give or take 100pts), which you'll probably note is above the performance of even the Cortex-X2 (~1200pts), forget the X1.

I'd wager that's plenty for most notebook users.

I would assume this will still be slower than mobile Alder Lake (or whatever Intel will have as a competing product when Zen 4 mobile is released). I don't know why you are comparing this to ARM.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,112
136
No, the L3 is still smaller. Look, the goal is to retain as much per-core performance as possible for cloud workloads, whilst trimming down the size of each core enough such that it becomes feasible to fit 128 cores in the kinds of space we could fit 96 cores before.

In the cloud larger L2s and smaller L3s are preferred. It makes no sense to stack L3 cache on top as V-Cache on Bergamo to make up for what's missing on die, it makes more sense to cut down the L3 and remove the TSVs and cache tags needed to handle all of that extra cache from the die, and then beef up the L2 instead. That would get you the same per-core performance in these specific workloads in a smaller area.
Okay, thanks. Got it now. A new Zen4c die. AMD is really gearing up on the server front!

It's too late for this thread, but it would be nice if we had separate client and server threads for Zen5. I get a bit lost from time to time, especially with AMD diversifying its server offerings.
Somebody else would have to do it; the only threads I create are ones that are going to die ;)
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,786
136
You can draw a parallel between Zen4c and Zen3/2 in laptop SoCs.
AMD's monolithic laptop SoCs do not use the same device/process characteristics as regular CCDs.
E.g. Renoir's MTr/mm2 is ~62 compared to ~50 for desktop CCDs, even with IO included.
They are designed for lower top-end clocks and better efficiency, but they have higher density. They will not be able to match desktop CCD clocks. They also got their L3 cut.
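
A quick sanity check of those density figures, as a minimal Python sketch. The ~62 and ~50 MTr/mm2 values are just the approximate numbers quoted above, not measured data, and the function name is made up for illustration.

```python
def density_gain(dense_mtr_per_mm2: float, baseline_mtr_per_mm2: float) -> float:
    """Relative density advantage of one layout over another, as a fraction."""
    return dense_mtr_per_mm2 / baseline_mtr_per_mm2 - 1.0


# Approximate figures quoted in the post (assumptions, not measurements).
RENOIR_MTR_PER_MM2 = 62.0
DESKTOP_CCD_MTR_PER_MM2 = 50.0

gain = density_gain(RENOIR_MTR_PER_MM2, DESKTOP_CCD_MTR_PER_MM2)
print(f"Renoir is about {gain:.0%} denser than a desktop CCD")
```

Under those rough numbers the mobile layout comes out roughly a quarter denser, which is the kind of gap being argued for Zen4c.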

Zen4c sounds like something for which the physical layout can be taken straight from mobile.
All they would do is cut out the uncore IO part, route to GMI and put it in a CCD.
While in an EPYC SoC they would run with a base of 2.5 GHz, they could be made to run at 3.4 GHz+ in a laptop (using the 1.25x perf gain from N5).

(Zen3 CCD operates between 2.45-3.8GHz in EPYC except special SKUs and 3.5-4.9GHz in desktop, but ~1.9-4.4GHz for U series SoC)
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
No, I think it's more like "Let's beat Ampere", using whatever tools we need to get the job done.



Which would be an AMD asset, not a liability, if AMD can do it earlier and better than others.



I was thinking of only the "c" cores in some of the lower-end, mobile-targeted APUs (Chromebooks, tablets), which could be based solely on "c" cores while being about as far from the hyperscaler Bergamo CPUs as you can get.

I think the idea you have missed is that this core will be AMD's weapon to fight Arm.
Ampere yeeted themselves into the realm of irrelevancy already.

Their own custom cores are focused on perf/area, not on performance/watt or even on being a somewhat performant general-purpose server CPU. Don't expect much vs A76, I sincerely mean that.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
I would assume this will still be slower than mobile Alder Lake (or whatever Intel will have as a competing product when Zen 4 mobile is released). I don't know why you are comparing this to ARM.
You're the one that brought up ARM.

But yes, it would be slower than ADL. So what? Like I said there, what matters is having plenty good performance for general day-to-day tasks. Once that's been nailed, the focus from there for end-user experience is going to end up coming from battery life etc.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,786
136
But yes, it would be slower than ADL. So what? Like I said there, what matters is having plenty good performance for general day-to-day tasks. Once that's been nailed, the focus from there for end-user experience is going to end up coming from battery life etc.
Anyway, as per leaks, which seem credible so far, the high end mobile/Mobile Desktop probably will use the desktop Raphael SoCs.
For me I am interested in the Phoenix SoC/5800U replacements.


The fact AMD is thinking of using desktop SoC for mobile could tell us something about expected power efficiency.
The uArch is just called Zen 5, but it takes on various codenames for the various products that use the Zen 5 architecture.
The Zen5 core has a codename, but some folks are not sharing. RetiredEngineer knows all these things.
Zen4 core is Persephone
Zen3 core is Cerberus

The CCDs and the IODs have their own codename too.
 

moinmoin

Diamond Member
Jun 1, 2017
5,236
8,443
136
I guess the point is why do this when you'd likely be able to sell all the Genoa N5 wafers AMD is willing to buy regardless.
Datacenters are all about rack density and efficiency.
E.g. Frontier is one Epyc (likely 64c) with four MI250 OAMs per sled (which is half-wide 2U rack mount).
Bergamo increases the amount of cores in the same space by 33% over Genoa. If density is your focus that's a significant change. Even more so if you skip same Zen gen Genoa, double the cores over Naples, Rome and Milan.
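
The 33% figure can be checked with a few lines of Python. This is only a sketch: the 96- and 128-core totals are the rumored numbers used elsewhere in this thread, the 64-core figure for Rome/Milan is the top SKU of those generations, and the helper name is hypothetical.

```python
def core_uplift(new_cores: int, old_cores: int) -> float:
    """Relative increase in core count per socket, as a fraction."""
    return (new_cores - old_cores) / old_cores


# Rumored / assumed per-socket maximums (not confirmed by AMD in this thread).
GENOA_CORES = 96
BERGAMO_CORES = 128
ROME_MILAN_CORES = 64

uplift_vs_genoa = core_uplift(BERGAMO_CORES, GENOA_CORES)
uplift_vs_milan = core_uplift(BERGAMO_CORES, ROME_MILAN_CORES)

print(f"Bergamo vs Genoa: +{uplift_vs_genoa:.0%} cores per socket")
print(f"Bergamo vs Rome/Milan: +{uplift_vs_milan:.0%} cores per socket")
```

With the socket count per sled held constant, the same ratios carry over directly to cores per rack unit.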

For what it's worth, leakers say that the Zen5 APU (and maybe the following one) will be MCM.
But that appears to be already true for Zen 4 Raphael as well, so a Zen 5 APU being MCM is not adding any news for us there.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,866
136
But that appears to be already true for Zen 4 Raphael as well, so a Zen 5 APU being MCM is not adding any news for us there.

I meant the mobile variant (if AMD does not decide to use Raphael for high-end mobile, btw), that is, the one rumors call "Strix Point".
 

Joe NYC

Diamond Member
Jun 26, 2021
3,375
4,948
136
Ampere yeeted themselves into the realm of irrelevancy already.

Their own custom cores are focused on perf/area, not on performance/watt or even on being a somewhat performant general-purpose server CPU. Don't expect much vs A76, I sincerely mean that.

Aren't they (Ampere) just starting with custom cores? I have to catch up on their status.

Being a monolithic die, Ampere needs to be focused on area efficiency, while they also need to deliver high core counts (with modest performance per core). But they will not be standing still.

AMD does not have the same limitations, having compute chiplets, I/O separation, stacking available in their tool chest. So AMD may deploy them all, and perhaps have some segmentation based on L3, just like Milan-X.

There are other Arm and emerging RISC-V designs. I think Zen 4c is going to be AMD's first salvo against them, and likewise Intel will likely release something for the server market, based on their small cores.
 

Kepler_L2

Senior member
Sep 6, 2020
934
3,818
136
Datacenters are all about rack density and efficiency.
E.g. Frontier is one Epyc (likely 64c) with four MI250 OAMs per sled (which is half-wide 2U rack mount).
Bergamo increases the amount of cores in the same space by 33% over Genoa. If density is your focus that's a significant change. Even more so if you skip same Zen gen Genoa, double the cores over Naples, Rome and Milan.


But that appears to be already true for Zen 4 Raphael as well, so a Zen 5 APU being MCM is not adding any news for us there.
Zen4 mobile 65W category is MCM, the rest is monolithic. Zen5 will be MCM across the stack.
 

DrMrLordX

Lifer
Apr 27, 2000
22,752
12,755
136
If true, who the heck is going to buy SPR? Genoa is going to smoke it in every aspect: performance, perf/W, cores/socket, IO, everything.

Same kind of people that are still buying Cascade Lake. Look at Intel's Q3 earnings, a lot of their enterprise is still coming from 14nm (though that's obviously fading). Plus AMD will sell through Genoa like crazy. Once you get on a waiting list, it's Intel or bust for x86 server hardware. We don't know what SPR's yields will be like - it's been delayed so heavily that one must suspect it suffers from the same basic problems as IceLake-SP. But hopefully Intel can bring more volume to the market with Sapphire Rapids than IceLake-SP.

If not then there will be a lot of people on waiting lists for anything current.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,375
4,948
136
Nope. Plus the HBM part will likely be a unicorn.

"release" is a little fuzzy term.

To clarify, I think Intel will announce Sapphire Rapids in Q1 (even if it does not really ship until Q2).

AMD will likely announce Genoa after Sapphire Rapids. It would be a pleasant surprise if Genoa is announced before Sapphire Rapids. Do you think it will happen?

Agreed on HBM, which was announced as lagging. But Intel is still committing to releasing it for Aurora, IIRC.
 

DrMrLordX

Lifer
Apr 27, 2000
22,752
12,755
136
"release" is a little fuzzy term.

To clarify, I think Intel will announce Sapphire Rapids in Q1 (even if it does not really ship until Q2).

AMD will likely announce Genoa after Sapphire Rapids. It would be a pleasant surprise if Genoa is announced before Sapphire Rapids. Do you think it will happen?

Agreed on HBM, which was announced as lagging. But Intel is still committing to releasing it for Aurora, IIRC.

Not really sure who will announce what when, but in terms of shipping volume, I expect Genoa to beat Sapphire Rapids to the hyperscalers easily, which is where a lot of the early sales will go. Milan was shipping for months before AMD officially announced it, and Genoa will likely be the same.

Intel announced IceLake-SP but couldn't push volume for around 6 months. Granted, yields on that were putrid, but still. Sapphire Rapids will be a major test of Intel's ability to fabricate large dice on anything from their 10nm family.
 

quikah

Diamond Member
Apr 7, 2003
4,183
732
126
"release" is a little fuzzy term.

To clarify, I think Intel will announce Sapphire Rapids in Q1 (even if it does not really ship until Q2).

AMD will likely announce Genoa after Sapphire Rapids. It would be a pleasant surprise if Genoa is announced before Sapphire Rapids. Do you think it will happen?

Agreed on HBM, which was announced as lagging. But Intel is still committing to releasing it for Aurora, IIRC.

AMD hasn't even managed to release Milan-X yet...

I would be shocked if Genoa is released before SPR.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
"Let's optimise for cost efficiency and then just throw twice the silicon on top".

Do I even need to point out how redundant that is?



They're also stacked via SoIC, so eh, been through that already.



Oh it absolutely is, but I don't think you've thought through this idea all that well. Why would you want to stack cache on the L3 of the little cores in particular? It would make more sense to either:

1. Only stack on the big cores' cache

2. Create a separate system level cache and stack on that instead.

Unless you're suggesting little core only products as budget oriented solutions that you stack additional SRAM on top of. Surely you must realise that's just a tad bit silly, eh?
“Cost efficiency”? The MCM and / or chiplet approach is what allows for cost efficiency. Bergamo for cloud servers will not be a cheap part, so “too expensive” arguments are mostly not valid for Bergamo, but are valid for mobile solutions. Bergamo will likely be a very expensive part, possibly even more expensive than Milan-X.

Given that Bergamo is supposed to be for maximum core density, why is it only 8 chiplets when Genoa goes up to 12? I suspect the answer is that Bergamo uses silicon bridge interconnect for lower power consumption and higher performance. I initially thought that perhaps they were going to move the L3 or add L4 to the IO die, like the infinity cache on GPUs. The silicon bridge interconnect would lower power, reduce latency, and increase bandwidth to the IO die possibly making caches on the IO die more reasonable.
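
To put rough numbers on the chiplet question, here is a small Python sketch. It assumes the rumored totals discussed earlier in the thread (96 cores for Genoa, 128 for Bergamo) together with the 12 and 8 chiplet counts above; none of this is confirmed, and the function name is purely illustrative.

```python
def cores_per_chiplet(total_cores: int, chiplets: int) -> int:
    """Cores each compute chiplet must carry, assuming an even split."""
    if chiplets <= 0 or total_cores % chiplets != 0:
        raise ValueError("cores must divide evenly across a positive chiplet count")
    return total_cores // chiplets


# Rumored configurations (assumptions, not official specifications).
genoa_per_ccd = cores_per_chiplet(total_cores=96, chiplets=12)
bergamo_per_ccd = cores_per_chiplet(total_cores=128, chiplets=8)

print(f"Genoa:   {genoa_per_ccd} cores per chiplet")
print(f"Bergamo: {bergamo_per_ccd} cores per chiplet")
```

If those rumors hold, each Bergamo chiplet would carry twice as many cores as a Genoa chiplet, so the lower chiplet count does not by itself mean fewer cores.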

Someone brought up using cache in the silicon bridge chip though. With the rumors about RDNA3 GPUs having some giant cache chip (512 MB) in between the GPU dies, I am wondering if Bergamo will get the same design or possibly even the same cache chip. Perhaps they could place a 512 MB cache die (or 2; one on each side of the IO die) and connect the CPU chiplets to that using silicon bridge tech. If that is the case, then the CPU die may have much larger shared L2 caches and no L3. There could be a lot of other options for the “density optimized cache hierarchy”. I am not sure what would do best for cloud applications, but for things like databases, massive caches seem like they are needed; otherwise Milan-X (or Genoa-X) would likely be better.
 

Kepler_L2

Senior member
Sep 6, 2020
934
3,818
136
Sure, if that stack excludes sub-$100 (or whatever will be the low budget red line) parts like Dali, Monet etc. Like will the successor to Dragon Crest be MCM?
The ultra low-end stuff is probably not MCM, but they will not be on the latest node either.
 

Bigos

Member
Jun 2, 2019
199
515
136
You're the one that brought up ARM.

As an example of a big/medium core arrangement (so that readers would know what I mean by that). For example, Snapdragon 865 has one big Cortex-A77 core and 3 medium Cortex-A77 cores (with lower frequency and less cache, which seems similar to Zen 4 vs Zen 4c).

Nowhere was I comparing to any ARM SoC. Was that not clear enough?

But yes, it would be slower than ADL. So what? Like I said there, what matters is having plenty good performance for general day-to-day tasks. Once that's been nailed, the focus from there for end-user experience is going to end up coming from battery life etc.

I agree that battery life is very important for notebooks. However, single-thread performance defines snappiness of the workloads people will use the notebook for. I don't know which of these two aspects is more important, but I would not disregard one for the other.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,375
4,948
136
Not really sure who will announce what when, but in terms of shipping volume, I expect Genoa to beat Sapphire Rapids to the hyperscalers easily, which is where a lot of the early sales will go. Milan was shipping for months before AMD officially announced it, and Genoa will likely be the same.

Intel announced IceLake-SP but couldn't push volume for around 6 months. Granted, yields on that were putrid, but still. Sapphire Rapids will be a major test of Intel's ability to fabricate large dice on anything from their 10nm family.

I know about the Ice Lake announcement and delayed shipments, and I think that's what is to be expected from Sapphire Rapids as well, one year later.

I really don't have a good sense about Genoa, or whether the silicon AMD is shipping / seeding is final silicon. But just going by Lisa Su's statement, she said Genoa will go into production and go on sale in 2022, so maybe Genoa is not yet in full production...

So I don't know if that puts Genoa ahead or behind SPR. One thing we can be quite confident about is that there will be no yield issues for Genoa...
 