Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Page 143

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136
uplift ≠ ipc.

But yeah, if that is speed across all cores, that's a 2.5x improvement. As I understand it, some (but not all) boinc workloads support AVX-512. If milkyway@home is among them, it would make a lot of sense if that's what's happening here.
It is not MT, per the BOINC wiki. We need frequency data to get a better picture.

The BOINC benchmark provides performance analysis of the system's Whetstone (or floating point - FP) and Dhrystone (or integer - INT) performance in Million Instructions Per Second (MIPS). These are synthetic benchmarks and suffer from the typical shortcomings of synthetic tests.

Having more cores won't be reflected in this benchmark because it runs on a single core (ideally a physical core rather than a logical one).


Being Whetstone and Dhrystone benchmarks, these results are not really indicative of real world performance.
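
To put numbers on the "uplift ≠ ipc" point: a single-thread score scales roughly with IPC times clock, so an uplift can only be split into the two once frequencies are known. A minimal Python sketch of that split follows. The scores and clocks are made-up placeholders, not figures from the leak, and the function name is hypothetical.

```python
def ipc_ratio(score_new, score_old, freq_new_ghz, freq_old_ghz):
    """Estimate the IPC ratio implied by two single-thread scores.

    Assumes score is proportional to IPC * frequency, which ignores
    memory effects and ISA changes such as AVX-512.
    """
    uplift = score_new / score_old
    freq_ratio = freq_new_ghz / freq_old_ghz
    return uplift / freq_ratio


# Hypothetical inputs: the same 1.25x uplift under two clock assumptions.
same_clock = ipc_ratio(1250, 1000, 5.0, 5.0)
higher_clock = ipc_ratio(1250, 1000, 5.5, 5.0)

print(f"IPC ratio at equal clocks:       {same_clock:.3f}")
print(f"IPC ratio with a 10% clock bump: {higher_clock:.3f}")
```

The same uplift implies a noticeably smaller IPC gain once part of it is attributed to clock speed, which is why the frequency data matters.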
 

Ajay

Lifer
Jan 8, 2001
15,468
7,873
136
Even though there's a lot of nonsense in that article I don't think the 16core chiplet is impossible.
It reads to me more like Zen4c Bergamo than anything that will go into a Ryzen product.
And it doesn't look like 'efficiency' cores specifically either. Just a full-fat Zen core sharing L2 and unable to run at high frequency due to the thermal insulation from the stacked L3. Basically, either the periphery cores can boost and use the full L2, or all cores are in use at closer to base clocks.
The 35W mention seems very telling, as 8x35 is 280 and adding an IO die puts it in the ballpark for next-generation server CPU power draw.
Zen4C isn't going into Raphael. It's a specialty chip for some large customer (and maybe others if the demand and wafer availability is there). Again, I'll fall out of my chair if Raphael is anything but stock Zen4 CCDs/IOD with the twist of some new packaging. According to AMD, we should see a +25% performance increase over Zen3 and higher perf/watt.
 

SteinFG

Senior member
Dec 29, 2021
422
475
106
A bit of crazy/fun speculation from me. Recent leaks point to Ryzen using a fan-out package, so it's possible Epyc is using it too. Looking at the leak from Gigabyte, I estimate the fan-out area has to be around 31x65mm. Looking online at what different companies offer, I found an ASE fan-out packaging page, which states that with "FOCoS", which is aimed at servers and networking, it's possible to make a package with a length of 67mm, barely enough for a potential Epyc package. And on a standard wafer, about 20 such packages can be made. Edit: TSMC offers InFO_OS, which seems similar, and was qualified for 65x65mm packages in 2019.
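
The "about 20 packages per wafer" figure can be sanity-checked with a rough grid count. This is only a sketch: it assumes a 300 mm wafer, a 3 mm edge exclusion, and a simple rectangular grid with a corner at the wafer center, none of which come from ASE's page. The function name is made up for illustration.

```python
import math


def rects_per_wafer(width_mm, height_mm, wafer_diameter_mm=300.0,
                    edge_exclusion_mm=3.0):
    """Count axis-aligned rectangles lying fully inside the usable wafer area.

    The grid has a corner at the wafer center; real layouts may shift or
    rotate the grid to squeeze in a few more sites.
    """
    radius = wafer_diameter_mm / 2 - edge_exclusion_mm
    cols = int(wafer_diameter_mm // width_mm) + 2
    rows = int(wafer_diameter_mm // height_mm) + 2
    count = 0
    for i in range(-cols, cols):
        for j in range(-rows, rows):
            x0, y0 = i * width_mm, j * height_mm
            corners = [
                (x0, y0),
                (x0 + width_mm, y0),
                (x0, y0 + height_mm),
                (x0 + width_mm, y0 + height_mm),
            ]
            # A rectangle fits if all four corners are inside the circle.
            if all(math.hypot(x, y) <= radius for x, y in corners):
                count += 1
    return count


n = rects_per_wafer(31, 65)
area_bound = math.floor(math.pi * 150**2 / (31 * 65))
print(f"grid count: {n}, pure area upper bound: {area_bound}")
```

The pure area ratio is only an upper bound, since edge sites are lost on a round wafer; the grid count is the more realistic number to compare against the estimate above.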

1641656914133.png
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136
A bit of crazy/fun speculation from me. Recent leaks point to Ryzen using a fan-out package, so it's possible Epyc is using it too. Looking at the leak from Gigabyte, I estimate the fan-out area has to be around 31x65mm. Looking online at what different companies offer, I found an ASE fan-out packaging page, which states that with "FOCoS", which is aimed at servers and networking, it's possible to make a package with a length of 67mm, barely enough for a potential Epyc package. And on a standard wafer, about 20 such packages can be made. Note: TSMC offers InFO_OS, which seems similar, but I can't find the max package size.
I don't think it will be ASE providing the packaging, or at least not the offerings above. This could be done in house by Tongfu.
But I do not have access to the paywalled part of the SemiAnalysis article (way too many subscriptions atm)
So just a guess on my part.

The RDL layers in the ASE offerings are too few. For comparison, the current EPYC substrate has 14 signal layers.
Besides the RDL in the fan-out, I have seen patents indicating that there could be an IVR die sitting in there too.
This IVR die is there to keep IR losses to a minimum when increasing drive currents, and to reduce the number of VDD pins coming up from the substrate.
Reducing the voltage (e.g. N5/N3 devices can operate at much lower voltages) means increasing the current, and by keeping the IVR close to the cores/CCDs the I²R losses can be kept in check.
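
A quick illustration of why lower core voltage makes I²R losses worse, and why regulating close to the die helps. All numbers here (200 W load, 0.5 mΩ delivery path, the voltages) are hypothetical placeholders, and IVR conversion losses are ignored.

```python
def delivery_loss_w(power_w, voltage_v, resistance_ohm):
    """Resistive loss in the delivery path for a given load power and voltage.

    I = P / V, loss = I^2 * R. Treats the path as a single lumped resistor.
    """
    current_a = power_w / voltage_v
    return current_a**2 * resistance_ohm


LOAD_W = 200.0     # hypothetical package power
PATH_OHM = 0.0005  # hypothetical 0.5 mOhm substrate/bump path

loss_at_1v2 = delivery_loss_w(LOAD_W, 1.2, PATH_OHM)
loss_at_0v8 = delivery_loss_w(LOAD_W, 0.8, PATH_OHM)
# With an IVR next to the cores, the long path can carry a higher
# voltage (1.8 V assumed here) and therefore less current.
loss_with_ivr = delivery_loss_w(LOAD_W, 1.8, PATH_OHM)

print(f"1.2 V direct: {loss_at_1v2:.2f} W")
print(f"0.8 V direct: {loss_at_0v8:.2f} W")
print(f"1.8 V to IVR: {loss_with_ivr:.2f} W")
```

Because loss scales with the square of current, dropping from 1.2 V to 0.8 V at the same power raises the path loss by (1.2/0.8)² = 2.25x.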
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136

Evan Burness is going on record saying Azure is capturing EPYC shipments before any other customer can get their hands on them.
Basically that is a strategy for them to preempt supplies before the customers can even get their hands on them.
Then customers go to Azure for competitive analysis and they put out the best offering because nobody else got the chips before Azure.
He was also bragging that they have not lost a single competitive analysis requested by any customer.
 

moinmoin

Diamond Member
Jun 1, 2017
4,954
7,672
136
It seems that AMD works on packaging -> core -> packaging -> core. Packaging seems to take more time.
As @DisEnchantment mentioned parts of packaging are in-house. But where this gets interesting is process nodes and other technology not achievable in-house. There AMD can't do anything but wait for its partners to finish their work, case in point being Zen 3D V-Cache being prepared since the Zen 3 launch, but only being available later with Milan-X and again later on the consumer market.

I wonder how AMD plans to handle such external dependencies over the long run. Ideally they wouldn't have a single linear roadmap but be able to launch node, core and packaging improvements as they become ready for HVM.

Evan Burness is going on record saying Azure is capturing EPYC shipments before any other customer can get their hands on them.
Basically that is a strategy for them to preempt supplies before the customers can even get their hands on them.
Then customers go to Azure for competitive analysis and they put out the best offering because nobody else got the chips before Azure.
He was also bragging that they have not lost a single competitive analysis requested by any customer.
With such early access and day-and-date availability, Azure has essentially turned Epyc into its own Graviton-style competitive advantage, at least temporarily, which is a big win for Microsoft in the cloud market.

But I'm not sure this is really a positive for AMD, which over the long run should be more interested in a customer base broader than this. Microsoft's commitment must be significant.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Zen4C isn't going into Raphael. It's a specialty chip for some large customer (and maybe others if the demand and wafer availability is there). Again, I'll fall out of my chair if Raphael is anything but stock Zen4 CCDs/IOD with the twist of some new packaging. According to AMD, we should see a +25% performance increase over Zen3 and higher perf/watt.

Well, be careful of how you interpret AMD’s statements. They weren’t necessarily talking about a Zen 3 to Zen 4 uplift. I do agree, however, that there should be substantial uplift.

Agreed that Zen4c will not be on the desktop. 3D V-Cache will also likely not be available at launch. I suspect we will initially see 3 SKUs: a 16 core SKU, a 12 core SKU, and an 8 core SKU.
 

soresu

Platinum Member
Dec 19, 2014
2,665
1,865
136
Well, be careful of how you interpret AMD’s statements. They weren’t necessarily talking about a Zen 3 to Zen 4 uplift. I do agree, however, that there should be substantial uplift.

Agreed that Zen4c will not be on the desktop. 3D V-Cache will also likely not be available at launch. I suspect we will initially see 3 SKUs: a 16 core SKU, a 12 core SKU, and an 8 core SKU.
16 core SKUs require 2 fully working CCDs, while 12 core SKUs only require 2 CCDs with 2 cores not working on each. This is why the 3950X came out months after the initial release of Zen2/Matisse.

So I'd say if 12 core SKUs are available at launch then 6 core SKUs will be too.

At least until 12C or 16C CCDs become a thing.
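
The harvesting argument above can be sketched with a simple binomial model. It assumes each of the 8 cores on a CCD is good independently with the same probability, and ignores defects in L3 or uncore as well as frequency binning. The per-core probabilities are hypothetical, not real N7 or N5 yield data.

```python
from math import comb


def prob_exactly_good(k, p_core_ok, n=8):
    """Probability that exactly k of n cores on a CCD are defect-free."""
    return comb(n, k) * p_core_ok**k * (1 - p_core_ok) ** (n - k)


def ccd_bins(p_core_ok, n=8):
    """Split CCDs into fully working dies and dies usable as 6-core parts."""
    full = prob_exactly_good(n, p_core_ok, n)
    # Dies with 6 or 7 good cores can be sold with 6 cores enabled.
    salvage_6c = sum(prob_exactly_good(k, p_core_ok, n) for k in (6, 7))
    return full, salvage_6c


# Hypothetical per-core yields for an immature and a mature process.
for label, p in (("early", 0.95), ("mature", 0.99)):
    full, salvage = ccd_bins(p)
    print(f"{label}: full 8c = {full:.1%}, 6c-capable only = {salvage:.1%}")
```

Under this toy model, a better process leaves far fewer dies that can only be sold as 6c, which is the tension between "6 core SKUs at launch" and "very good yields" in the posts that follow.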
 

ryanjagtap

Member
Sep 25, 2021
108
127
96
Yields on N7 were probably lower at Matisse's launch than yields on N5 today. I doubt there will be many 6c CCDs with two bad cores. We may see the 12c parts excised from the lineup.
Do you think that, with yields being so good and Zen 3 and later designs using an 8c CCD, the Ryzen 3 line will be stopped and that segment will move to Athlon on the Zen 2 design with a 4c CCX? The starting Zen 4 SKU might then be a 6c/12t part made from slightly defective 8c/16t Zen 4 dies.
Edit: We are already seeing this with Rembrandt having a 6c/12t minimum.
 

Attachments

  • Screen_Shot_2022_01_04_at_8.05.39_AM.jpg

ryanjagtap

Member
Sep 25, 2021
108
127
96
L2 cache appears to be doubled.

EDIT: while the application may not be a benchmark, that would appear to indicate a 19% integer uplift. Floating point seems little changed.
Are we sure this is the L2 cache per core being doubled and not the total L1 cache of the CPU, which would have doubled from 64 KB to 128 KB (the L1 instruction and L1 data caches each going from 32 KB to 64 KB)?
Edit: My bad, maybe I'm wrong, as both the 16-thread and 32-thread CPUs show the same cache.
 

DrMrLordX

Lifer
Apr 27, 2000
21,640
10,858
136
Do you think that, with yields being so good and Zen 3 and later designs using an 8c CCD, the Ryzen 3 line will be stopped and that segment will move to Athlon on the Zen 2 design with a 4c CCX?

I think it's possible that the market segment currently served by 4c and 6c SKUs could be completely taken over by Rembrandt and its successors. Not gonna say it's absolutely going to happen, but it could.
 

Abwx

Lifer
Apr 2, 2011
10,966
3,485
136
According to AMD, we should see a +25% performance increase over Zen3 and higher perf/watt.

They said nothing of the sort, just that TSMC's N5P provides either 1.25x better perf at isowatt or 0.5x the power at isofrequency (and of course the same throughput).
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136
just that TSMC's N5P provides either 1.25x better perf at isowatt or 0.5x the power at isofrequency (and of course the same throughput).
Sorry to be a bit pedantic, actually they did not say that for N5.

They said "twice the density, twice the power efficiency and 1.25x the performance of the 7 nm process we're using in today's products" (Lisa's own sentences, exactly the way she said it. First slide below)
But for N7 they did mention perf at iso power or efficiency at iso perf (see Papermaster presentation during EPYC Rome launch. Second slide below)

I don't think they would have omitted any precondition for the gains if they are there because the implications are significant and they/we know it. Much different than what TSMC provided.
(Ian Cutress/AT also asked AMD about this, but got the response that these are the process targets. Note, though, that AMD is comparing N7 vs N5 the way they used/customized them, not stock TSMC figures.)
But who knows, we have to wait. (I suppose the almost 2x in MTr per CCD is nullifying that 2x efficiency.)

1641733212299.png
 

maddie

Diamond Member
Jul 18, 2010
4,747
4,691
136
Sorry to be a bit pedantic, actually they did not say that for N5.

They said "twice the density, twice the power efficiency and 1.25x the performance of the 7 nm process we're using in today's products" (Lisa's own sentences, exactly the way she said it. First slide below)
But for N7 they did mention perf at iso power or efficiency at iso perf (see Papermaster presentation during EPYC Rome launch. Second slide below)

I don't think they would have omitted any precondition for the gains if they are there because the implications are significant and they/we know it. Much different than what TSMC provided.
(Ian Cutress/AT also asked AMD about this, but got the response that these are the process targets. Note, though, that AMD is comparing N7 vs N5 the way they used/customized them, not stock TSMC figures.)
But who knows, we have to wait. (I suppose the almost 2x in MTr per CCD is nullifying that 2x efficiency.)

View attachment 55705
Are you saying 50% power AND 125% performance at the same time?

If yes, I don't believe it.
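
For reference, here is what the competing readings of the quoted numbers work out to in perf/watt terms. This is plain arithmetic on the figures in this exchange (1.25x performance, 0.5x power, "twice the power efficiency"), not anything AMD or TSMC published in this form; the function name is made up.

```python
def perf_per_watt(perf_ratio, power_ratio):
    """Efficiency gain relative to the old node for given perf and power ratios."""
    return perf_ratio / power_ratio


# Reading 1: 1.25x perf at the same power (iso-power).
iso_power = perf_per_watt(1.25, 1.0)
# Reading 2: same perf at 0.5x the power (iso-performance).
iso_perf = perf_per_watt(1.0, 0.5)
# Reading 3: 1.25x perf AND 0.5x power at the same time.
both_at_once = perf_per_watt(1.25, 0.5)
# Reading 4: "2x efficiency" and "1.25x perf" simultaneously imply this power.
implied_power = 1.25 / 2.0

print(f"iso-power efficiency gain: {iso_power:.2f}x")
print(f"iso-perf efficiency gain:  {iso_perf:.2f}x")
print(f"both at once:              {both_at_once:.2f}x")
print(f"power if 2x eff + 1.25x perf together: {implied_power:.3f}x")
```

So "50% power AND 125% performance" would be a 2.5x efficiency gain, which is more than the "twice the power efficiency" on the slide; taking the slide's efficiency and performance figures together gives 0.625x power instead.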
 

turtile

Senior member
Aug 19, 2014
614
294
136
Zen4C isn't going into Raphael. It's a specialty chip for some large customer (and maybe others if the demand and wafer availability is there). Again, I'll fall out of my chair if Raphael is anything but stock Zen4 CCDs/IOD with the twist of some new packaging. According to AMD, we should see a +25% performance increase over Zen3 and higher perf/watt.

I don't think it makes any sense not to use Zen4c in mobile parts. After all, the Zen mobile parts have an L3 cache that is half the size. The only change Lisa mentioned is a process optimized for dense cache and power efficiency. Since cache takes up a lot of space, a denser cache leaves room for more cores, which makes mobile more price competitive. Combine that with power efficiency, and you have a perfect combination for lower-priced mobile parts.

The 7nm 64 MB cache only measures 36 mm^2 while the 32 MB on-chip looks like it takes up almost half of the die space of an 81 mm^2 chip. Going from 96 cores to 128 cores can easily be achieved by increasing the cache density by 2x.
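
The density comparison above works out roughly as follows. The 64 MB / 36 mm² and 32 MB / 81 mm² figures are the ones quoted in the post; treating "almost half" as exactly 50% of the die is my assumption, so the on-die density is an approximation.

```python
def cache_density_mb_per_mm2(capacity_mb, area_mm2):
    """Cache capacity per unit area."""
    return capacity_mb / area_mm2


# Stacked V-Cache die: figures from the post.
vcache_density = cache_density_mb_per_mm2(64, 36)

# On-die L3: 32 MB in "almost half" of an 81 mm^2 CCD.
L3_AREA_FRACTION = 0.5  # assumption: "almost half" rounded to one half
ondie_density = cache_density_mb_per_mm2(32, 81 * L3_AREA_FRACTION)

ratio = vcache_density / ondie_density

print(f"V-Cache die: {vcache_density:.2f} MB/mm^2")
print(f"on-die L3:   {ondie_density:.2f} MB/mm^2")
print(f"ratio:       {ratio:.2f}x")
```

With these inputs the cache-optimized layout is a bit over 2x as dense, in line with the "increasing the cache density by 2x" claim. If the on-die L3 actually occupies less than half the CCD, the ratio shrinks accordingly.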
 

DrMrLordX

Lifer
Apr 27, 2000
21,640
10,858
136
I don't think it makes any sense not to use Zen4c in mobile parts.

Only the monolithic mobile products have 1/2 L3. Raphael-H will probably buck that trend to chase higher performance. Phoenix will likely follow in Rembrandt's footsteps and as such will be monolithic, making Zen4c CCDs non-candidates.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136
The fact that she says AND doesn't mean that it's at the same time; she's quoting the respective characteristics I talked about. Otherwise that would mean 0.25x the power at isofrequency, which TSMC certainly never claimed.
I don't want to bother debating what they showed in the slide; when the product launches we will know. I did not get on the stage to state it, nor am I defending it.

However, it does not change the fact that AMD and TSMC quoted values are different.

1641743341251.png
 

Abwx

Lifer
Apr 2, 2011
10,966
3,485
136
I don't want to bother debating what they showed in the slide; when the product launches we will know. I did not get on the stage to state it, nor am I defending it.

However, it does not change the fact that AMD and TSMC quoted values are different.

View attachment 55709

AMD uses an enhanced 5nm process dubbed N5P, hence the better numbers than on TSMC's slide, which displays the vanilla 5nm performance.


 

turtile

Senior member
Aug 19, 2014
614
294
136
Phoenix will likely follow in Rembrandt's footsteps and as such will be monolithic, making Zen4c CCDs non-candidates.

I'm not saying that Zen 4c CCDs will be used in mobile, but that the Zen 4c design and process will be used. The reason I believe this is that RDNA2/3 benefit from cache as well, especially when using system RAM. If they do go monolithic, it seems that this process would be ideal for multiple reasons.
 

moinmoin

Diamond Member
Jun 1, 2017
4,954
7,672
136
I have no doubt we will see the optimized Zen 4C cores as part of monolithic mobile APUs (not sure if as part of Phoenix already). Whether Zen 4C CCDs will be used in mobile MCMs akin to Raphael-H depends on whether there'll be a consumer MCM containing Zen 4C CCDs.