Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,560
14,514
136
I think it was in the low to mid-2 GHz range for Zen 4c.
That does not sound right to me. I have a 9554, a 64-core Genoa. It does 3500 MHz with all cores at 100% on a crappy cooler (by Noctua standards). Oh, and I have a 96-core 9654 Genoa, and it does 2750 MHz under all-core load (on the same crappy cooler). Since Bergamo is light on the silicon, I would expect at least 2500 MHz.
 
  • Like
Reactions: lightmanek

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
but if your remaining 16 cores are cacheless and single threaded you're not doing yourself any favors and are contributing to heat. At best you'd get similar if not same performance as Intel

Sure, just like the 7950X3D does. At best there might be a marginally better bin on an 8+16 part to give it a slightly higher gaming performance ceiling.

For that to be worth it the c cores would have to be shrunken down zen 5 or zen 6 cores with cache and with smt to make it useful

Zen4c has cache: 32 MB of L3 and 16 MB of L2 per CCD. It also has SMT, because Zen4c is just Zen4 cores with half the L3 per core and lower clock speeds so density can increase.
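To put numbers on the "half the L3 per core" point, here is a minimal sketch. The Zen 4c figures come from this thread (16 cores, two CCXs of 16 MB L3 each, 16 MB of L2 in total). The regular Zen 4 CCD figures (8 cores, one CCX with 32 MB L3) and the 1 MB of L2 per core are assumptions taken from public spec sheets, and the `CCD` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCD:
    """Illustrative model of one compute die (not an AMD data structure)."""
    name: str
    cores: int
    ccx_count: int
    l3_per_ccx_mb: int
    l2_per_core_mb: int = 1  # assumption: 1 MB of L2 per core on both dies

    @property
    def l3_total_mb(self) -> int:
        return self.ccx_count * self.l3_per_ccx_mb

    @property
    def l2_total_mb(self) -> int:
        return self.cores * self.l2_per_core_mb

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_total_mb / self.cores


# Assumed figures for the regular die; thread figures for the dense die.
ZEN4 = CCD("Zen 4", cores=8, ccx_count=1, l3_per_ccx_mb=32)
ZEN4C = CCD("Zen 4c", cores=16, ccx_count=2, l3_per_ccx_mb=16)

for ccd in (ZEN4, ZEN4C):
    print(f"{ccd.name}: L3 {ccd.l3_total_mb} MB total, "
          f"{ccd.l3_per_core_mb:.0f} MB/core, L2 {ccd.l2_total_mb} MB total")
```

Under those assumptions the L3 per CCD is unchanged at 32 MB, while the L3 per core drops from 4 MB to 2 MB.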

if there was a hard cut off of power delivered to each core through the bios that would help in the way of power management and thermals

Bergamo fits in the same power envelope as Genoa and the 7950X3D does not even use the 170W AM5 TDP option. There is enough power to load all the cores on AM5. Why do you think AMD went a bit OTT on power delivery with it? Headroom for future products.

The 8+16 ccd doesn't make sense for amd unless they can fix the issues intel has and improve, but also if they can find a reason to make it at volume. Again I'm repeating myself for the 10th time in the past year. none of this makes sense if epyc can't use it. epyc is a money maker. Epyc is a high margin product. consumer ryzen is not where the large bulk of their income is.

Given the ISA sets are the same it is pretty trivial for AMD, they can do what they do with the 7950X which has 1 CCD that can clock higher and the other which is a worse bin.

You also assume all Zen4c CCDs will be viable for EPYC. That seems unlikely, as I expect a reasonable number to miss the v/f curve needed to be binned into either SKU. If quantities are high enough, AMD can use them in an AM5 design rather than scrapping them.

Your 8+16 proposition also makes no sense because it would be ideal to include 2 of those CCDs with one having the L3 $ on it. Why, do you ask? Because if they make such a CCD from the top, their Threadrippers and EPYCs get a major bump in core counts. They won't have a complaint about core counts for years. It'll also force Intel to address their shortcomings and figure out how to get their issue of HT and AVX-512 working on their processors.

There are no TSVs in Zen4c so you can't stack cache on it. That was one of the ways they increased density.

Realistically this is likely to never happen because 8+16 is a lot more complicated than 8 normal cores from a fabrication perspective, otherwise you add time to your binning and your production rate slows down as a result, and for a company like amd that is almost always supply constrained because companies want their products by the truck load, it's a terrible idea.
Not really. It is just mixing 2 CCDs which they already do with the X3D 2 CCD parts and even did before that with the 1 good bin and 1 less good bin. AMD have already solved that bit.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Sure, just like the 7950X3D does. At best there might be a marginally better bin on an 8+16 part to give it a slightly higher gaming performance ceiling.

Zen4c has cache: 32 MB of L3 and 16 MB of L2 per CCD. It also has SMT, because Zen4c is just Zen4 cores with half the L3 per core and lower clock speeds so density can increase.

Bergamo fits in the same power envelope as Genoa and the 7950X3D does not even use the 170W AM5 TDP option. There is enough power to load all the cores on AM5. Why do you think AMD went a bit OTT on power delivery with it? Headroom for future products.

Given the ISA sets are the same it is pretty trivial for AMD, they can do what they do with the 7950X which has 1 CCD that can clock higher and the other which is a worse bin.

You also assume all Zen4c CCDs will be viable for EPYC. That seems unlikely, as I expect a reasonable number to miss the v/f curve needed to be binned into either SKU. If quantities are high enough, AMD can use them in an AM5 design rather than scrapping them.

There are no TSVs in Zen4c so you can't stack cache on it. That was one of the ways they increased density.

Not really. It is just mixing 2 CCDs which they already do with the X3D 2 CCD parts and even did before that with the 1 good bin and 1 less good bin. AMD have already solved that bit.
I had an entire response to each point you made but deleted it. You're not having this conversation in good faith. You keep pushing this wild dream that isn't reasonable for AMD or their finances. Ring a ding bing, they're not making Intel money to go on a whim like this, Tim. When your head gets back to reality we can discuss this without you constantly ignoring everything being said and repeating yourself over and over again, as if it's going to make me give a damn about your opinion when it's so radically out there that I can't tell if you're serious or trying to get a laugh out of me.

Feel free to put me on ignore if you want. I won't lose sleep over it. I'll still enjoy your wild posts as the forum jester.
 

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
I had an entire response to each point you made but deleted it. You're not having this conversation in good faith. You keep pushing this wild dream that isn't reasonable for AMD or their finances. Ring a ding bing, they're not making Intel money to go on a whim like this, Tim. When your head gets back to reality we can discuss this without you constantly ignoring everything being said and repeating yourself over and over again, as if it's going to make me give a damn about your opinion when it's so radically out there that I can't tell if you're serious or trying to get a laugh out of me.

Feel free to put me on ignore if you want. I won't lose sleep over it. I'll still enjoy your wild posts as the forum jester.

1) I am just pointing out a path AMD could take, I don't know if they will but the option is there if they feel they want more MT performance prior to Zen 5 releasing or if they just want to get any niggles worked out on this kind of setup in a lower volume part before it becomes a larger part of the mainstream stack.

2) Rebutting your points because they are incorrect is not the same as ignoring them. Not every Zen4c CCD will be of high enough quality to be used in Bergamo. They do have L3 cache despite you saying they are cacheless, and they also have SMT when you said you would need to wait for Zen 5c or Zen 6c for that. You talk about stacking L3 on top of the die despite the fact there are no TSVs, and the other issue is that the cache is split into 2 groups of 16 MB, so it would need an entirely different cache die (or even 2) so it does not cover the cores.

3) In terms of cost AMD have done some far more crazy things, like TR or the 2990WX, which was a really hacky part. The 5800X3D as well, which prior to the full announcement had everyone assuming that only the R9s would get the X3D treatment because of cost. Unless you work there and are divulging confidential information, which I highly doubt, you have no insight into what it will cost them or what they can gain just through the knowledge of bringing up such a product.

4) I will again repeat point 1. I am pointing out a potential avenue they can go down. There are far too many unknowns to know if AMD will actually take such a path but pointing out what AMD could do is not the same as thinking they will.

Finally, don't be so insufferable, it is not a good look, especially when you wrote so many flat out incorrect statements.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
2) Rebutting your points because they are incorrect is not the same as ignoring them. Not every Zen4c CCD will be of high enough quality to be used in Bergamo. They do have L3 cache despite you saying they are cacheless, and they also have SMT when you said you would need to wait for Zen 5c or Zen 6c for that. You talk about stacking L3 on top of the die despite the fact there are no TSVs, and the other issue is that the cache is split into 2 groups of 16 MB, so it would need an entirely different cache die (or even 2) so it does not cover the cores.
Bud, nothing I said is incorrect. You were making a suggestion about ultimate performance and brought up a design that nukes half its cache for space. How the hell is that performance minded? You want performance you go full cache, no exceptions. in fact screw the small cores. Go a full fat 24 cores per die. I'm confident AMD can wrangle the power down and deliver a product that beats the snot out of Intel for the next decade.
 
  • Like
Reactions: Markfw

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
1) I am just pointing out a path AMD could take, I don't know if they will but the option is there if they feel they want more MT performance prior to Zen 5 releasing or if they just want to get any niggles worked out on this kind of setup in a lower volume part before it becomes a larger part of the mainstream stack.
Could take but won't take. Again, this is all about money. To do this AMD needs to drop their 8-core die and adopt an 8+16 die structure. This die structure should not have access to half the cache, as you suggested. It should have access to more cache from the get-go. You need to think performance here. Once you introduce higher core counts your die complexity goes up. Once complexity goes up your ability to get perfect dies goes down. AMD tuned their process based on how many dies they can save from the top to the bottom. It makes no sense to go about an approach that Intel took in the past. Intel's future approach is compute tiles/dies, breaking the core count down into manageable and affordable pieces, reducing complexity to reduce overhead costs. Why would AMD go the other direction now? This is a lot more work, from manufacturing to validation to prepping. Added time is loss of possible income.

Again, income income income income income. AMD won't sell a single-CCD 8+16 on an x950 for cheap. See the launch price of the Zen 4 variant and add another 300-400 to it. Why? AMD is offering you a high-cache 8+16 where both have SMT and both have cache access, and which offers AVX-512. Why would they sell it for near the same price with all the work involved and extra steps in product manufacturing? Money doesn't grow on trees.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
3) In terms of cost AMD have done some far more crazy things, like TR or the 2990WX, which was a really hacky part. The 5800X3D as well, which prior to the full announcement had everyone assuming that only the R9s would get the X3D treatment because of cost. Unless you work there and are divulging confidential information, which I highly doubt, you have no insight into what it will cost them or what they can gain just through the knowledge of bringing up such a product.
Threadrippers are bastardised EPYCs. They're also a high-margin part. Think of Threadripper as EPYC lite. Everyone assumed the R9 would get the treatment because the R9 was the model Lisa Su was holding in her hand when she showed off 3D cache for the first time. A 5800X and 5800X3D are the same part, with the latter getting the 3D $ treatment. The MSRP for the former is already high enough due to the BOM for the processor. The cache chip made at scale allows them to sell it to you at a higher MSRP, recouping its cost plus some extra money. The cache argument you presented does not match your earlier one, where you argued about not having cache atop the magical 8+16.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
4) I will again repeat point 1. I am pointing out a potential avenue they can go down. There are far too many unknowns to know if AMD will actually take such a path but pointing out what AMD could do is not the same as thinking they will.

Finally, don't be so insufferable, it is not a good look, especially when you wrote so many flat out incorrect statements.
And you are who to suggest they should look into this?

I have written no such things. If anything you've been a massive bellend for days now. You want to have wild dreams for AMD, knock your socks off lad. I'll even buy you a tub of lotion and some kleenex.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,663
136
It would not surprise me if eventually the Nc core design lesson spill over into the main core design, albeit likely not until at least Zen6.
I'm not so sure AMD will move away from this separation again as they target different market demands.

With Zen 4 AMD invested significant amounts of transistors to allow higher frequency. While this can certainly be optimized, it will likely be necessary on future nodes as well (if not even more so) to reach the ever-increasing frequencies that are demanded. Zen 4c is essentially about rolling that back again, instead focusing on a further density increase. With Zen 5 it appears both types of cores will be available closer to each other.
 
  • Like
Reactions: Tlh97 and soresu

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
Bud, nothing I said is incorrect. You were making a suggestion about ultimate performance and brought up a design that nukes half its cache for space. How the hell is that performance minded? You want performance you go full cache, no exceptions. in fact screw the small cores. Go a full fat 24 cores per die. I'm confident AMD can wrangle the power down and deliver a product that beats the snot out of Intel for the next decade.

Have you not seen the lower cache 7950X beat the 7950X3D in some productivity workloads like rendering? Not all workloads scale with cache which is why Zen4c exists in the 1st place and why Genoa-X is a separate product line.

A full fat 24 core part would need 3 CCDs and a new IO die with the current suite of parts.

Edit: and you are imprecise again. Zen4c halves the per CCX L3 cache but the overall CCD L3 amount stays the same.
Could take but won't take. Again, this is all about money. To do this AMD needs to drop their 8-core die and adopt an 8+16 die structure. This die structure should not have access to half the cache, as you suggested. It should have access to more cache from the get-go. You need to think performance here. Once you introduce higher core counts your die complexity goes up. Once complexity goes up your ability to get perfect dies goes down. AMD tuned their process based on how many dies they can save from the top to the bottom. It makes no sense to go about an approach that Intel took in the past. Intel's future approach is compute tiles/dies, breaking the core count down into manageable and affordable pieces, reducing complexity to reduce overhead costs. Why would AMD go the other direction now? This is a lot more work, from manufacturing to validation to prepping. Added time is loss of possible income.

Again, income income income income income. AMD won't sell a single-CCD 8+16 on an x950 for cheap. See the launch price of the Zen 4 variant and add another 300-400 to it. Why? AMD is offering you a high-cache 8+16 where both have SMT and both have cache access, and which offers AVX-512. Why would they sell it for near the same price with all the work involved and extra steps in product manufacturing? Money doesn't grow on trees.

Why do you think I am talking single CCD? I never once said single CCD. I always said that if AMD wanted a 24c option swapping out one of the 8c CCDs on the 7950X with the 16c Zen4c CCD would give them a desktop part that is unmatched in all core productivity performance. If the standard CCD had v-cache it would also be the best gaming CPU making such a part the ultimate all rounder.

AMDs approach has been to re-use the pieces in as many places as possible. Introducing Zen4c is adding a new die to the mix already and being able to use those dies in more products has been AMDs MO since Zen 1.

I expect if a part were to launch it would be expensive as all halo parts are.

AMD will already have considered using it in a Ryzen SKU as part of the Zen4c business case. Whether that is just reusing the CCX layout for APUs or whether it extends to using the CCDs in a desktop part, we have no idea. It is certainly something they would factor in.

Threadrippers are bastardised EPYCs. They're also a high-margin part. Think of Threadripper as EPYC lite. Everyone assumed the R9 would get the treatment because the R9 was the model Lisa Su was holding in her hand when she showed off 3D cache for the first time. A 5800X and 5800X3D are the same part, with the latter getting the 3D $ treatment. The MSRP for the former is already high enough due to the BOM for the processor. The cache chip made at scale allows them to sell it to you at a higher MSRP, recouping its cost plus some extra money. The cache argument you presented does not match your earlier one, where you argued about not having cache atop the magical 8+16.

The cache argument makes perfect sense: one 8c Zen4 CCD with v-cache, like the ones that already exist, and one 16c Zen4c CCD, both connected to the existing IO die.

I am not talking some single CCD with 8 Zen4 cores and 16 Zen4c cores in one chip. That would be dumb.

The BOM of such a Zen4 part would be about the same as the existing 7950X3D part which was able to slot in at the same MSRP as the 7950X launched at. This part could command a premium for having higher performance in all core workloads.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Tim, I'm tempted to block you for trolling at this point because I can't tell what you're advocating for here. You know well enough your idea isn't that great and will only lead to a whole host of other problems, so again I ask you what's the point of this back and forth debate you're having if you're not serious about a solution?

In one or two simple sentences, tell me what it is you're advocating AMD offer to consumers. You keep pushing this miracle processor that really won't sell and makes no sense as a product, and this is ignoring the mounting issues it'll gather as it pushes ahead in the pipeline.
 

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
Tim, I'm tempted to block you for trolling at this point because I can't tell what you're advocating for here. You know well enough your idea isn't that great and will only lead to a whole host of other problems, so again I ask you what's the point of this back and forth debate you're having if you're not serious about a solution?

In one or two simple sentences, tell me what it is you're advocating AMD offer to consumers. You keep pushing this miracle processor that really won't sell and makes no sense as a product, and this is ignoring the mounting issues it'll gather as it pushes ahead in the pipeline.

Take a 7950X and / or an X3D. Swap one 8c Zen4 CCD for the 16c Zen4c CCD. Release as halo part with the best all core performance on desktop platforms. If the Zen4c CCD is paired with the Zen4 + v-cache CCD then it would also have top tier gaming performance.

Will they do this? No idea. Depends on too many factors we don't have information on but I would fully expect such a move was considered as part of the business case for Zen4c CCDs in the 1st place because AMD like to use the stuff they build in more than 1 segment if possible.

You talk about added complexity but AMD have already built the CCD and have brought it to production so all of that has been done. Adding a new Ryzen SKU is just re-using existing parts in a different segment.
 
  • Like
Reactions: Tlh97 and Hitman928

rainy

Senior member
Jul 17, 2013
505
424
136
zen 5 will likely be mid to late 2nd half 2024 product.
I think you're wrong with your prediction: I do not expect the gap between Zen 4 and Zen 5 to be as big as the one between Zen 3 and Zen 4 (about 23 months).
I'm expecting that Zen 5 will be released in spring next year, 18-19 months after Zen 4.
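The month arithmetic can be checked with a short sketch. The two launch dates are not stated in the thread; they are assumed here to be the desktop retail availability dates (5 November 2020 for Zen 3 and 27 September 2022 for Zen 4), and both helper functions are illustrative.

```python
from datetime import date

# Assumed desktop retail launch dates (not given in the thread).
ZEN3_LAUNCH = date(2020, 11, 5)
ZEN4_LAUNCH = date(2022, 9, 27)


def months_between(start: date, end: date) -> float:
    """Approximate gap in months, counting a month as 30 days for the remainder."""
    whole = (end.year - start.year) * 12 + (end.month - start.month)
    return whole + (end.day - start.day) / 30.0


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that lies `months` months after `start`."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, 1)


gap = months_between(ZEN3_LAUNCH, ZEN4_LAUNCH)
print(f"Zen 3 to Zen 4: about {gap:.1f} months")
for offset in (18, 19):
    target = add_months(ZEN4_LAUNCH, offset)
    print(f"Zen 4 + {offset} months: {target:%B %Y}")
```

With those assumed dates the Zen 3 to Zen 4 gap comes out just under 23 months, and an 18-19 month cadence lands in March or April 2024.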
 

Ajay

Lifer
Jan 8, 2001
15,454
7,862
136
I think you're wrong with your prediction: I do not expect the gap between Zen 4 and Zen 5 to be as big as the one between Zen 3 and Zen 4 (about 23 months).
I'm expecting that Zen 5 will be released in spring next year, 18-19 months after Zen 4.
Sadly, all we know for sure is that it will be released in 2024.
 

Abwx

Lifer
Apr 2, 2011
10,948
3,459
136
According to the article published at Computerbase, a Zen 4c 16C CCD requires only 9.6% more area than a regular Zen 4 CCD, so the road is open for low-cost 24-32C parts:
They point out battery runtimes as the headline of the article but it's like identical to last years model pretty much (aside from power under full load) which is a bit ????

In a browsing test the load is extremely bursty, a stream of short bursts. With the 7940H they chose to use the same power as RMB, so every burst has 30% higher perf than with RMB; the result is that the laptop seems snappier, that is, response time is much lower.

On the other hand, they could have set power such that the response time and execution time are the same, which would have yielded much lower power usage and longer battery life. This can possibly be set manually on some dGPU-less laptops if they offer a 9 or 15 W CPU power limit.
 
  • Like
Reactions: Tlh97

yuri69

Senior member
Jul 16, 2013
389
622
136
With regards to the chiplet interconnect, the Infinity Fabric on Package (IFOP) is the same on both dies, comprising two GMI3-Narrow links. However, while the die supports it, there does not appear to be a Zen 4c model that uses both GMI3 links. Instead, signals from the two independent CCX are muxed through a single link to the IO Die.

This is quite strange - Genoa's 12 CCXs connect to the IOD with 12 GMI3-narrow but Bergamo's 16 CCXs connect with only 8 GMI3-narrow, right? Is the bandwidth enough? Are the lower core clocks going to offset this?
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136

This is quite strange - Genoa's 12 CCXs connect to the IOD with 12 GMI3-narrow but Bergamo's 16 CCXs connect with only 8 GMI3-narrow, right? Is the bandwidth enough? Are the lower core clocks going to offset this?
The total cache per CCD increases roughly 21% for Zen4c over Zen4 to account for doubled L1/L2 caches mirroring the core count per CCD.

I'd think that cache changes will affect necessary bandwidth requirements - but 21% is not a huge figure and the GMI3 spec may have been designed with this already in mind.

Obviously there is more than just cache bandwidth to consider, but I'd assume it is a significant part of the equation.
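The "roughly 21%" figure can be reproduced with a quick sketch. The L3 and L2 sizes follow the numbers quoted earlier in the thread (32 MB of L3 per CCD on both dies, 1 MB of L2 per core); the 64 KiB of L1 per core (32 KiB instruction plus 32 KiB data) is an assumption from public Zen 4 specs, and `total_cache_mb` is a hypothetical helper.

```python
KIB_PER_MB = 1024


def total_cache_mb(cores: int, l3_mb: float,
                   l2_per_core_mb: float = 1.0,
                   l1_per_core_kib: float = 64.0) -> float:
    """Sum L1 + L2 + L3 for one CCD, in MB (L1 size is an assumption)."""
    l1_mb = cores * l1_per_core_kib / KIB_PER_MB
    l2_mb = cores * l2_per_core_mb
    return l1_mb + l2_mb + l3_mb


zen4_total = total_cache_mb(cores=8, l3_mb=32)    # 0.5 + 8 + 32
zen4c_total = total_cache_mb(cores=16, l3_mb=32)  # 1 + 16 + 32
increase_pct = (zen4c_total / zen4_total - 1) * 100

print(f"Zen 4 CCD:  {zen4_total} MB")
print(f"Zen 4c CCD: {zen4c_total} MB")
print(f"Increase:   {increase_pct:.1f}%")
```

Under these assumptions the totals are 40.5 MB and 49 MB, which is where an increase of about 21% would come from.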
 

DrMrLordX

Lifer
Apr 27, 2000
21,633
10,845
136
That does not sound right to me. I have a 9554, a 64-core Genoa. It does 3500 MHz with all cores at 100% on a crappy cooler (by Noctua standards). Oh, and I have a 96-core 9654 Genoa, and it does 2750 MHz under all-core load (on the same crappy cooler). Since Bergamo is light on the silicon, I would expect at least 2500 MHz.

Remember that we're dealing with different libraries and a different core layout. Zen4c has made a lot of compromises to achieve higher effective logic density. Bergamo could get very hot at voltages that are trivial for Genoa. Also the target audience that wants/needs Bergamo may be more interested in the high core counts than they are in clockspeed.

Not every Zen4c CCD will be of high enough quality to be used in Bergamo,
At this point, the yields on N5-family nodes should be phenomenal. It would be very surprising if AMD had many salvage dice from production of Bergamo.
 
  • Like
Reactions: Tlh97 and Mopetar

BorisTheBlade82

Senior member
May 1, 2020
664
1,014
106
Tim, I'm tempted to block you for trolling at this point because I can't tell what you're advocating for here. You know well enough your idea isn't that great and will only lead to a whole host of other problems, so again I ask you what's the point of this back and forth debate you're having if you're not serious about a solution?

In one or two simple sentences, tell me what it is you're advocating AMD offer to consumers. You keep pushing this miracle processor that really won't sell and makes no sense as a product, and this is ignoring the mounting issues it'll gather as it pushes ahead in the pipeline.
What the heck are you on about?
Everything @Timorous wrote makes perfect sense. And no one is stating this to be the truth, but rather a likely outcome.
 

BorisTheBlade82

Senior member
May 1, 2020
664
1,014
106

This is quite strange - Genoa's 12 CCXs connect to the IOD with 12 GMI3-narrow but Bergamo's 16 CCXs connect with only 8 GMI3-narrow, right? Is the bandwidth enough? Are the lower core clocks going to offset this?
Yep, this has already been known. The reason is that the sIOD only has 12 IFoP links, which can't be split evenly across 8 CCDs other than at 1 per CCD.
Bandwidth would be a problem for some workloads that run on a single CCD, as you only get 4/2 GB/s per core. But in cloud workloads spread over up to 8 CCDs that might not be much of a problem.
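A rough sketch of the link budget behind this. The 12 IFoP links on the sIOD, the two GMI3-Narrow ports per die and the core counts come from the thread. The per-link bandwidth of 64 GB/s read and 32 GB/s write is a hypothetical round number, chosen only because it reproduces the 4/2 GB/s per core figure above (read here as read/write, which is also an assumption). Both function names are illustrative.

```python
SIOD_LINKS = 12         # IFoP links on the server IO die (from the thread)
DIE_MAX_LINKS = 2       # GMI3-Narrow ports available on each CCD (from the thread)
LINK_READ_GBS = 64.0    # hypothetical per-link read bandwidth
LINK_WRITE_GBS = 32.0   # hypothetical per-link write bandwidth


def max_uniform_links(total_links: int, ccds: int,
                      die_max: int = DIE_MAX_LINKS) -> int:
    """Largest equal number of links every CCD can get from the IO die."""
    return min(die_max, total_links // ccds)


def per_core_bandwidth(links_per_ccd: int, cores_per_ccd: int) -> tuple[float, float]:
    """(read, write) GB/s per core when all cores on a CCD share its links."""
    read = LINK_READ_GBS * links_per_ccd / cores_per_ccd
    write = LINK_WRITE_GBS * links_per_ccd / cores_per_ccd
    return read, write


configs = {
    "Genoa (12 CCDs x 8 cores)": (12, 8),
    "Bergamo (8 CCDs x 16 cores)": (8, 16),
}
for name, (ccds, cores) in configs.items():
    links = max_uniform_links(SIOD_LINKS, ccds)
    read, write = per_core_bandwidth(links, cores)
    print(f"{name}: {links} link/CCD, {read:.0f}/{write:.0f} GB/s per core")
```

With 8 CCDs, two links each would need 16 links, so every CCD ends up with one. Sharing that one link among 16 cores instead of 8 halves the per-core figure relative to Genoa, whatever the real per-link bandwidth is.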