Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
I think AMD will maintain or even slightly increase clocks with 5nm based on what’s been revealed about it so far, but who knows. The heat density will be a big issue and AMD is concerned with power and area first, increasing clock speeds is secondary.
This is something I see stated all of the time, but I have a problem with how it's normally used.

If a new node uses 50% of the power and has twice the density, then shouldn't cramming in twice the number of transistors, each at 50% power, lead to a similar heat density? I can see that if you push for higher clocks than on the previous node, the power density will rise, but it shouldn't at the same frequency.
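
A minimal sketch of that arithmetic, assuming relative power density is simply per-transistor power multiplied by transistor density (a simplification that ignores leakage, voltage and activity factor). The function name is made up for illustration, and the 0.5x and 2x figures are the hypothetical ones from the post, not data for any real node.

```python
def relative_power_density(power_per_transistor: float, density: float) -> float:
    """Power density of the new node relative to the old node (old node = 1.0).

    Both arguments are ratios of new node to old node at the same clock frequency.
    """
    return power_per_transistor * density


# Hypothetical node from the post: half the power per transistor, twice the density.
same_clock = relative_power_density(0.5, 2.0)
print(f"relative power density at the same clock: {same_clock}")
```

Under these assumptions the ratio stays at 1.0, which is the point being made: the heat density only rises if the scaling factors do not cancel or if clocks are pushed higher.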
 
  • Like
Reactions: Vattila

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
I agree, but it's the term used now very generally to refer to perf/freq. AMD and Intel even use it in their marketing materials, so it's hard to escape at this point.
Aren't those particular use cases essentially shorthand for "we benchmarked it in SPEC" in the majority of the cases? ;)

This is something I see stated all of the time, but I have a problem with how it's normally used.

If a new node uses 50% of the power and has twice the density, then shouldn't cramming twice the # of transistors , each at 50% power lead to a similar heat density? I can see where you try for increased clocks from the previous case, the power density will rise, but it shouldn't at the same frequency.
As I understand it, denser structures essentially trap heat more than those on bigger nodes. The smaller the circuit, the smaller the "unused" areas that help dissipate the heat.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
We know almost nothing about the new Zen3 uarch, so it's hard to predict anything about Zen4 performance. If Zen4 is an evolution of the brand-new Zen3 uarch, I would say it can bring an IPC uplift similar to Zen2's, something like 10-15%.

And of course maybe Zen4 will have SMT4 finally :)
 
  • Like
Reactions: Vattila

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
We know almost nothing about Zen3 new uarch so it's hard to predict something about Zen4 performance. When Zen4 will be an evolution of brand new uarch Zen3 I would say it can bring similar IPC uplift as Zen2, something like 10-15% IPC uplift.

And of course maybe Zen4 will have SMT4 finally :)

They'd be mad not to. ;)
 

Cardyak

Member
Sep 12, 2018
72
159
106
This is something I see stated all of the time, but I have a problem with how it's normally used.

If a new node uses 50% of the power and has twice the density, then shouldn't cramming twice the # of transistors , each at 50% power lead to a similar heat density? I can see where you try for increased clocks from the previous case, the power density will rise, but it shouldn't at the same frequency.

According to WikiChip, 5nm offers a 30% reduction in power (or 0.7x) compared to 7nm, but a 1.8x increase in density.

Doing some simple maths just in my head:

1.8 * 0.7 = 1.26 - So fully utilising all of the available density in 5nm will result in a ~26% power increase. (I understand Microarchitecture design decisions also influence this but let's just stick a pin in that for now)

Or alternatively: 1/0.7 ≈ 1.43 - So utilising as much of the density as possible whilst keeping the same power envelope will net you roughly 43% more transistor budget.

I expect AMD to opt for the latter option. They most likely won't fully use all of the density in 5nm, instead opting to have gaps and spaces between the transistors (I believe this is known colloquially as "Dark Silicon") but keep power consumption and TDP at consistent levels.

TBH I'd be happy with that. I'd rather have 43% more transistor budget for a few extra cores and higher IPC with no extra cost in heat and power than a whopping 80% increase in transistor budget with more heat and potentially a regression in clock speeds.

Although, actually thinking about it, maybe going all in on density will more than offset the potential clock speed regressions with higher IPC. I guess AMD will have to look at how much IPC and parallelism they can extract using the extra transistor budget, and if it isn't enough to offset the heat and clock speed regressions then they'll have to adjust the density accordingly.
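
The two options above can be reproduced with a short sketch. It assumes total power is just per-transistor power times transistor count at an unchanged clock, and it takes the 0.7x and 1.8x figures from the post as given. The variable names are only illustrative.

```python
POWER_SCALE = 0.7    # 5nm power per transistor relative to 7nm (figure quoted in the post)
DENSITY_SCALE = 1.8  # 5nm transistor density relative to 7nm (figure quoted in the post)

# Option 1: use all of the available density and accept the power increase.
full_density_power = DENSITY_SCALE * POWER_SCALE

# Option 2: hold power constant and see how many more transistors fit in the budget.
iso_power_transistors = 1 / POWER_SCALE

print(f"full density: {full_density_power:.2f}x power")
print(f"same power:   {iso_power_transistors:.2f}x transistors")
```

The first figure comes out at 1.26x power for the full 1.8x transistor budget. The second, 1/0.7, rounds to 1.43x transistors at unchanged power, which leaves part of the available density unused.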
 
  • Like
Reactions: Vattila

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Everyone: AMD will probably adopt SMT4, if not now then eventually.
No one: AMD will move back to a single-threaded core and also adopt an in-order pipeline, opting for the fastest, most secure manycore processor.

// In-order also applies for Load-slice and Freeflow
 
  • Like
Reactions: Vattila

Richie Rich

Senior member
Jul 28, 2019
470
229
76
According to WikiChip, 5nm offers a 30% reduction in power (Or 0.7x) compared to 7nm , but a 1.8x increase in density.

Doing some simple maths just in my head:

1.8 * 0.7 = 1.26 - So fully utilising all of the available density in 5nm will result in a ~26% power increase. (I understand Microarchitecture design decisions also influence this but let's just stick a pin in that for now)
Nice math. However, instead of a 26% power increase, they can stay at constant TDP while lowering clocks by about 8%.

I'm not sure 1.8x density allows increasing the core count to 128 for Genoa, given that Zen3 will also increase the L3 cache. Maybe they will stay at 8 chiplets while using a 12-core CCD/CCX, resulting in 96 cores (and 384 threads, of course :)).
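
A rough sketch of where a figure close to 8% could come from. The post does not state its scaling model, so the cube-law relationship between power and clock used here is an assumption (a common rule of thumb when voltage is scaled along with frequency). The core and thread counts simply multiply out the layout speculated above.

```python
# Assumption (not from the post): power scales roughly with the cube of clock
# frequency, because voltage tends to be scaled together with frequency.
POWER_INCREASE = 1.26  # 1.8x density * 0.7x power, from the quoted maths

clock_scale = (1 / POWER_INCREASE) ** (1 / 3)
clock_reduction = 1 - clock_scale
print(f"clock reduction at constant TDP: {clock_reduction:.1%}")

# Core and thread counts for the speculated Genoa layout.
CHIPLETS = 8
CORES_PER_CCD = 12
THREADS_PER_CORE = 4  # SMT4, which is pure speculation in this thread

cores = CHIPLETS * CORES_PER_CCD
threads = cores * THREADS_PER_CORE
print(f"{cores} cores, {threads} threads")
```

With the cube-law assumption the reduction lands between 7% and 8%. A different power model would give a different number.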
 
  • Like
Reactions: Vattila

Tuna-Fish

Golden Member
Mar 4, 2011
1,345
1,524
136
1.8 * 0.7 = 1.26 - So fully utilising all of the available density in 5nm will result in a ~26% power increase. (I understand Microarchitecture design decisions also influence this but let's just stick a pin in that for now)

Or alternatively: 1/0.7 = 1.42 - So utilising as much of the density as possible whilst keeping the same power envelope will net you approximately ~42% more transistor budget.

I expect AMD to opt for the latter option. They most likely won't fully use all of the density in 5nm, instead opting to have gaps and spaces between the transistors (snip) but keep power consumption and TDP at consistent levels.

No. What they will do instead, as everyone has done for a couple of generations already, is to increase the proportion of the die that is made up of some structure that burns less power than the mean. Such as caches and branch prediction structures.

(I believe this is known colloquially as "Dark Silicon")

Dark silicon refers to the idea that you have die area that cannot all be lit at the same time. It does not mean leaving silicon area literally blank -- instead, you can still make use of such area, but you cannot make use of it all at the same time.
 

Ajay

Lifer
Jan 8, 2001
15,422
7,841
136
No. What they will do instead, as everyone has done for a couple of generations already, is to increase the proportion of the die that is made up of some structure that burns less power than the mean. Such as caches and branch prediction structures.



Dark silicon refers to the idea that you have die area that cannot all simultaneously be lit at the same time. It does not mean leaving silicon area literally blank -- instead, you can still make sure of such area, but you cannot make use of it all at the same time.
Actually, we’ve reached a point (Zen2) where there is unused silicon due to conflicting design rules (meaning a gap must be left blank as a no-go area). Syn tools seem to work off a grid, and there are small blocks and strips that have no xtors. This came out of some of the ISSCC leaks. It gets crazier and crazier going down the road to 1nm.
 
  • Like
Reactions: Vattila

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Actually, we’ve reached a point (Zen2) with there is unused silicon due to conflicting design rules (meaning a gap must be left left blank as a no-go area). Syn tools seem to work off a grid - and the are small blocks and strips that have no xtors. Came out of some of the ISSCC leaks. It gets crazier and crazier going down the road to 1nm.
What's crazy is that as recently as 2016 it was thought that transistors smaller than 7nm would be highly susceptible to quantum tunneling. That didn't turn out to be an issue as processes moved from N7 to N7P/N7+ to 6/5 for TSMC, with Samsung even targeting 3nm GAAFET production in 2021.

The main future issue is that 1 silicon atom is 0.2nm wide. It seems the industry has said they are unsure if nodes beyond 3nm would be viable, though TSMC is researching 2nm, and Intel thinks they can do 1.4nm by 2029.

Circling back to Zen4... if on 5nm, and if it doesn't drop til 2022, it may be very interesting market-wise. If Intel can get 7nm out by then (I know, but bear with me), they *might* regain the process lead. I think this path is the only way I could see them doing so in the next 5 years. 5nm TSMC is projected to have 171.3 MTr/mm2 and 7nm Intel is projected to have 237.18 MTr/mm2.

In any case, it's remarkable to see that we are going to see roughly a doubling of # of transistors from 7nm to 5nm so quickly and if TSMC keeps up the cadence, it'll happen still at a speed nearly in accordance with Moore's observation, even though we are starting to approach a literal atomic limit.
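
For a sense of scale, the figures in this post can be put side by side in a small sketch. All numbers are the ones quoted above (projected densities and a 0.2nm silicon atom). Node names are marketing labels rather than literal feature sizes, so the atom counts only illustrate the "atomic limit" remark and do not describe real gate dimensions. The helper name is made up.

```python
TSMC_5NM_MTR_PER_MM2 = 171.3    # projected density quoted in the post
INTEL_7NM_MTR_PER_MM2 = 237.18  # projected density quoted in the post
SILICON_ATOM_NM = 0.2           # atom width quoted in the post

density_ratio = INTEL_7NM_MTR_PER_MM2 / TSMC_5NM_MTR_PER_MM2
print(f"Intel 7nm vs TSMC 5nm density: {density_ratio:.2f}x")


def atoms_across(length_nm: float) -> float:
    """Number of silicon atoms that would span a given length (illustrative only)."""
    return length_nm / SILICON_ATOM_NM


for node_nm in (3, 2, 1.4):
    print(f"{node_nm}nm is about {atoms_across(node_nm):.0f} atoms wide")
```

On these projections Intel's 7nm would be a little under 1.4x as dense as TSMC's 5nm, which is the basis for the "might regain the process lead" remark.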
 

Hitman928

Diamond Member
Apr 15, 2012
5,236
7,785
136
Actually, we’ve reached a point (Zen2) with there is unused silicon due to conflicting design rules (meaning a gap must be left left blank as a no-go area). Syn tools seem to work off a grid - and the are small blocks and strips that have no xtors. Came out of some of the ISSCC leaks. It gets crazier and crazier going down the road to 1nm.

The way you have to lay out with FinFETs is different from planar and actually pretty restrictive. You kind of have to follow a grid pattern whether hand-drawing the layout or relying on synthesis tools.
 
  • Like
Reactions: Vattila

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
What's crazy is as recently as 2016 it was thought that transistors smaller than 7nm would be highly susceptible to quantum tunneling. That didn't turn out to be an issue as processes move from N7 to N7P/N7+ to 6/5 for TSMC, and even Samsung targeting 3nm GAAFET production in 2021.

The main future issue is that 1 silicon atom is 0.2nm wide. It seems the industry has said they are unsure if nodes beyond 3nm would be viable, though TSMC is researching 2nm, and Intel thinks they can do 1.4nm by 2029.

Circling back to Zen4... if on 5nm, and if it doesn't drop til 2022, it may be very interesting market-wise. If Intel can get 7nm out by then (I know, but bear with me), they *might* regain the process lead. I think this path is the only way I could see them doing so in the next 5 years. 5nm TSMC is projected to have 171.3 MTr/mm2 and 7nm Intel is projected to have 237.18 MTr/mm2.

In any case, it's remarkable to see that we are going to see roughly a doubling of # of transistors from 7nm to 5nm so quickly and if TSMC keeps up the cadence, it'll happen still at a speed nearly in accordance with Moore's observation, even though we are starting to approach a literal atomic limit.

IIRC there has been no indication that Zen 4 is being released in 2022.

A lot of rumors are flying around that aren’t backed up by facts:
  1. People have continually claimed that Zen 3 won’t be released until Q4. AMD has indirectly denied this, stating that desktop parts will be out “later this year”. Only server parts are due to drop in Q4, which is consistent with last year. See the AnandTech article on the subject.
  2. We’ve received no guidance on release dates for next year; however, I suspect a similar cadence to this year.
 
  • Like
Reactions: Vattila

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
IIRC there has been no indication that Zen 4 is being released in 2022.

A lot of rumors are flying around that aren’t backed up by facts:
  1. People have continually claimed that Zen 3 won’t be released until Q4. AMD has indirectly denied this, stating that desktop parts will be out “later this year”. Only server parts are due to drop in Q4, which is consistent with last year. See the Anandtech article on the subject.
  2. We’ve received no guidance on release dates for next year, however, I suspect similar cadence to this year.
1583775225203.png

The products on here are Zen (released in 2017), Zen2, Zen3, Zen4.

The timeline runs from 2017 (when Zen was released) to 2022.

If they planned to release Zen4 in 2021 why would they make the timeline run to 2022?

It could be simply a stupid graphic designer. But they don't make even tiny changes without thorough thought, e.g. the 7nm+ changing to 7nm for Zen3.

I agree with you that it should be released in 2021, but there is no indication that it will be. And we have this official AMD slide that says 2017 at the beginning when Zen was released, and 2022 at the end under Zen 4. It could be that they are implying Zen 4 will release before 2022, but I'm not sure. It's totally ambiguous.
 
  • Like
Reactions: Vattila

Hans Gruber

Platinum Member
Dec 23, 2006
2,131
1,088
136
View attachment 17923

The products on here are Zen (released in 2017), Zen2, Zen3, Zen4.

The timeline runs from 2017 (when Zen was released) to 2022.

If they planned to release Zen4 in 2021 why would they make the timeline run to 2022?

It could be simply a stupid graphic designer. But they don't make even tiny changes without thorough thought, e.g. the 7nm+ changing to 7nm for Zen3.

I agree with you that it should be released in 2021, but there is no indication that it will be. And we have this official AMD slide that says 2017 at the beginning when Zen was released, and 2022 at the end under Zen 4. It could be that they are implying Zen 4 will release before 2022, but I'm not sure. It's totally ambiguous.
If you look at the (CPU Roadmap) slide closely, it looks as if the Zen4 chip is lined up with the end of Q3 or early Q4 2021 for a release date. The way I see it, the graph's timeline ends in January 2022.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
View attachment 17923

The products on here are Zen (released in 2017), Zen2, Zen3, Zen4.

The timeline runs from 2017 (when Zen was released) to 2022.

If they planned to release Zen4 in 2021 why would they make the timeline run to 2022?

It could be simply a stupid graphic designer. But they don't make even tiny changes without thorough thought, e.g. the 7nm+ changing to 7nm for Zen3.

I agree with you that it should be released in 2021, but there is no indication that it will be. And we have this official AMD slide that says 2017 at the beginning when Zen was released, and 2022 at the end under Zen 4. It could be that they are implying Zen 4 will release before 2022, but I'm not sure. It's totally ambiguous.


Well first, this references EPYC parts, not desktop or mobile. Second, months and years are not listed. I interpret the “graph” as being from January 2017-January 2022. Finally, note that if EPYC sticks to a 12 month cadence, Genoa’s product cycle would be from September 2021-September 2022.
 
  • Like
Reactions: soresu

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Ah, don't read too much into it, folks. It's just a PowerPoint slide and there isn't any deeper meaning to it!

They couldn't go with 2021 as it'd mean the little CPU boxes couldn't line up equidistant and centred!
 
  • Like
Reactions: Vattila

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
Silly question, but with the ARM supporters maintaining they have better IPC and use less power, why would the fastest supercomputers in the world all use x86 processors ? (AMD/Intel)

Since supercomputers use their own proprietary OS and applications, they should work on any selected hardware.

This doesn't seem like a good-faith question, but I will endeavour to answer as if it were.

-Currently, they don't. None of the top three supercomputers in the world are x86. Two are PPC, one is a custom Mainland Chinese CPU.
-There are good reasons to prefer a single vendor system, providing both CPU and GPU. Currently this is only really doable with x86.
-The strongest vendor of HPC ARM CPUs, Fujitsu, is not a US company; this may be a consideration for national security workloads.
-Licensable ARM cores, right now, look better on int than technical-computing FP, much of which vectorizes well and can take good advantage of wider vector lengths.
-Large future ARM supercomputers are being announced. Riken's "Fugaku" is a notable example here, and it may take the position of #1 supercomputer in the world depending on the timing of its completion (early next year?). It is, per my recollection, projected to be more powerful than any currently-installed system.

I hope this answers your question; I don't particularly have a dog in this fight but it's something I pay close attention to.
 
  • Like
Reactions: Vattila

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,540
14,495
136
This doesn't seem like a good-faith question, but I will endeavour to answer as if it were.

-Currently, they don't. None of the top three supercomputers in the world are x86. Two are PPC, one is a custom Mainland Chinese CPU.
-There are good reasons to prefer a single vendor system, providing both CPU and GPU. Currently this is only really doable with x86.
-The strongest vendor of HPC ARM CPUs, Fujitsu, is not a US company; this may be a consideration for national security workloads.
-Licensable ARM cores, right now, look better on int than technical-computing FP, much of which vectorizes well and can take good advantage of wider vector lengths.
-Large future ARM supercomputers are being announced. Riken's "Fugaku" is a notable example here, and it may take the position of #1 supercomputer in the world depending on timing of its completion(early next year?). It is, per my recollection, projected to be more powerful than any currently-installed system.

I hope this answers your question; I don't particularly have a dog in this fight but it's something I pay close attention to.
Based on your answer, here is a revised question. The new supercomputer mentioned in this forum will be 10 times the speed of the current leader, Summit. It will be out in 2 years. Why was an x86 processor (Genoa-based EPYC) chosen instead of ARM? And currently the number one computer has 2.41 million processors, but number 3 has 10 million processors. I think those must be some pretty weenie cores to need an over 4-to-1 ratio of cores and still be WAY slower. As far as the PPC cores go, they are about to get trounced by the EPYC cores.
 
  • Like
Reactions: Thunder 57

Thunder 57

Platinum Member
Aug 19, 2007
2,673
3,792
136
Based on your answer, here is a revised question. The new supercomputer mentioned in this forum will be 10 times the speed of the current leader, summit. It will be out in 2 years. Why was an x86 processor (genoa based EPYC) chosen instead of ARM ? And currently the number one computer has 2.41 million processors, but number 3 has 10 million processors. I think those must be some prettyy weenie cores for over 4 to 1 rates of cores, and still WAY slower. As far as the PPC cores go, they are about to get trounced by The EPYC cores.

I think the single vendor point has merit, but I think there is more to it than that. My guess is that ARM is still trying to enter into this space. They just never seem to break into it for one reason or another. As far as PPC goes, I think they would still be competitive or considered at least if it wasn't for GloFo screwing things up.
 
  • Like
Reactions: scannall

SarahKerrigan

Senior member
Oct 12, 2014
352
506
136
Based on your answer, here is a revised question. The new supercomputer mentioned in this forum will be 10 times the speed of the current leader, summit. It will be out in 2 years. Why was an x86 processor (genoa based EPYC) chosen instead of ARM ? And currently the number one computer has 2.41 million processors, but number 3 has 10 million processors. I think those must be some prettyy weenie cores for over 4 to 1 rates of cores, and still WAY slower. As far as the PPC cores go, they are about to get trounced by The EPYC cores.

I'm fairly sure I covered that, but I will attempt to elaborate. (By the way, so far as I am aware, it is presently 2020, which means El Capitan will arrive in three years, not two.)

-If DOE had a preference for an accelerator-driven system, and wanted the accelerator tightly-bound, that basically rules out ARM because there are no extant or announced ARM CPUs with an interconnect capable of coherently talking to any extant or announced GPU. In other words, based on public information available today, ARM would be stuck with PCIe. IBM/Nvidia, AMD, and Intel have this capability (NVlink, Infinity, and CXL respectively); however, IBM can only deliver it in conjunction with Nvidia, and there may have been a preference for a single-vendor solution. That leaves the x86 vendors.

-The most HPC-oriented ARM CPU currently in existence is Japanese. El Capitan will be running national security workloads. You may notice that historically, there is something of a preference for domestically-sourced systems and major components for this category.

-Epyc is a highly general-purpose processor, probably the best general-purpose server CPU in existence today. That's not to say that will necessarily be the case in 2023 - but it makes for an interesting comparison with A64FX, which is fairly workload-specific, especially in its focus on bandwidth over capacity (which is limited to 32GB/socket!). There are tradeoffs here for supercompute processors - some workloads crave bandwidth (which is the category catered to by A64FX and NEC's SX-Aurora) while others are more capacity-sensitive or bandwidth-insensitive. It is rarely as simple as "IPC" in this area. You may find the utilization percentage numbers for HPCG interesting, as an example of how different design tradeoffs work out for one HPC code stream.

-Licensable cores, including Neoverse, that exist today are more oriented toward commercial workloads (integer, control-flow-heavy, random data movement) than toward HPC (FP, amenable to SIMD, streaming data movement.)

-HPC sites are generally reluctant to buy into emerging ecosystems; a history of execution goes a long way. Note that Zen saw minimal HPC adoption, but Zen2 is seeing plenty. If ARM is to gain serious HPC traction - and it might or might not - it will be a gradual process.

-While you said "they should work on any selected hardware", the truth is rarely so simple. As an example, Eigen - a popular open linear-algebra library - has support for a few flavors of AVX, VMX, CUDA, and NEON, but not, last I looked, for the emerging ARM SVE vector extension. Ecosystem, especially around optimization, really does matter.

As I said before, I don't think you're intending to discuss this in good faith, but I'm posting this here in the hopes that you or someone else may find it interesting as a perspective of someone with experience in this arena. I think it's indisputable, though, that AMD's execution right now is absolutely excellent, and I'm excited to see their roadmap develop going forward.
 
  • Like
Reactions: Vattila

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,540
14,495
136
I don't know what you mean by "I don't think you're intending to discuss this in good faith". My point is, ARM is a new, struggling technology. It's not at the forefront at the moment, at least for supercomputers. It may have a place, but it's kind of out of sync with the supercomputer model, or with a desktop that needs a lot of "horsepower". What I see is an efficient integer CPU that works well in mobile-type applications.