Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

DrMrLordX · Mar 5, 2020

thesmokingman said:
El Capitan will be running a custom Epyc that is Rome based.

Huh? Thought it was Genoa-based.

Adonisds · Mar 5, 2020

Is GAA expected to clock differently than FinFET? Higher or lower, assuming the same design?

maddie · Mar 5, 2020

exquisitechar said:
I think AMD will maintain or even slightly increase clocks with 5nm based on what’s been revealed about it so far, but who knows. The heat density will be a big issue and AMD is concerned with power and area first, increasing clock speeds is secondary.

This is something I see stated all of the time, but I have a problem with how it's normally used.

If a new node uses 50% of the power and has twice the density, then shouldn't cramming in twice the number of transistors, each at 50% power, lead to a similar heat density? I can see that if you push for higher clocks than in the previous case, the power density will rise, but it shouldn't at the same frequency.

moinmoin · Mar 5, 2020

Hitman928 said:
I agree, but it's the term used now very generally to refer to perf/freq. AMD and Intel even use it in their marketing materials, so it's hard to escape at this point.

Aren't those particular use cases essentially shorthand for "we benchmarked it in SPEC" in the majority of the cases?

maddie said:
This is something I see stated all of the time, but I have a problem with how it's normally used.

If a new node uses 50% of the power and has twice the density, then shouldn't cramming in twice the number of transistors, each at 50% power, lead to a similar heat density? I can see that if you push for higher clocks than in the previous case, the power density will rise, but it shouldn't at the same frequency.

As I understand it, the denser structures essentially trap heat more than bigger nodes do. The smaller the circuit, the smaller the "unused" areas that help dissipate the heat.

thesmokingman · Mar 5, 2020

DrMrLordX said:
Huh? Thought it was Genoa-based.

lol yea you are right, forgot about the salami!

Richie Rich · Mar 6, 2020

We know almost nothing about the new Zen3 uarch, so it's hard to predict anything about Zen4 performance. If Zen4 is an evolution of the brand-new Zen3 uarch, I would say it could bring an IPC uplift similar to Zen2's, something like 10-15%.

And of course maybe Zen4 will have SMT4 finally

CHADBOGA · Mar 6, 2020

Richie Rich said:
We know almost nothing about the new Zen3 uarch, so it's hard to predict anything about Zen4 performance. If Zen4 is an evolution of the brand-new Zen3 uarch, I would say it could bring an IPC uplift similar to Zen2's, something like 10-15%.

And of course maybe Zen4 will have SMT4 finally

They'd be mad not to.

Cardyak · Mar 6, 2020

maddie said:
This is something I see stated all of the time, but I have a problem with how it's normally used.

If a new node uses 50% of the power and has twice the density, then shouldn't cramming in twice the number of transistors, each at 50% power, lead to a similar heat density? I can see that if you push for higher clocks than in the previous case, the power density will rise, but it shouldn't at the same frequency.

According to WikiChip, 5nm offers a 30% reduction in power (or 0.7x) compared to 7nm, but a 1.8x increase in density.

Doing some simple maths just in my head:

1.8 * 0.7 = 1.26 - So fully utilising all of the available density in 5nm will result in a ~26% power increase. (I understand Microarchitecture design decisions also influence this but let's just stick a pin in that for now)

Or alternatively: 1/0.7 = 1.42 - So utilising as much of the density as possible whilst keeping the same power envelope will net you approximately 42% more transistor budget.

I expect AMD to opt for the latter option. They most likely won't fully use all of the density in 5nm, instead opting to have gaps and spaces between the transistors (I believe this is known colloquially as "Dark Silicon") but keep power consumption and TDP at consistent levels.

TBH I'd be happy with that, I'd rather have 42% more transistor budget for a few extra cores and higher IPC with no extra cost on heat and power, compared to a whopping 80% increase in transistor budget but more heat and potentially a regression in clock speeds.

Although, actually thinking about it, maybe going all in on density will more than offset the potential clock speed regressions with higher IPC. I guess AMD will have to look at how much IPC performance and parallelism they can extract using the extra transistor budget, and if it isn't enough to offset the heat and clock speed regressions then they'll have to adjust the density accordingly.
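
The arithmetic above can be sketched in a few lines of Python. This is only a back-of-envelope model: it assumes the quoted 0.7x power figure applies per transistor at the same frequency, that the 1.8x density figure is fully usable, and that power scales linearly with transistor count. The function names are made up for illustration.

```python
# Inputs as quoted in the thread (7nm -> 5nm). Treating them as
# per-transistor power at the same frequency is an assumption.
POWER_PER_TRANSISTOR = 0.7   # 30% power reduction
DENSITY = 1.8                # 1.8x transistors per unit area


def power_density_scale(power_per_transistor: float, density_used: float) -> float:
    """Relative power per unit area when `density_used` times as many
    transistors fill the same area, each drawing `power_per_transistor`
    times the old power."""
    return power_per_transistor * density_used


def iso_power_transistor_budget(power_per_transistor: float) -> float:
    """Relative transistor count that keeps total power unchanged."""
    return 1.0 / power_per_transistor


full_density = power_density_scale(POWER_PER_TRANSISTOR, DENSITY)
iso_power = iso_power_transistor_budget(POWER_PER_TRANSISTOR)
half_power_double_density = power_density_scale(0.5, 2.0)

print(f"full 1.8x density: {full_density:.2f}x power in the same area")
print(f"iso-power transistor budget: {iso_power:.2f}x")
print(f"0.5x power at 2x density: {half_power_double_density:.2f}x")
```

Under these assumptions the full-density case gives 1.26x, the iso-power budget is 1/0.7 = 1.4286 (1.42 truncated, 1.43 rounded), and the hypothetical "50% power, 2x density" node comes out at exactly 1.0x, i.e. unchanged power density at the same frequency.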

NostaSeronx · Mar 6, 2020

Everyone: AMD will probably adopt SMT4, if not now then eventually.
No one: AMD will move back to a single-threaded core and also adopt an in-order pipeline, opting for the fastest, most secure manycore processor.

// In-order also applies for Load-slice and Freeflow

Richie Rich · Mar 6, 2020

Cardyak said:
According to WikiChip, 5nm offers a 30% reduction in power (or 0.7x) compared to 7nm, but a 1.8x increase in density.

Doing some simple maths just in my head:

1.8 * 0.7 = 1.26 - So fully utilising all of the available density in 5nm will result in a ~26% power increase. (I understand Microarchitecture design decisions also influence this but let's just stick a pin in that for now)

Nice math. However, instead of a 26% power increase, they can stay at constant TDP while lowering clocks by 8%.

Not sure if 1.8x density allows increasing the core count to 128 for Genoa while Zen3 also increases the L3 cache. Maybe they will stay at 8 chiplets while using a 12-core CCD/CCX, resulting in 96 cores (and 384 threads, of course).
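
A rough way to sanity-check the clock figure, as a sketch only: it assumes dynamic power goes as f·V² and that voltage scales roughly linearly with frequency, so power goes as f³. That cubic exponent is a textbook simplification, not something stated in this thread, and the function name is made up for illustration.

```python
def clock_scale_for_constant_power(power_increase: float, exponent: float = 3.0) -> float:
    """Clock multiplier that cancels `power_increase`, assuming
    power ~ frequency ** exponent (exponent=3.0 is an assumption)."""
    return (1.0 / power_increase) ** (1.0 / exponent)


scale = clock_scale_for_constant_power(1.26)
reduction_pct = (1.0 - scale) * 100.0
print(f"clock reduction to cancel a 26% power increase: {reduction_pct:.1f}%")

# Core and thread count for the 8-chiplet, 12-core CCD, SMT4 scenario.
chiplets, cores_per_ccd, threads_per_core = 8, 12, 4
cores = chiplets * cores_per_ccd
threads = cores * threads_per_core
print(f"{cores} cores, {threads} threads")
```

With the cubic assumption the required clock reduction comes out at about 7.4%, in the same ballpark as the 8% figure; a smaller exponent would require a larger clock cut.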

Tuna-Fish · Mar 6, 2020

Cardyak said:
1.8 * 0.7 = 1.26 - So fully utilising all of the available density in 5nm will result in a ~26% power increase. (I understand Microarchitecture design decisions also influence this but let's just stick a pin in that for now)

Or alternatively: 1/0.7 = 1.42 - So utilising as much of the density as possible whilst keeping the same power envelope will net you approximately 42% more transistor budget.

I expect AMD to opt for the latter option. They most likely won't fully use all of the density in 5nm, instead opting to have gaps and spaces between the transistors (snip) but keep power consumption and TDP at consistent levels.

No. What they will do instead, as everyone has done for a couple of generations already, is to increase the proportion of the die that is made up of some structure that burns less power than the mean. Such as caches and branch prediction structures.

Cardyak said:
(I believe this is known colloquially as "Dark Silicon")

Dark silicon refers to the idea that you have die area that cannot all be lit at the same time. It does not mean leaving silicon area literally blank -- instead, you can still make use of such area, but you cannot make use of it all at the same time.

Ajay · Mar 8, 2020

Tuna-Fish said:
No. What they will do instead, as everyone has done for a couple of generations already, is to increase the proportion of the die that is made up of some structure that burns less power than the mean. Such as caches and branch prediction structures.

Dark silicon refers to the idea that you have die area that cannot all be lit at the same time. It does not mean leaving silicon area literally blank -- instead, you can still make use of such area, but you cannot make use of it all at the same time.

Actually, we’ve reached a point (Zen2) where there is unused silicon due to conflicting design rules (meaning a gap must be left blank as a no-go area). Syn tools seem to work off a grid, and there are small blocks and strips that have no xtors. Came out of some of the ISSCC leaks. It gets crazier and crazier going down the road to 1nm.

amrnuke · Mar 9, 2020

Ajay said:
Actually, we’ve reached a point (Zen2) where there is unused silicon due to conflicting design rules (meaning a gap must be left blank as a no-go area). Syn tools seem to work off a grid, and there are small blocks and strips that have no xtors. Came out of some of the ISSCC leaks. It gets crazier and crazier going down the road to 1nm.

What's crazy is as recently as 2016 it was thought that transistors smaller than 7nm would be highly susceptible to quantum tunneling. That didn't turn out to be an issue as processes move from N7 to N7P/N7+ to 6/5 for TSMC, and even Samsung targeting 3nm GAAFET production in 2021.

The main future issue is that 1 silicon atom is 0.2nm wide. It seems the industry has said they are unsure if nodes beyond 3nm would be viable, though TSMC is researching 2nm, and Intel thinks they can do 1.4nm by 2029.

Circling back to Zen4... if on 5nm, and if it doesn't drop til 2022, it may be very interesting market-wise. If Intel can get 7nm out by then (I know, but bear with me), they *might* regain the process lead. I think this path is the only way I could see them doing so in the next 5 years. 5nm TSMC is projected to have 171.3 MTr/mm2 and 7nm Intel is projected to have 237.18 MTr/mm2.

In any case, it's remarkable to see that we are going to see roughly a doubling of # of transistors from 7nm to 5nm so quickly and if TSMC keeps up the cadence, it'll happen still at a speed nearly in accordance with Moore's observation, even though we are starting to approach a literal atomic limit.
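
For a quick sense of scale, the two projected density figures above can be compared directly. This is only arithmetic on the quoted projections; it says nothing about what ships or how much of the peak density real designs use.

```python
# Projected peak logic densities as quoted above, in MTr/mm^2.
TSMC_5NM = 171.3
INTEL_7NM = 237.18

ratio = INTEL_7NM / TSMC_5NM
print(f"Intel 7nm / TSMC 5nm projected density: {ratio:.2f}x")
```

On those projections, Intel's 7nm would be roughly 38% denser than TSMC's 5nm.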

Hitman928 · Mar 9, 2020

Ajay said:
Actually, we’ve reached a point (Zen2) where there is unused silicon due to conflicting design rules (meaning a gap must be left blank as a no-go area). Syn tools seem to work off a grid, and there are small blocks and strips that have no xtors. Came out of some of the ISSCC leaks. It gets crazier and crazier going down the road to 1nm.

The way you have to lay out with FinFETs is different than with planar, and actually pretty restrictive. You kind of have to follow a grid pattern whether hand-drawing the layout or relying on synthesis tools.

eek2121 · Mar 9, 2020

amrnuke said:
What's crazy is as recently as 2016 it was thought that transistors smaller than 7nm would be highly susceptible to quantum tunneling. That didn't turn out to be an issue as processes move from N7 to N7P/N7+ to 6/5 for TSMC, and even Samsung targeting 3nm GAAFET production in 2021.

The main future issue is that 1 silicon atom is 0.2nm wide. It seems the industry has said they are unsure if nodes beyond 3nm would be viable, though TSMC is researching 2nm, and Intel thinks they can do 1.4nm by 2029.

Circling back to Zen4... if on 5nm, and if it doesn't drop til 2022, it may be very interesting market-wise. If Intel can get 7nm out by then (I know, but bear with me), they *might* regain the process lead. I think this path is the only way I could see them doing so in the next 5 years. 5nm TSMC is projected to have 171.3 MTr/mm2 and 7nm Intel is projected to have 237.18 MTr/mm2.

In any case, it's remarkable to see that we are going to see roughly a doubling of # of transistors from 7nm to 5nm so quickly and if TSMC keeps up the cadence, it'll happen still at a speed nearly in accordance with Moore's observation, even though we are starting to approach a literal atomic limit.

IIRC there has been no indication that Zen 4 is being released in 2022.

A lot of rumors are flying around that aren’t backed up by facts:

People have continually claimed that Zen 3 won’t be released until Q4. AMD has indirectly denied this, stating that desktop parts will be out “later this year”. Only server parts are due to drop in Q4, which is consistent with last year. See the Anandtech article on the subject.
We’ve received no guidance on release dates for next year, however, I suspect similar cadence to this year.

amrnuke · Mar 9, 2020

eek2121 said:
IIRC there has been no indication that Zen 4 is being released in 2022.

A lot of rumors are flying around that aren’t backed up by facts:

People have continually claimed that Zen 3 won’t be released until Q4. AMD has indirectly denied this, stating that desktop parts will be out “later this year”. Only server parts are due to drop in Q4, which is consistent with last year. See the Anandtech article on the subject.

We’ve received no guidance on release dates for next year, however, I suspect similar cadence to this year.

The products on here are Zen (released in 2017), Zen2, Zen3, Zen4.

The timeline runs from 2017 (when Zen was released) to 2022.

If they planned to release Zen4 in 2021 why would they make the timeline run to 2022?

It could be simply a stupid graphic designer. But they don't make even tiny changes without thorough thought, e.g. the 7nm+ changing to 7nm for Zen3.

I agree with you that it should be released in 2021, but there is no indication that it will be. And we have this official AMD slide that says 2017 at the beginning when Zen was released, and 2022 at the end under Zen 4. It could be that they are implying Zen 4 will release before 2022, but I'm not sure. It's totally ambiguous.

Atari2600 · Mar 9, 2020

amrnuke said:
It's totally ambiguous.

Which is probably the point!

Hans Gruber · Mar 9, 2020

amrnuke said:
View attachment 17923

The products on here are Zen (released in 2017), Zen2, Zen3, Zen4.

The timeline runs from 2017 (when Zen was released) to 2022.

If they planned to release Zen4 in 2021 why would they make the timeline run to 2022?

It could be simply a stupid graphic designer. But they don't make even tiny changes without thorough thought, e.g. the 7nm+ changing to 7nm for Zen3.

I agree with you that it should be released in 2021, but there is no indication that it will be. And we have this official AMD slide that says 2017 at the beginning when Zen was released, and 2022 at the end under Zen 4. It could be that they are implying Zen 4 will release before 2022, but I'm not sure. It's totally ambiguous.

If you look at the (CPU Roadmap) slide closely, it looks as if the Zen4 chip is lined up with the end of Q3 and early Q4 2021 for a release date. The way I see it, the graph has a timeline that ends in Jan 2022.

eek2121 · Mar 9, 2020

amrnuke said:
View attachment 17923

The products on here are Zen (released in 2017), Zen2, Zen3, Zen4.

The timeline runs from 2017 (when Zen was released) to 2022.

If they planned to release Zen4 in 2021 why would they make the timeline run to 2022?

It could be simply a stupid graphic designer. But they don't make even tiny changes without thorough thought, e.g. the 7nm+ changing to 7nm for Zen3.

I agree with you that it should be released in 2021, but there is no indication that it will be. And we have this official AMD slide that says 2017 at the beginning when Zen was released, and 2022 at the end under Zen 4. It could be that they are implying Zen 4 will release before 2022, but I'm not sure. It's totally ambiguous.

Well first, this references EPYC parts, not desktop or mobile. Second, months and years are not listed. I interpret the “graph” as being from January 2017-January 2022. Finally, note that if EPYC sticks to a 12 month cadence, Genoa’s product cycle would be from September 2021-September 2022.

Atari2600 · Mar 9, 2020

Ah, don't read too much into it, folks. It's just a PowerPoint slide and there isn't a deeper meaning to it!

They couldn't go with 2021, as it'd mean the little CPU boxes couldn't line up equidistant and centred!

SarahKerrigan · Mar 9, 2020

Markfw said:
Silly question, but with the ARM supporters maintaining they have better IPC and use less power, why would the fastest supercomputers in the world all use x86 processors (AMD/Intel)?

Since supercomputers use their own proprietary OS and applications, they should work on any selected hardware.

This doesn't seem like a good-faith question, but I will endeavour to answer as if it were.

-Currently, they don't. None of the top three supercomputers in the world are x86. Two are PPC, one is a custom Mainland Chinese CPU.
-There are good reasons to prefer a single vendor system, providing both CPU and GPU. Currently this is only really doable with x86.
-The strongest vendor of HPC ARM CPUs, Fujitsu, is not a US company; this may be a consideration for national security workloads.
-Licensable ARM cores, right now, look better on int than technical-computing FP, much of which vectorizes well and can take good advantage of wider vector lengths.
-Large future ARM supercomputers are being announced. Riken's "Fugaku" is a notable example here, and it may take the position of #1 supercomputer in the world depending on the timing of its completion (early next year?). It is, per my recollection, projected to be more powerful than any currently-installed system.

I hope this answers your question; I don't particularly have a dog in this fight but it's something I pay close attention to.

Markfw · Mar 9, 2020

SarahKerrigan said:
This doesn't seem like a good-faith question, but I will endeavour to answer as if it were.

-Currently, they don't. None of the top three supercomputers in the world are x86. Two are PPC, one is a custom Mainland Chinese CPU.
-There are good reasons to prefer a single vendor system, providing both CPU and GPU. Currently this is only really doable with x86.
-The strongest vendor of HPC ARM CPUs, Fujitsu, is not a US company; this may be a consideration for national security workloads.
-Licensable ARM cores, right now, look better on int than technical-computing FP, much of which vectorizes well and can take good advantage of wider vector lengths.
-Large future ARM supercomputers are being announced. Riken's "Fugaku" is a notable example here, and it may take the position of #1 supercomputer in the world depending on the timing of its completion (early next year?). It is, per my recollection, projected to be more powerful than any currently-installed system.

I hope this answers your question; I don't particularly have a dog in this fight but it's something I pay close attention to.

Based on your answer, here is a revised question. The new supercomputer mentioned in this forum will be 10 times the speed of the current leader, Summit. It will be out in 2 years. Why was an x86 processor (Genoa-based EPYC) chosen instead of ARM? And currently the number one computer has 2.41 million processors, but number 3 has 10 million processors. I think those must be some pretty weenie cores for an over 4 to 1 ratio of cores, and still WAY slower. As far as the PPC cores go, they are about to get trounced by the EPYC cores.

Thunder 57 · Mar 9, 2020

Markfw said:
Based on your answer, here is a revised question. The new supercomputer mentioned in this forum will be 10 times the speed of the current leader, Summit. It will be out in 2 years. Why was an x86 processor (Genoa-based EPYC) chosen instead of ARM? And currently the number one computer has 2.41 million processors, but number 3 has 10 million processors. I think those must be some pretty weenie cores for an over 4 to 1 ratio of cores, and still WAY slower. As far as the PPC cores go, they are about to get trounced by the EPYC cores.

I think the single vendor point has merit, but I think there is more to it than that. My guess is that ARM is still trying to enter into this space. They just never seem to break into it for one reason or another. As far as PPC goes, I think they would still be competitive or considered at least if it wasn't for GloFo screwing things up.

SarahKerrigan · Mar 9, 2020

Markfw said:
Based on your answer, here is a revised question. The new supercomputer mentioned in this forum will be 10 times the speed of the current leader, Summit. It will be out in 2 years. Why was an x86 processor (Genoa-based EPYC) chosen instead of ARM? And currently the number one computer has 2.41 million processors, but number 3 has 10 million processors. I think those must be some pretty weenie cores for an over 4 to 1 ratio of cores, and still WAY slower. As far as the PPC cores go, they are about to get trounced by the EPYC cores.

I'm fairly sure I covered that, but I will attempt to elaborate. (By the way, so far as I am aware, it is presently 2020, which means El Capitan will arrive in three years, not two.)

-If DOE had a preference for an accelerator-driven system, and wanted the accelerator tightly-bound, that basically rules out ARM because there are no extant or announced ARM CPUs with an interconnect capable of coherently talking to any extant or announced GPU. In other words, based on public information available today, ARM would be stuck with PCIe. IBM/Nvidia, AMD, and Intel have this capability (NVlink, Infinity, and CXL respectively); however, IBM can only deliver it in conjunction with Nvidia, and there may have been a preference for a single-vendor solution. That leaves the x86 vendors.

-The most HPC-oriented ARM CPU currently in existence is Japanese. El Capitan will be running national security workloads. You may notice that historically, there is something of a preference for domestically-sourced systems and major components for this category.

-Epyc is a highly general-purpose processor, probably the best currently-available general-purpose server CPU in existence today. That's not to say that will necessarily be the case in 2023 - but it makes for an interesting compare with A64FX, which is fairly workload-specific, especially in its focus on bandwidth over capacity (which is limited to 32GB/socket!) There are tradeoffs here for supercompute processors - some workloads crave bandwidth (which is the category catered to by A64FX and NEC's SX-Aurora) while others are more capacity-sensitive or bandwidth-insensitive. It is rarely as simple as "IPC" in this area. You may find the utilization percentage numbers for HPCG interesting, for an example of how different design tradeoffs work out for one HPC code stream.

-Licensable cores, including Neoverse, that exist today are more oriented toward commercial workloads (integer, control-flow-heavy, random data movement) than toward HPC (FP, amenable to SIMD, streaming data movement.)

-HPC sites are generally reluctant to buy into emerging ecosystems; a history of execution goes a long way. Note that Zen saw minimal HPC adoption, but Zen2 is seeing plenty. If ARM is to gain serious HPC traction - and it might or might not - it will be a gradual process.

-While you said "they should work on any selected hardware", the truth is rarely so simple. As an example, Eigen - a popular open linear-algebra library - has support for a few flavors of AVX, VMX, CUDA, and NEON, but not, last I looked, for the emerging ARM SVE vector extension. Ecosystem, especially around optimization, really does matter.

As I said before, I don't think you're intending to discuss this in good faith, but I'm posting this here in the hopes that you or someone else may find it interesting as a perspective of someone with experience in this arena. I think it's indisputable, though, that AMD's execution right now is absolutely excellent, and I'm excited to see their roadmap develop going forward.

Markfw · Mar 9, 2020

I don't know what you mean by "I don't think you're intending to discuss this in good faith". My point is, ARM is a new, struggling technology. It's not at the forefront at the moment, at least for supercomputers. It may have a place, but it's kind of out of sync with the supercomputer model, or with a desktop that needs a lot of "horsepower". What I see is an efficient integer CPU that works well in mobile-type applications.
