
Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 5000)

Poll: What do you expect with Zen 4? (Total voters: 119)

SarahKerrigan

> I don't know what you mean by "I don't think you're intending to discuss this in good faith". My point is, ARM is a new, struggling technology. It's not at the forefront at the moment, at least for supercomputers. It may have a place, but it's kind of out of sync with the supercomputer model, or with desktops that need a lot of "horsepower". What I see is an efficient integer CPU that works well in mobile-type applications.
It's developing. A64FX is most assuredly not oriented toward integer or mobile. As with all technology, there are growing pains. ARM HPC will either survive, or it won't, but I can say there's a lot of enthusiasm around it in HPC circles right now.

I expect that in the 06/2021 Top500 list, Fugaku, which is ARM, will be either #1 or #2. Will you still consider it "out of sync with the supercomputer model" should that occur?
 

Markfw

> It's developing. A64FX is most assuredly not oriented toward integer or mobile. As with all technology, there are growing pains. ARM HPC will either survive, or it won't, but I can say there's a lot of enthusiasm around it in HPC circles right now.
>
> I expect that in the 06/2021 Top500 list, Fugaku, which is ARM, will be either #1 or #2. Will you still consider it "out of sync with the supercomputer model" should that occur?
Let's see, then. If it's number one, is number 2 the current leader, which is how old already? And how many cores? 20 million? Talk to me when there is something to talk about other than "what might be in the future".
 

SarahKerrigan

> Let's see, then. If it's number one, is number 2 the current leader, which is how old already? And how many cores? 20 million? Talk to me when there is something to talk about other than "what might be in the future".
Fugaku would be above Summit, likely; it should be in the 400PF range with ~8 million cores and no accelerators. That's not a bad place to be.

Fugaku is far less "what might be in the future" than El Capitan is, considering it's being installed as we speak rather than several years in the future with silicon that doesn't currently exist. This is what I mean by bad faith.
 

Markfw

> Fugaku would be above Summit, likely; it should be in the 400PF range with ~8 million cores and no accelerators. That's not a bad place to be.
>
> Fugaku is far less "what might be in the future" than El Capitan is, considering it's being installed as we speak rather than several years in the future with silicon that doesn't currently exist. This is what I mean by bad faith.
Genoa is a tweak of Milan, which is in production, just not released yet. And 8 million cores to beat 2.41 million? Again, that suggests it's a weenie core, not a powerful one.
Again, let's talk in a year....
 

SarahKerrigan

> Genoa is a tweak of Milan, which is in production, just not released yet. And 8 million cores to beat 2.41 million? Again, that suggests it's a weenie core, not a powerful one.
> Again, let's talk in a year....
Accelerated system vs. non-accelerated. Core count isn't comparable. Or are you fine with comparing #59 on the list (a 160,000-core Epyc 2 system) to #58 (a 65,208-core Skylake+GPU system)? Because if so, ouch...
 

jamescox

> IIRC there has been no indication that Zen 4 is being released in 2022.
>
> A lot of rumors are flying around that aren’t backed up by facts:
> 1. People have continually claimed that Zen 3 won’t be released until Q4. AMD has indirectly denied this, stating that desktop parts will be out “later this year”. Only server parts are due to drop in Q4, which is consistent with last year. See the AnandTech article on the subject.
> 2. We’ve received no guidance on release dates for next year; however, I suspect a similar cadence to this year.
I think speculation on Zen 4 is a bit pointless since we don’t really know what is in Zen 3. The current info seems to indicate that Zen 3 is the new architecture. I suspect Zen 4 cores will mostly be a 5 nm die shrink of Zen 3 with some tweaks. The PCI Express and memory controllers are all on the IO die, so they are independent of the CCD core design. They may have a reduced-power IO die for Zen 3, but PCI Express 5.0 and DDR5 are probably Zen 4. They will probably continue to make all IO dies at GlobalFoundries. Zen 4 could switch to an interposer or something not made at GF, though.

AMD only has a few areas where they trail Intel. One is the monolithic last-level cache. Zen 3 most likely takes a CCX up to 8 cores and 32+ MB of cache, which should compete well for things that require a large cache. It is unclear how they are going to achieve the larger cache sizes. I had assumed that they would just make a large-cache variant of the die, but there has been a lot of talk of stacking cache dies on top somehow.

The Zen 3 CCD may be similar in size to Zen 2, since it has the same amount of L3 and the same number of cores. There is a good chance they increased the L2 cache size, though. I wouldn’t expect them to use stacked cache chips for any consumer-level products. It would just be too expensive, and 32 MB is plenty for consumer applications. If the L3 is actually on a separate die, then I would expect that they would need two die variants: one for consumer and one stacked variant for Epyc.
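For reference, the shared-L3 point can be put in numbers. A rough sketch, assuming the known Zen 2 layout (two 4-core CCXs per CCD, 16 MB of L3 each) and the rumored Zen 3 layout of a single 8-core CCX sharing all 32 MB; the `CCD` class is purely illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCD:
    name: str
    cores: int
    l3_mb: int
    ccx_per_ccd: int

    def cores_per_ccx(self) -> int:
        return self.cores // self.ccx_per_ccd

    def l3_per_ccx_mb(self) -> int:
        # A core's L3 is the slice belonging to its own CCX; the other
        # CCX's L3 on the same CCD is not shared with it.
        return self.l3_mb // self.ccx_per_ccd


zen2 = CCD("Zen 2", cores=8, l3_mb=32, ccx_per_ccd=2)
zen3 = CCD("Zen 3 (rumored)", cores=8, l3_mb=32, ccx_per_ccd=1)

for ccd in (zen2, zen3):
    print(f"{ccd.name}: {ccd.cores_per_ccx()} cores share {ccd.l3_per_ccx_mb()} MB of L3")
```

Same total L3 per CCD, but a single thread on the rumored layout could use twice as much of it.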

So, it is unclear how the stacked cache chips will work. If they have one die variant, then I would expect the cache chip to act as L4 cache rather than L3. I would also expect the stacked cache chip, if it exists, to appear only on high-end EPYC variants with relatively low clocks. Power consumption wouldn't be as much of a problem in such a chip, and the large cache is only needed for some specific applications. Although, if it is L4 cache, then it may make sense for it to be stacked on the IO die rather than on the CPU chiplets themselves. That would allow fast cache coherency at the L4 level. If it is on the IO die, then that would imply that the IO die is an interposer. It is still unclear whether stacked cache chips are a Zen 3 or Zen 4 feature. This seems unlikely for Zen 3, so I expect chip stacking is a Zen 4 feature, if it is a thing at all.

The other area where AMD occasionally loses is AVX-512. Some diagrams I have seen for the Zen 2 floating-point units show 2 FMA and 2 FADD (256-bit) units, but only 8 register-file read ports. The two input ports for one of the FADD units are shared with the 2 FMA units. The FMA units only need 2 operands when doing multiply operations, but 3 when doing FMA. I could see them adding more FMA units while still sharing some ports with the FADD units. It would be great if they could support AVX-512, but supporting it at a full 512 bits wide isn’t really that important. There really isn’t much difference between one 512-bit instruction and two 256-bit instructions if you have the same hardware underneath. The new instruction types in AVX-512 can make a big difference, though, so actually supporting AVX-512 is important, even if the width is not. Since Zen 3 is the new architecture, if they are supporting AVX-512, it seems like it would show up in Zen 3. I expect chip stacking and/or IO changes in Zen 4.
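A toy model of that port budget, assuming 8 register-file read ports as described above and counting 3 source operands per FMA and 2 per multiply or add. It ignores bypassing and AMD's actual scheduling rules, so treat it as a sketch of the arithmetic only:

```python
READ_PORTS = 8  # per the diagrams described above (assumption)
OPERANDS = {"fma": 3, "mul": 2, "add": 2}


def reads_needed(ops):
    """Total source operands read in one cycle for a list of issued ops."""
    return sum(OPERANDS[op] for op in ops)


def fits(ops, ports=READ_PORTS):
    """True if the ops could all read their operands in the same cycle."""
    return reads_needed(ops) <= ports


# FMA pipes doing plain multiplies plus two adds: 2+2+2+2 = 8 reads, fits.
print(fits(["mul", "mul", "add", "add"]))
# Two true FMAs plus two adds: 3+3+2+2 = 10 reads, so something must wait.
print(fits(["fma", "fma", "add", "add"]))
# Read ports needed by FMA-only configurations of various widths:
for n in (2, 3, 4):
    print(n, "FMA units need", reads_needed(["fma"] * n), "read ports")
```

This is why sharing works most of the time: the FADD unit only loses its ports in cycles where both FMA pipes need all three operands.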
 

lobz

> What's crazy is that as recently as 2016 it was thought that transistors smaller than 7nm would be highly susceptible to quantum tunneling. That didn't turn out to be an issue as processes moved from N7 to N7P/N7+ to 6/5 for TSMC, with Samsung even targeting 3nm GAAFET production in 2021.
>
> The main future issue is that one silicon atom is 0.2nm wide. The industry has said it is unsure whether nodes beyond 3nm will be viable, though TSMC is researching 2nm, and Intel thinks it can do 1.4nm by 2029.
>
> Circling back to Zen 4... if it's on 5nm and doesn't drop until 2022, it may be very interesting market-wise. If Intel can get 7nm out by then (I know, but bear with me), they *might* regain the process lead. This is the only path I can see for them to do so in the next 5 years. TSMC 5nm is projected to have 171.3 MTr/mm², and Intel 7nm is projected to have 237.18 MTr/mm².
>
> In any case, it's remarkable that we'll see roughly a doubling of transistors per unit area from 7nm to 5nm so quickly, and if TSMC keeps up the cadence, it will still happen at a pace nearly in line with Moore's observation, even as we approach a literal atomic limit.
Aside from one-time supercomputer contracts, Intel's 7nm and AMD's first 5nm CPUs will reach the consumer market at roughly the same time.
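Putting the projected densities from the quoted post side by side, with a hypothetical 10-billion-transistor budget (my assumption, not a real product) to show what the gap means in die area. Peak projected densities overstate what real dies with SRAM and IO actually achieve:

```python
TSMC_N5 = 171.3     # projected million transistors per mm^2 (from the quote)
INTEL_7NM = 237.18  # projected million transistors per mm^2 (from the quote)


def area_mm2(transistors_millions: float, density: float) -> float:
    """Die area needed at a given density, ignoring everything but logic density."""
    return transistors_millions / density


ratio = INTEL_7NM / TSMC_N5
print(f"Intel 7nm is ~{(ratio - 1) * 100:.0f}% denser on paper")

budget = 10_000  # hypothetical 10 billion transistors, in millions
print(f"TSMC N5:   {area_mm2(budget, TSMC_N5):.1f} mm^2")
print(f"Intel 7nm: {area_mm2(budget, INTEL_7NM):.1f} mm^2")
```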
 

exquisitechar

Charlie Demerjian posted a bit about Zen 3. A >15% IPC increase was already obvious, but it's nice to hear that early silicon was in good shape (that was rumored too). I don't think a June launch is happening, though. Probably August.
> I think speculation on Zen 4 is a bit pointless since we don’t really know what is in Zen 3. The current info seems to indicate that Zen 3 is the new architecture. I suspect Zen 4 cores will mostly be a 5 nm die shrink of Zen 3 with some tweaks. The PCI Express and memory controllers are all on the IO die, so they are independent of the CCD core design. They may have a reduced-power IO die for Zen 3, but PCI Express 5.0 and DDR5 are probably Zen 4. They will probably continue to make all IO dies at GlobalFoundries. Zen 4 could switch to an interposer or something not made at GF, though.
>
> AMD only has a few areas where they trail Intel. One is the monolithic last-level cache. Zen 3 most likely takes a CCX up to 8 cores and 32+ MB of cache, which should compete well for things that require a large cache. It is unclear how they are going to achieve the larger cache sizes. I had assumed that they would just make a large-cache variant of the die, but there has been a lot of talk of stacking cache dies on top somehow.
>
> The Zen 3 CCD may be similar in size to Zen 2, since it has the same amount of L3 and the same number of cores. There is a good chance they increased the L2 cache size, though. I wouldn’t expect them to use stacked cache chips for any consumer-level products. It would just be too expensive, and 32 MB is plenty for consumer applications. If the L3 is actually on a separate die, then I would expect that they would need two die variants: one for consumer and one stacked variant for Epyc.
>
> So, it is unclear how the stacked cache chips will work. If they have one die variant, then I would expect the cache chip to act as L4 cache rather than L3. I would also expect the stacked cache chip, if it exists, to appear only on high-end EPYC variants with relatively low clocks. Power consumption wouldn't be as much of a problem in such a chip, and the large cache is only needed for some specific applications. Although, if it is L4 cache, then it may make sense for it to be stacked on the IO die rather than on the CPU chiplets themselves. That would allow fast cache coherency at the L4 level. If it is on the IO die, then that would imply that the IO die is an interposer. It is still unclear whether stacked cache chips are a Zen 3 or Zen 4 feature. This seems unlikely for Zen 3, so I expect chip stacking is a Zen 4 feature, if it is a thing at all.
>
> The other area where AMD occasionally loses is AVX-512. Some diagrams I have seen for the Zen 2 floating-point units show 2 FMA and 2 FADD (256-bit) units, but only 8 register-file read ports. The two input ports for one of the FADD units are shared with the 2 FMA units. The FMA units only need 2 operands when doing multiply operations, but 3 when doing FMA. I could see them adding more FMA units while still sharing some ports with the FADD units. It would be great if they could support AVX-512, but supporting it at a full 512 bits wide isn’t really that important. There really isn’t much difference between one 512-bit instruction and two 256-bit instructions if you have the same hardware underneath. The new instruction types in AVX-512 can make a big difference, though, so actually supporting AVX-512 is important, even if the width is not. Since Zen 3 is the new architecture, if they are supporting AVX-512, it seems like it would show up in Zen 3. I expect chip stacking and/or IO changes in Zen 4.
I don't think that Zen 3 will support AVX-512. Zen 4 likely will, though. It will be more than a tweak of Zen 3.
 

jamescox

> Charlie Demerjian posted a bit about Zen 3. A >15% IPC increase was already obvious, but it's nice to hear that early silicon was in good shape (that was rumored too). I don't think a June launch is happening, though. Probably August.
>
> I don't think that Zen 3 will support AVX-512. Zen 4 likely will, though. It will be more than a tweak of Zen 3.
AVX-512 has been around for a while, although workloads that can really take advantage of it should probably run on a GPU, which may be AMD’s thinking. Intel didn’t have a GPU, so AVX-512 made sense for them as a way to make their CPUs more GPU-like. Still, AMD has reason to try to meet or exceed Intel everywhere they can. They generally have to exceed Intel's performance significantly to get people to switch.

I kind of suspect that AVX-512 Foundation may be in Zen 3, except they may do the same thing they did with 256-bit AVX in Zen 1. Zen 1 split 256-bit instructions into two 128-bit operations. Zen 2 then widened the units to full 256-bit at 7 nm. They could do the same with AVX-512 in Zen 3 and then widen the units to full 512-bit in Zen 4 at 5 nm, where they have more transistor and power budget to pull it off without reducing clocks as Intel does at 14 nm.
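The "split first, widen later" pattern can be sketched as an idealized throughput model. Assumptions: two identical vector pipes, an instruction wider than a pipe is cracked into native-width micro-ops, and latency, dependencies and port limits are ignored. The work size is an arbitrary example:

```python
import math


def uops_per_instruction(instr_bits: int, unit_bits: int) -> int:
    # An instruction wider than the execution unit is split into several
    # native-width micro-ops (the way Zen 1 handled 256-bit AVX).
    return math.ceil(instr_bits / unit_bits)


def cycles_for_work(total_bits: int, instr_bits: int, unit_bits: int, units: int) -> int:
    """Idealized cycles to push total_bits of vector math through `units` pipes."""
    instructions = math.ceil(total_bits / instr_bits)
    uops = instructions * uops_per_instruction(instr_bits, unit_bits)
    return math.ceil(uops / units)


WORK = 4096  # bits of vector math, arbitrary example size
cases = {
    "256-bit ops on 2x128-bit units (Zen 1 style)": (256, 128),
    "256-bit ops on 2x256-bit units (Zen 2 style)": (256, 256),
    "512-bit ops on 2x256-bit units (hypothetical)": (512, 256),
    "512-bit ops on 2x512-bit units (hypothetical)": (512, 512),
}
for label, (instr, unit) in cases.items():
    print(f"{label}: {cycles_for_work(WORK, instr, unit, units=2)} cycles")
```

The middle two cases come out equal: on the same hardware, one 512-bit instruction costs about the same as two 256-bit ones. Only the widening step raises peak throughput.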
 

Markfw

> AVX-512 has been around for a while, although workloads that can really take advantage of it should probably run on a GPU, which may be AMD’s thinking. Intel didn’t have a GPU, so AVX-512 made sense for them as a way to make their CPUs more GPU-like. Still, AMD has reason to try to meet or exceed Intel everywhere they can. They generally have to exceed Intel's performance significantly to get people to switch.
>
> I kind of suspect that AVX-512 Foundation may be in Zen 3, except they may do the same thing they did with 256-bit AVX in Zen 1. Zen 1 split 256-bit instructions into two 128-bit operations. Zen 2 then widened the units to full 256-bit at 7 nm. They could do the same with AVX-512 in Zen 3 and then widen the units to full 512-bit in Zen 4 at 5 nm, where they have more transistor and power budget to pull it off without reducing clocks as Intel does at 14 nm.
By all the benchmarks we have seen, except AVX-512, Rome has a 2x advantage over the best Intel has today, and Milan will only widen that. I think AMD already has the big performance advantage; now they just need companies to adopt it.
 

jamescox

> Charlie Demerjian posted a bit about Zen 3. A >15% IPC increase was already obvious, but it's nice to hear that early silicon was in good shape (that was rumored too). I don't think a June launch is happening, though. Probably August.
>
> I don't think that Zen 3 will support AVX-512. Zen 4 likely will, though. It will be more than a tweak of Zen 3.
Also, if Zen 3 is still DDR4 and PCI-E 4.0, then Zen 4 could be a huge upgrade for Epyc while still using a very similar core, because of a completely redesigned IO die with DDR5 and PCI-E 5.0. I don’t think those really matter much in the consumer space, except for the DDR5 upgrade for dual-channel systems. If you throw in the possibility of stacked cache chips and/or an interposer-based system of some kind, then Zen 4 could be very different, even if it uses almost the same core.
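For a rough sense of the DDR5 uplift on dual-channel desktops: theoretical peak bandwidth is just bus width times transfer rate. The DDR4-3200 and DDR5-4800 speed grades below are illustrative assumptions, and sustained bandwidth in practice is lower than these peaks:

```python
def peak_bandwidth_gbs(bus_bits: int, mega_transfers: int) -> float:
    """Theoretical peak in GB/s: bytes per transfer x transfers per second."""
    return (bus_bits / 8) * mega_transfers * 1e6 / 1e9


# A dual-channel desktop has 128 data bits of DRAM bus either way: two
# 64-bit DDR4 channels, or DDR5 DIMMs split into 32-bit subchannels that
# still total 64 data bits per DIMM.
ddr4 = peak_bandwidth_gbs(128, 3200)  # DDR4-3200 (assumed speed grade)
ddr5 = peak_bandwidth_gbs(128, 4800)  # DDR5-4800 (assumed speed grade)
print(f"DDR4-3200: {ddr4:.1f} GB/s, DDR5-4800: {ddr5:.1f} GB/s ({ddr5 / ddr4:.1f}x)")
```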
 

Glo.

15% average IPC for Zen 3. The biggest gains, however, will come in gaming, because of the different cache design and much, much higher bandwidth available for the caches.

Expect, at the very least, Skylake clock-for-clock, core-for-core performance. At the very least(!).

End of Off-Topic.
 

lobz

> 15% average IPC for Zen 3. The biggest gains, however, will come in gaming, because of the different cache design and much, much higher bandwidth available for the caches.
>
> Expect, at the very least, Skylake clock-for-clock, core-for-core performance. At the very least(!).
>
> End of Off-Topic.
Zen 2 already delivers that in almost all games, so I don't understand your comment at all.
 

Atari2600

> My point is, ARM is a new, struggling technology. It's not at the forefront at the moment, at least for supercomputers.
Given the rate of improvement in ARM, can you not see a future in 5 years where it surpasses x86 in some/many/most performance metrics?

They've only really had a couple of stabs at the HPC market over the past 2 or 3 years - meanwhile x86 has dominated the market for decades.
 

jamescox

> By all the benchmarks we have seen, except AVX-512, Rome has a 2x advantage over the best Intel has today, and Milan will only widen that. I think AMD already has the big performance advantage; now they just need companies to adopt it.
There is always “whataboutism” to deal with. What about AVX-512? If you use AVX-512 applications, then it is important. If you don’t, then it is mostly irrelevant, but detractors can still point at it and say “what about AVX-512?” AMD has been systematically eliminating any cases where they trail Intel processors. The larger cache should allow them to surpass Intel in the small number of applications where such a large cache is needed. The cache architecture redesign should increase performance in general, probably not just due to the size of the L3. I think they may use a smaller, faster L1 and a larger L2. L3 may be a tiny bit slower.

For AVX, though, they are up against processors like the 3175X, which actually has two 512-bit units. AMD currently has only two 256-bit FMA units. The rumors point to significantly increased cache bandwidth and significantly increased floating-point power. Going up to four 256-bit FMA units would allow them to match Intel on a per-core basis. That takes a lot more register file ports and such, but it would allow them to match general FP code performance (both would have 1024 bits per clock of FMA) while still using 256-bit AVX instructions. Three 256-bit FMA units would require only 1 more PRF port (8 -> 9), so that could be a possibility too. Four would require 12 ports. They would still have issues where specialized AVX-512 instructions are used, since it will often take many more general instructions to emulate a specialized instruction. This is a new family (17h -> 19h?), so support for AVX-512 seems plausible.
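The per-core FMA arithmetic from the post, written out. Assumptions: every unit retires one full-width FMA per clock, and DP FLOPs count each FMA lane as two operations. Real sustained throughput also depends on clocks, loads/stores and the port limits mentioned above:

```python
def fma_bits_per_clock(units: int, width_bits: int) -> int:
    return units * width_bits


def dp_flops_per_clock(units: int, width_bits: int) -> int:
    lanes = width_bits // 64  # 64-bit doubles per vector register
    return units * lanes * 2  # each FMA lane counts as a multiply plus an add


configs = {
    "Zen 2: 2 x 256-bit FMA": (2, 256),
    "W-3175X: 2 x 512-bit FMA": (2, 512),
    "hypothetical: 3 x 256-bit FMA": (3, 256),
    "hypothetical: 4 x 256-bit FMA": (4, 256),
}
for name, (units, width) in configs.items():
    print(f"{name}: {fma_bits_per_clock(units, width)} bits/clk, "
          f"{dp_flops_per_clock(units, width)} DP FLOPs/clk")
```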
 

Jimzz

> Aside from one-time supercomputer contracts, Intel's 7nm and AMD's first 5nm CPUs will reach the consumer market at roughly the same time.

That's if Intel can actually mass-produce at 7nm. TSMC will start mass-producing 5nm products this year.
Intel is not supposed to produce 7nm until 2021 at best. Since they are at least a year out, they are still not close to a final product; last I read, they had not even entered risk production. Meanwhile, by the time AMD's chips are being made on 5nm, TSMC will already have produced other 5nm products, so yields should be very good.
 

eek2121

> Also, if Zen 3 is still DDR4 and PCI-E 4.0, then Zen 4 could be a huge upgrade for Epyc while still using a very similar core, because of a completely redesigned IO die with DDR5 and PCI-E 5.0. I don’t think those really matter much in the consumer space, except for the DDR5 upgrade for dual-channel systems. If you throw in the possibility of stacked cache chips and/or an interposer-based system of some kind, then Zen 4 could be very different, even if it uses almost the same core.

> 15% average IPC for Zen 3. The biggest gains, however, will come in gaming, because of the different cache design and much, much higher bandwidth available for the caches.
>
> Expect, at the very least, Skylake clock-for-clock, core-for-core performance. At the very least(!).
>
> End of Off-Topic.
I imagine we'll see a higher increase for Zen 3 than that, something on the order of 20-25%. They are saying it's a new architecture, and the fact that they are changing the family from 17h to 19h implies a huge foundational change, even beyond Zen 2.


It doesn't. It's a little bit slower in games than Skylake.
I hate to be that guy, but does anyone actually care at this point? My CPU stopped being a bottleneck a long time ago. I'm on a 1950X.

> Given the rate of improvement in ARM, can you not see a future in 5 years where it surpasses x86 in some/many/most performance metrics?
>
> They've only really had a couple of stabs at the HPC market over the past 2 or 3 years - meanwhile x86 has dominated the market for decades.
Can we please stop turning every thread into an ARM vs the world thread? Thanks.

> There is always “whataboutism” to deal with. What about AVX-512? If you use AVX-512 applications, then it is important. If you don’t, then it is mostly irrelevant, but detractors can still point at it and say “what about AVX-512?” AMD has been systematically eliminating any cases where they trail Intel processors. The larger cache should allow them to surpass Intel in the small number of applications where such a large cache is needed. The cache architecture redesign should increase performance in general, probably not just due to the size of the L3. I think they may use a smaller, faster L1 and a larger L2. L3 may be a tiny bit slower.
>
> For AVX, though, they are up against processors like the 3175X, which actually has two 512-bit units. AMD currently has only two 256-bit FMA units. The rumors point to significantly increased cache bandwidth and significantly increased floating-point power. Going up to four 256-bit FMA units would allow them to match Intel on a per-core basis. That takes a lot more register file ports and such, but it would allow them to match general FP code performance (both would have 1024 bits per clock of FMA) while still using 256-bit AVX instructions. Three 256-bit FMA units would require only 1 more PRF port (8 -> 9), so that could be a possibility too. Four would require 12 ports. They would still have issues where specialized AVX-512 instructions are used, since it will often take many more general instructions to emulate a specialized instruction. This is a new family (17h -> 19h?), so support for AVX-512 seems plausible.
It's all about TCO. Software that uses AVX-512 doesn't simply stop working. It only runs a bit slower. When you have 128 cores/256 threads in a server...well I would go with the AMD option.
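The "doesn't simply stop working" point usually comes down to runtime dispatch: probe the CPU once and fall back to a narrower code path. A minimal sketch, assuming Linux's `/proc/cpuinfo` `flags` line; the kernel names and the choice to require both `avx512f` and `avx512cd` are hypothetical, and the kernels are plain-Python stand-ins for real SIMD code:

```python
from pathlib import Path


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags Linux reports, or an empty set elsewhere."""
    try:
        text = Path(path).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


# Hypothetical stand-in kernels; real code would use SIMD implementations.
def sum_avx512(xs): return sum(xs)
def sum_avx2(xs): return sum(xs)
def sum_scalar(xs): return sum(xs)


# Ordered from best to worst; the empty requirement set always matches.
KERNELS = [
    ({"avx512f", "avx512cd"}, "avx512", sum_avx512),
    ({"avx2"}, "avx2", sum_avx2),
    (set(), "scalar", sum_scalar),
]


def pick_kernel(flags):
    for required, name, fn in KERNELS:
        if required <= flags:  # all required features present
            return name, fn


name, kernel = pick_kernel(read_cpu_flags())
print(name, kernel([1, 2, 3]))
```

Every path gives the same answer; a CPU without AVX-512 just takes a slower one.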
 

lobz

> That's if Intel can actually mass-produce at 7nm. TSMC will start mass-producing 5nm products this year.
> Intel is not supposed to produce 7nm until 2021 at best. Since they are at least a year out, they are still not close to a final product; last I read, they had not even entered risk production. Meanwhile, by the time AMD's chips are being made on 5nm, TSMC will already have produced other 5nm products, so yields should be very good.
On these topics we agree 100% :)
 

jamescox

> Given the rate of improvement in ARM, can you not see a future in 5 years where it surpasses x86 in some/many/most performance metrics?
>
> They've only really had a couple of stabs at the HPC market over the past 2 or 3 years - meanwhile x86 has dominated the market for decades.
I don’t really see why you think ARM is going to improve at a higher rate than x86 derivatives going forward.
 

moinmoin

> That's if Intel can actually mass-produce at 7nm. TSMC will start mass-producing 5nm products this year.
> Intel is not supposed to produce 7nm until 2021 at best. Since they are at least a year out, they are still not close to a final product; last I read, they had not even entered risk production. Meanwhile, by the time AMD's chips are being made on 5nm, TSMC will already have produced other 5nm products, so yields should be very good.
This is a good and interesting point. With new nodes, Intel is always likely to go through some growing pains with its products, even if they are not released to the public (so not like Cannon Lake). Intel needs something to work with to improve the node. AMD, on the other hand, has so far been a relatively late mover on TSMC's new nodes, which means mass production and yields have already been tested and improved by other major node customers like first mover Apple, Qualcomm, HiSilicon, etc. Meanwhile, with Zen 2 (as their recent presentation showed), AMD adapts its silicon design specifically to the node, so the degree of optimization may already match that of one or two rounds after an initial node launch.
 

Tuna-Fish

> AVX-512 has been around for a while, although workloads that can really take advantage of it should probably run on a GPU, which may be AMD’s thinking.
AVX-512 isn't just a wider extension. It provides a much cleaner API to use as a compilation target for vectorized loops. This is important because it helps "worse", less-optimized code benefit more from vector execution. Even if you only had a single 128-bit EU, there would be gains in supporting AVX-512. In a way, if you just consider the API and not the execution width, AVX-512 finally does right a lot of the things that should have been done with the original SSE.

It doesn't really matter yet because approximately nothing makes use of it. I just hope AMD adopts it quickly, and Intel starts supporting it across their product line, so that I can sometime in the future finally start making use of it.
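To illustrate the "cleaner compilation target" point, here is a plain-Python emulation (not real intrinsics) of how an AVX-512-style loop covers the leftover elements with a per-lane mask, so no separate scalar remainder loop is needed. `LANES = 16` assumes 32-bit floats in a 512-bit register:

```python
LANES = 16  # 16 x 32-bit floats per 512-bit vector (assumption)


def masked_add(a, b):
    """Emulate a vectorized a + b loop that handles the tail with a lane mask."""
    n = len(a)
    out = [0.0] * n
    for base in range(0, n, LANES):
        active = min(LANES, n - base)
        # One bit per active lane, like a 16-bit AVX-512 mask register.
        mask = (1 << active) - 1
        for lane in range(LANES):
            if (mask >> lane) & 1:  # masked-off lanes are simply skipped
                out[base + lane] = a[base + lane] + b[base + lane]
    return out


print(masked_add([1.0] * 20, [2.0] * 20))  # 16 full lanes + a 4-lane masked tail
```

With SSE or AVX2, a compiler typically has to emit an extra scalar (or narrower) loop for those last 4 elements; the mask lets the same vector body handle them.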
 

moinmoin

> AVX-512 isn't just a wider extension. It provides a much cleaner API to use as a compilation target for vectorized loops. This is important because it helps "worse", less-optimized code benefit more from vector execution. Even if you only had a single 128-bit EU, there would be gains in supporting AVX-512. In a way, if you just consider the API and not the execution width, AVX-512 finally does right a lot of the things that should have been done with the original SSE.
>
> It doesn't really matter yet because approximately nothing makes use of it. I just hope AMD adopts it quickly, and Intel starts supporting it across their product line, so that I can sometime in the future finally start making use of it.
What AVX-512 feature subsets should AMD support? On the one hand, after Bulldozer and FMA4, AMD has shown its willingness to be only a second mover regarding CPU features, which would leave AVX-512 F and CD at best. On the other hand, I'd still hope for a cleaner approach akin to SVE, with all matching SSE*/AVX* instructions supported as a legacy option.
 

soresu

> It doesn't really matter yet because approximately nothing makes use of it. I just hope AMD adopts it quickly, and Intel starts supporting it across their product line, so that I can sometime in the future finally start making use of it.
There's plenty of it in x264, x265, SVT-AV1 and now some in dav1d (therefore by extension rav1e too).

I think some emulators have it too, and I wouldn't be surprised to find it in Intel's Embree which is used in quite a few ray tracing code projects (oddly including one of AMD's).
 
