Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

extide

Senior member
Nov 18, 2009
www.teraknor.net
Silly question, but with the ARM supporters maintaining they have better IPC and use less power, why would the fastest supercomputers in the world all use x86 processors (AMD/Intel)?

Since supercomputers use their own proprietary OS and applications, they should work on any selected hardware.

Well, besides the obvious possibility of "They're wrong," it seems like they are looking for a single vendor to supply CPU and GPU for these builds. Also, you would need an ARM SoC with a ton of I/O -- lots of PCIe lanes, etc. The only one that seems to fit that bill is this, just recently announced. AMD is likely using Infinity Fabric to directly link the CPUs/GPUs here (almost a guarantee, given they are saying it is unified memory across the CPU and GPU) -- so you would need an ARM vendor to have interoperability with either their own GPUs (which don't exist for this application currently) or another vendor's (AMD, Intel, Nvidia) and its special coherent link.
 

amrnuke

Golden Member
Apr 24, 2019
extide said: "...Also, you would need an ARM SOC with a ton of I/O -- lots of PCIe lanes, etc. ..."
Exactly. ARM is a great and efficient CPU model, but ARM are not system builders, and the I/O is just not there at this time. Also, one cannot ignore the history (with MS, Intel, AMD, etc.): x86 has a huge leg up in corporate/server applications, and ARM is playing catch-up in this realm. There is a lot to be said for momentum when it comes to system planning.

I have mentioned this before, but I believe ARM (via Apple in large part) is moving toward x86-style "speed at the expense of efficiency" territory (e.g. purported MacBook Pro Axx), and x86 is moving toward ARM-style "efficiency at the expense of speed" territory (e.g. EPYC and the extreme power savings one can achieve on Zen2 with only a minimal drop in performance).

Really fun to watch. I for one am absolutely enamored with the work being done by ARM, Intel, AMD, Nvidia - not to mention Samsung, TSMC, GloFo, Qualcomm, Apple, etc. This is a super fun time to be a fan of microprocessors!
 

DrMrLordX

Lifer
Apr 27, 2000
extide said: "...so you would need an ARM vendor to have interoperability with either their own GPU's (which don't exist for this application currently) or another vendor (AMD, Intel, Nvidia) and it's special coherent link."

Ampere, Huawei, and Amazon have all produced multicore server ARM CPUs that could probably go into multinode configurations with fast links to dGPUs. And um, Fujitsu? And I guess Broadcom/Cavium? They're out there.
 

Richie Rich

Senior member
Jul 28, 2019
Silly question, but with the ARM supporters maintaining they have better IPC and use less power, why would the fastest supercomputers in the world all use x86 processors (AMD/Intel)?
Easy answer: planning. For a supercomputer at this scale you need to plan 5 years ahead, construct a building for it, etc. Look at what ARM had 5 years ago: the very poorly performing Cortex-A72 (now in the RPi 4) and no server Neoverse. But now, with Neoverse N1 and the powerful A76/A77/A78 cores, ARM is able to enter the game. IMHO it's just a matter of time before the first ARM-based supercomputer is announced. But Cavium, Ampere and Nuvia need to enter the server market successfully first.

 

Vattila

Senior member
Oct 22, 2004
Ok, so let's circle this news back to the original topic: the features of Zen 4. With the information shared with this supercomputer win, and the time frame, we now can pretty much assume DDR5 support. PCI Express 5 is also likely, given the time frame, but not a given. The big news is Infinity Fabric 3, which now will be able to form cache-coherent links with GPUs in addition to just other CPUs (picking up on AMD's earlier HSA work). More cores are probably a given also. Mark Papermaster has hinted that they are not stopping core-count increases yet (if ever).

But what about single-thread performance — IPC and frequency? Do you expect industry-leading IPC? Industry-leading frequency? Industry-leading core scaling and efficiency are probably given, considering the supercomputer wins.

And what do you expect when it comes to manufacturing? Will it be manufactured on 5nm? Using first-generation EUV? Or second-generation high-NA EUV? Using FinFETs? Using Gate-All-Around nanowire or nanosheet transistors? And packaging — do you expect Zen 4 to use interposer?
 

eek2121

Platinum Member
Aug 2, 2005
Likely:
  1. DDR5, PCIE5, AM5, 5nm
  2. AVX 512
  3. Ongoing effort to lower memory latency
  4. Higher IPC, higher clocks
  5. Drastically improved power efficiency compared to Zen 2
  6. They will most definitely use chiplets
  7. High speed GPU interconnect
Less likely, but possible:
  1. 8 core CCX/16 core CCD
  2. Double the memory channels, note that they can do this for free as each DDR5 DIMM is actually dual channel. This means, for example, Ryzen desktop becomes quad channel. I almost put this under likely.
  3. 7nm or 5nm chipset with power optimizations.
  4. APUs across the stack, or at least an onboard APU. I am not talking necessarily about gaming APUs mind you, though this is a possibility (gaming for desktop/mobile, compute for EPYC, unsure about Threadripper)
  5. More PCIE lanes
  6. Meme worthy 5/5 announcement and/or launch.
  7. Higher memory limit.
  8. Revamped base/boost if Zen 3 doesn’t have it.
  9. “AI accelerator”
  10. Low power EPYC
Far out there, but good ideas:
  1. Big.little cores for some models.
  2. A 5 watt smartphone SoC style part.
  3. More instruction set additions to allow AI and gaming offload from GPU to CPU (work towards GPU/CPU convergence)
  4. Hardware scheduler assist to allow GPU and CPU threads to use more than one integer/FPU core as needed (think decoupled CMT)
  5. Improved product segmentation. Socket convergence of EPYC/Threadripper, for example.
  6. Mobile realignment with desktop.
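On point 2 of the "less likely" list, a rough sketch of peak theoretical per-DIMM bandwidth. It assumes non-ECC DIMMs: DDR4 exposes one 64-bit channel per DIMM, while DDR5 splits that into two independent 32-bit subchannels. At the same transfer rate the total bus width, and so the peak bandwidth, is unchanged; the split mainly buys more independent accesses, while the raw bandwidth gain comes from DDR5's higher transfer rates. The speeds below are just example values.

```python
# Peak theoretical DIMM bandwidth = transfers per second x bytes per transfer.
# Assumptions: non-ECC DIMMs; DDR4 = one 64-bit channel per DIMM,
# DDR5 = two independent 32-bit subchannels per DIMM.

def peak_gb_per_s(mt_per_s: int, bus_bits: int) -> float:
    """Peak GB/s for a bus `bus_bits` wide running at `mt_per_s` megatransfers/s."""
    return mt_per_s * 1e6 * (bus_bits / 8) / 1e9

def dimm_bandwidth(mt_per_s: int, channels: int, bits_per_channel: int) -> float:
    """Sum the peak bandwidth of every (sub)channel on one DIMM."""
    return channels * peak_gb_per_s(mt_per_s, bits_per_channel)

ddr4_3200 = dimm_bandwidth(3200, channels=1, bits_per_channel=64)
ddr5_3200 = dimm_bandwidth(3200, channels=2, bits_per_channel=32)
ddr5_4800 = dimm_bandwidth(4800, channels=2, bits_per_channel=32)

print(f"DDR4-3200, 1 x 64-bit: {ddr4_3200:.1f} GB/s")  # 25.6
print(f"DDR5-3200, 2 x 32-bit: {ddr5_3200:.1f} GB/s")  # 25.6
print(f"DDR5-4800, 2 x 32-bit: {ddr5_4800:.1f} GB/s")  # 38.4
```

So "twice the channels" per DIMM and "twice the bandwidth" are separate questions; this ignores real-world efficiency, which is where the subchannel split is expected to help.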
 

Hitman928

Diamond Member
Apr 15, 2012
But what about single-thread performance — IPC and frequency? Do you expect industry-leading IPC?

x86 leading IPC is very likely.

Industry-leading frequency?

Probably not at the high end consumer side, but most likely on the high core count server side due to process efficiency improvements.

Industry-leading core scaling and efficiency are probably given, considering the supercomputer wins.

Agreed

And what do you expect when it comes to manufacturing? Will it be manufactured on 5nm?

Yes

Using first-generation EUV? Or second-generation high-NA EUV?

Probably first gen but I'm not sure what the EUV roadmap is like. Considering how long it took us to get to first gen being HVM, I'm guessing we're stuck with that for a bit.

Using FinFETs? Using Gate-All-Around nanowire or nanosheet transistors?

TSMC said they are using FinFET at 5 nm and will use FinFETs initially at 3 nm as well before moving to GAA at 3 nm. Samsung is being more aggressive and going straight to GAA nanosheets at 3 nm and expects them to be out late next year. Samsung has been pretty vocal about their 3 nm nanosheet process and seems confident in it. We'll see how it plays out. Samsung's aggressiveness meant their 7 nm didn't pan out as well as TSMC's, so maybe they're making the same mistake at 3 nm, or maybe it will pay off this time.

And packaging — do you expect Zen 4 to use interposer?

No idea, TSMC is spending a lot on R&D for packaging though so I wouldn't be surprised if we see something new.
 

moinmoin

Diamond Member
Jun 1, 2017
But what about single-thread performance — IPC and frequency? Do you expect industry-leading IPC? Industry-leading frequency? Industry-leading core scaling and efficiency are probably given, considering the supercomputer wins.
IPC will always increase since that is the primary means of ensuring a newer gen performs better than the older one. The question is in what area IPC is being increased: general computing or some specialized computing like niche extra-wide SIMD? AMD does both so far, achieving the former through optimizing their cache hierarchy and macro/micro ops etc., as well as the latter by supporting wider FPUs along with the necessary wider data paths.

Frequency (in contrast to efficiency) overall seems to be a secondary concern that is of particular importance only in the niche high-end desktop market. As the recent revelation regarding Zen 2 on the mobile-oriented N7 showed, frequency is a result of node capabilities combined with the particular on-silicon design, i.e. how the efficiency features of the node are married with sufficient room for frequency increases.

And what do you expect when it comes to manufacturing? Will it be manufactured on 5nm? Using first-generation EUV? Or second-generation high-NA EUV? Using FinFETs? Using Gate-All-Around nanowire or nanosheet transistors? And packaging — do you expect Zen 4 to use interposer?
Are these all actually in question? I thought common expectation is that Zen 3 will be on TSMC's N7+, and Zen 4 on TSMC's N5. Both are EUV (N7+ partly and N5 fully) and FinFET.

Regarding packaging AMD's reasons for picking MCM for Zen 2 (primarily path lengths) will likely apply in the future as well. But I could imagine more complex hierarchies than the current simple IOD - CCD star, and specific parts could then be connected differently. I expect Zen 4 to introduce new platforms across all markets along with support for DDR5 and PCIe5, so the package layout won't be bound by an existing platform spec anymore and we are very likely going to see bigger changes for both.
 

eek2121

Platinum Member
Aug 2, 2005
Vattila said: "...Do you expect industry-leading IPC? Industry-leading frequency?..."

I wanted to touch on this. AMD will likely never have a frequency lead over Intel’s 14nm; however, I fully expect Zen 3 and Zen 4 to beat any Intel 14 nm part.

People need to stop focusing on frequency; it’s meaningless. 5+ GHz CPUs will be gone in 3 years, except for whatever leftover crap Intel is shoveling. Some 7nm EUV parts might get close, however.

That is the entire issue Intel is having. Their slight IPC lead in certain tasks stems partially from having a higher frequency than Ryzen. They can’t hit those speeds on 10nm and don’t yet have a part with enough of an IPC advantage to make up the deficit. Because of this, 10nm remains mobile only.
 

DisEnchantment

Golden Member
Mar 3, 2017
AMD will likely never have a frequency lead over Intel’s 14nm; however, I fully expect Zen 3 and Zen 4 to beat any Intel 14 nm part.

Zen 2 is using a slightly relaxed 6T HD library instead of performance-optimized libraries. I think they had their reasons. I could imagine AMD deemed density more important overall, and it allowed AMD to get EPYC to 64 cores within a reasonable power budget.

Read about HD vs HP devices here

So they did make a conscious choice to not go all out on frequency. So, TSMC's N7 could probably be capable of hitting 5GHz after all.
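To put the HD vs HP trade-off into rough numbers: a standard cell's height is set by how many metal tracks it spans, so at equal cell width the area ratio is simply the track ratio. The 6-track (HD) and 7.5-track (HP) figures are the ones commonly reported for TSMC N7 and are an assumption here, as is treating AMD's "slightly relaxed" HD cells as plain 6-track ones.

```python
# First-order comparison of two standard-cell libraries by track height.
# Assumptions: 6-track (HD) vs 7.5-track (HP) cells, same metal pitch and
# same cell width; routing, utilisation and "relaxed" variants are ignored.

HD_TRACKS = 6.0
HP_TRACKS = 7.5

def area_ratio(tracks: float, baseline_tracks: float) -> float:
    """Cell area relative to the baseline library (height scales with tracks)."""
    return tracks / baseline_tracks

hp_vs_hd = area_ratio(HP_TRACKS, HD_TRACKS)

# The same logic takes ~25% more area with HP cells,
# i.e. HD fits ~1.25x as many cells into the same area.
print(f"HP cell area relative to HD: {hp_vs_hd:.2f}x")  # 1.25x
```

On a 64-core part that difference in logic area is one plausible reason density won out over peak clocks, though the real figure depends heavily on routing and on how much of the die is SRAM rather than logic.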

On the other hand, N5 is optimized for HPC from the start, which means 15% performance gains on top of N7, or 25% when using eLVT, but probably with the caveat of lower density and efficiency. So here again I expect AMD to go for a tradeoff in favour of efficiency over raw clock speed.

If what Forrest mentioned is true, that Zen3 is a 'completely new architecture' compared to Zen2 (which was an evolution of the Zen arch with a move to 2D chiplets), then I suppose Zen4 would be an evolution of Zen3 with a move to 3D stacked chiplets.

Zen4 on N5, with almost 1.8x density increase and die stacking on top is going to be packing a lot of heat ... (also literally).
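A first-order way to see why the heat comment holds: power density is density gain times power per transistor. The sketch assumes TSMC's headline N5-vs-N7 figures (about 1.8x logic density and up to about 30% lower power at the same speed) apply uniformly to the whole die, which real designs never achieve, and it ignores any die stacking on top.

```python
# First-order power-density estimate for a node shrink.
# Assumptions: TSMC's publicly quoted N5-vs-N7 headline figures
# (~1.8x logic density, ~30% lower power at the same speed), applied
# uniformly to the whole die; real designs never scale this cleanly.

def relative_power_density(density_gain: float, power_per_transistor: float) -> float:
    """Watts per mm^2 on the new node relative to the old one."""
    return density_gain * power_per_transistor

# Case 1: same clocks, spend the node on lower power per transistor.
iso_speed = relative_power_density(1.8, 0.70)

# Case 2: same power per transistor, spend the node on higher clocks instead.
iso_power = relative_power_density(1.8, 1.00)

print(f"Iso-speed power density: {iso_speed:.2f}x")  # 1.26x
print(f"Iso-power power density: {iso_power:.2f}x")  # 1.80x
```

Even in the most favourable case, each mm^2 dissipates more than before, and spending the node on clocks makes it considerably worse; that is consistent with expecting AMD to trade clocks for efficiency on N5.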

(Zen5 would be a new arch with a move to GAA in 2023; that would be insane, if we recall the performance gains from moving to FinFET)
 

uzzi38

Platinum Member
Oct 16, 2019
eek2121 said: "Likely: 1. DDR5, PCIE5, AM5, 5nm ... Less likely, but possible: ... Far out there, but good ideas: ..."

I'm a bit lazy, so rather than answering the question on Zen 4 expectations directly, I'll give more of my take on this post, because I feel like it has covered the most stuff so far.

On the highly likely side of things, these are the things I'm not certain of:

1. Clocks - 5nm might be where we see the first decrease for all I know
2. PCIe Gen 5 - I expect it in servers, but I'm not confident it will be cost effective for consumers
3. High bandwidth interconnect - BigAPU mode, as it's referred to by Komachi. Again, I'm not sure it will come down to consumers. It's more of a compute-focused thing and consumers will see smaller benefits, so we could potentially see it being restricted to EPYC and maybe TR. Unless you mean next-gen Infinity Fabric, in which case I agree.

Rest of them I agree with.
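For scale on point 2: each PCIe generation doubles the per-lane signalling rate (8, 16, then 32 GT/s for 3.0, 4.0 and 5.0), and all three use 128b/130b line coding. A back-of-the-envelope sketch of the theoretical bandwidth, ignoring packet and protocol overhead, so real throughput is somewhat lower; it says nothing about what AMD will actually ship.

```python
# Per-lane, per-direction PCIe bandwidth from the signalling rate.
# PCIe 3.0/4.0/5.0 all use 128b/130b line coding; packet and protocol
# overhead (TLP headers, flow control) is ignored, so real numbers are lower.

TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}

def lane_gb_per_s(gen: int) -> float:
    """Usable GB/s per lane per direction after 128b/130b encoding."""
    return TRANSFER_RATE_GT_S[gen] * (128 / 130) / 8  # bits -> bytes

for gen in (3, 4, 5):
    per_lane = lane_gb_per_s(gen)
    print(f"PCIe {gen}.0: {per_lane:.2f} GB/s per lane, {16 * per_lane:.1f} GB/s per x16 link")
```

A Gen 5 x16 slot ends up at roughly 63 GB/s per direction, which is the kind of headroom servers and GPU interconnects can use long before consumer cards need it.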

Now onto the less likely stuff:
1. I don't see a benefit to 16 core CCDs if I'm honest. To me it makes more sense to stay with 8 core CCDs.
2. 5nm/7nm chipset - Sorry, no. I/O still doesn't scale, so I don't see it being brought to the new nodes. Besides, by Zen 4, 12LPP+ should be a thing, no? If they were going to shrink I/O, GF's 12LPP makes the most sense, I'd think.

For the rest, idk if they'd actually be things or not, but I certainly can't find a reason to take away from them.

Now the far out ideas:
1. Gonna start off a little bit negative - I don't think we'll see big little from AMD. Nothing like Lakefield anyway, not for a while. What they're doing with Frontier etc - I think that's the real focus for now.
2. No smartphone SoCs. Would get in the way of their deal on RDNA with Samsung.

I'm not sure about the rest in that last section.
 

Hitman928

Diamond Member
Apr 15, 2012
Fixed, IPC is not dependent on frequency but agnostic to it. Simplified: Performance = IPC * frequency. Other than that you're absolutely on point.

I think he meant to write performance initially, it makes the most sense in context.

With that said IPC being frequency agnostic is not really true except when talking really deep down at the low levels of cache. In a more general sense (as it is typically used) for actual 'real world' IPC, it can be both workload and frequency dependent.
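Both halves of this can be seen in a toy model. Assume (hypothetically) that core-bound work costs a fixed number of cycles per instruction, while misses that go out to memory cost a fixed time in nanoseconds. A fixed-time stall costs more cycles at a higher clock, so measured IPC falls as frequency rises, while the simplified Performance = IPC * frequency still goes up, just less than linearly. The CPI, miss-rate and latency values are made up for illustration and are not Zen measurements.

```python
# Toy model: why measured ("real world") IPC tends to fall as clocks rise.
# Hypothetical numbers: 0.25 cycles/instr of core-bound work, 0.002 memory
# misses per instruction, 80 ns memory latency (fixed in time, not cycles).

BASE_CPI = 0.25
MISSES_PER_INSTR = 0.002
MEM_LATENCY_NS = 80.0

def cycles_per_instruction(freq_ghz: float) -> float:
    # A stall of fixed nanoseconds costs latency_ns * freq_ghz cycles.
    return BASE_CPI + MISSES_PER_INSTR * MEM_LATENCY_NS * freq_ghz

def ipc(freq_ghz: float) -> float:
    return 1.0 / cycles_per_instruction(freq_ghz)

def performance(freq_ghz: float) -> float:
    # Simplified: performance = IPC * frequency, in instructions per ns.
    return ipc(freq_ghz) * freq_ghz

for f in (3.0, 4.0, 5.0):
    print(f"{f:.1f} GHz: IPC {ipc(f):.2f}, perf {performance(f):.2f} instr/ns")
```

With these numbers, going from 3 to 5 GHz (+67% clock) lowers IPC and raises performance by only about 16%, which is why "IPC" quoted from real benchmarks is tied to the clock it was measured at.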
 

Glo.

Diamond Member
Apr 25, 2015
Fixed, IPC is not dependent on frequency but agnostic to it. Simplified: Performance = IPC * frequency. Other than that you're absolutely on point.
IPC heavily depends on frequency. The higher you go with frequency, the less IPC you have. IPC relies on dozens of different design decisions: cache latencies, cache bandwidth, memory bandwidth, etc.
 

uzzi38

Platinum Member
Oct 16, 2019
It's my understanding that TSMC plans both 'regular' FinFETs and GAA at 3nm. Kind of like how they did 7nm and 7+. My info might be dated at this point, but that was the last roadmap I had heard.
That's the first I've heard of it tbh, all I remember reading was TSMC aren't going with GAA for N3, but upon looking around I found this which seems to suggest you could be right:


So... idk maybe?
 

exquisitechar

Senior member
Apr 18, 2017
uzzi38 said: "...1. Clocks - 5nm might be where we see the first decrease for all I know ..."
I think AMD will maintain or even slightly increase clocks with 5nm based on what’s been revealed about it so far, but who knows. The heat density will be a big issue and AMD is concerned with power and area first, increasing clock speeds is secondary.
 

moinmoin

Diamond Member
Jun 1, 2017
I think he meant to write performance initially, it makes the most sense in context.

With that said IPC being frequency agnostic is not really true except when talking really deep down at the low levels of cache. In a more general sense (as it is typically used) for actual 'real world' IPC, it can be both workload and frequency dependent.
IPC heavily depends on frequency. The higher you go with frequency, the less IPC you have. IPC relies on dozens of different design decisions: cache latencies, cache bandwidth, memory bandwidth, etc.
You're both absolutely right of course. IPC is an abstracted form of reflecting performance that in reality depends not only on all the design decisions Glo. mentions and more but also on the frequency range a node offers (which again influences the aforementioned design decisions). But "actual 'real world' IPC" is so far off the original abstraction purpose of IPC that it's a good idea to use another term for that.
 

thesmokingman

Platinum Member
May 6, 2010
El Capitan will be running a custom Epyc that is Rome based. The two things we know about El Capitan are that it sports a heavily revised IF with DDR5, and AVX512.
 

Hitman928

Diamond Member
Apr 15, 2012
You're both absolutely right of course. IPC is an abstracted form of reflecting performance that in reality depends not only on all the design decisions Glo. mentions and more but also on the frequency range a node offers (which again influences the aforementioned design decisions). But "actual 'real world' IPC" is so far off the original abstraction purpose of IPC that it's a good idea to use another term for that.

I agree, but it's the term now used very generally to refer to perf/freq. AMD and Intel even use it in their marketing materials, so it's hard to escape at this point.