Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

[Image: Untitled2.png]


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

DrMrLordX

Lifer
Apr 27, 2000
21,637
10,855
136
I'm afraid you either got this wrong or are being inexact. Perhaps you meant under the heaviest possible workload with CPB disabled?

If you size a cooler exactly to the TDP and max out all the cores on the CPU, it will heat soak the cooler almost immediately and revert to ~base clocks. CPB won't have any opportunity to do anything.
 

Abwx

Lifer
Apr 2, 2011
10,953
3,474
136
Some rumoured numbers, dunno what they're worth.



 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
I already said that Zen4 will be more power efficient at the same TDP, but the 170W model could end up with lower power efficiency than the 5950X. This doesn't mean that model will be a bad product; it will just trade power efficiency for performance.
It is pretty much always true that extreme clocks bring lower power efficiency, so what is your point? If they can run 5+ GHz all-core without extreme cooling, then they have to be incredibly power efficient. Intel’s performance per watt is abysmal at high clocks, which is why they are doing efficiency cores and AMD doesn’t need to. I think Zen 4 efficiency will be spectacular in comparison to Zen 3. You are talking about Zen 4 being less efficient at clock speeds that Zen 3 might not even be able to reach. Okay, when it comes out, let’s overclock a 5950X to 5.5 GHz and compare it to a Zen 4 at 5.5 GHz. Which one do you think will be more “efficient”?
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
6 channels I think.
If the IO die is still 4 quadrants, then 6 channels doesn’t actually make sense. Your options would all be divisible by 4, so 4, 8, or 12 if it uses the same IO die. They could possibly make a half IO die with only 2 of the full Epyc 4 quadrants. In that case, 6 would make sense, but that requires a separate chip.
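To put the quadrant argument in concrete terms, here is a minimal sketch. It assumes every quadrant of the IO die carries the same number of memory channels; the function name and the cap of 3 channels per quadrant are illustrative assumptions, not anything from AMD.

```python
def channel_options(quadrants: int, max_per_quadrant: int) -> list[int]:
    """Total channel counts possible when every quadrant carries the same number of channels."""
    return [quadrants * n for n in range(1, max_per_quadrant + 1)]


# Full IO die with 4 quadrants versus a hypothetical half die with 2 quadrants.
full_io_die = channel_options(quadrants=4, max_per_quadrant=3)
half_io_die = channel_options(quadrants=2, max_per_quadrant=3)

print("full die options:", full_io_die, "| 6 channels possible:", 6 in full_io_die)
print("half die options:", half_io_die, "| 6 channels possible:", 6 in half_io_die)
```

With symmetric quadrants, 6 only shows up as an option on the 2-quadrant layout.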
 

Thibsie

Senior member
Apr 25, 2017
749
801
136
If the IO die is still 4 quadrants, then 6 channels doesn’t actually make sense. Your options would all be divisible by 4, so 4, 8, or 12 if it uses the same IO die. They could possibly make a half IO die with only 2 of the full Epyc 4 quadrants. In that case, 6 would make sense, but that requires a separate chip.

You're right, it is indeed less than 6.
See Charlie's article.
 

naad

Member
May 31, 2022
63
176
66
Now I finally remembered what was bugging me about Zen 4. I don't think anyone sensible expected many changes to the register sizes, either FP or INT. FP will stay 6-wide; might there be a merging of the ports? Load/store will probably remain the same; maybe some swapping around on the AGU/ALU side? 4/3 seems balanced, but might server and cloud customers prefer more AGUs for VMs?
Either way, I don't expect the execution units or load/store to change; that's probably the reason for the low IPC increase in CB R23.

Here's a line for WikiChip:
  • L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
If they're touching the L1 in the first place, there's probably more changes than expected.
BP changes are anyone's guess, but I think the actual beef is in the memory and caches. What are the current latencies? 12-14 cycles for L2 and around 47 for L3? That is pretty good already and significantly better than Intel's L3. Even if the cycle counts stay the same, or go up in the case of the L2, Zen 4 clocks significantly higher, so real latency might be quite a lot lower before any uarch changes come into play, and 5nm shortens the wire distances. I expect the brunt of the gaming "IPC" gain to come from these changes.

For memory, the fabric being clocked higher will obviously help. GMI3 upgrade from GMI2? A faster interconnect never hurt.
Point is, I expect Zen 4 to significantly improve latencies; for a server-first uarch this is probably the only place where Intel has some kind of lead.
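A quick sketch of the cycles-versus-clock point, assuming latency in nanoseconds is simply cycles divided by clock in GHz. The cycle counts are the rough figures quoted above; the 4.9 GHz and 5.5 GHz clocks are placeholder assumptions for illustration, not measured or confirmed values.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core cycles to nanoseconds at a given clock (GHz = cycles per ns)."""
    return cycles / clock_ghz


ZEN3_CLOCK_GHZ = 4.9  # assumption: illustrative Zen 3 clock
ZEN4_CLOCK_GHZ = 5.5  # assumption: illustrative Zen 4 clock

# Same cycle counts at both clocks, to isolate the effect of frequency alone.
for level, cycles in (("L2", 14), ("L3", 47)):
    old_ns = cycles_to_ns(cycles, ZEN3_CLOCK_GHZ)
    new_ns = cycles_to_ns(cycles, ZEN4_CLOCK_GHZ)
    print(f"{level}: {cycles} cycles -> {old_ns:.2f} ns at {ZEN3_CLOCK_GHZ} GHz, "
          f"{new_ns:.2f} ns at {ZEN4_CLOCK_GHZ} GHz")
```

With unchanged cycle counts, the wall-clock latency drops in proportion to the clock increase.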
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,355
1,548
136
Here's a line for WikiChip:
  • L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
If they're touching the L1 in the first place, there's probably more changes than expected.
BP changes are anyone's guess, but I think the actual beef is in the memory and caches. What are the current latencies? 12-14 cycles for L2 and around 47 for L3? That is pretty good already and significantly better than Intel's L3. Even if the cycle counts stay the same, or go up in the case of the L2, Zen 4 clocks significantly higher, so real latency might be quite a lot lower before any uarch changes come into play, and 5nm shortens the wire distances. I expect the brunt of the gaming "IPC" gain to come from these changes.

If the AVX-512 execution side is full-width, they needed to redo the caches to feed it. From the Gigabyte leak, the natural alignment for cache is now 64 bytes, which certainly hints at substantial changes.

I also expect essentially no changes inside the core itself (other than the addition of AVX-512), but there can be a nice IPC gain from the cache changes. If the change in natural alignment coincides with an increase in the width of the L1<->L2 connection to 512 bits, this saves them a cycle when missing L1 (or more when missing L1 and all the lines in the set are dirty, but that shouldn't really happen with smart writeback), which probably helps offset the latency increase from doubling the cache size a bit.
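As a rough sketch of why a wider L1<->L2 link saves a cycle: assuming the link moves one full width of data per cycle and a cache line is 64 bytes, the number of cycles needed to move a line is just the line size divided by the link width. The one-transfer-per-cycle model and the function name are simplifying assumptions.

```python
import math

LINE_BYTES = 64  # cache line size in bytes


def transfer_cycles(link_bits: int, line_bytes: int = LINE_BYTES) -> int:
    """Cycles to move one cache line over a link that transfers link_bits per cycle."""
    return math.ceil(line_bytes * 8 / link_bits)


for width in (256, 512):
    print(f"{width}-bit link: {transfer_cycles(width)} cycle(s) per 64-byte line")
```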
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
My expectation is that very significant changes to the FP/vector side are required to support AVX512. The current "symmetric" pairs of 256-bit MAC + 256-bit ADD will have to change to support 512-bit ops. What exact scheme they will support is hard to guess; a ZEN->ZEN2 style transition is likely impossible here, as there are plenty of AVX512 instructions that either work on the full 512 bits in a cycle or would take ages when microcoded. 256-bit AVX was extremely careful not to mix between the high and low 128 bits, and AVX512 is the opposite, especially all the great stuff from the Ice Lake generation onward.

So I think they will have 512 MAC + 256 ADD + 256 MAC + 256 ADD (+ FP L/S). That would give great compatibility and reasonable performance for AVX512 stuff, but would not "break" records in AVX512 execution speed and would not break the bank in register ports, power and so on. They might even get away with some slack in cache bandwidths.

The integer core will likely see improved branch prediction, a somewhat larger ROB of 300-ish entries, ST/LD queues etc., but it is still firmly a 4-ALU machine, which is frankly uninspired and lacking any ambition for the year 2022 on a 5nm process. If I had to guess, Zen4 was deemed low risk to fail due to the 5nm process and hugely increased density, efficiency and clocks, and was given to a 2nd-rate team at AMD while the stars work on Zen5.

It's lucky for AMD that ARM is equally uninspired and took ages to move to a 4-ALU design, which is only getting deployed now in Graviton 3. The future might not be so kind.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,361
2,849
106
It is pretty much always true that extreme clocks bring lower power efficiency, so what is your point? If they can run 5+ GHz all-core without extreme cooling, then they have to be incredibly power efficient. Intel’s performance per watt is abysmal at high clocks, which is why they are doing efficiency cores and AMD doesn’t need to. I think Zen 4 efficiency will be spectacular in comparison to Zen 3. You are talking about Zen 4 being less efficient at clock speeds that Zen 3 might not even be able to reach. Okay, when it comes out, let’s overclock a 5950X to 5.5 GHz and compare it to a Zen 4 at 5.5 GHz. Which one do you think will be more “efficient”?
What's my point? You already quoted it.
I will repeat: my point was that the top model with a 170W TDP (230W PPT) will most likely have worse power efficiency (perf/W) than the 5950X, and that AMD with this SKU chose to go for max performance even if it means worse power efficiency than its predecessor.

What's your point in bringing up overclocking? You think I don't know that Zen4 will reach much higher clocks than Zen3 and that at those clocks it would be more efficient? I simply compared the top Zen3 SKU with the future top Zen4 SKU in power efficiency, because I find it interesting. If you consider it an unfair comparison, that's your opinion, and you are entitled to it like I am to mine.
 

cortexa99

Senior member
Jul 2, 2018
319
505
136
Now I finally remembered what was bugging me about Zen 4. I don't think anyone sensible expected many changes to the register sizes, either FP or INT. FP will stay 6-wide; might there be a merging of the ports? Load/store will probably remain the same; maybe some swapping around on the AGU/ALU side? 4/3 seems balanced, but might server and cloud customers prefer more AGUs for VMs?

Either way, I don't expect the execution units or load/store to change; that's probably the reason for the low IPC increase in CB R23.

(...snip...)

Very insightful. I wanted to quote more than just you for the architecture analysis, but I decided to say something I know rather than just quoting.

From what I've heard, based on data I collected incidentally from owners of older AMD platforms, it seems that the Win11/fTPM issues are still hurting AMD processors' performance,
and a rumor I heard is that AMD is still 'negotiating' with Microsoft to solve these issues. Since Zen4's design and tapeout were finished well before Win11 and AMD knew nothing about it, it's possible there will also be issues on the AM5 platform. That's why the performance shown in the PPT slides is unclear as hell. (Note: the footnote in the PPT slides showed all benchmarks were done on Win11.)
So I think we'd better not speculate about the architecture based on the performance shown in the PPT slides.
 
  • Like
Reactions: Tlh97 and Kaluan

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,997
136
Some rumoured numbers, dunno what they're worth.




ST performance uplift is going to be heavily dependent on clock speeds for most applications. However, anything that was memory-bound to some degree has a lot of upside due to the larger L2 cache and the move to DDR5.

The results we've seen with both Zen 3D and Alder Lake show that there are some games that benefit from such things. You can get an average out of everything you test, but I think a more careful observation will show that there are a lot of applications that see close to 0% IPC gains and there will be others where 30% gains are possible.

10% average may be a bit conservative as games were the area where Zen 3D excelled and they're similarly more likely to get more out of the Zen 4 changes.
 

yuri69

Senior member
Jul 16, 2013
389
624
136
For memory, the fabric being clocked higher will obviously help. GMI3 upgrade from GMI2? A faster interconnect never hurt.
Point is, I expect Zen 4 to significantly improve latencies; for a server-first uarch this is probably the only place where Intel has some kind of lead.
Lower latency in the IO/fabric? Sure. Lower latency within the core while growing the cache sizes and optimizing for clocks? Nah
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,774
3,153
136
My expectation is that very significant changes to the FP/vector side are required to support AVX512. The current "symmetric" pairs of 256-bit MAC + 256-bit ADD will have to change to support 512-bit ops. What exact scheme they will support is hard to guess; a ZEN->ZEN2 style transition is likely impossible here, as there are plenty of AVX512 instructions that either work on the full 512 bits in a cycle or would take ages when microcoded. 256-bit AVX was extremely careful not to mix between the high and low 128 bits, and AVX512 is the opposite, especially all the great stuff from the Ice Lake generation onward.

So I think they will have 512 MAC + 256 ADD + 256 MAC + 256 ADD (+ FP L/S). That would give great compatibility and reasonable performance for AVX512 stuff, but would not "break" records in AVX512 execution speed and would not break the bank in register ports, power and so on. They might even get away with some slack in cache bandwidths.

The integer core will likely see improved branch prediction, a somewhat larger ROB of 300-ish entries, ST/LD queues etc., but it is still firmly a 4-ALU machine, which is frankly uninspired and lacking any ambition for the year 2022 on a 5nm process. If I had to guess, Zen4 was deemed low risk to fail due to the 5nm process and hugely increased density, efficiency and clocks, and was given to a 2nd-rate team at AMD while the stars work on Zen5.

It's lucky for AMD that ARM is equally uninspired and took ages to move to a 4-ALU design, which is only getting deployed now in Graviton 3. The future might not be so kind.
If they go to the effort of widening all the data paths in the core, the L/S pipeline and the cache, it seems to me a complete waste to then just do one 512-bit FMAC. I wouldn't be surprised if it's 4x 512-bit units.
It seems a complete waste to have 512-bit L/S units but then limit data to and from the FPU to 256 bits a clock.
What else could they be using all the extra transistors for?

That would then make Zen4c interesting: they could roll a 256-bit or even console-like FPU.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
My understanding is that everything for AVX512 is already present in Zen3. Enabling it in Zen4 is a relatively minimal change: ~5% area increase, etc.

2x256-bit MUL + 2x256-bit ADD + 2x256-bit MISC/FSTORE separated into two clusters. Zen4 would just operate the two clusters as if they were hi-lo.

FP Retire gets 512-bit operation
FP NSQ splits 512-bit operation into two 256-bit operations
FP Scheduler0 and Scheduler1 would get each half.
Then execution and store to load/store.
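The hi-lo scheme described above can be sketched as a toy model. This is only an illustration under the assumption that a 512-bit operation is purely lane-wise (here, 16 FP32 adds); the function names are made up and say nothing about how the real scheduler works.

```python
LANES_512 = 16  # 512 bits / 32-bit FP lanes


def split_op(a: list[float], b: list[float]):
    """Split one 512-bit operation into low and high 256-bit halves (8 lanes each)."""
    half = len(a) // 2
    return (a[:half], b[:half]), (a[half:], b[half:])


def cluster_add(a: list[float], b: list[float]) -> list[float]:
    """One 256-bit cluster executing a lane-wise add on its half."""
    return [x + y for x, y in zip(a, b)]


def add_512(a: list[float], b: list[float]) -> list[float]:
    """Run the two halves on separate clusters, then concatenate the results."""
    assert len(a) == len(b) == LANES_512
    (a_lo, b_lo), (a_hi, b_hi) = split_op(a, b)
    return cluster_add(a_lo, b_lo) + cluster_add(a_hi, b_hi)


a = [float(i) for i in range(LANES_512)]
b = [10.0] * LANES_512
print(add_512(a, b))
```

This only works cleanly for lane-wise operations; instructions that move data between the two halves would need something more, which is the concern raised earlier in the thread about AVX512 mixing high and low parts of the register.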
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
If they go to the effort of widening all the data paths in the core, the L/S pipeline and the cache, it seems to me a complete waste to then just do one 512-bit FMAC. I wouldn't be surprised if it's 4x 512-bit units.

We'll see. They could have 512/768/1024 bits of FMA per cycle, with corresponding units. Do they really need to match 512 bits in the adders, or do they need separate add units at all? No idea, but 4x 512 units would need a ridiculous amount of register read bandwidth to not sit completely idle during execution. It is kind of a luxury to waste so much silicon in the name of flexibility.
Gonna be interesting to see what they come up with. Some other likely options:

1) Having Intel's scheme of P0+P1 doubled, doing either 2x512 FMA or up to 4x256 ( could limit 256bit FMAs to 2x per cycle, but have other more register bw friendly ops at 4 )
2) Having above on single port plus separate 256 FMA + 256 ADD ports, doing 1x512 or 3x256 FMA
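For a sense of scale, the FMA widths mentioned above translate into peak FP32 throughput as follows, assuming 32-bit lanes and counting each fused multiply-add as 2 floating-point operations. This is a back-of-the-envelope sketch, not a statement about what Zen 4 actually implements.

```python
FP32_BITS = 32
FLOPS_PER_FMA = 2  # one multiply plus one add per lane


def peak_fp32_flops_per_cycle(fma_bits_per_cycle: int) -> int:
    """Peak FP32 FLOPs per cycle per core for a given total FMA width in bits."""
    lanes = fma_bits_per_cycle // FP32_BITS
    return lanes * FLOPS_PER_FMA


for bits in (512, 768, 1024):
    print(f"{bits} bits of FMA per cycle -> {peak_fp32_flops_per_cycle(bits)} FP32 FLOPs/cycle")
```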
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Lower latency in the IO/fabric? Sure. Lower latency within the core while growing the cache sizes and optimizing for clocks? Nah
The switch to 5 nm may allow them to keep good latencies while increasing clock speeds. I don’t expect some massive pipeline reorganization between Zen 3 and Zen 4. Rebalancing pipeline stages is a lot of work. Zen 5 might be drastically different though.
 
  • Like
Reactions: Kaluan

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
The integer core will likely see improved branch prediction, a somewhat larger ROB of 300-ish entries, ST/LD queues etc., but it is still firmly a 4-ALU machine, which is frankly uninspired and lacking any ambition for the year 2022 on a 5nm process.

At a 7% expected perf/clock increase in general ST code and 10% for MT, I don't think the changes will be as thorough as you are expecting. Consider some targeted changes to certain instructions, and to the caches to support AVX-512 instructions without bottlenecking them too much.

The extra gains on MT is probably because Zen 3 had some bottlenecks in the area related to that. I remember Zen 3's gain in MT to be less compared to ST for example.
 
  • Like
Reactions: Tlh97

jamescox

Senior member
Nov 11, 2009
637
1,103
136
What's my point? You already quoted it.
I will repeat: my point was that the top model with a 170W TDP (230W PPT) will most likely have worse power efficiency (perf/W) than the 5950X, and that AMD with this SKU chose to go for max performance even if it means worse power efficiency than its predecessor.

What's your point in bringing up overclocking? You think I don't know that Zen4 will reach much higher clocks than Zen3 and that at those clocks it would be more efficient? I simply compared the top Zen3 SKU with the future top Zen4 SKU in power efficiency, because I find it interesting. If you consider it an unfair comparison, that's your opinion, and you are entitled to it like I am to mine.
I really don’t know why I find your “AMD is sacrificing power efficiency for performance” so ridiculous, but I do, so I guess I still have to try to write a reply.

Zen 4 will be significantly more power efficient than Zen 3 just about any way you look at it. They are increasing the power limits since they will have a much higher max all-core clock, but it isn’t like that doesn’t deliver significantly more performance. I expect it will be a large difference in many cases. Base clock for the 5950X is only 3.4 GHz; I think someone said they can run all-core at 3.7, presumably without extreme cooling. If Zen 4 does all-core over 5 GHz, the performance will be truly amazing in comparison.

I bring up overclocking because no matter what clock speed you compare, Zen 4 will almost certainly be more efficient. If you compare at the same performance, then Zen 4 will likely be significantly more efficient everywhere as well. They probably did a lot of power optimization on the IO die, so I expect idle power is significantly improved. So what if your absolute performance per watt might be slightly lower at extreme clocks, but absolute performance is something like 40 percent higher or more? I do not consider this worse power efficiency; it is in a range that Zen 3 can’t reach without taking ridiculously more power.

Saying that they are sacrificing efficiency seems misleading at best. If they are pushing Zen 4 significantly farther up the frequency / power curve, where you get huge increase in power for a small increase in performance, then I would say they are sacrificing efficiency. We don’t have any evidence that they are doing that. They are increasing the max power, but that appears to be coming with a reasonable increase in performance for that power. It isn’t unexpected with a new socket after so many years of AM4.
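Both sides of this argument can be put into numbers with a small sketch. The 230 W PPT and the roughly 40 percent performance uplift are the figures floated in this thread; the 142 W PPT for the 5950X is its stock limit and is not mentioned above. The sketch assumes both chips sit at their PPT limit under an all-core load, which is a simplification.

```python
def perf_per_watt(relative_perf: float, watts: float) -> float:
    """Performance per watt, with performance in arbitrary relative units."""
    return relative_perf / watts


ZEN3_PPT_W = 142.0    # assumption: 5950X stock PPT, fully used under load
ZEN4_PPT_W = 230.0    # PPT figure quoted in the thread for the 170W TDP model
ZEN4_REL_PERF = 1.40  # assumption: "something like 40 percent higher" all-core performance

zen3_eff = perf_per_watt(1.00, ZEN3_PPT_W)
zen4_eff = perf_per_watt(ZEN4_REL_PERF, ZEN4_PPT_W)

ratio = zen4_eff / zen3_eff           # > 1 means Zen 4 is more efficient at its limit
break_even = ZEN4_PPT_W / ZEN3_PPT_W  # uplift needed to match Zen 3 perf/W at the limit

print(f"relative perf/W at the power limit: {ratio:.2f}")
print(f"uplift needed to break even: {break_even:.2f}x")
```

Under these assumed inputs the relative perf/W comes out below 1, and the uplift would have to exceed the ratio of the two power limits to match Zen 3 at stock. Different inputs move the answer either way, which is really what the disagreement is about.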
 

lixlax

Member
Nov 6, 2014
183
150
116
I just want to make clear that if the "power reporting deviation" in HWiNFO is higher than 100%, then the processor consumes less power than reported, and if it's lower than 100%, then it's actually fed more power than reported.
And only check this value when under load.
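That rule of thumb can be written down as a small helper, assuming the deviation figure is simply reported power divided by actual power, expressed as a percentage. That reading matches the description above, but the exact formula HWiNFO uses is an assumption here, and the function name is hypothetical.

```python
def estimated_actual_power(reported_watts: float, deviation_percent: float) -> float:
    """Estimate real power draw, assuming deviation = reported / actual * 100."""
    if deviation_percent <= 0:
        raise ValueError("deviation must be a positive percentage")
    return reported_watts * 100.0 / deviation_percent


# Hypothetical readings taken under full load.
for deviation in (80.0, 100.0, 110.0):
    actual = estimated_actual_power(100.0, deviation)
    print(f"reported 100 W at {deviation:.0f}% deviation -> about {actual:.1f} W actual")
```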
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I just want to make clear that if the "power reporting deviation" in HWiNFO is higher than 100%, then the processor consumes less power than reported, and if it's lower than 100%, then it's actually fed more power than reported.
And only check this value when under load.

Yeah, you pretty much need to run CB20, as suggested by HWiNFO. And if under that workload it is still showing > 105% or so, then the BIOS might have mistaken values. I think only MSI exposes these reference amperage settings in some of its higher-end motherboards, so even if it is wrong and you are on the latest BIOS, not much can be done about it.
 

Timmah!

Golden Member
Jul 24, 2010
1,419
632
136
I just realized that if the rumor about a Zen4 CB23 score of 38K is true, it would double the MT performance of my current CPU (14C Skylake-X)... that actually sounds good. Until now I was still under the impression that these new CPUs (by which I mean all the new CPU designs post-Skylake) are at best around 40 percent faster, which is not bad, but not necessarily a reason to upgrade. That, I guess, was not taking the clock speed increases into account...

All in all, I was eyeing something with more cores, either the new TR or the rumored new Intel HEDT Sapphire Rapids, something with over 20 cores, and if it were coming this year, I guess I still would be, but I am slowly warming up to the idea of this Zen 4 and saving money in the process.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
It's just MLID again. Who knows?

He seems to have great Intel sources, and leaked Rialto Bridge in March. My impression is that he has a few actual Intel insiders, but that most of his sources for AMD/Nvidia are 2nd party. To the extent he may have 1st party AMD sources, AMD seems to do compartmentalization really well nowadays.

I don't know if he wildly missed the mark with Zen 4, even if the IPC miss stood out a bit. My recollection is that he was pretty on point with the X670 stuff, which makes sense if his AMD sources are primarily 2nd party (e.g. mobo makers).

It's easy to see how a 2nd party source may get an estimate of the overall performance but misattribute clocks vs IPC, or extrapolate server IPC to desktop, etc. Just because there are compounding error bars does not mean that a prediction was based on nothing.

I think leakers get a bad rap on here, and the accusations of well-known leakers "making stuff up" seem like a stretch. Sometimes leakers get truly bad intel that a source either made up or a company deliberately planted (Adored's 5GHz Zen 2 lineup reveal comes to mind), but even when that isn't the case you need to realize that it's a game of telephone, with some amount of interpretation at every stage, talking about future products where things can change in development. For guys like MLID and Charlie, it's a mistake to take their word as gospel, but it's also a mistake to discount them entirely.
 