Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Page 468

Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Hitman928

Diamond Member
Apr 15, 2012
I don't know. The performance gap isn't very large to begin with, nothing like what I listed in my post. The 13900K also had the same average and 1% lows at 1080p and 1440p, which is odd because the game is mostly very GPU intensive.

I have the game, so I know. The Spider-Man games are way more CPU intensive, and so is Cyberpunk 2077. As a good example of how CPU intensive CBP 2077 can be, this guy tested a 5800X3D in the marketplace at 1080p with DLSS Quality, RT Ultra and maxed crowd settings on an RTX 4080, and he couldn't maintain 60 FPS. If you look at the GPU load, it's low, so it's being CPU bottlenecked.


Now here's a 13900K rig with an RTX 4090. The 4090 is being CPU bottlenecked because he's testing at 1080p with DLSS set to Quality and Psycho RT settings, and the GPU usage is also low. But look at the framerate: he stays in triple-digit territory practically the entire time.


Kind of blows up your shakily established conclusion, though, doesn't it? I mean, there is no pre-baked lighting whatsoever in the game, and yet the 7950X is faster than a 13900K. It doesn't really make sense that Zen 4 has an inherent problem with RT calculations (or that RPL has a huge inherent advantage) when Zen 4 is faster in a game with no pre-baked lighting. Additionally, here's another test of the same game (Metro Exodus Enhanced) showing the 7950X being 25% faster than a 13900K when DLSS is turned on.

[Attached chart: Metro Exodus Enhanced results]

Here's another test (COD: BOCW) from the same review where RT is enabled, and yet the 7950X is faster. Sure, it's not faster by much, but on average the 13900K is the faster gaming CPU, so this shouldn't be happening if Zen 4 really had issues with RT calculations compared to RPL.

[Attached chart: COD: BOCW results]

Others have already mentioned it, but you are treating very limited data as confirmation of a predetermined conclusion, and there is data out there that directly contradicts that conclusion, so maybe what you thought about Zen chips and RT isn't actually true.

 

Geddagod

Golden Member
Dec 28, 2021
The 5800X3D clocks to ~4.32 GHz in all-core loads (like rendering) and 4.45 GHz in single-core loads. So not a lot of difference. It also draws less power than the 5800X under similar heavy loads.
To me, all of this just suggests that it's possibly artificially limited.

Curve Optimizer or PBO may claw back some of that clockspeed or it may not do much. But it's certainly something we should hope reviewers dig into properly (SkatterBench where u at bro?).

Fmax is also a question; the 7700X actually goes up to 5.55 GHz on multiple cores in lightly threaded or light multithreaded loads. Real-world clocks of a stock 7800X3D may differ from the advertised 5 GHz as well.
I'm pretty sure the clock speeds of the 7800X3D aren't artificially limited; rather, the 7900X3D and 7950X3D have higher clock speeds because they have chiplets without V-Cache, and those cores set their maximum ST frequency.
 

Timmah!

Golden Member
Jul 24, 2010
Your BVH hypothesis is kind of interesting and could well prove to have legs. You don't need to bring it up in every other post you make, though. You decided that Intel's memory speed advantage was a huge deal months before Raptor even launched, and now you're hyper-focused on pointing out any edge case where it might bear fruit. The problem is that you are drawing questionable conclusions from incomplete data (including websites/YouTubers many of us have never heard of) and repeatedly posting the same arguments ad nauseam.

Personally, I doubt it. There have to be some reasons why those games run better on Intel hardware, but I don't believe it's something you can broadly pin on a lack of RT performance in AMD CPUs overall. As has been posted, Metro Exodus, which was the most RT-intensive game not so long ago, performs comparably on Zen 4 and RPL. Additionally, BVH traversal is the most time-consuming part of RT calculations; that's why RT cores were added to GPUs in the first place, so I pretty much doubt the CPU takes any part in it.
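For readers unfamiliar with the term: a BVH (bounding volume hierarchy) is a tree of nested boxes around scene geometry, and traversal means walking that tree to find which primitives a ray could hit. This per-ray loop is the work RT hardware accelerates. The sketch below is a minimal, unoptimized illustration using axis-aligned boxes and the standard slab test; the names (`Node`, `traverse`, `BIG`) are hypothetical, and it is not how any particular game, engine or driver implements it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
BIG = 1e30  # stands in for 1/0 on axes where the ray direction component is zero


@dataclass
class Node:
    lo: Vec3                        # min corner of the axis-aligned bounding box
    hi: Vec3                        # max corner of the axis-aligned bounding box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None      # primitive id, set only on leaves


def ray_hits_box(origin: Vec3, inv_dir: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: clip the ray's [0, inf) interval against each pair of axis planes."""
    t_near, t_far = 0.0, float("inf")
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t1, t2 = (a - o) * inv, (b - o) * inv
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far


def traverse(root: Node, origin: Vec3, direction: Vec3) -> Tuple[List[int], int]:
    """Return ids of leaves whose boxes the ray enters, plus how many nodes were tested."""
    inv_dir = tuple(1.0 / d if d != 0.0 else BIG for d in direction)
    hits: List[int] = []
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if not ray_hits_box(origin, inv_dir, node.lo, node.hi):
            continue  # ray misses this box, so the whole subtree is skipped
        if node.prim is not None:
            hits.append(node.prim)
        else:
            stack.extend(c for c in (node.left, node.right) if c is not None)
    return hits, visited


# Tiny two-leaf tree: one box near the origin, one in the far corner.
tree = Node((0, 0, 0), (10, 10, 10),
            left=Node((0, 0, 0), (4, 4, 4), prim=0),
            right=Node((6, 6, 6), (10, 10, 10), prim=1))
print(traverse(tree, (-1.0, 1.0, 1.0), (1.0, 0.0, 0.0)))
```

The point of the tree is pruning: a ray that misses a parent box never tests anything beneath it, which is why traversal cost depends so heavily on how the hierarchy is built.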

As I see it, even if there are more games that run somewhat better on the 13900K, that's not surprising, as the Golden/Raptor Cove core is still stronger than the Zen 4 core. The only outliers I am aware of, where the performance difference is 30 percent or more, are the Spider-Man games, which are made by the same studio, so I would look for the problem there. AFAIK the games use Insomniac's proprietary engine, which the original 2018(?) game ran on, and the Remastered and Miles Morales versions added RT features on top of it, so it's possible such a significant performance difference is down to some bug or specific coding that favors Intel CPUs.
 

inf64

Diamond Member
Mar 11, 2011
Resource-wise, Zen 4 definitely looks weaker in most cases compared to GLC when you look at cache, OoO size, width, etc.
Yes, but it effectively has the same IPC as GC (which lacks AVX-512 on desktop), so what does that tell you about both designs? Intel had to brute-force its way to the same IPC as AMD, which is a disaster IMO.
 

Abwx

Lifer
Apr 2, 2011
Resource-wise, Zen 4 definitely looks weaker in most cases compared to GLC when you look at cache, OoO size, width, etc.

Zen 4 has higher IPC than Intel's GLC...

And the average includes POV-Ray, in which Zen 4 is actually quite a bit faster per clock than GLC once it also uses AVX2.
 

Mopetar

Diamond Member
Jan 31, 2011
Zen 4 is basically an overclocked Zen 3. It has very few resources, even vs Alder Lake. It shows from time to time.

We have to wait until 2024 for some Zen 5 goodies to match Alder Lake.

This was something that people were assuming, but it turned out to be quite wrong. Here's a list of changes in Zen 4:


Larger BTB (50% increase in the L1)
Larger DTLB (50% increase in the L2)
Larger Op cache (68% larger, 50% more output)
Larger Reorder Buffer (25% increase)
Larger Register File (20% increase)
Larger Load Queue (22% increase)
Larger L2 cache (100% increase)
Integrated Graphics
AVX-512 instruction support

Really, the only thing that didn't change in some way was the number of execution ports, but there wasn't a lot of point in increasing that without a wider front end. The notion that all we got was an overclocked Zen 3 is woefully incorrect.
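Some of the percentages in that list can be sanity-checked against commonly reported before/after sizes. The sketch below assumes Zen 3 to Zen 4 figures of 256 to 320 ROB entries, 4K to 6.75K op-cache entries and 512 KB to 1 MB of L2 per core; those absolute numbers are not in the post and are worth double-checking against AMD's own material.

```python
# Assumed Zen 3 -> Zen 4 sizes (commonly reported; not taken from the post).
SIZES = {
    "reorder buffer (entries)": (256, 320),
    "op cache (ops)": (4096, 6912),       # 4K -> 6.75K
    "L2 cache per core (KiB)": (512, 1024),
}


def pct_increase(old: float, new: float) -> float:
    """Percentage growth from old to new."""
    return (new - old) / old * 100.0


for name, (zen3, zen4) in SIZES.items():
    print(f"{name}: {zen3} -> {zen4} (+{pct_increase(zen3, zen4):.1f}%)")
```

Under these assumptions the op cache works out to roughly 68.75% larger, which lines up with the "68% larger" in the list.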
 

Geddagod

Golden Member
Dec 28, 2021
Yes but it effectively has the same IPC as GC ( which lacks AVX512 on desktop), so what does that tell you about both designs? Intel had to bruteforce its way to get the same IPC as AMD, which is a disaster IMO.
Depends on the workload. For example, in POV-Ray or CB20 it has nearly 10% higher IPC, but in CB15 it even has slightly worse IPC than Zen 4, at 3.6 GHz according to ComputerBase. Different applications hit different bottlenecks and stress the architecture in different ways, but I think it's pretty clear that, if not GLC, then RPL still holds a single-threaded IPC advantage over Zen 4, even if a slim one, from Raichu's SPEC 2006(?) testing to ComputerBase's IPC testing.
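Because different workloads land on different sides, an "average IPC" figure depends on how the per-workload results are combined. A minimal sketch, assuming the usual approach of normalizing each score by clock and taking a geometric mean of the ratios; the ratios below are hypothetical values shaped loosely like the post (about 10% ahead in two tests, slightly behind in one), not measured data, and the function names are illustrative.

```python
from math import prod


def ipc_ratio(score_a: float, ghz_a: float, score_b: float, ghz_b: float) -> float:
    """Per-clock performance of A relative to B: (score/GHz) of A over (score/GHz) of B."""
    return (score_a / ghz_a) / (score_b / ghz_b)


def geomean(values) -> float:
    """Geometric mean, the usual way to average ratios across benchmarks."""
    vals = list(values)
    return prod(vals) ** (1.0 / len(vals))


# Hypothetical per-workload IPC ratios (GLC / Zen 4); illustrative only.
ratios = {"POV-Ray": 1.09, "CB20": 1.09, "CB15": 0.98}
print(f"geomean IPC ratio: {geomean(ratios.values()):.3f}")
```

When both CPUs are locked to the same clock, as in a fixed 3.6 GHz test, `ipc_ratio` reduces to a plain score ratio.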

The only "disaster" here would be if A) the high IPC came at the cost of clock speeds, or B) if the higher IPC caused a severe increase in die size.
We both know that A is certainly not true, considering that RPL looks to be clocking higher than Zen 4 despite being a node behind. And it's not like Zen 4 is on an immature cutting-edge node that holds back clock speeds; TSMC 5nm was extremely mature by the time AMD moved to it.
As for B, GLC is pretty big, no doubt about it. But comparing it core for core with its predecessor, Willow Cove, you see a 26% increase in area for a 32% increase in performance (1.2x IPC × 1.1x clock speed).
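The arithmetic behind that comparison can be made explicit: performance scales roughly as IPC times clock, and dividing by the area growth gives the change in performance per area. A small sketch using only the post's figures; `perf_per_area_gain` is a hypothetical helper name.

```python
def perf_per_area_gain(ipc_gain: float, clock_gain: float, area_gain: float) -> float:
    """Relative perf/area change between two core generations (e.g. 0.048 = +4.8%)."""
    perf = ipc_gain * clock_gain  # first-order model: performance ~ IPC x frequency
    return perf / area_gain - 1.0


# Figures from the post (Willow Cove -> Golden Cove): 1.2x IPC, 1.1x clock, 1.26x area.
gain = perf_per_area_gain(1.2, 1.1, 1.26)
print(f"performance: {1.2 * 1.1:.2f}x, perf/area change: {gain * 100:+.1f}%")
```

On these numbers, perf/area improves by a bit under 5%, so the larger core roughly paid for its extra area.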
Comparing this to Zen 3, 8-core complex to 8-core complex (Zen 3 because it was GLC's competitor timeline-wise and also uses a comparable node for density comparisons), you get a 15-21% perf/area advantage for Zen 3, but GLC also has a ~15% performance lead over Zen 3, so I would say that was a decent trade-off.
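Those two numbers together imply a relative area. If design A is `perf_lead` times faster while design B has `perf_per_area_adv` times A's performance per area, then area_A / area_B = perf_lead × perf_per_area_adv. The sketch below only rearranges the post's figures; it is not an independent area measurement, and the helper name is hypothetical.

```python
def implied_area_ratio(perf_lead: float, perf_per_area_adv: float) -> float:
    """Area of the faster design A relative to the denser design B.

    From perf_B / area_B = adv * perf_A / area_A it follows that
    area_A / area_B = adv * (perf_A / perf_B) = adv * perf_lead.
    """
    return perf_lead * perf_per_area_adv


# Figures from the post: GLC ~1.15x faster, Zen 3 has 1.15-1.21x better perf/area.
for adv in (1.15, 1.21):
    print(f"perf/area advantage {adv:.2f}x -> GLC complex ~{implied_area_ratio(1.15, adv):.2f}x the area")
```

So the quoted range corresponds to a GLC 8-core complex roughly 32-39% larger than Zen 3's.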

The only "disaster" I see here is a bad PR look, which honestly doesn't even look that bad. Intel's GLC architecture was enough to compete with Zen 3 and Zen 3 V-Cache, and judging by 3DCenter's meta-analysis data, it is certainly competitive with Zen 4 as well, core for core, in ST and gaming. With GLC, Intel created a base that likely won't need any major additions until Lion Cove in ARL, seeing how Raptor Cove is a GLC variant and Redwood Cove looks to be an iteration of GLC as well.
Despite GLC introducing major changes, properly optimizing its immense front end is something I'm pretty sure Intel is going to work on with Redwood Cove. Staying at the same "width" doesn't mean you can't make major IPC increases, as you can see with Zen, and with Intel from Conroe through Skylake.

GLC looks to be pretty much the "setup" stage of Intel's architectures for three generations: GLC, RPL, and RWC. Perf/area-wise it wasn't bad vs its competitor, so despite its larger resources it looks like it put them to good use. The high IPC also clearly didn't come at the expense of other aspects of performance, as clock speeds are high as well.

BTW, the area numbers are all from Locuza, via his Twitter and his Substack, if you want to double-check.
 

Geddagod

Golden Member
Dec 28, 2021
On paper Zen 4 *does* look weaker/simpler than Golden Cove, but its unquantifiable characteristics must be excellent, as it ends up with similar performance at similar clock rates. Not in every workload, but when averaged out they're pretty close.
Unquantifiable characteristics?
I think Zen 4's relative performance without a wider architecture boils down to a couple of points:
Making an architecture wider and wider has diminishing returns
Zen 4 makes better use of the resources it does have; there are a couple of examples of this from what I could find:
  1. in high-IPC workloads, GLC is more limited by the number of execution ports than Zen 3
  2. Zen 4's reorder buffer performs surprisingly well compared to GLC's despite the size difference
Also, GLC has to compensate for its higher-latency cache hierarchy with a larger reorder buffer compared to Zen 3/4.
I'm sure there are a multitude of more reasons but those are some I can find based on the chipsandcheese articles I have read about GLC and Zen 3/4 architectures.
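The point about cache latency and reorder-buffer size follows from a first-order Little's law argument: to keep sustaining a given IPC while the oldest instruction waits on memory, the core needs roughly IPC × latency instructions in flight. The sketch below uses hypothetical round numbers for latency, not measured Zen 4 or GLC figures, and the helper name is illustrative; real ROB sizing also depends on MLP, prefetching and much else.

```python
def in_flight_ops_needed(sustained_ipc: float, latency_cycles: float) -> float:
    """Little's law: ops that must be in flight = throughput (ops/cycle) x latency (cycles)."""
    return sustained_ipc * latency_cycles


# Hypothetical latencies for illustration only: same target IPC, faster vs slower cache level.
for label, latency in (("lower-latency cache", 50), ("higher-latency cache", 70)):
    print(f"{label}: ~{in_flight_ops_needed(4, latency):.0f} ops in flight to sustain 4 IPC")
```

Under this simplified model, a 40% longer latency needs a 40% deeper window to hide it at the same IPC, which is the trade-off described above.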
 

Geddagod

Golden Member
Dec 28, 2021
This was something that people were assuming, but it turned out to be quite wrong. Here's a list of changes in Zen 4:


Larger BTB (50% increase in the L1)
Larger DTLB (50% increase in the L2)
Larger Op cache (68% larger, 50% more output)
Larger Reorder Buffer (25% increase)
Larger Register File (20% increase)
Larger Load Queue (22% increase)
Larger L2 cache (100% increase)
Integrated Graphics
AVX-512 instruction support

Really, the only thing that didn't change in some way was the number of execution ports, but there wasn't a lot of point in increasing that without a wider front end. The notion that all we got was an overclocked Zen 3 is woefully incorrect.
To be fair, that still does kind of look like Zen 3 with just some larger structures and AVX-512 support. There's a reason Zen 3 was billed as an architectural overhaul and Zen 4 wasn't....
Also, I find it pretty interesting that Zen 5 seems to be doing exactly that, widening the front end to better feed the large execution throughput AMD has had since Zen 3. I also find it kind of funny that Intel seems to be doing it the other way around: they widened their architecture first with GLC and are optimizing the core, and the utilization of its large front end, later.
 

Geddagod

Golden Member
Dec 28, 2021
Zen 4 has higher IPC than Intel's GLC...

And the average includes POV-Ray, in which Zen 4 is actually quite a bit faster per clock than GLC once it also uses AVX2.
It has higher IPC in the multicore IPC test, by a margin of 2%.
And you know what? Go check out the RPL MT IPC tests: GLC and RPL now have the same MT IPC as Zen 4, while also using slower RAM. It arguably looks like the first test was within the margin of error.
But yes, I also believe that Zen 4 has a better SMT implementation in its cores than GLC, though I forgot where I read it, so someone might want to double-check me on that.
 

Mopetar

Diamond Member
Jan 31, 2011
To be fair, that still does kind of look like Zen 3 with just some larger structures and AVX-512 support. There's a reason Zen 3 was billed as an architectural overhaul and Zen 4 wasn't....

I was replying to a post that specifically said that Zen 4 is just an overclocked Zen 3. Also any new x86 CPU is going to look much the same as a previous generation processor, only with more stuff in it. I don't know what people are expecting.

But even so, there were some changes to Zen 4 that weren't listed in my post. The IO die saw some big changes, as did the Infinity Fabric links between the chips; branch prediction got more than just minor tweaks; and the op cache changes, which let it output 50% more instructions, had performance implications as well: AMD has said that the front-end changes account for about half of the IPC gains.
 

nicalandia

Diamond Member
Jan 10, 2019
Where are you getting all-core boost numbers from? Is that what "boost max" is known to represent?
It's a 2.25 GHz base clock for a 128C/256T, 360 W CPU with AVX-512 always on. That's unprecedented computing power from a single CPU right there.



I'm usually not a fan of hyperbolic catchphrases, but allow me to indulge: Intel DEAD....! I will show myself out... :kissing:
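To put "unprecedented computing power" in rough numbers, theoretical peak throughput is cores × clock × FLOPs per core per cycle. The sketch below uses the post's 128 cores, 2.25 GHz base clock and 360 W, plus one loudly labeled assumption: 16 FP64 FLOPs per cycle per core (two 256-bit FMA pipes, as commonly described for Zen 4). It is a paper figure at base clock, not a benchmark result, and the helper name is hypothetical.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * ghz * flops_per_cycle


CORES, BASE_GHZ, POWER_W = 128, 2.25, 360  # figures from the post
FP64_FLOPS_PER_CYCLE = 16  # ASSUMPTION: two 256-bit FMA pipes per core, not from the post

print(f"peak FP64 at base clock: {peak_gflops(CORES, BASE_GHZ, FP64_FLOPS_PER_CYCLE):.0f} GFLOPS")
print(f"power budget per core: {POWER_W / CORES:.2f} W")
```

Real sustained clocks and throughput under AVX-512 load will differ from this back-of-the-envelope number.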

@Markfw
 

Abwx

Lifer
Apr 2, 2011
Depends on the workload. For example, in POV-Ray or CB20 it has nearly 10% higher IPC, but in CB15 it even has slightly worse IPC than Zen 4, at 3.6 GHz according to ComputerBase

Their POV-Ray version doesn't have AVX2 enabled for AMD CPUs, only for Intel ones. If AVX2 is enabled, Zen 4 has better IPC in this software as well; there's a member here who made comparative runs to demonstrate it, IIRC with 18% better perf/clock than what ComputerBase shows.
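The post doesn't say how that POV-Ray build selects its code path, so the following is only a hypothetical illustration of the difference being described: a dispatcher that gates AVX2 on the CPUID vendor string versus one that gates it on the reported feature flags. "GenuineIntel" and "AuthenticAMD" are the real CPUID vendor strings; the function names and the "generic" fallback label are made up for the example.

```python
AVX2 = "avx2"


def pick_path_vendor_gated(vendor: str, flags: set) -> str:
    """Hypothetical dispatcher: uses AVX2 only if the vendor string is Intel's."""
    if vendor == "GenuineIntel" and AVX2 in flags:
        return "avx2"
    return "generic"


def pick_path_feature_gated(vendor: str, flags: set) -> str:
    """Hypothetical dispatcher: ignores the vendor and checks only the feature flag."""
    return "avx2" if AVX2 in flags else "generic"


zen4 = ("AuthenticAMD", {"sse2", "avx", "avx2", "avx512f"})
print(pick_path_vendor_gated(*zen4), pick_path_feature_gated(*zen4))
```

With vendor gating, an AVX2-capable AMD CPU still lands on the fallback path, which would understate its per-clock result in that one test.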
 

Exist50

Platinum Member
Aug 18, 2016
It's a 2.25 GHz base clock for a 128C/256T, 360 W CPU with AVX-512 always on. That's unprecedented computing power from a single CPU right there.



I'm usually not a fan of hyperbolic catchphrases, but allow me to indulge: Intel DEAD....! I will show myself out... :kissing:
When I quoted your comment, it had a remark about all-core boost exceeding the best Sapphire Rapids, hence my question.

But yes, Bergamo is quite an exciting product. Still, a bit hyperbolic to call Intel dead :p
 

gdansk

Platinum Member
Feb 8, 2011
Unquantifiable characteristics?
I think Zen 4's relative performance without a wider architecture boils down to a couple of points:
Making an architecture wider and wider has diminishing returns
Zen 4 makes better use of the resources it does have; there are a couple of examples of this from what I could find:
  1. in high-IPC workloads, GLC is more limited by the number of execution ports than Zen 3
  2. Zen 4's reorder buffer performs surprisingly well compared to GLC's despite the size difference
Also, GLC has to compensate for its higher-latency cache hierarchy with a larger reorder buffer compared to Zen 3/4.
I'm sure there are a multitude of more reasons but those are some I can find based on the chipsandcheese articles I have read about GLC and Zen 3/4 architectures.
I couldn't find the right word for it. But let's say "seldom quantified" characteristics. If you look at the available figures about queues/buffers, execution units and so on it isn't as impressive as GLC and its offspring.
 

yuri69

Senior member
Jul 16, 2013
This was something that people were assuming, but it turned out to be quite wrong. Here's a list of changes in Zen 4:


Larger BTB (50% increase in the L1)
Larger DTLB (50% increase in the L2)
Larger Op cache (68% larger, 50% more output)
Larger Reorder Buffer (25% increase)
Larger Register File (20% increase)
Larger Load Queue (22% increase)
Larger L2 cache (100% increase)
Integrated Graphics
AVX-512 instruction support

Really, the only thing that didn't change in some way was the number of execution ports, but there wasn't a lot of point in increasing that without a wider front end. The notion that all we got was an overclocked Zen 3 is woefully incorrect.
Well, it's known that Zen 4 is not a direct shrink of Zen 3. So some structures got a minor boost in size, and some of those increases were surely driven by the AVX-512 instruction mix. But overall, apart from frequency and AVX-512, the design and performance characteristics are similar to Zen 3's in most cases. That is the point.

As for the IPC debate, in SPEC CPU 2017 Zen 4 has higher INT IPC than GC but lower than RC, and lower FP IPC than both of Intel's cores. Source: @OneRaichu bench
 

Henry swagger

Senior member
Feb 9, 2022
On paper Zen 4 *does* look weaker/simpler than Golden Cove, but its unquantifiable characteristics must be excellent, as it ends up with similar performance at similar clock rates. Not in every workload, but when averaged out they're pretty close.
Yeah, Zen 4 is a good core... but AMD has a massive node advantage, something like 48% higher density and smaller cells.
 

Asterox

Golden Member
May 15, 2012

inf64

Diamond Member
Mar 11, 2011
Well, it's known that Zen 4 is not a direct shrink of Zen 3. So some structures got a minor boost in size, and some of those increases were surely driven by the AVX-512 instruction mix. But overall, apart from frequency and AVX-512, the design and performance characteristics are similar to Zen 3's in most cases. That is the point.

As for the IPC debate, in SPEC CPU 2017 Zen 4 has higher INT IPC than GC but lower than RC, and lower FP IPC than both of Intel's cores. Source: @OneRaichu bench
You do realize that SPECint is within 0.5% and SPECfp (less important) is within 4%? That's basically margin-of-error stuff. For the number of transistors they spent on GC, it should be stomping Zen 4, not matching it or beating it by 0-4%. And all of that is against the vanilla Zen 4.