Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) and new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Thunder 57

Platinum Member
Aug 19, 2007
I don't know. The performance gap isn't very large to begin with, nothing like what I listed in my post. The 13900K also had the same average and 1% lows at 1080p and 1440p, which is odd because the game is mostly very GPU intensive.

I have the game, so I know. The Spider-Man games are way more CPU-intensive, and so is Cyberpunk 2077. As a good example of how CPU-intensive CBP 2077 can be, this guy tested a 5800X3D in the marketplace at 1080p DLSS Quality with RT Ultra on an RTX 4080 and maxed crowd settings, and he couldn't maintain 60 FPS. If you look at the GPU load, it's low, so it's being CPU bottlenecked.
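(For what it's worth, the "GPU load is low, so it's CPU-bound" call is basically this rule of thumb; the threshold and names below are made up for illustration, not taken from the video.)

```python
# Rough heuristic only; the 95% threshold and the function name are arbitrary.
def likely_bottleneck(avg_fps, target_fps, gpu_util_pct):
    if avg_fps >= target_fps:
        return "hitting target"
    # Below target with the GPU far from fully loaded usually means the CPU
    # (game logic, draw-call submission, BVH updates, etc.) is the limiter.
    return "GPU-bound" if gpu_util_pct >= 95 else "likely CPU-bound"

print(likely_bottleneck(avg_fps=55, target_fps=60, gpu_util_pct=70))  # -> likely CPU-bound
```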


Now here's a 13900K rig with an RTX 4090. The 4090 is being CPU bottlenecked because he's testing 1080p with DLSS set to Quality and Psycho RT settings, and the GPU usage is also low. But look at the framerate: he stays in triple-digit territory practically the entire time.


Who uses random tubers as references when comparing anything? Is this where you are really going?

Going back to Spider-Man and that Reddit thread, someone made a joke there: they said that if you wanted the best configuration to play the game on, it was a console. Crap ports have always existed. Give it a break.
 

Thunder 57

Platinum Member
Aug 19, 2007
You're fairly intellectually lazy I see, always wanting others to find evidence for you when you're quite capable of doing it yourself....or maybe not. Computerbase.de is one of the best known and most credible German hardware review sites and they've done more testing with RT workloads than any other.

Core i9-13900K, i7-13700K & i5-13600K: Gaming Kings Review: Benchmarks in Games - ComputerBase

They tested Cyberpunk with RT, Far Cry 6 with RT, and Spider-Man PC Remastered with RT on an RTX 3090 Ti at 720p, and the 13900K was nearly 30% faster than the 7950X in CBP and Spider-Man, and 17% faster in Far Cry 6. And let me remind you, this is with an RTX 3090 Ti. The gap would have been even wider if they had used an RTX 4090 for their first review.

Another reputable German review site, PCgameshardware.de, tested Spider-Man: Miles Morales, which uses the same engine as Spider-Man Remastered but also includes RT shadows as well as reflections, and found a nearly 50% gap between the 7950X and the 13900K.

Spider-Man: Miles Morales - CPU-Benchmarks (pcgameshardware.de)

Different games, different engines, different publishers. I'm sure you will find some way of accusing them of being biased towards AMD though LOL!

Witcher 3 with RT effects also shows large gaps and favors Raptor Lake. Unfortunately, GameGPU doesn't have access to Zen 4 CPUs, but they tested a 5900X and the 13900K was 62.5% faster. Zen 4 would likely have reduced the gap somewhat.

The Witcher 3: Wild Hunt v. 4.0 GPU/CPU test | Action / FPS / TPS | GPU test (gamegpu.com)

There are also other examples on YouTube I can post if requested. Fact is, this is a real weakness in Zen CPUs compared to Golden/Raptor Cove. It has to do with the CPU having to initialize and maintain BVH structures, which is very demanding on the CPU. I think ADL and RPL's strength in these workloads comes from having higher memory and cache bandwidth plus a wider core with more OoO resources and throughput.
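For anyone wondering what "initialize and maintain BVH structures" means in practice, here's a toy sketch of a top-down BVH build over bounding boxes. Real engines and drivers do this very differently (and much of the final build typically runs on the GPU these days), so treat it purely as an illustration of the kind of sort- and pointer-heavy work involved; all names here are made up.

```python
# Toy illustration only: a median-split BVH build over axis-aligned bounding
# boxes. Not how any shipping engine or driver actually does it.
from dataclasses import dataclass

@dataclass
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

@dataclass
class Node:
    box: AABB
    left: "Node" = None
    right: "Node" = None
    prims: list = None  # leaf payload

def union(a, b):
    return AABB(tuple(map(min, a.lo, b.lo)), tuple(map(max, a.hi, b.hi)))

def build(prims, leaf_size=4):
    """prims is a list of (primitive_id, AABB) pairs."""
    box = prims[0][1]
    for _, b in prims[1:]:
        box = union(box, b)
    if len(prims) <= leaf_size:
        return Node(box, prims=[pid for pid, _ in prims])
    # Split on the longest axis at the median centroid.
    axis = max(range(3), key=lambda i: box.hi[i] - box.lo[i])
    prims = sorted(prims, key=lambda pb: (pb[1].lo[axis] + pb[1].hi[axis]) * 0.5)
    mid = len(prims) // 2
    return Node(box, left=build(prims[:mid], leaf_size),
                     right=build(prims[mid:], leaf_size))

root = build([(0, AABB((0, 0, 0), (1, 1, 1))), (1, AABB((2, 0, 0), (3, 1, 1)))], leaf_size=1)
```

Rebuilding or refitting structures like this every frame for animated geometry is exactly the kind of work that stresses memory bandwidth and out-of-order resources.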

RT BVH workloads are going to increase in the future as RT becomes more prevalent, so CPUs will need to be ready.

I'm sure you'll dismiss all of this though. Gotta keep the AMD hype train moving at all costs, no dissent allowed! :D

You're the one making the argument; why should I find data to back it up? If what you say is true, there should be coverage of it. Maybe everybody has missed that Zen 4 sucks at RT. I'll check this stuff out tomorrow.
 

Rigg

Senior member
May 6, 2020
13900K vs 7950X: Spider-Man is 23% faster; Spider-Man RT is 28%.

This is at 720p with a 3090 Ti at default JEDEC memory speeds. Somewhat interesting, sure, though not really relevant to a reasonable real-world gaming scenario. Saying it's 30% faster (28% in reality) with RT on without pointing out that it's already 23% faster with RT off is misleading at best and dishonest at worst.
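To put a number on it, using the ratios quoted above, the part of the gap attributable to RT itself works out to only a few percent (plain arithmetic, no new data):

```python
raster_gap = 1.23  # 13900K vs 7950X, Spider-Man Remastered, RT off (as quoted above)
rt_gap = 1.28      # same test, RT on
rt_specific = rt_gap / raster_gap - 1
print(f"extra gap attributable to RT: {rt_specific:.1%}")  # ~4.1%
```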

They don't have data for Far Cry without RT, so we can't draw any solid conclusions. Same story with Cyberpunk. HUB has a 10% gap in Far Cry (no RT) at 1080p with a 4090 and reasonable memory speeds, which is a mildly compelling argument for this being another misleading and cherry-picked example.

I don't know what to make of Cyberpunk. It's CPU- and GPU-intensive. According to HUB, the 13900K is 10% faster at 1080p and 4% slower at 1440p than the 7950X with no RT.

While there does seem to be some indication that turning on RT in some of these games does seriously hammer memory bandwidth, we don't have nearly enough adequate testing data to really know what to make of any of this. Especially as it pertains to relevant, real world gaming scenarios.

Your BVH hypothesis is kind of interesting and could well prove to have legs. You don't need to bring it up in every other post you make, though. You decided that Intel's memory speed advantage was a huge deal months before Raptor even launched, and now you're hyper-focused on pointing out any edge case where it might bear fruit. The problem is you are drawing questionable conclusions from incomplete data (including websites/YouTubers many of us have never heard of) and using it as evidence to support your argument. Repeatedly posting the same argument ad nauseam based on shaky evidence doesn't make it more convincing.
 

Vattila

Senior member
Oct 22, 2004
Here is The Next Platform's article on the Instinct MI300 announcement. They are a little confused about what the package rendering in the AMD slides shows, though.

"AMD says that there are nine 5 nanometer chiplets and four 6 nanometer chiplets on the MI300A package, with HBM3 memory surrounding it. [...] That sure looks like six GPU chiplets, plus two CPU chiplets, plus an I/O die chiplet on the top, with four underlying chiplets that link two banks of HBM3 memory to the complex at eight different points and to each other. That would mean AMD re-implemented the I/O and memory die in 5 nanometer processes, rather than the 6 nanometer process used in the I/O and memory die in the Genoa Epyc 9004 complex. We strongly suspect that there is Infinity Cache implemented on those four 6 nanometer connecting chiplets, but nothing was said about that. It does not look like there is 3D V-Cache on the CPU cores in the MI300A package."

AMD Teases Details On Future MI300 Hybrid Compute Engines (nextplatform.com)

amd-instinct-mi300-render-zoom.jpg

The rendering doesn't quite match the package photo, either. The HBM is arranged a little differently on the real chip.

My speculation is that the MI300 rendering shows 6 APU chiplets, each with 4 Zen 4 cores and 40 CDNA 3 units. That adds up to the revealed number of Zen 4 cores (24). Then I guess there are 2 adaptive chiplets (FPGA, DSP) and a massive AI chiplet (Xilinx AI Engine). That adds up to 9 chiplets on N5, as announced, which, as revealed, sit on top of 4 base chiplets made on N6, which I presume hold the L3 cache, I/O and network-on-chip. Reportedly, the dies between the HBM stacks are just structural silicon. However, that seems odd to me (why two then, and why the odd size?). Could they be memory controllers, possibly with encryption, compression and a memory-side cache?
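For what it's worth, the arithmetic behind that guess lines up with the disclosed numbers; the chiplet roles themselves are still pure speculation:

```python
apu_chiplets, cores_per_apu = 6, 4       # speculated split
adaptive_chiplets, ai_chiplets = 2, 1    # speculated FPGA/DSP + AI Engine dies
assert apu_chiplets * cores_per_apu == 24                    # matches the announced Zen 4 core count
assert apu_chiplets + adaptive_chiplets + ai_chiplets == 9   # matches "nine 5 nm chiplets"
base_dies_n6 = 4                                             # the announced four 6 nm base dies underneath
```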

Instinct MI300 (chiplet speculation).png
 

biostud

Lifer
Feb 27, 2003
That's interesting; then it may be better to get the 7800X3D, though clocks that much higher would actually help in old games that are bottlenecked by a single thread.

It also seems the B650E boards are nearly identical to the X670E ones but quite a bit cheaper, so I might go for one of them.
I would think that older games would run perfectly well on either CPU.

The X670E chipset is two B650E chipsets in tandem, so unless you need the extra connectivity, there is absolutely no reason to choose X670E over B650E.
 

Timorous

Golden Member
Oct 27, 2008

Look at the delta in Doom Eternal between a 6950 XT and a 3090 Ti at the CPU limit.


About 95 FPS going from 520 to 615 with a 7700X.

Old 3000-series drivers have odd behaviour with Zen 4 at CPU-limited settings.

To be honest any test not using a 4090 at this point is essentially worthless.
 

BorisTheBlade82

Senior member
May 1, 2020
Your speculation about the different chiplets is the most reasonable I have seen so far (including my own 😉).
MI300 is likely the most interesting 2023 product from a technological point of view.
 

BorisTheBlade82

Senior member
May 1, 2020
@Hitman928 & @nicalandia
I remember that speculation on Twitter, as I actually was a part of it.
This could make sense, as otherwise Zen 4c would only have half the bandwidth per core compared to Zen 4. As each IOD has 12 ports, they could be connected to each other by four ports (128 GB/s, IIRC). That could still be a bit narrow, and it might also not fit on the package physically. So, just wild speculation.
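Rough back-of-the-envelope of the per-core bandwidth worry; every figure below is just my recollection or assumption, nothing confirmed:

```python
# Hypothetical figures from the speculation above, not confirmed specs.
cross_iod_ports, cross_iod_bw = 4, 128        # four ports carrying ~128 GB/s total
per_port_bw = cross_iod_bw / cross_iod_ports  # -> 32 GB/s per port

ccd_link_bw = 32                              # assume one CCD link of similar width, GB/s
per_core_zen4 = ccd_link_bw / 8               # 8-core Zen 4 CCD
per_core_zen4c = ccd_link_bw / 16             # 16-core Zen 4c CCD
print(per_port_bw, per_core_zen4, per_core_zen4c)  # Zen 4c ends up with half the per-core bandwidth
```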
 

Geddagod

Golden Member
Dec 28, 2021
5800X3D clocks to ~4.32 GHz in all-core loads (like rendering) and 4.45 GHz in single-core. So not a lot of difference. It also draws less power than the 5800X under similar heavy loads.
All of this just points to me that it's possibly artificially limited.

Curve Optimizer or PBO may claw back some of that clockspeed or it may not do much. But it's certainly something we should hope reviewers dig into properly (SkatterBench where u at bro?).

Fmax is also a question; the 7700X actually goes up to 5.55 GHz on multiple cores in lightly threaded or lightly multithreaded loads. Real-world clocks of a stock 7800X3D may be different from the advertised 5 GHz as well.
I'm pretty sure the clock speeds of the 7800X3D aren't artificially limited; rather, the 7900X3D and 7950X3D have higher clock speeds because they have chiplets without V-Cache and use those cores for their maximum single-threaded frequency.
 

Timmah!

Golden Member
Jul 24, 2010
Your BVH hypothesis is kind of interesting and could well prove to have legs. You don't need to bring it up in every other post you make, though. You decided that Intel's memory speed advantage was a huge deal months before Raptor even launched, and now you're hyper-focused on pointing out any edge case where it might bear fruit. The problem is you are drawing questionable conclusions from incomplete data (including websites/YouTubers many of us have never heard of) as evidence and repeatedly posting the same arguments ad nauseam.

Personally, I doubt it. There have to be some reasons why those games run better on Intel hardware, but I don't believe it's something you can broadly pin on lacking RT performance of AMD CPUs overall. As has been posted, Metro Exodus, which was the most RT-intensive game not so long ago, performs comparably to RPL. Additionally, BVH traversal is the most time-consuming part of RT calculations; that's why RT cores were added to GPUs in the first place, so I pretty much doubt the CPU takes any part in it.

As I see it, even if there are more games that run somewhat better on the 13900K, that's not surprising, as the Golden/Raptor Cove core is still stronger than the Zen 4 core. The only outliers I am aware of where the performance difference is 30 percent or more are those Spider-Man games, which are made by the same studio, so I would be looking for the problem there. AFAIK the games use the proprietary Insomniac engine that the original 2018(?) game ran on, and the Remastered and MM games added RT features onto it, so it's possible such a significant performance difference is down to some bug or specific coding favoring Intel CPUs.
 

inf64

Diamond Member
Mar 11, 2011
Resource-wise, Zen 4 definitely looks weaker in most cases compared to GLC when you look at cache, OoO size, width, etc.
Yes, but it effectively has the same IPC as GC (which lacks AVX-512 on desktop), so what does that tell you about both designs? Intel had to brute-force its way to the same IPC as AMD, which is a disaster IMO.
 

Mopetar

Diamond Member
Jan 31, 2011
Zen 4 is basically an overclocked Zen 3. It has very few resources even vs Alder Lake. It shows from time to time.

We have to wait until 2024 for some Zen 5 goodies to match Alder Lake.

This was something that people were assuming, but it turned out to be quite wrong. Here's a list of changes in Zen 4:


Larger BTB (50% increase in the L1)
Larger DTLB (50% increase in the L2)
Larger Op cache (68% larger, 50% more output)
Larger Reorder Buffer (25% increase)
Larger Register File (20% increase)
Larger Load Queue (22% increase)
Larger L2 cache (100% increase)
Integrated Graphics
AVX-512 instruction support

Really, the only thing that didn't change in some way was the number of execution ports, but there wasn't a lot of point in increasing that without a wider front-end. The notion that all we got was an overclocked Zen 3 is woefully incorrect.
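Of the items above, the AVX-512 support is the one piece software can see directly; a minimal check on Linux (just reading /proc/cpuinfo) would look something like this:

```python
# Minimal sketch, Linux only: the structural changes (ROB, BTB, L2 size, ...)
# aren't visible this way, but the new ISA support is.
def has_avx512f(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return "avx512f" in line.split()
    return False

print("AVX-512F supported:", has_avx512f())
```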
 

Geddagod

Golden Member
Dec 28, 2021
On paper Zen 4 *does* look weaker/simpler than Golden Cove, but its unquantifiable characteristics must be excellent, as it ends up with similar performance at similar clock rates. Not in every workload, but when averaged out they're pretty close.
Unquantifiable characteristics?
I think Zen 4's relative performance without a wider architecture boils down to a couple of points:
Making an architecture wider and wider has diminishing returns.
Zen 4 makes better use of the resources it does have; there are a couple of examples of this from what I could find:
  1. In high-IPC workloads, GLC is more limited by the number of execution ports than Zen 3.
  2. Zen 4's reorder buffer performs surprisingly well compared to GLC despite the size difference.
Also, GLC has to compensate for its higher-latency cache hierarchy with a larger reorder buffer compared to Zen 3/4.
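That last point is essentially a Little's-law argument: to keep a W-wide machine busy across an access that takes L cycles you need on the order of W × L instructions in flight, so a higher-latency cache hierarchy wants a bigger ROB. A toy calculation with placeholder latencies (not measured values):

```python
def in_flight_needed(width, latency_cycles):
    # Instructions that must be in flight to hide `latency_cycles` on a
    # `width`-wide core (rough first-order reasoning, not a simulator).
    return width * latency_cycles

print(in_flight_needed(6, 14))  # e.g. a ~14-cycle L2 hit on a 6-wide core -> ~84 ops
print(in_flight_needed(6, 17))  # a ~17-cycle L2 hit -> ~102 ops, i.e. a bigger ROB helps
```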
I'm sure there are many more reasons, but those are some I could find based on the Chips and Cheese articles I have read about the GLC and Zen 3/4 architectures.
 

Geddagod

Golden Member
Dec 28, 2021
To be fair, that still does kind of look like Zen 3 but with some larger structures and AVX-512 support. There's a reason Zen 3 was billed as an architectural overhaul and Zen 4 wasn't...
Also, I think it's pretty interesting how Zen 5 seems to be doing exactly that: widening the front end to better feed the large execution throughput AMD has had since Zen 3. I also find it kind of funny how Intel seems to be doing it the other way around: they widened their architecture first with GLC, and are optimizing the core and the utilization of its wide front end later.
 

Geddagod

Golden Member
Dec 28, 2021
Zen 4 has higher IPC than Intel's GLC...


And the average includes POV-Ray, which is actually quite a bit faster per clock than GLC once it also uses AVX2.
It has higher IPC in the multicore IPC test, by a margin of 2%.
And you know what? Go check out the RPL MT IPC tests: GLC and RPL have the same MT IPC as Zen 4 now, while also using slower RAM. It arguably looks like the first test was within the margin of error.
But yes, I also believe that Zen 4 has a better SMT implementation in its cores than GLC, though I forget where I read it, so someone might want to double-check me on that.
 

Mopetar

Diamond Member
Jan 31, 2011
To be fair, that still does kind of look like Zen 3 but with some larger structures and AVX-512 support. There's a reason Zen 3 was billed as an architectural overhaul and Zen 4 wasn't...

I was replying to a post that specifically said that Zen 4 is just an overclocked Zen 3. Also any new x86 CPU is going to look much the same as a previous generation processor, only with more stuff in it. I don't know what people are expecting.

But even saying that, there were some changes to Zen 4 that weren't listed in my post. The I/O die saw some big changes, as did the Infinity Fabric links between the chips; branch prediction got more than just minor tweaks; and the op cache being able to output 50% more instructions had performance implications as well. AMD has said that the front-end changes they made account for about half of the IPC gains.
 

nicalandia

Diamond Member
Jan 10, 2019
Where are you getting all-core boost numbers from? Is that what "boost max" is known to represent?
It's a 2.25 GHz base clock for a 128C/256T, 360 W CPU with AVX-512 always on. That's unprecedented computing power from a single CPU right there.
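Quick sanity math on those headline numbers (nothing more than arithmetic):

```python
cores, base_ghz, tdp_w = 128, 2.25, 360
print(tdp_w / cores)     # ~2.8 W per core at base clock
print(cores * base_ghz)  # 288 billion core-cycles per second, before counting SMT threads
```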



I am usually not a fan of hyperbolic catchphrases, but allow me to indulge: Intel is DEAD...! I will show myself out... :kissing:

@Markfw
 

Abwx

Lifer
Apr 2, 2011
Depends on the workload. For example, in POV-Ray or CB20 it has nearly 10% higher IPC, but in CB15 it has slightly worse IPC than even Zen 4, at 3.6 GHz, according to ComputerBase.

Their POV-Ray version does not have AVX2 enabled for AMD CPUs, only for Intel ones. If AVX2 is enabled, Zen 4 has better IPC in this software as well; there's a member here who made comparative runs to demonstrate it, IIRC 18% better perf/clock than what is displayed by ComputerBase.
 