Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", which I think will likely double to 64 MB).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

In2Photos

Golden Member
Mar 21, 2007
1,631
1,655
136
That's the least we should expect. But AMD really ought to have the required software and OS tweaks ready by the time their products launch. "Fine wine" makes for a nice meme, but effectively extending it across the product range (now potentially with "Ryzen AI" as well, etc.) is not exactly a good look.
Did I miss the part where AMD said the chips wouldn't work as intended when they launch? Or that special software will need to be developed and it isn't ready yet? Do you have insider information that these tools don't already exist?
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,176
1,519
136
Did I miss the part where AMD said the chips wouldn't work as intended when they launch? Or that special software will need to be developed and it isn't ready yet? Do you have insider information that these tools don't already exist?

We can only assume they intend to have the scheduler driver ready for launch. Given the magnitude of the gaming performance deficit if it were not, it would be a PR disaster.
 

Saylick

Diamond Member
Sep 10, 2012
3,199
6,506
136
N5/N4 still have ok SRAM scaling vs N7/N6, so in theory, there's still an advantage to be had. Since it's the same capacity, someone could probably do some quick estimates from screenshots.
Sorry, what I meant was that it doesn't scale relative to the cost. You might get 1.3x SRAM scaling going to N5/N4, but the node costs far more than 1.3x as much. Additionally, I presume the same V-cache chiplets go into Genoa-X, which doesn't clock nearly as high, so the clock benefits of a V-cache die on N5/N4 would be unnecessary.
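Back-of-envelope, with made-up wafer prices just to illustrate (only the ~1.3x density figure comes from the discussion above; the dollar numbers are assumptions, not actual TSMC quotes):

Code:
# Cost per MB of SRAM, N7-class vs N5-class node.
n7_wafer_cost = 9_000        # USD -- assumed, illustrative
n5_wafer_cost = 16_000       # USD -- assumed, illustrative
sram_scaling = 1.3           # N7 -> N5 SRAM density gain (from above)

n7_mb_per_wafer = 100_000    # arbitrary baseline
n5_mb_per_wafer = n7_mb_per_wafer * sram_scaling

print(n7_wafer_cost / n7_mb_per_wafer)   # 0.090 USD/MB on N7
print(n5_wafer_cost / n5_mb_per_wafer)   # ~0.123 USD/MB on N5
# Density improves 1.3x, but the wafer costs ~1.78x more, so the
# cost per MB of SRAM still rises ~37% -- it doesn't scale with cost.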
 

CP5670

Diamond Member
Jun 24, 2004
5,516
592
126
Most believe only the CCD without cache will hit the top boost clocks, while the one with cache will probably top out at 5 GHz like the 7800X3D.

That's interesting; then it may be better to get the 7800X3D. Clocks that much higher would actually help in old games that are bottlenecked by a single thread, though.

It also seems the B650E boards are nearly identical to the X670E ones but quite a bit cheaper, so I might go for one of them.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,679
3,806
136
Except it's not just with that particular game or that particular engine. Nice try though.

All I could find on Spider-Man was a Reddit post linking a ComputerBase.de benchmark. If it were such a big deal, there would be far more mention of it.

Not just that game or engine? List another. You didn't, because any evidence must be so shallow it's probably idiots on some forum who don't know what they're doing. If RT were broken on AMD CPUs in multiple games, it would be huge news. Why not move on and try to find another reason Zen sucks?
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Sorry, what I meant was that it doesn't scale relative to the cost. You might get 1.3x SRAM scaling going to N5/N4, but the node costs far more than 1.3x as much. Additionally, I presume the same V-cache chiplets go into Genoa-X, which doesn't clock nearly as high, so the clock benefits of a V-cache die on N5/N4 would be unnecessary.
Ah, gotcha.

On the topic, I wonder what the actual source of the frequency deficit is: thermals, hybrid-bond latency/overhead, extra latency from the added capacity, etc. It would be interesting to know, if nothing else.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
All I could find on Spider-Man was a Reddit post linking a ComputerBase.de benchmark. If it were such a big deal, there would be far more mention of it.

Not just that game or engine? List another. You didn't, because any evidence must be so shallow it's probably idiots on some forum who don't know what they're doing. If RT were broken on AMD CPUs in multiple games, it would be huge news. Why not move on and try to find another reason Zen sucks?

You're fairly intellectually lazy, I see, always wanting others to find evidence for you when you're quite capable of doing it yourself... or maybe not. ComputerBase.de is one of the best-known and most credible German hardware review sites, and they've done more testing with RT workloads than any other.

Core i9-13900K, i7-13700K & i5-13600K: Gaming Kings Review: Benchmarks in Games - ComputerBase

They tested Cyberpunk with RT, Far Cry 6 with RT, and Spider-Man Remastered with RT on an RTX 3090 Ti at 720p, and the 13900K was nearly 30% faster than the 7950X in CBP and Spider-Man, and 17% faster in Far Cry 6. And let me remind you, this was with an RTX 3090 Ti; the gap would have been even wider if they had used an RTX 4090 for their first review.

Another reputable German review site, PCGamesHardware.de, tested Spider-Man: Miles Morales, which uses the same engine as Spider-Man Remastered but includes RT shadows as well as reflections, and found a nearly 50% gap between the 7950X and the 13900K.

Spider-Man: Miles Morales - CPU-Benchmarks (pcgameshardware.de)

Different games, different engines, different publishers. I'm sure you will find some way of accusing them of being biased towards AMD though LOL!

Witcher 3 with RT effects also shows large gaps and favors Raptor Lake. Unfortunately, GameGPU doesn't have access to Zen 4 CPUs, but they tested a 5900X and the 13900K was 62.5% faster. Zen 4 would likely have reduced the gap somewhat.

The Witcher 3: Wild Hunt v. 4.0 GPU/CPU test | Action / FPS / TPS | GPU test (gamegpu.com)

There are also other examples on YouTube I can post if requested. Fact is, this is a real weakness in Zen CPUs compared to Golden/Raptor Cove. It has to do with the CPU having to initialize and maintain BVH structures, which is very demanding on the CPU. I think ADL and RPL's strength in these workloads comes from having higher memory and cache bandwidth plus a wider core with more OoO resources and throughput.

RT BVH workloads are going to increase in the future as RT becomes more prevalent, so CPUs will need to be ready.
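For anyone unfamiliar with that CPU-side work: here's a toy sketch of a top-down BVH build in Python. This is my own illustration of the general shape of the workload, not code from any actual engine or the DXR runtime; real builds and per-frame refits are far more elaborate, but similarly heavy on sorting, partitioning and pointer-chasing over large arrays, which is where core width and memory/cache bandwidth come in.

Code:
# Toy top-down BVH build over axis-aligned bounding boxes (AABBs).
from dataclasses import dataclass

@dataclass
class Node:
    lo: tuple                  # min corner of this node's AABB
    hi: tuple                  # max corner
    left: "Node" = None
    right: "Node" = None
    prims: list = None         # leaf payload (primitive AABBs)

def bounds(boxes):
    # Union AABB over ((xmin, ymin, zmin), (xmax, ymax, zmax)) pairs.
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return lo, hi

def build(boxes, leaf_size=4):
    lo, hi = bounds(boxes)
    if len(boxes) <= leaf_size:
        return Node(lo, hi, prims=boxes)
    # Split along the widest axis at the median centroid.
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    boxes = sorted(boxes, key=lambda b: b[0][axis] + b[1][axis])
    mid = len(boxes) // 2
    return Node(lo, hi, build(boxes[:mid], leaf_size),
                build(boxes[mid:], leaf_size))

# One AABB per triangle/object; dynamic scenes rebuild or refit this
# every frame, which is the CPU cost being discussed here.
tree = build([((i, 0.0, 0.0), (i + 1, 1.0, 1.0)) for i in range(1000)])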

I'm sure you'll dismiss all of this though. Gotta keep the AMD hype train moving at all costs, no dissent allowed! :D
 

Hitman928

Diamond Member
Apr 15, 2012
5,340
8,110
136
There are also other examples on YouTube I can post if requested. Fact is, this is a real weakness in Zen CPUs compared to Golden/Raptor Cove. It has to do with the CPU having to initialize and maintain BVH structures, which is very demanding on the CPU. I think ADL and RPL's strength in these workloads comes from having higher memory and cache bandwidth plus a wider core with more OoO resources and throughput.

RT BVH workloads are going to increase in the future as RT becomes more prevalent, so CPUs will need to be ready.

How do you explain this then?

(attached benchmark screenshot)

 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
How do you explain this then?

I don't know. The performance gap isn't very large to begin with, nothing like what I listed in my post. The 13900K also had the same average and 1% lows at 1080p and 1440p, which is odd because the game is mostly very GPU-intensive.

I have the game, so I know. The Spider-Man games are way more CPU-intensive, and so is Cyberpunk 2077. As an example of how CPU-intensive CBP 2077 can be: this guy tested a 5800X3D in the marketplace at 1080p, DLSS Quality, RT Ultra and maxed crowd settings on an RTX 4080, and he couldn't maintain 60 FPS. If you look at the GPU load, it's low, so he's CPU-bottlenecked.


Now here's a 13900K rig with an RTX 4090. The 4090 is CPU-bottlenecked because he's testing 1080p with DLSS set to Quality and Psycho RT settings, and the GPU usage is also low. But look at the framerate: he stays in triple-digit territory practically the entire time.
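For what it's worth, the "GPU load is low, so it's a CPU bottleneck" reasoning boils down to a one-liner over logged utilisation samples. The 90% threshold here is just my own rule of thumb, not something from either video:

Code:
# Crude CPU-vs-GPU bottleneck check over GPU-utilisation samples (%).
def likely_cpu_bound(gpu_util_samples, threshold=90.0):
    avg = sum(gpu_util_samples) / len(gpu_util_samples)
    return avg < threshold   # GPU starved for work => CPU-limited

print(likely_cpu_bound([62, 70, 65, 68]))   # True  (like the 5800X3D run)
print(likely_cpu_bound([98, 97, 99, 96]))   # False (GPU-limited)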

 
  • Like
Reactions: Henry swagger

Thunder 57

Platinum Member
Aug 19, 2007
2,679
3,806
136
I don't know. The performance gap isn't very large to begin with, nothing like what I listed in my post. […]


Who uses random YouTubers as references when comparing anything? Is this really where you're going with this?

Going back to Spider-Man and that Reddit thread, someone made a joke there: if you wanted to know the best configuration to play the game on, it was a console. Crap ports have always existed. Give it a break.
 
  • Like
Reactions: Mopetar and Kaluan

Thunder 57

Platinum Member
Aug 19, 2007
2,679
3,806
136
You're fairly intellectually lazy, I see, always wanting others to find evidence for you when you're quite capable of doing it yourself... or maybe not. […]

You're the one making the argument; why should I find data to back it up? If what you say is true, there should be coverage of it. Maybe everybody has missed that Zen 4 sucks at RT. I'll check this stuff out tomorrow.
 

Rigg

Senior member
May 6, 2020
472
976
106
They tested Cyberpunk with RT, Far Cry 6 with RT, and Spider-Man Remastered with RT on an RTX 3090 Ti at 720p, and the 13900K was nearly 30% faster than the 7950X in CBP and Spider-Man, and 17% faster in Far Cry 6. […]
13900K vs. 7950X: Spider-Man is 23% faster; Spider-Man with RT is 28%.

This is at 720p with a 3090 Ti at default JEDEC memory settings. Somewhat interesting, sure, though not really relevant to a reasonable real-world gaming scenario. Saying it's 30% faster (28% in reality) with RT on, without pointing out that it's already 23% faster with RT off, is misleading at best and dishonest at worst.
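To make the "misleading" point concrete, you can separate the RT-specific deficit from the baseline CPU gap by taking the ratio of the two ratios (numbers from the ComputerBase results above):

Code:
# How much of the RT-on gap is actually attributable to RT?
raster_gap = 1.23   # 13900K over 7950X, Spider-Man, RT off
rt_gap     = 1.28   # 13900K over 7950X, Spider-Man, RT on

rt_specific = rt_gap / raster_gap - 1
print(f"{rt_specific:.1%}")   # ~4.1% -- the part RT itself adds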

They don't have data for Far Cry without RT, so we can't draw any solid conclusions. Same story with Cyberpunk. HUB has a 10% gap in Far Cry (no RT) at 1080p with a 4090 and reasonable memory speeds, which is a mildly compelling argument that this is another misleading, cherry-picked example.

I don't know what to make of Cyberpunk. It's both CPU- and GPU-intensive. According to HUB, the 13900K is 10% faster at 1080p and 4% slower at 1440p than the 7950X with no RT.

While there does seem to be some indication that turning on RT in some of these games seriously hammers memory bandwidth, we don't have nearly enough adequate testing data to know what to make of any of this, especially as it pertains to relevant, real-world gaming scenarios.

Your BVH hypothesis is kind of interesting and could well prove to have legs. You don't need to bring it up in every other post you make, though. You decided that Intel's memory speed advantage was a huge deal months before Raptor even launched, and now you're hyper-focused on pointing out any edge case where it might bear fruit. The problem is that you're drawing questionable conclusions from incomplete data (including websites/YouTubers many of us have never heard of) and using it as evidence to support your argument. Repeatedly posting the same argument ad nauseam based on shaky evidence doesn't make it more convincing.
 
Last edited:

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Here is The Next Platform's article on the Instinct MI300 announcement. They are a little confused about what the package rendering in the AMD slides shows, though.

"AMD says that there are nine 5 nanometer chiplets and four 6 nanometer chiplets on the MI300A package, with HBM3 memory surrounding it. [...] That sure looks like six GPU chiplets, plus two CPU chiplets, plus an I/O die chiplet on the top, with four underlying chiplets that link two banks of HBM3 memory to the complex at eight different points and to each other. That would mean AMD re-implemented the I/O and memory die in 5 nanometer processes, rather than the 6 nanometer process used in the I/O and memory die in the Genoa Epyc 9004 complex. We strongly suspect that there is Infinity Cache implemented on those four 6 nanometer connecting chiplets, but nothing was said about that. It does not look like there is 3D V-Cache on the CPU cores in the MI300A package."

AMD Teases Details On Future MI300 Hybrid Compute Engines (nextplatform.com)

amd-instinct-mi300-render-zoom.jpg

The rendering doesn't quite match the package photo, either. The HBM is arranged a little differently on the real chip.

My speculation is that the MI300 rendering shows 6 APU chiplets, each with 4 Zen 4 cores and 40 CDNA 3 units. That adds up to the revealed number of Zen 4 cores (24). Then I guess there are 2 adaptive chiplets (FPGA, DSP) and a massive AI chiplet (Xilinx AI Engine). That adds up to the 9 chiplets on N5, as announced, which, as revealed, sit on top of 4 base chiplets made on N6, which I presume hold the L3 cache, I/O and network-on-chip. Reportedly, the dice between the HBM stacks are just structural silicon. However, that seems odd to me (why two, then, and why the odd size?). Could they be memory controllers, perhaps with encryption, compression and a memory-side cache?
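A quick tally of that guess against the two figures AMD has actually confirmed (24 Zen 4 cores; nine N5 plus four N6 chiplets); everything else remains pure speculation:

Code:
# Sanity-checking the speculated MI300 layout above.
apu_chiplets      = 6   # speculated: 4 Zen 4 cores + 40 CDNA 3 units each
adaptive_chiplets = 2   # speculated: FPGA/DSP
ai_chiplets       = 1   # speculated: Xilinx AI Engine
cores_per_apu     = 4

print(apu_chiplets * cores_per_apu)                    # 24 -> matches confirmed cores
print(apu_chiplets + adaptive_chiplets + ai_chiplets)  # 9  -> matches confirmed N5 count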

Instinct MI300 (chiplet speculation).png
 
Last edited:

biostud

Lifer
Feb 27, 2003
18,253
4,771
136
That's interesting; then it may be better to get the 7800X3D. Clocks that much higher would actually help in old games that are bottlenecked by a single thread, though.

It also seems the B650E boards are nearly identical to the X670E ones but quite a bit cheaper, so I might go for one of them.
I would think that older games would run perfectly well on either CPU.

The X670E chipset is two B650E chipsets in tandem, so unless you need the extra connectivity, there is absolutely no reason to choose X670E over B650E.
 

Timorous

Golden Member
Oct 27, 2008
1,635
2,828
136
They tested Cyberpunk with RT, Far Cry 6 with RT, and Spider-Man Remastered with RT on an RTX 3090 Ti at 720p, and the 13900K was nearly 30% faster than the 7950X in CBP and Spider-Man, and 17% faster in Far Cry 6. […]

Look at the delta in Doom Eternal between a 6950 XT and a 3090 Ti at the CPU limit.


About 95 FPS, going from 520 to 615 with a 7700X.

Old 3000-series drivers have odd behaviour with Zen 4 at CPU-limited settings.

To be honest, any test not using a 4090 at this point is essentially worthless.
 
  • Like
Reactions: inf64 and Kaluan

BorisTheBlade82

Senior member
May 1, 2020
664
1,015
136
Here is The Next Platform's article on the Instinct MI300 announcement. […]

My speculation is that the MI300 rendering shows 6 APU chiplets, each with 4 Zen 4 cores and 40 CDNA 3 units. […]
Your speculation about the different chiplets is the most reasonable I have seen so far (including my own 😉).
MI300 is likely the most interesting 2023 product from a technological point of view.
 

BorisTheBlade82

Senior member
May 1, 2020
664
1,015
136
@Hitman928 & @nicalandia
I remember that speculation on Twitter, as I was actually a part of it.
This could make sense, as otherwise Zen 4c would only have half the bandwidth per core compared to Zen 4. As each IOD has 12 ports, they could be connected to each other by four ports (128 GB/s, IIRC). That could still be a bit narrow, and it might not fit on the package physically. So, just wild speculation.
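Back-of-envelope from those numbers; the 16-core Zen 4c CCD is the rumoured configuration, and the per-port bandwidth is just derived from the IIRC figure above:

Code:
# Per-core link bandwidth if a 16-core Zen 4c CCD hangs off the same
# single IOD port as an 8-core Zen 4 CCD. All figures speculative.
port_bw = 128 / 4     # GB/s per port, from "four ports ~ 128 GB/s IIRC"
print(port_bw / 8)    # 4.0 GB/s per core on a Zen 4 CCD
print(port_bw / 16)   # 2.0 GB/s per core on a Zen 4c CCD -- half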
 
  • Like
Reactions: nicalandia