Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
809
1,412
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging looks to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
Since when does “K” mean gaming oriented? All “K” means is that they are unlocked. Quite often, “K” CPUs are binned better and have better clocks than their “non-K” equivalents.

When the predominant customer base for "K" SKUs is gamers, I took the liberty of calling it a gaming-oriented CPU.

Also, I am going to call you out on calling the 5950X “sub-optimal” for gaming. Out of the 5 games I play regularly, only 1 uses less than 8 threads. The rest use > 8 threads. This isn’t 5 years ago, games are catching up with hardware.

There is a difference between "uses" and "is constrained by". I have one recent game that uses all cores, but runs them at single-digit utilization. There would likely be no difference in this game if my CPU had only 1-2 cores.

OTOH, out of the roughly 100 games I have played over the years, I have 1 tested game that is perfectly threaded, where all cores run at full utilization (when I fast-forward the game): Sim Airport. There is one more candidate that I have not tested. So maybe up to a 2:98 ratio.

So I am not denying these games exist, but the games that would overwhelm a 5800X and need more cores are a small minority.
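The "uses" vs. "is constrained by" distinction can be made concrete with a quick look at a per-core utilization sample. A minimal sketch: the function name, the 90% busy threshold, and the idea of classifying on a single sample are my own simplifications, not any standard tool.

```python
def is_cpu_constrained(per_core_pct, busy=90.0):
    """Classify a per-core utilization sample (one percentage per
    logical CPU) as CPU-constrained or merely "using" many cores.

    A game can touch every core yet run each at single-digit load;
    it is only CPU-constrained if at least one core is pinned near
    100% (single-thread bound) or the average load is near the busy
    threshold (well-threaded and saturated).
    """
    avg = sum(per_core_pct) / len(per_core_pct)
    return max(per_core_pct) >= busy or avg >= busy

# "Uses all cores" at single-digit utilization: not constrained.
print(is_cpu_constrained([8, 6, 9, 7, 5, 8, 6, 7]))     # False
# One core pegged (classic single-thread bottleneck): constrained.
print(is_cpu_constrained([99, 12, 8, 10, 7, 9, 6, 8]))  # True
# A sample like this could come from e.g.
# psutil.cpu_percent(interval=1, percpu=True)
```

By this rule, a game at single-digit load on every core "uses" 16 cores but would run the same on far fewer.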
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
When the predominant customer base for "K" SKUs is gamers, I took the liberty of calling it a gaming-oriented CPU.



There is a difference between "uses" and "is constrained by". I have one recent game that uses all cores, but runs them at single-digit utilization. There would likely be no difference in this game if my CPU had only 1-2 cores.

OTOH, out of the roughly 100 games I have played over the years, I have 1 tested game that is perfectly threaded, where all cores run at full utilization (when I fast-forward the game): Sim Airport. There is one more candidate that I have not tested. So maybe up to a 2:98 ratio.

So I am not denying these games exist, but the games that would overwhelm a 5800X and need more cores are a small minority.
You ignored my posts calling you out for your inaccuracies. Want to address those? Please find me a single reproducible workload where my (or anyone else's) 5950x is crippled due to power limits (i.e. turns in worse scores/frame rate) vs. a 1 CCD 5950x, let alone a 5800x. I'll wait. I almost wanted to go down the rabbit hole of "make sure it is an actual workload that people use and it isn't malware", but I'll exclude that for now (though if you can't prove it is used or I suspect it contains malware, I'm not going to run it). Sure, it won't exhibit 100% scaling, but core-to-core/CCD to CCD latency can be a factor among a million other things.

For the record, the definition here is "Does not perform as well as the 5800X". The software must be usable on the latest version of Windows 10 without issue and/or Windows 11 (I'm on Windows 11). If it is a game, it must use DirectX 12 and/or Vulkan. (Very few games, if any, don't support either of those APIs), though if there is a recent game that only works on DX 11 and I own it, I'm willing to test it. (A single core game is not going to bottleneck a 5950x, if it runs slower on a 5950x it likely has other bottlenecks related to things outside the CPU. I am a developer and I'm willing to debate this.)
 

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
The problem with AnandTech's testing is that they test FPS on a turn-based game. <facepalm>

Civilization is still heavily single-thread CPU constrained when processing turns, but this sort of test would need the reviewer to play and understand the game, to know what is important to test and how to set up a scenario.

I found the issue with AnandTech's "review". Civilization 6 is still GPU bound at 1080p max. My benchmark figures:

Average: 223.26 fps
99th Percentile: 154.53fps

GPU usage was still 100%, CPU usage was 40%. AnandTech is GPU bound. Why did they get a better result on lower end CPUs? Probably less heat in the system so the GPU could run faster. Who knows. My CPU usage was around 30%.

My hypothesis is that the 2 CCDs are just stepping on each other.

EDIT: If you claim the 5950x is power limited, keep in mind it is easy for those of us with one to prove you wrong. It is trivial to disable a CCD.

I don't think they are power limited. It would be interesting to see in how many scenarios disabling 1 CCD improves performance.

EDIT: Because I'm feeling extra crunchy today and need to get to bed. With 1 CCD enabled (basically a 5800x with the faster clocks you mentioned):

Average: 184.03 fps
99th Percentile: 128.80fps

Nice! Thanks for running the test.

But you are getting different results than AnandTech got with an actual 5800X, so perhaps there is still something different about a system with 1 CCD disabled in the BIOS vs. an actual 5800X.

Looking at another set of reviews, the AnandTech Rocket Lake review, which has the 5600X, 5800X and 5900X: the 5900X loses more than half the tests despite a 100-200 MHz clock advantage, 4-6 extra cores and a 40 W power advantage over the 5600X. At equal clock speed and wattage, the 5900X would probably lose 9 out of 10.
Gaming Tests: Deus Ex Mankind Divided - Intel Rocket Lake (14nm) Review: Core i9-11900K, Core i7-11700K, and Core i5-11600K (anandtech.com)

AMD is just not sending us their best
(SKU configurations for gaming)
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
The problem with AnandTech's testing is that they test FPS on a turn-based game. <facepalm>

Civilization is still heavily single-thread CPU constrained when processing turns, but this sort of test would need the reviewer to play and understand the game, to know what is important to test and how to set up a scenario.



My hypothesis is that the 2 CCDs are just stepping on each other.



I don't think they are power limited. It would be interesting to see in how many scenarios disabling 1 CCD improves performance.



Nice! Thanks for running the test.

But you are getting different results than AnandTech got with an actual 5800X, so perhaps there is still something different about a system with 1 CCD disabled in the BIOS vs. an actual 5800X.

Looking at another set of reviews, the AnandTech Rocket Lake review, which has the 5600X, 5800X and 5900X: the 5900X loses more than half the tests despite a 100-200 MHz clock advantage, 4-6 extra cores and a 40 W power advantage over the 5600X. At equal clock speed and wattage, the 5900X would probably lose 9 out of 10.
Gaming Tests: Deus Ex Mankind Divided - Intel Rocket Lake (14nm) Review: Core i9-11900K, Core i7-11700K, and Core i5-11600K (anandtech.com)

AMD is just not sending us their best
(SKU configurations for gaming)

Unless AnandTech's tests are reproducible, I have zero confidence in their tests. Some of their tests HAVE been reproducible, and my stock hardware or stock hardware I've been able to use has turned in different results from theirs.

Geekbench, for instance. In a closed case with a similar cooler, I get better results.

With JEDEC-specced DDR4-3200 memory and stock settings on a 5950X, I turn in 20%-30% better results on Geekbench, Cinebench, and a few others. This is with stock BIOS settings and, in my case, with 2 different motherboards from 2 different vendors (ASRock and Gigabyte). So do many other folks. I love AnandTech and I know Ian knows his stuff, but some of the benchmarks they've posted have been shown to be flawed, and they almost NEVER follow up on the feedback. Once again... where is your proof? I can provide mine, but you are the one making extraordinary claims. Extraordinary claims need extraordinary evidence.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
You ignored my posts calling you out for your inaccuracies. Want to address those? Please find me a single reproducible workload where my (or anyone else's) 5950x is crippled due to power limits (i.e. turns in worse scores/frame rate) vs. a 1 CCD 5950x, let alone a 5800x. I'll wait.

No, I never said that if the 5950X loses (in ST gaming tests), it would be because of a power limit. That's not what I believe is going on.

Quite the opposite: I think the 5950X is most likely free to run at its max boost clock in games (when called for), which means the 5950X has a 200-300 MHz advantage over the 5800X or 5600X (which I did say).

My theory for why the 5900X and 5950X lose so often in games is that having 2 CCDs comes with some overhead.

Sure, it won't exhibit 100% scaling, but core-to-core/CCD to CCD latency can be a factor among a million other things.

That's the biggest difference, so I would call it the #1 reason; the million other reasons are far less likely.

For the record, the definition here is "Does not perform as well as the 5800X". The software must be usable on the latest version of Windows 10 without issue and/or Windows 11 (I'm on Windows 11). If it is a game, it must use DirectX 12 and/or Vulkan. (Very few games, if any, don't support either of those APIs), though if there is a recent game that only works on DX 11 and I own it, I'm willing to test it. (A single core game is not going to bottleneck a 5950x, if it runs slower on a 5950x it likely has other bottlenecks related to things outside the CPU. I am a developer and I'm willing to debate this.)

What I am wondering (among other things) is how often Windows switches a task that has been somewhat idle to a completely different core, and perhaps a completely different CCD.

That could be another reason: if a task that has been somewhat idle comes back to life on a different CCD, most of its data may still be in the L3 of the original CCD.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
Unless AnandTech's tests are reproducible, I have zero confidence in their tests. Some of their tests HAVE been reproducible, and my stock hardware or stock hardware I've been able to use has turned in different results from theirs.

I would not expect the same results, but I would expect that 5950X > 5900X > 5800X > 5600X, which is rarely the case in the 3 sets of AnandTech reviews I have looked at since the Zen 3 launch.

I was drawing my conclusions mainly from AnandTech tests.

So just out of curiosity, I looked at Tom's review here; it is mostly consistent with clock-speed scaling:
Core i9-11900K and Core i5-11600K Gaming Benchmarks - Intel Core i9-11900K and Core i5-11600K Review: Rocket Lake Blasts Off | Tom's Hardware (tomshardware.com)

And the Gamers Nexus video here seems the most consistent in Zen 3 performance scaling:
Waste of Sand: Intel Core i7-11700K CPU Review & Benchmarks vs. AMD 5800X, 5900X, More - YouTube

PC Magazine is similar to AnandTech: the 5800X mostly beats, sometimes trounces, the 5950X in gaming:
Intel Core i9-11900K Review | PCMag

In this one, the 5800X leads the 5950X about half the time:
Intel 11th Gen 'Rocket Lake' Review - Games: Far Cry New Dawn - Tweakers
 

Timorous

Golden Member
Oct 27, 2008
1,748
3,240
136
I never said that if the 5950X loses (in ST gaming tests), it would be because of a power limit. That's not what I believe is going on.

Civ 6 is not the game to pick, though. If you get 30 fps, that is plenty. What matters is turn time, and there the 5800X does fall behind the 5900X and 5950X.
 
  • Like
Reactions: Joe NYC

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
This is an odd discussion since nobody can optimize for "gaming". Gaming is purely a PR target for consumer products. The (closed source) software environment is a mess of non-optimized parts, and gaming software performance is often defined by the most inane bottlenecks. Aside from simply pushing up frequency, which is always the easy solution to everything, the best approach to improving "gaming" is to improve performance for all possible workloads.

AMD claimed to have done just that between the first and current Epyc gen:
bild_2021-08-08_14113y2knt.png


Why do I quote an Epyc slide? Because that's AMD's focus:
  • All efforts first go into the server market. This defines the fundamental Zen design.
  • The second focus now is the mobile market. There, improvements in power management are tested.
  • The desktop market gets scraps from the previous two. PR uses "gaming" to sell the overall performance improvements to consumers. (See also "game cache".)
I doubt AMD will change anything about this approach. Selling Zen 3D as a superior "gaming" platform to the consumer public tells me AMD is perfectly content with this modus operandi.
 

Thibsie

Senior member
Apr 25, 2017
865
973
136
2 * 5800x = 2 * 449 = $898
1 x 5950x = $799

2 of the 5950X's CCDs have to perform in the same 105 W envelope as the 5800X's single CCD.

AMD is wasting 2 of the best-binning dies just to get decent ST performance in a CPU that is sub-optimal for ST.

Let me recap that: 2 best-binning dies just to get decent ST performance.

One of the best-binning dies is sitting idle, slowing down the other, but the one die that is doing work is so highly binned that it still does a decent job running ST tasks.

AMD doesn't want to sell 2x 5800x, they want to sell 2x 5950x (or at least 5900x).
The pricing pretty much reflects that IMO.
 
  • Like
Reactions: Joe NYC

Vattila

Senior member
Oct 22, 2004
809
1,412
136
A 5800X configuration with a single CCD would be better suited for gaming, if it had the best-binning parts and the highest boost clock.

I think you are overly concerned about inter-CCX latency. For poorly written software, corner cases and legacy applications where it has a big effect, the user can mitigate it by disabling all but one CCX, or by forcing the affinity of the process to cores on a single CCX (through the use of system tools).
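As a sketch of that affinity workaround on Linux: the helper below assumes the common enumeration for a stock 5950X, where logical CPUs 0-15 are the physical cores laid out CCD by CCD and core n's SMT sibling is logical CPU n + 16. The helper name and the layout assumption are mine, not an AMD-documented API; verify the real topology under /sys before relying on it.

```python
import os

def ccd_logical_cpus(ccd, cores_per_ccd=8, total_cores=16):
    """Logical CPU ids of one CCD, assuming the Linux enumeration
    convention for a stock 5950X: CPUs 0..total_cores-1 are the
    physical cores (CCD-contiguous) and core n's SMT sibling is
    logical CPU n + total_cores. Check
    /sys/devices/system/cpu/cpu*/topology to confirm the mapping."""
    first = ccd * cores_per_ccd
    cores = list(range(first, first + cores_per_ccd))
    return cores + [c + total_cores for c in cores]

# CCD0 of a 5950X -> cores 0-7 plus SMT siblings 16-23.
print(ccd_logical_cpus(0))

# Pin the current process (e.g. a game you are about to launch) to
# CCD0, so the scheduler cannot migrate it across the CCD boundary:
if hasattr(os, "sched_setaffinity"):                 # Linux only
    target = set(ccd_logical_cpus(0)) & set(range(os.cpu_count()))
    if target:
        os.sched_setaffinity(0, target)
```

On Windows, the rough equivalent is Task Manager's "Set affinity" dialog or `start /affinity <hexmask>`.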

Personally, I am a developer, and like others here who are developers or content creators, I like that the top SKU makes "no compromises". Currently, the only mainstream SKU that I somewhat covet is the 5950X, and I am very interested to see what the new Threadripper series can do. I like the idea of getting lots of cores without compromising single-thread performance. So I hope that focus continues at AMD.

There is also the idea of optimising for tomorrow, not yesterday. Like AMD CEO Lisa Su likes to say: "we need to always push the boundaries". By attracting gamers to SKUs with more cores, they are moving the technological progress along, making it possible for developers to depend on higher-performance hardware in future software. It is obvious that a game written to fully exploit a 16-core 5950X can achieve more than it can with an 8-core 5800X. Developers will just need to learn to deal with many cores, multiple CCXs and heterogeneous accelerators (GPU compute, FPGA, AI). That's where we want to be moving.

All that said, the N7 CCDs have been in production at TSMC for a very long time, and the yield and binning must be pretty good by now, so there may be room for increasing the boost frequency on the SKUs that lie further down the product stack as well. So I expect a "Zen 3" refresh with V-Cache and slightly better frequencies across the product range.

PS. I don't think you need to worry that AMD is not aware of the competition (e.g. "Alder Lake"). Like you, I think they care a lot about leading in the gaming space. Gaming is part of high-performance compute (AMD's core business), and a lot of their customers, products and revenue are in the gaming industry. But, of course, they will do what makes most business sense within their capabilities and schedule. We can only hope it will be good!
 
Last edited:

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
Civ 6 is not the game to pick, though. If you get 30 fps, that is plenty. What matters is turn time, and there the 5800X does fall behind the 5900X and 5950X.

Yes, when sifting through results, I found some websites that do in fact test the turn times (Gamers Nexus), and the 5950X had the highest score. The margin was in proportion to boost clock, not core count.

Hard to share a link, because it was somewhere in the middle of a video review.

The way to interpret it: if the game runs faster on the 5950X (not slower, as seems to happen randomly and quite frequently), and the margin is proportional to the clock difference (not the core count), then the second CCD is wasted on that game.
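That interpretation reduces to a back-of-envelope check. A sketch with illustrative numbers: 4.9 GHz and 4.7 GHz are the official single-core boost clocks of the 5950X and 5800X, while the FPS values and the 2% tolerance are made up for the example.

```python
def margin_explained_by_clock(fps_fast, fps_slow, clk_fast, clk_slow, tol=0.02):
    """True if the faster chip's FPS lead is roughly proportional to its
    boost-clock lead, i.e. the extra CCD/cores contributed nothing."""
    return abs(fps_fast / fps_slow - clk_fast / clk_slow) <= tol

# 5950X (4.9 GHz boost) vs 5800X (4.7 GHz boost) is a ~4% clock edge;
# a ~4% FPS edge means the second CCD was wasted on that game.
print(margin_explained_by_clock(208.5, 200.0, 4.9, 4.7))  # True
# A 20% FPS edge cannot be explained by clocks alone.
print(margin_explained_by_clock(240.0, 200.0, 4.9, 4.7))  # False
```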
 
  • Like
Reactions: Tlh97 and Vattila

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
Are you sure about that? AMD usually uses one good die and one bad die with their 2xCCD CPUs since the second CCD rarely ever sees max boost clocks.

Interesting idea.

The slight problem is that if you have 2 highly utilized cores, you would want them on different CCDs, to take full advantage of both L3 caches and of thermals.

And the 16 cores still need to adhere to a minimum all-core frequency within the power limit...
 

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
This is an odd discussion since nobody can optimize for "gaming". Gaming is purely a PR target for consumer products. The (closed source) software environment is a mess of non-optimized parts, and gaming software performance is often defined by the most inane bottlenecks. Aside from simply pushing up frequency, which is always the easy solution to everything, the best approach to improving "gaming" is to improve performance for all possible workloads.

AMD claimed to have done just that between the first and current Epyc gen:
bild_2021-08-08_14113y2knt.png


Why do I quote an Epyc slide? Because that's AMD's focus:
  • All efforts first go into the server market. This defines the fundamental Zen design.
  • The second focus now is the mobile market. There, improvements in power management are tested.
  • The desktop market gets scraps from the previous two. PR uses "gaming" to sell the overall performance improvements to consumers. (See also "game cache".)
I doubt AMD will change anything about this approach. Selling Zen 3D as a superior "gaming" platform to the consumer public tells me AMD is perfectly content with this modus operandi.

I am not talking about optimizing the cores for gaming. The core is what it is, and luckily for gamers, the profile of what makes "enterprise" apps perform well is not too different from gaming.

What I AM talking about is optimizing SKUs using 2 variables:
- binning
- core count (CCD count).

It is a very easy equation to solve: you get the most cost-effective high gaming performance from a high-binning part with a single CCD.

In other words, SKU that AMD did not release.

In the not-too-distant future, a 3rd variable, 3D-stacked L3, will enter the equation.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
AMD doesn't want to sell 2x 5800x, they want to sell 2x 5950x (or at least 5900x).
The pricing pretty much reflects that IMO.

Well, AMD is able to sell a lot of 5800Xs despite it arguably being the worst deal for customers (and the best deal for AMD) in the 5000-series line of processors.

It outsells the 5950X roughly 8:1:

1628455557382.png

TechEpiphany on Twitter: "🔥 CPU Sales Week 30 (mindfactory) 🔥 AMD: 2575 units sold, 76.98%, ASP: 320.84 Intel: 770, 23.02%, ASP: 233.89 AMD Revenue: 826'175, 82.1% Intel Revenue: 180'092, 17.9% Please share 🙏This is a product of weeks of work #AMD #Intel #AMDRyzen https://t.co/3C0LbV2GTK" / Twitter
 

Thibsie

Senior member
Apr 25, 2017
865
973
136

Of course they are. You don't even try to answer my point.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,539
3,471
106
Of course they are. You don't even try to answer my point.

Actually, I did not understand your point.

Because if CCDs are the limiting factor (AMD's ability to obtain a sufficient number from TSMC), you can't sell 2x 5900X or 2x 5950X instead of 2x 5800X.

Unless you meant that the substrate is the limit, which you did not say.
 

LightningZ71

Golden Member
Mar 10, 2017
1,798
2,156
136
This isn't really supporting your argument very well. Given what we are seeing with availability, they are selling every chip they make, and what you are seeing is, roughly, what they are sending to the market.
 

DrMrLordX

Lifer
Apr 27, 2000
22,065
11,693
136
Interesting idea.

Slight problem is that if you have 2 cores highly utilized, you would want them to be on different CCDs, to fully take advantage of L3s and thermals.

And the 16 cores still need to adhere to minimum all core frequency within the power limit...

AMD already does this. It doesn't really pose a problem for them, especially since "bad" CCDs often have only 2-4 cores that put the CCD into a bad bin position. The minimum all-core frequencies are quite low compared to max boost.
 

Vattila

Senior member
Oct 22, 2004
809
1,412
136
My problem is that AMD is NOT selling a version of the 5800X, say a 5810X with a 4.9 GHz or higher boost clock.

Just buy a 5950X, disable the lowest-boosting CCX, and slap a home-made "5810X inside" sticker on your DIY PC. ;)

Note that by reserving the highest binning CCDs for the 5950X low-volume flagship, AMD could push frequency to the max, finding the optimal frequency where the limited supply of parts binning at said high frequency was sufficient to satisfy the target sales at the corresponding price point. They then set the 8-core volume parts at a frequency that was binning well.

PS. On V-Cache — if AMD really needs 2, 3 or 4 layers of V-Cache to compete with "Alder Lake", it would of course be nice if they could launch around the same time. Assuming multi-layer V-Cache is in the works, then like you, I see no technical barriers. The CCDs with V-Cache should be usable across server, HEDT/workstation and desktop SKUs (barring any Z-height issues). That said, one layer of V-Cache (i.e. triple the amount of current L3!) plus a little higher boost frequency may be enough to match or beat "Alder Lake" (especially at the same power consumption, rumours suggest). We'll see!
 

Thunder 57

Platinum Member
Aug 19, 2007
2,993
4,570
136
All I really hope for with Zen4 is that they branch out from the limitations of SMT to a more general multithreading, which would allow a mode of operation where one thread is given priority over all other threads (which would be 1 other thread, or 3 other threads per core, depending on whether the core is upgraded to 4-way MT or stays 2-way MT).

Oh no, not SMT4 again. How would it know which thread to give priority to, though? Presumably it's possible, as Intel is counting on less demanding threads being sent to Atom cores.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
On V-Cache — if AMD really needs 2, 3 or 4 layers of V-Cache to compete with "Alder Lake"
For client computing:
Zen2 --> Zen3 brought a 9% increase in effective die area for an average 19% increase in IPC across all loads.
Zen3 --> Zen3D with 4 layers of V-Cache would mean a 2.8x increase in effective die area for a questionable gain in general-purpose compute outside of gaming.
Not to mention that 2.8x the die area plus packaging cost would mean almost 3x the production cost per CCD. N6/N7 might have gotten cheaper, but I doubt 3x cheaper.
Just plain lackluster engineering if its entire purpose is to defeat Alder Lake, along the same lines as NetBurst chasing MHz with no improvements, if not regressions, elsewhere.
And what about that projected 46% GM in the Earnings call?
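For what it's worth, the 2.8x figure is easy to reproduce from rough public die-size estimates (a sketch: the ~80.7 mm² Zen 3 CCD and ~36 mm² per 64 MB SRAM die are approximations I am assuming, not exact AMD numbers):

```python
ccd_area = 80.7     # Zen 3 CCD, mm^2 (approx. published figure)
vcache_die = 36.0   # one 64 MB V-Cache SRAM die, mm^2 (approx. 6 x 6 mm)
layers = 4

# "Effective" silicon per CCD once 4 SRAM layers are stacked on top.
effective = ccd_area + layers * vcache_die
ratio = effective / ccd_area
print(f"{ratio:.1f}x effective die area")  # 2.8x effective die area
```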

Zen3D was not developed to address gaming or as a response to Alder Lake. Ryzen with Zen3D is most likely simply some rejected dies from Milan-X, since the HPC/DC/server market can sustain those high costs and is finding excellent use for those huge caches.
Don't expect good availability.
 
Last edited:
  • Like
Reactions: Tlh97 and Vattila

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
All I really hope for with Zen4 is that they branch out from the limitations of SMT to a more general multithreading, which would allow a mode of operation where one thread is given priority over all other threads (which would be 1 other thread, or 3 other threads per core, depending on whether the core is upgraded to 4-way MT or stays 2-way MT).
Seems AMD has been thinking along those lines, with one thread having priority over resources in SMT mode. The threads in the core compete for the resources, one gets priority over the other, and the core resources are not evenly split.
Seems like a good idea, provided the algorithm for resource priority assignment and allocation works well.

20210096914
SOFT WATERMARKING IN THREAD SHARED RESOURCES IMPLEMENTED THROUGH THREAD MEDIATION

20210096920

SHARED RESOURCE ALLOCATION IN A MULTI-THREADED MICROPROCESSOR