Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) and new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Carfax83

Diamond Member
Nov 1, 2010
It isn't because Cinebench is a benchmark which spits out numbers based on a workload that nobody in the real world runs on a CPU.

Whether it's reflective of a real-world workload isn't the point. The point is that it's a good workload for isolating IPC differences between architectures, because it mostly ignores cache size and memory bandwidth/latency and focuses on the core.

On the flipside, that's also exactly why games are a horrible measure of IPC because they are highly affected by cache size and memory bandwidth/latency. SPEC is likewise highly influenced by cache size and memory bandwidth/latency.
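To illustrate with made-up numbers (none of these are real scores): for a core-bound workload like CB, dividing a single-thread score by the sustained clock gives a crude perf-per-clock proxy, which is what lets you compare cores running at different frequencies.

```python
# Crude perf-per-clock ("IPC proxy") comparison from single-thread scores.
# Only meaningful for core-bound workloads; every number here is made up.

def perf_per_clock(score, clock_ghz):
    return score / clock_ghz

a = perf_per_clock(score=1600, clock_ghz=4.9)  # hypothetical CPU A
b = perf_per_clock(score=1500, clock_ghz=5.2)  # hypothetical CPU B

# A ratio >1.0 means CPU A does more work per cycle despite its lower clock.
print(f"A vs B perf/clock ratio: {a / b:.2f}")  # ~1.13
```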
 

coercitiv

Diamond Member
Jan 24, 2014
I get that
I don't think you do. No significant number of people in the Intel thread challenged the good numbers Alder Lake was getting after release at 125W TDP, or even in tests posted by me at 75W, 45W, heck even 25-35W. The entire discourse around shaming Alder Lake was about the insane stock power configuration of the 12900K.

Meanwhile, as @majord already mentioned, the more vocal members in the Intel thread (with neutral or even pro-Intel stance) were FAR more interested in efficiency cores and their bright future than anything Cove related, to the point where they saw the Cove design being eventually replaced by an evolved Mont. The chorus was so narrowly focused on this mission that they even downplayed Golden Cove perf/watt numbers. Multi-threaded performance was the new GOD, with members "explaining" to us how efficiency cores are required for multitasking, as if 8 Golden Cove cores are just peeled potatoes.

You came to the Intel thread, attempted to present Golden Cove in a somewhat distorted way by focusing on the wins and ignoring the fails, and surprise-surprise, people reacted. The fact that you're more willing to use CB over SPEC as a predictor of real-world performance is a red flag in itself.

On the flipside, that's also exactly why games are a horrible measure of IPC because they are highly affected by cache size and memory bandwidth/latency.
Games are an important workload in the consumer world. How do you reconcile that with promoting CB as a predictor of commercial application performance in consumer devices?
 

Carfax83

Diamond Member
Nov 1, 2010
I don't think you do. No significant number of people in the Intel thread challenged the good numbers Alder Lake was getting after release at 125W TDP, or even in tests posted by me at 75W, 45W, heck even 25-35W. The entire discourse around shaming Alder Lake was about the insane stock power configuration of the 12900K.

I remember those discussions, and my general sentiment was that the extremely high power consumption of the 12900K was because Intel wanted to beat AMD in multithreaded applications, so they pushed the chip as hard as possible. It was a purely marketing-inspired ploy, as most of us agreed.

To me personally this was acceptable, because the vast majority of people were not going to be doing rendering on these CPUs anyway. The 5950X was, and still is, a better CPU for heavy tasks like rendering, and the benchmarks reflected that: larger renders ran significantly faster and with less power on the 5950X. Only on smaller renders was the 12900K competitive.

Meanwhile, as @majord already mentioned, the more vocal members in the Intel thread (with neutral or even pro-Intel stance) were FAR more interested in efficiency cores and their bright future than anything Cove related, to the point where they saw the Cove design being eventually replaced by an evolved Mont. The chorus was so narrowly focused on this mission that they even downplayed Golden Cove perf/watt numbers. Multi-threaded performance was the new GOD, with members "explaining" to us how efficiency cores are required for multitasking, as if 8 Golden Cove cores are just peeled potatoes.

While I personally was convinced of the viability of efficiency cores to dramatically increase performance per watt in multithreaded applications, I wasn't so convinced that they could completely replace the big cores, and I don't think I was alone in that. From what I recall, most people supported using efficiency-type cores in servers and other specialized workflows. For desktop and workstation use, though, big cores will probably always be a necessity.

You came to the Intel thread, attempted to present Golden Cove in a somewhat distorted way by focusing on the wins and ignoring the fails, and surprise-surprise, people reacted. The fact that you're more willing to use CB over SPEC as a predictor of real-world performance is a red flag in itself.

Lots of people who are far more knowledgeable than myself and actually work in STEM and computing have had many problems with Spec. I got that impression from them specifically, on this forum and others like Realworldtech.

As for CB, I can't really say that it's a good predictor of overall real-world performance, just that it's a good way to isolate core IPC. IPC is only one aspect of a CPU's performance, and it's really difficult to measure, it seems.

Games are an important workload in the consumer world. How do you reconcile that with promoting CB as a predictor of commercial application performance in consumer devices?

Again, I'm not saying that CB is a better predictor of actual real-world performance, because real-world performance depends on many factors other than IPC. So let me clarify: I think CB is a better predictor of IPC than SPEC because of how it isolates core performance.

And since core performance is a good indicator of real world performance, CB is to me quite useful. Case in point, Spec showed hardly any performance difference between the 12900K and the 5950x in compiling and encoding, while real world applications had much greater differences in favor of the 12900K.

As for games, cache size and memory latency/bandwidth can have huge effects on performance, more so than IPC as the 5800x3D demonstrates. I mean, it has the exact same IPC as the 5800x but performs substantially better due to not having to access system memory as much.
 

JoeRambo

Golden Member
Jun 13, 2013
CBs and Blenders are irrelevant to the real world, but GBs and SPECs have plenty of BS as well. Like the strange focus on crypto instructions boosting the score when the actual core is not so good. Or compiler games, where versions and flags can make a massive difference in score. And some subtests look like they get completely beaten by new cores, inflating the results. There is even more danger with the inclusion of AI-type "instructions" and tests nowadays.
Out of the SPEC suite I think I'd only look at the GCC results, as they are hard to inflate without an actually good core and a proper memory subsystem. Over the years I've seen this opinion voiced by, for example, Linus Torvalds on the RWT forums, among others.
GB5 without seeing the subtests and comparing side by side is useless as well.

What I value in benchmarks lately is a combination of:

1) CB15 and CB23 scores, to see the "peak" potential of a core to execute instructions when provided with an "ideal" memory subsystem.
2) Browser tests like Speedometer 2.0, Octane, etc., as they seem to generate a brutally varied workload spanning disciplines like compilation, garbage collection and branch-prediction-unfriendly code, which correlates very well with what most people do when they don't work, game or watch movies on their desktop machines.
3) Gaming benchmarks from sites that present each CPU with a peak-tuned memory subsystem, as that is what is relevant to me. The 5800X3D was an interesting case that performed wonderfully without any memory tuning, and that is always a good thing. Too bad that won't apply to the initial release of Zen 4.
4) Some benchmarks I run myself, like fixed workloads from work that are easy to test each system with. (A toy sketch of combining results like these into one number follows below.)
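Since these run on completely different scales, here is one toy way to combine them into a single figure; the category names and scores are hypothetical placeholders, not measurements. Normalizing each result to a baseline system and then taking the geometric mean keeps any single benchmark's absolute scale from dominating the composite.

```python
# Toy composite of heterogeneous benchmark results via geometric mean.
# Category names and scores are hypothetical placeholders.
from math import prod

def composite(candidate, baseline):
    # Normalize to the baseline first, then geomean the ratios.
    ratios = [candidate[k] / baseline[k] for k in baseline]
    return prod(ratios) ** (1 / len(ratios))

baseline  = {"cb23_st": 1500, "speedometer": 250, "game_avg_fps": 140}  # system X
candidate = {"cb23_st": 1650, "speedometer": 290, "game_avg_fps": 150}  # system Y

# ~1.11 => system Y is ~11% faster on this (made-up) composite.
print(f"Y vs X composite: {composite(candidate, baseline):.2f}")
```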
 

biostud

Lifer
Feb 27, 2003
Standardized benchmarks are often used in ways that don't make any sense for real-world scenarios. For me, mostly a gamer, I really don't need to look at anything other than gaming benchmarks, and if you do professional work, you probably have a relatively narrow suite of software whose performance you have to check on different platforms. And if you do both, combine those results.
 

coercitiv

Diamond Member
Jan 24, 2014
Lots of people who are far more knowledgeable than myself and actually work in STEM and computing have had many problems with Spec. I got that impression from them specifically, on this forum and others like Realworldtech.
What do people working in STEM and forum members of Realworldtech have to say about CB as being a better core performance predictor than SPEC?

As for CB, I can't really say that it's a good predictor of overall real-world performance, just that it's a good way to isolate core IPC. IPC is only one aspect of a CPU's performance, and it's really difficult to measure, it seems.
The concept of IPC as it's generally being used on the forums is an average of most if not all relevant real-world workloads. Real-world apps show very different affinities for memory speed and cache. Not only should a benchmark make use of memory and cache in a similar way as consumer applications do, but memory subsystem performance needs to be taken into account by any benchmark that attempts to give a realistic indication of how a CPU will behave in the real-world.

The one and only reason we value CB for isolating core performance is for the first stages of benchmarking a core (leaks, previews), usually with limited resources and more importantly with no accurate information about the memory subsystem, or no means of comparing using the same memory/cache configuration. However, once we have 3rd party controlled testing available, taking memory subsystem performance into account is mandatory if we aim for more accurate predictions.

As for games, cache size and memory latency/bandwidth can have huge effects on performance, more so than IPC as the 5800x3D demonstrates. I mean, it has the exact same IPC as the 5800x but performs substantially better due to not having to access system memory as much.
That's not true; in gaming the 5800X and 5800X3D have wildly different IPC. The difference in the memory subsystem increases IPC for most gaming workloads. Again, I think you need to revisit the concept of IPC and how it varies based on memory subsystem performance.
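To put toy numbers on that, the textbook back-of-envelope model is CPI_effective = CPI_core + (misses per instruction) x (miss penalty). The values below are illustrative placeholders, not measured 5800X/5800X3D data, but they show how the very same core posts very different IPC depending on the cache in front of it.

```python
# How the memory subsystem moves "IPC" for the *same* core:
#   CPI_effective = CPI_core + misses_per_instruction * miss_penalty
# All numbers are illustrative, not measured 5800X / 5800X3D data.

def effective_ipc(cpi_core, misses_per_kinstr, penalty_cycles):
    cpi = cpi_core + (misses_per_kinstr / 1000) * penalty_cycles
    return 1 / cpi

small_l3 = effective_ipc(cpi_core=0.5, misses_per_kinstr=5.0, penalty_cycles=300)
big_l3   = effective_ipc(cpi_core=0.5, misses_per_kinstr=2.0, penalty_cycles=300)

print(f"IPC, smaller L3: {small_l3:.2f}")  # ~0.50
print(f"IPC, bigger L3:  {big_l3:.2f}")    # ~0.91, same core, much higher IPC
```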
 

JoeRambo

Golden Member
Jun 13, 2013
The one and only reason we value CB for isolating core performance is for the first stages of benchmarking a core (leaks, previews), usually with limited resources and more importantly with no accurate information about the memory subsystem, or no means of comparing using the same memory/cache configuration.

This is not the only reason we value CB. It can also act as an indicator of the peak performance the core will have once the memory subsystem is less of a bottleneck. Maybe memory OC can remove part of that bottleneck today, maybe AMD is adding X3D cache, maybe Intel is increasing memory speeds with each gen or releasing an RPL-type derivative.
Imagine we had CB23 scores for Skylake/GC/Zen 3, all with DDR4-2133 at fixed 3 GHz clocks: that would still show us the peak capabilities of each core. Even more so when combined with the fact that the CB23 workload is well characterized by sites like Chips and Cheese.
 

coercitiv

Diamond Member
Jan 24, 2014
This is not the only reason we value CB. It can also act as an indicator of the peak performance the core will have once the memory subsystem is less of a bottleneck.
How did CB help us evaluate peak Zen 3 gaming performance?

Removing the memory subsystem from the equation using a workload with little affinity for memory performance is not the same as emulating peak performance for a workload with high affinity for memory performance.

 

JoeRambo

Golden Member
Jun 13, 2013
How did CB help us evaluate peak Zen 3 gaming performance?

By establishing a so-called "pecking order" of cores: Skylake -> Zen 3 -> GC. This is the order in which these CPUs would perform in games given 96 MB of L3 each. It can even be seen in some stupid in-game comparisons too, in games that don't care about a large L3 cache (CounterStrike, I think).
CB23 shows that the GC core can scale further than Zen 3, and that's it; not just the early-leak estimation benefit you are claiming.
 

Timmah!

Golden Member
Jul 24, 2010
I'm not blaming anyone. I'm merely extrapolating the rise in pro AMD fanaticism on these forums (and all across the internet for that matter) and the blatant disparagement towards anything Intel for why so many people are freaking out that Zen 4 likely won't crush Intel, and may even be slower. It's like they created this alternate reality where Golden Cove barely matched Zen 3, and Intel 7 was complete and utter trash compared to TSMC's N5.



I get that, but despite the power characteristics of the P cores, it was still patently obvious that they were immensely powerful and capable and on another level compared to Zen 3.

Oh, you have no idea what "pro-AMD fanaticism" is if you've never visited a Czech site called ddworld :). It's off the charts over there.
 

carrotmania

Member
Oct 3, 2020
I'm merely extrapolating the rise in pro AMD fanaticism on these forums (and all across the internet for that matter) and the blatant disparagement towards anything Intel for why so many people are freaking out that Zen 4 likely won't crush Intel, and may even be slower. It's like they created this alternate reality where Golden Cove barely matched Zen 3, and Intel 7 was complete and utter trash compared to TSMC's N5.

I get that, but despite the power characteristics of the P cores, it was still patently obvious that they were immensely powerful and capable and on another level compared to Zen 3.

Intel is disparaged because for a decade or more they held technology back, simply because they could and it made them more money. One big leap in performance, at the expense of heat, power and price, does not cut it for most people. I know you can't understand that, and that leads you to deride everyone as "fanatics", but take a look in the mirror. Intel are going back to the usual 5% increase with RPL.

We don't fully know where Zen 4 lies, but it's obvious that AMD, with a massively smaller R&D budget, is at least attempting new solutions and new technologies. ADL is still beaten in some situations by regular Zen 3, and the X3D stomps the KS in lots of scenarios, so no, ADL is not "on another level". It came out a year later and is pushed to the maximum to win, sometimes.

Intel has a lot of animosity to undo among those of us who like technology and not just the company. All I see so far are Intel's same old PR tactics: shout loudest and longest.
 
Jul 27, 2020
One way to get an accurate picture of a CPU's performance would be to capture real-world instruction traces of popular applications/games and replay them. The downside is that even a one-minute trace of a running application can amount to gigabytes. Maybe AI could be used to sift through gigabytes of instruction traces and cobble together a relatively compact mix of instructions that would track real-world performance more closely.
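Conceptually it could look something like this sketch; it is purely illustrative, since a real trace would come from a tool like Intel PT or DynamoRIO, and real phase selection (a la SimPoint) uses proper clustering rather than a greedy filter. The idea: chop the trace into windows, build an opcode histogram per window, and keep only windows whose mix differs enough from what has already been kept.

```python
# Toy trace condenser: keep a handful of windows whose instruction mix
# differs meaningfully from the windows already kept. Illustrative only.
from collections import Counter

def windows(trace, size=1000):
    # Opcode histogram per fixed-size window of the trace.
    for i in range(0, len(trace), size):
        yield Counter(trace[i:i + size])

def distance(a, b):
    # L1 distance between two opcode histograms.
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))

def representatives(trace, size=1000, budget=4, threshold=200):
    kept = []
    for w in windows(trace, size):
        if not kept or min(distance(w, k) for k in kept) > threshold:
            kept.append(w)
        if len(kept) == budget:
            break
    return kept

# Fake "trace": an integer-heavy phase followed by an FP-heavy phase.
trace = ["add", "load", "cmp", "jcc"] * 500 + ["vfmadd", "vload"] * 1000
for i, rep in enumerate(representatives(trace)):
    print(f"phase {i}: {rep.most_common(3)}")
```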
 

MarkPost

Senior member
Mar 1, 2017
And since core performance is a good indicator of real world performance, CB is to me quite useful. Case in point, Spec showed hardly any performance difference between the 12900K and the 5950x in compiling and encoding, while real world applications had much greater differences in favor of the 12900K.

I own both CPUs, and in compiling tasks (I tested compiling LibreOffice, Firefox, Blender, LuxCoreRender, Chromium and Unreal Engine) the 5950X (@4.5 GHz) is >15% faster on average than the 12900K (P @5.0 GHz / E @4.0 GHz). Both with DDR4 (don't know if DDR5 would change this).

In encoding tasks, the 5950X is faster more often than it is slower in comparison with the 12900K.

In summary: for MT tasks the 5950X is the way to go. It's clearly faster while consuming much less power than the 12900K.
 

Abwx

Lifer
Apr 2, 2011
By establishing a so-called "pecking order" of cores: Skylake -> Zen 3 -> GC. This is the order in which these CPUs would perform in games given 96 MB of L3 each. It can even be seen in some stupid in-game comparisons too, in games that don't care about a large L3 cache (CounterStrike, I think).
CB23 shows that the GC core can scale further than Zen 3, and that's it; not just the early-leak estimation benefit you are claiming.

As I already said, methinks that for INT-based code 7-Zip is more useful than CB, which is mainly FP.

FTR, we got tests from a member whose 5800X3D at 4.5 GHz got better ST perf in 7-Zip than your RAM-tuned, 5 GHz-clocked ADL, and curiously his CPU is generally better in games than the 12900K.

Looking at raw IPC, we can see that the 5800X3D got a 0% ST/MT gain in CB, but in 7-Zip the MT IPC increased by 6% on a non-tuned setup; that's less than the gain in games but still quite significant.

As for rendering tests, the 12900K does better in CB and POV-Ray while the 5950X is dominant in Corona and Blender, so using CB as the only metric is somewhat biased.
 

uzzi38

Platinum Member
Oct 16, 2019
I don't really want to go off on a tangent about Spec's reliability when it comes to predicting a CPU's performance in actual commercial applications but lets just say I'm not a fan.

Cinebench R20 had a 16% performance advantage for Golden Cove over Zen 3 with both clocked at 3.6ghz, and I think that was a better predictor of performance than Spec.

That said, IPC is really difficult to measure it seems and highly variable across applications.

No offense, but I find that to be a horribly ridiculous statement that is oddly out of character for you.

What is it exactly about SPEC that makes it a poor determinant of the performance of a CPU core? Claims about memory bandwidth/latency are irrelevant, because they provide nothing but numbers that very rarely apply to most CPU tasks. If you're doing something that isn't reliant on memory latency especially, then chances are there's a way to do the same task far better on a GPU. Ironically, that also applies to 3D rendering: GPU rendering is many times faster than CPU rendering here. And besides, Alder Lake always has an advantage in either latency or bandwidth, depending on whether it's tested with DDR4 or DDR5. If this were such a major issue, then the gap between Alder Lake and Zen 3 should widen, not shrink.

On top of that, what is it about CB that makes it such a good determinant of the performance of a CPU core? I mentioned earlier how high branch prediction rates are in CB, which if anything causes a bit of an overrepresentation of the larger ROB Golden Cove brings relative to most other CPU loads.

Thing is, Cinebench isn't even a good representation of how rendering workloads perform. Compare the 12900KS in Cinebench (a 20% lead over the 5950X) to Blender, V-Ray or Corona and you'll find the two are matched in most reviews, or the 5950X pulls ahead.

The advantage you're seeing in workloads like Cinebench is down to what I mentioned before: the extremely high branch prediction rates in R20/R23. The >99% hit rate for Zen 3 comes from CnC FYI, not from an article but from discussions with them. All you're seeing is Golden Cove's 512-entry ROB really flexing its legs, and you're attributing that to "the core itself being faster", without realising that all you're doing is measuring part of the capabilities of each core.
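A first-order model shows why the hit rate matters so much here; every parameter below is a hypothetical round number, not measured Golden Cove or Zen 3 data. Take CPI as roughly 1/width plus the flush cost: at a 99%+ hit rate the width term dominates and the wide machine runs away, while losing a few points of hit rate compresses the same gap.

```python
# First-order model of branch mispredictions vs core width:
#   CPI ~= 1/width + branch_share * miss_rate * flush_penalty
# Parameters are hypothetical round numbers, not measured GC/Zen 3 data.

def ipc(width, branch_share, miss_rate, flush_penalty):
    return 1 / (1 / width + branch_share * miss_rate * flush_penalty)

# ~99.5% hit rate: the wider machine's advantage shows almost in full.
print(ipc(6, 0.15, 0.005, 20))  # ~5.5
print(ipc(4, 0.15, 0.005, 20))  # ~3.8  (wide core ~46% ahead)

# ~95% hit rate: flush costs compress the very same width difference.
print(ipc(6, 0.15, 0.05, 20))   # ~3.2
print(ipc(4, 0.15, 0.05, 20))   # ~2.5  (only ~26% ahead)
```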
 

JoeRambo

Golden Member
Jun 13, 2013
The advantage you're seeing in workloads like Cinebench is down to what I mentioned before: the extremely high branch prediction rates in R20/R23. The >99% hit rate for Zen 3 comes from CnC FYI, not from an article but from discussions with them. All you're seeing is Golden Cove's 512-entry ROB really flexing its legs, and you're attributing that to "the core itself being faster", without realising that all you're doing is measuring part of the capabilities of each core.

Or maybe it's the 512-entry ROB + 5 ALUs each with LEA + the uOP cache + a wide chip overall being able to chew through instructions and branches, even when some are mispredicted? I would not rush to the conclusion that the "512 ROB" is the sole enabler; maybe the whole core is wider and stronger?
Having a benchmark that is not completely broken by more L2, or by several secondary-timing changes of the kind that add 10% to Linpack GFLOPS, has virtues of its own.

I have long and consistently argued against the CB BS (both "our Threadripper designed by morons with only half of the chiplets connected to memory is 50% faster in CB" and "we ran CB23 and nothing else and found the 12900K 20% faster"), but writing it off because it does not scale with X3D, or because Zen 4 might or might not catch GC, is just as misguided.
It's an awesome, well-characterized workload that is not easily gamed and heavy enough to run into TDP limits.
 

nicalandia

Diamond Member
Jan 10, 2019
At Blender, which is MT-bound, not ST. Zen 4 MT will be killer due to higher clocks, but the >15% ST uplift in Cinebench R23 compared to the 5950X just makes it equal to the 12900K.

Remember, Intel will be doubling E-cores on desktop. AMD has nothing of value in the mid to low end.

The 7950X is hitting 38,000 in MT Cinebench. A 13900K with 8+16 cores and +10% clocks will be about the same, but one will be a furnace and worse at gaming, while the other will be close to 6 GHz with PBO enabled, hitting nearly 40,000 points on air cooling.
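Back-of-envelope for that projection; the per-core point contributions below are rough placeholders I'm assuming from 12900K-era results, not measurements, so treat the output as arithmetic rather than a prediction.

```python
# Back-of-envelope for the "8P + 16E + ~10% clocks" projection above.
# Per-core contributions are assumed placeholders, not measurements.

p_core = 2000   # assumed R23 MT points per P-core (SMT included)
e_core = 1200   # assumed R23 MT points per E-core
clock_gain = 1.10

projected = (8 * p_core + 16 * e_core) * clock_gain
print(f"Projected 13900K MT: {projected:,.0f}")  # ~38,700, near the 7950X
```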
 

Henry swagger

Senior member
Feb 9, 2022
By establishing a so-called "pecking order" of cores: Skylake -> Zen 3 -> GC. This is the order in which these CPUs would perform in games given 96 MB of L3 each. It can even be seen in some stupid in-game comparisons too, in games that don't care about a large L3 cache (CounterStrike, I think).
CB23 shows that the GC core can scale further than Zen 3, and that's it; not just the early-leak estimation benefit you are claiming.
Yeah, Golden Cove is a big, wide core that scales well with power. AMD needs a wide core for Zen 5 to compete.
 

Carfax83

Diamond Member
Nov 1, 2010
What do people working in STEM and forum members of Realworldtech have to say about CB as being a better core performance predictor than SPEC?

I won't speak on their behalf, but if you go over to the Realworldtech forums and ask, I'm sure you'll get a response. Also, I don't think they themselves ever said that CB is a better core performance predictor than SPEC, only that SPEC itself wasn't a good benchmark.


The concept of IPC as it's generally being used on the forums is an average of most if not all relevant real-world workloads. Real-world apps show very different affinities for memory speed and cache. Not only should a benchmark make use of memory and cache in a similar way as consumer applications do, but memory subsystem performance needs to be taken into account by any benchmark that attempts to give a realistic indication of how a CPU will behave in the real-world.

I agree that memory subsystem and system memory performance are paramount for determining real-world performance, but I still think CB has its uses, mostly for single-core performance.

When I see a chart like this, it corresponds exactly to where I would place each CPU in the lineup in terms of their attributes and performance:

[Chart: Cinebench single-thread scores]


The one and only reason we value CB for isolating core performance is for the first stages of benchmarking a core (leaks, previews), usually with limited resources and more importantly with no accurate information about the memory subsystem, or no means of comparing using the same memory/cache configuration. However, once we have 3rd party controlled testing available, taking memory subsystem performance into account is mandatory if we aim for more accurate predictions.

I don't disagree with this statement at all.

That's not true; in gaming the 5800X and 5800X3D have wildly different IPC. The difference in the memory subsystem increases IPC for most gaming workloads. Again, I think you need to revisit the concept of IPC and how it varies based on memory subsystem performance.

Discussions about IPC are frustrating to me, I won't deny, because there is core/microarchitecture IPC (which I think CB measures quite well) and then actual CPU IPC, which includes the memory subsystem and system memory performance. CB to me is great for the former and poor for the latter.
 

MarkPost

Senior member
Mar 1, 2017
As I already said, methinks that for INT-based code 7-Zip is more useful than CB, which is mainly FP.

FTR, we got tests from a member whose 5800X3D at 4.5 GHz got better ST perf in 7-Zip than your RAM-tuned, 5 GHz-clocked ADL, and curiously his CPU is generally better in games than the 12900K.

Looking at raw IPC, we can see that the 5800X3D got a 0% ST/MT gain in CB, but in 7-Zip the MT IPC increased by 6% on a non-tuned setup; that's less than the gain in games but still quite significant.

As for rendering tests, the 12900K does better in CB and POV-Ray while the 5950X is dominant in Corona and Blender, so using CB as the only metric is somewhat biased.

The "funny" thing is that some sites uses Pov-Ray bench to measure perfomance, but they dont know (I hope thats the reason, and not other) that Pov-Ray public build doesnt use AVX2 on Ryzen CPUs, just AVX. Of course this pitiful issue clearly penalizes Ryzen perf... In fact, compiling Pov-Ray executable (its open source) enabling AVX2 for Ryzen, makes it a little faster than Alder Lake.

In CB, stock vs stock, ADL is faster, but things change when overclocking both.
 

Carfax83

Diamond Member
Nov 1, 2010
Intel is disparaged because for a decade or more they held technology back, simply because they could and it made them more money. One big leap in performance, at the expense of heat, power and price does not cut it for most people. I know you can't understand that, and that leads you deride everyone as "fanatics", but take a look in the mirror. Intel are going back to the usual 5% increase with RPL.

You're right, Intel was definitely cruising due to their process leadership and lack of competition. Well, now things have changed, haven't they, because nothing lasts forever. Pat Gelsinger likely won't let Intel backslide and rest on its laurels like his predecessors did, and Dr. Su is a very ambitious CEO who isn't going to let AMD play second fiddle to Intel like it did in the past, if she can help it.

The more they compete, the better it will be for us, the consumers.

We don't fully know where Zen4 lies, but its obvious that AMD, with a massively smaller RnD budget, are at least attempting new solutions and new technologies.

Let's agree not to enter a make-believe world where Intel is the enemy of all tech enthusiasts and AMD is the white knight fighting against tyranny and injustice. Not only is it delusional, it's also completely untrue.

Both companies have done lots of great things to progress technology forward, and both have stagnated for a time as well.

ADL is still beaten in situations by regular Zen3, and X3D stomps the KS in lots of scenarios, so no, ADL is not "on another level". It came out a year later and is pushed to the maximum to win, sometimes.

The X3D doesn't stomp the KS in lots of scenarios. It's slower than the regular 5800X in non-gaming scenarios, and it only dominates in gaming when Alder Lake is paired with DDR4 or slow DDR5 memory. When Alder Lake has fast DDR5, it's practically on equal footing with the X3D.

Also, when you consider that Alder Lake has only 8 big cores, the fact that it can even compete with the 5950X in multithreaded apps is astonishing to me.
 