Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", which I think will likely double to 64 MB).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

Joe NYC

Golden Member
Jun 26, 2021
1,946
2,286
106
I'd guess they're straight up pinning the entire process to only one CCX. Today, the vast majority of games don't benefit from the second one. But I worry that as CPU intensity and thread demands grow, this solution will start to break down.

The worst-case scenario (if the thread is assigned incorrectly) is that the game runs like it would on a 13900K P-core.

That's less of a problem than when a demanding gaming thread gets assigned to an E-core on Alder/Raptor Lake.
 

Hougy

Member
Jan 13, 2021
77
60
61
Exactly. But didn't we cover that topic at length weeks ago when the allowlist approach went public?
But just to reiterate why this is a very bad idea:
  • They will quite likely put process names like Game.exe on that list, so the assignment will be static for the whole game even though the preference might change with scenes, game modes, etc.
  • If the process has more threads than the CCD has cores, who chooses the cache-sensitive threads over the frequency-sensitive ones?
  • They need to continually maintain that list, so how future-proof can that be?
  • They will concentrate their efforts on AAA titles.
  • Linux support?
Those are just the first thoughts that occurred to me. This kind of implementation puts a company like AMD to shame, considering that they literally develop the most sophisticated kind of object in the world (no, rockets don't hold that title anymore).

The upside is that it won't matter as much as when the scheduler goes whack on Intel's big.LITTLE. And from a technical perspective this will be quite an interesting topic: how to prioritise when no type of core is the best in every regard. But honestly, a dynamic approach based on thread behaviour, using performance-counter time-series analysis, would have been the much better approach (see the sketch below).
That's why I would never buy the versions with two dies over the 7800X3D.
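
For illustration, a minimal Python sketch of such counter-driven placement on Linux. The CPU ranges and the MPKI cutoff are made-up assumptions, not anything AMD has described; a real implementation would need proper telemetry, time-series smoothing, and hysteresis:

```python
import os
import subprocess

# Hypothetical dual-CCD layout; check the real topology with `lscpu -e`.
VCACHE_CPUS = set(range(0, 16))   # assumed: V-Cache CCD = logical CPUs 0-15
FREQ_CPUS = set(range(16, 32))    # assumed: frequency CCD = CPUs 16-31
MPKI_THRESHOLD = 5.0              # made-up cutoff for "cache-sensitive"

def sample_mpki(tid: int, seconds: int = 2) -> float:
    """Sample one thread's cache misses and instructions with `perf stat`
    (needs perf permissions) and return misses per kilo-instruction."""
    stderr = subprocess.run(
        ["perf", "stat", "-t", str(tid),
         "-e", "cache-misses,instructions", "--", "sleep", str(seconds)],
        capture_output=True, text=True).stderr
    counts = {}
    for line in stderr.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] in ("cache-misses", "instructions"):
            counts[parts[1]] = int(parts[0].replace(",", ""))
    instr = counts.get("instructions", 0)
    return counts.get("cache-misses", 0) * 1000 / instr if instr else 0.0

def place_thread(tid: int) -> None:
    """Pin the thread to the V-Cache CCD if it looks cache-sensitive,
    otherwise to the frequency-optimized CCD."""
    cpus = VCACHE_CPUS if sample_mpki(tid) > MPKI_THRESHOLD else FREQ_CPUS
    os.sched_setaffinity(tid, cpus)  # on Linux this call accepts a tid
```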
 
Last edited:
  • Like
Reactions: Joe NYC and biostud

maddie

Diamond Member
Jul 18, 2010
4,740
4,674
136
Intel ADL/RPL has two sets of very different cores (speed, cache, interconnect) compared to Zen 4 3D's heterogeneous CCDs. Yet it notably hinders performance in only a few workloads, even on unsupported platforms like Windows 10 or Linux.

Disabling the Intel E-cores helps in a few apps. Disabling SMT also helps now and then. Pinning a workload to a single core/cluster can also help. Favouring the fastest core helps ST apps. Creating a scheduling group also tends to help a class of workloads. Splitting the processor into multiple NUMA domains can also bring gains in certain workloads. And certain apps respond better to certain memory-allocation strategies.

There are a lot of options to tune in the processor vs. memory vs. kernel realm. However, it depends on the context: averaging across many workloads, or tuning to fit just a single one.
Just wondering, could the coming new AI blocks help with this? Learn & train on the fly to optimize program execution?
 
  • Like
Reactions: Kaluan

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
I can't wait for the first games to be identified as having a few threads that really thrash small caches but love big ones, plus a few threads that need as much frequency as you can throw at them, and to wind up just cratering under any strategy on the dual-CCD processors, because any strategy that optimizes for either type of thread hurts the other, and splitting the process across CCDs gets hamstrung by the slow cross-CCD communication of the processor.
 

BorisTheBlade82

Senior member
May 1, 2020
664
1,014
106
Just wondering, could the coming new AI blocks help with this? Learn & train on the fly to optimize program execution?
This is not needed. Plain old in-flight analysis could help in a big way. This is really just lazy thinking from AMD. For the most part, this is a solved problem.
I can't wait for the first games to be identified as having a few threads that really thrash small caches but love big ones, plus a few threads that need as much frequency as you can throw at them, and to wind up just cratering under any strategy on the dual-CCD processors, because any strategy that optimizes for either type of thread hurts the other, and splitting the process across CCDs gets hamstrung by the slow cross-CCD communication of the processor.
Exactly. Static assignment sucks in a big way. This will only get emphasized, because Ryzen 7000 will very likely be the only generation where this applies.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
This is not needed. Plain old in-flight analysis could help in a big way. This is really just lazy thinking from AMD. For the most part, this is a solved problem.

Exactly. Static assignment sucks in a big way. This will only get emphasized, because Ryzen 7000 will very likely be the only generation where this applies.
In AMD's defense, the telemetry and algorithm development needed would be a pretty significant undertaking, and as others have said, the penalty for a wrong guess isn't too bad. Still, I think this will be a bit of a pain point down the line.
 
  • Like
Reactions: Kaluan

Grabo

Senior member
Apr 5, 2005
240
40
91
Windows-only Thread Director software along with a special Windows 11 launch that includes an adapted scheduler.

"Windows only" how?

The Intel Hardware Feedback Interface is used for communicating performance and energy efficiency capabilities of individual CPU cores of the system. Linux in turn will use the Intel HFI data for making improved task placement decisions about where to place given work among the available CPU cores/threads. Intel HFI is important for new Intel Alder Lake processors and forthcoming hybrid processor designs marketed as having "Thread Director".
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
In September 2022 it was still being worked on:
This news from the end of November 2022 is the latest one on Phoronix:
"As it's 600+ lines of new code and still undergoing review, it's probably not likely it will be squared away in time for the v6.2 kernel merge window opening in two weeks. But hopefully this code and complete Thread Director implementation will be ready to go for a Linux kernel release in the first half of 2023."
 

Grabo

Senior member
Apr 5, 2005
240
40
91
In September 2022 it was still being worked on:
This news from the end of November 2022 is the latest one on Phoronix:
"As it's 600+ lines of new code and still undergoing review, it's probably not likely it will be squared away in time for the v6.2 kernel merge window opening in two weeks. But hopefully this code and complete Thread Director implementation will be ready to go for a Linux kernel release in the first half of 2023."

A 'complete' TD, yes. The big.LITTLE architecture has been pretty well supported since 5.18. It's an ongoing effort... as it is with Windows, I would think, though perhaps slightly behind time-wise. It isn't in Intel's interest to have their current and future CPUs run only under Windows, but that's probably not what you were trying to say either.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
A 'complete' TD, yes. The big.LITTLE architecture has been pretty well supported since 5.18. It's an ongoing effort... as it is with Windows, I would think, though perhaps slightly behind time-wise. It isn't in Intel's interest to have their current and future CPUs run only under Windows, but that's probably not what you were trying to say either.
What you call well supported I call an unnecessary mess better avoided.

But we are in a Zen 4 thread, not an ADL one, and the post I was responding to was about Zen 4 X3D support and how Intel managed it with P and E cores. The equivalent to that would be AMD convincing Microsoft to launch Windows 12.
 

Grabo

Senior member
Apr 5, 2005
240
40
91
But we are in a Zen 4 thread, not an ADL one, and the post I was responding to was about Zen 4 X3D support and how Intel managed it with P and E cores. The equivalent to that would be AMD convincing Microsoft to launch Windows 12.

I was trying to answer the question of how an OS could assign P and E cores (specifically, how Intel deals with it). Your answer seems to be that a whole new OS would be required; I doubt it, especially since AMD has most likely worked with MS when designing the new 3D versions, and there was that test from PCWorld this year showing that Win 10 was pretty much on par with 11. Likewise, the Linux kernel is getting jiggy with it. It does depend on what kind of strategy AMD has implemented.
 
Last edited:
  • Like
Reactions: Kaluan and Exist50

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
Your answer seems to be that a whole new OS would be required
No, I was just stating what Intel did: launch Windows-only Thread Director software alongside Microsoft's Windows 11 launch, which includes an adapted scheduler.

In case that wasn't abundantly clear, this was in jest, and I'm not really interested in discussing the intricacies of such hybrid implementations, as I personally don't think the scheduling complexity they introduce is worth all the trouble involved.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,349
1,534
136
Please consider my post above. It is not as simple as you make it seem.

It doesn't matter what the scheduler likes to do on its own; on both Windows and Linux it's possible to restrict a program to only ever run on specified logical CPUs. See taskset(1) for Linux; on Windows you can do the same through a GUI.

Processes are exactly the correct granularity for most situations. The desired outcome for 99% of games is almost certainly "all the threads of this program are restricted to run on one CCD". Which CCD they prefer can change, but it is genuinely rare for the advantages of running more than 16 threads to outweigh the increased interprocess communication costs. This is why the 7700X often beats the 7950X in games, despite lower clocks.

You can probably get >95% of the way to optimal performance by just tasksetting Steam and then letting everything it starts inherit the setting. It'd be better if Steam itself ran on the other CCD, but that's a minimal gain and a lot less effort than doing the games individually.
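
To make that concrete, a minimal Linux sketch in Python; the CPU range is an assumption (verify which logical CPUs belong to which CCD with `lscpu -e` first):

```python
import os
import subprocess

# Assumption: logical CPUs 0-15 are the 8 V-Cache cores plus their SMT
# siblings; verify on the actual system before pinning anything.
VCACHE_CCD = set(range(16))

# Restrict this process to one CCD; every child it spawns inherits the
# affinity mask, so the effect matches `taskset -c 0-15 steam`.
os.sched_setaffinity(0, VCACHE_CCD)
subprocess.run(["steam"])
```

On Windows, `start /affinity 0xFFFF steam.exe` from cmd applies the same 16-CPU mask, with children likewise inheriting it.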
 

alexruiz

Platinum Member
Sep 21, 2001
2,836
556
126
HUB is testing B650 boards and their initial impression is... bad.

View attachment 75527

While in general I have defended some of the price hikes in motherboards as legitimate due to higher BoM, this kind of poor performance is unacceptable.

On the Zen 4 builder thread I had posted that AGESA ComboAM5 1.0.0.3D is broken on Gigabyte boards.
Games will crash even with RAM as auto, and it will simply not POST with EXPO enabled.
I tried both the B650M DS3H and the X670 Aorus Elite AX.
AGESA ComboAM5 1.0.0.3A and 1.0.0.4 work fine, though.
I have yet to try a MSI AM5 board, so I cannot comment on that one.

My suspicion is that he tried UEFIs with AGESA 1.0.0.3D.
The B650M DS3H is perfectly stable with AGESA 1.0.0.4 or 1.0.0.3A.
I honestly don't know what Gigabyte did to their 1.0.0.3D UEFIs.
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
Quick question: I am considering keeping the 3090 from my current setup to bolster my rendering speeds, instead of selling it with the rest of the rig. Since the new rig already has 2x 4090 inside, I would need to get an external enclosure for this one. Some people in my country are selling used Lenovo Legion BoostStations for circa 200 euros, so I was looking at that.

Here comes the question: the board (Asus Hero) has that USB4 port, which should work as Thunderbolt 3, if not 4. Apparently, per a TechPowerUp article, it shares bandwidth with the first M.2 slot, which I am going to populate (obviously). Will the bandwidth of four PCIe 5.0 lanes be enough for both a PCIe 4.0 NVMe drive and the GPU connected via USB4? The GPU will be used for rendering, obviously, not gaming, in which case the lower USB/TB throughput compared to a regular onboard PCIe x16/x8 slot should only affect loading the scene into VRAM, not the actual render speed.

Anyway, from what I could gather, one PCIe 5.0 lane is ~4 GB/s, so four of them = 16 GB/s. The M.2 drive is PCIe 4.0, thus 4 × 2 GB/s = 8 GB/s max. USB4 is 40 Gbps, thus = 5 GB/s. 8 + 5 = 13, which is less than 16, so it should be fine for both at the same time. Am I doing this right? Or is there more nuance to it, something I don't know?
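
Sanity-checking that arithmetic with rounded numbers (per-direction peak figures, ignoring encoding and protocol overhead):

```python
# Rounded per-direction peak bandwidth, ignoring protocol overhead.
pcie5_lane_gbs = 4.0               # ~3.94 GB/s per PCIe 5.0 lane
uplink_gbs = 4 * pcie5_lane_gbs    # shared x4 link: ~16 GB/s
nvme_gbs = 4 * 2.0                 # PCIe 4.0 x4 drive: ~8 GB/s
usb4_gbs = 40 / 8                  # 40 Gbps USB4 port: ~5 GB/s

print(f"{nvme_gbs + usb4_gbs:.0f} GB/s demand vs {uplink_gbs:.0f} GB/s link")
# -> 13 GB/s demand vs 16 GB/s link, so on paper both fit simultaneously
```

The math holds on paper. The extra nuance is that USB4's PCIe tunnelling typically tops out well below the headline 5 GB/s, and the drive and the GPU rarely peak at the same moment, so in practice the shared link should be even less of a bottleneck than this worst-case sum suggests.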