Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging looks to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

BorisTheBlade82 · Feb 3, 2023

Tuna-Fish said:
Moving processes from one core to another is something for which tools already exist, both in Windows and in Linux. The only major issue is that some games have anticheat that makes this difficult.

Please consider my post above. It is not as simple as you make it seem.

Joe NYC · Feb 3, 2023

Exist50 said:
I'd guess they're straight up pinning the entire process to only one CCX. Today, the vast majority of games don't benefit from the second one. But I worry that as CPU intensity and thread demands grow, this solution will start to break down.

The worst case scenario (if the thread is assigned incorrectly) - the game runs like 13900K P core.

Less of a problem then when a demanding gaming thread gets assigned to an E Core on Alder / Raptor Lake.

Hougy · Feb 3, 2023

BorisTheBlade82 said:
Exactly. But didn't we cover that topic at length weeks ago when the Allowedlist approach went public?
But just to reiterate why this is a very bad idea:

They will quite likely put process names like Game.exe on that list - so the assignment will be static for the whole game although preference might change with scenes, Game modes, etc.

If the process has more threads than the CCD has cores, who chooses the cache sensitive threads over the frequency sensitive?

They need to continually maintain that list. So how future-proof might that be?

They will concentrate efforts on AAA titles.

Linux support?

So these are only the first dominant thoughts that occurred to me. This is a kind of implementation which puts a company like AMD to shame - considering that they literally develop the most sophisticated kind of object in the world (no, rockets aren't that kind of thing anymore).

The upside is, that it won't matter as much as when the scheduler goes whack on Intel's Big.little. And from a technical perspective this will be quite an interesting topic - how to prioritise when no type of core is the best in every regard. But honestly, a dynamic approach based on thread behaviour by performance counter time-series analysis would have been the much better approach.

That's why I would never buy the versions with 2 dies over the 7800X3D

maddie · Feb 3, 2023

yuri69 said:
Intel ALD/RPL got two sets of very different cores (speed, cache, interconnect) compared to Zen 4 3D heterogenous CCDs. Yet it notably hinders the performance in just a few workloads - even on unsupported platforms like Windows 10 or Linux.

Disabling the Intel E-cores helps in a few apps. Disabling SMT also helps now and then. Pining a workload on a single core/cluster also might help. Using the fastest core is exploited to help ST apps. Creating a scheduling group also tends to help a class of workloads. Splitting processors to multiple NUMA domains also can bring gains in certain workloads. Certain apps like certain memory allocation strategies better.

There is a lot of options to tune in the processor vs memory vs kernel realm. However, it depends on the context - averaging the view using many workloads or tuning to fit just a single one.

Just wondering, could the coming new AI blocks help with this? Learn & train on the fly to optimize programs execution?

LightningZ71 · Feb 3, 2023

I can't wait for the first games to be identified as having a few threads that really thrash small caches but love big ones and a few threads that need as much frequency as you can throw at them, and wind up just cratering in any strategy on the dual CCD processors because any stratwgy that optimizes for either type of process hurts the other and splitting the processe across CCDs gets hanstrung by the slow cross-ccd communications of the processor.

MrTeal · Feb 3, 2023

Just rename every game to CallOfDuty.exe and you're good to go.

BorisTheBlade82 · Feb 3, 2023

maddie said:
Just wondering, could the coming new AI blocks help with this? Learn & train on the fly to optimize programs execution?

This is not needed. Plain old in-flight analysis could help in a big way. This is really just lazy thinking from AMD. For the most part of it, this is a solved problem.

LightningZ71 said:
I can't wait for the first games to be identified as having a few threads that really thrash small caches but love big ones and a few threads that need as much frequency as you can throw at them, and wind up just cratering in any strategy on the dual CCD processors because any stratwgy that optimizes for either type of process hurts the other and splitting the processe across CCDs gets hanstrung by the slow cross-ccd communications of the processor.

Exactly. Static assignment sucks in a big way. This will only get emphasized, because Zen 7000 will very likely be the only generation where this applies.

Exist50 · Feb 4, 2023

BorisTheBlade82 said:
This is not needed. Plain old in-flight analysis could help in a big way. This is really just lazy thinking from AMD. For the most part of it, this is a solved problem.

Exactly. Static assignment sucks in a big way. This will only get emphasized, because Zen 7000 will very likely be the only generation where this applies.

In AMD's defense, the telemetry and algorithm development needed would be a pretty significant undertaking, and as others have said, the penalty for a wrong guess isn't too bad. Still, think this will be a bit of a pain point down the line.

biostud · Feb 4, 2023

How does Intel handle P and E cores?

moinmoin · Feb 4, 2023

biostud said:
How does Intel handle P and E cores?

Windows only Thread Director software along with a special Windows 11 launch that includes an adapted scheduler.

Grabo · Feb 4, 2023

moinmoin said:
Windows only Thread Director software along with a special Windows 11 launch that includes an adapted scheduler.

"Windows only" how?

The Intel Hardware Feedback Interface is used for communicating performance and energy efficiency capabilities of individual CPU cores of the system. Linux in turn will use the Intel HFI data for making improved task placement decisions about where to place given work among the available CPU cores/threads. Intel HFI is important for new Intel Alder Lake processors and forthcoming hybrid processor designs marketed as having "Thread Director"

Intel HFI To Premiere In Linux 5.18 For Improving Hybrid CPU Performance/Efficiency - Phoronix

www.phoronix.com

moinmoin · Feb 4, 2023

Grabo said:
"Windows only" how?

Intel HFI To Premiere In Linux 5.18 For Improving Hybrid CPU Performance/Efficiency - Phoronix

www.phoronix.com

In September 2022 it was still being worked on:

Intel Posts Big Linux Patch Set For "Classes of Tasks" On Hybrid CPUs, Thread Director - Phoronix

www.phoronix.com

This news from the end of November 2022 is the lastest one on Phoronix:

Intel Advances Linux "IPC Classes" Design To Improve Load Balancing For Hybrid CPUs - Phoronix

www.phoronix.com

"As it's 600+ lines of new code and still undergoing review, it's probably not likely it will be squared away in time for the v6.2 kernel merge window opening in two weeks. But hopefully this code and complete Thread Director implementation will be ready to go for a Linux kernel release in the first half of 2023."

Grabo · Feb 4, 2023

moinmoin said:
In September 2022 it was still being worked on:

Intel Posts Big Linux Patch Set For "Classes of Tasks" On Hybrid CPUs, Thread Director - Phoronix

www.phoronix.com

This news from the end of November 2022 is the lastest one on Phoronix:

Intel Advances Linux "IPC Classes" Design To Improve Load Balancing For Hybrid CPUs - Phoronix

www.phoronix.com

"As it's 600+ lines of new code and still undergoing review, it's probably not likely it will be squared away in time for the v6.2 kernel merge window opening in two weeks. But hopefully this code and complete Thread Director implementation will be ready to go for a Linux kernel release in the first half of 2023."

A 'complete' TD yes. The big.little architecture is pretty well supported since 5.18. It's an ongoing effort..as it is with Windows I would think, though perhaps slightly behind time-wise. It isn't in Intel's interest to have their current and future CPUs run only under Windows but that's probably not what you were trying to say either.

moinmoin · Feb 4, 2023

Grabo said:
A 'complete' TD yes. The big.little architecture is pretty well supported since 5.18. It's an ongoing effort..as it is with Windows I would think, though perhaps slightly behind time-wise. It isn't in Intel's interest to only have their current and future CPUs run under Windows but that's probably not what you were trying to say either.

What you call well supported I call an unnecessary mess better avoided.

But we are in a Zen 4 thread, not an ADL one, and the post I was responding to was about Zen 4 X3D support and how Intel managed it with P and E cores. The equivalent to that would be AMD convincing Microsoft to launch Windows 12.

Grabo · Feb 4, 2023

moinmoin said:
But we are in a Zen 4 thread, not an ADL one, and the post I was responding to was about Zen 4 X3D support and how Intel managed it with P and E cores. The equivalent to that would be AMD convincing Microsoft to launch Windows 12.

I was trying to answer the question on how an OS could assign P and E Cores (specifically how Intel deals with it). Your answer seems to be that a whole new OS would be required, I doubt it, especially since AMD has most likely worked with MS when designing the new 3D versions and there was that test from PCworld this year that showed that Win 10 was pretty much on par with 11. Likewise the Linux kernel is getting jiggy with it. It does depend on what kind of strategy AMD has implemented.

moinmoin · Feb 4, 2023

Grabo said:
Your answer seems to be that a whole new OS would be required

No, I was just stating what Intel did: Launch a Windows only Thread Director software along with Microsoft's Windows 11 launch that includes an adapted scheduler.

In case that wasn't abundantly clear this was in jest, and I'm not really interested in discussing the intricacies of such hybrid implementations as I personally don't think the scheduling complexity they introduce are worth all the troubles involved.

Tuna-Fish · Feb 4, 2023

BorisTheBlade82 said:
Please consider my post above. It is not as simple as you make it seem.

It doesn't matter what the scheduler likes to do on it's own, on both Windows and Linux it's possible to restrict a program to only ever run on specified logical cpus. See taskset (1) for Linux and you can do the same using a gui through control panel on windows.

Processes are the exact correct granularity for most situations. The desired outcome for 99% of games is almost certainly "all the threads of this program are restricted to run on one CCD". Which CCD they prefer can change, but it is genuinely rare for the advantages of running more than 16 threads to outweigh the increased interprocess communication costs. This is why 7700X often beats the 7950X in games, despite lower clocks.

You can probably get >95% of the way to optimal performance by just tasksetting steam, and then letting everything it starts inherit the setting. It'd be better if steam itself ran on the other CCD, but that's a minimal gain and a lot less effort than doing the games individually.

alexruiz · Feb 5, 2023

coercitiv said:
HUB is testing B650 boards and their initial impression is... bad.

View attachment 75527

While in general I have defended some of the price hikes in motherboards as legitimate due to higher BoM, this kind of poor performance is unacceptable.

On the Zen 4 builder thread I had posted that AGESA ComboAM5 1.0.0.3D is broken on Gigabyte boards.
Games will crash even with RAM as auto, and it will simply not POST with EXPO enabled.
I tried both the B650M DS3H and the X670 Aorus Elite AX
AGESA ComboAM5 1.0.0.3A and 1.0.0.4 work fine though.
I have yet to try a MSI AM5 board, so I cannot comment on that one.

My suspicion is that he tried UEFIs with AGESA 1.0.0.3D.
The B650M DS3H is perfectly stable with AGESA 1.0.0.4 or 1.0.0.3A
I honestly don't what Gigabyte did to their 1.0.0.3D UEFIs

Timmah! · Feb 5, 2023

Quick question: I am considering keeping the 3090 from my current setup to bolster my rendering speeds, instead of selling it with the rest of the rig. Since the new rig already has 2x 4090 inside, i would need to get the external enclosure for this one. Some people in my country are selling used Lenovo Legion BoostStation for cca 200 EUROs, so i was looking at that.

Here comes the question: the board (asus hero) has that USB4 port, which should work as Thunderbolt 3, if not 4. Apparently, per techpower-up article, it does share bandwith with first M1 slot, which i am going to populate (obviously). Will the bandwith of 4x PCI-E 5.0 lanes be enough for both PCIE 4.0 nvme drive and the GPU connected via the USB4? The GPU will be used for rendering obviously, not gaming, in which case the lesser USB/TB throughput compared to regular onboard PCIE 16x/8x should only be only affecting loading the scene into VRAM, but not actual render speed.

Anyway, from what i could gather 1 link of PCIE 5 is 4GB/s. So 4 of them = 16GB/s. The M2 drive is PCI-E 4.0, thus 4x 2GB = 8GB/s max. USB4 is 40Gbps thus = 5GB/s. 8 + 5 = 13, which is less than 16, so it should be fine for both at the same time. Am i doing this right? Or is there more nuance to it, something i dont know, not that straightforward?

MadRat · Feb 6, 2023

Obviously none of the people complaining were playing games in the Win9x through XP years, where games largely depended on end-user configuration tuning. Younger folks need to up their nerd skills.

Thibsie · Feb 6, 2023

MadRat said:
Obviously none of the people complaining were playing games in the Win9x through XP years, where games largely depended on end-user configuration tuning. Younger folks need to up their nerd skills.

Those never did address, irq, DMA or SCSI channels manual settings know nothing. 'nuff said. xD

biostud · Feb 6, 2023

Thibsie said:
Those never did address, irq, DMA or SCSI channels manual settings know nothing. 'nuff said. xD

anschlusskabel-fuer-35-und-525-diskettenlaufwerke-fdd-kabel.jpg

BorisTheBlade82 · Feb 6, 2023

Or that needed to customize config.sys and autoexec.bat in order to get as much of the precious 640Kbyte conventional memory as you probably could 😂

biostud · Feb 6, 2023

BorisTheBlade82 said:
Or that needed to customize config.sys and autoexec.bat in order to get as much of the precious 640Kbyte conventional memory as you probably could 😂

I remember creating a boot menu in dos 6.22, with QEMM and Stacker installed on a 386sx 16mhz with a 40mb HDD partitioned in two drives

Mopetar · Feb 6, 2023

MadRat said:
Obviously none of the people complaining were playing games in the Win9x through XP years, where games largely depended on end-user configuration tuning. Younger folks need to up their nerd skills.

But why do all that when you could just download more RAM off the internet?

Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Senior member

Senior member

Diamond Member

Member

Diamond Member

Platinum Member

Diamond Member

Senior member

Platinum Member

Lifer

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Golden Member

Platinum Member

Golden Member

Lifer

Golden Member

Lifer

Senior member

Lifer

Diamond Member