Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

igor_kavinski · Dec 27, 2024

Joe NYC said:
I think the reason people want the 2 V-Cache configuration is because it would be a no-brainer for gaming, and would likely beat the 9800X3D in pretty much all cases (if the clock speed was just a little higher).

With dual-CCD V-cache, even a 9900X3D with SMT off has a better chance of beating the 9800X3D, because more cache is available to all threads. Turning SMT off on the 9800X3D may tank performance in games that rely on more than 8 threads.

igor_kavinski · Dec 27, 2024

AMD needs to do something like Intel APO where they create specific profiles for games to pin their cache hungry threads to the V-cache CCD at all times and ensure that no such thread is allowed to be scheduled on the non-V-cache CCD.

Or just give people what they want (dual V-cache CCDs) and then they don't have to bother wasting time profiling every popular game under the sun and keep doing it for every new game.
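For reference, the pinning itself is the easy part; knowing which processes deserve it is what needs per-game profiling. A minimal sketch for Linux, assuming the V-cache CCD is logical CPUs 0-7 plus SMT siblings 16-23 (that numbering is an assumption and varies by CPU, firmware, and OS). `vcache_target` and `pin_to_vcache` are illustrative names, not part of any vendor tool.

```python
import os

# ASSUMPTION: logical CPUs 0-7 and their SMT siblings 16-23 sit on the
# V-cache CCD. Check the real topology before relying on this.
VCACHE_CPUS = frozenset(range(0, 8)) | frozenset(range(16, 24))


def vcache_target(allowed, vcache_cpus=VCACHE_CPUS):
    """CPUs that are both permitted for the process and on the V-cache CCD."""
    return set(allowed) & set(vcache_cpus)


def pin_to_vcache(pid=0, vcache_cpus=VCACHE_CPUS):
    """Pin `pid` (0 = the caller) to the V-cache CCD; return the set applied.

    Returns an empty set when the platform lacks the call or none of the
    listed CPUs is available; the affinity is left untouched in that case.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()  # e.g. Windows or macOS
    target = vcache_target(os.sched_getaffinity(pid), vcache_cpus)
    if target:
        os.sched_setaffinity(pid, target)
    return target
```

A launcher could call `pin_to_vcache()` before starting the game, since child processes inherit the affinity mask on Linux. This pins the whole process rather than only its cache-hungry threads, which is cruder than what is asked for above.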

coercitiv · Dec 27, 2024

I'm with @gdansk on this one: dual V-cache is very tempting from a convenience PoV only as long as we ignore the dual-CCD nature of the product. The single V-cache die problem only makes the dual-CCD problem more obvious. It would be nice if they came up with a more robust software solution than the MS Game Bar "collab".

igor_kavinski · Dec 27, 2024

coercitiv said:
It would be nice if they came up with a more robust software solution than the MS Game Bar "collab".

Maybe once they are done with AI...

I wonder if this utility can be modified to keep threads on the V-cache cores: https://forums.anandtech.com/threads/raptor-lake-official-thread.2599551/post-41361559

MS_AT · Dec 27, 2024

igor_kavinski said:
AMD needs to do something like Intel APO where they create specific profiles for games to pin their cache hungry threads to the V-cache CCD at all times and ensure that no such thread is allowed to be scheduled on the non-V-cache CCD.

Rather, game developers need to stop pretending that hybrid CPUs do not exist. Every OS-side solution will have some shortcomings. APO requires that Intel maintains it, so it will always lag behind. Likewise for Game Bar. If the games are CPU-aware they will be able to handle the scheduling correctly, as I think Windows already exposes all the relevant APIs.
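As a rough illustration of what "CPU-aware" could mean, here is a sketch that finds the cores sharing the largest L3. It swaps in Linux sysfs (`/sys/devices/system/cpu/cpuN/cache/indexM/`) rather than the Windows APIs mentioned above, purely because that is easy to show in a few lines; the function names are made up for this example.

```python
from pathlib import Path


def parse_size_kib(text):
    """Convert a sysfs cache size such as '98304K' or '32M' to KiB."""
    text = text.strip()
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text) // 1024  # treat a bare number as a byte count


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Map each L3 domain (its shared_cpu_list string) to its size in KiB."""
    domains = {}
    for cache_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index*"):
        try:
            if (cache_dir / "level").read_text().strip() != "3":
                continue  # not an L3 entry
            cpus = (cache_dir / "shared_cpu_list").read_text().strip()
            size = parse_size_kib((cache_dir / "size").read_text())
        except (OSError, ValueError):
            continue  # offline CPU or incomplete cache information
        domains[cpus] = size
    return domains


def largest_l3_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return the CPU list string of the largest L3 domain, or None."""
    domains = l3_domains(sysfs_root)
    return max(domains, key=domains.get) if domains else None
```

On a single-V-cache dual-CCD part, `largest_l3_cpus()` would be expected to name the V-cache CCD's CPUs, but that depends on the kernel reporting the stacked cache in the L3 size.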

igor_kavinski · Dec 27, 2024

MS_AT said:
Rather game developers need to stop pretending that hybrid CPUs do not exist.

I think AMD's game bar solution and Intel's APO are tacit admissions from both companies that they can't do anything about game developers' coding habits.

DrMrLordX · Dec 27, 2024

MS_AT said:
Rather game developers need to stop pretending that hybrid CPUs do not exist.

But not everyone has CPUs with heterogeneous core configurations. Plus do mobile developers need to worry about heterogeneous core configurations? Those have been omnipresent in the mobile space for years now. It seems like mobile ARM platforms handle such configurations better than do Windows desktops.

Win2012R2 · Dec 27, 2024

Joe NYC said:
I think the reason people want the 2 V-Cache configuration is because it would be a no-brainer for gaming, and would likely beat the 9800X3D in pretty much all cases (if the clock speed was just a little higher).

It's great for work too! Some workloads LOVE cache, this can increase perf way more than 5% extra frequency (which costs lots of extra power).

MS_AT · Dec 27, 2024

igor_kavinski said:
I think AMD's game bar solution and Intel's APO are tacit admissions from both companies that they can't do anything about game developers' coding habits.

They can: it's called money. I guess this is what Nvidia is using by offering software engineering support 😉

DrMrLordX said:
Plus do mobile developers need to worry about heterogeneous core configurations?

I would imagine they are using the more powerful cluster to run the games, and ignoring the A5xx cores altogether. Unless A5xx is all there is, but I am not familiar enough with Android to offer anything but a guess.

DrMrLordX · Dec 27, 2024

MS_AT said:
I would imagine they are using the more powerful cluster to run the games, and ignoring the A5xx cores altogether. Unless A5xx is all there is, but I am not familiar enough with Android to offer anything but a guess.

From what I understand they rely on software/firmware for scheduling but I'm not really sure about it either.

StefanR5R · Dec 28, 2024

Win2012R2 said:
"For HPC workloads AMD recommends
1) disabling SMT
2) engaging the "high performance" power profile,
3) running in performance determinism mode
4) running the respective CPU at its maximum configurable TDP (cTDP) value
5) running with four NUMA nodes per socket (4 NPS)"

#1-4 are obvious even without any guides (I've had them set for a long time on older EPYCs). I will give #5 a go once I get my hands on the new stuff; it's been suggested on here too.

1) There are indeed many HPC workloads which don't scale well, or even scale negatively, with SMT. But disabling SMT in the BIOS is not the correct answer to this. Instead, leave SMT on, determine the optimum program thread count of your workload on your hardware, and configure the application program accordingly. And, perhaps, either use Linux or give the Windows kernel some scheduling hints if it needs them.
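Determining the optimum thread count can be as simple as a sweep. A minimal sketch, assuming you supply a `measure(n)` function that runs your real workload with `n` threads and returns a throughput figure (higher is better); both names are placeholders for this example.

```python
def best_thread_count(measure, candidates):
    """Return (threads, throughput) for the best-performing candidate.

    `measure(n)` must run the actual workload with n threads and return
    its throughput. It is called once per candidate; repeat and average
    inside `measure` if run-to-run noise matters.
    """
    results = {n: measure(n) for n in candidates}
    best = max(results, key=results.get)
    return best, results[best]
```

On a 16c/32t part, sensible candidates would include at least 16 and 32, plus a few values in between for workloads that are limited by memory bandwidth rather than by cores.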

Regarding 3),

igor_kavinski said:
https://www.phoronix.com/review/amd-epyc-9005-determinism/7

Those results suggest power determinism is faster.

Yes, it generally is to some extent, depending on the outcome of silicon lottery.

Power determinism: If you have several CPUs of the same SKU, clock each of them so that they individually manage to make the most out of the power budget.
Individual CPUs are as fast as they individually can be. Performance differs between your CPUs to random extent.
Performance determinism: If you have several CPUs of the same SKU, clock them all about the same, such that the worst one doesn't exceed the power budget.
All your CPU's speeds are practically the same. Consequently you know upfront how to optimally distribute the work among them.

When AMD suggests that you stick with performance determinism, they assume (or maybe even state in the documentation) that you have several CPUs (dual-socket nodes and/or a multi-node cluster)¹ working on your problem and gain something from all partitions of the workload being solved with the same FLOPS.

¹) I haven't researched whether this also pertains to the multiple CCDs within a single CPU.
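A toy model of the two modes, to make the work-distribution point concrete. The clocks are invented numbers, and throughput is assumed to scale linearly with clock, which is an idealization.

```python
def run_time(clocks_ghz, shares):
    """Time until the slowest partition finishes (arbitrary units).

    CPU i receives shares[i] of the total work and processes it at a
    rate proportional to its clock.
    """
    return max(share / clock for share, clock in zip(shares, clocks_ghz))


# HYPOTHETICAL sustained clocks of four CPUs of one SKU in power determinism.
power_det = [3.9, 4.0, 4.1, 4.2]
# Performance determinism: every CPU is held to what the worst one manages.
perf_det = [min(power_det)] * len(power_det)

equal = [0.25] * 4
proportional = [c / sum(power_det) for c in power_det]

t_perf_equal = run_time(perf_det, equal)          # uniform CPUs, equal split
t_power_equal = run_time(power_det, equal)        # faster CPUs finish early and idle
t_power_prop = run_time(power_det, proportional)  # needs each CPU's speed known upfront
```

In this model an equal split finishes no sooner in power determinism than in performance determinism, because the job waits for the slowest CPU either way. Power determinism only pays off when the work can be balanced to each CPU's individual speed, for example by dynamic scheduling.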

As a side note, since power determinism lets an individual CPU use more power than when in performance determinism mode, this means that power efficiency in power determinism mode is a bit lower than in performance determinism mode... if we consider CPU socket power only. Maybe not if we consider the whole machine energy use per task. This is the same consideration as with upping the cTDP from default to max.

StefanR5R · Dec 28, 2024

gdansk said:
It would be an inferior gamer part than even the 9800X3D. And inferior 1T to 9950X. And an inferior MT to 9950X. But cost twice as much. So what's it for? And who is it for?

Others have already answered, but to put it in other words: in a variety of games, in several HPC/workstation use cases, and in several database use cases, it doesn't matter whether the CPU is clocking at 5.4 or at 5.7 GHz while it is waiting for memory requests to be fulfilled. Instead, it matters that more of such requests hit cache.
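A back-of-the-envelope model of that trade-off. All workload figures below are hypothetical and chosen only for illustration; the model also assumes cache misses are fully exposed, with no overlap between them.

```python
def ns_per_instruction(freq_ghz, core_cpi, misses_per_kilo_instr, dram_latency_ns):
    """Average time per instruction: core work plus stalls on DRAM.

    Core cycles get shorter as the clock rises; DRAM latency measured in
    nanoseconds does not.
    """
    core_ns = core_cpi / freq_ghz
    stall_ns = misses_per_kilo_instr / 1000 * dram_latency_ns
    return core_ns + stall_ns


# HYPOTHETICAL: the larger L3 cuts misses from 5 to 2 per 1000 instructions.
high_clock = ns_per_instruction(5.7, core_cpi=0.5,
                                misses_per_kilo_instr=5.0, dram_latency_ns=80.0)
big_cache = ns_per_instruction(5.4, core_cpi=0.5,
                               misses_per_kilo_instr=2.0, dram_latency_ns=80.0)
```

With these inputs the stall term dominates, so `big_cache` comes out well ahead of `high_clock`. For a workload whose miss rate does not improve with the larger L3, the higher clock wins instead.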

Furthermore, heterogeneous CPUs are generally a no-go in technical computing. You also tend to avoid such CPUs if running a bunch of VMs is your use case, unless you know beforehand that the VMs have heterogeneous performance requirements and how to spread them best across your heterogeneous CPU.

Win2012R2 said:
If 2nd non-3D chiplet got 5% faster clocks (5.7 vs 5.4) then overall that's like 2% diff for whole chip,

And that's just the core clock speed difference, not the application performance difference,

Win2012R2 said:
totally nothing for risks of bad thread management,

indeed!
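The clock arithmetic quoted above, worked through under the idealized assumptions that both CCDs contribute equally and that throughput scales linearly with clock:

```python
def mean_clock_advantage(fast_ghz, slow_ghz):
    """Relative gain in average core clock from one faster CCD out of two.

    Compares a mixed chip (one CCD at fast_ghz, one at slow_ghz) with a
    chip whose two CCDs both run at slow_ghz.
    """
    mixed = (fast_ghz + slow_ghz) / 2
    uniform = slow_ghz
    return mixed / uniform - 1


advantage = mean_clock_advantage(5.7, 5.4)  # a bit under 3 %
```

That is an upper bound on the all-core benefit in this model, since real applications rarely scale one-to-one with clock speed.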

gdansk said:
It reduces the "uplift" in 1T from 1.14x (allegedly, more like 1.1x) to 1.07x.

You are considering 1T workloads (or even few-T workloads) which scale very well with core clock speed and are well served by 1 MB L2$ + 32 MB L3$. Many workloads are like this. But this is a kind of workload which is not critical to the prospective buyer of a 16c/32t CPU with 16×1 MB L2$ and 2×96 MB L3$ and an according price tag.

gdansk said:
It would be denounced, rightfully, as a pointless money grab part. AMD shouldn't hurt their reputation in the only place they have a positive reception.

Almost sounds as if you were talking about AMD EPYC 4484PX/4584PX.

gdansk said:
Tiny niche.

AMD served even tinier niches before. (I am not saying they want to, or even will, serve this one.)

gdansk said:
What would actually be an improvement while using the same amount of silicon would be taking both of those 3D caches and stacking them on a single die. But TSMC/AMD aren't doing that.

This increased heterogeneity would be an improvement to matching heterogeneous workloads indeed. (As long as they are scheduled correctly.)

gdansk said:
Mixed is genuinely the better […] configuration.

Except whenever it is the worse configuration. — I am looking at high performance CPUs mostly from the technical computing angle, and I have two rhetorical questions to ask: Who cares about high CPU performance in areas in which high CPU performance is not critical? And in areas in which it is critical, who touches heterogeneous CPUs even with a ten foot pole?

gdansk said:
99.5% people asking for it are wrong to even want that configuration.

Yep, 99.5% people asking for it — I verified your head count and arrived at the same figure ;-) — are erring about their objective need for high performance CPUs.

Now, AMD's product development needs to target what people ask for, not what people objectively require. But as you pointed out, price brackets and production volumes play into this consideration.

aigomorla · Dec 30, 2024

Trying to get a 9800X3D reminds me of when I was trying to get a 3090 during COVID.
They are being scalped like there's no tomorrow, with prices as high as almost 800 dollars.

I even went to Micro Center to get one at the store, and right as I got there, they were all sold out.
The guy said there was even a line waiting for tickets, as I think they are buying them and flipping them on eBay.

Really annoying...

gdansk · Dec 30, 2024

StefanR5R said:
Yep, 99.5% people asking for it — I verified your head count and arrived at the same figure ;-) — are erring about their objective need for high performance CPUs.

Yes, they're wrong and AMD will show them by not supplying it. I'm sorry this keeps upsetting people who are wrong about what they want.

Consumer workloads are better off with one low latency CCD and one big cache CCD. Workstation might not be. But AMD has a mantra for you people: "go buy EPYC". Repeat until you stop imagining markets.

AMD didn't increase core counts. AMD won't increase the number of cache CCDs until well after 9800X3D supply improves. You may not like it but it's a better configuration for the majority of users, including people who want a 9800X3D, and AMD too.

Josh128 · Dec 30, 2024

gdansk said:
Yes, they're wrong and AMD will show them by not supplying it. I'm sorry this keeps upsetting people who are wrong about what they want.

Consumer workloads are better off with one low latency CCD and one big cache CCD. Workstation might not be. But AMD has a mantra for you people: "go buy EPYC". Repeat until you stop imagining markets.

AMD didn't increase core counts. AMD won't increase the number of cache CCDs until well after 9800X3D supply improves. You may not like it but it's a better configuration for the majority of users, including people who want a 9800X3D, and AMD too.

The future isn't 2 V-cached CCDs, it's 2 CCDs over a single unifying V-cache. 🙂

Josh128 · Dec 30, 2024

aigomorla said:
Trying to get a 9800X3D reminds me of when I was trying to get a 3090 during COVID.
They are being scalped like there's no tomorrow, with prices as high as almost 800 dollars.

I even went to Micro Center to get one at the store, and right as I got there, they were all sold out.
The guy said there was even a line waiting for tickets, as I think they are buying them and flipping them on eBay.

Really annoying...

There is one answer to this, and as a former scalper myself (sorry, PS3 launch), I can confidently declare: NEVER pay above MSRP for tech. F*** 'em, don't do it. A little self-control will put a world of hurt on scalpers. They can keep their stock until AMD restocks, and restocks again, until I get MSRP or better. This is the way.

Bigos · Dec 30, 2024

Josh128 said:
The future isn't 2 V-cached CCDs, it's 2 CCDs over a single unifying V-cache. 🙂

At that point it is one CCX over two CCDs, which seems unlikely. You cannot "extend" 2 private L3 caches onto a "unified" V-cache and keep it all as "L3".

You would need to design your CCD to work with a second one, extending the CCX over two dies by means of a cache die below them both which would connect the ring bus parts. Such a design would require both CCDs and the cache die to be present, meaning dedicated silicon for this specific configuration. That makes no sense unless AMD switches all of their client products to such a configuration, which would eliminate their low-end products (or, alternatively, make high-end products require special silicon and thus not be economically viable).

It would be interesting if the L3 cache + the ring bus were moved to the bottom die in all configurations. You could then have single-CCD configuration and dual-CCD configuration, with same CCD silicon on top but different CCX sizes. Assuming the reports of SoIC volume being low are correct, this does not seem viable for now, unfortunately.

Thunder 57 · Dec 30, 2024

Josh128 said:
There is one answer to this, and as a former scalper myself (sorry, PS3 launch), I can confidently declare: NEVER pay above MSRP for tech. F*** 'em, don't do it. A little self-control will put a world of hurt on scalpers. They can keep their stock until AMD restocks, and restocks again, until I get MSRP or better. This is the way.

Unfortunately, many lack self-control in this "instant gratification" world.

Saylick · Dec 30, 2024

Thunder 57 said:
Unfortunately, many lack self-control in this "instant gratification" world.

Lord Jensen thanks those without impulse control. It's what allows for >$2,000 flagship GPU prices.

DrMrLordX · Dec 30, 2024

If there aren't good day-1 sales on a tech item, it can torpedo the entire product. Look at how many people lost their minds over the Zen 5 launch. Over time, the 9950X et al. will probably rack up pretty good sales, despite the 9800X3D obviously cannibalizing a lot of those early sales via the Osborne effect. It still doesn't matter. People will declare Zen 5 to have been a dud, and if AMD didn't have X3D processors lined up and ready to go a few months later, it might have shifted some influential thinking on whether it's worth it for AMD to continue to service the enthusiast DIY PC crowd.

Thibsie · Dec 31, 2024

DrMrLordX said:
If there aren't good day-1 sales on a tech item, it can torpedo the entire product. Look at how many people lost their minds over the Zen 5 launch. Over time, the 9950X et al. will probably rack up pretty good sales, despite the 9800X3D obviously cannibalizing a lot of those early sales via the Osborne effect. It still doesn't matter. People will declare Zen 5 to have been a dud, and if AMD didn't have X3D processors lined up and ready to go a few months later, it might have shifted some influential thinking on whether it's worth it for AMD to continue to service the enthusiast DIY PC crowd.

Yep. Fortunately for AMD, Intel's products and behaviour are worse.

StefanR5R · Dec 31, 2024

gdansk said:
Consumer workloads are better off with one low latency CCD and one big cache CCD.

[This has been discussed to death by now, but: A) What has been semi-valid for Zen 4 is no longer going to be valid for Zen 5. B) They are "better off" only in a fantasy world in which operating systems have an omniscient task scheduler. In the real world, heterogeneous CPUs are in some economic respects preferable to homogeneous CPUs, but from the technical perspective they are nothing but kludges.]

gdansk said:
Workstation might not be. But AMD has a mantra for you people: "go buy EPYC".

The complete current wording of the mantra is "go buy EPYC, but not EPYC 4000". Earlier versions of the mantra also mentioned a "Threadripper" but this was a long time ago.

Win2012R2 · Dec 31, 2024

StefanR5R said:
"go buy EPYC, but not EPYC 4000"

Understandable: they can't charge too much for EPYC 4000, since it's just rebadged "consumer" stuff. Intel solved this problem in the past by removing ECC support, but since Zen 4/5 support it (if you get a suitable BIOS and motherboard), it's hard for AMD to do that.

gdansk · Dec 31, 2024

StefanR5R said:
[This has been discussed to death by now, but: A) What has been semi-valid for Zen 4 is no longer going to be valid for Zen 5. B) They are "better off" only in a fantasy world in which operating systems have an omniscient task scheduler. In the real world, heterogeneous CPUs are in some economic respects preferable to homogeneous CPUs, but from the technical perspective they are nothing but kludges.]

The complete current wording of the mantra is "go buy EPYC, but not EPYC 4000". Earlier versions of the mantra also mentioned a "Threadripper" but this was a long time ago.

Even the plain old 9950X is dependent on preferred-CCD scheduling for peak performance. We all lament that, but it has been the real world for months now.
EPYC 4004 doesn't have any dual cache CCD parts, but its successor stands the best chance (after sales channels are saturated with 9800X3D). Threadripper is not being refreshed frequently because self-identified workstation enthusiasts aren't a big market and most prefer used EPYC instead.

LightningZ71 · Dec 31, 2024

Josh128 said:
There is one answer to this, and as a former scalper myself (sorry, PS3 launch), I can confidently declare: NEVER pay above MSRP for tech. F*** 'em, don't do it. A little self-control will put a world of hurt on scalpers. They can keep their stock until AMD restocks, and restocks again, until I get MSRP or better. This is the way.

There are enough people out there with no practical limit on what they will spend on the new hotness, and no concern for the risks, that scalpers will always have a target market.
