Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


gdansk

Diamond Member
Feb 8, 2011
4,574
7,691
136
Narrowing the clock speed deficit from 500 MHz (7800X3D vs 7700X) to 300 MHz (9800X3D vs 9700X) was enough to make the 9800X3D approximately equal to the 9700X in non-gaming benchmarks.
That's with 142W PL (which it actually hits now unlike the 7800X3D) vs 88W. If you're looking at MT it's a terrible extrapolation to the 9950X3D. In 1T it's still generally behind and the same when power limits go out the window. It won't perform better in games than the 9800X3D without applying the same tweaks to prevent core migration that the plain 9950X uses. And if you're doing that anyway, why not have the more flexible approach with a lower latency, higher frequency CCD for the multitude of applications and some games that prefer it? At a lower cost too.
 
Jul 27, 2020
28,042
19,146
146
I think the reason people want the 2 V-Cache configuration is that it would be a no-brainer for gaming and would likely beat the 9800X3D in pretty much all cases (if the clock speed was just a little higher).
A dual-CCD V-cache part, even a 9900X3D with SMT off, has a better chance of beating the 9800X3D because more cache is available to all threads. Turning SMT off on the 9800X3D may tank performance in games that rely on more than 8 threads.
 
Jul 27, 2020
28,042
19,146
146
AMD needs to do something like Intel APO where they create specific profiles for games to pin their cache hungry threads to the V-cache CCD at all times and ensure that no such thread is allowed to be scheduled on the non-V-cache CCD.

Or just give people what they want (dual V-cache CCDs) and then they don't have to bother wasting time profiling every popular game under the sun and keep doing it for every new game.
 

coercitiv

Diamond Member
Jan 24, 2014
7,355
17,425
136
I'm with @gdansk on this one: dual V-cache is very tempting from a convenience PoV only as long as we ignore the dual-CCD nature of the product. The single V-cache die problem only makes the dual-CCD problem more obvious. It would be nice if they came up with a more robust software solution than the MS Game Bar "collab".
 

MS_AT

Senior member
Jul 15, 2024
869
1,765
96
AMD needs to do something like Intel APO where they create specific profiles for games to pin their cache hungry threads to the V-cache CCD at all times and ensure that no such thread is allowed to be scheduled on the non-V-cache CCD.
Rather, game developers need to stop pretending that hybrid CPUs do not exist. Every OS-side solution will have some shortcomings. APO requires that Intel maintains it, so it will always lag behind. Likewise for Game Bar. If the games are CPU-aware, they will be able to handle the scheduling correctly, as I think Windows already exposes all the relevant APIs.
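
As a rough illustration of what "CPU-aware" could mean in practice, here is a minimal sketch that picks the logical CPUs sitting behind the largest L3 cache and restricts the process to them. It is not how any shipping game or the Game Bar does it. The topology table is hypothetical (8 CPUs behind 96 MiB, 8 behind 32 MiB), the function names are made up for this example, and the pinning call is Linux's `os.sched_setaffinity`, used here as a stand-in for the Windows APIs mentioned above.

```python
import os

def largest_l3_domain(l3_by_cpu):
    """Return the sorted CPU ids that sit behind the largest L3 cache.

    l3_by_cpu maps a logical CPU id to (l3_domain_id, l3_size_kib).
    The caller supplies the mapping; the example values below are
    hypothetical, not read from real hardware.
    """
    domains = {}
    for cpu, (domain, size_kib) in l3_by_cpu.items():
        size, cpus = domains.get(domain, (size_kib, []))
        cpus.append(cpu)
        domains[domain] = (max(size, size_kib), cpus)
    # Pick the domain with the biggest L3 (first one wins on a tie).
    best = max(domains.values(), key=lambda entry: entry[0])
    return sorted(best[1])

def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpus))
        return True
    return False  # platform without this call: nothing is changed

# Hypothetical layout: CPUs 0-7 share a 96 MiB L3, CPUs 8-15 a 32 MiB L3.
example_topology = {cpu: (0, 96 * 1024) for cpu in range(8)}
example_topology.update({cpu: (1, 32 * 1024) for cpu in range(8, 16)})
cache_rich_cpus = largest_l3_domain(example_topology)
```

A real application would build the table from the OS topology interfaces and would pin only its cache-hungry threads, not the whole process.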
 

DrMrLordX

Lifer
Apr 27, 2000
22,905
12,974
136
Rather game developers need to stop pretending that hybrid CPUs do not exist.
But not everyone has CPUs with heterogeneous core configurations. Plus do mobile developers need to worry about heterogeneous core configurations? Those have been omnipresent in the mobile space for years now. It seems like mobile ARM platforms handle such configurations better than do Windows desktops.
 

Win2012R2

Golden Member
Dec 5, 2024
1,209
1,248
96
I think the reason people want the 2 V-Cache configuration is that it would be a no-brainer for gaming and would likely beat the 9800X3D in pretty much all cases (if the clock speed was just a little higher).
It's great for work too! Some workloads LOVE cache, this can increase perf way more than 5% extra frequency (which costs lots of extra power).
 

MS_AT

Senior member
Jul 15, 2024
869
1,765
96
I think AMD's game bar solution and Intel's APO are tacit admissions from both companies that they can't do anything about game developers' coding habits.
They can: it's called money. I guess this is what NVIDIA is doing by offering software engineering support ;)
Plus do mobile developers need to worry about heterogeneous core configurations?
I would imagine they are using the more powerful cluster to run the games, and ignore the A5xx cores altogether. Unless A5xx is all there is, but I am not familiar enough with Android to offer anything but a guess.
 

DrMrLordX

Lifer
Apr 27, 2000
22,905
12,974
136
I would imagine they are using the more powerful cluster to run the games, and ignore the A5xx cores altogether. Unless A5xx is all there is, but I am not familiar enough with Android to offer anything but a guess.
From what I understand they rely on software/firmware for scheduling but I'm not really sure about it either.
 

StefanR5R

Elite Member
Dec 10, 2016
6,675
10,576
136
"For HPC workloads AMD recommends
1) disabling SMT
2) engaging the "high performance" power profile,
3) running in performance determinism mode
4) running the respective CPU at its maximum configurable TDP (cTDP) value
5) running with four NUMA nodes per socket (4 NPS)"


#1-4 are obvious even without any guides (had them for long time on older EPYCs), will give #5 a go once I get my hands on new stuff, been suggested on here too.
1) There are indeed many HPC workloads which scale poorly or even negatively with SMT. But disabling SMT in the BIOS is not the correct answer to this. Instead, leave SMT on, determine the optimum program thread count of your workload on your hardware, and configure the application program accordingly. And, perhaps, either use Linux or give the Windows kernel some scheduling hints if it needs them.
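
Determining the optimum thread count can be as simple as a sweep over a few candidates. A minimal sketch, assuming the workload can be launched through some callable that takes the thread count; `time_run`, `best_thread_count` and the timings in the example are hypothetical, not measurements:

```python
import time

def time_run(run, n_threads, repeats=3):
    """Best-of-N wall time, in seconds, of run(n_threads)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(n_threads)
        best = min(best, time.perf_counter() - start)
    return best

def best_thread_count(candidates, measure):
    """Return (n_threads, seconds) for the fastest candidate.

    measure(n) must return the run time for n threads, for example
    lambda n: time_run(my_workload, n) with a real workload.
    """
    timings = {n: measure(n) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings[best]

# Made-up timings for a 16c/32t part where SMT does not help this workload.
fake_timings = {8: 10.0, 16: 6.0, 32: 6.5}
choice = best_thread_count([8, 16, 32], fake_timings.get)
```

With SMT left on, the result of such a sweep goes into the application's thread setting, so other workloads on the same machine can still use all hardware threads.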

Regarding 3),
https://www.phoronix.com/review/amd-epyc-9005-determinism/7

Those results suggest power determinism is faster.
Yes, it generally is to some extent, depending on the outcome of silicon lottery.
  • Power determinism: If you have several CPUs of the same SKU, clock each of them so that they individually manage to make the most out of the power budget.
    Individual CPUs are as fast as they individually can be. Performance differs between your CPUs to a random extent.
  • Performance determinism: If you have several CPUs of the same SKU, clock them all about the same, such that the worst one doesn't exceed the power budget.
    All your CPUs' speeds are practically the same. Consequently you know upfront how to optimally distribute the work among them.
When AMD suggests that you stick with performance determinism, they assume (or maybe even state in the documentation) that you have several CPUs (dual socket nodes and/or a multi node cluster)¹ working on your problem and gain something from all partitions of the workload being solved with the same FLOPS.
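
A toy model of why equal speeds matter when the work is partitioned upfront. The relative speeds are invented for illustration, and the model assumes that in performance determinism mode every CPU runs at the speed of the slowest sample:

```python
def makespan_equal_split(total_work, speeds):
    """Finish time when every CPU gets the same share of the work."""
    share = total_work / len(speeds)
    return max(share / speed for speed in speeds)

def makespan_ideal_split(total_work, speeds):
    """Finish time when shares are proportional to each CPU's speed."""
    return total_work / sum(speeds)

# Hypothetical relative speeds of four CPUs of the same SKU.
power_determinism = [1.00, 1.04, 1.08, 1.02]  # each as fast as it can be
perf_determinism = [1.00, 1.00, 1.00, 1.00]   # all clocked like the worst

equal_power = makespan_equal_split(400, power_determinism)
equal_perf = makespan_equal_split(400, perf_determinism)
ideal_power = makespan_ideal_split(400, power_determinism)
```

With an equal split, both modes finish when the slowest CPU finishes, so the extra speed of the better samples is wasted. Power determinism only pays off if the job can balance the load according to speeds that are not known in advance.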

¹) I haven't researched whether this also pertains to the multiple CCDs within a single CPU.

As a side note, since power determinism lets an individual CPU use more power than when in performance determinism mode, this means that power efficiency in power determinism mode is a bit lower than in performance determinism mode... if we consider CPU socket power only. Maybe not if we consider the whole machine energy use per task. This is the same consideration as with upping the cTDP from default to max.
 

StefanR5R

Elite Member
Dec 10, 2016
6,675
10,576
136
It would be an inferior gamer part than even the 9800X3D. And inferior 1T to 9950X. And an inferior MT to 9950X. But cost twice as much. So what's it for? And who is it for?
Others have already answered. But I respond in other words: In a variety of games, in several HPC/workstation use cases, and in several database use cases, it doesn't matter whether the CPU is clocking at 5.4 or at 5.7 GHz while it is waiting for memory requests to be fulfilled. Instead, it matters that more of such requests hit cache.
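
A back-of-the-envelope model of that argument. Only the 5.4 and 5.7 GHz clocks come from this discussion; the cycle count, request count, miss rates and DRAM latency are invented, and the model ignores cache hit latency and any overlap between compute and memory stalls:

```python
def run_time_ns(core_cycles, clock_ghz, mem_requests, miss_rate, dram_ns):
    """Compute time plus the time spent waiting on requests that miss cache."""
    compute_ns = core_cycles / clock_ghz
    stall_ns = mem_requests * miss_rate * dram_ns
    return compute_ns + stall_ns

# Hypothetical memory-bound workload.
work = dict(core_cycles=1e9, mem_requests=1e7, dram_ns=80.0)

fast_small_cache = run_time_ns(clock_ghz=5.7, miss_rate=0.5, **work)
slow_big_cache = run_time_ns(clock_ghz=5.4, miss_rate=0.3, **work)
```

Under these assumptions the 5.4 GHz run with the lower miss rate finishes well ahead of the 5.7 GHz run. For a workload with few cache misses the ordering flips, which is the other side of this argument.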

Furthermore, heterogeneous CPUs are generally a no-go in technical computing. You also tend to avoid such CPUs if running a bunch of VMs is your use case, unless you know beforehand that the VMs have heterogeneous performance requirements and how to spread them best across your heterogeneous CPU.

If 2nd non-3D chiplet got 5% faster clocks (5.7 vs 5.4) then overall that's like 2% diff for whole chip,
And that's just the core clock speed difference, not the application performance difference,
totally nothing for risks of bad thread management,
indeed!
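
The arithmetic behind the quoted estimate, as a small sketch. It assumes both CCDs are weighted equally and that performance scales perfectly with clock speed, so the result is an upper bound on the aggregate gain; `aggregate_clock_gain` is a name made up for this example:

```python
def aggregate_clock_gain(cache_ccd_ghz, fast_ccd_ghz):
    """Average clock gain of a mixed part over one with two cache CCDs."""
    mixed_average = (cache_ccd_ghz + fast_ccd_ghz) / 2
    return mixed_average / cache_ccd_ghz - 1

gain = aggregate_clock_gain(5.4, 5.7)  # clocks quoted in the discussion
```

That comes to a little under 3% in average core clock, in the same ballpark as the quoted "like 2%", and application performance would typically gain less than that.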

It reduces the "uplift" in 1T from 1.14x (allegedly, more like 1.1x) to 1.07x.
You are considering 1T workloads (or even few-T workloads) which scale very well with core clock speed and are well served by 1 MB L2$ + 32 MB L3$. Many workloads are like this. But this is a kind of workload which is not critical to the prospective buyer of a 16c/32t CPU with 16×1 MB L2$ and 2×96 MB L3$ and an according price tag.

It would be denounced, rightfully, as a pointless money grab part. AMD shouldn't hurt their reputation in the only place they have a positive reception.
Almost sounds as if you were talking about AMD EPYC 4484PX/4584PX.

Tiny niche.
AMD served even tinier niches before. (I am not saying they want to, or even will, serve this one.)

What would actually be an improvement while using the same amount of silicon would be taking both of those 3D caches and stacking them on a single die. But TSMC/AMD aren't doing that.
This increased heterogeneity would be an improvement to matching heterogeneous workloads indeed. (As long as they are scheduled correctly.)

Mixed is genuinely the better […] configuration.
Except whenever it is the worse configuration. — I am looking at high performance CPUs mostly from the technical computing angle, and I have two rhetorical questions to ask: Who cares about high CPU performance in areas in which high CPU performance is not critical? And in areas in which it is critical, who touches heterogeneous CPUs even with a ten foot pole?

99.5% people asking for it are wrong to even want that configuration.
Yep, 99.5% people asking for it — I verified your head count and arrived at the same figure ;-) — are erring about their objective need for high performance CPUs.

Now, AMD's product development needs to target what people ask for, not what people objectively require. But as you pointed out, price brackets and production volumes play into this consideration.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,067
3,574
126
Trying to get a 9800X3D reminds me of when I was trying to get a 3090 during COVID.
They are being scalped like there's no tomorrow, with prices as high as almost 800 dollars.

I even went to Micro Center to get one at the store, and right as I got there, they were all sold out.
The guy said there was even a line waiting for tickets, as I think they are buying them and flipping them on eBay.

Really annoying...
 

gdansk

Diamond Member
Feb 8, 2011
4,574
7,691
136
Yep, 99.5% people asking for it — I verified your head count and arrived at the same figure ;-) — are erring about their objective need for high performance CPUs.
Yes, they're wrong and AMD will show them by not supplying it. I'm sorry this keeps upsetting people who are wrong about what they want.

Consumer workloads are better off with one low latency CCD and one big cache CCD. Workstation might not be. But AMD has a mantra for you people: "go buy EPYC". Repeat until you stop imagining markets.

AMD didn't increase core counts. AMD won't increase the number of cache CCDs until well after 9800X3D supply improves. You may not like it but it's a better configuration for the majority of users, including people who want a 9800X3D, and AMD too.
 

Josh128

Golden Member
Oct 14, 2022
1,324
1,996
106
Yes, they're wrong and AMD will show them by not supplying it. I'm sorry this keeps upsetting people who are wrong about what they want.

Consumer workloads are better off with one low latency CCD and one big cache CCD. Workstation might not be. But AMD has a mantra for you people: "go buy EPYC". Repeat until you stop imagining markets.

AMD didn't increase core counts. AMD won't increase the number of cache CCDs until well after 9800X3D supply improves. You may not like it but it's a better configuration for the majority of users, including people who want a 9800X3D, and AMD too.
The future isn't 2 V-cached CCDs, it's 2 CCDs over a single unifying V-cache. :)
 

Josh128

Golden Member
Oct 14, 2022
1,324
1,996
106
Trying to get a 9800X3D reminds me of when I was trying to get a 3090 during COVID.
They are being scalped like there's no tomorrow, with prices as high as almost 800 dollars.

I even went to Micro Center to get one at the store, and right as I got there, they were all sold out.
The guy said there was even a line waiting for tickets, as I think they are buying them and flipping them on eBay.

Really annoying...
There is one answer to this, and as a former scalper myself (sorry, PS3 launch), I can confidently declare: NEVER pay above MSRP for tech. F***k 'em, don't do it. A little self-control will put a world of hurt on scalpers. They can keep their stock until AMD restocks, and restocks again until I get MSRP or better. This is the way.
 

Bigos

Senior member
Jun 2, 2019
204
519
136
The future isn't 2 V-cached CCDs, it's 2 CCDs over a single unifying V-cache. :)
At that point it is one CCX over two CCDs, which seems unlikely. You cannot "extend" 2 private L3 caches onto a "unified" V-cache and keep it all as "L3".

You would need to design your CCD to work with a second one to extend the CCX over two dies, using a cache die below them both which would connect the ring bus parts. Such a design would require both CCDs and the cache die to be present, meaning dedicated silicon for this specific configuration. That makes no sense unless AMD switches all of their client products to such a configuration, which would eliminate their low-end products (or, alternatively, make high-end products require special silicon and thus not be economically viable).

It would be interesting if the L3 cache and the ring bus were moved to the bottom die in all configurations. You could then have a single-CCD configuration and a dual-CCD configuration, with the same CCD silicon on top but different CCX sizes. Assuming the reports of SoIC volume being low are correct, this does not seem viable for now, unfortunately.
 

Thunder 57

Diamond Member
Aug 19, 2007
4,034
6,748
136
There is one answer to this, and as a former scalper myself (sorry, PS3 launch), I can confidently declare: NEVER pay above MSRP for tech. F***k 'em, don't do it. A little self-control will put a world of hurt on scalpers. They can keep their stock until AMD restocks, and restocks again until I get MSRP or better. This is the way.

Unfortunately, many lack self-control in this "instant gratification" world.
 

DrMrLordX

Lifer
Apr 27, 2000
22,905
12,974
136
If there aren't good day 1 sales on a tech item, it can torpedo the entire product. Look at how many people lost their minds over the Zen 5 launch. Over time, the 9950X et al will probably rack up pretty good sales, despite the 9800X3D obviously cannibalizing a lot of those early sales via the Osborne effect. It still doesn't matter. People will declare Zen 5 to have been a dud, and if AMD didn't have X3D processors lined up and ready to go a few months later, it might have shifted some influential thinking wrt whether it's worth it for AMD to continue to service the enthusiast DIY PC crowd.
 

Thibsie

Golden Member
Apr 25, 2017
1,127
1,334
136
If there aren't good day 1 sales on a tech item, it can torpedo the entire product. Look at how many people lost their minds over the Zen 5 launch. Over time, the 9950X et al will probably rack up pretty good sales, despite the 9800X3D obviously cannibalizing a lot of those early sales via the Osborne effect. It still doesn't matter. People will declare Zen 5 to have been a dud, and if AMD didn't have X3D processors lined up and ready to go a few months later, it might have shifted some influential thinking wrt whether it's worth it for AMD to continue to service the enthusiast DIY PC crowd.
Yep. Fortunately for AMD, Intel's products and behaviour are worse.
 

StefanR5R

Elite Member
Dec 10, 2016
6,675
10,576
136
Consumer workloads are better off with one low latency CCD and one big cache CCD.
[This has been discussed to death by now, but: A) What has been semi-valid for Zen 4 is no longer going to be valid for Zen 5. B) They are "better off" only in a fantasy world in which operating systems have an omniscient task scheduler. In the real world, heterogeneous CPUs are in some economic respects preferable to homogeneous CPUs, but from the technical perspective they are nothing but kludges.]
Workstation might not be. But AMD has a mantra for you people: "go buy EPYC".
The complete current wording of the mantra is "go buy EPYC, but not EPYC 4000". Earlier versions of the mantra also mentioned a "Threadripper" but this was a long time ago.
 

Win2012R2

Golden Member
Dec 5, 2024
1,209
1,248
96
"go buy EPYC, but not EPYC 4000"
Understandable, since they can't charge too much for EPYC 4000 when it's just rebadged "consumer" stuff. Intel solved this problem in the past by removing support for ECC, but since Zen 4/5 support it (if you get the right BIOS and motherboard) it's hard for AMD to do that.
 

gdansk

Diamond Member
Feb 8, 2011
4,574
7,691
136
[This has been discussed to death by now, but: A) What has been semi-valid for Zen 4 is no longer going to be valid for Zen 5. B) They are "better off" only in a fantasy world in which operating systems have an omniscient task scheduler. In the real world, heterogeneous CPUs are in some economic respects preferable to homogeneous CPUs, but from the technical perspective they are nothing but kludges.]

The complete current wording of the mantra is "go buy EPYC, but not EPYC 4000". Earlier versions of the mantra also mentioned a "Threadripper" but this was a long time ago.
Even the plain old 9950X is dependent on preferred CCD scheduling for peak performance. We all lament that, but it has been the real world for months now.
EPYC 4004 doesn't have any dual cache CCD parts, but its successor stands the best chance (after sales channels are saturated with 9800X3D). Threadripper is not being refreshed frequently because self-identified workstation enthusiasts aren't a big market and most prefer used EPYC instead.
 