Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

cherullo · Jul 5, 2023

OK, here goes nothing: the workload where SMT is not very helpful is gaming.
So you can have, say, 4 SMT-less, normal-sized cores able to reach 5 GHz running the game's main thread and other high-priority stuff for that sweet 120 Hz goodness, and 8 dense, SMT-capable cores for throughput, running around 3 GHz.
Now, you'll want some more cores for OS background tasks; let's say 4 dense cores for this. You organize this into two 2+6 CCXs, which also works well for low-end mobile, and there you go.

RnR_au · Jul 5, 2023

Kepler_L2 said:
Yup it's Styx.

"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."

Wait... wut? Codenames are confusing...

naukkis · Jul 5, 2023

Tuna-Fish said:
When AMD ships c and normal cores as part of a single CPU, this is not true for them. If both cores have SMT2, when every core is loaded, the c cores have significantly lower throughput, assuming they clock significantly lower. If SMT2 is disabled for them, this would be fixed, and in the client world where the Windows scheduler determines how good your product is, it might be worthwhile to match what Intel does so that the systems work the same.

What they want to do is disable SMT on the fast cores, so that loads that prefer a single fast thread can always get the best-performing cores. Slow cores can keep SMT enabled, since they are used only for high-throughput cases, when all slow cores are already in use and SMT can bring additional throughput. SMT-enabled fast cores would do exactly the opposite: a non-SMT slow core would then be faster than a fast core running two threads, which complicates finding the fastest core for high-priority threads.
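A toy model of that last point, as a minimal sketch: it assumes, as the posts above do, that SMT splits a core's speed evenly between its threads (real SMT usually adds some total throughput), and the core speeds are made-up relative numbers, not Zen 5 figures. `Core` and `fastest_free_slot` are hypothetical names, not any OS scheduler's API.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    speed: float      # hypothetical relative single-thread speed
    smt_ways: int     # 1 = SMT off, 2 = SMT2
    running: int = 0  # threads already placed on this core

    def per_thread_speed_if_added(self) -> float:
        # Toy assumption: the core's speed is split evenly among its threads.
        return self.speed / (self.running + 1)


def fastest_free_slot(cores: list[Core]) -> Core:
    """Pick the core that would give one more thread the highest per-thread speed."""
    candidates = [c for c in cores if c.running < c.smt_ways]
    if not candidates:
        raise RuntimeError("no free hardware thread")
    return max(candidates, key=lambda c: c.per_thread_speed_if_added())


fast = Core("fast", speed=1.0, smt_ways=2, running=1)  # fast core already busy
slow = Core("slow", speed=0.7, smt_ways=1)             # idle non-SMT slow core
print(fastest_free_slot([fast, slow]).name)            # slow (0.7 > 1.0 / 2)
```

With SMT on the fast core, the best choice flips as soon as that core holds one thread, which is the extra case the scheduler has to reason about. With `smt_ways=1` on the fast core, a busy fast core is simply not a candidate.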

Kepler_L2 · Jul 5, 2023

RnR_au said:
"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."

Wait... wut? Codenames are confusing...

Was that ever released? Maybe they just felt like reusing the codename. Anyway, this info is from an old-ish roadmap; about 1/3 of the Zen 4/Zen 5 products there have been canned, so I wouldn't put too much faith in it being the final codename.

Abwx · Jul 5, 2023

Tuna-Fish said:
This is not about disabling HT through software, it's about asking if the core is capable of HT in the first place.

And the reason is probably making scheduling easier. Right now, on Intel the small core/big core divide is not as much of a problem as it could be for scheduling, because a small core is pretty close in performance to one HT thread of the big core. That is, if all threads on the CPU are loaded, they are all about equally fast. The problem case for schedulers is when the available parallelism of a multithreaded workload is greater than the number of big cores, but not unbounded.

When AMD ships c and normal cores as part of a single CPU, this is not true for them. If both cores have SMT2, when every core is loaded, the c cores have significantly lower throughput, assuming they clock significantly lower. If SMT2 is disabled for them, this would be fixed, and in the client world where the Windows scheduler determines how good your product is, it might be worthwhile to match what Intel does so that the systems work the same.

SMT is used only when the CPU runs out of physical cores for the thread count. With Bergamo, the 128 cores will be used for the first 128 threads, and SMT is used for threads 129 to 256. This way, up to 128 threads you get optimal ST performance, quite a bit higher than what a small core provides. It's not like such a CPU is always working at a full 256-thread load.
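The fill order described above can be sketched as follows. This is a simplified model assuming a policy that fills every core's first hardware thread before any SMT sibling; real OS schedulers also weigh caches, NUMA, and power. `place_threads` and `threads_sharing_a_core` are hypothetical helpers, and 128 cores with SMT2 is the Bergamo configuration from the post.

```python
from collections import Counter


def place_threads(num_threads: int, num_cores: int, smt_ways: int = 2) -> list[tuple[int, int]]:
    """Return (core, smt_slot) for each thread, using every core's slot 0
    before touching any SMT sibling slot."""
    capacity = num_cores * smt_ways
    if num_threads > capacity:
        raise ValueError(f"{num_threads} threads exceed {capacity} hardware threads")
    # Threads 0..num_cores-1 land on slot 0 of each core, the next num_cores on slot 1, etc.
    return [(t % num_cores, t // num_cores) for t in range(num_threads)]


def threads_sharing_a_core(num_threads: int, num_cores: int, smt_ways: int = 2) -> int:
    """Count threads that share their physical core with another thread."""
    per_core = Counter(core for core, _slot in place_threads(num_threads, num_cores, smt_ways))
    return sum(n for n in per_core.values() if n > 1)


print(threads_sharing_a_core(128, 128))  # 0: every thread has a core to itself
print(threads_sharing_a_core(129, 128))  # 2: thread 129 shares core 0
print(threads_sharing_a_core(256, 128))  # 256: fully loaded, every core runs two threads
```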

RnR_au · Jul 5, 2023

Kepler_L2 said:
Was that ever released?

Briefly, apparently...

Amid the renewed interest in Arm-based servers, it is easy to forget that one company with experience in building server platforms actually brought to market its own Arm-based processor before apparently losing interest: AMD.

Now it has emerged that Jim Keller, a key architect who worked on Arm development at AMD, reckons the chipmaker was wrong to halt the project after he left the company in 2016.

https://www.theregister.com/2022/06/20/jim_keller_arm_cpu/

Exist50 · Jul 5, 2023

Tuna-Fish said:
And the reason is probably making scheduling easier. Right now, on Intel the small core/big core divide is not as much of a problem as it could be for scheduling, because a small core is pretty close in performance to one HT thread of the big core. That is, if all threads on the CPU are loaded, they are all about equally fast. The problem case for schedulers is when the available parallelism of a multithreaded workload is greater than the number of big cores, but not unbounded.

Even if the per-thread performance ends up similar, "Thread Director" definitely prioritizes SMT threads last. But in this case, I'm interested in what exactly is making the scheduling decisions. Sounds like they're making this threading info OS visible to assist that side of the scheduling algorithm?

Ajay · Jul 5, 2023

HurleyBird said:
Well, this is going to depend a lot on implementation details, but you got that generally backwards. Increased ILP will not scale anywhere near 1:1 as you add more resources to the core. Everything else being equal, when you have more resources sitting idle more often, that is beneficial for SMT.

More ILP is only worthwhile when the extraction (execution) rate increases; then SMT sees less utilization. But this is heavily implementation-dependent and benefits from better-tuned machine code. I'm not saying SMT won’t be used, but there are ways to make it a better choice; rewriting billions of lines of x86 code just isn’t an option. It's different in the macOS/iOS world.

Exist50 · Jul 5, 2023

RnR_au said:
"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."

Kepler_L2 said:
Was that ever released? Maybe they just felt like reusing the codename.

I think the "roadmap" that quote is referencing ended up being a very well done fake. It had a bunch of stuff like ARM consumer chips. You probably still see some old articles thinking it was real.

RnR_au said:
https://www.theregister.com/2022/06/20/jim_keller_arm_cpu/

The only ARM server AMD kinda-sorta released was the Opteron A1100. https://www.anandtech.com/show/8362/amds-big-bet-on-arm-powered-servers-a1100-revealed

And at least publicly, they never gave the Styx name to either that, nor anything K12. I only guessed that because it would be perfectly in keeping with the Greek underworld naming scheme that team likes.

naukkis · Jul 5, 2023

Exist50 said:
Even if the per-thread performance ends up similar, "Thread Director" definitely prioritizes SMT threads last. But in this case, I'm interested in what exactly is making the scheduling decisions. Sounds like they're making this threading info OS visible to assist that side of the scheduling algorithm?

The whole point of hybrid designs is to differentiate cores by speed. High-priority threads get scheduled to fast cores and low-priority threads to slow cores. Slowing down the fast cores by splitting their execution speed in half with SMT is exactly the opposite of what they are trying to achieve. If per-thread execution speed can be sacrificed for MT performance, TDP-constrained hybrid designs will most likely achieve the best results by shutting down the priority cores entirely and giving the whole available TDP to the efficiency cores, or at least by scaling the priority cores' frequency down to the point where they are much slower than the efficiency cores if SMT is used.
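A minimal sketch of the priority split described above, assuming the fast cores run with SMT off (one thread each) and the slow cores expose a fixed number of hardware threads. It ignores TDP, migration, and load changes; `assign_by_priority` and the thread names are hypothetical.

```python
def assign_by_priority(threads: list[tuple[str, int]], fast_cores: int, slow_slots: int) -> dict[str, str]:
    """Map each thread name to 'fast', 'slow', or 'queued'.

    threads:    (name, priority) pairs; a higher number means more important.
    fast_cores: fast cores with SMT off, so one thread per core.
    slow_slots: hardware threads on the slow cores (cores x SMT ways).
    """
    # sorted() is stable, so threads with equal priority keep their input order.
    ordered = sorted(threads, key=lambda t: t[1], reverse=True)
    placement = {}
    for i, (name, _prio) in enumerate(ordered):
        if i < fast_cores:
            placement[name] = "fast"
        elif i < fast_cores + slow_slots:
            placement[name] = "slow"
        else:
            placement[name] = "queued"  # no free hardware thread right now
    return placement


threads = [("render", 5), ("game_main", 10), ("audio", 7), ("indexer", 1), ("updater", 0)]
print(assign_by_priority(threads, fast_cores=2, slow_slots=2))
# {'game_main': 'fast', 'audio': 'fast', 'render': 'slow', 'indexer': 'slow', 'updater': 'queued'}
```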

adroc_thurston · Jul 5, 2023

RnR_au said:
Wait... wut? Codenames are confusing...

AMD loves reusing codenames.

Durango is two whole AMD parts.

A/// · Jul 5, 2023

adroc_thurston said:
AMD loves reusing codenames.

Durango is two whole AMD parts.

Bestest way to describe Dodge's cars when they first began making Durangos. Or now, I'm sure.

yuri69 · Jul 6, 2023

Kepler_L2 said:
Anyway, this info is from an old-ish roadmap; about 1/3 of the Zen 4/Zen 5 products there have been canned, so I wouldn't put too much faith in it being the final codename.

This is interesting. What was canned from the Zen 4 line? We got the same stuff we did for the previous gens plus the "c" core variant, right?

I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.

Kepler_L2 · Jul 6, 2023

yuri69 said:
This is interesting. What was canned from the Zen 4 line? We got the same stuff we did for the previous gens plus the "c" core variant, right?

I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.

There was supposed to be a Phoenix refresh on 4nm, because originally PHX1 was going to be 5nm.

soresu · Jul 6, 2023

adroc_thurston said:
Durango is two whole AMD parts.

To be fair I'm pretty sure that one is as much a Microsoft part as an AMD one.

soresu · Jul 6, 2023

yuri69 said:
I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.

So Bergamo uses the same IOD as Genoa then?

A/// · Jul 6, 2023

AMD Begins Sending "Family 26" Linux Patches For Apparent Zen 5 CPUs - Phoronix


adroc_thurston · Jul 6, 2023

soresu said:
So Bergamo uses the same IOD as Genoa then?

Correct.

A/// said:
AMD Begins Sending "Family 26" Linux Patches For Apparent Zen 5 CPUs - Phoronix


Yea, meet your future overlord.

Markfw · Jul 6, 2023

adroc_thurston said:
Correct.

Yea, meet your future overlord.

Maybe not me. I have a 9654 and two 9554 Genoas... Waiting to see how much better Zen 5 is.

adroc_thurston · Jul 6, 2023

Markfw said:
Waiting to see how much better Zen 5 is.

A lot but you pay a power tax.

Ajay · Jul 6, 2023

adroc_thurston said:
A lot but you pay a power tax.

When Mark turns all his computers on, the lights in the neighborhood dim.

adroc_thurston · Jul 6, 2023

Ajay said:
When Mark turns all his computers on, the lights in the neighborhood dim

He should buy a few DGX boxes for a true test of local power grid.
:^)

A/// · Jul 7, 2023

adroc_thurston said:
Yea, meet your future overlord.

Lisa Su is a dominatrix?

TESKATLIPOKA · Jul 7, 2023

cherullo said:
OK, here goes nothing: the workload where SMT is not very helpful is gaming.
So you can have, say, 4 SMT-less, normal-sized cores able to reach 5 GHz running the game's main thread and other high-priority stuff for that sweet 120 Hz goodness, and 8 dense, SMT-capable cores for throughput, running around 3 GHz.
Now, you'll want some more cores for OS background tasks; let's say 4 dense cores for this. You organize this into two 2+6 CCXs, which also works well for low-end mobile, and there you go.

Games are a workload where HT (SMT) helps a lot if you have a low CPU core count. Of course, if you have 8 or more cores, it helps little.

jamescox · Jul 7, 2023

BorisTheBlade82 said:
Yep, and it is quite likely that they will, for low core count / high bandwidth needs. They did the same for Genoa already with their 4 CCD SKUs.

The Genoa 4-CCD devices don't make too much sense; or at least, it doesn't seem to make much sense for them to have dual GMI links. They are relatively low-clocked parts, but they do have 6 to 8 cores, so it is unclear whether they can actually consume that much bandwidth. Taking an 8-CCD F-series part and connecting its CCDs with dual GMI links would seem to make more sense, although those parts have a lower core count per CCD and more cache per core.

If they do have 16 GMI links, then why aren't they available now? Are the current IO dies salvage parts with disabled links? Or will there be a new version of the IO die instead? It is a relatively large chip on 6 nm, so defective parts may be more likely than with the 14 nm IO die.
