Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

cherullo

Member
May 19, 2019
55
126
106
Ok, here goes nothing: the workload where SMT is not very helpful is gaming.
So you can have, say, 4 SMT-less, normal-sized cores able to reach 5 GHz running the game's main thread and other high-priority stuff for that sweet 120 Hz goodness, and 8 dense, SMT-capable cores for throughput, running around 3 GHz.
Now, you'll want some more cores for OS background tasks, let's say 4 dense cores for this. You organize this in two 2+6 CCXs which also works well for low-end mobile, and there you go.
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
When AMD ships c and normal cores as part of a single CPU, this is not true for them. If both cores have SMT2, when every core is loaded, the c cores have significantly lower throughput, assuming they clock significantly lower. If SMT2 is disabled for them, this would be fixed, and in the client world where the windows scheduler determines how good your product is, it might be worthwhile to match what Intel does so that the systems work the same.

What they want to do is disable SMT on the fast cores, so that loads that prefer a single thread can always get the best-performing cores. Slow cores can keep SMT enabled, because their SMT siblings are used only in high-throughput cases, when all slow cores are already in use and SMT can bring additional throughput. SMT-enabled fast cores would do exactly the opposite: a non-SMT slow core could then be faster than a fast core running two threads, which complicates finding the fastest core for high-priority threads.
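A rough sketch of that preference order, just to make it concrete. Everything here is an assumption for illustration: the 4 fast + 12 slow split is borrowed from the hypothetical layout earlier in the thread, and `HwThread` and `preference_order` are made-up names, not anything a real scheduler exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HwThread:
    core_id: int
    kind: str  # "fast" or "slow" (hypothetical labels)
    slot: int  # 0 = first thread on the core, 1 = SMT sibling


def preference_order(n_fast=4, n_slow=12):
    """Rank hardware threads from most to least preferred.

    Assumed policy: fast cores have no SMT, slow cores have SMT2,
    and SMT siblings are only used once every core is busy.
    """
    fast = [HwThread(i, "fast", 0) for i in range(n_fast)]
    slow_first = [HwThread(n_fast + i, "slow", 0) for i in range(n_slow)]
    slow_sibling = [HwThread(n_fast + i, "slow", 1) for i in range(n_slow)]
    return fast + slow_first + slow_sibling


def assign(n_threads, order):
    """Give the n most important software threads the n best hardware threads."""
    if n_threads > len(order):
        raise ValueError("more runnable threads than hardware threads")
    return order[:n_threads]
```

With this ordering the highest-priority thread always lands on a fast core that nobody else shares, and the slow cores' SMT siblings only come into play once all 16 cores are occupied.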
 

Kepler_L2

Senior member
Sep 6, 2020
999
4,265
136
"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."

Wait... wut? Codenames are confusing...
Was that ever released? Maybe they just felt like reusing the codename. Anyway this info is from an old-ish roadmap, about 1/3 of the Zen4/Zen5 products there have been canned, so I wouldn't put too much faith into it being the final codename.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
This is not about disabling HT through software, it's about asking if the core is capable of HT in the first place.

And the reason is probably making scheduling easier. Right now, on Intel the small core/big core divide is not as much of a problem as it could be for scheduling because a small core is pretty close in performance to one HT thread of the big core. That is, if all threads on the CPU are loaded, they are all about equally fast. The problem cases where it's difficult for schedulers is when the available parallelism of multithreaded workload is > the amount of big cores, but not unbounded.

When AMD ships c and normal cores as part of a single CPU, this is not true for them. If both cores have SMT2, when every core is loaded, the c cores have significantly lower throughput, assuming they clock significantly lower. If SMT2 is disabled for them, this would be fixed, and in the client world where the windows scheduler determines how good your product is, it might be worthwhile to match what Intel does so that the systems work the same.

SMT is used only when the CPU runs out of cores for the thread count. With Bergamo, the 128 cores will be used for the first 128 threads and SMT is used for threads 129 to 256. This way, up to 128T there's optimal ST perf, quite a bit higher than what a small core provides; it's not like such a CPU is always working at a full 256T load.
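To illustrate the fill order, here is a minimal sketch assuming a simple "one thread per core first, SMT siblings afterwards" policy. The function names are hypothetical and real schedulers weigh many more factors (cache locality, power state, priorities).

```python
from collections import Counter


def place_threads(n_threads, n_cores=128, smt_ways=2):
    """Return (core, smt_slot) for each thread.

    Assumed policy: every core gets one thread before any core
    gets a second one on its SMT sibling.
    """
    if n_threads > n_cores * smt_ways:
        raise ValueError("more threads than hardware threads")
    return [(t % n_cores, t // n_cores) for t in range(n_threads)]


def cores_sharing(placement):
    """Count cores that run more than one thread at once."""
    per_core = Counter(core for core, _slot in placement)
    return sum(1 for n in per_core.values() if n > 1)
```

Under this policy nothing shares a core up to 128 threads, thread 129 creates the first shared core, and only at 256 threads is every core running two threads.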
 

RnR_au

Platinum Member
Jun 6, 2021
2,675
6,124
136
Was that ever released?
Briefly apparently...

Amid the renewed interest in Arm-based servers, it is easy to forget that one company with experience in building server platforms actually brought to market its own Arm-based processor before apparently losing interest: AMD.

Now it has emerged that Jim Keller, a key architect who worked on Arm development at AMD, reckons the chipmaker was wrong to halt the project after he left the company in 2016.
https://www.theregister.com/2022/06/20/jim_keller_arm_cpu/
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
And the reason is probably making scheduling easier. Right now, on Intel the small core/big core divide is not as much of a problem as it could be for scheduling because a small core is pretty close in performance to one HT thread of the big core. That is, if all threads on the CPU are loaded, they are all about equally fast. The problem cases where it's difficult for schedulers is when the available parallelism of multithreaded workload is > the amount of big cores, but not unbounded.
Even if the per-thread performance ends up similar, "Thread Director" definitely prioritizes SMT threads last. But in this case, I'm interested in what exactly is making the scheduling decisions. Sounds like they're making this threading info OS visible to assist that side of the scheduling algorithm?
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Well, this is going to depend a lot on implementation details, but you got that generally backwards. Increased ILP will not scale anywhere near 1:1 as you add more resources to the core. Everything else being equal, when you have more resources sitting idle more often, that is beneficial for SMT.
More ILP is only worthwhile when the extraction (execution) rate increases; then SMT sees less utilization. But this is heavily implementation dependent and benefits from better-tuned machine code. I'm not saying SMT won't be used, but there are ways to make it a better choice; rewriting billions of lines of x86 code just isn't an option. It's different in the macOS/iOS world.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."
Was that ever released? Maybe they just felt like reusing the codename.
I think the "roadmap" that quote is referencing ended up being a very well done fake. It had a bunch of stuff like ARM consumer chips. You probably still see some old articles thinking it was real.
The only ARM server AMD kinda-sorta released was the Opteron A1100. https://www.anandtech.com/show/8362/amds-big-bet-on-arm-powered-servers-a1100-revealed

And at least publicly, they never gave the Styx name to either that or anything K12. I only guessed that because it would be perfectly in keeping with the Greek underworld naming scheme that team likes.
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
Even if the per-thread performance ends up similar, "Thread Director" definitely prioritizes SMT threads last. But in this case, I'm interested in what exactly is making the scheduling decisions. Sounds like they're making this threading info OS visible to assist that side of the scheduling algorithm?

The whole point of hybrid designs is to differentiate cores by speed. High-priority threads get scheduled to fast cores and low-priority threads to slow cores. Slowing down the fast cores by splitting their execution speed in half with SMT is just the opposite of what they are trying to achieve. If per-thread execution speed can be sacrificed for MT performance, TDP-constrained hybrid designs will most likely achieve the best results by shutting down the priority cores entirely and dividing the whole available TDP among the efficiency cores, or at least by scaling the priority cores' frequency down to the point where they are much slower than the efficiency cores if SMT is used.
 

yuri69

Senior member
Jul 16, 2013
677
1,215
136
Anyway this info is from a old-ish roadmap, about 1/3 of the Zen4/Zen5 products there have been canned, so I wouldn't put too much faith into it being the final codename.
This is interesting. What was canned from the Zen 4 line? We got the same stuff we did for the previous gens plus the "c" core variant, right?

I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.
 

Kepler_L2

Senior member
Sep 6, 2020
999
4,265
136
This is interesting. What was canned from the Zen 4 line? We got the same stuff we did for the previous gens plus the "c" core variant, right?

I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.
There was supposed to be a Phoenix refresh on 4nm, because originally PHX1 was going to be 5nm.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,696
3,260
136
Ok, here goes nothing: the workload where SMT is not very helpful is gaming.
So you can have, say, 4 SMT-less, normal-sized cores able to reach 5 GHz running the game's main thread and other high-priority stuff for that sweet 120 Hz goodness, and 8 dense, SMT-capable cores for throughput, running around 3 GHz.
Now, you'll want some more cores for OS background tasks, let's say 4 dense cores for this. You organize this in two 2+6 CCXs which also works well for low-end mobile, and there you go.
Games are the workload where HT (SMT) helps a lot if you have a low CPU core count. Of course, if you have 8 or more cores it helps little.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
Yep, and it is quite likely that they will, for low core count / high bandwidth needs. They did the same for Genoa already with their 4 CCD SKUs.
The Genoa 4 CCD devices don't make too much sense, or at least, it doesn't seem to make that much sense for them to have dual GMI links. They are relatively low-clocked parts, but they do have 6 to 8 cores, so it is unclear whether they can actually consume that much bandwidth. If they took an 8 CCD F-series part and connected the CCDs with dual GMI links, that would seem to make more sense, although those parts have a lower core count per CCD and more cache per core.

If they do have 16 GMI links then why aren't they available now? Are the current IO dies salvage parts with disabled links? Will there be a new version of the IO die instead? They are a relatively large chip on 6 nm, so having defective parts may be more likely than with the 14 nm IO die.