"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."
> Yup it's Styx.
Was that ever released? Maybe they just felt like reusing the codename. Anyway, this info is from an old-ish roadmap, and about 1/3 of the Zen 4/Zen 5 products on it have been canned, so I wouldn't put too much faith in it being the final codename.
> "Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."
Wait... wut? Codenames are confusing...
This is not about disabling HT through software, it's about asking if the core is capable of HT in the first place.
And the reason is probably to make scheduling easier. Right now, on Intel, the small core/big core divide is not as much of a problem for scheduling as it could be, because a small core is pretty close in performance to one HT thread of a big core. That is, if all threads on the CPU are loaded, they are all about equally fast. The difficult cases for schedulers are when the available parallelism of a multithreaded workload is greater than the number of big cores, but not unbounded.
When AMD ships c and normal cores as part of a single CPU, this is not true for them. If both core types have SMT2, then when every core is loaded, threads on the c cores have significantly lower throughput, assuming the c cores clock significantly lower. If SMT2 is disabled on the c cores, this would be fixed, and in the client world, where the Windows scheduler determines how good your product is, it might be worthwhile to match what Intel does so that the systems work the same.
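To put rough numbers on the argument above, here is a back-of-the-envelope sketch. Every figure in it (the SMT yield, the relative speed of the small core and of the c core) is a made-up assumption for illustration, not a measurement of any real part, and `per_thread_throughput` is a hypothetical helper, not anything a real scheduler exposes.

```python
# Toy model: throughput seen by each thread when every hardware thread is busy.
# All constants below are illustrative assumptions, not measurements.

def per_thread_throughput(core_perf: float, smt_threads: int, smt_yield: float) -> float:
    """Per-thread throughput on a fully loaded core.

    core_perf:   single-thread throughput of the core (normal big core = 1.0)
    smt_threads: hardware threads per core (1 = no SMT, 2 = SMT2)
    smt_yield:   total-throughput multiplier when all SMT threads are busy
    """
    total = core_perf * (smt_yield if smt_threads > 1 else 1.0)
    return total / smt_threads

SMT_YIELD = 1.25    # assumed: SMT2 adds about 25% total throughput
E_CORE_PERF = 0.6   # assumed: Intel-style small core relative to a big core
C_CORE_PERF = 0.7   # assumed: lower-clocked c core relative to a normal core

big_smt = per_thread_throughput(1.0, 2, SMT_YIELD)          # big core, SMT2
small_no_smt = per_thread_throughput(E_CORE_PERF, 1, 1.0)   # small core, no SMT
c_smt = per_thread_throughput(C_CORE_PERF, 2, SMT_YIELD)    # c core, SMT2
c_no_smt = per_thread_throughput(C_CORE_PERF, 1, 1.0)       # c core, SMT off

for name, value in [("big core, SMT2", big_smt),
                    ("small core, no SMT", small_no_smt),
                    ("c core, SMT2", c_smt),
                    ("c core, SMT off", c_no_smt)]:
    print(f"{name:20s} {value:.3f}")
```

Under these assumed numbers, a big-core SMT thread and a small-core thread land close together, a c-core SMT thread falls well behind, and a c core with SMT off is back in the same range as a big-core SMT thread. Different assumptions will move the gaps around, so treat it only as a way to see the shape of the argument.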
Briefly, apparently...
> Was that ever released?
https://www.theregister.com/2022/06/20/jim_keller_arm_cpu/
Amid the renewed interest in Arm-based servers, it is easy to forget that one company with experience in building server platforms actually brought to market its own Arm-based processor before apparently losing interest: AMD.
Now it has emerged that Jim Keller, a key architect who worked on Arm development at AMD, reckons the chipmaker was wrong to halt the project after he left the company in 2016.
Even if the per-thread performance ends up similar, "Thread Director" definitely prioritizes SMT threads last. But in this case, I'm interested in what exactly is making the scheduling decisions. Sounds like they're making this threading info OS-visible to assist that side of the scheduling algorithm?
> And the reason is probably to make scheduling easier. Right now, on Intel, the small core/big core divide is not as much of a problem for scheduling as it could be, because a small core is pretty close in performance to one HT thread of a big core. That is, if all threads on the CPU are loaded, they are all about equally fast. The difficult cases for schedulers are when the available parallelism of a multithreaded workload is greater than the number of big cores, but not unbounded.
More ILP is only worthwhile when the extraction (execution) rate increases; then SMT has less utilization. But this is heavily implementation-dependent and benefits from better-tuned machine code. Not saying SMT won't be used, but there are ways to make it a better choice; rewriting billions of lines of x86 code just isn't an option. It's different in the macOS/iOS world.
> Well, this is going to depend a lot on implementation details, but you got that generally backwards. Increased ILP will not scale anywhere near 1:1 as you add more resources to the core. Everything else being equal, when you have more resources sitting idle more often, that is beneficial for SMT.
"Just like other 2016 chips from AMD, “Styx” will be made using 14nm fabrication process at GlobalFoundries."
I think the "roadmap" that quote is referencing ended up being a very well done fake. It had a bunch of stuff like ARM consumer chips. You probably still see some old articles thinking it was real.
> Was that ever released? Maybe they just felt like reusing the codename.
The only ARM server AMD kinda-sorta released was the Opteron A1100. https://www.anandtech.com/show/8362/amds-big-bet-on-arm-powered-servers-a1100-revealed
AMD loves reusing codenames.
> Wait... wut? Codenames are confusing...
Bestest way to describe Dodge's cars when they first began making Durangos. Or now, I'm sure.
> AMD loves reusing codenames.
Durango is whole two AMD parts.
This is interesting. What was canned from the Zen 4 line? We got the same stuff we did for the previous gens plus the "c" core variant, right?
> Anyway, this info is from an old-ish roadmap, and about 1/3 of the Zen 4/Zen 5 products on it have been canned, so I wouldn't put too much faith in it being the final codename.
There was supposed to be a Phoenix refresh on 4nm, because originally PHX1 was going to be 5nm.
> This is interesting. What was canned from the Zen 4 line? We got the same stuff we did for the previous gens plus the "c" core variant, right?
I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.
To be fair, I'm pretty sure that one is as much a Microsoft part as an AMD one.
> Durango is whole two AMD parts.
So Bergamo uses the same IOD as Genoa then?
> I mean Zen 4 got desktop IOD, server IOD, reusable CCD, an APU, and the additional "c" CCD.
Correct.
> So Bergamo uses the same IOD as Genoa then?
Yea, meet your future overlord.
Maybe not me. I have a 9654 and 2 9554 Genoa... Waiting to see how much better Zen 5 is.
> Correct.
> Yea, meet your future overlord.
A lot, but you pay a power tax.
> Waiting to see how much better Zen 5 is.
When Mark turns all his computers on, the lights in the neighborhood dim.
> A lot, but you pay a power tax.
He should buy a few DGX boxes for a true test of the local power grid.
> When Mark turns all his computers on, the lights in the neighborhood dim.
Lisa Su is a dominatrix?
> Yea, meet your future overlord.
Games are the workload where HT (SMT) helps a lot if you have a low CPU core count. Of course, if you have 8 or more cores, it helps little.
> Ok, here goes nothing: the workload where SMT is not very helpful is gaming.
> So you can have, say, 4 SMT-less, normal-sized cores able to reach 5 GHz running the game's main thread and other high-priority stuff for that sweet 120 Hz goodness, and 8 dense, SMT-capable cores for throughput, running around 3 GHz.
> Now, you'll want some more cores for OS background tasks, let's say 4 dense cores for this. You organize this in two 2+6 CCXs, which also works well for low-end mobile, and there you go.
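A quick tally of that hypothetical layout, just to check that the numbers add up. The core counts come from the post above. Whether the 4 background dense cores would also have SMT enabled is not stated, so the `smt=2` for that group is an assumption, and `CoreGroup` is only an illustrative name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoreGroup:
    name: str
    count: int
    smt: int  # hardware threads per core

# Core counts taken from the post; SMT on the background group is an assumption.
groups = [
    CoreGroup("fast, SMT-less", 4, 1),
    CoreGroup("dense, throughput", 8, 2),
    CoreGroup("dense, background", 4, 2),  # assumed SMT2, not stated in the post
]

total_cores = sum(g.count for g in groups)
total_threads = sum(g.count * g.smt for g in groups)

# Split evenly across two CCXs: fast cores and dense cores per CCX.
ccx_count = 2
fast_cores = sum(g.count for g in groups if g.smt == 1)
dense_cores = total_cores - fast_cores
per_ccx = (fast_cores // ccx_count, dense_cores // ccx_count)

print(f"cores: {total_cores}, threads: {total_threads}, per CCX (fast+dense): {per_ccx}")
```

So the proposal works out to 16 cores in two 2+6 CCXs, and 28 hardware threads if all dense cores run SMT2.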
The Genoa 4 CCD devices don't make too much sense, or at least it doesn't seem to make much sense for them to have dual GMI links. They are relatively low-clocked parts, but they do have 6 to 8 cores, so it is unclear whether they can actually consume that much bandwidth. If they took an 8 CCD F-series part and connected the CCDs with dual GMI links, that would seem to make more sense, although those have a lower core count per CCD and more cache per core.
> Yep, and it is quite likely that they will, for low core count / high bandwidth needs. They did the same for Genoa already with their 4 CCD SKUs.