Speculation: Ryzen 4000 series/Zen 3


NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
You just don't do that in hardware. There's no point to it.
Windows already gives Intel CPUs hardware control of P-states; it's called Speed Shift. Why wouldn't it be believable for them to extend a hardware "Hyperscheduler" to Windows as well?

-> Most of the currently available solutions to the power saving problem are based on a software routine that requests a power management system to enable the power saving. Further, the corresponding power saving schedules are often created in a static and/or a manual fashion, which is error prone. Moreover, neither the dynamic power saving methods nor the static power saving methods provide an accurate prediction of the power consumption.
-> It is an object of the invention to provide for a hardware task scheduler with an improved power saving efficiency. In order to achieve the object defined above, a hardware task scheduler, a multiprocessing system, and a hardware-based power saving method are provided.
-> The hardware task scheduler may implement scheduling policies supporting heterogeneous multi-core architectures, where each of the processor cores can be multi-threaded or single-threaded.
-> However, if this processor core is overloaded (for instance, in case the processing element is multi-threaded and/or virtualized, in which case other tasks may be assigned to virtual processing elements physically mapped to the processing element, in addition to the task currently running on it), the load balancing method may recommend running the task on some other processor core.
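
As a rough illustration of the scheduling policy those excerpts describe, here is a minimal Python sketch of a power-aware scheduler that consolidates tasks onto already-awake cores and only wakes a gated core once the awake ones are full. The Core class, the SMT width and the "overloaded" rule are my own illustrative assumptions, not anything taken from the patent.

Code:
# Hypothetical sketch of a power-aware task scheduler, loosely modelled
# on the patent excerpts above. Names, the SMT width and the
# "overloaded" rule are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Core:
    core_id: int
    smt_width: int = 2                       # hardware threads per core
    asleep: bool = True                      # power-gated until needed
    tasks: list = field(default_factory=list)

    def overloaded(self) -> bool:
        # Treat a core as overloaded once every hardware thread is busy.
        return len(self.tasks) >= self.smt_width

def schedule(task, cores):
    """Consolidate work on already-awake cores so the rest stay gated;
    only wake a sleeping core when every awake core is overloaded."""
    awake = [c for c in cores if not c.asleep and not c.overloaded()]
    if awake:
        target = min(awake, key=lambda c: len(c.tasks))
    else:
        sleeping = [c for c in cores if c.asleep]
        target = sleeping[0] if sleeping else min(cores, key=lambda c: len(c.tasks))
        target.asleep = False
    target.tasks.append(task)
    return target

ccx = [Core(i) for i in range(4)]            # one 4-core CCX
for t in ("t0", "t1", "t2", "t3"):
    schedule(t, ccx)
print([(c.core_id, c.asleep, c.tasks) for c in ccx])
# With SMT2, four light tasks land on two cores; the other two stay gated.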



HW scheduling is faster and more power efficient.

AMD Zen Automotive => Hardware Realtime SMP? or Software Realtime SMP?

=> Advanced Technologies Group is a startup group with a mission to advance the future of safer transportation as we look to bring millions of drivers into assisted and automated driving. We develop artificial intelligence perception software for driver-assistance and automated driving, with a focus on implementing efficient deep neural networks on AMD’s automotive-grade processors.

--> That said, it is not surprising that TSMC has already taped out the first chip using its N7+ technology. Furthermore, the company is prepping a specialized version of the process aimed at the automotive industry, which indicates that N7+ is going to be a “long” node.
https://news.synopsys.com/2018-10-0...rade-IP-in-TSMC-7-nm-Process-for-ADAS-Designs
https://news.synopsys.com/2017-09-1...fied-for-TSMCs-Advanced-7-nm-FinFET-Plus-Node
https://news.synopsys.com/2018-04-3...-High-performance-7-nm-FinFET-Plus-Technology
^-- AMD's partner for most of their IP.

Zen1 (K8) -> Zen2 (Greyhound) -> Zen3 (new core); for anyone asking, this is my official position.
 
Last edited:

amd6502

Senior member
Apr 21, 2017
971
360
136
Secondly, the OS could void the tasksetting of high-niceness processes to little cores when the system load drops to a low number (say, a load below the number of physical cores). In these conditions, all software threads get taskset to the main SMT2 cores.

Correction:

Actually I think for the consumer world, it's best to have all process affinity be for the main SMT2 logical cores, but to have lower-niceness processes simply get higher priority for this affinity. This means the small logical cores stay idle most of the time, and during very high system loads, lower-priority (highly niced) threads simply overflow onto the small logical cores. On a 4c SoC this would only happen when the system load exceeds 8.

If this isn't already implemented in Linux, it could easily be done.
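
For what it's worth, here is a rough userspace sketch of that overflow policy in Python (a real implementation would live in the kernel scheduler). The core numbering, the niceness cutoff and the load threshold are assumptions for a hypothetical 4c SoC with extra small logical cores, not anything that exists today.

Code:
# Rough userspace sketch of the niceness/overflow policy described above.
# Assumes a hypothetical 4-core SoC where CPUs 0-7 are the main SMT2
# logical cores and CPUs 8-11 are small overflow logical cores. The
# niceness cutoff, load threshold and polling interval are arbitrary.
import os, time

BIG_CPUS      = set(range(0, 8))   # main SMT2 logical cores
SMALL_CPUS    = set(range(8, 12))  # small/overflow logical cores
NICE_CUTOFF   = 10                 # "highly niced" threshold
OVERFLOW_LOAD = 8                  # only overflow once load exceeds 8

def retune(pid):
    try:
        niceness = os.getpriority(os.PRIO_PROCESS, pid)
        load1, _, _ = os.getloadavg()
        if niceness >= NICE_CUTOFF and load1 > OVERFLOW_LOAD:
            # Background work may spill onto the small logical cores.
            os.sched_setaffinity(pid, BIG_CPUS | SMALL_CPUS)
        else:
            # Default: everything prefers the main SMT2 cores.
            os.sched_setaffinity(pid, BIG_CPUS)
    except OSError:
        pass   # process exited, not ours to touch, or CPUs not present

while True:
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            retune(int(entry))
    time.sleep(5)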
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
And every tom, dick and harry has to chime in as soon as SMT4 is mentioned.
Put up some proof, or just forget about it.
Zen 2 is already as wide as POWER7, which had SMT4. *shrugs*
It is farfetched, but it is not unfounded.

I'm still not sold on SMT4, 8, or anything else. Necessarily. What's the real advantage of 4c/16t over 8c/16t? Fewer transistors? Lower power consumption? And what would be the drawbacks of relying on SMT4? ARM designers went in a completely different direction by just adding a bunch of little extra cores to their SoCs via big.LITTLE/DynamIQ. They were very successful doing so.
SMT is for better utilization of all the chip's resources, which in turn is more power efficient than running the same threads on separate cores.

I honestly think big.LITTLE/DynamIQ of the ARM world doesn't really apply to x86. In the ARM world you have plenty of very power-efficient older cores while new designs become wider and wider. Combining such older cores with bleeding-edge performance cores is the fastest approach to offering both the best performance and the polished power efficiency of older cores everybody knows from ARM. In the x86 world only Intel has something partially comparable with its Atom cores and is finally trying to make use of that with Lakefield. AMD doesn't truly have any low-power high-efficiency cores; instead the Zen cores are both high performance and high efficiency.

Furthermore, the big.LITTLE/DynamIQ approach is a bottom-up one: the primary target is a relatively small number of cores in the mobile space, growing from there. AMD's approach with Zen is top-down: design for the server space and cut down from there. There AMD is already throwing in the kitchen sink wrt the number of cores; adding a couple of little cores to 64 wide cores is inane. Instead SMT offers the opportunity to run many low-utilization threads using already existing resources while keeping as many cores power gated as possible, increasing power efficiency that way.
 
Last edited:

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
SMT is for better utilization of all the chip's resources, which in turn is more power efficient than running the same threads on separate cores.

But what does this do for power efficiency when resources are generally underutilized, such as when I have a moderately-taxing 4t workload on a 16t chip? What's the power usage going to be on 2c/16t, 4c/16t, 8c/16t, etc.? Seems like having more cores rather than relying on SMT makes power gating much easier in those scenarios.

I honestly think big.LITTLE/DynamIQ of the ARM world doesn't really apply to x86.

You mentioned Lakefield. We'll see if Intel continues in that direction. AMD has their old cat cores which they could update . . . or they could cut down Zen2. I doubt they want to spend the money on that. AMD is mostly ignoring the low end anyway.

In the ARM world you have plenty of very power-efficient older cores while new designs become wider and wider.

That's sort-of true, but not entirely. Take a look at ARM chips combining A76 and A55. A55 is probably very similar to A53 (and some older cores still), but it's also AArch64-compliant. It's not literally an old core from 5+ years ago. There's work being done to at least update the "little" cores.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
...Zen1 (K8) -> Zen2 (Greyhound) -> Zen3 (new core); for anyone asking, this is my official position.


What the? It's early and I'm tired, but did you just call Zen K8?? Now Greyhound is interesting as I had never heard of it before.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
But what does this do for power efficiency when resources are generally underutilized, such as when I have a moderately-taxing 4t workload on a 16t chip? What's the power usage going to be on 2c/16t, 4c/16t, 8c/16t, etc.? Seems like having more cores rather than relying on SMT makes power gating much easier in those scenarios.
How do you think more cores make power gating easier than SMT? Not sure I'm following there.

Take a single CCX. Without SMT and 4 concurrent low utilization threads all four cores of the CCX would fire up. With SMT2 two cores could stay in deep sleep state. With SMT4 this could be increased to three cores staying in deep sleep state.

big.LITTLE usually relies on an imbalanced ratio, with significantly more little cores than big ones. Lakefield relies on a 4-1 ratio. For the above example to give a tangible difference, the ratio would have to be at least 1-1 or better in favor of little cores (SMT2: 2-1, SMT4: 4-1). Considering Zen cores are rather small to begin with, the additional space required for such little core counterparts is likely better spent making the big cores as well as SMT more efficient.
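
To put rough numbers on the CCX example: the number of cores that have to wake for N concurrent low-utilization threads is simply ceil(N / SMT width), and everything else can stay gated. A trivial sketch (the 4-core CCX and the thread count are just the example above, nothing more):

Code:
# Back-of-the-envelope illustration of the CCX example above: how many
# cores must wake for N low-utilization threads at a given SMT width.
from math import ceil

CCX_CORES = 4
THREADS   = 4   # concurrent low-utilization threads from the example

def cores_awake(threads, smt_width):
    return min(CCX_CORES, ceil(threads / smt_width))

for smt in (1, 2, 4):
    awake = cores_awake(THREADS, smt)
    label = "no SMT" if smt == 1 else f"SMT{smt}"
    print(f"{label}: {awake} core(s) awake, {CCX_CORES - awake} gated")
# no SMT: 4 awake, 0 gated; SMT2: 2 awake, 2 gated; SMT4: 1 awake, 3 gated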

or they could cut down Zen2.
Fine grained power gating is essentially cutting down without changing the silicon. Why add additional cores just for that if you can do the same in real time anyway?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
What the? It's early and I'm tired, but did you just call Zen K8?? Now Greyhound is interesting as I had never heard of it before.
Greyhound is the actual name of the Family 10h cores; Agena, Deneb, Thuban, Llano all use Greyhound.

K7 = Bobcat & Jaguar (The floating-point execution units include a store-convert unit (STC) that drives results to the main-core data cache, a floating-point adder (FPA) that shares roots with the AMD K7 FPA, and a floating-point iterative multiplier derived from the Bobcat FPM design and K7 divide/square-root algorithms.)
K8 = Zen (17h) (Rather than 64-bit (vertical power), it is more about IPC (horizontal power))
Greyhound = Zen2 (17h) // Probably don't look at Agena, instead look at Deneb (Greyhound+) (higher frequency at a lower node + 256-bit FPU)
New core = Zen3 (19h)
Enhanced new core = Zen4 (19h)
Next-gen new core = Zen5 (21h)
Improved new core = ZenX, etc (21h?/22h?) <- 3D-Arch project
 
Last edited:

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
I suppose I meant additional sources of info on SMT4, rather than just the AdoredTV link (and all the rumors based on that one video).
I have been searching for any AMD patents for 4-way SMT but there are none. I guess they will be using IBM SMT-4, just like they did with IBM SMT-2?
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Take a single CCX. Without SMT and 4 concurrent low utilization threads all four cores of the CCX would fire up. With SMT2 two cores could stay in deep sleep state. With SMT4 this could be increased to three cores staying in deep sleep state.

big.LITTLE usually relies on an imbalanced ratio, with significantly more little cores than big ones. Lakefield relies on a 4-1 ratio. For the above example to give a tangible difference, the ratio would have to be at least 1-1 or better in favor of little cores (SMT2: 2-1, SMT4: 4-1). Considering Zen cores are rather small to begin with, the additional space required for such little core counterparts is likely better spent making the big cores as well as SMT more efficient.


That's a great approach, and the battery life on such a mobile quad core would see amazing gains. I just don't see how they can NOT go SMT4 in the next two generations.


Also, a brilliant way to look at the SMT4 versus ARM approach.



I think of the 7nm node's main strength as being the efficiency improvement. So double up on efficiency with both architecture and fabrication, and the products coming to mobile in the next 1-3 years will make leaps and bounds. (And the great thing about SMT4 or a similar wide-core approach is that performance/IPC can also make gains while making that leap in efficiency.)
 
  • Like
Reactions: DarthKyrie

Ajay

Lifer
Jan 8, 2001
15,454
7,862
136
I have been searching for any AMD patents for 4-way SMT but there are none. I guess they will be using IBM SMT-4, just like they did with IBM SMT-2?
So, aside from the AdoredTV rumor, there is nothing indicating AMD will be moving to SMT4. Zero.
 
  • Like
Reactions: NTMBK

Yotsugi

Golden Member
Oct 16, 2017
1,029
487
106
You would think they would have patented their own SMT, but I am still searching for that patent; when I find it I will post it here.
Not everything is patented.
I don't think there are patents for Navi L1 either.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
  • Like
Reactions: Yotsugi

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
How do you think more cores make power gating easier than SMT? Not sure I'm following there.

Take a single CCX. Without SMT and 4 concurrent low utilization threads all four cores of the CCX would fire up. With SMT2 two cores could stay in deep sleep state. With SMT4 this could be increased to three cores staying in deep sleep state.

That assumes the scheduler works that way. If I have a 4c/8t chip (such as a single CCX) and all I have is one demanding thread and one low-utilization thread, do you think the scheduler is going to put them both on the same core? In Win10, the low-utilization thread will probably bounce between three cores, bringing them into and out of sleep constantly while the demanding thread stays on the first core. SMT will probably not see any utilization. In the same scenario on a DynamIQ setup with 4C + 4c, the scheduler can keep up to four low-utilization threads busy while only waking up one of the big cores. Or I can have smaller, narrower cores and just keep two of them awake instead of having two larger, wider cores awake to handle the same two threads.

In the extreme example, let's say I have SMT8 with 1c/8t instead of a CCX. Now if all I have are two threads, regardless of their intensity, I have to wake up the entire beast to do anything. Surely that comes at a power penalty, no?

big.LITTLE usually relies on an imbalanced ratio, with significantly more little cores than big ones.

Not necessarily. Look at the Snapdragon SoCs. And Kirin 980. DynamIQ would let them use an asynchronous arrangement of cores - something that was less possible under big.LITTLE - but they still have an even balance of resources. Kirin 980 is 4 A76 + 4 A55, and so is Snapdragon 855 (though one of the A76 cores in Snapdragon 855 runs at a higher clockspeed than the others).

Fine grained power gating is essentially cutting down without changing the silicon. Why add additional cores just for that if you can do the same in real time anyway?

Compare Intel's Core-Y series to the high-performance mobile SoCs, and look at their power profiles. Intel can't match their idle power consumption, even in generations where the mobile SoCs didn't necessarily have a big process lead as they do today. Power gating can only do so much.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
That assumes the scheduler works that way. If I have a 4c/8t chip (such as a single CCX) and all I have is one demanding thread and one low-utilization thread, do you think the scheduler is going to put them both on the same core? In Win10, the low-utilization thread will probably bounce between three cores, bringing them into and out of sleep constantly while the demanding thread stays on the first core. SMT will probably not see any utilization. In the same scenario on a DynamIQ setup with 4C + 4c, the scheduler can keep up to four low-utilization threads busy while only waking up one of the big cores. Or I can have smaller, narrower cores and just keep two of them awake instead of having two larger, wider cores awake to handle the same two threads.
That all is purely a software problem though. If a scheduler is theoretically capable of detecting low-utilization threads and keeping them on little cores, it's also theoretically capable of keeping them on fewer cores using SMT. That the Windows scheduler is mindless and braindead has been repeatedly shown, but surely you agree that shouldn't influence hardware design decisions in any way?

In the extreme example, let's say I have SMT8 with 1c/8t instead of a CCX. Now if all I have are two threads, regardless of their intensity, I have to wake up the entire beast to do anything. Surely that comes at a power penalty, no?
Sure, big.LITTLE is better in the mobile space, I wrote as much before. But the context in which I'm talking about all this is server chips with up to 64 cores right now. Unless you are arguing AMD adding 64 little cores to their 64 big ones is a better idea than going SMT4?

Compare Intel's Core-Y series to the high-performance mobile SoCs, and look at their power profiles. Intel can't match their idle power consumption, even in generations where the mobile SoCs didn't necessarily have a big process lead as they do today. Power gating can only do so much.
The idle power consumption is mainly down to the uncore and caches that can't be gated off. That's a separate optimization issue neither big.LITTLE nor SMT can help with.

In Intel's Core-Y series the additional issue is that they are not SoCs, so the further required chipset and controllers (like for Thunderbolt etc.) only add to the power requirement. The Atom chips being SoCs have better idle usage at the cost of worse connectivity.
 
  • Like
Reactions: DarthKyrie

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
That all is purely a software problem though. If a scheduler is theoretically capable of detecting low-utilization threads and keeping them on little cores, it's also theoretically capable of keeping them on fewer cores using SMT. That the Windows scheduler is mindless and braindead has been repeatedly shown, but surely you agree that shouldn't influence hardware design decisions in any way?

Is any other operating system's scheduler going to do better with an SMT CPU? Also if you try moving threads onto occupied cores, then you have the issue of what happens if the "big" thread runs a slice of code that can utilize all available execution resources (AVX2 or what have you). Now you have the scheduler trying to put a second thread on the CPU when there are no pipeline stalls or other obvious "gaps" where the second thread can execute. Now the scheduler is going to have to move that thread to another core entirely, which is probably why "mindless and braindead" schedulers pick physical cores over logical cores first. Or at least one reason why.

Sure, big.LITTLE is better in the mobile space, I wrote as much before. But the context in which I'm talking about all this is server chips with up to 64 cores right now.

But we are also talking about AMD. Their server core design will be present in all of their products, at least until they grow to the point where they want to maintain separate core designs. I see no clear indicator that AMD will even consider such a strategy on any of their roadmaps. Do we want SMT4 on the desktop? In a server, it's realistic to believe that most of a CPU's resources will be committed most of the time (if not all of the time). So we don't worry so much about when and how a scheduler wakes up a particular core. Zen2 is heading for laptops in Renoir. Presumably, Zen3 will follow the same circuitous path. Does AMD want SMT4 in laptops? I don't think we should rationally consider it possible (or plausible) that AMD will emulate big.LITTLE or DynamIQ in their core designs, but you have to admit, if they did, it would ease the transition to low-end computing devices, far more so than adoption of SMT4 would. Realistically speaking, I think AMD will avoid any change away from SMT2 in the near future. They will keep selling more of the same since it works.

There's also the issue of SMT and VMs. A lot of cloud vendors just disable SMT/HT right out of the gate. AMD has every intention of selling hardware to them, and I do not think that SMT4 will be a big selling point for those buyers. I also question whether a DynamIQ-style asynchronous core arrangement would be useful, since it would complicate the allocation of bare metal assets during creation of a VM.

Unless you are arguing AMD adding 64 little cores to their 64 big ones is a better idea than going SMT4?

I think the answer is c). None of the above. AMD simply doesn't have little cores available to use, so being the frugal sorts that they are, they'll just punt on that question and add more of the same SMT2 cores they already have (with planned updates).

The idle power consumption is mainly down to the uncore and caches that can't be gated off. That's a separate optimization issue neither big.LITTLE nor SMT can help with.

Not entirely true. Some of those challenges are unique to Infinity Fabric. Others are unique to AMD's CCX design. The mobile SoCs can easily gate off lower-level caches since they are not shared (I think the standard DynamIQ design calls for shared L3). So can pretty much anyone else. ARM's DSU has some interesting additional features though, like being able to gate off part or all of a cluster's L3 cache depending on load:

https://www.androidauthority.com/arm-dynamiq-need-to-know-770349/

It still remains to be seen whether any of these power gating features will be attractive outside of the mobile world. Does anyone want a server processor made up of multiple clusters of 1x A76 + 4x A55, or what have you? If so, why? Nobody has made that use case yet. The existing ARM server SoCs appear to have synchronous core configurations. Everything is the same core, at the same clockspeed.

In Intel's Core-Y series the additional issue is that they are not SoCs, so the further required chipset and controllers (like for Thunderbolt etc.) only add to the power requirement. The Atom chips being SoCs have better idle usage at the cost of worse connectivity.

To date, Atom hasn't been competitive either, though. Not in the lower-power mobile space.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Well, I was able to find said patent, but I am not too versed in CPU architecture to know if it's the real deal or not.

https://patents.google.com/patent/US5944816A/en

The patent was assigned to GlobalFoundries Inc (1996). The Inquirer at the time believed that it was for a possible future HT.

AMD patent could enable hyperthreading
https://www.theinquirer.net/inquirer/news/1029950/amd-patent-enable-hyperthreading


Hyper threatting is way too dangerous. I don't think they would do it, and if they did, people probably would be hesitant to use it.


As for SMT-n it's been floating around CS academia for decades, so I doubt it's patentable.


Maybe a trademark search (e.g. threadripping) might some day give us more to speculate on.
 
Last edited:

naukkis

Senior member
Jun 5, 2002
706
578
136
The Windows scheduler does its job. There's absolutely no point in putting threads on SMT logical cores instead of real cores for power reasons; even without big.LITTLE, low core utilization will keep clock frequency and voltages low and save energy. SMT only makes sense when there's full utilization of cores and using SMT will provide more throughput. And putting other threads on the same core that already runs a high-priority thread instead of on idle cores is just stupid, as it will slow down that high-priority thread.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
Is any other operating system's scheduler going to do better with an SMT CPU? Also if you try moving threads onto occupied cores, then you have the issue of what happens if the "big" thread runs a slice of code that can utilize all available execution resources (AVX2 or what have you). Now you have the scheduler trying to put a second thread on the CPU when there are no pipeline stalls or other obvious "gaps" where the second thread can execute. Now the scheduler is going to have to move that thread to another core entirely, which is probably why "mindless and braindead" schedulers pick physical cores over logical cores first. Or at least one reason why.
You have exactly the same issue with big.LITTLE. If a scheduler is theoretically capable of detecting high utilization threads and moving them from little to big cores it's also theoretically capable of moving them from SMT shared to dedicated physical cores. It's all a software problem.

But we are also talking about AMD. Their server core design will be present in all of their products, at least until they grow to the point where they want to maintain separate core designs. I see no clear indicator that AMD will even consider such a strategy on any of their roadmaps.
Did you see AMD going with SMT2 before they announced it? Did anybody expect that very first implementation to beat Intel's HT?

Do we want SMT4 on the desktop?
That's completely beside the point. Does the majority of desktop users need AVX2? Most very likely do not.

In a server, it's realistic to believe that most of a CPU's resources will be committed most of the time (if not all of the time).
That's actually wrong unless you are talking about HPC specifically. Servers in general are all about over-provisioning all kinds of resources, being prepared for the worst-case resource usage scenarios.

So we don't worry so much about when and how a scheduler wakes up a particular core.
Patently wrong. The more cores a chip contains in one shared envelope, the more the cores' activity will affect each other. The more cores can be put into a deep sleep state, the more headroom other cores can make use of. And as we know, AMD developed Zen's microcode for Precision Boost (PB) in a way that dynamically makes use of more headroom, so it profits from that already.

Zen2 is heading for laptops in Renoir. Presumably, Zen3 will follow the same circuitous path. Does AMD want SMT4 in laptops? I don't think we should rationally consider it possible (or plausible) that AMD will emulate big.LITTLE or DynamIQ in their core designs, but you have to admit, if they did, it would ease the transition to low-end computing devices, far more so than adoption of SMT4 would. Realistically speaking, I think AMD will avoid any change away from SMT2 in the near future. They will keep selling more of the same since it works.
But in the last two years AMD did the opposite of "selling more of the same since it works". Zen to Zen 2 completely changed the MCM topology. SMT is still very new to AMD, having been introduced only two years ago. Lacking software support didn't prevent AMD from launching any of the Ryzen or Threadripper chips either: the Windows scheduler had serious issues with TR 1's NUMA, then again with TR 2 WX's unbalanced NUMA.

There's also the issue of SMT and VMs. A lot of cloud vendors just disable SMT/HT right out of the gate. AMD has every intention of selling hardware to them, and I do not think that SMT4 will be a big selling point for those buyers. I also question whether a DynamIQ-style asynchronous core arrangement would be useful, since it would complicate the allocation of bare metal assets during creation of a VM.
What is this "allocation of bare metal assets during creation of a VM" you are speaking of? Resource allocation can be changed even after the creation of a VM, just as you can change the PC hardware after installing an OS. That again is purely a software issue.

And disabling SMT/HT for cloud providers is due to them specifically offering resources per single vCPU; you don't want this vCPU resource being a variable that depends on how many concurrent threads are on it. But that doesn't prevent server providers from offering computing resources per CCX (or comparable big.LITTLE blocks) instead, where SMT could be left enabled.
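
As a concrete illustration of "per CCX instead of per vCPU": a host could hand out whole SMT-enabled CCX slices as pinning sets. The topology below (4 cores per CCX, SMT2, sibling threads enumerated at an offset equal to the number of physical cores) is a common Linux enumeration, but it's an assumption here, not something any specific provider does.

Code:
# Hypothetical helper that carves a Zen-style host into per-CCX CPU sets
# a hypervisor could pin whole guests to, keeping SMT enabled inside
# each slice. Topology values below are assumptions: 2 CCXs of 4 cores,
# SMT2, sibling threads offset by the total physical core count.
TOTAL_CORES   = 8      # e.g. one 8-core chiplet = 2 CCXs
CORES_PER_CCX = 4
SMT           = 2

def ccx_cpuset(ccx_index):
    first = ccx_index * CORES_PER_CCX
    physical = set(range(first, first + CORES_PER_CCX))
    siblings = {cpu + TOTAL_CORES for cpu in physical}  # SMT siblings
    return physical | siblings if SMT == 2 else physical

for ccx in range(TOTAL_CORES // CORES_PER_CCX):
    print(f"CCX{ccx}: pin the guest's vCPUs to host CPUs {sorted(ccx_cpuset(ccx))}")
# CCX0 -> [0, 1, 2, 3, 8, 9, 10, 11], CCX1 -> [4, 5, 6, 7, 12, 13, 14, 15]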

I think the answer is c). None of the above. AMD simply doesn't have little cores available to use, so being the frugal sorts that they are, they'll just punt on that question and add more of the same SMT2 cores they already have (with planned updates).
You yourself were arguing for the cat cores before.

Not entirely true. Some of those challenges are unique to Infinity Fabric.
...which is part of the uncore and offers intra chip connectivity that one always needs on any chip...

Others are unique to AMD's CCX design. The mobile SoCs can easily gate off lower-level caches since they are not shared (I think the standard DynamIQ design calls for shared L3). So can pretty much anyone else.
And Zen cores can power gate everything except the shared L3$. (I think I remember the APUs can even power gate the L3$ itself since it's not shared due to its single CCX nature, not sure.)

ARM's DSU has some interesting additional features though, like being able to gate off part or all of a cluster's L3 cache depending on load:
That's a good area for further improvements for AMD there indeed. (Also finding a way to make the shared L3$ globally writable instead of just local slices per core. Making better use of that massive L3$ should give a good performance boost.)

But that's again about the cores which are plenty optimized for power efficiency as is already. The uncore is where most further power efficiency optimizations can be done.

And putting other threads on the same core that already runs a high-priority thread instead of on idle cores is just stupid, as it will slow down that high-priority thread.
Is that what the Windows scheduler does? :D
 
Last edited: