itsmydamnation
What, you mean like all those other cloud x86 servers... wait, what?
Especially if you have per-thread QoS requirements. Many Bergamo deployments will probably have SMT disabled for that reason.
I tried to get to the bottom of this a while back. My observation was that, when comparing two identical CPUs in the same TDP budget, where one has SMT and the other does not, there is almost no power tax for SMT.
... and SMT changes nothing in the equation, because power scales proportionally with the added throughput if frequency is kept constant.
Also, the bigger the total core count, interconnect and IO become, the less impact a single core not power/clock gating for 100 µs has on total power consumption.
I tried to get to the bottom of this a while back. My observation was that, when comparing two identical CPUs in the same TDP budget, where one has SMT and the other does not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U. The latter had around 25% more throughput at almost exactly the same consumption in CB23.
I can only assume that power gating is not so fine-grained as to disable every single pipeline stage for whatever fraction of time it is not in use.
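To put the Renoir observation above into numbers, here is a minimal sketch. The throughput figures are normalised to the 4700U, and the 15 W package power is an assumed placeholder rather than a measured value; only the "around 25% more throughput at almost the same consumption" part comes from the post.

```python
def perf_per_watt(throughput: float, power_w: float) -> float:
    """Throughput delivered per watt of package power."""
    return throughput / power_w


# Normalised, illustrative numbers. ASSUMPTION: 15 W for both chips; the post
# only says consumption was "almost exactly the same".
ASSUMED_POWER_W = 15.0

without_smt = perf_per_watt(throughput=1.00, power_w=ASSUMED_POWER_W)  # 4700U
with_smt = perf_per_watt(throughput=1.25, power_w=ASSUMED_POWER_W)     # 4800U

# Relative efficiency gain attributable to SMT under these assumptions.
efficiency_gain = with_smt / without_smt - 1.0
print(f"perf/W gain with SMT: {efficiency_gain:.0%}")
```

If power really is flat, the efficiency gain simply equals the throughput gain. As noted later in the thread, comparing two different dies and bins adds variance that toggling SMT on a single chip would avoid.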
There's ADL-N, if we want to get technical, but the entire context of this conversation is theoretical abstractions based on what limited information we have available. Even Bergamo isn't out yet.
There is no product from Intel out there with nothing but E-cores, so how can we compare?
You should start with your own link. Dynamic power scales proportionally to Cdyn * Frequency * Voltage^2. You're focusing only on frequency, ignoring both the Cdyn and Voltage terms. Zen 4c isn't half the area with the same VF curve.
Someone who wants to understand how frequency and power relate should first understand what is being discussed here:
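A rough sketch of the dynamic power relation quoted above, P = Cdyn * f * V^2. Every operating point below is hypothetical and chosen only to show how the three terms interact; none of them is a measured value for Zen 4, Zen 4c or any Intel core.

```python
def dynamic_power(cdyn_farads: float, freq_hz: float, voltage: float) -> float:
    """Dynamic power in watts: P = Cdyn * f * V^2."""
    return cdyn_farads * freq_hz * voltage ** 2


# HYPOTHETICAL operating points, for illustration only.
base = dynamic_power(cdyn_farads=1.0e-9, freq_hz=3.0e9, voltage=1.00)

# +20% frequency that also needs +10% voltage to remain stable.
fast = dynamic_power(cdyn_farads=1.0e-9, freq_hz=3.6e9, voltage=1.10)

# A leaner design with 30% less switched capacitance at the base point.
lean = dynamic_power(cdyn_farads=0.7e-9, freq_hz=3.0e9, voltage=1.00)

print(f"frequency + voltage bump: {fast / base:.3f}x power")
print(f"lower Cdyn:               {lean / base:.3f}x power")
```

With these made-up numbers, a 20% clock increase costs about 45% more power once the voltage term is included, which is why looking at frequency alone understates the difference.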
It actually doesn't. What's both interesting and impressive about Zen 4c is that they actually changed very little beyond targeting a lower frequency point. If anything, now that its benefits are proven in the wild, we'll likely see more divergence in future gens.
Also, it's all but certain that Zen 4c uses more power-constrained libraries, so it should be a little more efficient than Zen 4 at the same low frequencies.
If you care primarily about ST, then neither Bergamo nor Intel's Forest line makes sense. The big factor that you need to include is that one E-core is substantially smaller than one Zen 4c core. So at the product level, we'd see something more like 128 Zen 4c/5c vs 256c Crestmont. Makes things more interesting.
I'm taking a best-case figure for the E-cores; even at only 20% IPC difference there's roughly 50% more power to get the same ST perf, and SMT changes nothing in the equation, because power scales proportionally with the added throughput if frequency is kept constant.
The workloads targeted by these chips, by and large, do not make heavy use of vector instructions. It's certainly nice to have, but the heaviest vector workloads are stuff like AI, and often latency bound, hence running on the bigger cores.
And there are places where I use it, and I am sure even in the cloud it can come in handy.
Yes, I think that's correct. Either way, it will certainly be better than Intel 4. Intel 3, we'll see.
Isn't Bergamo on AMD's flavor of N5HPC though, not N4P?
A similar tradeoff does exist with current products. For example, the frequency-optimized SKUs cut core counts, trading throughput for stronger individual cores/threads.
What, you mean like all those other cloud x86 servers... wait, what?
Where do you see that? I don’t see any public data on Intel 3 node characteristics. The available data on Intel 4 HP has it pretty far ahead of N4P.
Though N4P is still the better node.
Generally yes. But specifically Renoir does not have that much uncore overhead.
Also, the bigger the total core count, interconnect and IO become, the less impact a single core not power/clock gating for 100 µs has on total power consumption.
Each Zen 4c thread as a vCPU is probably stronger than what is needed in a lot of cloud use cases, so disabling SMT would be a waste. Some workloads wouldn't want it, but for the general use case it is more than capable. With all IO being equal, they can sell an SMT core as 2 vCPUs vs. just 1 vCPU if it is disabled, which makes it way more attractive to do that. Considering SMT basically increases throughput by >20% for "free" and they can price these threads directly as vCPUs, why wouldn't they want this? Obviously there are limits, but this has been the case for most Intel/AMD servers in the cloud for a long time now. 1 Zen 4c thread is probably more powerful than 1 thread of Ice Lake or whatever they would replace.
Zen 4c is particularly attractive in this regard because you can fit ~twice the cores per area (i.e. per dollar), but each individual core in 1T mode provides more performance than each thread on Zen 4 with SMT. I brought this up in the old "future of SMT" thread, but this pricing dynamic has significantly reduced the importance of SMT. It's still useful for some cases (flexibility on one machine, max throughput regardless of perf per thread, etc.), but no longer irreplaceable.
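The vCPU trade-off argued over above can be sketched like this. The 20% SMT uplift is the figure from the post; the even split of a core's throughput between its two busy threads is a simplifying assumption, and the function name is made up for illustration.

```python
def per_vcpu_throughput(core_1t_perf: float, smt_uplift: float, smt_enabled: bool):
    """Return (vCPUs sold per core, throughput per vCPU with all threads busy).

    ASSUMPTION: with SMT on, the two sibling threads share the core's total
    throughput evenly. Real workloads will not split this cleanly.
    """
    if not smt_enabled:
        return 1, core_1t_perf
    vcpus = 2
    return vcpus, core_1t_perf * (1.0 + smt_uplift) / vcpus


vcpus_on, perf_on = per_vcpu_throughput(1.0, 0.20, smt_enabled=True)
vcpus_off, perf_off = per_vcpu_throughput(1.0, 0.20, smt_enabled=False)

print(f"SMT on : {vcpus_on} vCPUs/core, {perf_on:.2f} of a 1T core each")
print(f"SMT off: {vcpus_off} vCPU/core,  {perf_off:.2f} of a 1T core each")
```

Under these assumptions SMT doubles the vCPUs a provider can sell per core, but each one delivers roughly 0.6 of a full core when its sibling is loaded. That is the per-thread QoS concern raised on the other side of the argument.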
That is the exact opposite of what I and others have observed: SMT proportionally increases power usage based on the increase in throughput it is able to provide for a given workload.
I tried to get to the bottom of this a while back. My observation was that, when comparing two identical CPUs in the same TDP budget, where one has SMT and the other does not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U. The latter had around 25% more throughput at almost exactly the same consumption in CB23.
For general use cases, they're positioning Genoa as the default. Bergamo is more of a targeted option. And of course, in a vacuum, CSPs would love to have more threads to offer and to net the "free" throughput SMT provides, but ultimately, customers demand more than just raw throughput. There will certainly be deployments of Bergamo with SMT, but don't be surprised if many companies disable it. I expect that will be particularly common with web-heavy deployments (e.g. Google, Meta, etc.).
Each Zen 4c thread as a vCPU is probably stronger than what is needed in a lot of cloud use cases, so disabling SMT would be a waste. Some workloads wouldn't want it, but for the general use case it is more than capable. With all IO being equal, they can sell an SMT core as 2 vCPUs vs. just 1 vCPU if it is disabled, which makes it way more attractive to do that. Considering SMT basically increases throughput by >20% for "free" and they can price these threads directly as vCPUs, why wouldn't they want this? Obviously there are limits, but this has been the case for most Intel/AMD servers in the cloud for a long time now. 1 Zen 4c thread is probably more powerful than 1 thread of Ice Lake or whatever they would replace.
Some workloads require more performance than 1 thread can offer, which is why they have the SKUs without SMT, but it's not like SMT isn't useful.
What exactly is Zen 4c? Is this a Zen 4 refresh?
You should start with your own link. Dynamic power scales proportionally to Cdyn * Frequency * Voltage^2. You're focusing only on frequency, ignoring both the Cdyn and Voltage terms. Zen 4c isn't half the area with the same VF curve.
Your observation may not hold true for Intel architectures. Intel engineers don't seem to have figured out "free" SMT.
My observation was that, when comparing two identical CPUs in the same TDP budget, where one has SMT and the other does not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U.
All this tells me is that you have no idea what that term is. To give the simplest possible example, if you have two identical transistors instead of one, that term would ~double. So no, it is not in any way redundant, nor does it scale with frequency. It's a constant for a given design and workload.
Including both capacitance and frequency is, in a way, using the same parameter twice.
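The two-transistor example above can be written out as a small sketch. It models Cdyn in the usual textbook way, as the sum over circuit nodes of activity factor times capacitance; the capacitance and activity values are hypothetical.

```python
def cdyn(node_caps_farads, activity_factors):
    """Effective switched capacitance: sum of (activity * capacitance) per node.

    Clock frequency is deliberately not an argument: for a given design and
    workload, this value is the same at any clock speed.
    """
    return sum(a * c for a, c in zip(activity_factors, node_caps_farads))


# HYPOTHETICAL values: 2 fF per node, switching on 10% of cycles.
one_transistor = cdyn([2.0e-15], [0.1])
two_transistors = cdyn([2.0e-15, 2.0e-15], [0.1, 0.1])

ratio = two_transistors / one_transistor
print(f"Cdyn ratio, two identical transistors vs one: {ratio:.1f}")
```

Doubling the identical switching elements doubles Cdyn whatever the clock. Frequency only enters once Cdyn is multiplied by f and V^2 to get power, so the two terms are not the same parameter counted twice.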
Yep, I gladly and wholeheartedly agree with you. The trouble is that I do not have a machine at my disposal that would allow me that kind of thorough testing.
I would also caution you not to measure the energy impact of SMT on two different CPU dies, let alone two different bins. We do this when we have no other choice, but in the case of SMT we can use the same CPU die and just disable/enable SMT, removing die variance and binning variance.
The only thing about SMT that I have noticed is that when you have an application that is very sensitive to L3 cache, either disabling it or running 50% of the normal jobs helps a lot. Other than that, I see no penalty. If I thought it was worth testing to prove to some people that it makes no difference, I would test it for you. But some people are "always right" and cannot be convinced.
Yep, I gladly and wholeheartedly agree with you. The trouble is that I do not have a machine at my disposal that would allow me that kind of thorough testing.
Yeah. The HT threads vie with the normal threads for resources, putting pressure on the limited cache. AMD V-cache is the solution.
The only thing about SMT that I have noticed is that when you have an application that is very sensitive to L3 cache, either disabling it or running 50% of the normal jobs helps a lot.
Could have just put a tarp on the shed (or the part of it that needed repair) and run for cover.
I'd post more on this, but I feel awful, having spent a few hours in the rain the other day repairing my shed.
What do you know? It actually might be: https://www.tomshardware.com/news/i...ew-chinas-exclusive-black-edition-gaming-chip
Could just be rejected 12600K dies with the E-core cluster disabled.
C0 stepping chips like our 12490F actually have a total of eight P-cores and eight E-cores, but Intel disables the extra cores to trim it down to a 6+0 design.