Question Zen4c vs E core Die area.


BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
... and SMT changes nothing in the equation, because power rises proportionally with the added throughput if frequency is kept constant.
I tried to get to the bottom of this a while back. My observation was, that when comparing two identical CPUs in the same TDP budget, where one has SMT and the other has not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U. The latter had around 25% more throughput at almost exactly the same consumption in CB23.
I can only assume, that power gating is not as finely grained as to disable every single pipeline stage for any fraction of time not being in use.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,847
3,387
136
I tried to get to the bottom of this a while back. My observation was, that when comparing two identical CPUs in the same TDP budget, where one has SMT and the other has not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U. The latter had around 25% more throughput at almost exactly the same consumption in CB23.
I can only assume, that power gating is not as finely grained as to disable every single pipeline stage for any fraction of time not being in use.
Also, the bigger the total core count, interconnect and IO become, the less impact a single core not power/clock gating for 100us has on total power consumption.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
There is no product from Intel out there with nothing but E-cores, so how can we compare?
There's ADL-N, if we want to get technical, but the entire context of this conversation is theoretical abstractions based on what limited information we have available. Even Bergamo isn't out yet.
Anyone who wants to understand how frequency and power relate should first understand what is explained here:
You should start with your own link. Dynamic power scales proportionally to Cdyn * Frequency * Voltage^2. You're focusing only on frequency, ignoring both the Cdyn and Voltage terms. Zen 4c isn't half the area with the same VF curve.
Also, it's almost certain that Zen 4c uses more power-constrained libraries, so it should be a little more efficient than Zen 4 at the same low frequencies.
It actually doesn't. What's both interesting and impressive about Zen 4c is that they actually changed very little beyond targeting a lower frequency point. If anything, now that its benefits are proven in the wild, we'll likely see more divergence in future gens.
I'm taking a best-case figure for the E-cores; even at only a 20% IPC difference, roughly 50% more power is needed to get the same ST perf, and SMT changes nothing in the equation, because power rises proportionally with the added throughput if frequency is kept constant.
If you care primarily about ST, then neither Bergamo nor Intel's Forest line makes sense. The big factor that you need to include is that one E-core is substantially smaller than one Zen 4c core. So from a product level, we'd see something more like 128 Zen 4c/5c vs 256c Crestmont. Makes things more interesting.
And there are places where I use it, and I am sure even in the cloud it can come in handy.
The workloads targeted by these chips, by and large, do not make heavy use of vector instructions. It's certainly nice to have, but the heaviest vector workloads are stuff like AI, and often latency bound, hence running on the bigger cores.

Really the best reference would be to compare to Graviton, as that's why AMD and Intel created these products to begin with.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
What you mean like all those other cloud x86 servers....... wait ... what ?
A similar tradeoff does exist with current products. For example, core counts for the frequency optimized SKUs, trading throughput for stronger individual cores/threads.

Many cloud workloads demand a certain level of performance per thread. Say, for example, a web server, which is expected to respond in a certain amount of time. This is also important for AWS/Azure/GCP pricing tiers. So looking at the problem here, if you actually take advantage of SMT and the throughput benefits it provides, you end up sacrificing a substantial amount of per thread performance. If your baseline is too low (say, equivalent to a 1.5GHz Zen 4 core), then you simply can't just grab the max core count SKU and run it with SMT.

But you pay the hardware vendor based on the core count, not how many threads you run. This can create interesting pricing niches when you think about it, and you can find some fun examples floating around.

Zen 4c is particularly attractive in this regard because you can fit ~twice the cores per area (i.e. per dollar), but each individual core in 1T mode provides more performance than each thread on Zen 4 with SMT. I brought this up in the old "future of SMT" thread, but this pricing dynamic has significantly reduced the importance of SMT. It's still useful for some cases (flexibility on one machine, max throughput regardless of perf per thread, etc), but no longer irreplaceable.
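A rough way to see the per-thread argument above: if two SMT threads share a core equally and SMT adds some total uplift, each thread gets (1 + uplift) / 2 of that core's 1T performance. The sketch below is illustrative only. The function names, the equal-sharing assumption, and the 0.8 relative-performance figure are assumptions, not measured data; the 25% uplift is borrowed from the CB23 observation earlier in the thread.

```python
def per_thread_share_with_smt(smt_uplift: float) -> float:
    """Per-thread throughput with 2 threads per core, relative to the same
    core running 1 thread. Assumes both threads share the core equally."""
    return (1.0 + smt_uplift) / 2.0


def dense_core_wins_per_thread(dense_core_1t_perf: float, smt_uplift: float) -> bool:
    """True if a dense core in 1T mode (performance given relative to a
    standard core in 1T mode = 1.0) beats one SMT thread of the standard core."""
    return dense_core_1t_perf > per_thread_share_with_smt(smt_uplift)


# With a 25% SMT uplift, each thread gets 62.5% of the core's 1T performance.
print(per_thread_share_with_smt(0.25))

# Hypothetical: a dense core delivering 80% of the standard core's 1T performance.
print(dense_core_wins_per_thread(0.8, 0.25))
```

Under these assumptions, a dense core only needs to exceed 62.5% of the standard core's 1T performance to offer a stronger thread than the SMT alternative.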
 

H433x0n

Golden Member
Mar 15, 2023
1,040
1,201
96
Though N4P is still the better node.
Where do you see that? I don’t see any public data on Intel 3 node characteristics. The available data on Intel 4 HP has it pretty far ahead of N4P.

I’ve seen you say this a few times and it seems like you’ve got a good pulse on this particular topic so I’m genuinely curious where that sentiment comes from.
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
Also, the bigger the total core count, interconnect and IO become, the less impact a single core not power/clock gating for 100us has on total power consumption.
Generally yes. But specifically Renoir does not have that much uncore overhead.
I ran it with different cTDPs in order to find its maximum energy efficiency. The result was 12 W for 8 cores; only below that point did the uncore start to eat too much into the power budget.

[Attached image: SweetSpotFinding.png]
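The sweet-spot search described above can be sketched as picking the power limit with the highest score per watt from a cTDP sweep. This is a minimal sketch: the function name is hypothetical and the sweep values are made-up placeholders shaped to peak at 12 W, not the measurements from the attached chart.

```python
def best_efficiency_point(sweep):
    """Return the power limit (W) with the highest score-per-watt.

    sweep: list of (power_limit_w, benchmark_score) tuples.
    """
    return max(sweep, key=lambda point: point[1] / point[0])[0]


# Made-up sweep data for illustration only (power limit in W, benchmark score).
hypothetical_sweep = [
    (8, 3600),
    (10, 5000),
    (12, 6600),
    (15, 7500),
    (25, 10000),
]

print(best_efficiency_point(hypothetical_sweep))
```

Below the sweet spot, the fixed uncore power takes a growing share of the budget; above it, the cores run at less efficient voltage/frequency points.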
 

desrever

Member
Nov 6, 2021
122
301
106
Zen 4c is particularly attractive in this regard because you can fit ~twice the cores per area (i.e. per dollar), but each individual core in 1T mode provides more performance than each thread on Zen 4 with SMT. I brought this up in the old "future of SMT" thread, but this pricing dynamic has significantly reduced the importance of SMT. It's still useful for some cases (flexibility on one machine, max throughput regardless of perf per thread, etc), but no longer irreplaceable.
Each Zen 4c thread as a vCPU is probably stronger than what is needed in a lot of cloud use cases, so disabling SMT would be a waste. Some workloads wouldn't want to do that, but for the general use case it is more than capable. With all IO being equal, they can sell an SMT core as 2 vCPUs vs. just 1 if it is disabled, which makes it far more attractive to leave it on. Considering SMT basically increases throughput by >20% for "free" and they can price these threads directly as vCPUs, why wouldn't they want this? Obviously there are limits, but this has been the case for most Intel/AMD servers in the cloud for a long time now. One Zen 4c thread is probably more powerful than one thread of Ice Lake or whatever it would replace.

Some workloads require more performance than 1 thread can offer, which is why they have the SKUs without SMT, but it's not like SMT isn't useful.
 

coercitiv

Diamond Member
Jan 24, 2014
6,340
12,596
136
I tried to get to the bottom of this a while back. My observation was, that when comparing two identical CPUs in the same TDP budget, where one has SMT and the other has not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U. The latter had around 25% more throughput at almost exactly the same consumption in CB23.
That is the exact opposite of what others and I have observed: SMT increases power usage in proportion to the throughput increase it provides for a given workload.

I would also caution you not to measure the energy impact of SMT on two different CPU dies, let alone two different bins. We do this when we have no other choice, but in the case of SMT we can use the same CPU die and just disable/enable SMT, removing die variance and binning variance. Testing with a TDP cap that is easily reached by the die even with SMT disabled can also be tricky, as enabling SMT may drop clocks, keeping the die in a more efficient operating point. This can interfere with measurements depending on what one needs to evaluate. Combine a better binned die with a relatively low TDP cap and the efficiency double-dip can make SMT look like free performance. (which isn't necessarily false if all one wants is to improve efficiency)
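A minimal sketch of the same-die comparison suggested above: run the same workload on one CPU with SMT off and then on, ideally at a fixed clock so a TDP cap does not shift the operating point, and compare throughput, power, and efficiency. The class, function name, and the numbers are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    score: float    # benchmark throughput (e.g. a multi-thread score)
    power_w: float  # average package power during the run, in watts


def smt_impact(smt_off: Run, smt_on: Run) -> dict:
    """Relative change in throughput, power, and score-per-watt
    when enabling SMT on the same die."""
    eff_off = smt_off.score / smt_off.power_w
    eff_on = smt_on.score / smt_on.power_w
    return {
        "throughput_gain": smt_on.score / smt_off.score - 1.0,
        "power_gain": smt_on.power_w / smt_off.power_w - 1.0,
        "efficiency_gain": eff_on / eff_off - 1.0,
    }


# Hypothetical fixed-clock runs: power rising in step with throughput
# leaves efficiency unchanged.
result = smt_impact(Run(score=10000, power_w=40), Run(score=12500, power_w=50))
print(result)
```

If the measured power gain comes out well below the throughput gain at a fixed clock, SMT is improving efficiency; if the two match, as in the placeholder numbers, it is not.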
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
Each Zen 4c thread as a vCPU is probably stronger than what is needed in a lot of cloud use cases, so disabling SMT would be a waste. Some workloads wouldn't want to do that, but for the general use case it is more than capable. With all IO being equal, they can sell an SMT core as 2 vCPUs vs. just 1 if it is disabled, which makes it far more attractive to leave it on. Considering SMT basically increases throughput by >20% for "free" and they can price these threads directly as vCPUs, why wouldn't they want this? Obviously there are limits, but this has been the case for most Intel/AMD servers in the cloud for a long time now. One Zen 4c thread is probably more powerful than one thread of Ice Lake or whatever it would replace.

Some workloads require more performance than 1 thread can offer, which is why they have the SKUs without SMT, but it's not like SMT isn't useful.
For general use cases, they're positioning Genoa as the default. Bergamo is more of a targeted option. And of course, in a vacuum, CSPs would love to have more threads to offer and to net the "free" throughput SMT provides, but ultimately, customers demand more than just raw throughput. There will certainly be deployments of Bergamo with SMT, but don't be surprised if many companies disable it. I expect that will be particularly common with web-heavy deployments (e.g. Google, Meta, etc).

Though on the topic, AWS now defines 1 vCPU as 1 core, not one SMT thread. I think that might also help them avoid any side channel concerns with SMT. Will be interesting to see if Microsoft and Google follow.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,512
2,528
146
What exactly is Zen 4c? Is this a Zen 4 refresh?
 

Timorous

Golden Member
Oct 27, 2008
1,723
3,124
136
What exactly is Zen 4c? Is this a Zen 4 refresh?

Density optimised Zen 4. The chiplet itself is less than 10% larger and it has double the core count vs the standard Zen 4 CCD. That is split into 2 CCXs each with 16MB of L3 so per core L3 is halved.
 

Abwx

Lifer
Apr 2, 2011
11,143
3,840
136
You should start with your own link. Dynamic power scales proportionally to Cdyn * Frequency * Voltage^2. You're focusing only on frequency, ignoring both the Cdyn and Voltage terms. Zen 4c isn't half the area with the same VF curve.

This shows that you don't really understand the thing...

Including both capacitance and frequency is, in a way, using the same parameter twice.

The current through an ideal MOSFET increases as the square of the voltage.

To increase the current by a factor of X, and hence the frequency by the same factor X, you'll have to increase the voltage by sqrt(X).

For instance, to increase frequency by a factor of 2, voltage must be increased by 1.414x.
Power will increase by 2x if we account only for this factor, but since frequency is also increased by a factor of 2, the total power increases by a factor of 4.

So we can write P(f) = f^2 without normalizing the equation. For instance, if a CPU uses 100W at 5GHz, the normalized relation would be:

P(f) = 4 * f^2, with P in watts and f in GHz.

That is, power increases quadratically with frequency, but keep in mind that this is a theoretical best case and that real MOSFETs do not exhibit that good a power/frequency slope; generally the exponent is between 2.2 and 2.8, depending on the process.

As for the capacitance, it is not needed in this relation because it is assumed to be at its maximal value, since we are talking about a CPU working at full throughput.

Now, Intel takes great care to linearize its process as much as possible, and they generally have a better slope than TSMC, who seem more concerned about time to market. If we look at ADL, for instance, they manage a 2.2 exponent, while TSMC's 7nm process hovers at 2.6-2.8 depending on the exact process iteration. But that's only part of the story, because TSMC has lower capacitance to begin with, so at low power/low frequency their process has better perf/watt at an equivalent node.
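The power-versus-frequency relation in this post can be sketched as a power law anchored at the 100W at 5GHz example, with the exponent as a parameter. This is only an illustration of the model as the post states it; the function name and defaults are assumptions, and the exponents are the ones quoted in the post, not independently verified figures.

```python
def power_at(freq_ghz: float,
             ref_power_w: float = 100.0,
             ref_freq_ghz: float = 5.0,
             exponent: float = 2.0) -> float:
    """Power (W) at a given frequency under P ~ f^exponent,
    normalized so that P(ref_freq_ghz) == ref_power_w."""
    return ref_power_w * (freq_ghz / ref_freq_ghz) ** exponent


# exponent=2.0 reproduces P(f) = 4 * f^2; 2.2 and 2.8 are the
# process-dependent range mentioned in the post.
for exponent in (2.0, 2.2, 2.8):
    print(exponent, power_at(2.5, exponent=exponent))
```

With the ideal exponent of 2, halving the frequency from 5GHz to 2.5GHz cuts power to a quarter; a steeper exponent makes the drop larger still.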
 
Jul 27, 2020
17,479
11,266
106
My observation was, that when comparing two identical CPUs in the same TDP budget, where one has SMT and the other has not, there is almost no power tax for SMT.
In my specific case it was a Renoir 4700U vs. 4800U.
Your observation may not hold true for Intel architectures. Intel engineers don't seem to have figured out "free" SMT.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
Including both capacitance and frequency is, in a way, using the same parameter twice.
All this tells me is that you have no idea what that term is. To give the simplest possible example, if you have two identical transistors instead of one, that term would ~double. So no, it is not in any way redundant, nor does it scale with frequency. It's a constant for a given design and workload.
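A minimal Python sketch of the relation under discussion, dynamic power proportional to Cdyn * Frequency * Voltage^2, expressed as ratios against a baseline. The function name is hypothetical and the 10% voltage reduction is an arbitrary example value, not a figure for any real core.

```python
def relative_dynamic_power(cdyn_ratio: float,
                           freq_ratio: float,
                           volt_ratio: float) -> float:
    """Dynamic power relative to a baseline, using P ~ Cdyn * f * V^2.
    Each argument is new_value / baseline_value."""
    return cdyn_ratio * freq_ratio * volt_ratio ** 2


# Two identical transistors instead of one, same frequency and voltage:
# only the Cdyn term doubles, and so does dynamic power.
print(relative_dynamic_power(cdyn_ratio=2.0, freq_ratio=1.0, volt_ratio=1.0))

# Same design and frequency, voltage lowered by a hypothetical 10%.
print(relative_dynamic_power(cdyn_ratio=1.0, freq_ratio=1.0, volt_ratio=0.9))
```

Keeping the three terms separate shows why Cdyn is not redundant with frequency: it changes with the design and workload even when the clock does not.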
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
I would also caution you not to measure the energy impact of SMT on two different CPU dies, let alone two different bins. We do this when we have no other choice, but in the case of SMT we can use the same CPU die and just disable/enable SMT, removing die variance and binning variance.
Yep, I gladly and wholeheartedly agree with you. The trouble is that I do not have a machine at my disposal that would allow me that kind of thorough testing.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,721
14,747
136
Yep, I gladly and wholeheartedly agree with you. The trouble is that I do not have a machine at my disposal that would allow me that kind of thorough testing.
The only thing about SMT that I have noticed, is when you have an application that is very sensitive to L3 cache, either disabling it, or running 50% of the normal jobs helps a lot. Other than that, I see no penalty. If I thought it was worth testing to prove to some people that it makes no change, I would test it for you. But some people are "always right" and can not be convinced.
 
Jul 27, 2020
17,479
11,266
106
The only thing about SMT that I have noticed, is when you have an application that is very sensitive to L3 cache, either disabling it, or running 50% of the normal jobs helps a lot.
Yeah. The HT threads vie with the normal threads for resources, putting pressure on the limited cache. AMD V-cache is the solution.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Unrelated, but that new Linus video has him using a Chinese mini PC with a 12490F or whatever, with a slightly larger cache. I went to bed six minutes in because I couldn't take any more of his whiny voice, but I wonder if that particular China-only CPU was a kind of test bed for future CPUs with larger-than-standard L3 cache or anything else.

I'd post more on this but I feel awful, having spent a few hours in the rain the other day repairing my shed.
 