Discussion PES | Assessing Power and Performance Efficiency of x86 CPU architectures


Abwx

Diamond Member
Apr 2, 2011
9,867
2,312
136
Efficiency at different performance levels is moot and fundamentally flawed as a comparison. If I take a 5900HS at a given frequency F, it will be roughly 2x more efficient at F/2, that is, half the performance at a quarter of the power.

To make sense, comparisons should be made at iso-performance.
 

jeanlain

Member
Oct 26, 2020
130
103
86
What kind of (free) workload would you suggest?
None that comes to mind. I would not use x264/x265, as they contain x86 assembly optimisations that probably have no equivalent in the ARM build.

It's better to combine an array of different algorithms that are specifically coded for cross-platform comparisons.
I would have recommended Geekbench Pro, but it's $99. The free version of Geekbench does not run long enough.
SPEC tests are much more expensive AFAIK.
 
  • Like
Reactions: BorisTheBlade82

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
Efficiency at different performance levels is moot and fundamentally flawed as a comparison. If I take a 5900HS at a given frequency F, it will be roughly 2x more efficient at F/2, that is, half the performance at a quarter of the power.

To make sense, comparisons should be made at iso-performance.
There already are a lot of comparisons you can make at ISO performance. And you will find out that even at ISO perf there are huge power efficiency differences.

As a sidenote: It is not as trivial as you describe it. Energy efficiency is measured as total consumption for a given workload. So for your 5900HS to be 2x more efficient at F/2 it would need to consume only 1/4 average power as it will need twice the time for the workload.
You are invited to try this out for yourself. Just follow the download link in the OP.
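To make the point about total consumption concrete, here is a minimal sketch of the arithmetic. The wattages and runtimes are made-up illustration values, not measurements from the benchmark, and `energy_joules` is just a hypothetical helper name.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Total energy for one run of a fixed workload: average power times runtime."""
    return avg_power_w * runtime_s


# Hypothetical numbers (assumptions): the same workload at frequency F and at F/2.
at_f = energy_joules(avg_power_w=40.0, runtime_s=100.0)

# At F/2 the run takes twice as long. With 1/4 of the average power,
# the total energy halves, so efficiency doubles.
at_half_f_quarter_power = energy_joules(avg_power_w=10.0, runtime_s=200.0)

# With only 1/2 of the average power, the doubled runtime cancels the saving.
at_half_f_half_power = energy_joules(avg_power_w=20.0, runtime_s=200.0)

gain_quarter_power = at_f / at_half_f_quarter_power
gain_half_power = at_f / at_half_f_half_power
print(gain_quarter_power, gain_half_power)
```

So halving the clock only pays off in energy terms if average power drops by more than half.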
 

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
@jeanlain
Yes, I would love to use GB5 or SPEC. But as this is a community-oriented benchmark (otherwise it would have been impossible to get so many samples), CB23 was a good compromise of being free, popular and relevant.
 
  • Like
Reactions: jeanlain

PJVol

Senior member
May 25, 2020
210
180
86
BorisTheBlade82
Did I understand correctly that it won't work without Excel installed? Since I'm trying to keep my home PC as "clean" as possible and am not going to use Office in the near future, is it possible to make the script work with something like Google spreadsheets? I tried to import the CSV data into the table created after the first import of the original xlsx, but there are some incompatibilities between the ST/MT table structure and the data in the CSV (btw, better to use the standard "comma" separator instead of ";" in your export). The Excel tables have two more columns than the CSV.
 

Abwx

Diamond Member
Apr 2, 2011
9,867
2,312
136
. So for your 5900HS to be 2x more efficient at F/2 it would need to consume only 1/4 average power as it will need twice the time for the workload.
That's what I stated explicitly in my post: if you reduce frequency by a factor of 2, power will be reduced by at least 4x, so efficiency will be 2x better. That's inherent to how transistor current scales with voltage.
 

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
BorisTheBlade82
Did I understand correctly that it won't work without Excel installed? Since I'm trying to keep my home PC as "clean" as possible and am not going to use Office in the near future, is it possible to make the script work with something like Google spreadsheets? I tried to import the CSV data into the table created after the first import of the original xlsx, but there are some incompatibilities between the ST/MT table structure and the data in the CSV (btw, better to use the standard "comma" separator instead of ";" in your export). The Excel tables have two more columns than the CSV.
For generating the charts Excel is necessary. But you could also just provide me with the CSVs here and I will post your result.

Regarding the separator: First of all, there is no real "standard", this I can tell you. In many regions of the world the comma is the decimal separator, so ";" gets used. But this does not matter as long as the sender (my PowerShell script) and the receiver (my Excel file) expect the same one.
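For anyone who wants to look at such an export without Excel, here is a minimal Python sketch of reading a ";"-separated file with decimal commas. The column names and values are invented for illustration and are not the actual layout of the PES export.

```python
import csv
import io

# Hypothetical sample (assumption): ';' as field separator, ',' as decimal mark.
sample = "name;score;avg_power_w\nrun1;1500;12,5\nrun2;1480;11,9\n"


def read_rows(text: str, delimiter: str = ";") -> list:
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def to_float(value: str) -> float:
    """Convert a number written with a decimal comma into a float."""
    return float(value.replace(",", "."))


rows = read_rows(sample)
powers = [to_float(row["avg_power_w"]) for row in rows]
print(powers)
```

The only thing that matters is that the reader is told which delimiter the writer used; the decimal comma then has to be converted explicitly.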
 

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
That's what I stated explicitly in my post: if you reduce frequency by a factor of 2, power will be reduced by at least 4x, so efficiency will be 2x better. That's inherent to how transistor current scales with voltage.
In general I absolutely agree with you as double the frequency needs the current to be squared. But this does not hold true for all actual SKUs. For Alder Lake, for example, it holds true because the applied 241w are waaaaay beyond the sweet spot. But for most AMD SKUs you just won't get twice the efficiency compared to stock. This is because there is a lower bound below which decreasing cTDP, and therefore frequency, any further actually decreases power efficiency. If you are interested you can look up the numbers of my measurements for my own AMD R7 4700U here: https://www.linkedin.com/pulse/finding-performance-efficiency-sweet-spot-given-cpu-boris-vogel

And of course this is a bit trivial: Because of the uncore, I/O, etc. there is a baseline tax which ofc is smaller for monolithic architectures than for Chiplets.
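A rough sketch of why a baseline tax creates a sweet spot. This is a toy model under loud assumptions: core power is taken as k * f^3, the baseline (uncore, I/O) as a constant, and runtime as work / f. The constants are arbitrary and do not describe any real SKU.

```python
def energy_per_task(freq_ghz: float, base_w: float = 5.0, k: float = 1.0,
                    work: float = 100.0) -> float:
    """Toy model (assumption): energy = (baseline + k * f^3) * (work / f)."""
    power_w = base_w + k * freq_ghz ** 3
    runtime_s = work / freq_ghz
    return power_w * runtime_s


# Scan 0.5 .. 5.0 GHz in 0.1 GHz steps and pick the lowest-energy point.
freqs = [round(0.5 + 0.1 * i, 1) for i in range(46)]
best = min(freqs, key=energy_per_task)

# Setting the derivative of E(f) = work * (base/f + k * f^2) to zero
# gives f^3 = base / (2k) for this toy model.
analytic = (5.0 / (2 * 1.0)) ** (1 / 3)
print(best, analytic)
```

Below the sweet spot the constant baseline is paid for a longer runtime, so energy per task rises again. A larger baseline pushes the sweet spot to a higher frequency in this model.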
 
Feb 17, 2020
89
213
76
Is it really true so? ...
Yes. I'm not even going to go through your entire reply because you have some fundamental misconceptions about how a cpu is designed, marketed, and funded. Also, I never mentioned x86 vs ARM because AMD and Intel's problems are (mostly) unrelated. It's a design philosophy issue, not an ISA issue. Maybe one of them grows a brain, changes their design approach, and it eventually becomes an ISA issue, but it's not there yet.

As a sidenote: It is not as trivial as you describe it. Energy efficiency is measured as total consumption for a given workload. So for your 5900HS to be 2x more efficient at F/2 it would need to consume only 1/4 average power as it will need twice the time for the workload.
You are invited to try this out for yourself. Just follow the download link in the OP.
It actually is that trivial. Let's go through the math. Keep in mind that this is a very simple napkin math type example to show how much impact the v/f curve has on efficiency.

By definition, power is proportional to capacitance * frequency * voltage^2. So initially, you might think that changing frequency doesn't impact power efficiency, but you're forgetting that frequency and voltage are also related. In general, you can assume that relationship is more or less linear, though it can break down at very high or very low voltages.

So with that in mind, power is actually closer to proportional to capacitance * frequency^3. So cutting frequency in half theoretically cuts power by 7/8 (down to 1/8), which at half the performance is a 4x increase in efficiency. Obviously there are a lot of other factors that would influence this (leakage, uncore, shape of the v/f curve), but in general this is close enough. Ex: with Alder Lake, you can run at half clocks for 1/7th of the power.
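The scaling can be checked with a few lines of Python. This is only a sketch of the idealized model (performance proportional to frequency, power proportional to frequency to some exponent), and the function names are made up for illustration.

```python
def relative_power(freq_ratio: float, exponent: int = 3) -> float:
    """Power relative to baseline when frequency is scaled by freq_ratio."""
    return freq_ratio ** exponent


def relative_efficiency(freq_ratio: float, exponent: int = 3) -> float:
    """Performance per watt relative to baseline, assuming perf scales with frequency."""
    return freq_ratio / relative_power(freq_ratio, exponent)


cubic_power = relative_power(0.5)                    # 1/8 of the power
cubic_gain = relative_efficiency(0.5)                # 4x perf per watt
square_gain = relative_efficiency(0.5, exponent=2)   # 2x with a squared relationship
print(cubic_power, cubic_gain, square_gain)
```

With an exponent of 2 the same halving gives the "half the perf at a quarter of the power" figure from earlier in the thread, so the two claims differ only in the assumed exponent.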

Now let's apply this thought process to M1 vs AMD/Intel. Again, this is only napkin math to show how powerful the cubic relationship's effect on power is, and I'm dramatically simplifying. Let's say AMD/Intel's chips run at 5 GHz with a capacitance of 3 units. Now let's say that they try to shift the operating region and reduce the target clock speed to 3 GHz. To make up for the performance, they want to boost IPC by 66%. Here, IPC and capacitance are what's linked. A well executed architecture should have IPC-cap scaling of around 1:1. Very well executed changes are better, and poor architectures do worse. But for simplicity's sake, let's assume they achieve a 1:1 ratio (not easy, but also not impossible), bumping up the capacitance from 3 units to 5. If you compare the power draw of the two approaches, the initial frequency-centric approach has a symbolic power draw of 3 * 5^3, while the second IPC-centric one has a symbolic power draw of 5 * 3^3. If you compare the two, the IPC-centric design approach has a power draw that's 64% lower than the frequency-centric one.
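The same napkin math as a small Python sketch. It only uses the symbolic units from the paragraph above, and `symbolic_power` and `symbolic_perf` are hypothetical helpers resting on the stated 1:1 IPC-to-capacitance assumption.

```python
def symbolic_power(capacitance: float, freq_ghz: float) -> float:
    """Symbolic power draw under the cubic assumption: capacitance * frequency^3."""
    return capacitance * freq_ghz ** 3


def symbolic_perf(capacitance: float, freq_ghz: float) -> float:
    """Performance proxy, assuming IPC scales 1:1 with capacitance."""
    return capacitance * freq_ghz


freq_centric_power = symbolic_power(3, 5)   # 3 * 5^3
ipc_centric_power = symbolic_power(5, 3)    # 5 * 3^3

# Both designs land on the same performance proxy (3 * 5 == 5 * 3).
same_perf = symbolic_perf(3, 5) == symbolic_perf(5, 3)

reduction = 1 - ipc_centric_power / freq_centric_power
print(freq_centric_power, ipc_centric_power, round(reduction, 2))
```

That is 375 versus 135 symbolic units at equal performance, which is where the 64% figure comes from.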

Like I said, I'm dramatically oversimplifying the cpu design process, but it's pretty obvious that by definition, chasing maximum clock speeds is ridiculously inefficient. Sadly, AMD and Intel marketing prefers this approach because uninformed consumers don't understand what IPC is. Frequency is just easier to market. I can make a whole new post just about that.
 

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
@trexfromouterspace
I know, I know.
I am not really sure if you read my prior post before answering.

But I also wanted to point out that AMD is running most of their SKUs much closer to the sweet spot than Intel. And compared to Apple, AMD needed to make another compromise, as @DisEnchantment already mentioned. Their TSMC supply for CPUs is much more constrained than Apple's, so they need to cut corners area-wise.
 

Abwx

Diamond Member
Apr 2, 2011
9,867
2,312
136
In general I absolutely agree with you as double the frequency needs the current to be squared.
Actually, current increases proportionally with frequency, but to get this increase in current you'll have to raise voltage by an amount such that power increases as the square of frequency, albeit only at low enough frequencies; as frequency increases, the frequency/power curve gradually morphs to higher exponents.
 

PJVol

Senior member
May 25, 2020
210
180
86
Their TSMC supply for CPUs is much more constrained than Apple's
I always wanted to ask someone with industry insight: is that really the case? I mean all these rumoured TSMC supply constraints, because I've heard one person explain quite well why it's not.
 

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
Actually, current increases proportionally with frequency, but to get this increase in current you'll have to raise voltage by an amount such that power increases as the square of frequency, albeit only at low enough frequencies; as frequency increases, the frequency/power curve gradually morphs to higher exponents.
Sorry, I am not a native speaker, so I might have messed up some fine differences between technical expressions.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,366
3,299
136
Thanks for sharing your result here. What notebook do you have? There seems to be some thermal throttling going on.
It's a Dell Inspiron. Details in my flair. And it is not thermal throttling; it instead lowers PL1 dynamically so as to avoid thermal throttling. Temps were around 85 C throughout.
 
  • Like
Reactions: BorisTheBlade82

Stuka87

Diamond Member
Dec 10, 2010
6,130
2,423
136
Looks like you do not yet have an 11900H in your results. My Dell Precision has one, and I will test it when I get a chance. I will say it runs really hot. All eight cores have throttled on me today.
 

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
  • Just added results of an i7 12700H (Alder Lake 6p+8e).
  • If anyone has a Rembrandt I would appreciate if you give it a try as I have no results for it up until now
  • The new version should already support Raptor Lake. For Zen4 there are no modifications needed: v0.7.5
 
  • Like
Reactions: lightmanek

BorisTheBlade82

Senior member
May 1, 2020
380
538
106
So just as promised, here are the first PES results for a Zen 4 CPU, specifically the R9 7950X (Raphael). Andreas Schilling from https://www.hardwareluxx.de/ was kind enough to provide me with them.
A Schilling R9 7950X 2022-09-27 063854.png

Some remarks
  • As you can see, the MT loop got a bit disturbed by some parallel process. Although I do not consider the influence to be significant, please see these as preliminary results. Andreas promised to provide me with another clean run later on.
  • In the MT run we can also see some thermal throttling. This actually helps with efficiency as frequency and voltage go down. It seems that the be quiet! Dark Rock Pro 4 used in the review is not able to keep temperatures below 95°C.
Competitive landscape - ST
PES
1664291657775.png

Consumption
1664291890034.png

Performance-Consumption-Matrix

1664291992985.png

  • Energy consumption stayed roughly the same for Raphael compared to its predecessor. Thanks to the much improved performance it can gain roughly 30% in the PES-department.
  • Compared to Alder Lake there is still quite a large gap. The IFoP tax could not be reduced significantly. I have to admit that I had hoped for more improvement in ST (performance) efficiency after all the talk about power-saving techniques from Rembrandt, 6nm for the IOD, optimized IF and whatnot. So this, as well as idle consumption, remains the main weakness of AMD's otherwise great chiplet CPUs.
Competitive landscape - MT
PES
1664293369807.png

Consumption
1664293497299.png

Performance-Consumption-Matrix
1664293726760.png

  • No need to emphasize that MT is what the 7950X was made for. It is the first result below 30 seconds per CB23 run I got, so I had to adjust the scale of the matrix ;)
  • Although consuming slightly more than its predecessor, the sheer performance advantage puts it in clean air compared to everything else.
What to look out for
  • I will be getting results for the R9 7950X in ECO mode soon. This way we will be able to compare the R9 7950X@ECO vs. the 5950X vs. the 12900K@125w at roughly ISO-wattage.
  • I am also hoping to get results for the other Zen 4 SKUs soon as well.
P.S.
And there is really no one out there having a Rembrandt-based device that might give it a try?
 


BorisTheBlade82

Senior member
May 1, 2020
380
538
106
AMD R5 7600X
7600X 2022-09-28 210440.png


The single-threaded performance efficiency is outstanding. This is in the range of Cezanne and only bested by the Apple M1. It seems that having only one CCD and the reduced boost frequency of 5.4 GHz have a huge impact.
I would not have expected that AMD would be able to reach these regions with an IFoP-based design.

AMD R7 7700X
7700X 2022-09-28 210632.png


Still a very good ST performance-efficiency-score - slightly above the Intel 12900K.

AMD R9 7900X
7900X 2022-09-28 210811.png

AMD R9 7950X (cleaner result)
7950X 2022-09-28 211021.png
This one has a much better ST performance-efficiency. MT is more or less the same as before.

AMD R9 7950X @ 65w TDP / 88w PPT
7950X ECO 88w 2022-09-28 211321.png

At 88w PPT MT performance-efficiency is out of this world. But as we all know it is a bit of an unfair comparison. I hope to get another result at 105w TDP from Andreas Schilling. This would give us a good comparison to the R9 5950X at ISO-wattage and the 12900K@125w at roughly ISO-wattage.

I have also updated all the matrices and rankings in posts #2 and #3. I hope you enjoy this so far.
 
