Discussion PES | Assessing Power and Performance Efficiency of x86 CPU architectures

AnandTech community forum

BorisTheBlade82

Senior member
May 1, 2020
Dear Community,

So this is my first thread here as a long-time lurker, but I felt the urge to share a small hobby project of mine from the last couple of months with you...

Performance Efficiency Suite - What is it about?
Most reviewers focus solely on what they consider the most important aspect of modern CPUs: absolute performance. But this is only one side of the equation. Today, power efficiency is at least as important, or to be more precise, the amount of energy (watt-seconds, i.e. joules) a CPU needs to accomplish a given workload. Sadly, most reviewers shy away from the extra mile it takes to assess this aspect. This suite measures the Total Package Power of a CPU while running the Cinebench R23 benchmark, first in single-threaded mode (1 run), then in multi-threaded mode (for 10 minutes plus whatever it takes to finish the last run). The results are rendered in the provided Results.xlsx Excel file. To combine efficiency and performance, the suite also provides a score called the Performance Efficiency Score (how amazingly inspired I am ;)).

In the meantime, I have collected more than 80 samples from members of the 3DC & CB communities (see below).

How-To
  1. Unzip the latest release wherever you want, EXCEPT into your local OneDrive folder.
  2. Open Settings.txt and enter your local Cinebench R23 directory.
  3. Run PES Start. It will ask for administrator rights, as these are needed to measure package power.
  4. Wait until the PowerShell script finishes.
  5. Open the Excel file.
  6. Allow external connections (to the generated CSV files containing the data).
  7. Go to Data -> Refresh All.
  8. Enjoy and share your results: just take a screenshot of what Excel renders.
  9. If you want to do multiple measurements with different settings, just copy the Excel file (inside the root folder) before running and refreshing the data.

Some explanations about the Suite
  • This suite was made possible by Michael Möller and his amazing free and open-source Open Hardware Monitor and its .NET library OpenHardwareMonitorLib.dll. Thanks a lot!!!
    Homepage: https://openhardwaremonitor.org/
    GitHub: https://github.com/openhardwaremonitor
  • The package power results look pretty accurate compared to the sparse data available on the internet. It seems that the vendors are much more honest with these sensors than they are with temperature etc.
  • The suite basically consists of some PowerShell scripts and an Excel file for presentation purposes:
    • RunAsAdminWrapper.ps1
      This is needed to have a convenient relative-path shortcut in the root folder and to request admin rights at the same time.
    • Main.ps1
      • After some setup, it starts the Cinebench R23 runs one at a time. It then checks whether the "Cinebench.exe" process is active.
      • While it is, the script queries the package power sensor, waiting at least 10 ms between readings (to keep the CPU load of the script itself low).
      • After each run, the acquired data gets written to CSV files in the LogCsv subfolder.
    • Results.xlsx
      • The Excel file basically just does some importing, calculations and a hopefully nice presentation of the data.
      • Histogram
        The bold line shows a running average of the last 100 data points, which should be sufficiently accurate. The pale line shows each individual data point.
      • Calculation of Total Package Consumption
        To get that number we need the integral of power over time. So for each data point we first calculate the time elapsed since the previous data point, then multiply it by the measured power value and sum up the results (watts × seconds = joules).
      • Everything else in that Excel file is hopefully more or less self-explanatory.
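The Total Package Consumption and running-average steps described above can be sketched in Python. This is only an illustration, not the formulas actually used in Results.xlsx: it assumes the logged CSV rows have already been parsed into (timestamp in seconds, package power in watts) pairs, and it assumes each interval is weighted with the power sample taken at its end. The function names are hypothetical.

```python
from collections import deque


def total_energy_joules(samples):
    """Rectangle-rule integral of package power over time.

    samples: list of (timestamp_s, package_power_w) tuples, sorted by time.
    Assumption: each interval is multiplied by the power measured at its end.
    """
    energy_j = 0.0
    for (t_prev, _), (t_now, p_now) in zip(samples, samples[1:]):
        energy_j += (t_now - t_prev) * p_now  # seconds * watts = joules
    return energy_j


def running_average(values, window=100):
    """Trailing mean over the last `window` values (fewer at the very start)."""
    buf = deque(maxlen=window)  # the oldest value drops out automatically
    averages = []
    for v in values:
        buf.append(v)
        averages.append(sum(buf) / len(buf))
    return averages


if __name__ == "__main__":
    # Hypothetical log: three samples at irregular intervals.
    log = [(0.00, 10.0), (0.50, 20.0), (1.50, 30.0)]
    print(total_energy_joules(log))  # 0.5 s * 20 W + 1.0 s * 30 W, prints 40.0
```

Since 10 ms is only a lower bound for the polling interval, the samples are not evenly spaced, which is why each power value is weighted by its actual time delta instead of assuming a fixed step.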

Online Resources

Disclaimer
I am by no means a PowerShell professional or a professional reviewer. I was just sick of the lack of information and wanted to propose a low-effort solution. Any input for further improvement is highly welcome. Please feel free to use/extend/rip off this solution as you wish. But please share your findings with the world.
 

Abwx

Lifer
Apr 2, 2011
Efficiency at different performance levels is moot and completely flawed: if I take a 5900HS at a given frequency F, it will be roughly 2x more efficient at F/2, that is, half the performance at a quarter of the power.

To make sense, the comparison should be made at ISO performance.
 

jeanlain

Member
Oct 26, 2020
What kind of (free) workload would you suggest?
None comes to mind. I would not use x264/x265, as they contain x86 assembly optimizations that probably have no equivalent in the ARM build.

It's better to combine an array of different algorithms that are specifically coded for cross-platform comparisons.
I would have recommended Geekbench Pro, but it's $99. The free version of Geekbench does not run long enough.
SPEC tests are much more expensive AFAIK.
 

BorisTheBlade82

Senior member
May 1, 2020
Abwx said:
Efficiency at different performance levels is moot and completely flawed: if I take a 5900HS at a given frequency F, it will be roughly 2x more efficient at F/2, that is, half the performance at a quarter of the power.

To make sense, the comparison should be made at ISO performance.
There are already a lot of comparisons you can make at ISO performance. And you will find that even at ISO performance there are huge differences in power efficiency.

As a side note: it is not as trivial as you describe it. Energy efficiency is measured as total consumption for a given workload. So for your 5900HS to be 2x more efficient at F/2, it would need to consume only 1/4 of the average power, as it will need twice the time for the workload.
You are invited to try this out for yourself. Just follow the download link in the OP.
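The arithmetic in this exchange boils down to energy = average power × runtime. A minimal Python sketch; the 40 W / 100 s baseline is a made-up example, not a measurement:

```python
def energy_joules(avg_power_w, runtime_s):
    """Energy for one complete workload: average power times runtime."""
    return avg_power_w * runtime_s


def efficiency_gain(power_ratio, runtime_ratio):
    """How many times less energy the new operating point needs.

    power_ratio:   new average power / old average power
    runtime_ratio: new runtime / old runtime
    """
    return 1.0 / (power_ratio * runtime_ratio)


baseline = energy_joules(40.0, 100.0)    # hypothetical stock run: 4000 J
half_clock = energy_joules(10.0, 200.0)  # F/2: twice the time, 1/4 of the power
print(baseline / half_clock)             # 2.0
print(efficiency_gain(0.5, 2.0))         # halving power alone: 1.0, no gain
```

Halving the clock doubles the runtime, so average power has to fall to a quarter just to double efficiency; cutting power only in half merely breaks even.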
 

PJVol

Senior member
May 25, 2020
BorisTheBlade82
Did I understand correctly that it won't work without Excel installed? Since I'm trying to keep my home PC as "clean" as possible and am not going to use Office in the near future, is it possible to make the script work with something like Google Sheets? I tried to import the CSV data into the table created after the first import of the original xlsx, but there are some incompatibilities between the ST/MT table structure and the data in the CSV (by the way, better to use the standard "comma" separator instead of ";" in your export). The Excel tables have two more columns than the CSV has.
 

Abwx

Lifer
Apr 2, 2011
BorisTheBlade82 said:
...So for your 5900HS to be 2x more efficient at F/2, it would need to consume only 1/4 of the average power, as it will need twice the time for the workload.

That's what I stated explicitly in my post: if you reduce frequency by a factor of 2, power will be reduced by at least 4x, so efficiency will be 2x better. That's inherent to how transistor current scales with voltage.
 

BorisTheBlade82

Senior member
May 1, 2020
PJVol said:
@BorisTheBlade82 Did I understand correctly that it won't work without Excel installed? [...] (by the way, better to use the standard "comma" separator instead of ";" in your export) [...]
Excel is necessary for generating the charts. But you could also just share the CSVs here and I will post your result.

Regarding the separator: first of all, there is no real "standard", that much I can tell you. In many regions of the world the comma is the decimal separator, so ";" gets used as the field separator instead. But this does not matter as long as the sender (my PowerShell script) and the receiver (my Excel file) expect the same separator.
 

BorisTheBlade82

Senior member
May 1, 2020
Abwx said:
That's what I stated explicitly in my post: if you reduce frequency by a factor of 2, power will be reduced by at least 4x, so efficiency will be 2x better. That's inherent to how transistor current scales with voltage.

In general I absolutely agree with you, as doubling the frequency needs the current to be squared. But this does not hold true for all actual SKUs. For Alder Lake, for example, it holds true, as the applied 241 W are waaaaay beyond the sweet spot. But for most AMD SKUs you just won't get twice the efficiency compared to stock. This is because there is a lower limit below which decreasing cTDP, and therefore frequency, any further actually decreases power efficiency. If you are interested, you can look up the numbers from my measurements of my own AMD R7 4700U here: https://www.linkedin.com/pulse/finding-performance-efficiency-sweet-spot-given-cpu-boris-vogel

And of course this part is a bit trivial: because of the uncore, I/O, etc. there is a baseline tax, which of course is smaller for monolithic architectures than for chiplets.
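The lower limit described above falls out of even a very crude model. The Python sketch below is a toy, not a fit to the 4700U data: it assumes package power P(f) = P_static + k * f^3 (a fixed uncore/I/O baseline plus a dynamic part growing roughly with the cube of frequency) and throughput proportional to f. Energy per unit of work is then P(f) / f = P_static / f + k * f^2, which is smallest at f* = (P_static / (2k))^(1/3). The constants are made up.

```python
def energy_per_work(f_ghz, p_static_w, k):
    """Toy model: energy per unit of work = P(f) / f (arbitrary units).

    P(f) = p_static_w + k * f^3, throughput assumed proportional to f.
    """
    return (p_static_w + k * f_ghz ** 3) / f_ghz


def sweet_spot_ghz(p_static_w, k):
    """Minimizer of p_static/f + k*f^2 (derivative -p/f^2 + 2*k*f set to 0)."""
    return (p_static_w / (2.0 * k)) ** (1.0 / 3.0)


P_STATIC_W = 8.0  # hypothetical uncore/I/O baseline
K = 0.5           # hypothetical dynamic-power coefficient, W/GHz^3

f_best = sweet_spot_ghz(P_STATIC_W, K)
for f in (1.0, 1.5, f_best, 2.5, 3.5):
    print(f"{f:.2f} GHz -> {energy_per_work(f, P_STATIC_W, K):.3f} energy per unit of work")
```

Below f*, the baseline dominates and the longer runtime costs more energy than the lower clock saves; above it, the dynamic term dominates. In this toy model a larger baseline tax pushes f* upward, in line with the chiplet remark above.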
 
Feb 17, 2020
Is it really true so? ...

Yes. I'm not even going to go through your entire reply, because you have some fundamental misconceptions about how a CPU is designed, marketed, and funded. Also, I never mentioned x86 vs ARM, because AMD's and Intel's problems are (mostly) unrelated to the ISA. It's a design philosophy issue, not an ISA issue. Maybe one of them grows a brain, changes their design approach, and it eventually becomes an ISA issue, but it's not there yet.

BorisTheBlade82 said:
As a side note: it is not as trivial as you describe it. Energy efficiency is measured as total consumption for a given workload. So for your 5900HS to be 2x more efficient at F/2, it would need to consume only 1/4 of the average power, as it will need twice the time for the workload.
You are invited to try this out for yourself. Just follow the download link in the OP.

It actually is that trivial. Let's go through the math. Keep in mind that this is a very simple napkin-math example to show how much impact the V/f curve has on efficiency.

By definition, dynamic power is proportional to capacitance * frequency * voltage^2. So initially you might think that changing frequency doesn't affect power efficiency (the energy per clock cycle, capacitance * voltage^2, does not depend on frequency), but you're forgetting that frequency and voltage are also related. In general, you can assume that relationship is more or less linear, though it can break down at very high or very low voltages.

So with that in mind, power is actually closer to proportional to capacitance * frequency^3. So cutting frequency in half theoretically cuts power by 7/8 (to 1/8 of the original), and since the workload then takes twice as long, energy per workload drops to 1/4, which is a 4x increase in efficiency. Obviously, there are a lot of other factors that would influence this (leakage, uncore, shape of the V/f curve), but in general this is close enough. Ex: with Alder Lake, you can run at half clocks for 1/7th of the power.
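The napkin math can be checked in a few lines of Python. Like the post, this assumes power scales with C * f^3 (voltage linear in frequency) and that runtime scales inversely with performance (IPC * f); it ignores leakage and uncore. The function names are illustrative only.

```python
def relative_power(freq_ratio, cap_ratio=1.0):
    """Power relative to baseline under the P ~ C * f^3 approximation."""
    return cap_ratio * freq_ratio ** 3


def relative_energy(freq_ratio, cap_ratio=1.0, ipc_ratio=1.0):
    """Energy per workload relative to baseline: power times runtime,
    where runtime scales inversely with performance (IPC * f)."""
    runtime_ratio = 1.0 / (ipc_ratio * freq_ratio)
    return relative_power(freq_ratio, cap_ratio) * runtime_ratio


print(relative_power(0.5))   # 0.125 -> power cut by 7/8
print(relative_energy(0.5))  # 0.25  -> 4x less energy per workload
```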

Now let's apply this thought process to M1 vs AMD/Intel. Again, this is only napkin math to show how strongly the cubic relationship affects power, and I'm dramatically simplifying. Let's say AMD/Intel's chips run at 5 GHz with a capacitance of 3 units. Now let's say that they try to shift the operating region and reduce the target clock speed to 3 GHz. To make up for the performance, they want to boost IPC by 66%. Here, IPC and capacitance are what's linked. A well-executed architecture should have IPC-to-capacitance scaling of around 1:1. Very well executed changes do better, and poor architectures do worse. But for simplicity's sake, let's assume they achieve a 1:1 ratio (not easy, but also not impossible), bumping up the capacitance from 3 units to 5. If you compare the power draw of the two approaches, the initial frequency-centric approach has a symbolic power draw of 3 * 5^3 = 375, while the IPC-centric one has a symbolic power draw of 5 * 3^3 = 135. Comparing the two, the IPC-centric design approach has a power draw that's 64% lower than the frequency-centric one.
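The two symbolic power draws from this example, worked out in Python. Capacitance and clock are the post's illustrative units; the check that both designs deliver the same performance relies on the post's 1:1 IPC-to-capacitance assumption.

```python
def symbolic_power(capacitance, freq_ghz):
    """Symbolic power draw under the P ~ C * f^3 approximation."""
    return capacitance * freq_ghz ** 3


def symbolic_perf(capacitance, freq_ghz):
    """Performance ~ IPC * f, with IPC assumed to scale 1:1 with capacitance."""
    return capacitance * freq_ghz


freq_centric = symbolic_power(3, 5)  # 3 * 5^3 = 375
ipc_centric = symbolic_power(5, 3)   # 5 * 3^3 = 135
assert symbolic_perf(3, 5) == symbolic_perf(5, 3)  # same performance: 15

reduction = 1 - ipc_centric / freq_centric
print(f"IPC-centric design draws {reduction:.0%} less power")  # 64%
```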

Like I said, I'm dramatically oversimplifying the CPU design process, but it's pretty obvious that, by definition, chasing maximum clock speeds is ridiculously inefficient. Sadly, AMD's and Intel's marketing prefers this approach because uninformed consumers don't understand what IPC is. Frequency is just easier to market. I could write a whole new post just about that.
 

BorisTheBlade82

Senior member
May 1, 2020
@trexfromouterspace
I know, I know.
I am not really sure whether you read my prior post before answering.

But I also wanted to point out that AMD runs most of its SKUs much closer to the sweet spot than Intel does. And compared to Apple, AMD had to make another compromise, as @DisEnchantment already mentioned: their TSMC supply for CPUs is much more constrained than Apple's, so they need to cut corners area-wise.
 

Abwx

Lifer
Apr 2, 2011
BorisTheBlade82 said:
In general I absolutely agree with you, as doubling the frequency needs the current to be squared.

Actually, current increases proportionally with frequency, but to get this current increase you'll have to raise voltage by an amount such that power increases as the square of frequency, albeit only at low enough frequencies. As frequency increases, the frequency/power curve gradually morphs to higher exponents.
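The claim that the exponent grows with frequency can be illustrated with a toy V/f curve. This Python sketch assumes dynamic power P ∝ f * V^2 and a hypothetical linear curve V(f) = v0 + a*f with a voltage floor v0; the values of v0 and a are invented. The local exponent n in P ~ f^n is then 1 + 2*a*f / (v0 + a*f), which climbs from about 1 near the voltage floor toward 3 as the linear term dominates. Real V/f curves steepen near the top, so measured exponents at the highest clocks can be larger still.

```python
import math


def toy_power(f_ghz, v0=0.3, a=0.15):
    """Dynamic power in arbitrary units: P = f * V(f)^2, V(f) = v0 + a*f.
    v0 (volts) and a (volts per GHz) are hypothetical values."""
    v = v0 + a * f_ghz
    return f_ghz * v ** 2


def local_exponent(f_ghz, v0=0.3, a=0.15):
    """n in P ~ f^n near f_ghz, i.e. d ln P / d ln f (analytic form)."""
    return 1.0 + 2.0 * a * f_ghz / (v0 + a * f_ghz)


def numeric_exponent(f_ghz, h=1e-4):
    """Central-difference estimate of d ln P / d ln f, as a cross-check."""
    up, down = f_ghz * (1 + h), f_ghz * (1 - h)
    return (math.log(toy_power(up)) - math.log(toy_power(down))) / (
        math.log(1 + h) - math.log(1 - h)
    )


for f in (1.0, 2.0, 4.0, 5.5):
    print(f"{f:4.1f} GHz: exponent ~ {local_exponent(f):.2f}")
```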
 

BorisTheBlade82

Senior member
May 1, 2020
Abwx said:
Actually, current increases proportionally with frequency, but to get this current increase you'll have to raise voltage [...]

Sorry, I am not a native speaker, so I might have mixed up some fine distinctions between technical terms.
 

BorisTheBlade82

Senior member
May 1, 2020
  • Just added results for an i7-12700H (Alder Lake 6P+8E).
  • If anyone has a Rembrandt, I would appreciate it if you gave it a try, as I have no results for it so far.
  • The new version should already support Raptor Lake. For Zen 4, no modifications are needed: v0.7.5
 

BorisTheBlade82

Senior member
May 1, 2020
So, just as promised, here are the first PES results for a Zen 4 CPU, specifically the R9 7950X (Raphael). Andreas Schilling from https://www.hardwareluxx.de/ was so kind as to provide me with them.
[Screenshot: R9 7950X PES results (A. Schilling), 2022-09-27]

Some remarks
  • As you can see, the MT loop was a bit disturbed by some parallel process. Although I do not consider the influence to be significant, please treat these as preliminary results. Andreas promised to provide me with another, clean run later on.
  • In the MT run we can also see some thermal throttling. This actually helps efficiency, as frequency and voltage go down. It seems that the be quiet! Dark Rock Pro 4 used in the review is not able to keep temperatures below 95°C.
Competitive landscape - ST
PES
[Chart: ST PES ranking]

Consumption
[Chart: ST consumption ranking]

Performance-Consumption-Matrix

[Chart: ST performance-consumption matrix]

  • Energy consumption stayed roughly the same for Raphael compared to its predecessor. Thanks to the much improved performance, it gains roughly 30% in the PES department.
  • Compared to Alder Lake, there is still quite a large gap. The IFoP tax could not be reduced significantly. I have to admit that I had hoped for more improvement in ST (performance) efficiency, after all the talk about power-saving techniques from Rembrandt, 6 nm for the IOD, an optimized IF and whatnot. So this, as well as idle consumption, remain the main weaknesses of AMD's otherwise great chiplet CPUs.
Competitive landscape - MT
PES
[Chart: MT PES ranking]

Consumption
[Chart: MT consumption ranking]

Performance-Consumption-Matrix
[Chart: MT performance-consumption matrix]

  • No need to emphasize that MT is what the 7950X was made for. It is the first result below 30 seconds per CB23 run I have gotten, so I had to adjust the scale of the matrix ;)
  • Although it consumes slightly more than its predecessor, its sheer performance advantage puts it in clean air compared to everything else.
What to look out for
  • I will be getting results for the R9 7950X in ECO mode soon. This way we will be able to compare the R9 7950X@ECO vs. the R9 5950X vs. the 12900K@125 W at roughly ISO-wattage.
  • I am also hoping to get results for the other Zen 4 SKUs soon.
P.S.
And is there really no one out there with a Rembrandt-based device who might give it a try?
 


BorisTheBlade82

Senior member
May 1, 2020
AMD R5 7600X
[Screenshot: R5 7600X PES results, 2022-09-28]


The single-threaded performance efficiency is outstanding. It is in the range of Cezanne and only bested by the Apple M1. It seems that having only one CCD and the reduced boost frequency of 5.3 GHz have a huge impact.
I would not have expected AMD to be able to reach these regions with an IFoP-based design.

AMD R7 7700X
[Screenshot: R7 7700X PES results, 2022-09-28]


Still a very good ST performance efficiency score, slightly above the Intel 12900K.

AMD R9 7900X
[Screenshot: R9 7900X PES results, 2022-09-28]

AMD R9 7950X (cleaner result)
[Screenshot: R9 7950X PES results, 2022-09-28]
This one has a much better ST performance efficiency. MT is more or less the same as before.

AMD R9 7950X @ 65w TDP / 88w PPT
[Screenshot: R9 7950X ECO (88 W PPT) PES results, 2022-09-28]

At 88 W PPT, MT performance efficiency is out of this world. But as we all know, it is a bit of an unfair comparison. I hope to get another result at 105 W TDP from Andreas Schilling. This would give us a good comparison to the R9 5950X at ISO-wattage and to the 12900K@125 W at roughly ISO-wattage.

I have also updated all the matrices and rankings in posts #2 and #3. I hope you enjoy this so far.