Discussion PES | Assessing Power and Performance Efficiency of x86 CPU architectures


JoeRambo

Golden Member
Jun 13, 2013
1,642
1,798
136
5950X, PBO Curve Optimizer, limited to 100 W:
  • ST Efficiency Score = 89.92
  • MT Efficiency Score = 7516.68
1634010097330.png


5950X, Hydra-optimized, limited to 100 W:
  • ST Efficiency Score = 97.04
  • MT Efficiency Score = 8844.92
Great stuff, yet another reminder of how mediocre stock and PBO settings are, and of how far things can be pushed with some know-how and tuning.
 

amrnuke

Golden Member
Apr 24, 2019
1,175
1,767
106
I think it runs for 10 minutes? At least that's what the script is supposed to do from the source.
I should probably not exaggerate! It felt like I waited more than 10 minutes, but I'm not sure.
Will trial again and time it to be sure.
 

amrnuke

Golden Member
Apr 24, 2019
1,175
1,767
106
5600X - PBO +200MHz
ST PES 65.24, Consumption 29,960, Duration 511.64
MT PES 1515.75, Consumption 8,638, Duration 76.37

Hmm...
 

BorisTheBlade82

Senior member
May 1, 2020
477
740
106
40
So in general, what to make of all these numbers...

Let's look at the ST comparison between the Pentium Silver N6000 and the R3 4300G, for example, as this should give us some nice clues about the underlying architecture and process:



The former is the direct predecessor of the ADL little Gracemont core. The process is also 10nm, although closer to the variant ICL was released on than to the current 10ESF / Intel 7.
Compared with the latter, it is clear to me how much better Zen2 and TSMC 7nm would work for a small core. Not only is the latter much, much faster, it is also quite a lot more power efficient. So for Intel to catch up or overtake with Gracemont will be quite a stretch. For me it is very impressive how widely Zen2/3 scales. Although it is the best foundation for what we could call a "little" core, it also works pretty well as a "big" core at the same time.

Too bad we can not directly compare the Apple M1. I guess the results would be devastating for the competition.
 

JoeRambo

Golden Member
Jun 13, 2013
1,642
1,798
136
The former is the direct predecessor of the ADL little Gracemont core. The process is also 10nm, although closer to the variant ICL was released on than to the current 10ESF / Intel 7.
I think calling it the "predecessor of Gracemont" is giving WAY too much credit to the previous Atom core. It was not meant to run workloads like CB at all. It is an underpowered little core, behind the curve in power efficiency, good for minor integer workloads.
It has no machinery to properly run a CB workload, and is meant to run Chromebook-style workloads at ARM SoC speeds.

Tremont:
1634378013040.png

vs this monster that has Skylake level of FP vec resources:

1634378100794.png

3x vec ALU, two symmetric FMUL/FADD capable pipes, backed by dual load / dual store.
 

BorisTheBlade82

Senior member
May 1, 2020
477
740
106
40
@JoeRambo
You are absolutely right. What you are pointing out leans more to the performance side of the equation. The question is what will happen wrt power efficiency. And there I do not expect miracles from 10ESF.
If we compare ICL and TGL, we can see that 10SF helped improve max frequency but did not do much for performance efficiency. Of course, we would need a comparison at iso-frequency to be fair.
 

sallymander

Junior Member
Nov 20, 2020
10
15
51
Too bad we can not directly compare the Apple M1. I guess the results would be devastating for the competition.
I was actually trying to do this on my M1 Mac Mini. I got similar results to Andrei (about 3.8w package power average over the run) and the performance was about the same as the i7 1165G7 for ST. I think that gives an ST efficiency score of 800+ if I calculated it correctly.
 

mmaenpaa

Member
Aug 4, 2009
47
78
91
5600G @ 45 W (set in BIOS; memory at 3600 MHz XMP, other settings at stock), fully passively cooled (the CPU, that is), which begins to show in the multi-threaded runs. Build-in-progress picture below. The PSU also stays passive with these loads (WAF requirement for the living room ;))

1634402595561.png

1634402759205.png
 

BorisTheBlade82

Senior member
May 1, 2020
477
740
106
40
@sallymander
Yes, you are right. So let's call this an estimate:
According to several reviews (AT, for example), the CB23 ST performance is practically identical to the 1165G7. So let's take its 553 seconds, and with the 3.8 W from you and Andrei we have 2101 Ws for the run. So we are looking at a PES of 860.7. That is total carnage for basically anyone else. I need to insert that into the x-y chart because I think this is easier to grasp visually. That is like NFL vs. high-school football.

What is interesting is that the M1 loses a lot of its relative advantage in MT.
If we take the 7833 points from Andrei, we are looking at around 102 s for one run (because, approximately, CB23 score = 800000 / duration in seconds).
So with 15 W we get around 1530 Ws and a PES of "only" around 6400.
That is still second only to the 16c/32t 5950X, but nevertheless a significant relative regression.
Here I can only speculate:
  • Icestorm is not so much an efficiency core for PPW (performance per watt) as for PPA (performance per area).
  • At full load, Icestorm is way beyond its perf/efficiency sweet spot, as it was designed for light load (background tasks, low frequency and voltage).
  • Something entirely different.
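
The arithmetic behind this estimate can be sketched in a few lines of Python. Note the assumptions: the PES formula used here, 10^9 / (energy in Ws × duration in s), is not spelled out in this part of the thread; it is inferred because it reproduces the figures quoted (860.7 for the M1 ST estimate and 65.24 for the 5600X ST run). The function names are made up for illustration.

```python
def energy_ws(avg_power_w: float, duration_s: float) -> float:
    """Energy of a run at a constant average power, in watt-seconds."""
    return avg_power_w * duration_s


def pes(energy: float, duration_s: float) -> float:
    """ASSUMED PES formula, inferred from the numbers quoted in the thread."""
    return 1e9 / (energy * duration_s)


def duration_from_cb23_score(score: float) -> float:
    """Rule of thumb from the post: CB23 score ~= 800000 / duration in s."""
    return 800_000 / score


# M1 ST estimate: 553 s at 3.8 W, energy rounded to 2101 Ws as in the post.
m1_st = pes(2101, 553)

# M1 MT estimate: 7833 points at roughly 15 W package power.
mt_duration = duration_from_cb23_score(7833)
m1_mt = pes(energy_ws(15, mt_duration), mt_duration)

# Cross-check against the 5600X ST run reported earlier in the thread.
r5_st = pes(29_960, 511.64)

print(f"M1 ST PES ~ {m1_st:.1f}")
print(f"M1 MT PES ~ {m1_mt:.0f}")
print(f"5600X ST PES ~ {r5_st:.2f}")
```

Because duration appears both in the energy term and in the denominator, a shorter run at the same average power improves the score quadratically, which is one way to read the gap between the ST and MT estimates.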
 

sallymander

Junior Member
Nov 20, 2020
10
15
51
I'll see if I can do some MT testing too. I wonder if Ryzen has some fixed overheads (RAM?) that are much lower on the M1, making the Ryzen ST look worse.
 

BorisTheBlade82

Senior member
May 1, 2020
477
740
106
40
@BorisTheBlade82

Great work here. Thanks for your effort. How are you computing efficiency? I assume multiplying total power x time?
Well, basically I sample package power as often as possible and then calculate the integral (in joules, i.e. watt-seconds). That is what matters with a fixed workload: how much energy is needed to work through it?
 

Hulk

Diamond Member
Oct 9, 1999
3,831
1,514
136
Well, basically I sample package power as often as possible and then calculate the integral (in joules, i.e. watt-seconds). That is what matters with a fixed workload: how much energy is needed to work through it?
Got it. Calculating the area under the power vs. time function. Smart. You say calculating the integral. Are you finding a function for the curve and then integrating, or using the trapezoidal rule with a definite number of values for "n"? Just wondering because you mentioned the integral. I'm not sure whether you meant calculating the area under the curve numerically or a closed-form solution of the function between two bounds, which is why I'm asking. Just curious.

You could also take the average power during the run and multiply it by the time of the run and get the same result, right?
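
For reference, here is a minimal sketch of the trapezoidal rule mentioned in the question, next to a simple left-rectangle sum. The sample values and function names are invented for illustration; nothing here is taken from the actual PES script.

```python
def trapezoid_energy(timestamps_s, power_w):
    """Trapezoidal rule: average adjacent samples over each interval."""
    return sum(
        (power_w[i] + power_w[i + 1]) / 2 * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(timestamps_s) - 1)
    )


def left_rectangle_energy(timestamps_s, power_w):
    """Left-rectangle sum: hold each sample until the next timestamp."""
    return sum(
        power_w[i] * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(timestamps_s) - 1)
    )


# Hypothetical package power samples (W) taken once per second.
timestamps = [0.0, 1.0, 2.0, 3.0]
power = [40.0, 60.0, 50.0, 50.0]

trap = trapezoid_energy(timestamps, power)
rect = left_rectangle_energy(timestamps, power)
print(trap, rect)  # energy in Ws from each method
```

With a high sampling rate the two methods converge, so the choice matters mostly when samples are sparse relative to how fast power changes.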
 

BorisTheBlade82

Senior member
May 1, 2020
477
740
106
40
To be precise, this is of course a discretized integral. I gather the package power samples and multiply each one by the time elapsed between two samples. Average power only works for very uniform data. With PL1, PL2 and so on, it is simply not accurate enough.
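
A minimal sketch of such a discretized integral, assuming timestamped package power samples are already available. The function name and the sample values are hypothetical, and the real script may weight the samples differently.

```python
from typing import Sequence


def energy_from_samples(timestamps_s: Sequence[float],
                        power_w: Sequence[float]) -> float:
    """Sum power * elapsed time, holding each sample until the next one."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("need exactly one power value per timestamp")
    energy = 0.0
    for i in range(len(timestamps_s) - 1):
        dt = timestamps_s[i + 1] - timestamps_s[i]  # seconds between samples
        energy += power_w[i] * dt                   # watt-seconds (joules)
    return energy


# Invented data: a short boost phase (PL2-like) followed by a lower
# sustained phase (PL1-like), sampled at uneven intervals.
timestamps = [0.0, 0.5, 1.0, 2.0, 4.0]
power = [60.0, 60.0, 35.0, 35.0, 35.0]

total_ws = energy_from_samples(timestamps, power)
print(total_ws)
```

The final sample only closes the last interval; it contributes no energy of its own in this left-rectangle variant.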
 

Hulk

Diamond Member
Oct 9, 1999
3,831
1,514
136
To be precise, this is of course a discretized integral. I gather the package power samples and multiply each one by the time elapsed between two samples. Average power only works for very uniform data. With PL1, PL2 and so on, it is simply not accurate enough.
Understood. By average power I meant adding the power values sampled and then dividing by the total number of samples. Of course this is ultimately the same thing you are doing though.
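
A small sketch of when the two approaches agree. With evenly spaced samples, the plain average multiplied by the run time matches the discretized sum; with unevenly spaced samples it does not. The data and function names are invented for illustration.

```python
import math


def left_rectangle_energy(timestamps_s, power_w):
    """Discretized integral: each sample is held until the next timestamp."""
    return sum(
        power_w[i] * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(timestamps_s) - 1)
    )


def mean_power_times_duration(timestamps_s, power_w):
    """Plain average of the samples times total run time.

    The last sample is excluded because no interval follows it.
    """
    held = power_w[:-1]
    duration = timestamps_s[-1] - timestamps_s[0]
    return sum(held) / len(held) * duration


power = [60.0, 60.0, 35.0, 35.0, 35.0]
even_t = [0.0, 1.0, 2.0, 3.0, 4.0]     # uniform sampling interval
uneven_t = [0.0, 0.5, 1.0, 2.0, 4.0]   # sampling interval drifts

even_match = math.isclose(
    left_rectangle_energy(even_t, power),
    mean_power_times_duration(even_t, power),
)
uneven_match = math.isclose(
    left_rectangle_energy(uneven_t, power),
    mean_power_times_duration(uneven_t, power),
)
print(even_match, uneven_match)  # True False
```

So the equivalence holds only as long as the sampler delivers values at a constant interval, which is not guaranteed on a heavily loaded system.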

Thanks for responding. I'm curious as to the sampling rate?

Finally, as you wrote, with all of the opportunistic frequency manipulation of modern CPUs it is also very difficult to nail down the frequency during a benchmark run. Is the data available to sample the frequency of each core as well? The total of all of these samples, divided by the total number of samples, would provide an average clock speed during the benchmark run. That would be very interesting, as it would allow estimating IPC, or "throughput," for each core architecture and give more insight into the results.

HWiNFO provides a data point like this and calls it "average effective clock."
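
The averaging proposed here could look roughly like the sketch below. It is a plain arithmetic mean of sampled clocks per core, which is not necessarily how HWiNFO derives its value, and the points-per-GHz figure is only a rough stand-in for throughput per clock. All names and numbers are made up.

```python
from statistics import fmean


def average_clock_mhz(samples_by_core):
    """Arithmetic mean of the sampled clock (MHz) for each core."""
    return {core: fmean(samples) for core, samples in samples_by_core.items()}


def points_per_ghz(score, avg_clock_mhz):
    """Benchmark points per GHz of average clock, a crude throughput proxy."""
    return score / (avg_clock_mhz / 1000.0)


# Hypothetical per-core clock samples collected during one run.
clock_samples = {
    "core0": [4650.0, 4600.0, 4550.0],
    "core1": [4400.0, 4450.0, 4350.0],
}

avg = average_clock_mhz(clock_samples)
st_throughput = points_per_ghz(1500.0, avg["core0"])  # score is invented
print(avg, round(st_throughput, 1))
```

As with power, a simple mean assumes evenly spaced samples; with uneven sampling a time-weighted mean would be the safer choice.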

I'm sorry if I'm being that guy who has to ask someone who is doing something out of the goodness of their heart, donating time and intellect to the community to do additional work!!! Your effort is extremely appreciated!
 
