Sudden jump in power consumption of Haswell

futureishere

Junior Member
Aug 6, 2015
15
0
0
I am monitoring the power consumption of a Haswell processor and have noticed some sudden changes in power consumption for no apparent reason. For demonstration, as shown in this figure (left y-axis in watts, right y-axis in degrees Celsius), I fixed the C-state of all cores to C0 at a fixed frequency and then raised the temperature using a hair dryer. I measure the power consumption on the CPU ATX power rail and using RAPL. As one can see, the power rises linearly with the temperature until about 16.5 W and then jumps suddenly to ~18.5 W. This jump is visible in both the ATX and the RAPL values. Similarly, when the processor is allowed to cool down, the power decreases steadily until ~17 W and then drops suddenly to ~15 W. There is no change in CPU/cache activity during this, and the results are consistent across runs. Similar jumps are seen at around 23-24 W and 35-36 W. The only explanation I can come up with is a phase change in Haswell's internal voltage regulator. Can someone confirm whether this is a fair assumption?

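[Figure: power (W, left y-axis) and package temperature (°C, right y-axis) vs. time (s); core voltage fixed at 0.85 V, 800 MHz]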
 

know of fence

Senior member
May 28, 2009
555
2
71
Vcore voltages are discrete states that have to adjust to temperature; apparently they aren't all that fine-grained. Or else the temperature rise/fall is very quick, assuming those are seconds on the X-axis.
 

futureishere

know of fence said:
Vcore voltages are discrete states that have to adjust to temperature; apparently they aren't all that fine-grained. Or else the temperature rise/fall is very quick, assuming those are seconds on the X-axis.

I forgot to mention that Vcore has been fixed in the BIOS. I am monitoring Vcore as well, and it remains flat as expected, so that can't be the reason. And yes, those are seconds on the X-axis. The temperature rise seen in the graph is quite gradual and does not explain the sudden jump. The temperature shown in the figure is the package temperature, but I am also monitoring the core temperatures (not shown in the figure), and they track the package temperature.
 

Abwx

Lifer
Apr 2, 2011
11,879
4,864
136
As temperature increases, transistor characteristics degrade: they conduct less and hence become slower.

The solution is to increase the voltage to keep the circuit working within stable voltage margins. Apparently this is done in steps: once clock jitter reaches a given value, the voltage is increased, which brings timing coherency back to safe values.
 

know of fence

futureishere said:
I forgot to mention that Vcore has been fixed in the BIOS. I am monitoring Vcore as well, and it remains flat as expected.

Honestly, I have only a vague understanding of the various voltages (VID, Voffset, droop, core/uncore, ...), but I'm pretty sure that even with frequency scaling disabled, you don't get to set those parameters. Rather, they are set when the CPU is binned and adjusted on the fly depending on temperature. Vcore is just an upper limit, or rather the overall highest state across all cores.

Good luck finding software that shows individual core states. There are S-states (system), C-states (core) and P-states (voltage-frequency); the latter aren't entirely accessible even if you disable power-saving options. Doesn't the CPU still have to pick and adjust voltages for every core depending on temperature and frequency?

It's obvious that the actual measured voltage across a circuit will drop when the circuit is heated, due to increased resistance. What you see is on-the-fly power management stepping in to compensate for that drop, which I think is what Abwx said.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
You need to put a meter right on the core voltage rail. You should see it stepping up. It is surprising to see that it has so few discrete steps over such a large temperature range.
 

Abwx

sm625 said:
You need to put a meter right on the core voltage rail. You should see it stepping up. It is surprising to see that it has so few discrete steps over such a large temperature range.

That's because power management doesn't use temperature as the metric for increasing the voltage. Temperature won't tell you whether the transistors switch fast enough, since transistor characteristics vary from one CPU to another; some will switch faster than others at the same temperature.

So the metric used is clock jitter measured at strategic points, which is a reliable measure of the CPU's ability to maintain coherency between its different parts.

https://en.wikipedia.org/wiki/Jitter
 

futureishere

know of fence said:
Honestly, I have only a vague understanding of the various voltages (VID, Voffset, droop, core/uncore, ...)
That makes two of us. :)

know of fence said:
But I'm pretty sure that even with frequency scaling disabled, you don't get to set those parameters. Rather, they are set when the CPU is binned and adjusted on the fly depending on temperature. Vcore is just an upper limit, or rather the overall highest state across all cores.

The core voltage and the L3 cache voltage can be fixed in the BIOS, which I have done. Vcore can be monitored through a CPU MSR, which I am doing; it shows the same value that was set in the BIOS and remains constant during the entire experiment. There are more voltage planes inside the chip, e.g. for the graphics unit and the eDRAM, but both of these units are power-gated when not in use, as in my case. The I/O also has its own voltage plane, but I would be surprised if a change in VccIO were causing this jump.

know of fence said:
Good luck finding software that shows individual core states. There are S-states (system), C-states (core) and P-states (voltage-frequency); the latter aren't entirely accessible even if you disable power-saving options. Doesn't the CPU still have to pick and adjust voltages for every core depending on temperature and frequency?

I have fixed the C-state to C0 for each core, and from what I understand, when the cores are in C0, the package can only be in its most active state. P-states don't come into it if SpeedStep has been disabled and the core frequency and voltage are fixed in the BIOS.

know of fence said:
It's obvious that the actual measured voltage across a circuit will drop when the circuit is heated, due to increased resistance. What you see is on-the-fly power management stepping in to compensate for that drop, which I think is what Abwx said.

I am not sure I understand you. Why would the measured voltage across a circuit drop with the increased resistance?
 

futureishere

Abwx said:
That's because power management doesn't use temperature as the metric for increasing the voltage. Temperature won't tell you whether the transistors switch fast enough, since transistor characteristics vary from one CPU to another; some will switch faster than others at the same temperature.

So the metric used is clock jitter measured at strategic points, which is a reliable measure of the CPU's ability to maintain coherency between its different parts.

https://en.wikipedia.org/wiki/Jitter

If the voltage is indeed being increased with increasing temperature, it is strange that MSR_PERF_STATUS doesn't show that. Also, this data was generated with the core voltage fixed at 0.85 V and the core frequency fixed at 800 MHz. If I decrease the core voltage in the BIOS to 0.7 V (keeping the core frequency at 800 MHz) and repeat the experiment, I don't see any jump. Similarly, if I increase the core voltage to 1.0 V, I again don't see any of these weird jumps.
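For reference, the voltage that MSR_PERF_STATUS reports can be read on Linux through the msr driver. This is a minimal sketch, assuming the register is IA32_PERF_STATUS at address 0x198 with the core voltage in bits 47:32 in units of 1/8192 V, as it is commonly decoded on Sandy Bridge and later parts (check the Intel SDM model-specific tables for your exact CPU). The helper names are hypothetical, reading /dev/cpu/N/msr needs root and the msr module, and the value is the processor's own report rather than a meter on the rail.

```python
import os
import struct

# Assumed register: IA32_PERF_STATUS / MSR_PERF_STATUS.
MSR_PERF_STATUS = 0x198


def decode_core_voltage(raw: int) -> float:
    """Core voltage in volts from bits 47:32, assumed to be in units of 2**-13 V."""
    return ((raw >> 32) & 0xFFFF) / 8192.0


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR via the Linux msr driver (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)  # the file offset selects the MSR address
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    cpu = 0
    while os.path.exists(f"/dev/cpu/{cpu}/msr"):
        try:
            volts = decode_core_voltage(read_msr(cpu, MSR_PERF_STATUS))
            print(f"cpu{cpu}: {volts:.4f} V")
        except OSError as exc:  # typically a permission error without root
            print(f"cpu{cpu}: cannot read MSR ({exc})")
            break
        cpu += 1
```

Logging this once per second next to the power samples would show whether the reported voltage moves at the moment of a jump.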
 

futureishere

Here is the data with the core voltage fixed at 0.7 V. As you can see, the power values are very smooth, with none of the jumps seen at 0.85 V.

[Figure: power (W) and temperature (°C) vs. time (s) with the core voltage fixed at 0.7 V]
 

futureishere

And here is the data with the core voltage fixed at 1.0 V. Actually, there is a jump here too, but it occurs at a much lower temperature.
[Figure: power (W) and temperature (°C) vs. time (s) with the core voltage fixed at 1.0 V]



As I mentioned in my original post, these jumps are seen only when the power consumption crosses over from ~15-16 W to a higher value. This tells me that it doesn't really have to do with temperature but with the load on the voltage regulator.
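One way to test the load-versus-temperature idea is to extract the power level just before each jump in every run and check whether the jumps line up in watts rather than in degrees. This is a minimal sketch, assuming the samples are the 1-second averaged power values (ATX or RAPL) as a list of floats; the function name, threshold and trace are illustrative, not the thread's data.

```python
from typing import List, Tuple


def find_jumps(power_w: List[float], min_step_w: float = 1.0) -> List[Tuple[int, float, float]]:
    """Return (index, power_before, power_after) for every sample-to-sample
    change of at least min_step_w watts in a 1 Hz averaged power trace."""
    jumps = []
    for i in range(1, len(power_w)):
        if abs(power_w[i] - power_w[i - 1]) >= min_step_w:
            jumps.append((i, power_w[i - 1], power_w[i]))
    return jumps


if __name__ == "__main__":
    # Illustrative trace only: a slow rise, then a ~2 W step.
    trace = [15.0, 15.5, 16.0, 16.5, 18.5, 18.8, 19.0]
    print(find_jumps(trace))  # [(4, 16.5, 18.5)]
```

Because each sample is a 1-second average, a step that lands mid-interval is split across two samples, so the threshold may need to be lower than the full step size.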
 

Abwx

futureishere said:
If the voltage is indeed being increased with increasing temperature, it is strange that MSR_PERF_STATUS doesn't show that. Also, this data was generated with the core voltage fixed at 0.85 V and the core frequency fixed at 800 MHz. If I decrease the core voltage in the BIOS to 0.7 V (keeping the core frequency at 800 MHz) and repeat the experiment, I don't see any jump. Similarly, if I increase the core voltage to 1.0 V, I again don't see any of these weird jumps.


Voltage is not increased as temperature increases but as clock jitter increases, which can indeed happen if the temperature rises to high levels.

At 0.7 V the CPU will dissipate 24% less power from the start, which reduces its susceptibility to temperature variations.

Of course I'm not totally sure of this; it's the only explanation I have so far. By the way, what does RAPL stand for?
 

futureishere

Abwx said:
Voltage is not increased as temperature increases but as clock jitter increases, which can indeed happen if the temperature rises to high levels.

At 0.7 V the CPU will dissipate 24% less power from the start, which reduces its susceptibility to temperature variations.

Well, then this still doesn't explain the trend at 1.0V!

Abwx said:
Of course I'm not totally sure of this; it's the only explanation I have so far. By the way, what does RAPL stand for?

RAPL stands for "Running Average Power Limit"; it is the energy model inside the Haswell chip that reports total chip energy usage to the user.
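On Linux, the package RAPL energy counter is exposed through the powercap framework. This is a minimal sketch of computing average package power over one second from energy_uj, assuming the intel_rapl driver is loaded and the package domain is intel-rapl:0 (on recent kernels energy_uj is readable only by root). It is one way to get RAPL numbers, not necessarily the tool used in the thread, and the function names are hypothetical.

```python
import time
from pathlib import Path

# Assumed path: package-0 RAPL domain exposed by the Linux intel_rapl powercap driver.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def average_power_w(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power in watts between two energy_uj readings taken dt_s apart."""
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:
        # The counter wrapped around max_energy_range_uj between the two reads.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_s


def sample_package_power(interval_s: float = 1.0) -> float:
    """Read the package energy counter twice, interval_s apart, and return watts."""
    max_range = int((RAPL_DIR / "max_energy_range_uj").read_text())
    e0, t0 = int((RAPL_DIR / "energy_uj").read_text()), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = int((RAPL_DIR / "energy_uj").read_text()), time.monotonic()
    return average_power_w(e0, e1, t1 - t0, max_range)


if __name__ == "__main__":
    try:
        print(f"package power: {sample_package_power():.2f} W")
    except OSError as exc:  # driver missing or insufficient permissions
        print(f"RAPL not readable: {exc}")
```

Like the values in the plots, this yields a one-second average, so it cannot resolve anything faster than the sampling interval.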
 

Abwx

futureishere said:
Well, then this still doesn't explain the trend at 1.0 V!

Given the results in your last picture, you could be right about this:

futureishere said:
And here is the data with the core voltage fixed at 1.0 V. Actually, there is a jump here too, but it occurs at a much lower temperature.
As I mentioned in my original post, these jumps are seen only when the power consumption crosses over from ~15-16 W to a higher value. This tells me that it doesn't really have to do with temperature but with the load on the voltage regulator.

It's possible that there are two or more regulation circuits optimised for different power levels.

At 15 W a higher-power regulation circuit is switched on; a higher voltage could be required because the expected higher currents inherently create larger voltage losses in the conductive tracks and VRMs.

The fact that there's a discontinuity points to something being abruptly switched on.
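The idea of a second regulation circuit being switched on at a power threshold can be sketched as a controller with hysteresis: it switches up at one load and back down only at a lower one, so it doesn't chatter around the threshold. Such a gap would be one possible reason the up and down transitions in the first post happen at different points, though nothing in the thread establishes it. The class, the function and both thresholds are hypothetical (16.5 W and 15 W are chosen only to echo the first post).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TwoModeRegulator:
    """Hypothetical regulator with a low-power and a high-power mode.

    Switches to the high-power mode when the load reaches up_w and back to
    the low-power mode only when the load falls to down_w (down_w < up_w).
    """
    up_w: float
    down_w: float
    high_mode: bool = False

    def update(self, load_w: float) -> bool:
        if not self.high_mode and load_w >= self.up_w:
            self.high_mode = True
        elif self.high_mode and load_w <= self.down_w:
            self.high_mode = False
        return self.high_mode


def simulate(loads_w: List[float], up_w: float, down_w: float) -> List[bool]:
    """Feed a sequence of loads through a fresh regulator and record its mode."""
    reg = TwoModeRegulator(up_w=up_w, down_w=down_w)
    return [reg.update(p) for p in loads_w]


if __name__ == "__main__":
    # Load rises through the upper threshold, then falls back through both.
    loads = [15.0, 16.0, 16.5, 17.0, 16.0, 15.5, 15.0, 14.5]
    print(simulate(loads, up_w=16.5, down_w=15.0))
    # [False, False, True, True, True, True, False, False]
```

Between 15 W and 16.5 W the mode depends on history, which is the defining property of hysteresis.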
 

know of fence

So consumption goes up anywhere from 50% to almost doubling, and all you did was hold a hair dryer to the CPU. This is at least a great demonstration of why good active cooling is important...
I'd bet that the increase we see is due to the CPU actively compensating rather than power delivery somehow becoming inefficient.
Also, none of those plotted readouts are actual point-in-time measurements, not even the temperature; they are all averages, which at least explains why they are smooth to begin with. Measured with an oscilloscope, power consumption looks more like this:
[Image: oscilloscope power-consumption trace, i5-6600K torture test]
 

futureishere

know of fence said:
So consumption goes up anywhere from 50% to almost doubling, and all you did was hold a hair dryer to the CPU. This is at least a great demonstration of why good active cooling is important...
I'd bet that the increase we see is due to the CPU actively compensating rather than power delivery somehow becoming inefficient.
Also, none of those plotted readouts are actual point-in-time measurements, not even the temperature; they are all averages, which at least explains why they are smooth to begin with. Measured with an oscilloscope, power consumption looks more like this:
[Image: oscilloscope power-consumption trace, i5-6600K torture test]

Yes, the ATX and RAPL power values are averaged over one second, but the temperature values are instantaneous.
 

know of fence

To avoid embarrassing myself further, I will refrain from more guesswork. You've given very little context, and charging ahead with explanations is obviously foolish. What benchmark or stress test were you running, or was the CPU idle during the test? Also, how many cores are there? Core count multiplies any power consumption effects.

Reading this explanation of P-states, I realize that we can assume the voltage is the same across all cores, although sometimes, as in Turbo modes, some cores have a corresponding frequency of zero.
 

futureishere

know of fence said:
To avoid embarrassing myself further, I will refrain from more guesswork. You've given very little context, and charging ahead with explanations is obviously foolish. What benchmark or stress test were you running, or was the CPU idle during the test? Also, how many cores are there? Core count multiplies any power consumption effects.

I am not running any benchmark, but I have restricted the CPU to the C0 state, which means the CPU is in polling mode, not idle. It's a 4-core CPU.
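For anyone trying to reproduce this setup on Linux: one common way to keep cores out of deeper C-states is a PM QoS request through /dev/cpu_dma_latency (the idle=poll boot parameter is another). The thread doesn't say which method was used, so this is only a sketch of one option; the helper names are hypothetical, it needs root, and the request lasts only while the file descriptor stays open.

```python
import os
import struct
import time


def latency_request(max_latency_us: int) -> bytes:
    """Encode a PM QoS CPU latency request as a native 32-bit signed integer."""
    return struct.pack("=i", max_latency_us)


def hold_c0(duration_s: float) -> None:
    """Request 0 us wakeup latency for duration_s seconds.

    A 0 us limit in practice leaves cpuidle only the polling state, so cores
    stay in C0. The kernel drops the request as soon as the descriptor is
    closed, so the measurement has to run while this function is sleeping.
    """
    fd = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
    try:
        os.write(fd, latency_request(0))
        time.sleep(duration_s)
    finally:
        os.close(fd)
```

Usage would be calling hold_c0(600) as root in one process while the power logger runs in another.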

know of fence said:
Reading this explanation of P-states, I realize that we can assume the voltage is the same across all cores, although sometimes, as in Turbo modes, some cores have a corresponding frequency of zero.

Well, SpeedStep has been disabled in the BIOS, so P-states don't really matter anyway. Turbo mode is disabled too, so all cores run at the same frequency (800 MHz).
 

Abwx

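futureishere said:
Here is the data with the core voltage fixed at 0.7 V. As you can see, the power values are very smooth, with none of the jumps seen at 0.85 V.

[Figure: power (W) and temperature (°C) vs. time (s) with the core voltage fixed at 0.7 V]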

In this picture the power stays below 16 W, so it's possible the threshold is never reached. Do a test at 0.7175 V, 0.735 V and then 0.77 V; this will increase power by 5%, 10% and 21% respectively.

This will let you approach slowly and bracket the suspicious value...
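These percentages follow from assuming dynamic power scales with the square of the voltage at a fixed frequency. A quick check of that arithmetic (the helper name is hypothetical, and leakage, which rises faster with voltage and temperature, is ignored):

```python
def dynamic_power_ratio(v_new: float, v_ref: float) -> float:
    """Dynamic power at v_new relative to v_ref, assuming P_dyn ~ V^2 at fixed f."""
    return (v_new / v_ref) ** 2


if __name__ == "__main__":
    for v in (0.7175, 0.735, 0.77):
        change = 100 * (dynamic_power_ratio(v, 0.7) - 1)
        print(f"{v:.4f} V vs 0.7 V: {change:+.1f}% dynamic power")
    # The same model applied to the 0.85 V -> 0.7 V step.
    print(f"0.7 V vs 0.85 V: {100 * (dynamic_power_ratio(0.7, 0.85) - 1):+.1f}%")
```

The three steps come out at roughly +5%, +10% and +21%, matching the suggestion. Under the same V² model, going from 0.85 V to 0.7 V gives about 32% less dynamic power, so the 24% figure quoted earlier in the thread does not follow from this model.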
 

Dufus

Senior member
Sep 20, 2010
675
119
101
You need to monitor VCCIN as well. RAPL measurements are estimations only. What are you using to measure Vcore?
 

know of fence

At least in my simplistic understanding, it all makes sense now.
Because the CPU isn't doing much and is set to a low clock and voltage, power consumption is low to begin with. But it spikes considerably because you have 4 cores running (< 2 W per core) and because you essentially disabled power gating, thus increasing leakage. It's not a typical use case.
The spike is mostly attributable to static leakage, as we see from the explanations by IdontCare, who did a similar experiment: there is actually a temperature-dependent exponent in the equation.
[Image: IdontCare's symbolic equation for total power as a function of Vcc, temperature and clock (GHz)]

The jump, on the other hand, has to be some kind of voltage change that you didn't record or notice. How it is triggered, and what kind of +V(unknown) we are dealing with, is the actual question you raised. Also, wouldn't "a phase change" inside a FIVR also result in a change in voltage?
I still think there are several VIDs for any given clock that a CPU can switch between; how exactly this is governed by Voffset and whatnot, I would like to know as well.
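For reference, a common first-order textbook model (not a transcription of IdontCare's equation, which isn't reproduced here) separates dynamic switching power from static leakage, with leakage often fitted as exponential in temperature. Here α is the activity factor, C the switched capacitance, f the clock frequency, and I_0, T_0, T_1 are fitting constants for a given chip at a given voltage (assumptions, not values from the thread):

```latex
P_{\text{total}} \approx \underbrace{\alpha\, C\, V^{2} f}_{\text{dynamic}}
  + \underbrace{V\, I_{\text{leak}}(T)}_{\text{static}},
\qquad
I_{\text{leak}}(T) \approx I_{0}\, e^{(T - T_{0})/T_{1}}
```

With V and f held fixed, as in these runs, the dynamic term is constant and only the leakage term changes with temperature. That fits the smooth rise in the plots, but a continuous exponential cannot by itself produce a step, which is why the jump points to something else changing.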
 

futureishere

I have another plausible theory for this jump. Haswell's internal voltage regulator is divided into multiple power cells, each of which acts as a mini voltage regulator. The power cells are connected in parallel, and each can supply a certain amount of current to the cores. So if each power cell is rated for a maximum load current of, say, 10 A, and the current consumption of all cores is below 10 A, then only one power cell is active. When the current consumption nears the 10 A mark, the next power cell is activated in anticipation of increasing current demand, and at the activation point both power cells supply 5 A each.

By itself this shouldn't make any difference, but the efficiency curve of these power cells is not flat. At 10 A the efficiency may be 85%, while at 5 A it may be 75%, which means that the power wasted inside the power cells has now increased. If the core is running at 1 V and 10 A, it consumes 10 W. At 85% efficiency, ~1.8 W is being wasted inside the power cell. After the next power cell is activated, at 75% efficiency, the power cells waste ~3.3 W, even though the power actually consumed by the core remains the same. This may be why I am seeing a jump in power consumption.

This is just speculation on my part, but I can't think of any other reason.

Thoughts?
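The arithmetic in the power-cell theory can be checked directly. This is a minimal sketch using the post's illustrative numbers (1 V, 10 A, 85% efficiency with one cell, 75% with two cells sharing the load); the efficiencies are hypothetical, not Haswell FIVR data, and the function name is made up for the example.

```python
def regulator_loss_w(p_out_w: float, efficiency: float) -> float:
    """Power dissipated in the regulator: P_in - P_out = P_out * (1/eff - 1)."""
    return p_out_w * (1.0 / efficiency - 1.0)


if __name__ == "__main__":
    p_core_w = 1.0 * 10.0                          # 1 V x 10 A delivered to the cores
    one_cell = regulator_loss_w(p_core_w, 0.85)    # one cell near its 10 A rating
    two_cells = regulator_loss_w(p_core_w, 0.75)   # two cells at 5 A each
    print(f"loss with one cell:  {one_cell:.2f} W")
    print(f"loss with two cells: {two_cells:.2f} W")
    print(f"step in input power: {two_cells - one_cell:.2f} W")
```

With these numbers the input power steps up by about 1.6 W while the core power stays at 10 W, the same kind of discontinuity seen in the plots; whether real FIVR efficiency curves behave this way isn't established in the thread.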
 