Intel Broadwell Thread


TechFan1

Member
Sep 7, 2013
97
3
71
DirectX 12 mainly helps with CPU efficiency. It may help GPU efficiency a little, but I think only if the fps is locked.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
DirectX 12 mainly helps with CPU efficiency. It may help GPU efficiency a little, but I think only if the fps is locked.

At the same frame rate as DirectX 11, DirectX 12 halves power consumption by reducing GPU and CPU load. If you can cut CPU load by a factor of ~2-4, the difference in a 4.5W chassis will not be negligible. You're right that it doesn't impact the GPU as meaningfully, but combined with the CPU it does impact gaming efficiency.

[Image: sp3_dx11_dx12_power.jpg]
 

bullzz

Senior member
Jul 12, 2013
405
23
81
@antihelten - although I am not sure about the "50% improvement", there are architectural improvements for reducing power consumption. Intel is doing duty cycle control for GPU clocks (mentioned in Ryan's article), which helps reduce power during idle. They are also making the GPU wider (3 sub-slices vs 2) while increasing EUs by only 20%, like they did for HD 5000 vs HD 4400. That means more opportunities for power gating when not at full load.
 

Khato

Golden Member
Jul 15, 2001
1,279
361
136
At the same frame rate as DirectX 11, DirectX 12 halves power consumption by reducing GPU and CPU load. If you can cut CPU load by a factor of ~2-4, the difference in a 4.5W chassis will not be negligible. You're right that it doesn't impact the GPU as meaningfully, but combined with the CPU it does impact gaming efficiency.

Though keep in mind that the scenario Intel presented is likely emphasizing the effect. Not saying that it won't make a nice difference in efficiency, just that I wouldn't expect that much of a gain across the board.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
@antihelten - although I am not sure about the "50% improvement", there are architectural improvements for reducing power consumption. Intel is doing duty cycle control for GPU clocks (mentioned in Ryan's article), which helps reduce power during idle. They are also making the GPU wider (3 sub-slices vs 2) while increasing EUs by only 20%, like they did for HD 5000 vs HD 4400. That means more opportunities for power gating when not at full load.

While DCC should certainly lower power consumption, it will only do so when the GPU is idling, and as such won't really affect efficiency.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Though keep in mind that the scenario Intel presented is likely emphasizing the effect. Not saying that it won't make a nice difference in efficiency, just that I wouldn't expect that much of a gain across the board.

Good point, but Mantle makes me quite optimistic.
 

NTMBK

Lifer
Nov 14, 2011
10,444
5,812
136
Yeah, DX12 will be a great help for thermally constrained devices. Balancing TDP between GPU and CPU is one of the most important aspects of a tablet SoC- just look at how terribly Temash performed until Mullins brought in good thermal management.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
While DCC should certainly lower power consumption, it will only do so when the GPU is idling, and as such won't really affect efficiency.

That is an efficiency improvement, although not at load. As far as load efficiency goes, the re-balancing of resources should help substantially. It's difficult to assess at this point whether Broadwell's improvement over Haswell is bigger than Ivy Bridge's was over Sandy Bridge from an end-user standpoint, but the fact that it is indeed an improvement over Gen 7.5 is very clear.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
That is an efficiency improvement, although not at load. As far as load efficiency goes, the re-balancing of resources should help substantially. It's difficult to assess at this point whether Broadwell's improvement over Haswell is bigger than Ivy Bridge's was over Sandy Bridge from an end-user standpoint, but the fact that it is indeed an improvement over Gen 7.5 is very clear.

No it's not. Efficiency is performance/watt, and when idling you're obviously not doing any work and thus your performance is zero, and by extension your efficiency is also zero. It doesn't matter how low your power usage goes, your efficiency is still zero, since zero divided by x is always zero, no matter how low x goes.

DCC helps with idle power usage, not efficiency.
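
Spelled out as a formula (my own formulation, not anything from Intel's slides):

```latex
\text{efficiency} = \frac{\text{performance}}{\text{power}},
\qquad
\text{performance}_{\text{idle}} = 0
\;\Longrightarrow\;
\text{efficiency}_{\text{idle}} = \frac{0}{P} = 0 \quad \text{for any } P > 0.
```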
 
Aug 11, 2008
10,451
642
126
Well, you are technically correct I suppose, depending on what you mean by idle. One could argue that there is in fact some work being done at "idle" if you mean having the Windows desktop open. In any case though, using less power at idle leaves more power available when you want to do work, so that is the same as being more "efficient".
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
No it's not. Efficiency is performance/watt, and when idling you're obviously not doing any work and thus your performance is zero, and by extension your efficiency is also zero. It doesn't matter how low your power usage goes, your efficiency is still zero, since zero divided by x is always zero, no matter how low x goes.

DCC helps with idle power usage, not efficiency.

Idle is not zero.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Well, you are technically correct I suppose, depending on what you mean by idle. One could argue that there is in fact some work being done at "idle" if you mean having the Windows desktop open. In any case though, using less power at idle leaves more power available when you want to do work, so that is the same as being more "efficient".

And technically correct is the best kind of correct ;). Anyway with DCC the GPU is shut off completely for up to 87.5% of the time, with just the display controller running. So for that 87.5% of the time, the GPU is certainly not doing any useful work.

As for your second statement, I really don't follow the logic here. Having more power available doesn't affect efficiency; if that were the case, higher-TDP CPUs would automatically be more efficient than low-TDP ones, which obviously isn't the case.

Idle is not zero.

As mentioned above, DCC involves shutting off the GPU, so yes, in this case idle is for all intents and purposes zero (and remember, this whole discussion was originally in relation to gaming).
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
No it's not. Efficiency is performance/watt, and when idling you're obviously not doing any work and thus your performance is zero, and by extension your efficiency is also zero. It doesn't matter how low your power usage goes, your efficiency is still zero, since zero divided by x is always zero, no matter how low x goes.

DCC helps with idle power usage, not efficiency.

Wrong.

[Image: GPUDutyCycle.png]


Duty Cycle Control improves efficiency at low frequencies.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
And technically correct is the best kind of correct ;).

But you are not correct in any useful way.

Anyway with DCC the GPU is shut off completely for up to 87.5% of the time, with just the display controller running. So for that 87.5% of the time, the GPU is certainly not doing any useful work.

Just because a feature works by shutting off the GPU doesn't mean it can't improve efficiency (even though the GPU can't do anything while it's off).

But DCC does improve efficiency, and not only because the GPU is turned off. For example, if you need a 150MHz clock speed, you can run the GPU at 150MHz, but you can get an equivalent 150MHz clock speed by running the GPU at 300MHz for half a second and shutting it off for half a second. That's the idea of DCC.
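
To put toy numbers on that idea (my own figures, purely illustrative, not Intel's):

```python
# Toy model of duty cycle control: instead of running continuously at a
# low clock, run at a higher clock for a fraction of each period and
# power-gate the GPU for the rest. All numbers are made up.

def effective_frequency_mhz(f_on_mhz, duty):
    """Average clock delivered when running at f_on_mhz for a fraction
    `duty` of the time and gated (0 MHz) for the remainder."""
    return f_on_mhz * duty

# 300 MHz at a 50% duty cycle averages out to the same 150 MHz:
print(effective_frequency_mhz(300, 0.50))   # 150.0
# With the GPU gated 87.5% of the time, as discussed above:
print(effective_frequency_mhz(300, 0.125))  # 37.5
```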

As for your second statement, I really don't follow the logic here. Having more power available doesn't affect efficiency; if that were the case, higher-TDP CPUs would automatically be more efficient than low-TDP ones, which obviously isn't the case.

If you save 50J by shutting off the GPU during idle, you could use those 50J instead when you really need them to improve performance (so you end up with the same battery life).
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Wrong.

[Image: GPUDutyCycle.png]


Duty Cycle Control improves efficiency at low frequencies.

The "efficient" in that slide has nothing to do with efficiency as being discussed here. Rather it refers to the region in which scaling voltage is still an "efficient" manner of lowering power usage when idling (since power scales with the square of voltage), but when you hit the threshold voltage you can't go any lower and you're stuck with just lowering the frequency, which isn't particularly efficient (since power only scales linearly with frequency).

DCC simply allows you to hit lower effective average voltages than the threshold voltage, by constantly switching between off (0 voltage) and threshold voltage.

But when DCC has the GPU gated off, the frequency is not just low, it is zero, and thus no work is being done (although work can obviously still be done during the 12.5% or more of the time when DCC is not gating the GPU).

Just because a feature works by shutting off the GPU doesn't mean it can't improve efficiency (even though the GPU can't do anything while it's off).

Actually that is exactly what it means, since being off by definition means not doing any work. Either way that is not terribly relevant here since DCC only turns off the GPU intermittently, so when DCC is not running (and thus not saving any power), the GPU can still do work.

But DCC does improve efficiency, and not only because the GPU is turned off. For example, if you need a 150MHz clock speed, you can run the GPU at 150MHz, but you can get an equivalent 150MHz clock speed by running the GPU at 300MHz for half a second and shutting it off for half a second. That's the idea of DCC.

The important thing to remember here is why you "need" 150 MHz in the first place. If Intel could, they would simply turn off the GPU completely when it's not needed to do work, but since doing this the normal way (i.e. sleep mode) involves a fair bit of latency, they use DCC as an alternative. Now DCC still involves some small amount of latency, but this is acceptable for Intel's needs.

So before, you had the choice of keeping the GPU idling at 300 MHz to avoid latency when you needed it to do work again (taking a small power-use hit), or putting it to sleep to save power (suffering a latency penalty). DCC basically offers the best of both worlds (to a degree), but again it is about saving power, not improving efficiency as such.

Now I'll admit that if you're running some sort of load that is very light (not needing the GPU to clock higher than the 300 MHz idle clock) and intermittent (not allowing the GPU to enter sleep), you could improve efficiency, but such a load has nothing to do with gaming, which was the whole premise of this discussion.

If you save 50J by shutting off the GPU during idle, you could use those 50J instead when you really need them to improve performance (so you end up with the same battery life).

True, but again this has nothing to do with efficiency, since by the same logic including a bigger battery in your laptop/tablet would also increase efficiency.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Okay, so we basically agree. What I don't agree with is that it doesn't improve efficiency. This feature is advertised to decrease power at the same frequency, thus improving efficiency. If you don't need such clock speeds, you can clock the GPU down as low as you want and the idle power is also reduced. DCC then further reduces power.

You say that, because apparently only gaming counts toward efficiency for you, it won't do anything since games use higher clock speeds. I can't comment on that because I only have this conceptual slide about DCC, so I don't know at which frequencies it becomes useful, but I could see it being used in Android games.

Lastly, your battery comparison is wrong because a larger battery doesn't reduce the amount of power you need for a certain amount of performance.

Edit: I just looked back at the post I quoted, and I don't see where you mention gaming.

No it's not. Efficiency is performance/watt, and when idling you're obviously not doing any work and thus your performance is zero, and by extension your efficiency is also zero. It doesn't matter how low your power usage goes, your efficiency is still zero, since zero divided by x is always zero, no matter how low x goes.

DCC helps with idle power usage, not efficiency.

At least you now agree that it increases efficiency for certain workloads that are not games. BTW, you're obviously right that when the GPU is shut off efficiency is zero, but that's just an exercise in cherry-picking a small enough time scale (inside the DCC off-period) to average zero efficiency; average over more than 87.5% of a second and your argument becomes meaningless.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Okay, so we basically agree. What I don't agree with is that it doesn't improve efficiency. This feature is advertised to decrease power at the same frequency, thus improving efficiency. If you don't need such clock speeds, you can clock the GPU down as low as you want and the idle power is also reduced. DCC then further reduces power.

You say that, because apparently only gaming counts toward efficiency for you, it won't do anything since games use higher clock speeds. I can't comment on that because I only have this conceptual slide about DCC, so I don't know at which frequencies it becomes useful, but I could see it being used in Android games.

Of course I only consider gaming for efficiency, since that was what started this whole discussion (or to be more exact, running a graphics workload using DX12 was what started it), with you speculating about a 50% improvement in efficiency from architectural improvements (in this post: http://forums.anandtech.com/showpost.php?p=36621037&postcount=106).

Lastly, your battery comparison is wrong because a larger battery doesn't reduce the amount of power you need for a certain amount of performance.

Neither does DCC in and of itself; instead it lowers the amount of power you use when you don't need any performance, but don't want to put the GPU into sleep mode (since you might be needing performance soon, and can't afford the latency penalty).

And since idling usually involves periods of not needing any performance with intermittent periods of needing a small amount of performance, a method for rapidly powering the GPU on and off, like what DCC does, is useful.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Of course I only consider gaming for efficiency, since that was what started this whole discussion (or to be more exact, running a graphics workload using DX12 was what started it), with you speculating about a 50% improvement in efficiency from architectural improvements (in this post: http://forums.anandtech.com/showpost.php?p=36621037&postcount=106).

Okay, so let me get this straight. If you read from my post onward, which indeed initiated this discussion about efficiency, you'll see that I never responded to the DCC discussion until this point, so I never claimed that DCC will be one of the things that will improve Gen8's efficiency (BTW, although I started this because of the DirectX 12 demo, Gen8 doesn't only apply to gaming workloads, though those would miss the DirectX improvement). I made what is, in my opinion, a reasonable assumption based on what I've seen that Gen8 will be 50% more efficient, and we discussed that.

DCC was first brought up by bullzz, who said exactly the same as you; he never claimed improvements in gaming (I do speculate it might improve efficiency in certain games). You quoted him, basically reiterating his claim. Homeles then responded with a claim which I agree with, and the discussion got started, with you posting your posts (which I quoted) where you deny that DCC gives an improvement in performance per watt.


Neither does DCC in and of itself; instead it lowers the amount of power you use when you don't need any performance, but don't want to put the GPU into sleep mode (since you might be needing performance soon, and can't afford the latency penalty).

When you don't need the GPU, any SoC that is well designed will power gate it. DCC is a legitimate way to reduce power at a certain (averaged) range of clock speeds.

And since idling usually involves periods of not needing any performance with intermittent periods of needing a small amount of performance, a method for rapidly powering the GPU on and off, like what DCC does, is useful.

Not only at idle, but also at other clock speeds in the "inefficient" part of the frequency range...
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Okay, so let me get this straight. If you read from my post onward, which indeed initiated this discussion about efficiency, you'll see that I never responded to the DCC discussion until this point, so I never claimed that DCC will be one of the things that will improve Gen8's efficiency (BTW, although I started this because of the DirectX 12 demo, Gen8 doesn't only apply to gaming workloads, though those would miss the DirectX improvement). I made what is, in my opinion, a reasonable assumption based on what I've seen that Gen8 will be 50% more efficient, and we discussed that.

DCC was first brought up by bullzz, who said exactly the same as you; he never claimed improvements in gaming (I do speculate it might improve efficiency in certain games). You quoted him, basically reiterating his claim. Homeles then responded with a claim which I agree with, and the discussion got started, with you posting your posts (which I quoted) where you deny that DCC gives an improvement in performance per watt.

Technically, DCC was first brought up by me (http://forums.anandtech.com/showpost.php?p=36623388&postcount=112) and not bullzz. I did so specifically to point out that it would only have an effect on idle power usage and not on a gaming load. Bullzz then basically reiterated this, albeit in more detail, and I replied to that, again stressing that it would only affect idle power usage (reiterating a reiteration, I guess :)). Now I'll admit that I didn't clearly state at this point that my discussion of efficiency was still in the context of gaming, which is probably what started this entire tangent discussion.

When you don't need the GPU, any SoC that is well designed will power gate it. DCC is a legitimate way to reduce power at a certain (averaged) range of clock speeds.

True, but even when idling you generally still need the GPU, if for nothing else, at least to drive the display controller. So as long as the display is on you can't really power gate the GPU (unless you have something like PSR).

Not only at idle, but also at other clock speeds in the "inefficient" part of the frequency range...

It is my impression from Intel's slides that DCC only comes into play when you've already gone down to threshold voltage, which I would imagine would only occur at idle (or very close to it).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
It is my impression from Intel's slides that DCC only comes into play when you've already gone down to threshold voltage, which I would imagine would only occur at idle (or very close to it).

Based on that slide about "Inefficient Region/Efficient Region" and previous slides I've seen, that's not the case.

A similar slide was shown before, describing the benefits of GT3 at lower clocks vs GT2 at higher clocks. Basically, HD 4400 @ 15W vs HD 5000 @ 15W: the latter performs 10-20% better than the former at the same power. That's at rather high clocks, with GT2 @ ~1GHz and GT3 @ 600-700MHz or so (based on some user/professional reviews). From that, we can assume the 600-700MHz needed to run GT3 at 10-20% better performance than GT2 is close to that inefficient/efficient region.
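
Here's a back-of-the-envelope sketch of why wide-and-slow wins on perf/watt (all numbers invented for illustration, not measurements of HD 4400/5000):

```python
# Back-of-the-envelope model: dynamic power ~ n_units * V^2 * f and
# throughput ~ n_units * f. All numbers are invented for illustration.

def dynamic_power(n_units, volts, f_ghz):
    return n_units * volts ** 2 * f_ghz   # arbitrary units

def throughput(n_units, f_ghz):
    return n_units * f_ghz                # arbitrary units

configs = {
    "1x size @ 1.0 GHz (higher V)": (1.0, 1.00, 1.00),
    "2x size @ 0.65 GHz (near Vmin)": (2.0, 0.75, 0.65),
}
for name, (n, v, f) in configs.items():
    p, t = dynamic_power(n, v, f), throughput(n, f)
    print(f"{name}: power={p:.2f}, perf={t:.2f}, perf/W={t / p:.2f}")
# The wider configuration delivers more performance per watt as long as
# the lower clock permits a meaningfully lower voltage.
```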

Remember, they never said "threshold/below threshold", but efficient/inefficient. Below that region, all the way down to threshold voltage, leakage dominates, so scaling frequency down below that level is inefficient in terms of perf/watt.

DCC makes "efficient" scaling possible without messing around with voltage, because not all 3D applications need the GPU @ 100%.

True, but even when idling you generally still need the GPU, if for nothing else, at least to drive the display controller.

I don't think Intel is talking about 2D clocks when talking about DCC, like when driving the display. Separate 2D and 3D blocks have existed for ages, and the 3D blocks can be completely power gated in that case. Perf/watt issues were never about 2D; it was always about 3D, because that's where the real demand is. I can tell you that with the display idling, even my Sandy Bridge iGPU is power gated and never active.

instead it lowers the amount of power you use when you don't need any performance, but don't want to put the GPU into sleep mode (since you might be needing performance soon, and can't afford the latency penalty).

You gotta remember that since Haswell, Intel can change the frequency so fast thanks to FIVR (think in between frames) that even in most 3D workloads it can scale down clocks in comparatively less intensive scenarios to boost performance in more demanding ones. Now with Broadwell, when they need even less GPU power, they can scale down clocks, leakage be damned.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Based on that slide about "Inefficient Region/Efficient Region" and previous slides I've seen, that's not the case.

A similar slide was shown before, describing the benefits of GT3 at lower clocks vs GT2 at higher clocks. Basically, HD 4400 @ 15W vs HD 5000 @ 15W: the latter performs 10-20% better than the former at the same power. That's at rather high clocks, with GT2 @ ~1GHz and GT3 @ 600-700MHz or so (based on some user/professional reviews). From that, we can assume the 600-700MHz needed to run GT3 at 10-20% better performance than GT2 is close to that inefficient/efficient region.

Remember, they never said "threshold/below threshold", but efficient/inefficient. Below that region, all the way down to threshold voltage, leakage dominates, so scaling frequency down below that level is inefficient in terms of perf/watt.

DCC makes "efficient" scaling possible without messing around with voltage, because not all 3D applications need the GPU @ 100%.

I don't think Intel is talking about 2D clocks when talking about DCC, like when driving the display. Separate 2D and 3D blocks have existed for ages, and the 3D blocks can be completely power gated in that case. Perf/watt issues were never about 2D; it was always about 3D, because that's where the real demand is. I can tell you that with the display idling, even my Sandy Bridge iGPU is power gated and never active.

You gotta remember that since Haswell, Intel can change the frequency so fast thanks to FIVR (think in between frames) that even in most 3D workloads it can scale down clocks in comparatively less intensive scenarios to boost performance in more demanding ones. Now with Broadwell, when they need even less GPU power, they can scale down clocks, leakage be damned.

You can find the actual talk here: http://intelstudios.edgesuite.net/140811_intel/event.html

They start talking about DCC and the slide in question at about 01:08:56. If you listen to the talk, it becomes clear that the dividing factor between the "efficient" region and the "inefficient" region is the fact that the voltage has been lowered to Vmin. Now obviously Vmin is not quite the same as threshold voltage, but even then Vmin should still only come into play when idling.

Besides, the AnandTech preview also makes it quite clear that DCC is aimed at idle power usage.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Now obviously Vmin is not quite the same as threshold voltage, but even then Vmin should still only come into play when idling.

Let me take you to another presentation about Vmin.

Spring IDF 2013 titled "The HD Graphics Architecture in the New World of Low Power Computing".

[Image: 28a2uzd.png]


[Image: 2nib4a9.png]


The first slide gives a brief explanation of what "Vmin" really is.

The second slide explicitly refers to graphics, "1x size" vs "2x size". There's yet another slide about using "Fmax @ Vmin" and race-to-halt to save power.

It doesn't seem ambiguous to me that GT3 is the "2x size @ Vmin" and GT2 is "1x size".

Also, GPUs don't have only two states, that is, "Full Load" and "Zero Load". We are way beyond that. And the FIVR allows switching fast enough that, in cases where a frame is more CPU-bound and less GPU-bound, you can ramp down the GPU clocks so the CPU can use the extra power. And if the frequency needed is below "Fmax @ Vmin", with Broadwell you can now save credible power rather than achieving nothing.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Let me take you to another presentation about Vmin.

Spring IDF 2013 titled "The HD Graphics Architecture in the New World of Low Power Computing".

[Image: 28a2uzd.png]


[Image: 2nib4a9.png]


The first slide gives a brief explanation of what "Vmin" really is.

The second slide explicitly refers to graphics, "1x size" vs "2x size". There's yet another slide about using "Fmax @ Vmin" and race-to-halt to save power.

It doesn't seem ambiguous to me that GT3 is the "2x size @ Vmin" and GT2 is "1x size".

Also, GPUs don't have only two states, that is, "Full Load" and "Zero Load". We are way beyond that. And the FIVR allows switching fast enough that, in cases where a frame is more CPU-bound and less GPU-bound, you can ramp down the GPU clocks so the CPU can use the extra power. And if the frequency needed is below "Fmax @ Vmin", with Broadwell you can now save credible power rather than achieving nothing.

I'm not really sure how any of that relates to my claim that Vmin primarily comes into play at idle or close to idle.

The first of your slides doesn't actually explain what Vmin is, but rather what the impact of Vmin is with regard to scaling power use (and performance). The existence of a Vmin simply means that there is a region in which you can scale power usage by adjusting voltage (whenever the current voltage is above Vmin), and a region where you can no longer do so (since the current voltage is equal to Vmin).

DCC is only intended for the latter region, since there you can only scale power usage linearly with frequency, and frequency doesn't affect leakage, whereas DCC does; thus DCC is more efficient than frequency scaling alone.

In the region where voltage scaling is still possible (i.e. above Vmin) you wouldn't want to use DCC, since it is less efficient than scaling voltage, this being due to the fact that power scales linearly with DCC usage (if DCC is on for x% of the time, you reduce your power usage by x%), but with the square or cube of voltage for dynamic power and leakage respectively.

Now the question is in which of the two regions you find yourself when idling, and where you find yourself under load (any amount of load). I would dare to argue that the region where DCC comes into play (the Vmin region) is only relevant when idling or very close to it (i.e. loads so low that they wouldn't really occur when gaming), since this is what has been indicated by all of the sites I have read (including AnandTech).
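
To make that concrete, here's a quick sketch of the two options below Vmin (all constants are mine and purely illustrative):

```python
# Two ways to go below "Fmax @ Vmin": stay powered and lower the clock,
# or duty cycle at Fmax@Vmin. Dynamic power ~ V^2 * f (normalized to
# F_VMIN); leakage is paid whenever the block is powered. Constants
# are invented for illustration.

V_MIN = 0.7        # lowest usable voltage
F_VMIN = 300.0     # MHz sustainable at V_MIN
LEAKAGE = 0.3      # arbitrary units, constant while powered on

def frequency_scaling(f_eff):
    """Stay powered at V_MIN and just lower the clock: leakage remains."""
    return V_MIN ** 2 * (f_eff / F_VMIN) + LEAKAGE

def duty_cycling(f_eff):
    """Run at F_VMIN for duty = f_eff / F_VMIN and gate the rest: both
    dynamic power and leakage scale linearly with the duty cycle."""
    duty = f_eff / F_VMIN
    return duty * (V_MIN ** 2 + LEAKAGE)

for f in (300.0, 150.0, 75.0, 37.5):   # effective MHz
    print(f"{f:5.1f} MHz: freq-only={frequency_scaling(f):.3f}, "
          f"duty-cycled={duty_cycling(f):.3f}")
# Equal at 300 MHz; the further below Fmax @ Vmin you go, the bigger the
# win for duty cycling, because gating removes leakage as well.
```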
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
You missed the part "Also, GPUs don't have only two states, that is, "Full Load" and "Zero Load"".

Look at the power vs. performance graph in the second image (the second slide).
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
You missed the part "Also, GPUs don't have only two states, that is, "Full Load" and "Zero Load"".

Look at the power vs. performance graph in the second image (the second slide).

No, I didn't miss it; I ignored it. I never claimed that GPUs only have two states, so I don't know why IntelUser2000 pointed it out.

I only claimed that a GPU can and does idle, and it is during this idling that DCC kicks in; by extension, for any kind of load relevant to gaming, DCC does not come into play.