CPU Thermal wall? So much for the GHz race.

wand3r3r · Jul 23, 2013

Just noting progress, first it began with the early chips, clock speeds increasing rapidly (P4 race to 10 GHz

).

Enter the Nehalem, Sandy/Ivy bridge, Haswell.
Nehalem clocked great, Sandy even better. Ivy bridge has been relatively worse, certainly no better at clocks, and Haswell appears to be another significant drop.

At the current rate as the process is getting smaller we are loosing overclockability as heat is becoming an issue. There is little to no incentive to upgrade for owners of the previous 3 gens (Haswell, Ivy, Sandy are all so similar imo). Then the extreme high end is lagging a generation behind in IPC (ivy E coming soon).

If we can no longer rely on clock increases, and IPC seems pretty hard to increase dramatically maybe they will finally start going massively parallel (many cores?).

What are your guys thoughts?

sm625 · Jul 23, 2013

The biggest computing power sink has to be video related tasks. These can be parallelized. So look for more and more compute units, stream processors, whatever. The cpu core execution units can be made wider. Beyond that there is a serious storage bottleneck, that is actually one of the biggest reason mobile/ARM is overtaking x86 so rapidly. ARM solutions almost universally have close-to-the-die low latency and low cost storage, and their OSes arent so bloated as to require excess space. PCs and notebooks have to go thru 3 levels of cache, a PCH, a PCIe controller, a SATA controller, and finally a NAND controller. All that latency, power consumption, and complexity to do what should be close to the die like RAM.

BallaTheFeared · Jul 23, 2013

When you're looking at the heat you need to consider the program and thus the performance it's outputting.

I think we're talking Prime95/Linpack heat, seems to be the trend anyways.

Nehalem - 3.6GHz C0, 4GHz D0 seems fair? You're talking 60 GFLOPs on a quad core.

Sandy Bridge 4.4GHz 115 GFLOPs, 5GHz 130 GFLOPs

Ivy Bridge more of the same.

Haswell 4.3GHz roughly 200 GFLOPs.

Clock speed was always a dead end (P4, Bulldozer, Steamroller), perf/w scales poorly with clock speed. Instructions and IPC are the future, lower clocks, higher performance, lower power is where we're heading.

MisterMac · Jul 23, 2013

How are new instructions in a lagging slow eco system going to help without managed languages - which in many cases negate the specialized instructions in the first place?

Instructions cannot be the answer.

cytg111 · Jul 23, 2013

Thats all well and all, but IMO that just paints the same picture, same graph, plotted at tad bit higher coordinates. Issue is going to be the same.
The only way out is a parallel one and we still havent got the magic beans to do so.
What could raipidly increase single threaded performance as we know it today? Optics? Quantum? New uarcs ? I dont know

thegimp03 · Jul 23, 2013

sm625 said:
The biggest computing power sink has to be video related tasks. These can be parallelized. So look for more and more compute units, stream processors, whatever. The cpu core execution units can be made wider. Beyond that there is a serious storage bottleneck, that is actually one of the biggest reason mobile/ARM is overtaking x86 so rapidly. ARM solutions almost universally have close-to-the-die low latency and low cost storage, and their OSes arent so bloated as to require excess space. PCs and notebooks have to go thru 3 levels of cache, a PCH, a PCIe controller, a SATA controller, and finally a NAND controller. All that latency, power consumption, and complexity to do what should be close to the die like RAM.

I remember back in the late 90s/early 2000s, reading Maximum PC articles talking about the race to hitting 10 ghz. Funny how almost 15 years later, even highly overclocked chips are barely even halfway there.

cytg111 · Jul 23, 2013

thegimp03 said:
I remember back in the late 90s/early 2000s, reading Maximum PC articles talking about the race to hitting 10 ghz. Funny how almost 15 years later, even highly overclocked chips are barely even halfway there.

- Except it's not funny

Sheep221 · Jul 23, 2013

thegimp03 said:
I remember back in the late 90s/early 2000s, reading Maximum PC articles talking about the race to hitting 10 ghz. Funny how almost 15 years later, even highly overclocked chips are barely even halfway there.

Sometimes I think that new AMDs chips with 5GHz stock clock are step backward technologically, the increasing frequency greatly decreases IPC.

NTMBK · Jul 23, 2013

Sheep221 said:
Sometimes I think that new AMDs chips with 5GHz stock clock are step backward technologically, the increasing frequency greatly decreases IPC.

The IPC of the 5GHz Piledriver FX parts is the same as the IPC on the 4GHz parts.

sefsefsefsef · Jul 23, 2013

NTMBK said:
The IPC of the 5GHz Piledriver FX parts is the same as the IPC on the 4GHz parts.

That's not how IPC works. Unless your application's working set is entirely cache resident (which most interesting applications aren't), then IPC goes down as frequency goes up.

AtenRa · Jul 23, 2013

sefsefsefsef said:
then IPC goes down as frequency goes up.

Dont think so, IPC fluctuates, it can go down or up with frequency. It really depends on the architecture of the CPU but on average, IPC it should go up with higher frequencies.

Sheep221 · Jul 23, 2013

NTMBK said:
The IPC of the 5GHz Piledriver FX parts is the same as the IPC on the 4GHz parts.

If 5GHz has same IPC as 4GHz CPU than it is no point to buy a 5GHz, because it will only offer small performance increase over 4GHz but will require much more power and will generate much more heat. Too high frequencies are causing temps and power consumption to rapidly increase, which is opposite effect of good IPC.
IPC means instructions per clock, eg. set amount of instructions done per 1Hz by your CPU, the higher the IPC the higher the performance. IPC is however defined by architecture and design and it's relative to CPU's stock TDP. Overclocking, selling higher binned CPUs and more are things, that increase frequency and therefore the power consumption yet they don't increase ability of the CPU's integrals inside to work faster on their own because their mechanics remained unchanged, which means by increased frequency your CPU works like 10% faster, but it's power draw is increased by 50% or more, this means your IPC decreased by roughly 40%.
For example Pentium 4 and Bulldozer were architectures with very bad raw IPC.

Phynaz · Jul 23, 2013

Nothing new...

http://forums.anandtech.com/showpost.php?p=33151036&postcount=33

NTMBK · Jul 23, 2013

Sheep221 said:
If 5GHz has same IPC as 4GHz CPU than it is no point to buy a 5GHz, because it will only offer small performance increase over 4GHz but will require much more power and will generate much more heat. Too high frequencies are causing temps and power consumption to rapidly increase, which is opposite effect of good IPC.
IPC means instructions per clock, eg. set amount of instructions done per 1Hz by your CPU, the higher the IPC the higher the performance. IPC is however defined by architecture and design and it's relative to CPU's stock TDP. Overclocking, selling higher binned CPUs and more are things, that increase frequency and therefore the power consumption yet they don't increase ability of the CPU's integrals inside to work faster on their own because their mechanics remained unchanged, which means by increased frequency your CPU works like 10% faster, but it's power draw is increased by 50% or more, this means your IPC decreased by roughly 40%.
For example Pentium 4 and Bulldozer were architectures with very bad raw IPC.

I'm well aware of what IPC means.

If IPC remains constant and you increase clock from 4GHz to 5GHz, then you are raising the clock speed by 25% and hence should see a roughly 25% performance improvement.

In the real world of course you are limited by memory bandwidth, as sefsefsef correctly pointed out- without also overclocking your northbridge and RAM, your IPC will drop a bit.

SlowSpyder · Jul 23, 2013

If your cache/IMC speeds scale up with your overclocking, assuming enough memory bandwidth, should IPC be pretty close to the same? What makes IPC change with clockspeed (assuming all hardware needed to feed the cores is always fast enough)?

Also, is this an 'on paper' change that isn't really noticed? Or is this a practical difference where you get to a point where there is no point in adding clockspeed because IPC drops so much (assuming you're not power/thermal limited)?

*edit - I did some benching for IDC a couple of times, and in the program I tested just bumping the multiplier on my Phenom did have diminishing returns. But that was because I left my NB/L3 at the stock setting, so as the cores went upwards with their clocks the NB/L3 couldn't feed the cores fast enough (my semi educated guess). My system scaled better by bumping the NB/L3, if I remember right. I want to know if we're talking about that kind of IPC decrease, or if something I'm not familiar with slows down the cores as they increase their clockspeed. This is just for my own knowledge... thanks.

LegSWAT · Jul 23, 2013

cytg111 said:
What could raipidly increase single threaded performance as we know it today? Optics? Quantum? New uarcs ? I dont know

Graphene Thermal Layers&Vias

Concillian · Jul 23, 2013

wand3r3r said:
Just noting progress, first it began with the early chips, clock speeds increasing rapidly (P4 race to 10 GHz ).

Enter the Nehalem, Sandy/Ivy bridge, Haswell.
Nehalem clocked great, Sandy even better. Ivy bridge has been relatively worse, certainly no better at clocks, and Haswell appears to be another significant drop.

At the current rate as the process is getting smaller we are loosing overclockability as heat is becoming an issue. There is little to no incentive to upgrade for owners of the previous 3 gens (Haswell, Ivy, Sandy are all so similar imo). Then the extreme high end is lagging a generation behind in IPC (ivy E coming soon).

If we can no longer rely on clock increases, and IPC seems pretty hard to increase dramatically maybe they will finally start going massively parallel (many cores?).

What are your guys thoughts?

Heat is not the issue we've seen with IB and Haswell... temperature is. They are related, but there is a difference. Especially when discussing a topic that is necessarily going to get into the material science.

If you define your issue more clearly it's that Haswell is reaching the same distance to Tj (temperature) at a lower total heat output.

This is partly related to process shrink, partly to TIM, but partly to the design decisions Intel has made. They are under pressure to make mobile perform well, and they certainly have moved the design to a point where the optimal voltage is lower than it has been in the past. I believe that to be a conscious decision on Intel's part to get better power usage at lower powers. Think about the different processes that TSMC offers:

TSMCs 28nm process offering includes 28nm High Performance (28HP), 28nm High Performance Low Power (28HPL), 28nm Low Power (28LP), and 28nm High Performance Mobile Computing (28HPM)

It's clearly possible to tweak design and process parameters to adjust the practical power range for a given process. I'd be pretty damn surprised if they haven't intentionally shifted IB and Haswell to be lower on this power target spectrum than they targeted for previous designs. That kind of thing will steepen the voltage to frequency relationship at the upper end of the frequency spectrum, and that's exactly what we see with IB and Haswell.

sefsefsefsef · Jul 23, 2013

SlowSpyder said:
If your cache/IMC speeds scale up with your overclocking, assuming enough memory bandwidth, should IPC be pretty close to the same? What makes IPC change with clockspeed (assuming all hardware needed to feed the cores is always fast enough)?

First of all, I need to make clear that "IPC" is not an inherent feature of a CPU. IPC is a function of a particular program running on a particular CPU at a particular clock frequency. For example ...

Let's say that you have a CPU running at 1.0 GHz, and in order to run a particular program it takes 10 seconds to complete. Let's say that of those 10 seconds, 5 seconds are spent where the CPU is busy and performing actual calculations, and the other 5 seconds are spent waiting on DRAM latency. DRAM latency is definitely not going to improve as you increase the CPU frequency. Let's call the IPC of this 1.0 GHz running this program 1.0 IPC.

If you increase the frequency of this CPU to 2.0 GHz then this program will now complete in 7.5 seconds, or in other words 25% faster. The 5 seconds spent where the CPU is actually busy has been halved to 2.5 seconds, but the 5 seconds waiting for DRAM latency is unchanged. Now the IPC of running this program is 0.667 IPC.

At 3.0 GHz the program finishes in 6.667 seconds, for an IPC of 0.5 IPC. At 4.0 GHz the program finishes in 6.25 seconds, for an IPC of 0.4 IPC. As you can see, that the trend is that as you increase frequency more and more, the improvement in IPC goes down and down. If you had an infinitely fast CPU, then executing this program would still take 5 seconds, or an IPC of 0.0 IPC.

Just to drive this point home: as clock frequency increases, then often cache bandwidth increases, DRAM *bandwidth* might increase, cache latency might keep up, but DRAM latency most definitely will not keep up, and this is a major source of slowdown in computer programs. My example was maybe a little silly, where 50% of program execution was spent waiting for DRAM, but it illustrates the point. As frequency goes up, IPC always goes down (unless there are 0 cache misses in the program).

EDIT: sorry guys, I made a mistake with my IPC calculations. They're fixed in the text now.

AtenRa · Jul 23, 2013

sefsefsefsef said:
As frequency goes up, IPC always goes down (unless there are 0 cache misses in the program).

Scaling goes down, IPC goes up (see your numbers and my chart posted above)

cbn · Jul 23, 2013

wand3r3r said:
At the current rate as the process is getting smaller we are loosing overclockability as heat is becoming an issue. There is little to no incentive to upgrade for owners of the previous 3 gens (Haswell, Ivy, Sandy are all so similar imo). Then the extreme high end is lagging a generation behind in IPC (ivy E coming soon).

If we can no longer rely on clock increases, and IPC seems pretty hard to increase dramatically maybe they will finally start going massively parallel (many cores?).

What are your guys thoughts?

Until Intel widens their cores and/or changes their process.....I think at the high end what you are saying is basically true.

But at the budget level (ie, simpler, lower cost desktop systems) there is still lots to gain.

A good example is the single core Intel LGA cpus (Celeron G465, G470 etc) and the dual cores LGA cpus (Celeron, Pentium and i3). Clocks on those, particularly Celeron and Pentium, are still quite low compared to the K series quad cores.

So Intel still has plenty of room to improve value (and the desire to upgrade) for desktop by boosting clocks.....but I think its mainly at the budget desktop level this will happen.

sefsefsefsef · Jul 23, 2013

AtenRa said:
Scaling goes down, IPC goes up (see your numbers and my chart posted above)

OOPS, I made a silly mistake in my calculations. They're fixed now. IPC goes down. Thanks for drawing my attention to it.

AtenRa · Jul 23, 2013

I will just say that, you cannot have ZERO IPC. That zero you measure is translated in to instantaneous computation, it take zero time to calculate the task.

What you measure is not IPC.

sefsefsefsef · Jul 23, 2013

AtenRa said:
I will just say that, you cannot have ZERO IPC. That zero you measure is translated in to instantaneous computation, it take zero time to calculate the task.

What you measure is not IPC.

IPC = Instructions Per Cycle = Instructions / Cycles

A hypothetical CPU with infinite "speed" (by which I meant clock rate, sorry if this caused any confusion) will have an IPC of 0.0, because finite instructions / infinite clock ticks = 0.0. This is true unless every instruction takes 1 or less cycles to complete, which can't be the case in the presence of off-chip memory accesses.

Anyway, the point is that the latency of off-chip memory accesses doesn't scale with clock speed, and because of this, IPC always goes down when frequency goes up.

cytg111 · Jul 23, 2013

I would argue that for a given CPU it has constant IPC.
Given, if the subsystem, mainboard and off die RAM is unable to provide it with the juice it needs to perform, the compute output will be less.. but off die ram, board and busses really have less to do with a given CPU and its "IPC".

think, some vendors sell computers with just one dimm installed while the product is clearly intended for dual channel operation, does that mean the cpu has lesser ipc? no, it's just not fed.

If this is not true, we can stop talking about IPC for CPUs alltogether cause it makes zero sense, then we need to talk about IPC of a given hardware combination.
Clearly not what the term "IPC" is used for.

VirtualLarry · Jul 23, 2013

Sheep221 said:
therefore the power consumption yet they don't increase ability of the CPU's integrals inside to work faster on their own because their mechanics remained unchanged, which means by increased frequency your CPU works like 10% faster, but it's power draw is increased by 50% or more, this means your IPC decreased by roughly 40%.

No, it doesn't. Wattage does not factor into IPC. At all.

If you want to talk performance/watt, then yes, that is a factor. But not in terms of IPC.

CPU Thermal wall? So much for the GHz race.

Diamond Member

Diamond Member

Diamond Member

Senior member

Lifer

Diamond Member

Lifer

Golden Member

Lifer

Senior member

Lifer

Golden Member

Lifer

Lifer

Lifer

Member

Diamond Member

Senior member

Lifer

Lifer

Senior member

Lifer

Senior member

Lifer

No Lifer