CPU Thermal wall? So much for the GHz race.

Idontcare · Jul 23, 2013

sefsefsefsef said:
That's not how IPC works. Unless your application's working set is entirely cache resident (which most interesting applications aren't), then IPC goes down as frequency goes up.

Your message got confused by the audience because you are conflating theoretical IPC (capability) with realized IPC (actual).

The IPC of a microarchitecture is invariant to clockspeed. That is why the acronym itself reflects that fact the rate of instructions are being normalized to the clock.

But other aspects of the supporting compute topology will negatively impact the scaling of realized IPC as a function of the absolute clock...but that is not a technical aspect of the microarchitecture per se.

The realized IPC of my 2600K varies whether I have single-channel DDR3-800 ram or dual-channel DDR3-2133 ram. That doesn't mean the IPC of the core microarchitecture of my Sandy Bridge CPU is changing, which is fixed and doesn't change.

sefsefsefsef · Jul 24, 2013

I understand every word that you're saying, IDC, but I just disagree with the usage of "IPC" here. I use the term as my fellow computer architects use the term, not as overclockers/hobbyists use it. In my background I study the effects of the cache and main memory subsystems on IPC, so it's impossible for me to separate the two. IPC has a very specific meaning to architects, and it does not mean the same thing as it is commonly used around here, I guess, which is just "how 'good' the architecture is."

escrow4 · Jul 24, 2013

Either way, at least you won't spend much upgrading anymore. I'm pretty sure I'll run this 3930K into the ground before I'll "have" to upgrade, even at a rather low 4.1GHz. Unless some port is magically faster on Haswell next year gaming wise.

AtenRa · Jul 24, 2013

sefsefsefsef said:
IPC = Instructions Per Cycle = Instructions / Cycles

A hypothetical CPU with infinite "speed" (by which I meant clock rate, sorry if this caused any confusion) will have an IPC of 0.0, because finite instructions / infinite clock ticks = 0.0. This is true unless every instruction takes 1 or less cycles to complete, which can't be the case in the presence of off-chip memory accesses.

Anyway, the point is that the latency of off-chip memory accesses doesn't scale with clock speed, and because of this, IPC always goes down when frequency goes up.

sefsefsefsef said:
Let's say that you have a CPU running at 1.0 GHz, and in order to run a particular program it takes 10 seconds to complete.

And thats where the problem starts,

IPC = Instructions Per Cycle

Now, most of the people take IPC as the single Core performance of the CPU. Thats not what IPC is. IPC is the Execution capacity(instructions) of the CPU Core. Its how many Instructions the Execution resources (ALUs etc) of the Core can Execute per Cycle.

For example, if the CPU Core has 2x Integer Execution Units(ALUs) , then its theoretical maximum IPC will be two(2). If it has 4x Integer Execution Units its Maximum theoretical IPC will be 4.

IPS = Instructions Per Seconds

IPS = IPC x F (where F = Frequency)

One Hz = 1Cycle per Second

or F= 1/T (where F = Frequency and T = Time in Seconds)

1 gigahertz = 1000000000 hertz

Now lets see what you said,

At 1GHz it takes 10 seconds to finish the work, 5 secs for Memory and 5 secs for the execution.

So, the execution at 1GHz takes 5secs.

Then you said that at 2GHz it takes 7.5secs to finish the work(performance), at which 5 secs are for the memory and 2.5secs for the execution.

So, at 1GHz it takes 5secs to execute and at 2GHz it takes 2.5secs to execute the same work, that means that IPC is constant and you halved the time because you doubled the frequency, remember that IPS = IPC x F.

Then again you said that at 4GHz it takes 6.25secs to finish the program. Of that, 5 secs is for memory and 1.25 secs is for the execution at 4GHz. That again means the IPC is constant and you halved the time it needed to execute because you increased once again (double) the Frequency, so it needs half the time to execute the same work.

Main memory access is a constant and it doesnt affect IPC at higher CPU Frequencies. If it needs 70ns to access the main memory at 1GHz, it will also need 70ns to access the main memory at 2 or 4GHz. That means that if your IPC is 1.2 at 1GHz and you are memory access constraint, your IPC will still be 1.2 at 4GHz because of the main memory access time. But, your performance will increase because you have raised the frequency.

So, what you measuring in your paradigm is not IPC. IPC doesnt go down with frequency. IPC is actually increasing with higher frequency, you can clearly see that in the chart I posted.

Bellow is the IPC measurements of Intel Core 2 Duo E6400 in SPEC CPU2006 and CPU2000, we can clearly see that its average IPC is bellow two(2). Each Intel Core 2 Duo CPU Core has 3x ALUs.

AtenRa · Jul 24, 2013

OT,

When JFAMD said that Bulldozer IPC was going up, he was talking about IPC and not anything else. IPC in Bulldozer is higher than Phenom, but ignorance and haters/fanboys taken IPC as Single Core performance or performance/frequency and they show a decline.

Bulldozer and Piledriver HAVE HIGHER IPC than Phenom, CPU Performance per Frequency may be lower but IPC was up(SPEC 2006).

Idontcare · Jul 24, 2013

AtenRa said:
OT,

When JFAMD said that Bulldozer IPC was going up, he was talking about IPC and not anything else. IPC in Bulldozer is higher than Phenom, but ignorance and haters/fanboys taken IPC as Single Core performance or performance/frequency and they show a decline.

Bulldozer and Piledriver HAVE HIGHER IPC than Phenom, CPU Performance per Frequency may be lower but IPC was up(SPEC 2006).

Even John admitted he was wrong. History revisionism isn't going to work here.

sefsefsefsef · Jul 24, 2013

I don't know why I didn't think of this earlier.

http://en.wikipedia.org/wiki/Instructions_per_cycle

The number of instructions executed per clock is not a constant for a given processor; it depends on how the particular software being run interacts with the processor, and indeed the entire machine, particularly the memory hierarchy.

All I'm trying to get across is that what you guys mean when you talk about "IPC" is *literally* not what IPC means. IPC means what I've been talking about and what the Wikipedia article talks about.

Also, AtenRa, I'm sorry to say your calculations are just wrong. Even in the fantasy scenario where memory accesses don't count toward IPC, IPC would just remain constant, it wouldn't increase as frequency increases. Instructions Per Second (IPS) would increase in that scenario, but not IPC. Indeed, whenever you increase clock speed you do in fact increase IPS, which is what most people are thinking of when they think "performance," but this is not what IPC means.

AtenRa · Jul 24, 2013

Idontcare said:
Even John admitted he was wrong. History revisionism isn't going to work here.

Can you post that plz ?? thx

AtenRa · Jul 24, 2013

sefsefsefsef said:
I don't know why I didn't think of this earlier.

http://en.wikipedia.org/wiki/Instructions_per_cycle

All I'm trying to get across is that what you guys mean when you talk about "IPC" is *literally* not what IPC means. IPC means what I've been talking about and what the Wikipedia article talks about.

IPC is application depended yes, but same CPU architecture will have the same IPC in the same application. FX6300 will have almost the same IPC as FX8350 in the same application. FX8350 may have a little higher IPC than FX6300 due to higher Frequency.

Performance = IPS = IPC x F(Frequency)

sefsefsefsef said:
Also, AtenRa, I'm sorry to say your calculations are just wrong. Even in the fantasy scenario where memory accesses don't count toward IPC, IPC would just remain constant, it wouldn't increase as frequency increases.

First of all i didnt say memory access is not affecting IPC, i just said that Main Memory access latency will remain the same at 70ns if the processor is at 1Ghz or at 4GHz. In your scenario it was always at 5secs.

Secondly, i just show you in your scenario that IPC will remain constant because the Memory access latency always remains the same. What your paradigm was showcasing was IPS and not IPC. The moment you talked about time you where talking about performance(IPS) and not IPC.

You even came to the conclusion that IPC will be ZERO at some point, thats also not true.

sefsefsefsef said:
Instructions Per Second (IPS) would increase in that scenario, but not IPC. Indeed, whenever you increase clock speed you do in fact increase IPS, which is what most people are thinking of when they think "performance," but this is not what IPC means.

Yeap, IPC = Instructions Per (CPU)Cycle. But you said that IPC will decrease while you raising the Frequency. Thats not true, IPC will increase with higher frequency.

There is only one IPC, and that(Theoretical Maximum) cannot be higher than the number of the Execution Units of the CPU Core.

SlowSpyder · Jul 24, 2013

AtenRa said:
Can you post that plz ?? thx

Maybe IDC is thinking of another post, but I remember his 'goodbye' (if you want to call it that) after BD launched.

KCfromNC · Jul 24, 2013

AtenRa said:
First of all i didnt say memory access is not affecting IPC, i just said that Main Memory access latency will remain the same at 70ns if the processor is at 1Ghz or at 4GHz. In your scenario it was always at 5secs.

5 seconds is 5 seconds. But 5 seconds isn't always the same numbers of cycles. 5 secs at 4GHz vs. 1GHz is a 4x cycle count difference (plus or minus synchronization artifacts). Since you are dividing by C in IPC, that larger C means smaller IPC for a given workload. It's all in the original example sefsefsefsef gave.

sushiwarrior · Jul 24, 2013

AtenRa said:
Yeap, IPC = Instructions Per (CPU)Cycle. But you said that IPC will decrease while you raising the Frequency. Thats not true, IPC will increase with higher frequency.

There is only one IPC, and that(Theoretical Maximum) cannot be higher than the number of the Execution Units of the CPU Core.

IPC should never increase with higher frequency. Realized or actual. At best it stays constant (ALU executes operations faster, decode happens faster, cache operates faster, memory controller operates faster), but it never gets better. I see no logical way how any of those operations, operating at a linearly faster speed, will gain more than linear amounts of performance.

CPU completes 50 instructions at 100HZ in 1 second, CPU can at best complete 100 instructions at 200Hz in 1 second, but never more, and probably less. Why? Because ALL operations of the CPU increase by 2x, nothing can possibly increase by a factor of MORE than two.

I think we all need to go read our CPU architecture textbooks again...

AtenRa · Jul 25, 2013

KCfromNC said:
5 seconds is 5 seconds. But 5 seconds isn't always the same numbers of cycles. 5 secs at 4GHz vs. 1GHz is a 4x cycle count difference (plus or minus synchronization artifacts).

Correct, if the Memory latency is constant at 70ns (lets say 70 Cycles) at 1GHz then at 2GHz it will still be 70Cycles. But, if it took 5secs at 1GHz and it still was 5sec at 2GHz, that means that Memory latency got higher than 70ns or 70 Cycles. Hence, its math is incorrect arriving in to a wrong conclusion.

CPU time = Instructions x CPI x Frequency,

CPU time = Performance = the time the CPU takes to execute a programm. The less the faster the CPU.

CPI = Cycles Per Instruction. The less CPU Cycles it needs to access memory or execute in ALUs etc the faster.

The only part that should change in his scenario is Frequency, memory latency (cycles) should be the same, calculated in the CPI(Cycles Per Instruction).
But, he is changing the Memory Access (Cycles) with every Frequency Increase, which means that he is also changing both the CPI AND the Frequency.

If memory access is 70cycles and it takes 50% of the Instructions, then ALU takes the other 50% at 70cycles in order to have the 5secs CPU time from memory and 5secs CPU time from the ALU at 1GHz.

At 2GHz Memory Cycles will still be 70 and ALU cycles will still be 70 but, CPU time will be lower because of the higher Frequency. Same goes at 4 or higher GHz.

KCfromNC said:
Since you are dividing by C in IPC, that larger C means smaller IPC for a given workload. It's all in the original example sefsefsefsef gave.

Yes, but Cycles should remain the same for the same CPU no matter the freqyency. If we need 70 cycles for the memory access and 70 cycles for the ALU execution at 1GHz we will have exactly the same cycles at 2 or 4GHz.

Now do you see why his scenario was wrong calculating the IPC like that ??

Atreidin · Jul 25, 2013

If the CPU is processing an instruction that is waiting for something from memory, having more clock cycles just means that you are spending more clock cycles waiting for the data and not doing anything useful. The faster the CPU is running, the more clocks it is sitting there waiting for data from memory, which causes IPC to go down. There's no way it is ever possible for IPC to go up with increasing CPU frequency in a scenario like that. I don't think sef is the one confusing IPC with IPS.

KCfromNC · Jul 25, 2013

AtenRa said:
But, he is changing the Memory Access (Cycles) with every Frequency Increase, which means that he is also changing both the CPI AND the Frequency.

Yep, CPI goes up if you increase core frequency without increasing memory frequency. This is how real systems work - memory access latency is independent of core clock speed. It would be great if we could magically get DRAM latency to improve at pace with CPU speedups, but that's never been possible in real life. Look up the memory performance gap - it's the reason caches were invented.

mrmt · Jul 25, 2013

Idontcare said:
Even John admitted he was wrong. History revisionism isn't going to work here.

Not only that, he actually wrote that single threaded IPC was going up:

http://aceshardware.freeforums.org/posting.php?mode=quote&f=2&p=14368

Concillian · Jul 25, 2013

SlowSpyder said:
Maybe IDC is thinking of another post, but I remember his 'goodbye' (if you want to call it that) after BD launched.

yeah, that's it:

In my estimation, I made the IPC stament on XS and I don't recall making it other places (but I am sure that my comments were reposted.)

This is not a case of me lying, this is a case of me being wrong.

Doesn't really get any more clear than that.

AtenRa · Jul 25, 2013

Atreidin said:
If the CPU is processing an instruction that is waiting for something from memory, having more clock cycles just means that you are spending more clock cycles waiting for the data and not doing anything useful. The faster the CPU is running, the more clocks it is sitting there waiting for data from memory, which causes IPC to go down. There's no way it is ever possible for IPC to go up with increasing CPU frequency in a scenario like that. I don't think sef is the one confusing IPC with IPS.

Having the same program run on the same Processor will require the same access latency from memory if the Processor is working at 1Ghz or at 4GHz. You dont spend more cycles for the same memory access if the Processor oparetes at higher frequency.

But i believe that we are talking about two different things, im talking about CPI(IPC) when you people are talking about how many Instructions a processor is executing per Frequency (Hz = Cycle /sec).

parvadomus · Jul 25, 2013

sefsefsefsef said:
I don't know why I didn't think of this earlier.

http://en.wikipedia.org/wiki/Instructions_per_cycle

All I'm trying to get across is that what you guys mean when you talk about "IPC" is *literally* not what IPC means. IPC means what I've been talking about and what the Wikipedia article talks about.

Also, AtenRa, I'm sorry to say your calculations are just wrong. Even in the fantasy scenario where memory accesses don't count toward IPC, IPC would just remain constant, it wouldn't increase as frequency increases. Instructions Per Second (IPS) would increase in that scenario, but not IPC. Indeed, whenever you increase clock speed you do in fact increase IPS, which is what most people are thinking of when they think "performance," but this is not what IPC means.

Memory speed and latency penaltys are almost always hidden by cache levels, plus the ability of excecuting instructions out of order.
There are A LOT of reviews which shows marginal performance increases by using higher end memory modules.
When you overclock a CPU there are other factors by which IPC goes down, like stability of the CPU components, or agressive CPU throttling mechanisms.

AtenRa · Jul 25, 2013

KCfromNC said:
Yep, CPI goes up if you increase core frequency without increasing memory frequency. This is how real systems work - memory access latency is independent of core clock speed. It would be great if we could magically get DRAM latency to improve at pace with CPU speedups, but that's never been possible in real life. Look up the memory performance gap - it's the reason caches were invented.

CPI is not related to Core Frequency or Memory Frequency.

frozentundra123456 · Jul 25, 2013

I am not a CPU architect, but it seems to me what is important is total work done in a given amount of time and how much energy is required to do that work. I think a lot of what we talk about as ipc could really be called single core performance. Total performance in a given time is what really counts, whether it is achieved by more slower cores or fewer faster cores. However, in workloads that use only one or a few cores, the "more slower cores" will not equal the performance of the faster cores.

In regards to the original topic of lack of progress in CPU performance, it just seems normal that this would happen. Every new technology improves rapidly in the beginning an reaches a plateau eventually unless some revolutionary new discoveries are made.

And actually the total processing done by cpus is actually still increasing, but the improvement (unfortunately for desktop power users) is in the igp rather than raw CPU processing power.

sefsefsefsef · Jul 25, 2013

AtenRa said:
You dont spend more cycles for the same memory access if the Processor oparetes at higher frequency.

There are no words ... but I'll try. Memory access latencies are rightly measured in time (nanoseconds), not cycles, because they have absolutely nothing to do with what frequency the CPU is running at. They are totally independent. As the frequency of a CPU goes up or down, the number of nanoseconds to complete an external memory access remains constant (as measured in nanoseconds). As the frequency of a CPU goes up or down, then during the fixed latency time of an external memory access the CPU will have experienced more or fewer cycles. At low frequencies, the CPU will experience relatively fewer cycles for the same fixed latency external memory access (measured in nanoseconds), and at high frequencies the CPU will experience relatively more cycles for the same fixed latency external memory access (measured in nanoseconds), because that is the definition of CPU frequency, that in the same fixed time a higher frequency CPU will experience more clock cycles than a lower frequency CPU. Maybe someone should draw a picture (?).

But i believe that we are talking about two different things, im talking about CPI(IPC) when you people are talking about how many Instructions a processor is executing per Frequency (Hz = Cycle /sec).

I don't really get what you are talking about with instructions-per-frequency (kinda like DMIPS/MHz, I guess), which is another independent topic. Can you shed some more light on this? Thanks.

sefsefsefsef · Jul 25, 2013

parvadomus said:
Memory speed and latency penaltys are almost always hidden by cache levels, plus the ability of excecuting instructions out of order.
There are A LOT of reviews which shows marginal performance increases by using higher end memory modules.

My toy example is specifically talking about memory accesses which the caches don't catch. Off-chip DRAM access latency does not scale at all with CPU frequency. The problem of off-chip DRAM access latency is so big that the difference between an "average" and "high end" memory module is usually insignificant.

When you overclock a CPU there are other factors by which IPC goes down, like stability of the CPU components, or agressive CPU throttling mechanisms.

Hmmm, this part I'm not following as well. What part of the CPU lowers IPC when it's unstable? That sounds more like it would just crash the system to me. What part of the CPU is getting throttled that can affect IPC? As we've talked about before, if the CPU is throttling the overall frequency down, then IPC would actually go up! (but performance would probably go down)

AtenRa · Jul 25, 2013

frozentundra123456 said:
I am not a CPU architect, but it seems to me what is important is total work done in a given amount of time

That is called Throughput,

Responses Time is the time between the start and the completion of the
task.

99% of benchmarks done in the Reviews are measuring Response Time.

Example,

It takes 4 hours to assemble a Laptop (Response Time)

The factory can produce 10 Laptops per Hour (Throughput)

AtenRa · Jul 25, 2013

sefsefsefsef said:
There are no words ... but I'll try. Memory access latencies are rightly measured in time (nanoseconds), not cycles, because they have absolutely nothing to do with what frequency the CPU is running at. They are totally independent. As the frequency of a CPU goes up or down, the number of nanoseconds to complete an external memory access remains constant (as measured in nanoseconds). As the frequency of a CPU goes up or down, then during the fixed latency time of an external memory access the CPU will have experienced more or fewer cycles. At low frequencies, the CPU will experience relatively fewer cycles for the same fixed latency external memory access (measured in nanoseconds), and at high frequencies the CPU will experience relatively more cycles for the same fixed latency external memory access (measured in nanoseconds), because that is the definition of CPU frequency, that in the same fixed time a higher frequency CPU will experience more clock cycles than a lower frequency CPU. Maybe someone should draw a picture (?).

Thats exactly what im saying, CPI(IPC) has nothing to do with TIME.

CPU Cycle = Fetch -> Decode -> Execute
Frequency = Hz = Cycles /seconds

And i have already said that Main memory Latency is constant. My reference to Cycles whas in responce to Atreidin post.

sefsefsefsef said:
I don't really get what you are talking about with instructions-per-frequency (kinda like DMIPS/MHz, I guess), which is another independent topic. Can you shed some more light on this? Thanks.

In your example you said it took 5secs for the main memory and 5 secs to execute (total of 10 secs) at 1GHz. You called this IPC 1.0. Well, since this involves TIME it has nothing to do with CPI(IPC).

Could you please explain what that IPC you are talking about really is ?? It seems to me like CPU time execution but with the Main Memory Latency included.

CPU Thermal wall? So much for the GHz race.

Elite Member

Senior member

Diamond Member

Lifer

Lifer

Elite Member

Senior member

Lifer

Lifer

Lifer

Senior member

Senior member

Lifer

Senior member

Senior member

Diamond Member

Diamond Member

Lifer

Senior member

Lifer

Lifer

Senior member

Senior member

Lifer

Lifer