IPC of FX-8150?

Smoblikat

Diamond Member
Nov 19, 2011
So I constantly hear everyone saying that the IPC is so horrible, yet I've never heard what the IPC actually is. I know it stands for instructions per cycle, but I have no idea how to determine a CPU's IPC. How many does my 2500K have?
 

jones377

Senior member
May 2, 2004
IPC is a pretty misunderstood subject. It is a measurement of instructions per clock when running an actual piece of code (as in a program). It can either be averaged over the entire run of the code, or you can take snapshots at any time during the run and see how many instructions are executed in parallel. For this reason, IPC changes with different programs, different CPUs, system architectures and clock speeds.
 

lehtv

Elite Member
Dec 8, 2010
Wikipedia:
the average number of instructions executed for each clock cycle.

I'm no expert on this, but this suggests to me that the absolute value of IPC is irrelevant, because performance depends on the particular application you're using. What's more interesting is the performance compared with a different processor core in a particular application, operating at the same clock speed. This is exactly what Tom's Hardware tested in their FX-8150 review. The result was that

Intel [2600K] gets significantly more work done per cycle than the Phenom II X6 1100T, which in turn outperforms the FX
 

Chiropteran

Diamond Member
Nov 14, 2003
IPC is a pretty misunderstood subject.

I agree.

In addition to what you said, a lot of people seem to misunderstand it and use it to mean "single threaded performance", which it isn't at all. Higher IPC is no guarantee of higher performance.

For example, a 2 GHz CPU with an IPC of 1.5 is faster than a 1 GHz CPU with an IPC of 2.
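As a rough sanity check of that arithmetic, throughput can be treated as IPC multiplied by clock rate. The Python sketch below uses the two hypothetical CPUs from the example; it assumes both run the same instruction stream (same ISA, compiler and program), since otherwise instructions per second are not comparable.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Throughput in instructions per second: IPC x clock (cycles per second)."""
    return ipc * clock_ghz * 1e9

# The two hypothetical CPUs from the example above.
cpu_a = instructions_per_second(ipc=1.5, clock_ghz=2.0)  # 3.0e9 instructions/s
cpu_b = instructions_per_second(ipc=2.0, clock_ghz=1.0)  # 2.0e9 instructions/s
print(f"A: {cpu_a:.1e} instr/s, B: {cpu_b:.1e} instr/s, A/B = {cpu_a / cpu_b:.2f}")
```

The lower-IPC chip comes out 1.5x ahead because its clock advantage (2x) outweighs its IPC deficit.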
 

postmortemIA

Diamond Member
Jul 11, 2006
I agree.

In addition to what you said, a lot of people seem to misunderstand it and use it to mean "single threaded performance", which it isn't at all. Higher IPC is no guarantee of higher performance.

For example, a 2 GHz CPU with an IPC of 1.5 is faster than a 1 GHz CPU with an IPC of 2.
IPC tells me how much slower or faster CPU A is compared with CPU B at the same clock speed, with the same instruction set, compiler, and operating system.
So it does not make sense to compare ARM v9 and Intel Core 7, but it makes a lot of sense for seeing why AMD CPUs have been slower per cycle for years now. If AMD is, say, 20% slower per cycle than Intel (an arbitrary number, just to explain my point), it needs roughly 20% higher clock speed to do the same work, which takes 20% more power, so even if it is a bit cheaper, it does not save me any money in the long run.
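Using the same arbitrary 20% figure, the break-even clock is 1 / (1 - 0.2) = 1.25 times the other chip's clock, so closer to 25% more than 20%. A small sketch of that calculation, which deliberately ignores power and everything except per-clock work:

```python
def clock_uplift_needed(ipc_deficit: float) -> float:
    """Fractional clock increase needed to offset a fractional per-cycle deficit.

    If CPU A does (1 - ipc_deficit) as much work per cycle as CPU B, it needs
    1 / (1 - ipc_deficit) times B's clock to match B's performance.
    """
    if not 0 <= ipc_deficit < 1:
        raise ValueError("ipc_deficit must be in [0, 1)")
    return 1.0 / (1.0 - ipc_deficit) - 1.0

# Illustrative deficits only; 20% is the arbitrary figure from the post.
for deficit in (0.10, 0.20, 0.30):
    print(f"{deficit:.0%} less work per cycle -> "
          f"{clock_uplift_needed(deficit):.1%} more clock")
```

The required uplift grows faster than the deficit itself, which is why a large per-clock gap is hard to close with frequency alone.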
 

BD231

Lifer
Feb 26, 2001
So basically the amount of work a CPU can get done per cycle has little bearing on overall performance, gotcha :rolleyes:

I agree.

In addition to what you said, a lot of people seem to misunderstand it and use it to mean "single threaded performance", which it isn't at all. Higher IPC is no guarantee of higher performance.

For example, a 2 GHz CPU with an IPC of 1.5 is faster than a 1 GHz CPU with an IPC of 2.

Quoted from tom's article above:

"A processor’s per-core potential is defined by the number of instructions it can execute per cycle"

Jesus bro.
 

Idontcare

Elite Member
Oct 10, 1999
So I constantly hear everyone saying that the IPC is so horrible, yet I've never heard what the IPC actually is. I know it stands for instructions per cycle, but I have no idea how to determine a CPU's IPC. How many does my 2500K have?

Computing absolute IPC is easy if you are interested in the theoretical maximum for a given microarchitecture, but the theoretical max is rarely relevant.

Instead, what is relevant is actual IPC, which is always smaller than the theoretical IPC for the very reason CPUs have cache: the dataset that is being created, analyzed or changed must get to the circuits in the CPU.

And the latency in doing that is to blame for actual IPC being less than theoretical IPC.
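A common textbook simplification (not from the post itself) makes this concrete: effective cycles per instruction (CPI) equal the ideal CPI plus the average memory-stall cycles per instruction. The sketch uses made-up parameters (a core that could retire 4 instructions per cycle, 0.3 memory references per instruction, a 200-cycle miss penalty) purely to show how quickly actual IPC falls away from theoretical IPC as the miss rate rises.

```python
def effective_ipc(theoretical_ipc: float,
                  mem_refs_per_instr: float,
                  miss_rate: float,
                  miss_penalty_cycles: float) -> float:
    """Effective IPC under a simple stall model (illustrative only).

    CPI = ideal CPI + stall cycles per instruction, where
    stall cycles per instruction = refs/instr * miss rate * miss penalty.
    """
    ideal_cpi = 1.0 / theoretical_ipc
    stall_cpi = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return 1.0 / (ideal_cpi + stall_cpi)

# Hypothetical core that could retire 4 instructions per cycle in theory.
# A miss rate of 1.0 means every access goes to memory, roughly "no cache at all".
for miss_rate in (0.0, 0.01, 0.05, 1.0):
    ipc = effective_ipc(4.0, mem_refs_per_instr=0.3,
                        miss_rate=miss_rate, miss_penalty_cycles=200)
    print(f"miss rate {miss_rate:>5.0%}: effective IPC = {ipc:.3f}")
```

The theoretical IPC never changes in this model; only the stall term does, which mirrors the "remove the caches" thought experiment in the post.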

Take a 2500K which has higher actual IPC than an FX-8150, remove the L3$, L2$, and L1$, and it will still have the same theoretical IPC but the actual IPC will be something worse than that of a Celeron 300A.

But a Celeron 300A's theoretical IPC is far less than even the actual IPC of a 2500K, which is why Intel doesn't just take a 300A, bolt on the same L1$, L2$ and L3$ they bolted onto a 2500K, and call it a day. Even if they added all that cache to a Celeron 300A, its actual IPC would merely move closer to its theoretical IPC, which is still far lower than the actual IPC of a 2500K with all that cache.

Now then, as said above, knowing the theoretical IPC is easy if you have the microarchitecture details, but determining the actual IPC is not at all straightforward because (1) it is instruction-mix dependent (there are >700 instructions in the ISA, see pic below), which makes it software-application dependent, and (2) the dataset itself is user-dependent as well as application-dependent.

[Image: growth of the x86 ISA instruction count over time]


^ that is a lot of instructions, and each one has a unique theoretical IPC, as well as an actual IPC (effective IPC) which is data-set dependent (cache stalls, data dependencies, etc).

And so computing a specific number for the effective IPC is not at all straightforward.
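One way to see why the instruction mix matters: if each instruction class has an average cost in cycles, a workload's IPC is the reciprocal of the mix-weighted average CPI. This is a deliberately simplified, serialized model (real out-of-order cores overlap instructions), and the mix and per-class costs below are hypothetical; the slow ~20-cycle divide stands in for long-latency instructions like DIV.

```python
# Hypothetical instruction mix: (fraction of dynamic instructions, assumed
# average cycles per instruction for that class). Illustrative numbers only.
MIX = {
    "simple ALU": (0.50, 0.33),
    "load/store": (0.35, 1.00),
    "branch":     (0.14, 0.50),
    "divide":     (0.01, 20.0),
}

def mix_ipc(mix: dict[str, tuple[float, float]]) -> float:
    """IPC for a workload = 1 / (fraction-weighted average CPI over the mix)."""
    total_fraction = sum(frac for frac, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    avg_cpi = sum(frac * cpi for frac, cpi in mix.values())
    return 1.0 / avg_cpi

# Same hypothetical workload with the 1% of divides replaced by simple ALU ops.
no_div = {**MIX, "divide": (0.0, 20.0), "simple ALU": (0.51, 0.33)}

print(f"IPC with divides:    {mix_ipc(MIX):.2f}")
print(f"IPC without divides: {mix_ipc(no_div):.2f}")
```

In this toy model, a 1% share of slow instructions cuts IPC by roughly a quarter, so two programs on the same chip can see very different IPC.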

What is straightforward is to make relative comparisons, on a clock-normalized basis, of the benchmark performance of different CPUs and microarchitectures.

Compare a 4GHz Thuban to a 4GHz Zambezi to a 4GHz Sandy Bridge. From that kind of a clock-normalized analysis you can arrive at reasonably useful IPC numbers which can then be used to speak to the underlying strengths and weaknesses of a given microarchitecture.
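A minimal sketch of a clock-normalized comparison: divide each chip's score in a given benchmark by its clock and express the result relative to a baseline chip. The chip names and scores are placeholders, not real results. Dividing by clock assumes performance scales linearly with frequency, which memory latency does not; actually running every chip at the same clock, as described above, avoids that assumption.

```python
def per_clock_score(score: float, clock_ghz: float) -> float:
    """Benchmark score divided by clock: a relative, workload-specific IPC proxy."""
    return score / clock_ghz

def relative_ipc(results: dict[str, tuple[float, float]],
                 baseline: str) -> dict[str, float]:
    """Per-clock score of each CPU relative to a chosen baseline CPU (= 1.0)."""
    base = per_clock_score(*results[baseline])
    return {name: per_clock_score(score, ghz) / base
            for name, (score, ghz) in results.items()}

# Placeholder numbers, NOT real measurements: (benchmark score, clock in GHz).
results = {
    "CPU A": (100.0, 4.0),
    "CPU B": (90.0, 3.6),
    "CPU C": (120.0, 4.0),
}
for name, rel in relative_ipc(results, baseline="CPU B").items():
    print(f"{name}: {rel:.2f}x per clock vs CPU B")
```

Here CPU A and CPU B differ only in clock, so they come out equal per clock, while CPU C is genuinely ahead. The numbers only hold for the one benchmark they came from.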

Performance = IPC x GHz x Threads x Thread_Scaling_Factor

^ both IPC and thread_scaling_factor are application dependent (instruction mix dependent) as well as data-set dependent.
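Here is the formula written out as a small Python function, applied to two hypothetical chips (X with more threads but lower IPC, Y with fewer threads but higher IPC) and made-up thread-scaling factors. None of these numbers are measurements; the point is only that which chip "wins" flips with the workload's thread scaling.

```python
def performance(ipc: float, ghz: float, threads: int, thread_scaling: float) -> float:
    """Performance = IPC x GHz x Threads x Thread_Scaling_Factor (relative units).

    ipc and thread_scaling are workload- and dataset-dependent, so they have to
    be measured for the class of application you care about.
    """
    return ipc * ghz * threads * thread_scaling

# Hypothetical chips (not real measurements): (IPC, GHz, hardware threads).
chips = {"X": (1.0, 3.6, 8), "Y": (1.5, 3.4, 4)}

# Hypothetical thread-scaling factors per workload: {workload: {chip: factor}}.
scaling = {
    "lightly threaded": {"X": 1 / 8, "Y": 1 / 4},  # effectively one busy thread
    "well threaded":    {"X": 0.9, "Y": 0.95},
}

for workload, factors in scaling.items():
    for name, (ipc, ghz, threads) in chips.items():
        perf = performance(ipc, ghz, threads, factors[name])
        print(f"{workload:>16} | chip {name}: {perf:5.2f}")
```

With these inputs, Y leads in the lightly threaded case on IPC and clock, while X leads once the work spreads across all of its threads.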

Thus you must find benchmarks that are suitable proxies for generating IPC and thread-scaling info which are indicative of the same class of general software applications that are relevant to the user class. (server apps for server markets, desktop apps for desktop markets, etc)

[Image: Amdahl's Law augmented by Almasi and Gottlieb]


[Image: Euler3D benchmark scaling]


So just analyzing the actual IPC portion alone will still fail in capturing a significant portion of the performance-impacting characteristics of the overall processor/platform package.
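The thread-scaling term can be illustrated with the classic form of Amdahl's law, where only a fraction p of the work runs in parallel: speedup(n) = 1 / ((1 - p) + p / n). The Amdahl's Law figure above refers to a version augmented by Almasi and Gottlieb; this sketch is the plain form without any extra overhead terms, and treating the scaling factor as speedup divided by thread count is an assumption about how Thread_Scaling_Factor is meant.

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Classic Amdahl's law: speedup over one thread when only
    parallel_fraction of the work can be spread across n_threads."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_threads < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n_threads >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)

def thread_scaling_factor(parallel_fraction: float, n_threads: int) -> float:
    """Per-thread efficiency: speedup divided by thread count (1.0 = perfect)."""
    return amdahl_speedup(parallel_fraction, n_threads) / n_threads

# Illustrative parallel fractions only.
for p in (0.5, 0.9, 0.99):
    s8 = amdahl_speedup(p, 8)
    print(f"p={p:.2f}: 8 threads -> {s8:.2f}x speedup, "
          f"scaling factor {thread_scaling_factor(p, 8):.2f}")
```

Even a 10% serial portion leaves 8 threads well short of an 8x speedup, which is why thread count alone says as little as IPC alone.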
 

lehtv

Elite Member
Dec 8, 2010
@Idontcare

Yeah, that's pretty much what I said. :D (No but really, interesting post.)
 

Anarchist420

Diamond Member
Feb 13, 2010
I was going to say some of what IDC said. Basically, theoretical performance will always be higher than sustained/real-world performance because of cache and memory latency. An example is the Sega Saturn's SH-2s: combined, they could theoretically execute more instructions per second than the PS1's CPU, but because they were in a master/slave configuration (which meant they shared bandwidth), they got fewer instructions done per clock cycle than the PS1's single CPU.

Similarly, the PS2's EE has a theoretical performance of 6.2 GFLOPS, but the vector units' caches are so small that in practice it never comes anywhere close to its theoretical performance.

It seems like AMD focused way too much on theoretical performance when they designed BD. They could've and should've used a much better cache design even if it meant having regular core clocks of not more than 3 GHz.
 

Chiropteran

Diamond Member
Nov 14, 2003
So basically the amount of work a CPU can get done per cycle has little bearing on overall performance, gotcha :rolleyes:

I didn't say little bearing on performance, I said you had to look at clock speed also to get a correct answer.

For example, the old Athlon "Thunderbird" at 700 MHz had exactly the same IPC as the Athlon "Thunderbird" at 1 GHz. Are you honestly trying to argue they performed the same, since they had the same IPC?

Clock speed multiplied by IPC = performance. IPC alone doesn't really tell you anything; a 200 MHz CPU with better IPC than Sandy Bridge wouldn't be useful for anything. For multithreaded tasks, you also have to consider the number of threads and cores and other details.
 

pelov

Diamond Member
Dec 6, 2011
It seems like AMD focused way too much on theoretical performance when they designed BD. They could've and should've used a much better cache design even if it meant having regular core clocks of not more than 3 GHz.

AMD felt that was a battle they couldn't win, and if you look back to every chip since core2 it's probably true. They've only slipped farther behind in that race. The notion of ending "direct competition" with Intel was in part due to that: if you can't beat them in IPC, whether theoretical or practical, then you need another game plan. And so Bulldozer was born... the red-headed stepchild of desktop computing =P
 

Edrick

Golden Member
Feb 18, 2010
And so Bulldozer was born... the red-headed stepchild of desktop computing =P

And if most software was developed to take advantage of up to 8 cores, then we would be talking about how revolutionary BD was. :) The problem is that it was designed for a platform that does not exist for most users (yet).
 
Mar 10, 2006
And if most software was developed to take advantage of up to 8 cores, then we would be talking about how revolutionary BD was. :) The problem is that it was designed for a platform that does not exist for most users (yet).

In 8-threaded code, the FX 8150 is basically on par with a 2600K. We'd be talking about how it basically matched several-month-old, much lower TDP Intel chips. lol
 

Denithor

Diamond Member
Apr 11, 2004
In many ways Bulldozer reminds me of Netburst. Which obviously isn't a good thing for AMD...
 

Idontcare

Elite Member
Oct 10, 1999
In many ways Bulldozer reminds me of Netburst. Which obviously isn't a good thing for AMD...

Obvious to just about everyone but the folks at AMD it appears.

At this point would it surprise any of us if the next gen Brazos turns out to be an in-order Atom-like cpu paired with a chipset that eats 30W on its own? :eek:
 

BallaTheFeared

Diamond Member
Nov 15, 2010
Obvious to just about everyone but the folks at AMD it appears.

After the way they went so hard marketing it I have to believe they were counting on a "bigger numbers" effect on the actual shelves and on the Best Buy labels..


"But honey, this one has eight cores at 4GHz that means it's got 32GHz"


Basically AMD tried/is trying to use deception to prey on the ill-informed.

Heck they're even mimicking Intel's naming schemes now, all the way down to the K.
 

cebalrai

Senior member
May 18, 2011
So I constantly hear everyone saying that the IPC is so horrible, yet I've never heard what the IPC actually is. I know it stands for instructions per cycle, but I have no idea how to determine a CPU's IPC. How many does my 2500K have?

Doesn't really matter?
 

denev2004

Member
Dec 3, 2011
Theoretically, its IPC is 4 per Bulldozer module.
AMD's notion of a macro-op (Mac-op) is different from Intel's micro-op (u-op), so it's not a good idea to use AMD's per-core IPC as a reference.
 

Idontcare

Elite Member
Oct 10, 1999
Theoretically, its IPC is 4 per Bulldozer module.

That's going to depend on the specific instruction though, right?

We have latency and reciprocal throughput to factor in:

Latency:

This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Floating point operands are presumed to be normal numbers. Denormal numbers, NAN's and infinity increase the delays very much, except in XMM move, shuffle and Boolean instructions. Floating point overflow, underflow, denormal or NAN results give a similar delay.
Note: There is an additional latency for moving data from one unit or subunit to another. A table of these latencies is given in manual 3: "The microarchitecture of Intel, AMD and VIA CPUs". These additional latencies are not included in the listings below where the source and destination operands are of the same type.
Reciprocal throughput:
The average number of clock cycles per instruction for a series of independent instructions of the same kind in the same thread.

Looking at page 36, certain instructions do hit a reciprocal throughput of 1/4 (meaning IPC = 4; LFENCE is an example), but other instructions can take several clock cycles to complete, and only so many of them can be queued up in series, resulting in an IPC that is <<1 (DIV instructions are like that, requiring 20 clocks to complete one instruction).
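In terms of the two definitions quoted above, a stream of independent instructions is limited by reciprocal throughput (IPC = 1 / reciprocal throughput), while a dependency chain is limited by latency (IPC = 1 / latency). A minimal sketch, taking the 1/4 reciprocal throughput and the ~20-clock DIV figure from the post as inputs (the latter treated as latency) and, like the tables, assuming no cache misses or other delays:

```python
def ipc_independent(reciprocal_throughput: float) -> float:
    """Best-case IPC for a stream of independent instructions of one kind.

    Reciprocal throughput = average clock cycles per instruction, so IPC = 1 / it.
    """
    return 1.0 / reciprocal_throughput

def ipc_dependent_chain(latency_cycles: float) -> float:
    """IPC when each instruction needs the previous one's result: every
    instruction waits out the full latency, so IPC = 1 / latency."""
    return 1.0 / latency_cycles

print(ipc_independent(0.25))      # reciprocal throughput of 1/4 -> IPC of 4
print(ipc_dependent_chain(20.0))  # ~20-clock DIV in a chain -> IPC of 0.05
```

Real code mixes both cases and many instruction types, so neither number is "the" IPC of the chip.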
 

Martimus

Diamond Member
Apr 24, 2007
Obvious to just about everyone but the folks at AMD it appears.

At this point would it surprise any of us if the next gen Brazos turns out to be an in-order Atom-like cpu paired with a chipset that eats 30W on its own? :eek:

Yeah, we were comparing the BD architecture to Netburst as soon as they explained it at Hot Chips two years ago. Unfortunately, it really did perform similarly, and they weren't able to correct the issues that came with Netburst's longer-pipeline, lower-IPC design.
 

blckgrffn

Diamond Member
May 1, 2003
In 8-threaded code, the FX 8150 is basically on par with a 2600K. We'd be talking about how it basically matched several-month-old, much lower TDP Intel chips. lol

Is that really so awful?

(performance wise)

I agree that it is awful TDP wise.

If AMD can offer similar to slightly better performance than Intel in a competitive TDP envelope, even in specific cases, and give it to us at a better price, would that really be so awful?

Can AMD really expect to do better than that at this point, given the obvious material and technological advantages Intel has?

We probably shouldn't expect them to, certainly.
 

Martimus

Diamond Member
Apr 24, 2007
Is that really so awful?

(performance wise)

I agree that it is awful TDP wise.

If AMD can offer similar to slightly better performance than Intel in a competitive TDP envelope, even in specific cases, and give it to us at a better price, would that really be so awful?

Can AMD really expect to do better than that at this point, given the obvious material and technological advantages Intel has?

We probably shouldn't expect them to, certainly.

That is a very good point. That said, I don't really care anymore. I will buy what is best for me at the time, and if Intel stagnates and artificially creates demand by teaming with a third party like Microsoft that requires a new ISO to do important functions, then I will just have to deal with it until a competitor is established. Luckily, competitors seem to come out of the woodwork in this situation, although they come from unlikely places usually with relatively novel solutions.
 

BD231

Lifer
Feb 26, 2001
I didn't say little bearing on performance, I said you had to look at clock speed also to get a correct answer.

For example, the old Athlon "Thunderbird" at 700 MHz had exactly the same IPC as the Athlon "Thunderbird" at 1 GHz. Are you honestly trying to argue they performed the same, since they had the same IPC?

Clock speed multiplied by IPC = performance. IPC alone doesn't really tell you anything; a 200 MHz CPU with better IPC than Sandy Bridge wouldn't be useful for anything. For multithreaded tasks, you also have to consider the number of threads and cores and other details.

Wrong. IPC is one of the best performance barometers for a CPU out there. As the P4 and Bulldozer have proven, MHz don't mean jack unless you're comparing apples to apples.