Max IPC

Mookow

Lifer
Apr 24, 2001
I don't want to start a flame war here, but I'm in an argument with some Mac zealots. What is the MAX instructions per clock (I know that it's not actually achieved most of the time) of:
Pentium 4
Palomino
G4
 

Sohcan

Platinum Member
Oct 10, 1999
There is no number for "the" IPC of a CPU....to blatantly rip off one of my previous posts:


<< IPC is a function of fetch and decode width, dispatch and retire width, reorder buffer size, # of functional units, instruction latency characteristics, # of issue ports, pipeline length, misprediction penalty, branch prediction rate, # of renaming registers, # of logical registers, instruction format (stack vs. 2 register vs. 3 register), L1 access time, L1 hit rate, L2 access time, L2 hit rate, main memory access time, out-of-order execution vs. in-order execution, out-of-order retirement vs. in-order retirement, out-of-order vs. in-order memory access, TLB access time and hit rate, scheduling policies, cache replacement policies, code characteristics.... >>


In other words, the entire microarchitecture. :) One thing to note is that the memory subsystem is extremely important, and that the hit rates of the caches and the branch predictor vary depending on the code that is being executed.

The "upper limit" to the IPC is generally considered to be closely related to the fetch width, dispatch width, and retire width....typically, the fetch and retire widths are the same, but the dispatch width can be higher. In an out-of-order superscalar core, instructions are fetched into buffers (the reservation stations and the reorder buffer) before they are dispatched out of order to the execution units. That way, if a memory access stalls and the instructions that depend on it (read-after-write dependencies) pile up, the buffers might fill up....when the memory stall is done, the instructions can "burst" out of the buffers at a rate higher than the fetch width. But the average throughput is still limited by the fetch and retire widths.
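
A toy model makes the "burst" behavior concrete. This is a minimal sketch, not a model of any real core: the widths and window size are assumed values (FETCH_WIDTH = 3, DISPATCH_WIDTH = 6, WINDOW_SIZE = 24), retirement is not modeled, and simulate() is a hypothetical helper. During a stall the window fills; afterwards dispatch bursts to 6 per cycle, but the long-run average can never exceed what fetch supplied.

```python
# Toy model of an out-of-order instruction window. All parameters are
# assumed, illustrative values, not the specs of any real CPU.
FETCH_WIDTH = 3      # instructions that can enter the window per cycle
DISPATCH_WIDTH = 6   # instructions that can leave the window per cycle
WINDOW_SIZE = 24     # capacity of the buffer between fetch and dispatch


def simulate(cycles, stall_cycles):
    """Return how many instructions were dispatched in each cycle.

    stall_cycles is a set of cycle numbers during which nothing can be
    dispatched (e.g. everything is waiting on a cache miss).
    """
    window = 0  # instructions currently buffered
    dispatched_per_cycle = []
    for cycle in range(cycles):
        # Dispatch from what is already buffered, unless stalled.
        out = 0 if cycle in stall_cycles else min(DISPATCH_WIDTH, window)
        window -= out
        # Fetch new instructions, limited by fetch width and free space.
        window += min(FETCH_WIDTH, WINDOW_SIZE - window)
        dispatched_per_cycle.append(out)
    return dispatched_per_cycle


per_cycle = simulate(cycles=40, stall_cycles=set(range(5, 15)))
print("peak dispatch:", max(per_cycle))                      # 6
print("average dispatch:", sum(per_cycle) / len(per_cycle))  # 2.7
```

The peak (6) is twice the fetch width, but the 40-cycle average (2.7) stays below 3 because every dispatched instruction had to be fetched first.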

The Athlon is a 3-way fetch / 9-way (I think) dispatch core, and the P4 is a 3-way fetch / 6-way dispatch core. In terms of x86 instructions, they both average between .8 and ~1.15 instructions/cycle (obviously, the Athlon often has a higher IPC given the same code). But the x86 instructions are decoded into shorter RISC-like micro-ops; the fetch and dispatch rates are for micro-ops, not x86 instructions. Typically, x86 instructions are decoded into 1 or 2 micro-ops, sometimes more for the rarely used archaic instructions. IIRC, for the "higher IPC" x86 code, the average is 1.4 micro-ops/x86 instruction, and for the "lower IPC" code, the average is 1.6 micro-ops/x86 instruction. That places the average IPC, in terms of the micro-ops, at around 1.28 - 1.6.
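
The micro-op figure is just the x86 IPC multiplied by the average number of micro-ops per x86 instruction. A quick check of that arithmetic, using the from-memory numbers above (not measurements) and a hypothetical helper uop_ipc():

```python
def uop_ipc(x86_ipc, uops_per_x86_instr):
    """IPC counted in micro-ops = IPC counted in x86 instructions * micro-ops per instruction."""
    return x86_ipc * uops_per_x86_instr


# From-memory figures quoted in the post, not measurements.
low = uop_ipc(0.8, 1.6)    # "lower IPC" code: more micro-ops per x86 instruction
high = uop_ipc(1.15, 1.4)  # "higher IPC" code: fewer micro-ops per x86 instruction
print(f"{low:.2f} - {high:.2f} micro-ops/cycle")  # 1.28 - 1.61 micro-ops/cycle
```

The high end works out to 1.61, which rounds to the ~1.6 quoted above.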

The G4e, IIRC, is a 3-way fetch / 6-way dispatch core. I don't know any performance numbers off-hand, but consider the Alpha EV6, which is an uber-expensive high-end RISC CPU. It is a 4-way fetch / 6-way dispatch core, but even with its large amounts of cache, the upper-limit to its IPC (considered one of the best for a superscalar, out-of-order core) is around 2.0 (you can see why simultaneous multithreading is of such interest to CPU architects :)). So for comparison's sake, consider the G4e's average IPC to be slightly higher than the numbers for the Athlon/P4, but definitely below that of the EV6.

Also, be aware that performance is not IPC * clock rate....the Iron Law says that Execution time = (cycles/instruction) * (seconds/cycle) * (instructions/program). The last term is variable, depending on the compiler used and the instruction set. In theory, RISC ISAs have longer code path lengths than CISC ISAs, since RISC instructions tend to do "less work" than a CISC instruction. I don't know how true this is anymore...modern x86 compilers don't use the more archaic "ultra CISCy" x86 instructions that often (if ever), and I think register-register arithmetic is used more often than register-memory. But this is definitely an issue to consider, depending on the state of the compilers used for x86 programs vs. PowerPC programs.
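
To see why IPC * clock rate alone can mislead, here is the Iron Law applied to two made-up CPUs. All numbers are illustrative assumptions, not benchmarks of any real chip; the point is only that a longer code path (more instructions/program) can cancel out a higher IPC.

```python
def execution_time(instructions_per_program, cycles_per_instruction, seconds_per_cycle):
    """Iron Law: (instructions/program) * (cycles/instruction) * (seconds/cycle)."""
    return instructions_per_program * cycles_per_instruction * seconds_per_cycle


# Two hypothetical CPUs running the same task (made-up numbers).
# CPU A: 1.4 GHz, IPC 1.0, its ISA needs 1.0e9 instructions for the task.
# CPU B: 1.0 GHz, IPC 1.5, its ISA needs 1.2e9 instructions for the task.
time_a = execution_time(1.0e9, 1 / 1.0, 1 / 1.4e9)
time_b = execution_time(1.2e9, 1 / 1.5, 1 / 1.0e9)

# IPC * clock favors B (1.5e9 vs. 1.4e9 instructions/second), but B's
# longer code path makes it slower overall.
print(f"A: {time_a:.3f} s, B: {time_b:.3f} s")  # A: 0.714 s, B: 0.800 s
```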
 

Goosemaster

Lifer
Apr 10, 2001
speaking from an engineering standpoint, the G4 and the entire series are better chips than x86. x86 chips have transformed so much that they look more like patchwork quilts than anything. Of course, Motorola is the producer, and they are meager compared to Intel and AMD in the field. Additionally, Apple's machines themselves are not anywhere near as proficient as the processors they use.


Don't believe me? Check out IBM's workstations using the Power3 chip (same PowerPC architecture family). Tell me it does not give any orgasmic delight....


Apple sucks. Motorola rocks.

That is why it is Intel "inside"
 

Chesebert

Golden Member
Oct 16, 2001
IPC = 1/CPI :)
the MAX CPI is 1 for any computer :)

Max CPI for any particular architecture is based on the program it runs. Running a different program will give you a different CPI.

simple isn't it?

 

Sohcan

Platinum Member
Oct 10, 1999


<< the MAX CPI is 1 for any computer >>

You're kidding, right? That may have been true a few decades ago, but these days we have these things called superscalar and VLIW architectures....:confused:
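
A rough illustration of why CPI can drop below 1 on a superscalar core. The sketch schedules instructions on an idealized machine (assumed: every instruction takes one cycle, up to `width` issue per cycle, no cache misses or mispredictions); schedule_cycles() is a hypothetical helper. Independent instructions reach an IPC equal to the issue width, while a chain of dependent instructions is stuck at IPC = 1 however wide the machine is.

```python
def schedule_cycles(deps, width):
    """Cycles needed on an idealized superscalar machine.

    deps[i] lists the earlier instructions whose results instruction i reads.
    Assumptions: every instruction has a 1-cycle latency, at most `width`
    instructions issue per cycle, and there are no cache misses or
    branch mispredictions.
    """
    issue_cycle = {}  # instruction index -> cycle it issued in
    slots_used = {}   # cycle -> number of instructions issued in it
    for i, sources in enumerate(deps):
        # Earliest cycle at which all source operands are available.
        cycle = max((issue_cycle[s] + 1 for s in sources), default=0)
        # Move later until a free issue slot is found.
        while slots_used.get(cycle, 0) >= width:
            cycle += 1
        slots_used[cycle] = slots_used.get(cycle, 0) + 1
        issue_cycle[i] = cycle
    return max(issue_cycle.values()) + 1


independent = [[] for _ in range(6)]    # six unrelated instructions
chain = [[]] + [[i] for i in range(5)]  # each one needs the previous result
for name, deps in [("independent", independent), ("chain", chain)]:
    cycles = schedule_cycles(deps, width=3)
    print(f"{name}: IPC = {len(deps) / cycles:.1f}, CPI = {cycles / len(deps):.2f}")
# independent: IPC = 3.0, CPI = 0.33
# chain: IPC = 1.0, CPI = 1.00
```

Real code sits somewhere in between, which is why real superscalar cores land between 1 and their issue width.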



<< speaking from an engineering standpoint, the G4 and the entire series are better chips than x86. >>

That's definitely true...though since x86 CPUs started using RISC-like superscalar OOOE cores, the ISA has less of an impact on performance. The ISA is still important as the programming model of the CPU, particularly its number of logical registers (8 general-purpose registers on x86 vs. 32 on PowerPC). Register renaming can eliminate write-after-read and write-after-write hazards caused by a limited number of registers, but it cannot eliminate read-after-write hazards.

So register renaming and micro-op decoding have overcome some of the performance limitations of x86, but these steps introduce more stages into the pipeline (and thus a higher misprediction penalty)...they also make the CPUs *in general* more cumbersome to engineer, larger, and hotter due to the more complex decoding.
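
For anyone who wants to see why renaming removes WAR/WAW hazards but not RAW, here's a minimal sketch of an idealized renamer. Assumptions: unlimited physical registers, no freeing or misprediction recovery; rename() and the p0, p1... names are hypothetical; and the instructions are written in three-operand form for readability even though x86 arithmetic is two-operand.

```python
from itertools import count


def rename(program):
    """Map architectural registers to physical registers.

    program is a list of (dest, [sources]) tuples using architectural names.
    Idealized renamer: unlimited free physical registers, every write gets a
    fresh one, and every read uses the current mapping.
    """
    next_phys = count()
    mapping = {}  # architectural register -> physical register with its latest value
    renamed = []
    for dest, sources in program:
        phys_sources = []
        for src in sources:
            if src not in mapping:  # live-in value from before this code
                mapping[src] = f"p{next(next_phys)}"
            phys_sources.append(mapping[src])
        # Sources were looked up before this write, so the new mapping for
        # dest cannot disturb them.
        mapping[dest] = f"p{next(next_phys)}"
        renamed.append((mapping[dest], phys_sources))
    return renamed


program = [
    ("eax", ["ebx", "ecx"]),  # 1: eax = ebx + ecx
    ("edx", ["eax"]),         # 2: edx = eax + 1   (RAW on eax from 1; immediate not tracked)
    ("eax", ["esi", "edi"]),  # 3: eax = esi + edi (WAW with 1, WAR with 2)
]
for before, after in zip(program, rename(program)):
    print(before, "->", after)
# ('eax', ['ebx', 'ecx']) -> ('p2', ['p0', 'p1'])
# ('edx', ['eax']) -> ('p3', ['p2'])
# ('eax', ['esi', 'edi']) -> ('p6', ['p4', 'p5'])
```

Instruction 3 gets its own physical register (p6), so it no longer has to wait for instruction 1's write or instruction 2's read of the old eax. Instruction 2 still reads p2, which only instruction 1 produces, so that read-after-write dependency is untouched.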
 

gsaldivar

Diamond Member
Apr 30, 2001
I'm so tired of the platform wars.. :(

Computers. They are good. Use one. Don't take them personally.