digitalalgorithm, soccerman, hans007
You all seem to not fully understand what pipelining does.
In any logic circuit (MPUs included) the number of logic gates in series determines the frequency at which the circuit can operate (along with the speed of the gates themselves).
When you pipeline, 2 things happen. First, you break up a long string of logic gates into much shorter strings called stages. This allows you to clock the circuit at higher frequencies (as fast as the longest or "critical" stage can operate).
This by itself accomplishes nothing since instead of performing 1 operation in 1 clock you are performing that same operation in several clocks. The absolute time it takes to perform the operation remains unchanged.
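To put some made-up numbers on this, here is a rough Python sketch. The 1 ns gate delay, the 16-gate path and the helper names are all my own assumptions for illustration, and it ignores the extra delay of the latches between stages. It just shows that splitting the path into 4 stages shortens the clock period but leaves the time for one operation the same.

```python
# Assumed figure for illustration only: every gate takes 1 ns.
GATE_DELAY_NS = 1.0

def clock_period_ns(stage_gate_counts):
    """The clock can only run as fast as the longest (critical) stage allows."""
    return max(stage_gate_counts) * GATE_DELAY_NS

def latency_ns(stage_gate_counts):
    """Time for one operation to pass through every stage, one clock per stage."""
    return len(stage_gate_counts) * clock_period_ns(stage_gate_counts)

unpipelined = [16]          # hypothetical: 16 gates in series, 1 stage
pipelined = [4, 4, 4, 4]    # the same 16 gates split into 4 equal stages

unpipelined_period = clock_period_ns(unpipelined)
pipelined_period = clock_period_ns(pipelined)
unpipelined_latency = latency_ns(unpipelined)
pipelined_latency = latency_ns(pipelined)

print("clock period (ns):", unpipelined_period, "vs", pipelined_period)
print("latency per operation (ns):", unpipelined_latency, "vs", pipelined_latency)
```

If the stages were unbalanced, say [6, 4, 3, 3], the clock would be set by the 6-gate stage and the latency of a single operation would actually get worse.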
This is where the second part of pipelining comes in. Let's say you have a 4-stage pipeline. You do not have to wait until an instruction makes its way through the whole pipeline before starting a second instruction. As soon as the first instruction has passed through the first stage in the pipeline, a second can enter. So after 4 clock cycles you have the following: instruction 1 (I1) is complete, I2 has completed up to stage 3 (S3), I3 has completed S2, and I4 has completed S1.
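If you want to see that overlap spelled out, here is a small Python sketch of the 4-stage example. The function name and the assumption that one new instruction enters every clock (no stalls) are mine.

```python
def stages_completed(num_stages, num_instructions, cycles):
    """For each instruction, count how many stages it has finished after `cycles` clocks.

    Assumes instruction k (counting from 0) enters stage 1 on clock k + 1
    and then advances one stage per clock with no stalls.
    """
    progress = []
    for k in range(num_instructions):
        done = min(max(cycles - k, 0), num_stages)
        progress.append(done)
    return progress

snapshot = stages_completed(num_stages=4, num_instructions=4, cycles=4)
for n, done in enumerate(snapshot, start=1):
    print(f"I{n}: completed {done} of 4 stages")
```

After 4 clocks this gives 4, 3, 2 and 1 stages completed for I1 through I4, which is the picture described above.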
So in the ideal case (evenly balanced stages, nothing stalling the pipeline) you should now be able to complete close to 4 times as many instructions in the same time as the original 1-stage circuit. You are not doing less work per clock in a longer pipeline, just less work per stage in each clock. Because you have more stages working at once, you may in fact be doing more each clock.
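As a rough check on the "4 times" figure, here is a Python sketch that counts time in units of the short pipelined clock. It assumes perfectly balanced stages, no stalls and no latch overhead, and the function names are just mine. The first instruction still needs all 4 clocks, so the speedup only approaches 4 as the number of instructions grows.

```python
def pipelined_cycles(num_stages, num_instructions):
    """Short clocks needed: fill the pipeline once, then finish one instruction per clock."""
    return num_stages + num_instructions - 1

def speedup(num_stages, num_instructions):
    """Ratio of unpipelined time to pipelined time, both measured in short clocks.

    The unpipelined circuit needs one long clock per instruction, and each
    long clock is assumed to last as long as `num_stages` short clocks.
    """
    unpipelined_time = num_instructions * num_stages
    return unpipelined_time / pipelined_cycles(num_stages, num_instructions)

for n in (1, 4, 1000):
    print(f"{n} instructions: speedup {speedup(4, n):.3f}")
```

With a single instruction there is no gain at all, while a long stream of instructions gets very close to the full factor of 4.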
There is an excellent thread at Ars discussing this. (It starts off talking about the G4 processor but turns into a discussion about pipelines with a number of excellent posts.)
http://arstechnica.infopop.net/Open...amp;s=50009562&f=77909774&m=224092771
(the above is all 1 link)