Here is the ultra-simple pipeline explanation.
Basically, a CPU does the following...
Reads data - Performs some function on the data - Writes data
Each stage requires one clock cycle. Letting one instruction go through at a time yields 1 finished instruction every 3 clock cycles. To speed things up, you put registers in between the stages to store intermediate data. This lets you have 3 instructions in the pipeline at once. At any time, one instruction will be reading data, another will be performing some function on the data, and another will be writing data, so in a perfect world you now finish 1 instruction every clock cycle.
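If it helps to see the numbers, here is a rough Python sketch of the ideal case described above. It is only a toy model: the stage names ("read", "execute", "write") and the function names are made up for illustration, and it assumes every stage takes exactly one cycle and nothing ever stalls.

```python
def cycles_unpipelined(n_instructions, n_stages=3):
    # One instruction at a time: each one occupies all stages before the next starts.
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=3):
    # Ideal pipeline: n_stages cycles to finish the first instruction,
    # then one more instruction finishes on every following cycle.
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_trace(n_instructions, stages=("read", "execute", "write")):
    # For each clock cycle, record which instruction (by index) sits in each stage.
    # None means the stage is empty (pipeline still filling or already draining).
    total = cycles_pipelined(n_instructions, len(stages))
    trace = []
    for cycle in range(total):
        row = {}
        for position, name in enumerate(stages):
            instr = cycle - position  # instruction i enters stage s at cycle i + s
            row[name] = instr if 0 <= instr < n_instructions else None
        trace.append(row)
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_trace(5)):
        print(cycle, row)
    print("unpipelined:", cycles_unpipelined(5), "cycles")
    print("pipelined:  ", cycles_pipelined(5), "cycles")
```

For 5 instructions this gives 15 cycles without pipelining and 7 with it, and the gap approaches 3x as the instruction count grows, which is the "1 instruction every clock cycle" ideal.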
This does not happen in practice though, especially if the pipeline is longer. The longer the pipeline, the better the chance that something disrupts the flow and forces a delay. This is why the AMD processor, with its shorter 10 (?) stage pipeline, does not need to match Intel's MHz to match Intel's performance.
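To put a rough number on the pipeline-length point, here is a small Python sketch. It is a toy model, not how any real AMD or Intel chip behaves: it assumes each disruption throws away everything behind the instruction that caused it, costing (depth - 1) cycles, and the function names and the figures in the example are invented for illustration.

```python
def cycles_with_flushes(n_instructions, depth, n_flushes):
    # Ideal pipelined time: fill the pipeline once, then one instruction per cycle.
    if n_instructions == 0:
        return 0
    ideal = depth + (n_instructions - 1)
    # ASSUMPTION: each flush wastes (depth - 1) cycles while the pipeline refills.
    return ideal + n_flushes * (depth - 1)


def instructions_per_cycle(n_instructions, depth, n_flushes):
    # Average throughput over the whole run.
    return n_instructions / cycles_with_flushes(n_instructions, depth, n_flushes)


if __name__ == "__main__":
    # Same work, same number of disruptions, two hypothetical pipeline depths.
    for depth in (3, 10, 20):
        ipc = instructions_per_cycle(100, depth, 10)
        print(f"depth {depth:2d}: {cycles_with_flushes(100, depth, 10)} cycles, IPC {ipc:.2f}")
```

Under these assumptions the deeper pipeline gets less done per clock for the same number of disruptions, so it needs a higher clock speed just to break even.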