I really like Yanagi's McDonald's analogy. In a CPU pipeline, every stage has to complete within a single clock cycle or you lose the result, so going back to McDonald's:
You order a sandwich, the order is sent in, someone puts a burger in the oven; when it's done he hands it to the next guy, who adds the ketchup, then the next adds the mustard, and then the bun, pickles, onions, etc. And then you have the fries.
The slowest part is cooking the burger, so a single clock can't be shorter than the time it takes to cook the burger, or that stage wouldn't finish.
What Intel does is split it up some more: cook the burger on one side, then cook it on the other. Since cooking only one side takes less time, the clock period is shorter, allowing higher frequencies, but now you need two clocks to cook the burger.
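The two points above can be sketched in a few lines: the clock period is set by the slowest stage, and splitting that stage shortens the period (raising frequency) at the cost of more cycles of latency. The stage names and nanosecond timings below are made up for the analogy, not real CPU figures.

```python
# Hypothetical stage latencies in nanoseconds (analogy numbers, not measured).
stages = {"order": 1, "cook_burger": 8, "condiments": 2, "assemble": 1}

# The slowest stage sets the minimum clock period.
period = max(stages.values())
print(f"clock period: {period} ns -> {1000 / period:.0f} MHz")  # 8 ns -> 125 MHz

# Split the slowest stage in two (cook one side, then the other).
split = dict(stages)
del split["cook_burger"]
split["cook_side_a"] = 4
split["cook_side_b"] = 4

# Shorter slowest stage -> shorter period -> higher frequency,
# but the burger now takes two clocks instead of one.
period_split = max(split.values())
print(f"after split: {period_split} ns -> {1000 / period_split:.0f} MHz")  # 4 ns -> 250 MHz
```

Note the trade-off: frequency doubled, but total burger latency is unchanged (two 4 ns clocks instead of one 8 ns clock).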
Then you have other optimizations to get the maximum work done in one clock: for example, one person can add the ketchup, mustard, and pickles in the time it takes to cook a burger, so that can all be a single stage. And the problem with a mispredict is when you guess the person wants mustard and he doesn't, so you have to throw it out and start over.
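That mispredict cost can be sketched too: throwing out the work means flushing everything in flight, so the penalty grows with pipeline depth. The function and numbers below are illustrative assumptions (a flat penalty equal to the pipeline depth, a made-up mispredict rate), not a real CPU model.

```python
# Hypothetical sketch: average cycles per instruction when a mispredict
# flushes the whole pipeline. Deeper pipeline -> bigger flush penalty.
def avg_cycles_per_instr(depth, mispredict_rate):
    # 1 cycle per instruction when the guess is right, plus roughly
    # `depth` wasted cycles each time we guess wrong and start over.
    return 1 + mispredict_rate * depth

# Same mispredict rate, different pipeline depths (illustrative values).
print(avg_cycles_per_instr(depth=10, mispredict_rate=0.05))  # shallow pipeline
print(avg_cycles_per_instr(depth=30, mispredict_rate=0.05))  # deep pipeline
```

This is why the deeper, higher-frequency pipeline isn't free: every wrong guess about the mustard costs more the more stages you've split things into.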