Okay, okay, I'll give a slightly better description. Keep in mind that I'm not an expert; like mrman3k said, ask in Highly Technical for highly technical answers.
This is very simplified, and I can't give a much more detailed description, as this is pretty much all I know:
Each command has to get through several stages to get executed. The Athlon has about 10 stages for integer instructions and 15 for floating point (other x86 CPUs of the time have a similar number or fewer), whereas the P4 has 20. Thus each command takes more clock cycles to get through the P4, but because each stage does less work, the extra stages allow it to reach higher clock speeds. Look at some of the articles written about the P4 when it came out; this was discussed a lot at the time.
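
To make the trade-off a bit more concrete, here is a rough toy model in Python. It is only a sketch: the function name `pipeline_stats` and all the numbers (10 ns of total logic, 0.1 ns of fixed overhead per stage) are made up for illustration and are not real Athlon or P4 figures. It just assumes the same amount of work is split evenly across more stages, with a small fixed cost added per stage.

```python
def pipeline_stats(stages, total_logic_ns, overhead_ns):
    """Toy model: split a fixed amount of logic evenly across pipeline stages.

    All inputs are hypothetical; overhead_ns is a fixed per-stage cost.
    Returns (clock speed in GHz, time for one command to pass through in ns).
    """
    cycle_ns = total_logic_ns / stages + overhead_ns  # one clock cycle
    clock_ghz = 1.0 / cycle_ns                        # shorter cycle = higher clock
    latency_ns = stages * cycle_ns                    # one command, start to finish
    return clock_ghz, latency_ns


if __name__ == "__main__":
    # Made-up numbers: 10 ns of logic, 0.1 ns overhead per stage.
    for stages in (10, 15, 20):
        clock, latency = pipeline_stats(stages, 10.0, 0.1)
        print(f"{stages} stages: {clock:.2f} GHz, {latency:.1f} ns per command")
```

In this model the 20-stage design clocks higher than the 10-stage one, but a single command takes slightly longer to get all the way through, which is the same trade-off described above.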