All the "gigaflop" numbers are really just theoretical. Conventionally, you just multiply the maximum number of FP operations per clock by the clock rate. So:
The Athlon has 3 parallel FPUs and 3 decoders to feed them, so it can theoretically do 3 FP operations per clock. At 1.4 GHz, that would be 4.2 Gflops (billion FP operations per second) for x87 code and 5.6 Gflops for single-precision SSE code (which works out to 4 operations per clock).
The P4 has 2 parallel FPUs and an execution trace cache that can issue up to 3 micro-ops per clock to feed them (assuming this is repeatable code). At 1.4 GHz, that would be 2.8 Gflops for x87 code, 5.6 Gflops for single-precision SSE code, and 2.8 Gflops for double-precision SSE2 code.
The PPC G4 processor has 1 general FP pipeline (fully pipelined) with a throughput of 1 FP operation per clock. At 1 GHz, that would be 1 Gflop. Not impressive. However, combine that with the AltiVec units, which function independently of the conventional FPU and can do 4 or 8 single-precision operations per clock (depending on the variety of your AltiVec instructions), and at 1 GHz you get a theoretical 5 Gflops with only 1 AltiVec vector per clock and 9 Gflops with 2 AltiVec vectors per clock.
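For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The function name `peak_gflops` and the table layout are just made up for illustration. The per-clock figures are the ones quoted above, except the SSE/SSE2 ones, which are simply backed out from the quoted Gflops numbers (5.6 / 1.4 = 4, 2.8 / 1.4 = 2), so treat those as assumptions.

```python
def peak_gflops(flops_per_clock, clock_ghz):
    """Theoretical peak: FP operations per clock times clock rate in GHz."""
    return flops_per_clock * clock_ghz


# (label, FP operations per clock, clock in GHz)
# SSE/SSE2 per-clock values are inferred from the quoted Gflops figures,
# not taken from any vendor documentation.
CONFIGS = [
    ("Athlon 1.4 GHz, x87", 3, 1.4),
    ("Athlon 1.4 GHz, SSE single-precision", 4, 1.4),
    ("P4 1.4 GHz, x87", 2, 1.4),
    ("P4 1.4 GHz, SSE single-precision", 4, 1.4),
    ("P4 1.4 GHz, SSE2 double-precision", 2, 1.4),
    ("G4 1 GHz, FPU only", 1, 1.0),
    # The FPU and AltiVec run independently, so their per-clock counts add.
    ("G4 1 GHz, FPU + 1 AltiVec vector per clock", 1 + 4, 1.0),
    ("G4 1 GHz, FPU + 2 AltiVec vectors per clock", 1 + 8, 1.0),
]

if __name__ == "__main__":
    for label, per_clock, ghz in CONFIGS:
        print(f"{label}: {peak_gflops(per_clock, ghz):.1f} Gflops")
```

Nothing here models decode limits, memory bandwidth, or dependency stalls. It is only the multiplication behind the headline numbers.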
This is all theoretical. I think it's safe to say that the maximum is never, ever actually reached.