<<
Well clock-per-clock which faster? Motorola G4 733MHz or Pentium III 733MHz? >>
That's too much of an oversimplification to answer directly. The Iron Law says that execution time = seconds/cycle (the inverse of clock rate) * cycles/instruction * instructions/program. The last two terms are extremely software dependent; they vary with the program, the instruction set, and the compiler used. That makes it difficult to compare CPUs with different instruction sets, since the same benchmark binary can't be run directly on both platforms.
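To make the Iron Law concrete, here's a minimal sketch in Python. The instruction counts and CPI values are made up purely for illustration (they are not measurements of the G4 or P3), and `execution_time` is just a hypothetical helper name:

```python
def execution_time(instructions, cpi, clock_hz):
    """Iron Law: instructions/program * cycles/instruction * seconds/cycle."""
    return instructions * cpi / clock_hz

CLOCK_HZ = 733e6  # both hypothetical CPUs run at the same 733 MHz clock

# Made-up numbers: CPU A needs fewer instructions for the program,
# but CPU B retires them in fewer cycles each.
time_a = execution_time(instructions=1.0e9, cpi=1.2, clock_hz=CLOCK_HZ)
time_b = execution_time(instructions=1.3e9, cpi=0.9, clock_hz=CLOCK_HZ)

print(f"CPU A: {time_a:.3f} s")
print(f"CPU B: {time_b:.3f} s")
print("Faster:", "A" if time_a < time_b else "B")
```

With these assumed inputs, B finishes first even though it executes 30% more instructions at the same clock, which is why "clock-per-clock" by itself doesn't settle anything.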
For this reason, the industry uses SPEC, a standard benchmark suite, to compare performance. It's not a perfect test of performance, but nothing really is. The only official numbers that seem to be available from Motorola place the G4 733's SPECint 95 at 32.1 and its SPECfp 95 at 23.9 (for some reason they're using the outdated SPEC 95 instead of SPEC 2000).
SPEC's numbers for the P3 733 place it at 35.2 for SPECint 95 and 30.5 for SPECfp 95.
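For a quick sense of the gap, this small Python sketch just does the arithmetic on the four scores quoted above (the dictionary layout and the `advantage_pct` helper are my own, nothing official):

```python
# SPEC 95 scores as quoted in this post
scores = {
    "G4 733": {"SPECint95": 32.1, "SPECfp95": 23.9},
    "P3 733": {"SPECint95": 35.2, "SPECfp95": 30.5},
}

def advantage_pct(a, b):
    """Percentage by which score a exceeds score b."""
    return (a / b - 1.0) * 100.0

int_adv = advantage_pct(scores["P3 733"]["SPECint95"], scores["G4 733"]["SPECint95"])
fp_adv = advantage_pct(scores["P3 733"]["SPECfp95"], scores["G4 733"]["SPECfp95"])

print(f"P3 lead in SPECint 95: {int_adv:.1f}%")
print(f"P3 lead in SPECfp 95:  {fp_adv:.1f}%")
```

That works out to a P3 lead of roughly 10% in integer and roughly 28% in floating point at the same clock, on these particular numbers.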
The G4 is a great processor, but as you can see, you shouldn't believe Apple's marketing. x86 CPUs, notably the P3/P4 and Athlon, tend to have better floating-point performance, due partially to their wider superscalar floating-point implementations (2 FP units on the P3/P4 and 3 on the Athlon vs. 1 on the G4). While x87's stack-based architecture is ridiculous, the latency-free FXCH instructions on the P3 and Athlon let it emulate a register-based instruction set. Likewise, x87's two-operand format is inferior to PowerPC's three-operand format, since it forces more reliance on memory latency and bandwidth; then again, the P3/P4/Athlon's generally faster caches, larger re-order buffers, and larger system memory bandwidth help work around the problem. On the other hand, the G4 tends to match or exceed them in integer performance. Also, its SIMD set (Altivec) is superior to 3DNow and SSE1/2 (check out 3 1/2 SIMD Architectures for details).
Check out Ace's SPECmine database...in SPEC 2000, the 1.6GHz AthlonXP is beaten only by the 1.3GHz IBM Power4. The 2GHz P4 is beaten only by the aforementioned Athlon and Power4, as well as the 1GHz Alpha EV68. In SPECfp, the 2GHz P4 and 1.6GHz Athlon take 5th and 9th place, respectively. Granted, the P4/Athlon can't come close to touching high-end RISC in terms of system & CPU scalability, system bandwidth, and cache size, but don't sell x86 short...especially when you consider the huge price difference between x86 and high-end RISC.
edit: If it seems like I'm coming down on you, Jerboi, I didn't mean to do so. While the ISA is important, IMHO it's not as important as the microarchitecture, engineering talent, and manufacturing process technology. x86's clunky ISA certainly has an effect on the performance and design of its CPUs. Converting x86 instructions into fixed-length micro-RISC ops facilitates pipelining and superscalar execution, but it adds a few pipeline stages (which increases the branch misprediction penalty) and makes the CPUs hotter and more complex. x86's two-operand format (a = a + b instead of the three-operand a = b + c) makes it rely on memory more often. x86's small set of 8 general-purpose registers has the same effect, and it limits the programming model the compiler sees (and prevents some cool compiler tricks). These last two detrimental effects can be eased with fast caches w/high hit rates...and register renaming can bypass the more frequent write-after-read and write-after-write data dependencies caused by the small register set.
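To show what register renaming buys you, here's a toy Python sketch. It's a deliberately simplified model I made up for illustration (each instruction is just a destination plus a tuple of sources, and `count_false_deps` / `rename` are hypothetical helpers), not how any real P3 or Athlon renamer is implemented:

```python
def count_false_deps(instrs):
    """Count WAR and WAW dependencies over every ordered pair of instructions."""
    war = waw = 0
    for i, (dst_i, srcs_i) in enumerate(instrs):
        for dst_j, _ in instrs[i + 1:]:
            if dst_j in srcs_i:   # later write to a register an earlier op reads
                war += 1
            if dst_j == dst_i:    # later write to a register an earlier op writes
                waw += 1
    return war, waw

def rename(instrs):
    """Give every destination a fresh physical register (p0, p1, ...)."""
    mapping = {}                  # architectural name -> current physical name
    renamed = []
    for n, (dst, srcs) in enumerate(instrs):
        # Sources read the most recent mapping; unmapped names pass through.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        mapping[dst] = f"p{n}"
        renamed.append((mapping[dst], new_srcs))
    return renamed

# Register-starved code reusing r1 and r4, the way a small register set forces.
program = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # r4 = r1 + r5
    ("r1", ("r6", "r7")),  # r1 = r6 + r7  (WAW with op 0, WAR with op 1)
    ("r4", ("r1", "r2")),  # r4 = r1 + r2  (WAW with op 1)
]

before = count_false_deps(program)
after = count_false_deps(rename(program))
print("WAR/WAW before renaming:", before)
print("WAR/WAW after renaming: ", after)
```

In this toy case the one WAR and two WAW hazards disappear after renaming, while the true read-after-write dependencies (op 1 on op 0, op 3 on op 2) are preserved through the mapping.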
So while an elegant ISA makes the CPU architect's, engineer's, and compiler writer's jobs easier, the performance increase from going from x86 to a RISC ISA would be evolutionary, not revolutionary. It's said that AMD Hammer's 16-register set (vs. 8 with x86) in x86-64 mode will increase performance by 5-10%...certainly an incremental improvement, but nothing revolutionary.