A P200MMX will noticeably outperform a P200 classic in almost all applications. This is largely due to the bigger L1 cache (16k each for code and data instead of 8k each).
Not only that, but didn't they have twice the memory channels to the CPU core, a different FPU, and a sixth stage (as compared to five) in the instruction pipelines?
Everything you want to know about CPU design specs:
http://einstein.et.tudelft.nl/~offerman/cl.contents2.html
Tidbits:
Original Pentium prototypes at 25-33 MHz! hehe
Also:
Multiprocessor support.
Upgrading: adding another Intel Pentium CPU.
Parity checking on buses.
Branch prediction (BTB: Branch Target Buffer).
8 kbyte instruction cache, 8 kbyte data cache (Harvard architecture).
Both 2-way set-associative, write-back, no write-allocate.
32 bit internal data bus (CPU - MMU (Memory Management Unit, including cache))
64 bit external data bus (MMU - memory).
32 bit address bus.
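For anyone wondering what "8 kbyte, 2-way set-associative" works out to, here is a rough Python sketch. The 32-byte cache line size is my assumption (it is not in the list above) and cache_geometry is just a made-up helper name; the 32-bit address width is taken from the list.

```python
def cache_geometry(size_bytes, ways, line_bytes, address_bits=32):
    """Split an address into tag/index/offset bits for a set-associative cache.

    Assumes size, ways and line size are powers of two.
    """
    lines = size_bytes // line_bytes          # total cache lines
    sets = lines // ways                      # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


# 8 kbyte, 2-way, with an ASSUMED 32-byte line
print(cache_geometry(8 * 1024, ways=2, line_bytes=32))
```

With those assumptions, each 8 kbyte cache comes out to 256 lines in 128 sets, so two addresses that share the same index bits can still sit in the cache side by side.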
P24 - Early Prototype:
Technology: 0.8 micron biCMOS.
Single 32 kbyte cache: 16 kbyte code, 16 kbyte data.
P5:
Technology: 0.8 micron biCMOS.
3.1E6 transistors.
Die size: 288 mm2.
New L1 Design - 8 kbyte instruction cache, 8 kbyte data cache (Harvard architecture). Both 2-way set-associative, write-back, no write-allocate.
2-issue 5-stage superscalar with 8-stage pipelined FPU (Floating Point Unit).
P54:
Technology: 4-layer metal, 0.6 micron biCMOS.
3.1E6 transistors.
Die size: 157 mm2.
P54CT:
Technology: 4-layer metal, 0.35 micron CMOS.
3.1E6 transistors.
Die size: 90 mm2.
P54CTB - Overdrive:
Technology: 4-layer metal, 0.35 micron CMOS.
4.5E6 transistors.
Die size: 141 mm2.
P55C:
Technology: 4-layer metal, 0.35 micron CMOS.
4.5E6 transistors.
Die size: 141 mm2.
Two MMX execution units; L1 16 kbyte instruction cache, 16 kbyte data cache (Harvard architecture).
P55C - Mobile:
Technology: 5 layer metal, 0.25 micron CMOS.
4.5E6 transistors.
Die size: 95 mm2.
P-Pro - Early Prototype:
Technology: 0.6 micron biCMOS, precharged domino logic.
5.5E6 transistors.
Die size: 306 mm2.
Level 1 cache: 8 kbyte instruction, 8 kbyte data (Harvard architecture).
Multi-processor support.
ECC (Error Correcting Code), fault analysis & recovery, functional redundancy checking.
Multi-branch prediction, data flow analysis, speculative execution.
Superpipelined superscalar: 3-issue, 12-stage; instruction pool, fetch/decode unit, dispatch/execution unit (2 AGU (Address Generation Unit): 1 load, 1 store; 1 JEU (Jump Execution Unit); 2 IEU (Integer Execution Unit); 1 FEU (Floating Execution Unit)), retire unit.
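To put the die shrinks in perspective, here is a quick back-of-the-envelope Python sketch of transistor density using only the transistor counts and die sizes quoted above. The results are only as good as those figures, and density is a hypothetical helper name.

```python
# (name, transistors, die size in mm2), copied from the list above
chips = [
    ("P5", 3.1e6, 288),
    ("P54", 3.1e6, 157),
    ("P54CT", 3.1e6, 90),
    ("P54CTB", 4.5e6, 141),
    ("P55C", 4.5e6, 141),
    ("P55C Mobile", 4.5e6, 95),
    ("P-Pro prototype", 5.5e6, 306),
]


def density(transistors, die_mm2):
    """Transistors per square millimetre of die."""
    return transistors / die_mm2


for name, transistors, die_mm2 in chips:
    print(f"{name:16s} {density(transistors, die_mm2):8.0f} transistors/mm2")
```

Going by the quoted numbers, the same 3.1E6 transistors fit in under a third of the area between the P5 and the P54CT, which is the process shrink from 0.8 to 0.35 micron at work.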