P4 will have 256KB of L2 cache. It's very similar to the "ATC" (Advanced Transfer Cache) in the Coppermine, except that the cache is now single pumped instead of half pumped: it delivers data every clock cycle, whereas the Coppermine's cache could only deliver data every other clock cycle.
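To put numbers on single vs. half pumped, here's a rough peak-bandwidth sketch. It assumes both L2 data paths are 256 bits (32 bytes) wide, which the post doesn't state, and the clock speeds in the example are illustrative values I picked, not figures from above. Treat it as back-of-envelope arithmetic, not a spec.

```python
# Rough peak L2 bandwidth: "single pumped" (data every clock) vs.
# "half pumped" (data every other clock).
# ASSUMPTION: both L2 data paths are 256 bits (32 bytes) wide.
# ASSUMPTION: the clock speeds below are illustrative, not from the post.

BUS_BYTES = 32  # assumed L2 data-path width in bytes (256 bits)


def l2_bandwidth_gbs(core_mhz: float, transfers_per_clock: float,
                     bus_bytes: int = BUS_BYTES) -> float:
    """Peak L2 bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_sec = bus_bytes * transfers_per_clock * core_mhz * 1e6
    return bytes_per_sec / 1e9


if __name__ == "__main__":
    # Half pumped (Coppermine-style): one transfer every other clock.
    print(f"1000 MHz, half pumped:   {l2_bandwidth_gbs(1000, 0.5):.1f} GB/s")
    # Single pumped (P4-style): one transfer every clock.
    print(f"1400 MHz, single pumped: {l2_bandwidth_gbs(1400, 1.0):.1f} GB/s")
```

The point is just that, at the same clock and bus width, going from half to single pumped doubles peak L2 bandwidth; any difference in clock speed multiplies on top of that.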
P4 has an 8KB data L1 cache, which (I think) has a two-cycle latency.
It also has a much larger trace cache (analogous to an instruction L1, but sitting at a different point in the instruction pipeline and differing in other ways too), which can hold twelve thousand microOps. That is apparently north of a hundred kilobytes, but I'm not 100% certain of that.
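Whether twelve thousand microOps really comes to over a hundred kilobytes depends on how many bits each stored microOp takes, and that width isn't given here. So rather than guess it, this sketch works backwards to the break-even width. Assumptions: "a hundred kilobytes" is taken as 100 × 1024 bytes, "twelve thousand" is tried both as 12,000 and as 12 × 1024, and only raw microOp payload is counted (no tags or other overhead).

```python
# Break-even microOp width for a trace cache to exceed a given size.
# ASSUMPTION: 1 KiB = 1024 bytes; "a hundred kilobytes" = 100 KiB.
# ASSUMPTION: only raw uop payload is counted (no tags or other overhead).
# The actual bits-per-uop of the P4 trace cache is NOT known here.


def storage_kib(num_uops: int, bits_per_uop: float) -> float:
    """Raw storage in KiB for num_uops entries of bits_per_uop bits each."""
    return num_uops * bits_per_uop / 8 / 1024


def breakeven_bits_per_uop(num_uops: int, target_kib: float) -> float:
    """Bits per uop needed for num_uops entries to fill target_kib KiB."""
    return target_kib * 1024 * 8 / num_uops


if __name__ == "__main__":
    # "Twelve thousand" could mean 12,000 or 12K = 12 * 1024.
    for uops in (12_000, 12 * 1024):
        bits = breakeven_bits_per_uop(uops, 100)
        print(f"{uops} uops need > {bits:.1f} bits/uop to exceed 100 KiB")
```

So "north of a hundred kilobytes" just means each stored microOp takes more than roughly 67 to 68 bits, which is the thing I'm not sure of.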
P4 has a 217sqmm die (I think) with 256KB L2. PIII (cC0) has a 90sqmm die with 256KB L2, and somewhere around 300sqmm with 2MB L2 (I think). A P4 with 512KB of the more complex "full speed" L2 would possibly, even likely, be over 250sqmm, which would be really annoying. I don't know the actual rule of thumb, but I'd say you really want to be well under 200sqmm for a chip to be viable in the mass market.
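Here's the back-of-envelope behind that 250sqmm guess, using only the die sizes quoted above. It's a sketch under two loud assumptions: die area grows linearly with L2 size, and the PIII's area per KB of L2 carries over to the P4. Since the P4's full-speed L2 is said to be more complex, the real per-KB cost is probably higher.

```python
# Back-of-envelope die-area estimate from the figures quoted in the post.
# ASSUMPTION: die area grows linearly with L2 size.
# ASSUMPTION: the PIII's area per KB of L2 carries over to the P4
# (likely an underestimate if the P4's L2 is more complex).

PIII_SMALL = (256, 90.0)    # (L2 KB, die sq mm), from the post
PIII_LARGE = (2048, 300.0)  # "somewhere around" 300 sq mm, from the post
P4_BASE = (256, 217.0)      # P4 figure from the post ("I think")


def area_per_kb(small, large):
    """Slope in sq mm per KB of L2 between two dies of the same core."""
    (kb1, a1), (kb2, a2) = small, large
    return (a2 - a1) / (kb2 - kb1)


def estimate_area(base, target_kb, slope):
    """Extrapolate die area to target_kb of L2 from a base die."""
    kb0, a0 = base
    return a0 + (target_kb - kb0) * slope


if __name__ == "__main__":
    slope = area_per_kb(PIII_SMALL, PIII_LARGE)
    print(f"~{slope:.3f} sq mm per KB of L2")
    print(f"P4 with 512KB L2: ~{estimate_area(P4_BASE, 512, slope):.0f} sq mm")
```

At PIII-like density, the extra 256KB costs about 30sqmm, landing the P4 near 247sqmm; it only takes a modestly less dense L2 to push that past 250sqmm.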
But then, that's just my speculation.
-JC
PC News'n'Links
http://www.jc-news.com/pc