To keep latency on the L1 cache as low as possible. A larger L1 cache would necessitate higher latency to the L1; increasing the L1 cache size beyond 8KB on the P4 on its present .18u process technology would likely have made it impossible to keep the low 2-cycle latency to the L1.
Some have put forth the theory that the .13u Northwood core may increase the L1 data cache to 16KB, but at present this seems quite unlikely to happen.
Remember, size is not the only factor when considering caches: set associativity, data bus width, latency, inclusive vs. exclusive relationship with the L2, etc. are all very important factors to take into consideration.
A smaller cache isn't always an inferior cache.
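
To put rough numbers on that trade-off, here is a minimal sketch using the standard average memory access time formula (hit time + miss rate x miss penalty). Only the 2-cycle L1 latency comes from the discussion above; the 3-cycle latency for a larger cache, both miss rates, and the 18-cycle miss penalty are made-up values for illustration, not measured P4 figures.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time (AMAT) in cycles for one cache level."""
    return hit_cycles + miss_rate * miss_penalty_cycles

# Hypothetical workload numbers, chosen only to illustrate the trade-off.
MISS_PENALTY = 18  # assumed cycles to fetch from the next level on an L1 miss

# Small, fast L1: 2-cycle hit, assumed 6% miss rate.
small_fast = average_access_cycles(2, 0.06, MISS_PENALTY)

# Larger, slower L1: assumed 3-cycle hit, assumed 4% miss rate.
large_slow = average_access_cycles(3, 0.04, MISS_PENALTY)

print(f"small/fast L1: {small_fast:.2f} cycles on average")
print(f"large/slow L1: {large_slow:.2f} cycles on average")
```

With these assumed inputs the smaller cache comes out ahead, because the extra cycle on every hit costs more than the lower miss rate saves. Different miss rates or miss penalties can flip the result, which is exactly why size alone tells you little.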