Poorly designed? No, not at all. It's not like the Oregon design team suddenly looked at the chip they produced and said "oh my, look, we forgot to add performance!" It would be my contention that the P4 was truly designed with the .13 micron process in mind; the die size is quite large in .18, and this is the "stripped down" model. There was an article in EETimes about how, during the design process, the engineers realized they had to modify or remove several features to make a reasonably sized and reasonably cool chip in a .18 process.
Some of the changed features included:
Cut L1 cache size in half
Removed the 2nd FPU pipeline
Removed the 3rd ALU (left the 2 "fireball" ALUs)
Removed all L3 cache (compensated by boosting L2 to 256k)
If all those changes hadn't been made, I think the initial P4 would have been a real screamer. Currently I don't think there are any plans to re-add any of the removed features, but just increasing the L2 cache to 512k should boost performance by a fair bit.
Basically, though, the P4 was built for clock speed. It can be seriously fast with optimized code, but on code not produced by the Intel compiler, it suffers quite a bit.
This focus on clock speed hurt the P4's acceptance tremendously, but the clock-speed lead the P4 now has over the Athlon is ridiculous: being 600 MHz behind is almost like comparing the G4 to the P3. We know that in terms of real performance on current code it's a wash, but the computer-buying public doesn't.