The sizes of caches and memories are only half the equation; speed is just as important (if not more). Pretty much the way it works is that the CPU does its work on the register file, which is a set of latches that store information. In the P4, for example, you have something like 128 32-bit registers. Now obviously everything can't fit in the registers, so you need to load stuff from memory too.
If the data isn't in the registers, the CPU issues a load instruction to get it from the L1 cache. The L1 cache is really quick, but relatively small. The bigger it is, the higher the chance that the data you want is in there, but the longer it takes to load it (so there's your tradeoff: big caches hit more often, but are slower). If the data isn't in the L1 cache, the CPU goes and looks for it in the L2. You have pretty much the exact same tradeoff here: you can have a big cache which is less likely to miss, or a small one which means that if there is a hit the CPU can get the data quicker.
If the data isn't in the L2, then you've got to go look in main memory. If the data still isn't there, you're just completely screwed and you have to wait millions of cycles to get it from the HD. Actually, I don't even know if the CPU is ever really going to ask for data that the memory doesn't have; you'd hope to all hell that the info has been prefetched.
So basically it's:
register file <- L1 cache <- L2 cache <- main memory <- hard drive
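
To put rough numbers on the size vs. speed tradeoff, here is a small sketch that works out the average cost of a load as it falls through the hierarchy. Every latency and hit rate in it is a made-up illustrative figure, not a measurement of the P4 or any real chip, and the function name `average_access_time` is just something picked for this example. It also assumes main memory always has the data, so the hard drive case is left out.

```python
def average_access_time(levels, memory_latency):
    """Average cycles per load for a chain of caches backed by main memory.

    levels: list of (hit_latency_cycles, hit_rate) tuples, L1 first.
    memory_latency: cycles to fetch from main memory (assumed to always hit).
    """
    # Start from the outermost level and work back toward the CPU:
    # cost(level) = hit latency + miss rate * cost(next level out)
    cost = memory_latency
    for hit_latency, hit_rate in reversed(levels):
        cost = hit_latency + (1.0 - hit_rate) * cost
    return cost


# Hypothetical numbers, chosen only to illustrate the tradeoff.
MEMORY_LATENCY = 200                 # cycles
L2 = (10, 0.95)                      # (latency, hit rate)
small_fast_l1 = (2, 0.90)            # quick, but misses more often
big_slow_l1 = (4, 0.97)              # hits more often, but slower on every hit

small = average_access_time([small_fast_l1, L2], MEMORY_LATENCY)
big = average_access_time([big_slow_l1, L2], MEMORY_LATENCY)

print(f"small/fast L1: {small:.2f} cycles per load on average")
print(f"big/slow L1:   {big:.2f} cycles per load on average")
```

With these particular made-up numbers the smaller, faster L1 comes out ahead (about 4.0 cycles vs. about 4.6), because the extra hits of the bigger cache don't make up for paying the slower hit time on every access. Change the hit rates or the memory latency and the answer can flip, which is the whole tradeoff.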