The L1 and L2 caches are areas of very fast memory located close to the processor. They store the most frequently used data and instructions for fast access by the CPU. Accessing the caches is MUCH faster than accessing RAM.
Putting the cache on-die places it closer to the processor core, lets it run at a higher clock speed, and allows faster access. It also costs less than placing the cache on separate (discrete) chips, as was done with the Pentium II and the Katmai Pentium III.
All current processors have both L1 and L2 cache on-die.
For the Intel Pentium 4 (Willamette), the L1 data cache is 8KB and the L2 cache is 256KB. This does not include the execution trace cache of approximately 96KB, which is really another form of L1 cache (so don't let the 8KB fool you).
For the Coppermine Pentium III, the L1 cache is 32KB and the L2 cache is 256KB.
For the Coppermine Celeron, the L1 cache is 32KB and the L2 cache is 128KB.
For the AMD Athlon, the L1 cache is 128KB and the L2 cache is 256KB.
For the AMD Duron, the L1 cache is 128KB and the L2 cache is 64KB.
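One reason the raw sizes above aren't directly comparable: the AMD Athlon and Duron use an exclusive design, where a line sits in either L1 or L2 but never both, so their capacities add up. The sketch below works through that arithmetic. Treating the Intel chips as strictly inclusive (every L1 line duplicated in L2) is an assumption made here for illustration, and the function name is hypothetical.

```python
# Unique data a two-level cache can hold, in KB.
# Exclusive: L1 and L2 never hold the same line, so capacities add.
# Strictly inclusive (assumed for the Intel rows): L1 is a copy of part of L2,
# so L2 alone bounds the distinct data.

def effective_capacity_kb(l1_kb: int, l2_kb: int, exclusive: bool) -> int:
    """Return how many distinct KB the L1+L2 pair can hold (hypothetical helper)."""
    if exclusive:
        return l1_kb + l2_kb
    return l2_kb  # inclusive: L1 contents are duplicated in L2


chips = {
    # name: (L1 KB, L2 KB, exclusive?)
    "AMD Athlon": (128, 256, True),
    "AMD Duron": (128, 64, True),
    "Coppermine Pentium III": (32, 256, False),  # inclusive: assumption
    "Coppermine Celeron": (32, 128, False),      # inclusive: assumption
}

for name, (l1, l2, excl) in chips.items():
    print(f"{name}: {effective_capacity_kb(l1, l2, excl)}KB")
```

Under these assumptions the Duron's small 64KB L2 still yields 192KB of unique cached data, which is why its L2 being smaller than its L1 makes sense.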
Performance is about more than just size, though. Other factors include the width of the bus to the L2 cache (64 bits on the AMD processors and 256 bits on the Intel processors), inclusive vs. exclusive cache design, and set associativity.
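To make set associativity concrete, here is a minimal toy simulator (tags only, LRU replacement). The class name and the 1KB/64-byte-line parameters are hypothetical and chosen for readability; they are not the configuration of any processor listed above. An address is split into a line number, whose low bits pick the set and whose remaining bits form the tag; a set with N ways can hold N lines that map to it at once.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement (tracks tags only)."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, address: int) -> tuple[int, int]:
        """Return (set index, tag) for a byte address."""
        line = address // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def access(self, address: int) -> bool:
        """Look up an address; return True on a hit. A miss fills the line."""
        index, tag = self.split(address)
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)    # evict the least recently used line
        s[tag] = None
        return False


def run(ways: int) -> tuple[int, int]:
    """Alternate between two addresses that map to the same set."""
    cache = SetAssociativeCache(size_bytes=1024, line_bytes=64, ways=ways)
    for _ in range(4):
        for addr in (0, 1024):
            cache.access(addr)
    return cache.hits, cache.misses


print(run(ways=1))  # direct-mapped: (0, 8), the two lines keep evicting each other
print(run(ways=2))  # 2-way: (6, 2), both lines fit in the same set
```

The same total cache size behaves very differently here: with one way per set the two addresses conflict on every access, while two ways let both stay resident after the first two misses.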
For an in-depth look at caching, check out the article The Fundamentals of Cache.
Hmm... perhaps this article is more appropriate, since it is less advanced.