Some confusing advertising there, but here's the lowdown:
All Athlons and Durons have 128KB (64KB instruction/64KB data) of on-die L1 cache. The differences are in the L2 cache.
The classic Athlons (both the 0.25µm K7 and 0.18µm K75 cores) have 512KB of off-die, on-package L2 cache, which I believe is 8-way set associative.
The Thunderbirds have 256KB of on-die, 16-way set associative L2 cache with a 64-bit-wide data path.
The Durons have 64KB of on-die, 16-way set associative L2 cache with a 64-bit-wide data path.
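To see what "256KB, 16-way" means in practice, here's a rough Python sketch of the set arithmetic. It assumes 64-byte cache lines (the K7 line size; note bytes, not bits), which isn't stated above, and the helper names `l2_geometry` and `set_index` are made up for illustration:

```python
def l2_geometry(size_kb: int, ways: int, line_bytes: int = 64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.
    line_bytes defaults to 64, an assumed K7 line size."""
    size_bytes = size_kb * 1024
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power-of-two line size
    index_bits = sets.bit_length() - 1         # log2 of a power-of-two set count
    return sets, offset_bits, index_bits


def set_index(addr: int, sets: int, line_bytes: int = 64) -> int:
    # Strip the byte-offset bits, then keep the low bits that pick the set.
    return (addr // line_bytes) % sets


tbird = l2_geometry(256, 16)  # Thunderbird L2: 256KB, 16-way
duron = l2_geometry(64, 16)   # Duron L2: 64KB, 16-way
print(tbird, duron)
```

Under that assumption a Thunderbird's L2 works out to 256 sets of 16 lines and a Duron's to 64 sets of 16 lines, so the same address generally lands in a different set on each chip.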
Thunderbirds and Durons use an exclusive cache design: data doesn't have to be replicated in both L1 and L2. A given piece of data lives in one level, the other, or neither, never both. This increases the effective cache size (L1 plus L2), and hence the cache hit rate.
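A toy model makes the "never both" rule concrete. This is a hypothetical sketch, not how the K7 actually implements it: both levels are modelled as fully associative LRU caches counted in lines, and the class name `ExclusiveCache` is invented. The key moves are that an L1 victim drops into L2 instead of being thrown away, and an L2 hit moves the line up into L1 rather than copying it:

```python
from collections import OrderedDict


class ExclusiveCache:
    """Toy two-level exclusive cache: each line is in L1, in L2, or in neither.
    Both levels are simplified to fully associative LRU."""

    def __init__(self, l1_lines: int, l2_lines: int) -> None:
        self.l1 = OrderedDict()  # oldest entry first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def _fill_l1(self, line: int) -> None:
        # Make room in L1; the L1 victim moves down to L2 rather than being dropped.
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)  # L2 victim leaves the hierarchy
            self.l2[victim] = None
        self.l1[line] = None

    def access(self, line: int) -> str:
        if line in self.l1:
            self.l1.move_to_end(line)  # refresh LRU position
            return "L1"
        if line in self.l2:
            del self.l2[line]    # the line leaves L2...
            self._fill_l1(line)  # ...and moves into L1, so it is never in both
            return "L2"
        self._fill_l1(line)      # miss: fetched from memory straight into L1
        return "miss"


cache = ExclusiveCache(l1_lines=2, l2_lines=1)
results = [cache.access(line) for line in [1, 2, 3, 1, 2, 3]]
print(results)
```

With a 2-line L1 and a 1-line L2, cycling through three lines keeps all three on chip (every re-access is an L2 hit), which a 1-line inclusive L2 couldn't manage. By the same arithmetic, a Thunderbird can hold up to 128KB + 256KB = 384KB of unique data and a Duron up to 128KB + 64KB = 192KB.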
All P2s, P3s and Celerons have the same 32KB (16KB instruction/16KB data) of on-die L1 cache.
Klamath and Deschutes P2s and Katmai P3s had 512KB of off-die, on-package L2 cache. I forget the associativity.
Coppermine P3s have 256KB of on-die, 8-way set associative L2 cache with a 256-bit-wide data path.
Covington Celerons didn't have any L2 cache at all.
Mendocino Celerons have 128KB of on-die, 4-way set associative L2 cache with a 64-bit-wide data path.
Coppermine-128 Celerons have 128KB of on-die, 4-way set associative L2 cache with a 256-bit-wide data path.
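The 64-bit vs 256-bit figures describe how wide the path between L2 and L1 is, and that decides how many transfers it takes to move a cache line. Here's a back-of-the-envelope Python sketch. It assumes 32-byte lines on the P6-family chips and 64-byte lines on the K7 (neither is stated above), one bus-width transfer per core clock, and ignores latency entirely; `clocks_per_line` is a made-up helper name:

```python
import math


def clocks_per_line(line_bytes: int, bus_bits: int) -> int:
    """Idealised core clocks to move one line from L2 to L1, assuming one
    bus-width transfer per clock and ignoring access latency."""
    return math.ceil(line_bytes / (bus_bits // 8))


# Assumed line sizes: 32 bytes for P6-family parts, 64 bytes for K7.
parts = {
    "Coppermine P3 / Coppermine-128 Celeron": clocks_per_line(32, 256),
    "Mendocino Celeron": clocks_per_line(32, 64),
    "Thunderbird / Duron": clocks_per_line(64, 64),
}
for name, clocks in parts.items():
    print(f"{name}: {clocks} clock(s) per line")
```

Under those idealised assumptions a Coppermine moves a line in one clock, a Mendocino needs four and a Thunderbird eight, which is one way to see why the 256-bit path matters even between two 128KB Celerons.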
And that's about it. There are other differences in cache hierarchy, but these are the more important ones.