Seems peculiar to me as well. As I recall, most of the K8 chips did not benefit greatly from L2 cache, thanks to the low-latency, high-bandwidth HyperTransport bus and integrated memory controller. I figured that with the cores on the same die and using the same bus, they would scale quite well without the addition of a shared cache. The X2s and the Opterons (the Opteron x65+) proved that.
I'm just thinking that once the "errata" is fixed and a few steppings roll out, it may make more sense to drop the L3 for the desktop chips and have them run just a hair faster. A Phenom with no L3 cache (not disabled, but nonexistent) should be fairly cheap to produce once all the engineering has taken place, and it should draw ever so slightly less power. I can't remember exactly how much power SRAM draws on its own, but wasn't it something like 5-10 watts per megabyte in the Prescott days at 3 GHz? I'd imagine it should be ~10 watts for the Phenom. I could be way off here; Google/forum/article search isn't being my friend today. Of course, less cache could reduce the effective instructions per clock if the L3 is really being utilized. Maybe I'm just thinking it's a conspiracy.
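
For what it's worth, here is the back-of-the-envelope math behind that guess as a quick Python sketch. Everything in it is an assumption: the 5-10 watts per megabyte is the half-remembered Prescott-era figure from above (unverified), the 2 MB is the Phenom's shared L3 size, and the linear scaling with cache size ignores clock speed, process, and leakage differences entirely. The function name `l3_power_range` is just made up for illustration.

```python
# Rough estimate only: every input here is a guess or a recollection,
# not a measured or published figure.

L3_SIZE_MB = 2.0          # assumption: 2 MB shared L3 on the Phenom
WATTS_PER_MB_LOW = 5.0    # unverified low-end recollection (Prescott era, ~3 GHz)
WATTS_PER_MB_HIGH = 10.0  # unverified high-end recollection


def l3_power_range(size_mb, low_w_per_mb, high_w_per_mb):
    """Return (low, high) estimated cache power in watts.

    Assumes power scales linearly with cache size, which is a
    simplification made purely for this sketch.
    """
    return size_mb * low_w_per_mb, size_mb * high_w_per_mb


low, high = l3_power_range(L3_SIZE_MB, WATTS_PER_MB_LOW, WATTS_PER_MB_HIGH)
print(f"Estimated L3 draw: {low:.0f}-{high:.0f} W")
```

So if the per-megabyte number is even in the right ballpark, ~10 watts would be the low end of the range, and it all falls apart if that recollection is wrong.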