If I had to hazard a guess, I would say it's due to different architectures. Internally, both CPUs translate x86 instructions into operations for what's been called a "RISC-like" core. Unless both design teams are twins, it's highly unlikely the two implementations are anywhere near similar to each other. If I remember correctly, the Athlon also has more execution units to begin with, which would mean more die space and more silicon in use at any given time.
There's also the issue of familiarity with the .13 micron process. Intel may have been able to tweak their design to use less power on average by turning off transistors not in use, reducing current flow, etc. AMD adding an extra metal layer to the TBreds suggests they had difficulty maintaining signal integrity, which isn't surprising for such an old design, but it also suggests they're running out of tweaks.
I'm sure I missed some points, or misreported, but this is what I think at the moment.