According to GF's slides, they expect a 40-50% perf/watt advantage for 32nm compared to the current 45nm process (although I'm not sure whether that already includes the low-k update for 45nm).
So this is one variable. Another is gate delay. I don't have GF's gate delay numbers for their processes at the likely operating voltage range, but it should be roughly 70% of their 45nm SOI gate delay (linear scaling factor, 32/45 ≈ 0.71).
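As a quick check of the linear-scaling assumption, the ratio can be computed directly. This is only a sketch: it assumes gate delay scales with the node ratio and, idealistically, that the attainable clock at a fixed logic depth per stage scales with 1/delay. Real GF numbers at the actual operating voltage could differ.

```python
# Sketch: linear scaling of gate delay from 45nm SOI to 32nm.
# Assumption: delay scales with the feature-size ratio (no real GF data used).

def relative_gate_delay(old_node_nm: float, new_node_nm: float) -> float:
    """Return the new node's gate delay as a fraction of the old node's."""
    return new_node_nm / old_node_nm


delay_ratio = relative_gate_delay(45.0, 32.0)

# Idealized: at a fixed logic depth per pipeline stage, max clock ~ 1 / gate delay.
clock_gain = 1.0 / delay_ratio

print(f"32nm gate delay: {delay_ratio:.0%} of 45nm")
print(f"ideal clock headroom at the same logic depth: +{clock_gain - 1.0:.0%}")
```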
Llano should have ~35M transistors (T) per core, or 110M including the L2, so ~75M for the L2 and other stuff (e.g. the power gating ring).
BD has 213M T per module; subtracting 150M for the L2 and the rest leaves 63M T for the 2 cores.
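The transistor subtraction can be written out in a few lines. All numbers are the rough estimates from above, and the per-core figure assumes the 63M are split evenly between the two cores of a module.

```python
# Sketch: transistor budgets (in millions) using the estimates from the post.
LLANO_CORE_M = 35.0           # one Llano (K10-class) core
LLANO_CORE_WITH_L2_M = 110.0  # the same core including its L2 and other per-core logic
BD_MODULE_M = 213.0           # one Bulldozer module
BD_L2_AND_REST_M = 150.0      # L2 and other non-core logic in the module
BD_CORES_PER_MODULE = 2

llano_l2_and_rest_m = LLANO_CORE_WITH_L2_M - LLANO_CORE_M
bd_cores_m = BD_MODULE_M - BD_L2_AND_REST_M
bd_core_avg_m = bd_cores_m / BD_CORES_PER_MODULE  # assumes an even split

print(f"Llano L2 + rest: ~{llano_l2_and_rest_m:.0f}M T")
print(f"BD, 2 cores: ~{bd_cores_m:.0f}M T (~{bd_core_avg_m:.1f}M T per core)")
print(f"BD core vs. Llano core: {bd_core_avg_m / LLANO_CORE_M:.0%}")
```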
For both BD and Llano, AMD stated 0.8 to 1.3V. BD also has even more extensive clock gating, which might reduce dynamic core power further (maybe by 20% vs. K10).
http://pc.watch.impress.co.jp/img/pcw/docs/430/341/html/13.jpg.html
Assuming nearly constant uncore power consumption (reduced by the process, increased by the number of transistors and the clock frequency), a BD module in an X8 might get 150% of the power budget of a Thuban core, and a BD core 75%. With less L2 power (IIRC it is an 8T design in BD) but likely more relative leakage, we might just look at these roughly estimated core power relations:
BD: 63M T at 150% power (2 cores)
BD: ~32M T at 75% power (1 core avg)
hypothetical 10h 6C at 32nm: ~35M T at 100% power
10h at 45nm: ~35M T at ~167% power (inverse of 40% lower power at 32nm, i.e. 1/0.6)
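The percentages in the list above can be derived from two assumptions: the same core power budget is split over 6 Thuban cores or 4 BD modules (X8), and 32nm uses 40% less power at equal performance. Following the post, the module's budget share is then read against the 32nm 10h core as the 100% baseline. Note that a strict reading of "40% better perf/watt" (1.4x) would put the 45nm core at 140% rather than ~167%.

```python
# Sketch: relative core power, 100% = hypothetical 32nm 10h core.
# Assumptions from the post: one core power budget shared by 6 Thuban cores
# or by 4 BD modules, and 32nm cuts power by 40% at the same performance.
THUBAN_CORES = 6
BD_MODULES = 4
POWER_REDUCTION_32NM = 0.40

bd_module_rel = THUBAN_CORES / BD_MODULES  # budget of one module vs. one Thuban core
bd_core_rel = bd_module_rel / 2            # two cores per module
fam10h_32nm_rel = 1.0                      # baseline
fam10h_45nm_rel = 1.0 / (1.0 - POWER_REDUCTION_32NM)

for label, rel in [
    ("BD module, 63M T (2 cores)", bd_module_rel),
    ("BD core, ~32M T (avg)", bd_core_rel),
    ("10h core at 32nm, ~35M T", fam10h_32nm_rel),
    ("10h core at 45nm, ~35M T", fam10h_45nm_rel),
]:
    print(f"{label}: {rel:.0%}")
```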
So if a Thuban core has about 13.3W of TDP ((95W - 15W uncore) / 6 cores), it would use ~8W at 32nm at the same clock (60% of 13.3W). This would be 100%.
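The per-core wattage follows from splitting the non-uncore part of the TDP evenly. This sketch uses the post's assumptions (95W TDP, ~15W uncore, 6 cores, 40% lower power at 32nm); the even split is a simplification.

```python
# Sketch: Thuban per-core power and its 32nm equivalent at the same clock.
# Assumptions from the post: 95W TDP, ~15W uncore, 6 cores, 40% lower power at 32nm.
TDP_W = 95.0
UNCORE_W = 15.0
THUBAN_CORES = 6
POWER_SCALE_32NM = 0.60


def core_power_w(tdp_w: float, uncore_w: float, cores: int) -> float:
    """Evenly split the non-uncore part of the TDP across the cores."""
    return (tdp_w - uncore_w) / cores


core_45nm_w = core_power_w(TDP_W, UNCORE_W, THUBAN_CORES)
core_32nm_w = core_45nm_w * POWER_SCALE_32NM

print(f"Thuban core at 45nm: {core_45nm_w:.1f}W")
print(f"same core at 32nm, same clock: {core_32nm_w:.1f}W (= 100%)")
```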
A BD module at the same voltage and maybe a 20% higher clock (FO4 reduction!) might use ~12W (150% of 8W). 4 modules would then use ~50W. Scaling back up to 80W (20W per module) could add another 20% of clock frequency headroom (about 45% combined, since 1.2 × 1.2 = 1.44). Thus a 4GHz Zambezi at 95W is not unlikely.
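The last step chains the earlier factors. Every factor here is one of the post's rough assumptions, not measured data; the two 20% clock gains multiply to 1.44, which the post rounds to about 45%. The sketch stops at the relative gain because the base clock behind the 4GHz estimate is not stated.

```python
# Sketch: BD module power at 32nm and the combined clock headroom.
# All factors are the post's rough assumptions, not measured data.
CORE_32NM_W = 8.0         # 10h core at 32nm, Thuban clock (= 100%)
BD_MODULE_REL = 1.5       # BD module at the same voltage, ~20% higher clock
BD_MODULES = 4
CORE_BUDGET_W = 80.0      # 95W TDP minus ~15W uncore
FO4_CLOCK_GAIN = 1.20     # from fewer FO4 per pipeline stage
BUDGET_CLOCK_GAIN = 1.20  # from spending the remaining power budget

module_w = CORE_32NM_W * BD_MODULE_REL
all_modules_w = module_w * BD_MODULES
budget_per_module_w = CORE_BUDGET_W / BD_MODULES
combined_clock_gain = FO4_CLOCK_GAIN * BUDGET_CLOCK_GAIN

print(f"one module: ~{module_w:.0f}W, four modules: ~{all_modules_w:.0f}W")
print(f"budget per module at 80W: {budget_per_module_w:.0f}W")
print(f"combined clock gain: +{combined_clock_gain - 1.0:.0%}")
```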
Could someone please check my numbers?