You need your PMOS drive currents to be comparable to your NMOS drive currents for the switch from domino to static CMOS logic to make sense.
This switch was enabled by Intel's gate-last integration, because the strain engineering that comes with gate-last produces a very large PMOS drive current, on par with NMOS.
GloFo's insistence on gate-first for 32nm HK/MG integration means its PMOS drive currents don't benefit from the same strain-engineering opportunities: no outsized PMOS drive-current boost, so no PMOS/NMOS drive-current parity, so none of the benefits that make converting to static CMOS worthwhile.
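One first-order way to see why drive-current parity matters is logical effort (the Sutherland/Sproull/Harris sizing method, swapped in here as an illustration, not something Intel or GloFo published). The sketch below assumes a hypothetical ratio r = NMOS drive / PMOS drive per unit width, sizes a static n-input NOR to match a unit inverter, and compares its per-input load with an unfooted dynamic NOR, whose inputs drive only NMOS pulldowns. It ignores the clocking, keeper, and output-inverter overheads that domino logic carries, and the function name is made up for this example.

```python
def nor_logical_effort(n_inputs: int, r: float) -> tuple[float, float]:
    """Return (static, dynamic) logical effort of an n-input NOR.

    First-order sizing assumptions (hypothetical, for illustration only):
    - r is the NMOS/PMOS drive-current ratio per unit gate width.
    - Reference inverter: NMOS width 1, PMOS width r (equal rise/fall).
    - Static NOR: parallel NMOS of width 1, series PMOS stack of width n * r each.
    - Unfooted dynamic NOR: each input drives only an NMOS of width 1.
    - Input capacitance is proportional to total gate width on that input.
    """
    inverter_cin = 1.0 + r
    static_cin = 1.0 + n_inputs * r
    dynamic_cin = 1.0
    return static_cin / inverter_cin, dynamic_cin / inverter_cin


if __name__ == "__main__":
    for r in (2.0, 1.5, 1.0):  # hypothetical drive ratios; r = 1.0 is full parity
        g_static, g_dynamic = nor_logical_effort(4, r)
        print(f"r={r}: static NOR4 g={g_static:.2f}, dynamic NOR4 g={g_dynamic:.2f}, "
              f"static/dynamic={g_static / g_dynamic:.1f}x")
```

Under these assumptions the static-to-dynamic input-load ratio is simply 1 + n*r, so for a 4-input NOR it drops from 9x at r = 2 to 5x at parity: stronger PMOS devices shrink the gap that made domino attractive in the first place.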
Gate-first is great for high density, which means a lower production cost per die. Gate-last is great for high performance: your drive currents will be higher, which means a higher ASP for your products in a performance-sensitive market.
I suspect a simpler answer: device variation. Consider a dynamic wide NOR with a keeper (say, 16 parallel pulldowns for a 16-input NOR). It only works as long as the weakest NMOS pulldown is stronger than the PMOS keeper, so a single active input can discharge the dynamic node, and the combined leakage of all the NMOS pulldowns is weaker than the PMOS keeper, so the node holds when every input is off. These requirements conflict: in one case you want stronger Ns, and in the other you want stronger Ps, so there's a window of functionality that's bounded at both ends.
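The two conditions above can be written as a window on the keeper current: the combined off-state leakage of all N pulldowns sets the floor, and the on-current of a single pulldown sets the ceiling. Below is a minimal sketch with made-up nominal (no variation) current values; the function name and numbers are hypothetical, and it treats "stronger" as a plain current comparison with no design margin.

```python
def keeper_window(n_inputs: int, i_on: float, i_off: float):
    """Return (min, max) allowed keeper current for a dynamic wide NOR, or None.

    Hold condition: the keeper must out-drive the combined leakage of all
    n_inputs off pulldowns, so I_keeper > n_inputs * i_off.
    Evaluate condition: a single on pulldown must out-drive the keeper,
    so I_keeper < i_on.
    """
    floor = n_inputs * i_off
    ceiling = i_on
    return (floor, ceiling) if floor < ceiling else None


if __name__ == "__main__":
    # Hypothetical per-device currents in uA: 100 on-current, 0.5 off-state leakage.
    for n in (4, 16, 64, 256):
        print(n, keeper_window(n, i_on=100.0, i_off=0.5))
```

With these numbers the 16-input gate has a comfortable window (8 to 100 uA), but the floor rises linearly with fan-in, and the window closes once N reaches 200.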
With a lot of variation, you can end up in a situation where no sizing ratio can guarantee both conditions: sometimes your PMOS will be a little too weak and your NMOS a little too leaky, and sometimes your PMOS will be a little too strong and your NMOS a little too weak. Variation has been getting worse with each new process node. Some variation effects also get worse at lower voltages, and with the modern push for low power, you may make tradeoffs to support a lower Vmin. Intel may have gone static because dynamic was too risky (or too big and slow once you upsize devices enough to tolerate the variation). Static CMOS logic works pretty well so long as your transistors look kinda sorta transistor-like. Dynamic logic's advantage has been shrinking for a long time.
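Variation turns the keeper window into a worst-case corner problem: the keeper has to be sized for the weakest on-pulldown against the strongest keeper, and for the leakiest pulldowns against the weakest keeper, at the same time. The sketch below assumes each device current is scaled by exp(±k·σ) at its corner (a lognormal-style model), that all N pulldowns sit at the worst leakage corner together (pessimistic), and that keeper current scales linearly with keeper width, measured in multiples of a hypothetical unit keeper. All names and parameter values are made up for illustration; a lower Vmin would show up here as larger σ values.

```python
import math


def keeper_width_window(n_inputs, i_on, i_off, i_keeper_unit,
                        sigma_on, sigma_off, sigma_keeper, k=3.0):
    """Worst-case (min, max) keeper width, or None if no width satisfies both conditions.

    Each current is multiplied by exp(+k*sigma) or exp(-k*sigma) at its worst corner.
    Widths are in multiples of a hypothetical unit keeper that supplies i_keeper_unit.
    """
    weakest_on = i_on * math.exp(-k * sigma_on)
    leakiest_total = n_inputs * i_off * math.exp(k * sigma_off)
    strongest_unit_keeper = i_keeper_unit * math.exp(k * sigma_keeper)
    weakest_unit_keeper = i_keeper_unit * math.exp(-k * sigma_keeper)
    # Evaluate: even the strongest keeper must lose to the weakest on pulldown.
    w_max = weakest_on / strongest_unit_keeper
    # Hold: even the weakest keeper must beat the worst-case combined leakage.
    w_min = leakiest_total / weakest_unit_keeper
    return (w_min, w_max) if w_min < w_max else None


if __name__ == "__main__":
    # Hypothetical 16-input NOR. Leakage is given 4x the spread of on-current,
    # since subthreshold leakage depends exponentially on threshold voltage.
    for s in (0.0, 0.1, 0.2):
        print(s, keeper_width_window(16, i_on=100.0, i_off=0.5, i_keeper_unit=20.0,
                                     sigma_on=s, sigma_off=4 * s, sigma_keeper=s))
```

With these numbers the nominal window spans a 12.5x range of keeper widths (0.4 to 5), narrows to roughly 1.8 to 2.7 at σ = 0.1, and vanishes at σ = 0.2: no keeper size satisfies both conditions.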
They may also have gone static to simplify porting to 32nm/22nm, or because they want to use more automated synthesis and place-and-route, which generally work best with static logic gates (ignoring Intrinsity's Fast14 stuff). The more I think about it, the more other possibilities come to mind, but that's good enough for this post.
If there are parts of this post you (or anyone) don't understand, I can attempt to clarify or draw pictures.