Time has not told (and writing it in bold wouldn't make it true either). Engineers talk about executing code optimized for the CPU in question, not for P6, Atom or SB. That hasn't even been tested, except with one single app (Cray, as already linked above):
http://ht4u.net/reviews/2011/amd_bulldozer_fx_prozessoren/index17.php
Note: x264 hasn't been recompiled. They used the quickly optimized apps provided by the x264 developer, who got access to a BD only weeks ago (first via terminal, later as a real chip on his desk).
HT4U even tested the effect of running one thread on a module compared to two threads on a module:
http://ht4u.net/reviews/2011/amd_bu.../index16.php?dummy=&advancedFilter=false&prod[]=AMD+FX-8150+[1+Modul%2C+1+Threads]&prod[]=AMD+FX-8150+[1+Modul%2C+2+Threads]
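For anyone who wants to repeat that kind of one-thread vs. two-thread comparison on their own machine, here is a rough sketch (Python, Linux only). It is not how HT4U did it. It assumes the OS numbers the two cores of module m as logical CPUs 2*m and 2*m + 1, which you should verify against your actual topology, and the helper names are made up for illustration.

```python
import os


def cpus_for_module(module, threads):
    """Return the logical CPUs for a 1- or 2-thread test on one module.

    ASSUMPTION: the two cores of module m are enumerated as logical
    CPUs 2*m and 2*m + 1. Check your topology before trusting this.
    """
    if threads not in (1, 2):
        raise ValueError("a module runs either one or two threads")
    first = 2 * module
    return {first} if threads == 1 else {first, first + 1}


def pin_to_module(module, threads):
    """Restrict the current process to the chosen CPUs (Linux only)."""
    os.sched_setaffinity(0, cpus_for_module(module, threads))


def module_scaling(throughput_1t, throughput_2t):
    """Throughput of two threads on one module relative to one thread."""
    return throughput_2t / throughput_1t
```

Call pin_to_module(0, 1), run your benchmark, then repeat with pin_to_module(0, 2) and feed both measured throughputs into module_scaling. A result close to 2.0 would mean the second thread on the module costs almost nothing; a result close to 1.0 would mean the shared resources are the bottleneck for that workload.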
Instead of wildly exaggerating, people should start to think. But I guess this is buried deep in our lower-level nervous system.