I think your assumption is based on a misunderstanding of what HT does. HT is a way of keeping the P4's long pipelines full all the time, so as to reduce the number of cycles the processor wastes. An added benefit is that it covers up how badly the Windows Scheduler sucks. Is it a bad feature? Absolutely not; it adds some very good things in terms of responsiveness when two heavy programs are running at once. It does not, however, make a processor technically faster. So in programs where the pipeline is entirely dedicated to one thread and the CPU is being used at its highest level (such as games), K8 chips blow NetBurst chips out of the water. Even if you use HT to keep the P4's pipelines full, it's still slower than a K8, assuming the scheduling is done right (threads within a single program shouldn't have to fight the Scheduler). Basically, your assumption rests on exposing a weakness in the Athlon, not on finding a strength in the P4, and the P4 will need quite a bit more strength in games to get out of its rut.
Also, you don't seem to realize that a) programmers have to code for the lowest common denominator, and b) the complexity of multithreading increases with each added thread; it's not as simple as 'if you have 2, you can have 4'. By lowest common denominator, I mean that there is only one x86 processor in the world right now capable of running 4 threads simultaneously, and it costs 1000 dollars. Lesson in economics: you don't spend massive amounts of programming time on a feature that only .0001% of people can use. The Pentium Ds will have the same problem with 4 threads that the X2s have right now, so the point is moot. And hopefully, all this dual-core stuff will force MS to take another look at how the Windows Scheduler works.