Originally posted by: BitByBit
Neos:
I wasn't implying HT gives the P4 an unfair advantage.
I was merely stating that without HT, the P4 would have no multitasking advantage over the Athlon. I've seen many posts that claim or imply that the P4 is just a better multitasker, with or without HT, which isn't true.
Zebo:
I'd imagine the idea that HT only benefits deep pipelines has arisen from the fact that deeper pipelines take longer to flush and refill than shorter pipelines.
While this is true, context switching latency actually depends more on memory interface speed than on a processor's instruction latency. A 3.0GHz Prescott takes around 10ns for an instruction to pass through its pipeline (31 stages at roughly 0.33ns per cycle), while a memory access takes many times that.
What HT allows is for two threads to run concurrently without the processor having to continually switch between them, since each switch would involve memory access.
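A rough back-of-the-envelope calculation makes the latency point concrete. This is a sketch, not a measurement: the 31-stage depth is the commonly quoted figure for Prescott, and the 100ns memory access time is a hypothetical round number chosen only for illustration.

```python
CLOCK_GHZ = 3.0           # 3.0GHz Prescott, as in the post
PIPELINE_STAGES = 31      # commonly quoted Prescott pipeline depth
DRAM_LATENCY_NS = 100.0   # hypothetical round figure for one main-memory access


def pipeline_refill_ns(stages, clock_ghz):
    """Time for an instruction to traverse the whole pipeline: stages x cycle time."""
    return stages / clock_ghz  # 1 / GHz gives the cycle time in ns


refill = pipeline_refill_ns(PIPELINE_STAGES, CLOCK_GHZ)
print(f"pipeline refill: {refill:.1f} ns")                   # 10.3 ns
print(f"one DRAM access: {DRAM_LATENCY_NS:.0f} ns")          # 100 ns
print(f"ratio:           {DRAM_LATENCY_NS / refill:.1f}x")   # 9.7x
```

Even with this assumed memory latency, a single trip to memory costs roughly ten pipeline refills, which is why avoiding the switch matters more than pipeline depth.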
The second benefit of HT comes when running multithreaded applications.
All modern processors are superscalar. Superscalar execution allows a processor to execute more than one instruction simultaneously, but only if the pick stage(s) can find independent instructions to execute in parallel. When it can't, some execution units sit idle.
HT simply gives the processor more instructions to pick from, reducing the probability that it can't find two or more instructions to execute in parallel. This is what has given the P4 up to a 20% performance boost when running multithreaded applications.
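A toy scheduler can illustrate the pick-stage argument. This is an idealised sketch, assuming a one-cycle latency for every instruction and a pick stage that can see the whole instruction stream; `issue_cycles` and `smt_merge` are hypothetical helpers, not a model of the actual P4 scheduler.

```python
def issue_cycles(deps, width):
    """Return (cycles, idle_slots) for issuing a dependency graph.

    deps[i] lists the earlier instructions that instruction i depends on.
    Idealised: 1-cycle latency, unlimited window, no cache misses.
    """
    issued_at = {}                 # instruction index -> cycle it issued in
    pending = list(range(len(deps)))
    cycle = idle = 0
    while pending:
        # ready = every producer issued in an earlier cycle; take up to `width`
        picked = [i for i in pending
                  if all(issued_at.get(d, cycle) < cycle for d in deps[i])][:width]
        for i in picked:
            issued_at[i] = cycle
            pending.remove(i)
        idle += width - len(picked)  # slots the pick stage could not fill
        cycle += 1
    return cycle, idle


def smt_merge(thread_a, thread_b):
    """Interleave two equal-length threads into one stream, renumbering deps.

    Models a pick stage that sees instructions from both threads at once.
    """
    merged = []
    for da, db in zip(thread_a, thread_b):
        merged.append([2 * d for d in da])      # thread A takes even slots
        merged.append([2 * d + 1 for d in db])  # thread B takes odd slots
    return merged


# Two independent dependency chains per thread: 0->2->4 and 1->3->5
thread = [[], [], [0], [1], [2], [3]]

single = issue_cycles(thread, width=3)
smt = issue_cycles(smt_merge(thread, thread), width=3)
print("one thread:         ", single)                  # (3, 3)
print("two threads, no SMT:", 2 * single[0], "cycles")  # 6 cycles
print("two threads, SMT:   ", smt)                     # (4, 0)
```

Here a single thread leaves one of the three issue slots empty every cycle; with both threads visible to the pick stage, every slot is filled and the pair finishes in 4 cycles instead of 6. Real gains are much smaller (the up to 20% quoted above) because HT threads also share caches, buffers and other resources, which this model ignores.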
'Wider' designs, such as the Athlon, would benefit from SMT (simultaneous multithreading, of which HT is Intel's implementation) at least as much as the P4 has. To utilise its execution units efficiently, the Athlon must find more instructions to execute in parallel. With SMT, it would have more instructions to pick from, and would therefore be able to sustain a higher execution rate.
As far as I know, Intel intends to enable HT on Conroe, which is supposedly going to be a 4-issue design (wider than the Athlon).
That makes sense, because without HT, it would have a hard time keeping those execution units busy.
I doubt Dothan/Yonah will ever make use of HT. Despite Dothan's high IPC, it is in fact a 'narrow' but already highly efficient design.
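The width argument can be reduced to one formula: the issue rate is capped by whichever is smaller, the core's width or the parallelism its threads expose. The sketch below assumes a hypothetical per-thread ILP of 2 and ignores every shared-resource effect, so it shows only an upper bound on what SMT could add at each width; `issue_rate` is an illustrative helper, not a real model of any processor.

```python
ILP = 2  # hypothetical independent instructions each thread exposes per cycle


def issue_rate(width, threads, ilp_per_thread):
    """Idealised instructions issued per cycle.

    Assumes each thread always offers `ilp_per_thread` ready instructions,
    and ignores shared caches, buffers and branch mispredictions.
    """
    return min(width, threads * ilp_per_thread)


for width in (2, 3, 4):
    one = issue_rate(width, 1, ILP)
    two = issue_rate(width, 2, ILP)
    print(f"width {width}: 1 thread {one}/cycle, 2 threads {two}/cycle, "
          f"SMT gain {100 * (two / one - 1):.0f}%")
```

In this toy model the 2-wide core gains nothing from a second thread, while the 3-wide and 4-wide cores gain 50% and 100%. The widths are generic and are not meant to represent any particular processor.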