Originally posted by: carlosd
Originally posted by: intangir
Untrue. While the first wave of Intel's Next-Generation Microarchitecture (Merom/Conroe/Woodcrest) will not have SMT, the later cores will.
That will not be HT; HT will die with Netburst. HT uses coarse-grained multithreading, which is not the same as SMT using multiple cores.
Now you're just abusing the terminology. HT is what Intel marketing calls SMT, nothing more and nothing less. When Intel's new cores get SMT, Intel will call it HT. If you're saying HT is SMT as applied to the Netburst core, well then yes, duh, your definition of HT will go away with Netburst. But that's tautological and pointless to say, so you can't really mean that, can you?
Also, HT is a property of a single core. SMT on a multi-core chip is just the single-core SMT duplicated in each core. There are no new challenges or features that make it different from a single-core implementation.
And finally, HT is simultaneous multithreading, not coarse-grained multithreading. Don't you check your facts at all? Where do you get this misinformation?? Stop spreading it!!

:disgust:
http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars/4
"The scheduler has no idea that it's scheduling code from multiple threads. It simply looks at each instruction in the scheduling queue on a case-by-case basis, evaluates the instruction's dependencies, compares the instruction's needs to the physical processor's currently available execution resources, and then schedules the instruction for execution. To return to the example from our hyper-threading diagram, the scheduler may issue one red instruction and two yellow to the execution core on one cycle, and then three red and one yellow on the next cycle. So while the scheduling queue is itself aware of the differences between instructions from one thread and the other, the scheduler in pulling instructions from the queue sees the entire queue as holding a single instruction stream."
http://intel.com/design/pentium4/manuals/index_new.htm#aorm
"The core can dispatch up to six µops per cycle, provided the µops are
ready to execute. Once the µops are placed in the queues waiting for
execution, there is no distinction between instructions from the two
logical processors. The execution core and memory hierarchy is also
oblivious to which instructions belong to which logical processor."
Or if you're a little slow:
"Figure 1-6 shows a typical bus-based symmetric multiprocessor (SMP)
based on processors supporting Hyper-Threading Technology. Each
logical processor can execute a software thread, allowing a maximum of
two software threads to execute simultaneously on one physical
processor. The two software threads execute simultaneously, meaning
that in the same clock cycle an ?add? operation from logical processor 0
and another ?add? operation and load from logical processor 1 can be
executed simultaneously by the execution engine."
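To put the same idea in code, here's a toy sketch of what "no distinction between instructions from the two logical processors" means. This is my own simplification, not Intel's actual logic: each µop carries a logical-processor tag (the retirement stage needs it to update the right architectural state), but the scheduler never looks at that field when deciding what to issue.

/* Toy model of an SMT issue queue -- my own simplification, not Intel's design.
 * Each uop remembers which logical processor fetched it (needed later for
 * retirement), but the scheduler ignores that field and issues purely on
 * readiness, exactly as the manual describes. */
#include <stdio.h>

struct uop {
    int logical_cpu;   /* 0 or 1: which thread it came from (unused by the scheduler) */
    int ready;         /* 1 if all source operands are available */
    const char *op;
};

/* Issue up to 'width' ready uops per cycle, regardless of thread origin. */
static void issue_cycle(struct uop *queue, int n, int width) {
    int issued = 0;
    for (int i = 0; i < n && issued < width; i++) {
        if (queue[i].ready) {
            printf("issue %s (from logical cpu %d)\n",
                   queue[i].op, queue[i].logical_cpu);
            queue[i].ready = 0;  /* mark as issued */
            issued++;
        }
    }
}

int main(void) {
    struct uop q[] = {
        {0, 1, "add"}, {1, 1, "load"}, {0, 0, "mul"}, {1, 1, "add"},
    };
    issue_cycle(q, 4, 3);  /* a single cycle can mix uops from both threads */
    return 0;
}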
Originally posted by: carlosd
Originally posted by: intangir
One: it is a myth that SMT is not worth it on shorter pipelines or single cores. The Alpha EV8 would have implemented 4-way simultaneous multithreading in a single core with a 9-stage pipeline. They estimated it would have doubled performance with a die-size increase of less than 10%.
The now-dead Alpha architecture was a quite different architecture and cannot be directly compared to x86. I am talking specifically about x86 cores. The EV8 architecture is RISC, and that level of instruction parallelism is much easier to reach, but with a post-RISC design this goal is quite difficult, since the micro- or macro-ops are not directly under the control of software; those are up to the hardware and microcode decoders, whereas in pure RISC architectures instructions are directly under the control of software.
OMG. This is hilarious. That was pure gobbledygook. It has become blindingly obvious you have no clue what you're talking about. Heck, I don't even know what you were trying to say, but I'll try to cover every point you might possibly have been making.
First of all, SMT does not care about instruction-level parallelism. It exploits thread-level parallelism.
Second, SMT does not rely on the software being aware of it to work. It relies on no software hints or support. All the software cares about is that its instructions get executed. The processor automatically takes care of all scheduling and updating of architectural state. With hyperthreading, it just fetches and schedules instructions from two processes at the same time. That's all the software stack has to know.
Third of all, there is no relevant difference between the Alpha's backend and the backend of modern x86 processors. They've pretty much converged. Once the decode and register allocation has been done, the backend just sees a single stream of instructions. The scheduler is free to make whatever scheduling choices it desires among the available execution units. All relevant data dependencies are taken care of by the register allocation. Whether these are RISC ops or x86 uops makes no difference whatsoever.
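And to make the second point concrete: here's what code running on an HT processor looks like. It's a plain pthreads program with zero HT awareness (the thread function and its busy-work loop are just placeholders I made up). The OS schedules the two threads onto logical processors, the core mixes their instructions, and the program never knows or cares.

/* Plain pthreads code with no HT awareness whatsoever -- the OS and the
 * processor decide where the threads run; on an HT core they can share
 * execution resources in the same cycle without the program doing anything
 * special.  Build with: gcc -O2 -pthread example.c */
#include <pthread.h>
#include <stdio.h>

static void *busy_work(void *arg) {
    volatile double x = 1.0;
    for (long i = 0; i < 100000000L; i++)
        x = x * 1.000000001 + 0.000000001;   /* arbitrary FP busy-work */
    (void)arg;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, busy_work, NULL);
    pthread_create(&t2, NULL, busy_work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    puts("done");
    return 0;
}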
Originally posted by: carlosd
Originally posted by: intangir
Two: Proliferations of the Merom core *will* have SMT. It probably will not double the performance, but I know for a fact it will increase it significantly.
I don't think they will have the same kind of multithreading as HT; it would be SMT using multiple cores. Multiple cores + HT = no gains in performance, as you see with the 840XE CPU.
Again, whatever architectural form it takes, it will be called HT by Intel, and it has no relevant differences from SMT. And I don't have an 840XE so I can't test it, but I doubt in the extreme that HT gives it no gains. Dual-processor Xeons have benefited from HT indisputably, as running 4 threads gives more performance than 2. Why would dual-core chips be any different? And the benchmarks for dual-core Xeons with HT show obvious gains.
For a dual-processor dual-core Bensley (Netburst-based Xeon) system (4 cores, 8 threads):
http://www.realworldtech.com/page.cfm?ArticleID=RWT112905011743&p=4
"The Bensley system also scales perfectly to four physical processors (which is quite an achievement), and then gets a 35% boost from Hyper-Threading. At eight threads, the Bensley system executed the kernel 74% faster."
http://www.realworldtech.com/page.cfm?ArticleID=RWT112905011743&p=5
"The Nocona system scales by a factor of 2.44, and the Bensley system by a factor of 4.81."
Scaling by 4.81 across four physical cores works out to roughly a 20% increase from hyperthreading (4.81 / 4 ≈ 1.20).
And stop moving the goalposts. First you complained that HT was useless. Then you said it was only useful in multicore processors. Now you're saying it's only useful in single core processors. Make up your damn mind.
Originally posted by: carlosd
Originally posted by: intangir
Well, if AMD thought as you do, I think Intel has nothing to fear for the next 3 years. You remind me of the people who claim register renaming helps register-starved CISC designs more than RISC, and so is a necessary added cost of designing CISC chips. Well, the fact is, any serious high-performance RISC design also implements register renaming, because the performance gain is worth it.
You are talking about a TOTALLY different issue.
That is what is known as an "analogy". We are debating the cost/benefit of a microarchitectural optimization. I was showing you the historic decisions made concerning an optimization with the same characteristics as HT.
Originally posted by: carlosd
As I said, HT was only useful in the P4, but now with Intel's new designs the same kind of multithreading will not have significant advantages.
I showed you my numbers, you show me yours.
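And if you want to generate your own numbers instead of guessing, here's a rough sketch of how I'd measure it on Linux. It assumes logical CPUs 0 and 1 are HT siblings on one physical core and CPUs 0 and 2 sit on different cores; check /proc/cpuinfo first, since the numbering varies by system and kernel. The kernel() loop is just a stand-in for whatever workload you care about.

/* Rough HT-scaling measurement sketch (Linux, GNU extensions).
 * Assumption: logical CPUs 0 and 1 are HT siblings of one physical core and
 * CPUs 0 and 2 are on different cores -- verify with /proc/cpuinfo first.
 * Compares the wall-clock time of two pinned threads in each configuration.
 * Build with: gcc -O2 -pthread ht_scaling.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

static void *kernel(void *arg) {
    volatile double x = 1.0;
    for (long i = 0; i < 200000000L; i++)
        x = x * 1.000000001 + 0.5;   /* FP busy-work standing in for a real kernel */
    (void)arg;
    return NULL;
}

/* Run two copies of kernel(), one pinned to each given logical CPU,
 * and return the elapsed wall-clock time in seconds. */
static double run_pair(int cpu_a, int cpu_b) {
    pthread_t t[2];
    pthread_attr_t attr[2];
    cpu_set_t set;
    int cpus[2] = { cpu_a, cpu_b };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 2; i++) {
        pthread_attr_init(&attr[i]);
        CPU_ZERO(&set);
        CPU_SET(cpus[i], &set);
        pthread_attr_setaffinity_np(&attr[i], sizeof(set), &set);
        pthread_create(&t[i], &attr[i], kernel, NULL);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("same core (HT siblings): %.2f s\n", run_pair(0, 1));
    printf("separate cores:          %.2f s\n", run_pair(0, 2));
    return 0;
}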