AMD working on reverse Hyper-Threading technology


Gamingphreek

Lifer
Mar 31, 2003
11,679
0
81
although HyperThreading was just a substitute and short lived until Dual Cores were released.

Hyper Threading was no substitute. It was a "We have to cover our asses move".

-Kevin
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Hyperthreading was a realization that SMT can improve performance by up to 40% in some apps. Unfortunately, Intel's implementation was troublesome on the P4 due to the replay bug. I know that as we go to multicore processors people think SMT is now worthless because you can just put the other thread on another core, but the individual cores still stall just as much as before, and SMT can help with that. Basically, I'd say "Hyperthreading" will be coming back to the x86 line at some point because it can increase performance considerably but doesn't require nearly as much space as another core.
 

FelixDeCat

Lifer
Aug 4, 2000
31,295
2,790
126
Originally posted by: Gamingphreek
although HyperThreading was just a substitute and short lived until Dual Cores were released.

Hyper Threading was no substitute. It was a "We have to cover our asses move".

-Kevin

No Kevy,

It was revolutionary. Hyperthreading was extraordinary. It allowed Intel to "Leap Ahead" at the time by allowing multiple threads on a single core. AMD couldn't touch it.
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: FelixDeKat
AMD is taking the right stance - if you cant beat 'em, copy 'em! :laugh:

Ah, so we can expect from AMD a 30-stage pipeline Netburst abomination that heats up your whole house any day now, right?
 

Cooler

Diamond Member
Mar 31, 2005
3,835
0
0
Originally posted by: munky
Originally posted by: FelixDeKat
AMD is taking the right stance - if you cant beat 'em, copy 'em! :laugh:

Ah, so we can expect from AMD a 30-stage pipeline Netburst abomination that heats up your whole house any day now, right?

Well, you never know; they might even fix Intel's heating problem and really create a 10+ GHz chip.
 

FelixDeCat

Lifer
Aug 4, 2000
31,295
2,790
126
Originally posted by: munky
Originally posted by: FelixDeKat
AMD is taking the right stance - if you cant beat 'em, copy 'em! :laugh:

Ah, so we can expect from AMD a 30-stage pipeline Netburst abomination that heats up your whole house any day now, right?

I wouldn't put money on it.
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: Gamingphreek
although HyperThreading was just a substitute and short lived until Dual Cores were released.

Hyper Threading was no substitute. It was a "We have to cover our asses move".

-Kevin

No, HT and SMT in general are revolutionary. Conroe-based derivatives WILL have some kind of HT implementation in the future, as will most CPUs.

Like it or not, Pentium-4 was well ahead of its time, in terms of CPU technology. The main drawbacks were:

- Memory technology (because Rambus left the desktop memory market) was stagnant and could not provide the necessary bandwidth to feed the Pentium-4
- The shared FSB that could also not feed the Pentium-4.
- Too much leakage and bad design on Prescott that prevented high scaling.

Researchers at IBM concluded the "optimal" pipeline length is 50 (yes 50). Researchers at Intel concluded the "optimal" pipeline length to be 30.

---

In reference to Mitosis, I've heard of the theory, but in practice it's at least years away from implementation. SMT was a theory for years before it actually came to fruition with the Northwood HT core/Power-4 SMT core. Even now, SMT is still early in development and buggy.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
This makes sense to me. CPUs already have multiple execution units, so it should be possible to extend that across dual cores; the only problem is the speed of transmission between the two. They'd either need some pretty intelligent preordering, or they're going to have to speed up the HyperTransport memory crossbar considerably. Or maybe they could just combine the caches like Intel did, or maybe they'll somehow combine the cores.
 

DrMrLordX

Lifer
Apr 27, 2000
23,240
13,327
136
Originally posted by: RichUK
I was referring to AMD's "actual" implementation of new technology, aka an on-die memory controller and the use of HT links, and therefore not carrying on using inferior data transport models, such as a single I/O bus for all communication to the CPU, namely the FSB! Yes, Intel may patent quite a few technologies, but that doesn't mean they go ahead and implement them. AMD are innovative as they implemented these technologies and have been leading by example, not following behind in Intel's footsteps.

I agree with this point in general. Given the IP-sharing agreement between Intel and AMD, it's hard to tell which advance in CPU engineering can be credited to whom these days. In the end, it boils down to whoever can release a working implementation of a technological advancement in a fashion that is advantageous to users.

VT, for example, has to be credited to Intel. It was first to market (Pacifica was not). If AMD can get this "reverse Hyperthreading" to work, they get the innovator crown, Mitosis or no Mitosis.
 

Madwand1

Diamond Member
Jan 23, 2006
3,309
0
76
Why do these threads have to always degenerate into Intel vs. AMD wars? More interesting is how this is actually supposed to work, the role of the compilers, and the role of the developers, and what performance gains may be seen, what performance costs.



 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
I'm very interested in how they would handle the latency issues inherent in a scheme like this. I also wonder how fine- or coarse-grained their approach is. I remember reading some sort of claim from Toshiba/Fujitsu(?) of nearly double the performance when they simulated an automatic multithreading dual core or something like that. That was just a simulation though.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Originally posted by: Madwand1
Why do these threads have to always degenerate into Intel vs. AMD wars? More interesting is how this is actually supposed to work, the role of the compilers, and the role of the developers, and what performance gains may be seen, what performance costs.

There were some performance notes and a lot of implementation details and compiler details about the Intel research in the Intel article quoted by TuxDave. How this may or may not relate to the AMD implementation, I don't know.
 

Gamingphreek

Lifer
Mar 31, 2003
11,679
0
81
Originally posted by: FelixDeKat
Originally posted by: Gamingphreek
although HyperThreading was just a substitute and short lived until Dual Cores were released.

Hyper Threading was no substitute. It was a "We have to cover our asses move".

-Kevin

No Kevy,

It was revolutionary. Hyperthreading was extraordinary. It allowed Intel to "Leap Ahead" at the time by allowing multiple threads on a single core. AMD couldn't touch it.

Originally posted by: dexvx
No, HT and SMT in general are revolutionary. Conroe-based derivatives WILL have some kind of HT implementation in the future, as will most CPUs.

Like it or not, Pentium-4 was well ahead of its time, in terms of CPU technology. The main drawbacks were:

- Memory technology (because Rambus left the desktop memory market) was stagnant and could not provide the necessary bandwidth to feed the Pentium-4
- The shared FSB that could also not feed the Pentium-4.
- Too much leakage and bad design on Prescott that prevented high scaling.

Researchers at IBM concluded the "optimal" pipeline length is 50 (yes 50). Researchers at Intel concluded the "optimal" pipeline length to be 30.

Revolutionary in the sense that it was a cover up. HT is merely a cover up for an extremely long instruction pipeline. It is revolutionary in that it effectively increases performance on a long pipeline.

HT is not the godsend you are making it out to be. It does nothing for, and even hinders performance on, efficient processors (i.e., short instruction pipelines).

-Kevin
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: Gamingphreek
Revolutionary in the sense that it was a cover up. HT is merely a cover up for an extremely long instruction pipeline. It is revolutionary in that it effectively increases performance on a long pipeline.

HT is not the godsend you are making it out to be. It does nothing for, and even hinders performance on, efficient processors (i.e., short instruction pipelines).

-Kevin

HT/SMT covering up for a long pipelined Pentium-4 is a myth.

I'll give a very easy counter-argument to debunk that myth. The IBM Power4s have shorter pipelines, and yet carry their own SMT. The Sun Niagara is also a very short pipelined CPU, and it also carries its own SMT. The Itanium-2 Montecito has a mere 10 stage long pipeline, but also carries its own SMT. SMT is not a "patch" for long pipelined CPUs.

HT/SMT will also not hinder performance (beyond the margin of error) on single-threaded applications when properly implemented. The Pentium-4 in its entirety is just a first-generation implementation of SMT by Intel. The main causes of degraded single-threaded performance came from cache thrashing in its very small L1, along with a small L2 in the Northwood/Foster core. Since Conroe is a NGMA, it will take some time before it receives its own version of HT.

SMT and thread-level parallelism is the future.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: Gamingphreek
Revolutionary in the sense that it was a cover up. HT is merely a cover up for an extremely long instruction pipeline. It is revolutionary in that it effectively increases performance on a long pipeline.
It makes up performance for any CPU that has execution resources not fully utilized by a single thread, which happens in nearly all applications.

HT is not the godsend you are making it out to be. It does nothing for, and even hinders performance on, efficient processors (i.e., short instruction pipelines).
You don't know that, but if the K8 was so efficient, it wouldn't need an IMC. But it does, in order to reduce the amount of time the core is stalled waiting for data. HT does something similar: it reduces the amount of time the core is stalled by allowing it to work on another thread while it's waiting for data. If HT and SMT only worked on long-pipeline architectures, why do all three major server CPU manufacturers, IBM, Intel and Sun, already have or are introducing multi-threading processors?
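
A rough way to put numbers on the stall argument, as a sketch only: assume each hardware thread is independently stalled on memory for some fraction of cycles, and that the core does useful work whenever at least one thread is ready. The function name and the 40% stall figure are made up for illustration; they are not measurements from any real CPU.

```python
def core_utilization(stall_fraction: float, threads: int) -> float:
    """Fraction of cycles in which at least one hardware thread is ready.

    Toy assumption: every thread is stalled independently with the same
    probability, and one ready thread is enough to keep the core busy.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # The core idles only when every thread is stalled at the same time.
    return 1.0 - stall_fraction ** threads


if __name__ == "__main__":
    # Hypothetical workload: each thread waits on memory 40% of the time.
    for n in (1, 2):
        print(n, "thread(s):", round(core_utilization(0.4, n), 2))
```

The model ignores the costs of sharing a core (cache thrashing, extra latency), so treat it as an upper bound on what a second thread could recover, not a prediction.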

 

Gamingphreek

Lifer
Mar 31, 2003
11,679
0
81
Originally posted by: dexvx
I'll give a very easy counter-argument to debunk that myth. The IBM Power4s have shorter pipelines, and yet carry their own SMT. The Sun Niagara is also a very short pipelined CPU, and it also carries its own SMT. The Itanium-2 Montecito has a mere 10 stage long pipeline, but also carries its own SMT. SMT is not a "patch" for long pipelined CPUs.

None of which are x86.

You cannot compare the two.

Originally posted by: Accord99
You don't know that, but if the K8 was so efficient, it wouldn't need an IMC. But it does, in order to reduce the amount of time the core is stalled waiting for data. HT does something similar: it reduces the amount of time the core is stalled by allowing it to work on another thread while it's waiting for data. If HT and SMT only worked on long-pipeline architectures, why do all three major server CPU manufacturers, IBM, Intel and Sun, already have or are introducing multi-threading processors?

Do you even know what you are talking about, or are you just feeding off of dexvx?

First off, I never said that you COULDN'T use HT on a short core; it just doesn't make sense. Apparently you don't know how it works.

In a long pipeline one packet is sent through. It takes so long to go through that much of the pipeline is not working and is idle. Additionally, if there were a cache miss then it would have to do the same thing over again. HT staggers packets and sends another behind the first packet. Therefore the pipeline is working as close to theoretical as possible all the time.

In a short pipeline the pipeline is in use most of the time. There would be no point in trying to jam another packet in there; it would only delay the other packets. Additionally, a cache miss or a branch misprediction is MUCH MUCH less costly on a short pipeline.

If you understand HT this is common knowledge...

-Kevin
 

FelixDeCat

Lifer
Aug 4, 2000
31,295
2,790
126
Originally posted by: dexvx
Originally posted by: Gamingphreek
Revolutionary in the sense that it was a cover up. HT is merely a cover up for an extremely long instruction pipeline. It is revolutionary in that it effectively increases performance on a long pipeline.

HT is not the godsend you are making it out to be. It does nothing for, and even hinders performance on, efficient processors (i.e., short instruction pipelines).

-Kevin

HT/SMT covering up for a long pipelined Pentium-4 is a myth.

I'll give a very easy counter-argument to debunk that myth. The IBM Power4s have shorter pipelines, and yet carry their own SMT. The Sun Niagara is also a very short pipelined CPU, and it also carries its own SMT. The Itanium-2 Montecito has a mere 10 stage long pipeline, but also carries its own SMT. SMT is not a "patch" for long pipelined CPUs.

HT/SMT will also not hinder performance (beyond the margin of error) on single-threaded applications when properly implemented. The Pentium-4 in its entirety is just a first-generation implementation of SMT by Intel. The main causes of degraded single-threaded performance came from cache thrashing in its very small L1, along with a small L2 in the Northwood/Foster core. Since Conroe is a NGMA, it will take some time before it receives its own version of HT.

SMT and thread-level parallelism is the future.


I completely agree; this myth has been foisted on us by those who do not fully realize its brilliance, by those who must push their own myth that "amd rulzored". It didn't until X2 came around. That's why the Intel P4 Northwood w/Hyperthreading kicked ass when it came to encoding: because of the efficient use of processor resources.

/thread.
 

RichUK

Lifer
Feb 14, 2005
10,341
678
126
Well, let's hope for Intel's sake that, if they do indeed intend to implement another spin of SMT, Microsoft's Vista OS scheduler is a lot more efficient than its current form in XP.

With the current Intel dual-core XE chips (955 XE etc.) and the current Windows XP scheduler, the OS wasn't able to work out which of the 4 logical cores were real and which were virtual. This resulted in multi-threaded apps or multiple programs running on one core utilising SMT, rather than using both of the physical processing cores first in SMP. That obviously results in the effective utilisation of a normal, equivalently clocked P4 with HT rather than a dual core.

Maybe there will be a way for an MS OS to actually identify the physical cores of a processor (or maybe it already can?) and give priority to these physical cores before SMT kicks in. If not, this would render SMT useless unless you fully loaded the OS and processor with simultaneous tasks. After this argument, quad cores and multi cores will arrive, so I don't think Intel has made any room for SMT in the near future. Yes it is a useful technique to implement for the right sort of processor (netburst derivative), but not a technique that is needed to help excel an already efficient design, "Conroe" for example.
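
The "physical cores first" policy described above could look something like the sketch below. This is purely illustrative and is not how any Windows scheduler is actually written; the function name, the (logical_id, physical_core_id) tuples and the 2-core/4-thread topology are all assumptions made up for the example.

```python
from collections import Counter


def place_threads(num_threads, logical_cpus):
    """Pick logical CPUs for threads, filling idle physical cores first.

    logical_cpus: list of (logical_id, physical_core_id) pairs (hypothetical
    topology description). Returns the chosen logical ids in placement order.
    """
    load = Counter()   # threads already placed on each physical core
    used = set()       # logical CPUs that already have a thread
    placement = []
    for _ in range(num_threads):
        free = [(lid, core) for lid, core in logical_cpus if lid not in used]
        if not free:
            break  # more threads than logical CPUs; the rest would queue
        # Prefer the logical CPU whose physical core is least loaded,
        # breaking ties by the lowest logical id.
        lid, core = min(free, key=lambda c: (load[c[1]], c[0]))
        used.add(lid)
        load[core] += 1
        placement.append(lid)
    return placement


if __name__ == "__main__":
    # Hypothetical dual-core chip with two hardware threads per core.
    topology = [(0, 0), (1, 0), (2, 1), (3, 1)]
    print(place_threads(2, topology))  # two threads land on different cores
    print(place_threads(4, topology))  # SMT siblings are used only afterwards
```

A topology-blind scheduler is the case complained about above: it could put two threads on logical CPUs 0 and 1, which share one physical core, and leave the second core idle.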

In all honesty I really can't see what use an SMT implementation is going to have on any future processors. From what I can see with Intel's future chips, there isn't going to be any need for SMT on its shorter-pipeline processors.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: Gamingphreek


Do you even know what you are talking about, or are you just feeding off of dexvx?

First off, I never said that you COULDN'T use HT on a short core; it just doesn't make sense. Apparently you don't know how it works.

In a long pipeline one packet is sent through. It takes so long to go through that much of the pipeline is not working and is idle. Additionally, if there were a cache miss then it would have to do the same thing over again. HT staggers packets and sends another behind the first packet. Therefore the pipeline is working as close to theoretical as possible all the time.
I advise you to do some reading. Each pipeline stage operates on a different instruction at the same time. There's no difference in the theoretical throughput of a 1 stage pipeline and a 100 stage pipeline. The latency in cycles to execute an instruction will be higher for the 100 stage pipeline, but then it will also clock higher.

You don't send one instruction through the pipeline until it's done; you send them every cycle, and both the P4 and K8 can issue up to 3 instructions per cycle. This limit is rarely reached in practice by a single application, hence the ability of SMT to help.
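
To make the throughput point concrete, here is a minimal sketch of an ideal pipeline with no stalls, no mispredictions and no dependencies (a big assumption). The function names and parameters are made up for the example; only the "up to 3 instructions per cycle" figure comes from the discussion.

```python
def cycles_to_finish(num_instructions: int, stages: int, issue_width: int = 1) -> int:
    """Cycles for an ideal, stall-free pipeline to complete all instructions."""
    if num_instructions <= 0:
        return 0
    # Number of issue groups, rounded up (ceiling division).
    groups = -(-num_instructions // issue_width)
    # The first group needs `stages` cycles to travel the whole pipeline;
    # every later group finishes one cycle after the group ahead of it.
    return stages + groups - 1


def instructions_per_cycle(num_instructions: int, stages: int, issue_width: int = 1) -> float:
    """Average throughput over the whole run."""
    return num_instructions / cycles_to_finish(num_instructions, stages, issue_width)


if __name__ == "__main__":
    n = 1_000_000
    for depth in (1, 20, 100):
        print(depth, "stages:", instructions_per_cycle(n, depth, issue_width=3))
```

Depth only adds a one-time fill cost of `stages - 1` cycles, so over a long run the throughput of all three depths approaches the issue width. The differences people argue about come from stalls and flushes, which this model deliberately leaves out.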

Additionally, a cache miss or a branch misprediction is MUCH MUCH less costly on a short pipeline.
Depends. The long-pipeline design can usually clock faster, and while it may take more cycles to recover, each cycle takes less absolute time.
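
As a back-of-the-envelope sketch of that trade-off: recovery time in nanoseconds is just the number of cycles lost divided by the clock in GHz. The cycle counts and clock speeds below are hypothetical round numbers picked for illustration, not figures for any specific P4 or K8.

```python
def flush_penalty_ns(recovery_cycles: int, clock_ghz: float) -> float:
    """Wall-clock cost of one pipeline flush, in nanoseconds."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    # One cycle at f GHz lasts 1/f nanoseconds.
    return recovery_cycles / clock_ghz


if __name__ == "__main__":
    # Hypothetical short pipeline: fewer cycles lost, but a slower clock.
    print("short:", flush_penalty_ns(12, 2.0), "ns")
    # Hypothetical long pipeline: more cycles lost, but a faster clock.
    print("long: ", flush_penalty_ns(20, 4.0), "ns")
```

With these made-up numbers the long pipeline actually recovers sooner in absolute time; with a smaller clock advantage it would not, which is why the answer is "depends".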

If you understand HT this is common knowledge...
It's quite clear you don't understand HT.

 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: RichUK
Yes it is a useful technique to implement for the right sort of processor (netburst derivative), but not a technique that is needed to help excel an already efficient design, "Conroe" for example.
Conroe is "efficient", but it's also very powerful with a large number of integer and FP execution resources. This also works well with SMT.
 

RichUK

Lifer
Feb 14, 2005
10,341
678
126
Originally posted by: Accord99
Originally posted by: RichUK
Yes it is a useful technique to implement for the right sort of processor (netburst derivative), but not a technique that is needed to help excel an already efficient design, "Conroe" for example.
Conroe is "efficient", but it's also very powerful with a large number of integer and FP execution resources. This also works well with SMT.

I don't understand HT well enough to say for sure whether Conroe will or will not benefit from the use of SMT. I assumed not, due to the general consensus about shorter-pipelined processors and my current understanding. Anyway, Conroe is going to be a 4-issue core, from what I have read.

If so would that be where such benefit would come in, being that it can issue more instructions to the pipeline, and thus efficiently utilise SMT?

Again the OS will need to be able to manage these threads efficiently otherwise it would go to waste.
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: Gamingphreek
None of which are x86.

You cannot compare the two.

The instruction set of a chip has nothing to do with fundamental computer architecture.

Originally posted by: Gamingphreek
Do you even know what you are talking about, or are you just feeding off of dexvx?

First off, I never said that you COULDN'T use HT on a short core; it just doesn't make sense. Apparently you don't know how it works.

Apparently, you have no idea how SMT works. Whether it's long or short pipelined makes no difference in SMT ideology.

Originally posted by: Gamingphreek
In a long pipeline one packet is sent through. It takes so long to go through that much of the pipeline is not working and is idle. Additionally, if there were a cache miss then it would have to do the same thing over again. HT staggers packets and sends another behind the first packet. Therefore the pipeline is working as close to theoretical as possible all the time.

In a short pipeline the pipeline is in use most of the time. There would be no point in trying to jam another packet in there; it would only delay the other packets. Additionally, a cache miss or a branch misprediction is MUCH MUCH less costly on a short pipeline.

If you understand HT this is common knowledge...

I suggest you re-read the fundamentals of SMT/HT if you really think that is what HT/SMT is.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: dexvx


Apparently, you have no idea how SMT works. Whether it's long or short pipelined makes no difference in SMT ideology.

True...but it does make a difference on the rewards/cost ratio.
SMT (both Intel's and IBM's) adds both a die penalty (~5-6%) and a latency penalty. If it were only the die penalty, I would bet that AMD would have instituted one of those SMT technologies. But latency penalties are death for current Athlon designs (and I would bet the same is true for upcoming Conroe designs at sub-3 GHz), while on higher-clocked/less-efficient chips like Netburst, we've seen that latency is less important than the advantages HT can bring at times.
 

spittledip

Diamond Member
Apr 23, 2005
4,480
1
81
Quite honestly, when I first heard about dual core CPUs, I assumed that this is what they would be doing. It only makes sense, don't you think? I was disappointed when I found out they were mainly about multitasking.