hyperthreading?

PowerYoga · Oct 27, 2003

Is this Pentium exclusive? Or can AMD do it too? (just asking generally)

rjain · Oct 27, 2003

it is possible to do it on any CPU. The first CPU to have SMT should have been the Alpha EV8, but HP killed that when it bought out Compaq.

Fencer128 · Oct 27, 2003

Originally posted by: rjain
it is possible to do it on any CPU. The first CPU to have SMT should have been the Alpha EV8, but HP killed that when it bought out Compaq.

Whilst no doubt correct (I don't know but it sounds reasonable enough) I think the initial question was referring to "does hyperthreading exist on AMD cpus"? If so, then the answer is unfortunately "no".

Cheers,

Andy

rjain · Oct 27, 2003

I interpreted it as asking whether hyperthreading is possible for AMD to implement on its CPUs. In any case... 🙂

Fallen Kell · Oct 27, 2003

I would interpret it the same way as rjain. Yes, it would be possible to implement, but AMD just hasn't done so. In reality, they might never need to implement it. I remember reading that AMD was experimenting with multi-core processors, which, instead of being hyperthreaded, would be true multi-threaded.

Hyperthreading is not a bad idea, especially with Intel's current line of P4 processors. Analysis has shown that many of the CPU's clock cycles were being wasted on the P4 architecture while waiting for I/O or flushing the processor pipeline to allow a different set of operations to be performed. So they pretty much gave the processor a second pipeline (well, almost: certain parts of the processor can still only perform an operation/action on one of the pipelines at a time, so there are still bottlenecks). This additional pipeline lets the P4 with hyperthreading cut back on the wasted clock cycles by allowing another "thread" of operations to execute while the first thread is stopped because of I/O or a flush (or any number of other reasons that would delay the pipeline).

This gives the impression of multi-threaded hardware. It is not true multi-threading, as that would involve being able to process 2 or more threads concurrently (i.e. at the same time), while hyperthreading can still only actually process one thread at any given time. AMD's multi-core system would be true multi-threaded and not hyperthreaded. Hyperthreading was simply a way to increase performance of the P4 by utilizing more of the wasted clock cycles.

AMD's CPUs have tended to be more efficient than Intel's P4 line in terms of utilizing more of their clock cycles. This is partly due to the pipeline structure, and partly due to allowing more time for each clock cycle to complete. A faster Hz rating means there is less time available for a clock cycle to complete, and with I/O (i.e. reading from memory, or hard drive, etc.) taking a certain amount of time, multiple clock cycles are wasted while waiting for that I/O to complete. With a slightly longer clock cycle, more I/O has a chance to complete in fewer clock cycles, thus not wasting as many cycles as the faster-clocked system might. Hyperthreading was a way to make the P4 more efficient in its operations, and would not necessarily make ALL CPUs more efficient.
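To put rough numbers on the clock-cycle argument, here is a minimal Python sketch. The 100 ns access latency and the two clock speeds are made-up illustrative values, not measurements of any real CPU, and `stall_cycles` is a hypothetical helper.

```python
import math


def stall_cycles(latency_ns: float, clock_ghz: float) -> int:
    """Whole clock cycles that pass while waiting latency_ns at clock_ghz."""
    # nanoseconds * GHz = cycles (1 GHz is one cycle per nanosecond)
    return math.ceil(latency_ns * clock_ghz)


# Hypothetical fixed-latency access, compared at two hypothetical clock speeds
for ghz in (2.0, 3.0):
    print(f"{ghz:.1f} GHz: {stall_cycles(100.0, ghz)} cycles pass during a 100 ns wait")
```

The wall-clock wait is identical in both cases; only the number of cycles it spans grows with the clock speed.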

rjain · Oct 27, 2003

hyperthreading processes multiple threads at once. different execution units are executing instructions from different threads. the parallelism is realized on the level of execution units instead of the whole CPU, as it is in multiprocessing. multi-core CPUs are simply multiprocessing made cheap by integrating both CPUs on one die and possibly sharing caches. the POWER5 is both multi-core and multi-threaded.

Lynx516 · Oct 27, 2003

Fallen Kell, you're right that Hyperthreading would not help an Athlon much. However, your description of the implementation leaves much to be desired.

To understand Hyperthreading you need to understand how a multiple-issue (superscalar) pipeline works.
Basically you have lots of execution units (ALU, FPU, etc.).
And each clock you issue multiple instructions if you can (the P4 can issue 4, IIRC). However, most of the time not all the execution units are used, because there are not enough issues per clock to fill them, or a thread only wants to use the ALUs while the FPUs sit idle.

What Hyperthreading does is take advantage of these free execution units and run another thread in them, thereby increasing efficiency.
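The idea of filling idle execution units with a second thread can be sketched as a toy model in Python. The unit counts in `UNITS` and the per-cycle demands are invented for illustration and do not describe the P4 or any real core; `busy_units` is a hypothetical helper.

```python
# Hypothetical core with two ALUs and two FPUs (not a real CPU's layout)
UNITS = {"ALU": 2, "FPU": 2}


def busy_units(demands):
    """Count execution units kept busy in one cycle.

    demands: one dict per thread, mapping unit type to the number of
    ready instructions that thread wants to issue to it this cycle.
    """
    busy = 0
    for unit, available in UNITS.items():
        wanted = sum(d.get(unit, 0) for d in demands)
        busy += min(wanted, available)  # cannot use more units than exist
    return busy


int_thread = {"ALU": 2}  # integer-only thread, leaves the FPUs idle
fp_thread = {"FPU": 2}   # floating-point-only thread

print(busy_units([int_thread]))              # one thread alone
print(busy_units([int_thread, fp_thread]))   # two complementary threads
print(busy_units([int_thread, int_thread]))  # two threads wanting the same units
```

In this model the complementary pair fills all four units, while two threads that both want the ALUs gain nothing over one thread running alone.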

PentiumIV · Nov 12, 2003

Hyperthreading (aka SMT: Simultaneous Multi-Threading) would require AMD to do a complete redesign/revalidation FROM SCRATCH, as opposed to a multi-core implementation.

The main advantage of hyperthreading is ~30% extra performance for multithreading/multitasking for only ~5% of die area (I may be wrong, but this is the scale). The disadvantage is the EXTREME complexity of thorough validation.

Lynx516 · Nov 12, 2003

It does not require a redesign from scratch at all. All it requires is putting in another PC (Program Counter), which will tell the fetch unit to get another instruction; then the scheduler will schedule them. Just a slight redesign of the fetch unit and another PC. Multi-cores are harder to implement IMHO due to some of the packaging and manufacturing requirements.
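A minimal Python sketch of a fetch unit with one program counter per thread follows. The round-robin policy, the one-instruction-per-cycle limit, and the instruction names are assumptions made for illustration; the post does not say how a real fetch unit chooses between threads.

```python
from itertools import cycle


def fetch(programs, cycles):
    """Fetch one instruction per cycle, alternating between threads.

    programs: one instruction list per thread.
    Each thread keeps its own program counter (PC).
    """
    pcs = [0] * len(programs)  # one PC per hardware thread
    fetched = []
    for _, tid in zip(range(cycles), cycle(range(len(programs)))):
        if pcs[tid] < len(programs[tid]):
            fetched.append((tid, programs[tid][pcs[tid]]))
            pcs[tid] += 1
        # If the thread has run out of instructions, this slot goes unused.
    return fetched


print(fetch([["a0", "a1"], ["b0", "b1"]], 4))
```

Everything downstream of the fetch stage then sees a single interleaved stream tagged with thread ids.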

The performance gains for the P4 are around 20% for about 1% die space. For the Athlon the gain is probably going to be lower, because the Athlon keeps its instruction units nice and filled more of the time than the P4 does (if that English makes sense).

edit: it is true that you would have to revalidate it, but you could easily include Hyperthreading with other core improvements to limit this factor. And it isn't as if you would have to validate a multi-core chip.

rjain · Nov 12, 2003

Intel added SMT to the P4 long after it was released.

Adding more internal registers (for renaming) is important to allowing SMT to do its job most effectively.

Matthias99 · Nov 12, 2003

If things I've read are true, the earlier P4s also have the components for hyperthreading -- they're just disabled except on the HT models. Either it didn't work right/fast enough early on, or the OS support wasn't there, or they just decided for marketing reasons to not have it available right away. *All* new P4s have the HT capability there on the die, but disabled -- just like Celerons are basically P4s with some of the cache disabled, and running at a lower clock. So it's not like Intel decided halfway through the product line to totally redesign the core.

SuperTool · Nov 12, 2003

If true, that would explain the low area increase, if the logic was already there but disabled. What percent of area is the multithreading logic overhead?

rjain · Nov 13, 2003

The only logic needed to support SMT is the stuff that's already part of a pipelined, superscalar, out-of-order processor, other than the stuff already mentioned. That's why it's such a successful technology.

Sahakiel · Nov 14, 2003

I don't really see Hyperthreading as SMT. If you take a look at what each pipeline in the P4 actually does, you'll see what I mean. SMT to me is like preemptive multi-tasking in hardware. Hyperthreading is more like cooperative multi-tasking.

rjain · Nov 14, 2003

Sahakiel: At what level is it cooperative? Are you saying that the pipeline has to explicitly request an instruction from the secondary thread vs. the primary thread? If it wants an instruction from the primary thread but there are none ready to execute, then it stalls?

Sahakiel · Nov 14, 2003

I'm saying that with the P4, there are not enough resources for SMT.
Both floating point pipes do different tasks, which means if you have two threads that need both pipes they have to get in line.
Integer is not much better. While there are two ALUs for fast arithmetic, everything else is limited to one pipe each. Loads, stores, and slower ALU ops are relegated to one ALU each.

The way I see SMT is that even if you have two threads running identical operations you can run them at exactly the same time. This means twice the pipes for every type of operation. That's why I liken it to preemptive multi-tasking, where the holy grail is everything running at the same time with no one thread hogging all the resources. In software, every thread is given dedicated execution time whether it uses it fully or not.

The way I understand HyperThreading, the second thread is only allowed to execute when resources are free. That sounds like cooperative multitasking, where one task hogs the resources it needs until it's done. This way, more resources are in use at any given time, but the problem is that most other operations are effectively frozen.

SMT is like going one step shy of multi-core. Hyperthreading is going one step shy of SMT. It's pretty damn close in that not all threads will require full CPU resources, but with compilers trying to do exactly that, it's not close enough. That's how I see it, but I'll admit I'm still learning a lot so I could be wrong.

rjain · Nov 15, 2003

Sahakiel: The Alpha also has limited execution resources. If there are no more ALUs free for this cycle, no ALU ops can be issued.

SMT most definitely does not allow identical threads to be run at the same time with a 2x speedup. If you're lucky, the threads will manage to interleave their usage of execution resources, but that means that the code didn't have much ILP to begin with, so SMT just lets the CPU use TLP to make up for that.

Sohcan · Nov 15, 2003

Originally posted by: Sahakiel
I'm saying that with the P4, there are not enough resources for SMT.
...
The way I see SMT is that even if you have two threads running identical operations you can run them at exactly the same time. This means twice the pipes for every type of operation. That's why I liken it to preemptive multi-tasking, where the holy grail is everything running at the same time with no one thread hogging all the resources. In software, every thread is given dedicated execution time whether it uses it fully or not.

The way I understand HyperThreading, the second thread is only allowed to execute when resources are free. That sounds like cooperative multitasking, where one task hogs the resources it needs until it's done. This way, more resources are in use at any given time, but the problem is that most other operations are effectively frozen.

Your understanding is a bit off. SMT is defined as the ability to issue instructions from multiple threads within the same cycle, and this is exactly what HT on the P4 does. There are still numerous design decisions involved, so the P4's SMT implementation will differ from another microprocessor's implementation... but it is still SMT, regardless of the back-end configuration or front-end design decisions with respect to SMT. Other design decisions (such as the number of execution units) may affect the impact of SMT, but they don't change its presence.

Keep in mind that only the front-end of the pipeline is thread-aware; to a large extent the back-end is thread-agnostic. Once instructions have gone through register renaming, an instruction from one thread sitting in a scheduling buffer, waiting for its operands to be ready so that it can be executed, is no different from an instruction in the same buffer from another thread. So an instruction stall due to two threads over-utilizing a particular type of execution resource is no different from a single thread with an over-abundance of instruction-level parallelism behaving the same way.
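A thread-agnostic back-end can be sketched in Python as a scheduling buffer whose selection logic never looks at the thread tag. The `MicroOp` fields, the oldest-first policy, and the unit counts are illustrative assumptions, not a description of the P4's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    thread: int  # carried along for bookkeeping; never used for selection
    unit: str    # execution unit type the op needs
    ready: bool  # True once its operands are available


def issue(buffer, free_units):
    """Issue ready micro-ops oldest-first while a matching unit is free."""
    free = dict(free_units)
    issued, waiting = [], []
    for op in buffer:
        if op.ready and free.get(op.unit, 0) > 0:
            free[op.unit] -= 1
            issued.append(op)
        else:
            waiting.append(op)  # stalled: operands not ready or unit busy
    return issued, waiting


# Hypothetical buffer contents holding micro-ops from two threads
buffer = [
    MicroOp(0, "ALU", True),
    MicroOp(1, "ALU", True),
    MicroOp(0, "ALU", True),   # stalls: both ALUs already taken this cycle
    MicroOp(1, "FPU", False),  # stalls: operands not ready
]
issued, waiting = issue(buffer, {"ALU": 2, "FPU": 1})
print([op.thread for op in issued], len(waiting))
```

The third micro-op stalls for lack of a free ALU in exactly the same way whether the competing op came from its own thread or from the other one.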

imgod2u · Nov 15, 2003

Originally posted by: Fallen Kell
This gives the impression of multi-threaded hardware. It is not true multi-threading, as that would involve being able to process 2 or more threads concurrently (i.e. at the same time), while hyperthreading can still only actually process one thread at any given time. AMD's multi-core system would be true multi-threaded and not hyperthreaded. Hyperthreading was simply a way to increase performance of the P4 by utilizing more of the wasted clock cycles.

As I recall, the SMT implementation on the P4 is perfectly capable of processing multiple threads concurrently. The only limitation to this I can see is the single-decoder front-end, which can only decode one x86 instruction per clock. However, I don't think this limits it to only one thread at any given time since, the majority of the time, instructions are issued from the trace cache, not decoded. The trace cache can issue up to 3 micro-ops every cycle (well, 6 micro-ops every 2 cycles), and those 3 micro-ops can be from either of the two threads running.

Originally posted by: Fallen Kell
AMD's CPUs have tended to be more efficient than Intel's P4 line in terms of utilizing more of their clock cycles. This is partly due to the pipeline structure, and partly due to allowing more time for each clock cycle to complete. A faster Hz rating means there is less time available for a clock cycle to complete, and with I/O (i.e. reading from memory, or hard drive, etc.) taking a certain amount of time, multiple clock cycles are wasted while waiting for that I/O to complete. With a slightly longer clock cycle, more I/O has a chance to complete in fewer clock cycles, thus not wasting as many cycles as the faster-clocked system might. Hyperthreading was a way to make the P4 more efficient in its operations, and would not necessarily make ALL CPUs more efficient.

While this is true, it is true of all processors, not just P4 or Athlon. A 2 GHz Athlon would be inherently "less efficient" every clock cycle than a 1.8 GHz Athlon. As you'll notice from the scaling of the Athlon and P4 (Anand's review of the 3.2 P4 showed a 59.65% scaling of the P4 from 3.0 to 3.2 GHz and a 55.93% scaling of the Athlon Barton from 1.83 to 2.20 GHz), this is a problem for all processors.
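The scaling figures can be turned into implied speedups with a short Python sketch. It assumes "scaling" means the fraction of the clock increase that shows up as a performance increase; the post does not define the term, so treat this reading, and the `implied_speedup` helper, as assumptions.

```python
def implied_speedup(clock_from_ghz, clock_to_ghz, scaling):
    """Performance gain implied by a clock bump under the assumed definition:
    scaling = (performance gain) / (clock gain)."""
    clock_gain = clock_to_ghz / clock_from_ghz - 1.0
    return clock_gain * scaling


# Clock speeds and scaling percentages as quoted in the post
p4 = implied_speedup(3.0, 3.2, 0.5965)
athlon = implied_speedup(1.83, 2.20, 0.5593)
print(f"P4: {p4:.1%} faster, Athlon: {athlon:.1%} faster")
```

Under that reading, both chips turn only a little over half of each clock increase into extra performance.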

However, this isn't necessarily all the impact SMT has. Often, even without I/O or memory latency problems, not all resources on the MPU are necessarily utilized. Due to many things such as data dependencies and instruction decode limitations, ILP cannot always be extracted to max out the processor's parallel execution resources (in modern superscalar MPUs). SMT solves this as well.
For the Athlon, SMT could be very helpful, perhaps even more so than for the P4, as the Athlon has many more parallel execution units than the P4 does (3-6 issue front-end and 3-way decoder vs 3-issue front-end and single decoder, 9 parallel issue ports and execution units vs 6 issue ports and 7 execution units). Assuming (and not accurately) that the Athlon is achieving a 33% higher clock-normalized performance, that still doesn't mean it's utilizing the 33% more execution resources it has compared to the P4.

rimshaker · Nov 15, 2003

AMD won't do it; they'd have to do a complete redesign of the Athlon core. HT requires a very long pipeline, which would totally defeat any IPC advantages the Athlon cores have.

Sahakiel · Nov 16, 2003

Originally posted by: Sohcan

Your understanding is a bit off. SMT is defined as the ability to issue instructions from multiple threads within the same cycle, and this is exactly what HT on the P4 does. There are still numerous design decisions involved, so the P4's SMT implementation will differ from another microprocessor's implementation... but it is still SMT, regardless of the back-end configuration or front-end design decisions with respect to SMT. Other design decisions (such as the number of execution units) may affect the impact of SMT, but they don't change its presence.

Keep in mind that only the front-end of the pipeline is thread-aware; to a large extent the back-end is thread-agnostic. Once instructions have gone through register renaming, an instruction from one thread sitting in a scheduling buffer, waiting for its operands to be ready so that it can be executed, is no different from an instruction in the same buffer from another thread. So an instruction stall due to two threads over-utilizing a particular type of execution resource is no different from a single thread with an over-abundance of instruction-level parallelism behaving the same way.

I think I begin to understand now. What HyperThreading does is bring SMT functionality to superscalar architectures, whereas I was thinking that SMT would mean designing multi-threading along the lines of a single CPU with a single front-end and multiple pipelines in non-superscalar fashion.

imgod2u · Nov 16, 2003

Originally posted by: Sahakiel
I think I begin to understand now. What HyperThreading does is bring SMT functionality to superscalar architectures, whereas I was thinking that SMT would mean designing multi-threading along the lines of a single CPU with a single front-end and multiple pipelines in non-superscalar fashion.

Simultaneous Multithreading, as the name implies, is the ability to run multiple threads concurrently. The original term SMT was coined in a paper by Dean Tullsen, Susan Eggers and Hank Levy as I recall. Here's a link. It was specifically said to be the ability to execute multiple threads on a superscalar architecture.

Cashmoney995 · Nov 17, 2003

What's funny is that AMD has the patent to Hyperthreading (or at least running multiple threads) and they don't use it, yet Intel does. In all honesty Intel's HT is baloney. Intel has a better packaging and heatsink division, therefore they can raise the clock speeds higher. However, they cannot process as much data at the same time. Their L1 data and instruction cache is only 20k whereas AMD's is 128k.

AbsolutDealage · Nov 17, 2003

Originally posted by: Cashmoney995
What's funny is that AMD has the patent to Hyperthreading (or at least running multiple threads) and they don't use it, yet Intel does.

References?

Originally posted by: Cashmoney995
In all honesty Intel's HT is baloney.

That's rubbish. Baloney? So how do you explain the real world performance gain?
