
How does hyperthreading work when...

AmberClad

Diamond Member
OK, so I know that with hyperthreading there are two logical procs. So what happens when you run an app that needs more than half the processing capability of the whole CPU (i.e. more than one logical proc can provide)?

Surely it isn't limited to half the frequency. I'm assuming in that case, it would act as a regular single processor with the full frequency, and not two procs. Is this programmed into the OS or the hardware?

Basically started wondering about this question when I was playing a game that seemed to require ~55% CPU utilization.
 
The 2nd logical processor only gets the resources left over by the 1st logical processor.

Let's say that a CPU can do 2 multiplications simultaneously:

Example 1) In some cases, a program will only be able to use 1 of these multiplication units (because there are lots of other commands between the multiply commands, and the commands can't be reordered because the later operations depend on results from earlier ones). In this case, one of the multiply units will be free most of the time, so the 2nd logical proc will be able to use it.

Example 2) Let's say a program is very heavily optimised and can keep both multiply units busy at all times. In this case, if the 2nd logical proc needs to perform a multiply, it can't, because the multiply units are busy. Instead, the 2nd logical processor will stall until a multiply unit becomes available (alternatively, the 1st logical proc will stall to allow the 2nd access to a multiply unit).

Note that because the processors stall, they will appear to the OS to be busy. If I run one copy of Program (2), it will appear to use 100% of 1 proc and 0% of the other (average of 50%), and may benchmark at 1,000,000 operations/sec. However, if I run 2 copies, it will appear to use 100% of both procs (100% average), but the overall speed will be no better: each program may benchmark at 500,000 ops/sec (the CPUs appear to be busy, but in reality they are stalling as the 2 threads fight over the scarce multipliers).
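The two examples above can be mimicked with a toy model. This is only a sketch under made-up assumptions: the function names `simulate` and `reported_utilization`, the cycle count, and the alternating-priority rule are all hypothetical and do not describe how any real CPU arbitrates its execution units. Each thread asks for a fixed number of multiplies per cycle, and two multiply units are shared between whatever threads are running.

```python
def simulate(demands, units=2, cycles=1000):
    """Toy model of logical processors sharing multiply units.

    demands[i] is how many multiplies thread i would like to issue per cycle.
    Returns the number of multiplies each thread completed.
    """
    done = [0] * len(demands)
    n = len(demands)
    for cycle in range(cycles):
        free = units
        # Assumption: priority alternates each cycle so no thread is starved.
        order = [(cycle + i) % n for i in range(n)]
        for t in order:
            granted = min(demands[t], free)  # a thread stalls for what it cannot get
            done[t] += granted
            free -= granted
    return done


def reported_utilization(running_threads, logical_procs=2):
    """What the OS would show: a stalled logical proc still counts as busy."""
    return 100.0 * running_threads / logical_procs


# Example 1: the program can only keep one multiplier busy.
print(simulate([1]), reported_utilization(1))     # one copy
print(simulate([1, 1]), reported_utilization(2))  # two copies, total work doubles

# Example 2: the program keeps both multipliers busy on its own.
print(simulate([2]), reported_utilization(1))     # one copy
print(simulate([2, 2]), reported_utilization(2))  # two copies, total work unchanged
```

In the Example 2 case the OS figure goes from 50% to 100% while the total number of completed multiplies stays the same, which is the point made above about the utilization number being misleading.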
 
Think of a CPU pipeline as an assembly line. Depending on how fast or slow things are moving along, some of the stages along the assembly line might not be doing anything. We can create a second "virtual" assembly line that uses those idle stages. Basically how hyperthreading works is that Intel takes advantage of the fact that many pipeline stages will have idle time here and there. This wouldn't work well on a CPU with fewer pipeline stages, like the AMD Athlon CPUs.
 
Originally posted by: Bassyhead
Think of a CPU pipeline as an assembly line. Depending on how fast or slow things are moving along, some of the stages along the assembly line might not be doing anything. We can create a second "virtual" assembly line that uses those idle stages. Basically how hyperthreading works is that Intel takes advantage of the fact that many pipeline stages will have idle time here and there. This wouldn't work well on a CPU with fewer pipeline stages, like the AMD Athlon CPUs.

No!
An instruction must progress through the entire pipeline. Instructions from other threads cannot skip pipeline stages to fill in 'idle stages'.
The point of Hyperthreading is to keep the execution units busy. By providing the schedulers with two independent instruction streams (threads) to pick instructions from, the chances of the schedulers not being able to find instructions that can be executed in parallel are reduced, and therefore so are the chances of having idling execution units.
Execution width determines suitability for SMT, not pipeline depth!
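A rough way to picture the scheduler argument is to count how many issue slots get filled per cycle. The sketch below is a toy model with invented numbers: the function name `issue_slots_used`, the width of 3, and the per-cycle counts of ready instructions are all assumptions chosen for illustration, and instructions that miss a slot are simply dropped rather than carried over to the next cycle.

```python
def issue_slots_used(ready_by_thread, width=3):
    """Count filled issue slots over a run of cycles.

    ready_by_thread is a list of per-thread lists; each entry is how many
    independent instructions that thread has ready in that cycle.
    """
    used = 0
    for ready_this_cycle in zip(*ready_by_thread):
        # The scheduler may pick from every thread, but is capped by the width.
        used += min(sum(ready_this_cycle), width)
    return used


# Hypothetical ready-instruction counts for six cycles.
thread_a = [1, 3, 0, 2, 1, 0]
thread_b = [2, 0, 1, 1, 3, 1]
total_slots = 3 * len(thread_a)

print(issue_slots_used([thread_a]), "of", total_slots)            # one thread alone
print(issue_slots_used([thread_a, thread_b]), "of", total_slots)  # two threads sharing
```

With these made-up numbers a single thread leaves most slots empty, and a second instruction stream fills many of them. Pipeline depth does not appear anywhere in the model; only the width and the amount of parallel work available per cycle matter.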

Regarding the Athlon 64, it is in my opinion a good candidate for SMT. It is able to decode three instructions per clock, but is only able to sustain an average IPC of three if its Out-Of-Order engine can find three instructions to execute in parallel every clock cycle, which is rarely the case.

With the advent of multicore processors, the importance of core-level SMT has been reduced, certainly from a multitasking perspective.
An often overlooked (or unknown) fact is that the main reason Intel introduced Hyperthreading is not to address any design weakness of the P4, but to encourage software developers to start writing multithreaded applications, allowing performance to scale with additional cores in the future.
 