hyperthreading?


Sohcan

Platinum Member
Oct 10, 1999
Originally posted by: Cashmoney995
What's funny is that AMD has the patent to Hyperthreading (or at least running multiple threads) and they don't use it, yet Intel does.
AMD holds a patent on a form of coarse-grained multithreading for fast context switches, not SMT. Hardware multithreading techniques have existed for over two decades; numerous companies hold multithreading patents, including Intel.

In all honesty Intel's HT is baloney.
Perhaps you should read this thread.

Intel has a better packaging and heatsink division, so they can raise the clock speeds higher.
Yikes, do you really think that is what enables higher clock speed?

Their L1 data and instruction cache is only 20 KB, whereas AMD's is 128 KB.
The P4 has an 8 KB L1 data cache in addition to a 12K-entry instruction trace cache. The trace cache achieves a hit rate similar to that of a traditional 32 KB L1 instruction cache. As for the L1 data cache, the P4's 8 KB L1D is smaller than the Athlon's 64 KB L1D, but it has higher set-associativity (4-way on the P4, 2-way on the Athlon) and a lower access time (2 cycles on the P4, 3 cycles on the Athlon). The P4's L1D has about twice the miss rate of the Athlon's, yet it can be accessed faster and is backed by a lower-latency L2 cache. It's a design decision; don't assume there isn't a reason for it. In the end, the P4 and Athlon achieve similar average access times (in cycles) with their on-die (L1 + L2) cache hierarchies... each uses a cache hierarchy that suits the microprocessor.
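A rough way to see this trade-off is average memory access time: AMAT = L1 hit time + L1 miss rate × L2 latency. The sketch below assumes every L1 miss hits in the on-die L2. The hit times (2 and 3 cycles) come from the post; the miss rates and L2 latencies are placeholders picked only to illustrate the arithmetic, not measured P4 or Athlon figures.

```python
def on_die_amat(l1_hit_cycles, l1_miss_rate, l2_hit_cycles):
    """Average access time in cycles for an L1 backed by an on-die L2.

    Assumes every L1 miss hits in L2 (off-die misses are ignored,
    matching the on-die L1 + L2 comparison in the post).
    """
    return l1_hit_cycles + l1_miss_rate * l2_hit_cycles

# Hypothetical inputs: the "P4-like" cache misses twice as often but hits
# faster and has a faster L2; the "Athlon-like" cache is the reverse.
p4_like = on_die_amat(l1_hit_cycles=2, l1_miss_rate=0.10, l2_hit_cycles=10)
athlon_like = on_die_amat(l1_hit_cycles=3, l1_miss_rate=0.05, l2_hit_cycles=12)
print(f"P4-like: {p4_like:.2f} cycles, Athlon-like: {athlon_like:.2f} cycles")
```

With these made-up inputs the two land within a cycle of each other despite the 2x difference in miss rate, because the hit time is paid on every access while the L2 latency is paid only on misses. These are cycle counts; comparing in nanoseconds would also require each chip's clock speed.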

However, they cannot process as much data at the same time.
Again, don't assume there isn't a reason for the design decisions. And don't assume that the P4 was the first microprocessor to take the higher-clock, lower-IPC route versus the lower-clock, higher-IPC route. This "speedracer" vs. "brainiac" paradigm was made famous in the early-to-mid 90s, when the Alpha EV4 and EV5 held a 2X to 3X clock speed advantage over their similarly performing HP PA-RISC and IBM POWER rivals. The 200 MHz EV4 was about 10% faster than the 66 MHz POWER2 and 20% faster than the 100 MHz PA-RISC 7100. Performance is what matters in the end; don't get so hung up on the design decisions made to achieve it.
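The speedracer/brainiac comparison is the usual "iron law" bookkeeping: for a fixed instruction count, performance is proportional to clock × IPC (equivalently, clock / CPI). The sketch below backs out the IPC ratio implied by the figures in the post. It assumes both chips execute the same number of instructions for the benchmark, which does not strictly hold across different ISAs, so treat the result as a rough estimate.

```python
def implied_ipc_ratio(clock_fast_mhz, clock_slow_mhz, speedup_fast_over_slow):
    """IPC(slow chip) / IPC(fast-clocked chip) implied by perf ~ clock * IPC.

    Assumption: both chips execute the same instruction count, so any
    performance gap not explained by clock must come from IPC.
    """
    clock_ratio = clock_fast_mhz / clock_slow_mhz
    return clock_ratio / speedup_fast_over_slow

# Figures from the post: 200 MHz EV4 vs 66 MHz POWER2 (EV4 ~10% faster)
# and vs 100 MHz PA-RISC 7100 (EV4 ~20% faster).
print(f"POWER2 IPC / EV4 IPC ~ {implied_ipc_ratio(200, 66, 1.10):.2f}")
print(f"PA-7100 IPC / EV4 IPC ~ {implied_ipc_ratio(200, 100, 1.20):.2f}")
```

Under that assumption the 66 MHz POWER2 needs roughly 2.75 times the EV4's IPC, and the 100 MHz PA-RISC 7100 roughly 1.67 times, to land where the post places them.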

rjain

Golden Member
May 1, 2003
Originally posted by: rimshaker
AMD won't do it; they'll have to do a complete redesign of the Athlon core. HT requires a very long pipeline, which would totally defeat any IPC advantages the Athlon cores have.
Wrong. The biggest gains from SMT/HT come from allowing a second thread to sneak in when there are pipeline stalls due to either too little ILP in the main thread or a cache miss. A long pipeline makes it harder to find enough ILP in your code to keep it full, but that doesn't stop you from having an algorithm that underutilizes specific execution units. One thread dominated by ALU ops and another dominated by FPU ops is a great way to get the most out of SMT/HT, regardless of pipeline length (and regardless of CPU design, other than having separate execution units for ALU and FPU ops).
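A toy model makes the point that SMT gains come from filling otherwise idle execution units, not from pipeline length. The machine below is an assumption for illustration only: one ALU and one FPU, each thread issuing at most one op per cycle in order, and a hypothetical "stall" op standing in for a cycle lost to a cache miss. Pipeline depth does not appear anywhere in it.

```python
from collections import deque

def run_sequential(*threads):
    # Without SMT the core runs one thread after another; every op,
    # including a "stall" cycle, takes one cycle.
    return sum(len(t) for t in threads)

def run_smt(a, b):
    """Toy 2-way SMT core with one ALU and one FPU.

    Each thread issues at most one op per cycle, in order. A "stall"
    occupies its thread for a cycle but uses no execution unit. When
    both threads want the same unit, priority alternates each cycle.
    """
    queues = [deque(a), deque(b)]
    cycles = 0
    first = 0  # which thread gets first pick of the units this cycle
    while any(queues):
        busy = set()  # units ("alu"/"fpu") already claimed this cycle
        for i in (first, 1 - first):
            q = queues[i]
            if not q:
                continue
            op = q[0]
            if op == "stall":
                q.popleft()       # stall cycle passes, no unit needed
            elif op not in busy:
                busy.add(op)      # claim the unit and issue the op
                q.popleft()
        first = 1 - first
        cycles += 1
    return cycles

alu_heavy, fpu_heavy = ["alu"] * 6, ["fpu"] * 6
print(run_sequential(alu_heavy, fpu_heavy), run_smt(alu_heavy, fpu_heavy))
print(run_sequential(alu_heavy, alu_heavy), run_smt(alu_heavy, alu_heavy))
```

In this model an ALU-heavy thread paired with an FPU-heavy thread finishes in half the cycles of running them back to back, two ALU-heavy threads gain nothing, and one thread's stall cycles get filled by the other thread's work.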

rimshaker

Senior member
Dec 7, 2001
Originally posted by: rjain
Originally posted by: rimshaker
AMD won't do it; they'll have to do a complete redesign of the Athlon core. HT requires a very long pipeline, which would totally defeat any IPC advantages the Athlon cores have.
Wrong. The biggest gains from SMT/HT come from allowing a second thread to sneak in when there are pipeline stalls due to either too little ILP in the main thread or a cache miss. A long pipeline makes it harder to find enough ILP in your code to keep it full, but that doesn't stop you from having an algorithm that underutilizes specific execution units. One thread dominated by ALU ops and another dominated by FPU ops is a great way to get the most out of SMT/HT, regardless of pipeline length (and regardless of CPU design, other than having separate execution units for ALU and FPU ops).

The P4 core has a long 20-stage pipeline. Care to explain why they chose that? It was ridiculed as inefficient way back when the P4 first came out and there was so much bashing between the P4 and Athlon core designs... only to find out much later that it was to implement the HT feature we all know today. A long pipeline is bad when a cache miss occurs... but it's needed for multiple-process features like HT.


rjain

Golden Member
May 1, 2003
They had a 20-stage pipeline because that would allow them to scale the frequency more easily. You don't EVEN need a pipeline to have SMT/HT.
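The frequency argument can be sketched with a common first-order model: splitting a fixed amount of logic across more stages shortens each stage, but every stage also pays a fixed latch and clock-skew overhead. The logic delay and overhead values below are hypothetical, chosen only to show the shape of the curve, and the model assumes the logic divides evenly across stages. Nothing in it involves SMT.

```python
def max_clock_ghz(logic_delay_ns, stages, latch_overhead_ns):
    """First-order pipeline clock model.

    Each stage holds 1/stages of the total logic delay plus a fixed
    per-stage latch/skew overhead; the clock period is one stage's delay.
    """
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns

# Hypothetical design: 8 ns of total logic, 0.1 ns overhead per stage.
for stages in (10, 20, 40):
    print(f"{stages:2d} stages -> {max_clock_ghz(8.0, stages, 0.1):.2f} GHz")
```

Going from 10 to 20 stages moves this hypothetical design from about 1.11 GHz to 2.0 GHz, less than 2x, because the per-stage overhead does not shrink as the pipeline deepens.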

imgod2u

Senior member
Sep 16, 2000
Originally posted by: rimshaker
The P4 core has a long 20-stage pipeline. Care to explain why they chose that? It was ridiculed as inefficient way back when the P4 first came out and there was so much bashing between the P4 and Athlon core designs... only to find out much later that it was to implement the HT feature we all know today. A long pipeline is bad when a cache miss occurs... but it's needed for multiple-process features like HT.

Not quite. High frequency is bad when a cache miss occurs; longer-pipeline architectures merely tend to have higher frequencies. The only thing hyperpipelining really affects (assuming you're just extending the pipeline) is the branch mispredict penalty.
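The distinction can be put in numbers. A memory access takes a roughly fixed time in nanoseconds, so its cost in cycles scales with clock speed; a branch mispredict flushes the work in the stages between fetch and branch resolution, so its cost in cycles scales with pipeline depth. The 100 ns memory latency and the stage counts below are hypothetical.

```python
def penalties(clock_ghz, resolve_stage, mem_latency_ns=100.0):
    """Return (cache-miss cycles, mispredict cycles) for a hypothetical core.

    - A fixed wall-clock memory latency costs more cycles at a higher clock.
    - First-order assumption: a mispredict flushes every stage up to the one
      where the branch resolves, so its cycle cost equals that stage number.
    """
    miss_cycles = mem_latency_ns * clock_ghz
    mispredict_cycles = resolve_stage
    return miss_cycles, mispredict_cycles

designs = [
    ("short pipe, 1.5 GHz", 1.5, 10),
    ("long pipe, 1.5 GHz", 1.5, 20),  # deeper pipe alone: only mispredicts cost more
    ("long pipe, 3.0 GHz", 3.0, 20),  # higher clock: cache misses cost more cycles
]
for name, clock_ghz, resolve_stage in designs:
    miss, mispredict = penalties(clock_ghz, resolve_stage)
    print(f"{name}: miss = {miss:.0f} cycles, mispredict = {mispredict} cycles")
```

Deepening the pipeline at the same clock leaves the cache-miss cost unchanged and only raises the mispredict cost; it is the higher clock that makes each miss cost more cycles.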