
Discussion Please enlighten me on SMT.

whm1974

Diamond Member
Jul 24, 2016
9,436
1,571
126
My understanding of SMT is somewhat limited and certainly not detailed, but isn't there a cost to using it, such as reduced IPC and clock speeds, along with higher power consumption? Please feel free to correct me if I am misunderstanding this.

Or am I totally wrong on this?
 

NTMBK

Lifer
Nov 14, 2011
10,455
5,842
136
It increases the core's IPC, because having two threads' worth of instructions to process means it can fill gaps left by e.g. cache misses in one thread. But yes, the IPC for each individual thread can drop, because it has to share resources with another thread.
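A minimal sketch of that effect, under made-up assumptions: a toy core that can issue one instruction per cycle, and threads that each issue `run` instructions and then stall for `miss` cycles as a stand-in for a cache miss. The function name, the round-robin policy, and the parameter values are all hypothetical and do not model any real CPU.

```python
def simulate(num_threads, cycles, run=3, miss=1):
    """Toy single-issue core: at most one instruction issues per cycle.

    Each thread issues `run` instructions, then stalls for `miss` cycles
    (a stand-in for a cache miss). Returns instructions issued per thread.
    """
    issued = [0] * num_threads       # instructions issued by each thread
    since_miss = [0] * num_threads   # instructions issued since the last stall
    stall_left = [0] * num_threads   # remaining stall cycles per thread
    last = -1                        # thread that issued most recently

    for _ in range(cycles):
        # Round-robin: start looking just after the thread that issued last.
        order = [(last + 1 + i) % num_threads for i in range(num_threads)]
        chosen = next((t for t in order if stall_left[t] == 0), None)

        # Stalled threads make progress on their outstanding miss.
        for t in range(num_threads):
            if stall_left[t] > 0:
                stall_left[t] -= 1

        if chosen is not None:
            issued[chosen] += 1
            since_miss[chosen] += 1
            last = chosen
            if since_miss[chosen] == run:
                since_miss[chosen] = 0
                stall_left[chosen] = miss
    return issued


cycles = 1200
one = simulate(1, cycles)
two = simulate(2, cycles)
print("1 thread : core IPC", sum(one) / cycles)
print("2 threads: core IPC", sum(two) / cycles,
      "per-thread IPC", [n / cycles for n in two])
```

With these made-up parameters the core goes from 0.75 to 1.0 IPC, while each thread falls from 0.75 to 0.5 IPC because it now competes for the single issue slot.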

As for power consumption and clockspeed... well, I don't think anyone has got power or clock gating fine enough to make the extra utilisation make that much difference. Though I guess the increase in overall data movement (due to the increased IPC) would increase power consumption a little bit.
 

whm1974

Diamond Member
Jul 24, 2016
9,436
1,571
126
Thanks. That clears up my understanding.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Though I guess the increase in overall data movement (due to the increased IPC) would increase power consumption a little bit.

Power consumption is proportional to throughput: if SMT provides a 40% gain, then the CPU will use 40% more power.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,884
3,324
146
There is a small penalty in clockspeeds, or probably more accurately in clock potential. Disabling SMT has long been an overclocking trick to squeeze out an extra 100-200 MHz in benchmarks that don't benefit from it.

I'm not sure if or how much it affects retail clockspeed determination.
 

maddie

Diamond Member
Jul 18, 2010
5,157
5,545
136
Power consumption is proportional to throughput: if SMT provides a 40% gain, then the CPU will use 40% more power.
Not sure if I agree. I remember when HT was first introduced, Intel claimed that it allowed for an increase in perf/W, so you get an X% power increase for X+% performance. This still holds true, AFAIK.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Not sure if I agree. I remember when HT was first introduced, Intel claimed that it allowed for an increase in perf/W, so you get an X% power increase for X+% performance. This still holds true, AFAIK.

At the time the gain was marginal, on a CPU that was a power hog. With current gains, and the cores being huge, this is clearly visible in the power consumption numbers; check AMD CPUs after taking the voltage difference into account.

For instance, the 2400G consumes no more than the 2200G in Cinebench, but this is due to the latter's voltage being 18% higher, hence 40% higher power at the same throughput and equal power at 30% less throughput.
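The step from 18% to 40% is consistent with the usual first-order model of dynamic power, which the post does not state and is assumed here: power scales with switched capacitance $C$, the square of the voltage $V$, and the frequency $f$, ignoring leakage. A worked version at equal clock:

```latex
P_{\text{dyn}} \propto C\,V^{2} f, \qquad
\frac{P(1.18\,V)}{P(V)} = 1.18^{2} \approx 1.39, \qquad
\frac{1}{1.39} \approx 0.72
```

So an 18% higher voltage costs roughly 39% more power at the same clock, and holding power equal leaves about 72% of the throughput, close to the rounded 40% and 30% figures above.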

You can also compare a R3 1200/1300X and a R5 1400/1500...
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
As a metaphor, imagine you are baking in a kitchen, and you have multiple electronic tools to speed up the process.

SMT is recognizing that electronic kitchen tool A is not linked to tool B, which in turn is not linked to tool C. Thus, if you have a baking job that is not using tool A or B because it is at the tool C step of the pipeline, it is okay to let a separate baking job use tools A and B, but especially tool A. If you do not use tool A or tool B, the CPU is idling those parts, but not at a perfect idle, so you might as well utilize those resources.

-----

SMT is a trade-off: it tries to make the most of die area rather than wasting it on parts of the chip that the current job is not using, but at the same time an overbuilt SMT implementation itself wastes die area.

For example, you are not limited to 2 threads per core with SMT; you can have more. IBM's POWER7 (2010) has 4 threads per core, and POWER8 (2014) has 8 threads per core. But the thing to remember about POWER7 and POWER8 is that they are very beefy cores, and doing something like that on Intel's Atom does not make sense. Sometimes it makes more sense to just add more cores, and other times you want more threads per core to make the most of the die area. There is no magic ideal zone; it depends on the architecture, how it is designed, and what trade-offs it made.

I am sorry, but there is no mental shortcut that always tells you whether hyperthreading (a form of SMT) is better for you, or better for Intel, or better for the other chip makers. Usually hyperthreading gives you more performance, but sometimes it does not. Sometimes the software you are running is not well threaded and would rather have an extra 100 MHz or 200 MHz; on a 4 GHz part, 200 MHz more is about 5% more performance. That said, you can usually get more than 5% more performance via hyperthreading.

Furthermore, whether it saves or costs power depends on whether you can get that extra performance without increasing the voltage. When you double the voltage you quadruple the power consumption (if you keep the GHz constant), so a 1.1x increase in voltage is a 21% increase in power consumption. Thus whether it saves power depends on which is worse: raising the MHz, which may force you to raise the voltage faster, or adding hyperthreading, which lets you keep the MHz and the voltage low while utilizing the core better and thus increasing IPC at the same clock speed.
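A rough sketch of that comparison, assuming the common first-order approximation that dynamic power scales with voltage squared times frequency (leakage is ignored). The function names are illustrative, and the "Option A" and "Option B" numbers are hypothetical placeholders, not measurements of any CPU.

```python
def relative_power(voltage_ratio, freq_ratio=1.0):
    """Dynamic power relative to baseline, using P ~ C * V^2 * f."""
    return voltage_ratio ** 2 * freq_ratio


def perf_per_watt(perf_ratio, power_ratio):
    """Performance per watt relative to baseline (1.0 = unchanged)."""
    return perf_ratio / power_ratio


# Figures from the post: at constant clock, 2x voltage -> 4x power,
# and 1.1x voltage -> 1.21x power (a 21% increase).
double_v = relative_power(2.0)
ten_pct_v = relative_power(1.1)

# Option A (hypothetical): +5% clock (4.0 -> 4.2 GHz) that needs 1.1x voltage.
clock_bump = perf_per_watt(1.05, relative_power(1.1, 1.05))

# Option B (hypothetical): SMT adds 20% throughput at the same voltage and
# clock, at an assumed 10% power cost from the extra switching activity.
smt = perf_per_watt(1.20, 1.10)

print(f"2x voltage   -> {double_v:.2f}x power")
print(f"1.1x voltage -> {ten_pct_v:.2f}x power")
print(f"clock bump perf/W: {clock_bump:.2f}, SMT perf/W: {smt:.2f}")
```

Under these assumed numbers SMT improves perf/W while the clock bump reduces it, but the outcome flips if the clock increase needs no extra voltage or if the workload gains little from SMT, which is the point of the post.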
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
For a terse, but reasonably accessible explanation of SMT, one can look at this wiki article https://en.m.wikipedia.org/wiki/Simultaneous_multithreading.

The more chip area dedicated to SMT per core, the better the performance (for 'well-behaved' apps). Generally, better-performing SMT implementations have more shared resources per thread in each core. Intel uses a fairly light SMT implementation that costs very little in die area and power. HT performs better where there is low ILP but, again due to the resource-light approach, it generally yields relatively small gains. Intel has increasingly been dedicating more chip area to high ILP (a wider execution stage) than to more shared execution resources.

I haven’t analyzed AMD's SMT implementation, but given the higher SMT throughput, there must be more shared resources in each core.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
SMT improves throughput by converting TLP into ILP. That is all it is meant to do, and it comes with a lot of negatives, which take complexity creep to negate. Combining SMT with OOO brings more negatives as the two forms of parallelism compete, and fixing that requires even more complexity creep!
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I haven’t analyzed AMD's SMT implementation, but given the higher SMT throughput, there must be more shared resources in each core.

The Intel implementation has more shared structures than AMD's. Neither can be conclusively said to be better. Dedicating more structures per thread, as AMD does, can improve gains with SMT. Shared structures make more resources available when running a single thread, allowing for a few % more performance.

Both the AMD and Intel implementations are quite minimal and add about 5% to core area. IBM's SMT not only allows for more threads; IBM also added heaps to improve SMT gains. IBM's SMT added 40% to core area. That allows IBM's version to gain a lot more, and in more applications, than Intel's or AMD's version.

Consumer usage is also much more sensitive to any loss in single-thread performance, and I think that's a big reason why the 2T/core limit exists and probably a driving factor in the AMD/Intel implementations.

Power consumption is proportional to throughput: if SMT provides a 40% gain, then the CPU will use 40% more power.

SMT is still a very effective way of increasing perf/watt and perf/mm2. Adding execution resources increases performance only sublinearly.
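One way to put rough numbers on the perf/mm2 point, as a sketch only: it assumes Pollack's rule (single-thread performance grows with roughly the square root of core area), a rule of thumb that this thread does not cite, and a hypothetical 20% SMT throughput gain. Only the "about 5% core area" figure comes from the post above.

```python
import math


def pollack_speedup(area_ratio):
    """Rule-of-thumb speedup from a bigger core: sqrt(area) (Pollack's rule)."""
    return math.sqrt(area_ratio)


def perf_per_area(perf_ratio, area_ratio):
    """Performance per unit of core area relative to baseline."""
    return perf_ratio / area_ratio


area = 1.05  # about 5% extra core area, the figure quoted for AMD/Intel SMT

# Spend the extra area on more execution resources (assumed sublinear return).
wider_core = perf_per_area(pollack_speedup(area), area)

# Spend the same area on SMT, with a hypothetical 20% throughput gain.
smt_core = perf_per_area(1.20, area)

print(f"wider core perf/mm2: {wider_core:.3f}")
print(f"SMT core   perf/mm2: {smt_core:.3f}")
```

Under these assumptions the wider core ends up slightly below baseline perf/mm2 while the SMT core ends up above it; a workload that gains little from SMT would narrow or reverse that gap.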