Are multi-threaded cores the most logical design?

coffeemonster · Feb 28, 2017

My understanding of CPU micro-architecture is admittedly limited, but this thought occurred to me recently.

Is simultaneous multi-threading a more efficient approach to maximizing thread count on a transistor budget and keeping IPC about the same?

Consider the 8-thread Ryzen 1400 (quad-core with SMT). What if they were able to design 6-8 individual single-threaded cores that were slightly smaller and narrower, with similar IPC, instead of 4 larger, wider SMT cores on the same transistor budget?

Would single-thread IPC be directly sacrificed as a result of making the cores smaller and narrower in resources?

I'm not sure if I'm explaining my line of thought well enough, or if my lack of understanding is blatantly obvious.

In the most basic sense I ask: why design a core so full of resources that it is most efficient when processing 2 threads at a time (1 much weaker than the other), rather than a core designed to maximize 1 thread at a time but smaller, leaner, and able to fit more of these on a similar die?

whm1974 · Feb 28, 2017

From my understanding SMT takes up very little die space and doesn't reduce IPC at all.

MajinCry · Feb 28, 2017

SMT only shows benefits in certain scenarios.

http://www.agner.org/optimize/blog/read.php?i=6

I have made some tests of hyperthreading to see how fast each of the two threads is running. The following resources are shared between two threads running in the same core:

Cache

Branch prediction resources

Instruction fetch and decoding

Execution units

Hyperthreading is no advantage if any of these resources is a limiting factor for the speed. But hyperthreading can be an advantage if the speed is limited by something else. To be more specific, each of the two threads will run at more than half speed in the following cases:

If memory data are so scattered that there will be many cache misses regardless of whether each thread can use the full cache or only half of it. Then one thread can use all the execution resources while the other thread is waiting for a memory operand that was not in the cache.

If there are many branch mispredictions and the number of branch mispredictions is not increased much by sharing the branch target buffer and branch history table between two threads. Then one thread can use all the execution resources while the other thread is waiting for the misprediction to be resolved.

If the code has many long dependency chains that prevent efficient use of the execution units.

Edit: Hyperthreading can also decrease performance. From the above source:

In these cases, each of the two threads will run at more than half speed, but less than full speed. The total performance is never doubled by hyperthreading, but it may be increased by e.g. 25%.

On the other hand, if the performance is limited by any of the shared resources, for example the instruction fetcher, the memory read port, or the multiply unit, then the total performance is not increased by hyperthreading.

Actually, in the worst cases the total performance is decreased by hyperthreading because some resources are wasted when the two threads compete for the same resources. A quick google search reveals several examples of applications that run slower with hyperthreading than when hyperthreading is disabled.
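The arithmetic in the quoted passage can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that both threads share the core evenly; `per_thread_speed` and its parameters are hypothetical names and not part of Agner's tests.

```python
def per_thread_speed(total_gain: float, threads: int = 2) -> float:
    """Speed of each thread relative to running alone on the core.

    total_gain is the change in combined throughput with hyperthreading on,
    e.g. 0.25 for +25%. Assumes the threads split the core evenly, which is
    a simplification.
    """
    return (1.0 + total_gain) / threads


# Combined throughput change -> speed of each of the two threads.
for gain in (-0.10, 0.0, 0.25, 1.0):
    print(f"{gain:+.0%} total -> {per_thread_speed(gain):.3f}x per thread")
```

With a 25% total gain each thread runs at 0.625 of its solo speed: more than half, less than full. A gain of 0% corresponds to exactly half speed each, and anything below that is the "worst case" where hyperthreading costs performance.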

Anecdotally, there was one user over on the Tale Of Two Wastelands forum who was getting some hardcore microstutter in New Vegas with his i7. I told him to try disabling hyperthreading, et voilà.

DrMrLordX · Feb 28, 2017

coffeemonster said:
My understanding of CPU micro-architecture is admittedly limited, but this thought occurred to me recently.

Is simultaneous multi-threading a more efficient approach to maximizing thread count on a transistor budget and keeping IPC about the same?

Consider the 8-thread Ryzen 1400 (quad-core with SMT). What if they were able to design 6-8 individual single-threaded cores that were slightly smaller and narrower, with similar IPC, instead of 4 larger, wider SMT cores on the same transistor budget?

Would single-thread IPC be directly sacrificed as a result of making the cores smaller and narrower in resources?

I'm not sure if I'm explaining my line of thought well enough, or if my lack of understanding is blatantly obvious.

In the most basic sense I ask: why design a core so full of resources that it is most efficient when processing 2 threads at a time (1 much weaker than the other), rather than a core designed to maximize 1 thread at a time but smaller, leaner, and able to fit more of these on a similar die?

You should examine the POWER8 and POWER9 architectures from IBM. They basically have what you propose: a huge, ultra-wide, monstrous core with the ability to handle multiple threads via SMT:

https://en.wikipedia.org/wiki/POWER9

Though in the case of POWER9, the SMT implementation is . . . distinctly different from Intel's.

tamz_msc · Mar 1, 2017

More cores is usually better. Whether you get extra benefits in SMT and HT is a bit murky, though a lot has happened since the days of Pentium 4 HT.
Here for example CSGO shows improvement with core count but loses performance when you have HT enabled:

imported_jjj · Mar 1, 2017

The question should be about power, not area, as power is far more important nowadays.
And I don't have an answer. I never looked at perf gains vs. power for SMT, but chances are that it's pretty efficient.
Computing in general has to move towards parallelism and accelerators as otherwise there is no realistic path forward. The era of the CPU is over.

sm625 · Mar 1, 2017

Considering how much time the pipeline spends stalled out, it only makes sense to have SMT. I expected SMT to have a larger penalty on the Intel chips with the huge L4 cache, but it seems very little testing has been done on the 5775C with HT disabled.

TheELF · Mar 1, 2017

MajinCry said:
Edit: Hyperthreading can also decrease performance. From the above source:

In these cases, each of the two threads will run at more than half speed, but less than full speed. The total performance is never doubled by hyperthreading, but it may be increased by e.g. 25%.

Actually, in the worst cases the total performance is decreased by hyperthreading because some resources are wasted when the two threads compete for the same resources. A quick google search reveals several examples of applications that run slower with hyperthreading than when hyperthreading is disabled.

tamz_msc said:
More cores is usually better. Whether you get extra benefits in SMT and HT is a bit murky, though a lot has happened since the days of Pentium 4 HT.
Here for example CSGO shows improvement with core count but loses performance when you have HT enabled:

Hyperthreading will do whatever you tell it to.
CS:GO uses one thread for its main game logic and fills up the rest of the cores with worker threads that prepare graphics to give you better FPS.
If you just run the game, all these threads have the same priority, which causes the things Agner talks about: the Windows scheduler takes time away from the main thread. But this is due to bad "coding".
Microsuks on Scheduling Priorities

The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread.

Screw equal, that's not what we want. We want the important threads to run as if they were on a real core all on their own.

Use HIGH_PRIORITY_CLASS with care. If a thread runs at the highest priority level for extended periods, other threads in the system will not get processor time.

That's what we want: the main thread getting all the processor power it can use, and the rest (but only the rest) going towards rendering.

You should almost never use REALTIME_PRIORITY_CLASS, because this interrupts system threads that manage mouse input, keyboard input, and background disk flushing.

Er, well, shut up, we have enough cores.
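The round-robin rule from the Microsoft text quoted above can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: one logical core, every thread always ready to run, and no preemption in the middle of a slice. The `schedule` function and the thread names are hypothetical and not a Windows API.

```python
from collections import deque


def schedule(threads, slices):
    """Toy strict-priority round-robin scheduler for one logical core.

    threads: dict mapping name -> (priority, slices_of_work); a higher
    number means a higher priority. Returns the names in the order they
    were granted time slices.
    """
    remaining = {name: work for name, (_, work) in threads.items()}
    queues = {}
    for name, (prio, _) in threads.items():
        queues.setdefault(prio, deque()).append(name)

    timeline = []
    for _ in range(slices):
        ready = [prio for prio, queue in queues.items() if queue]
        if not ready:
            break  # nothing left to run
        queue = queues[max(ready)]  # only the highest ready priority runs
        name = queue.popleft()
        timeline.append(name)
        remaining[name] -= 1
        if remaining[name] > 0:
            queue.append(name)  # back of the line within its own priority
    return timeline


# Equal priority: the main thread has to take turns with a worker.
print(schedule({"main": (1, 3), "worker": (1, 3)}, 4))
# Raised priority: the main thread runs until it has no work left.
print(schedule({"main": (2, 3), "w1": (1, 2), "w2": (1, 2)}, 10))
```

In this model, raising the main thread's priority means the workers get slices only once the main thread has nothing left to do, which is the behaviour argued for in this post. It says nothing about what happens on the sibling logical core of the same physical core.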

Dresdenboy · Mar 1, 2017

TheELF said:
Hyperthreading will do whatever you tell it to.
CS:GO uses one thread for its main game logic and fills up the rest of the cores with worker threads that prepare graphics to give you better FPS.
If you just run the game, all these threads have the same priority, which causes the things Agner talks about: the Windows scheduler takes time away from the main thread. But this is due to bad "coding".
Microsuks on Scheduling Priorities

Screw equal, that's not what we want. We want the important threads to run as if they were on a real core all on their own.

That's what we want: the main thread getting all the processor power it can use, and the rest (but only the rest) going towards rendering.

From my own tests I saw no benefit from thread and process priorities regarding HT. Hyperthreading is a "humanistic" variant of SMT, where each thread is seen as being created equal. So in the CSGO case, even the best coding skills won't help if the OS in the background sees a free logical core that happens to be the second logical core of the physical core already running the CSGO main thread. Thus a simple BG task could slow down the CSGO thread by 20% or more.

Idea: If software wants to avoid this, it could create a thread that does nothing (but blocks disturbing threads) and pin it via an affinity setting to the other logical core of the physical core running the important thread. Has that been tried already?
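To make the idea concrete, here is a minimal Python sketch that only computes the affinity bitmasks such a placeholder thread would need. It assumes that logical CPUs 2k and 2k+1 share a physical core, which is a common enumeration but is not guaranteed on every system. `sibling_of` and `affinity_mask` are hypothetical helper names; the resulting integers are in the bitmask form that calls such as Windows' `SetThreadAffinityMask` accept.

```python
def sibling_of(logical_cpu: int) -> int:
    """Other logical CPU on the same physical core.

    ASSUMPTION: logical CPUs 2k and 2k+1 are SMT siblings. Check the real
    topology before relying on this.
    """
    return logical_cpu ^ 1


def affinity_mask(*logical_cpus: int) -> int:
    """Bitmask with one bit set per allowed logical CPU (bit 0 = CPU 0)."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


main_cpu = 2  # hypothetical choice for the important thread
print("main thread mask:       ", bin(affinity_mask(main_cpu)))
print("placeholder thread mask:", bin(affinity_mask(sibling_of(main_cpu))))
```

The placeholder thread would have to stay blocked or idle cheaply. A busy-spinning placeholder would itself compete for the shared execution resources and defeat the purpose.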

TheELF · Mar 1, 2017

Dresdenboy said:
Idea: If software wants to avoid this, it could create a thread that does nothing (but blocks disturbing threads) and pin it via an affinity setting to the other logical core of the physical core running the important thread. Has that been tried already?

Real-time priority (on the thread itself) interrupts even system threads!!!
It stops any other thread from running on the same logical core (if the thread can use 100% of this core). There is exactly 0% chance of "a simple BG task could slow down the CSGO thread by 20% or more".
Read the MS link I posted:

You should almost never use REALTIME_PRIORITY_CLASS, because this interrupts system threads that manage mouse input, keyboard input, and background disk flushing.

Look at the video; there is a more than easily visible benefit from changing priorities (OF THREADS, NOT TASKS; tasks also, but that's another discussion).

Dresdenboy · Mar 1, 2017

TheELF said:
Real-time priority (on the thread itself) interrupts even system threads!!!
It stops any other thread from running on the same logical core (if the thread can use 100% of this core). There is exactly 0% chance of "a simple BG task could slow down the CSGO thread by 20% or more".
Read the MS link I posted:

Look at the video; there is a more than easily visible benefit from changing priorities (OF THREADS, NOT TASKS; tasks also, but that's another discussion).

I stand corrected. So this is a non-issue. Now about the software itself: Does it prevent other threads from disturbing one important main thread (the critical path)?

I'm still at work with the video blocked. Will watch it later.

tamz_msc · Mar 1, 2017

TheELF said:
CS:GO uses one thread for its main game logic and fills up the rest of the cores with worker threads that prepare graphics to give you better FPS.

I'm not so sure, as someone mentioned in another thread that running more than 8 bots in a custom match tanked fps on his 4770K.

Besides, engine limitations can show up even more prominently with HT enabled, and I have a feeling that an older version of Source would be appropriate to test in this regard.

TheELF · Mar 1, 2017

Dresdenboy said:
I stand corrected. So this is a non-issue. Now about the software itself: Does it prevent other threads from disturbing one important main thread (the critical path)?

I'm still at work with the video blocked. Will watch it later.

If you mean Process Hacker, it just changes the priority value, something you could do through Windows. It's the Windows scheduler that will not interfere with a thread put into real-time unless there are other threads that are equally high (lol, in priority that is).

TheELF · Mar 1, 2017

tamz_msc said:
I'm not so sure, as someone mentioned in another thread that running more than 8 bots in a custom match tanked fps on his 4770K.

This isn't about fps tanking; that can happen for any number of reasons. This is about threads being able to run as fast as on "normal" cores if you (I wish it were the devs) tell them to do so.

ashetos · Mar 1, 2017

There are some applications, such as games, that are latency sensitive. If a game effectively uses more threads than the cores available then the OS needs to schedule how the threads are run, and assign time slices.

The scheduling can be very bad, up to 10 milliseconds of extra latency. There is also context switch overhead with worker threads, which can be almost zero if all threads are actively running and queues are fed.

TheELF · Mar 1, 2017

ashetos said:
There is also context switch overhead with worker threads, which can be almost zero if all threads are actively running and queues are fed.

Look at the CS:GO video: Process Hacker in the background displays context switches, and they drop to 1/3 when the main thread runs at real-time, because the system knows that it should not "switch away" from running the main thread.
It has nothing, or not much, to do with available cores.

ashetos · Mar 1, 2017

TheELF said:
Look at the CS:GO video: Process Hacker in the background displays context switches, and they drop to 1/3 when the main thread runs at real-time, because the system knows that it should not "switch away" from running the main thread.
It has nothing, or not much, to do with available cores.

I'm talking about the case where the number of active threads is greater than the number of available cores, though.

TheELF · Mar 1, 2017

CS:GO runs 65 threads, 3 of them to a high degree, and the CPU only has 2 (real) cores.

Dresdenboy · Mar 1, 2017

TheELF said:
If you mean Process Hacker, it just changes the priority value, something you could do through Windows. It's the Windows scheduler that will not interfere with a thread put into real-time unless there are other threads that are equally high (lol, in priority that is).

As it seems, it does that thread-wise. That's the interesting thing.

HexiumVII · Mar 1, 2017

Makes me think that if I get a Ryzen 8-core, I can disable SMT and get max performance. I mean, I get by fine with a 6700K; 8 true cores would be awesome.

Dygaza · Mar 1, 2017

Worth mentioning is that you can also set affinities for different threads with Process Hacker. So if you run 1 or 2 cores at higher clocks than the rest, you can assign your heaviest threads to those cores.

dogen1 · Mar 1, 2017

tamz_msc said:
More cores is usually better. Whether you get extra benefits in SMT and HT is a bit murky, though a lot has happened since the days of Pentium 4 HT.
Here for example CSGO shows improvement with core count but loses performance when you have HT enabled:

1-4 cores were faster with HT enabled...

qookap · Mar 1, 2017

tamz_msc · Mar 2, 2017

dogen1 said:
1-4 cores were faster with HT enabled...

Yeah, but look at the relative gains going from 4C4T to 6C6T vs 4C4T to 4C8T.

R0H1T · Mar 2, 2017

Dresdenboy said:
As it seems, it does that thread-wise. That's the interesting thing.

It can also alter the thread I/O & page priority as well as that of the main program, I use it all the time to cheat in benchmarks 😀

I'm surprised many here don't know what process hacker is, it isn't a task manager replacement, not even close.
