Why are more slower cores better than one fast core?

cytg111 · Apr 6, 2013

aigomorla said:
...
Now if you want me to throw a wrench in the works... let's talk about Hyper-Threading.
'Cause those aren't real cores that act like real cores.

- Right!! How the hell is a scheduler supposed to know which is which? This is a real core, this one is not, this one is real but is being hyperthreaded from another angle so expect lower performance? I think I would be very confused if I were a scheduler.

Exophase · Apr 6, 2013

cytg111 said:
- Right!! How the hell is a scheduler supposed to know which is which? This is a real core, this one is not, this one is real but is being hyperthreaded from another angle so expect lower performance? I think I would be very confused if I were a scheduler.

The CPU status registers tell the scheduler to prefer putting threads on separate physical cores before it puts them on the same physical core. I don't think it's that elaborate.
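A toy model of that preference, assuming a CPU with 4 physical cores and 2 hardware threads each. The function name and the (core, hardware-thread) slot representation are hypothetical; real schedulers also weigh load, cache locality and power:

```python
def spread_placement(num_threads: int, num_cores: int, threads_per_core: int = 2) -> dict:
    """Assign each thread a (core, hw_thread) slot, filling idle physical cores first.

    Toy model only: slot (c, 0) is the first hardware thread of core c,
    slot (c, 1) its Hyper-Threading sibling, and so on.
    """
    # Order slots so every core's first hardware thread comes before any sibling.
    slots = [(core, hw) for hw in range(threads_per_core) for core in range(num_cores)]
    if num_threads > len(slots):
        raise ValueError("more threads than hardware contexts")
    return {f"Thread{i + 1}": slots[i] for i in range(num_threads)}


if __name__ == "__main__":
    for name, (core, hw) in spread_placement(5, 4).items():
        kind = "first hardware thread" if hw == 0 else "sibling (Hyper-Threading) thread"
        print(f"{name} -> core{core + 1} {kind}")
```

With five threads, the fifth ends up as the second hardware thread on core 1, because every core already has one thread.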

2is · Apr 6, 2013

Exophase said:
But then you have the problem that an i7 is not just a quad core and an FX-8350 is not quite an octa core, making it hardly a cut-and-dried comparison.

i7 is indeed just a quad core. Hyperthreading does not change the number of cores.

Exophase · Apr 6, 2013

2is said:
i7 is indeed just a quad core. Hyperthreading does not change the number of cores.

That's just semantics. Hyper-Threading improves performance when running more threads. The boost isn't as large as what you get from traditional separate physical cores. Neither is the boost from CMT on Piledriver, which is also unlike traditional separate physical cores.

If you want to compare them you have to take that into consideration and not just what their respective marketing calls a core.

2is · Apr 6, 2013

I'm not referring to marketing, I'm referring to the processor's physical nature. i7 is a quad core. Not due to semantics or marketing, but physics.

Exophase · Apr 6, 2013

Arguing over the definition of core (which is what you're doing) is semantics. Arguing over the definition of anything is semantics. Calling it physics sounds pretty strange to me.

You could spend all day arguing over when something stops being one core and starts being two cores. Maybe it's two cores as soon as you have separate register files. Maybe when you have separate L1 dcaches, separate ALUs, separate schedulers, separate decoders, separate fetch, etc. Or maybe they don't qualify as separate cores unless their entire cache hierarchy is separate, in which case neither AMD nor Intel has truly separate cores at all.

But this argument isn't that productive. What matters is how heavily multithreaded software runs.

2is · Apr 6, 2013

You can call the argument anything you want, but saying i7 isn't really a quad core is nothing short of wrong.

Threads != Cores

Idontcare · Apr 6, 2013

2is said:
You can call the argument anything you want, but saying i7 isn't really a quad core is nothing short of wrong.

Threads != Cores

Do you agree that AMD's FX-8350 is an 8-core CPU?

Exophase · Apr 6, 2013

2is said:
You can call the argument anything you want, but saying i7 isn't really a quad core is nothing short of wrong.

Threads != Cores

It's still semantics. Neither Intel nor AMD have authority over what the word core means. I wouldn't call an i7 octa core, but ONLY saying that it's quad core leaves out important information when it comes to its multiprocessing capabilities. Same as ONLY calling an FX-8350 octa core, which AMD is perfectly willing to do but not everyone agrees with. Like I said already, I don't really care what people decide to call it, arguing the definition of a core is a waste of time.

2is · Apr 6, 2013

I think AMD blurs the lines a bit, which is why I'm not debating Exophase's remarks on that front. But I don't see how HT, which as I understand it is essentially a unique way of letting unused portions of a core work on another thread where applicable, makes an i7 "not really" a quad core. It allows more work to be completed per core, but the number of physical cores in an i7 is no different than in an i5 (not counting Intel's temporary lapse in judgement when they introduced dual-core i5s a while back).

Exophase · Apr 6, 2013

Even HT requires some duplication of resources beyond what a non-MT core needs. The point is, if you have no definition for how little or how much duplication is required, then you can't say what is and isn't a core. HT blurs the line just like CMT does, even if it doesn't blur it as much. The ideas for CMT actually started as a progression from HT.

What I said was that HT makes it not JUST a quad core, as opposed to "not really a quad core." Those two things come off kind of differently to me...

Idontcare · Apr 6, 2013

Exophase said:
Even HT requires some duplication of resources beyond what a non-MT core needs. The point is, if you have no definition for how little or how much duplication is required, then you can't say what is and isn't a core. HT blurs the line just like CMT does, even if it doesn't blur it as much. The ideas for CMT actually started as a progression from HT.

What I said was that HT makes it not JUST a quad core, as opposed to "not really a quad core." Those two things come off kind of differently to me...

I think you could make the argument that HT is the extreme case of dual-core CMT in which you have created a CMT design where nearly every resource is shared.

postmortemIA · Apr 6, 2013

Have any of you written threaded software? It is almost impossible to get multiple cores to the same utilization if the tasks given to them are not exactly the same.

The usual multithreading pattern in desktop apps is one thread for the UI and other worker threads that do GPU/CPU-intensive work. Guess what happens when you have only one worker thread? The UI thread is idle and the worker loads a single core. Now, if that work can be divided into two independent parts, great, you can utilize one more core. But if it can't, your other cores sit idle. And that's the catch: usually you have to do sequential work, because the output of one step is the input to the next.

Idontcare · Apr 6, 2013

postmortemIA said:
Have any of you written threaded software? It is almost impossible to get multiple cores to the same utilization if the tasks given to them are not exactly the same.

The usual multithreading pattern in desktop apps is one thread for the UI and other worker threads that do GPU/CPU-intensive work. Guess what happens when you have only one worker thread? The UI thread is idle and the worker loads a single core. Now, if that work can be divided into two independent parts, great, you can utilize one more core. But if it can't, your other cores sit idle. And that's the catch: usually you have to do sequential work, because the output of one step is the input to the next.

Amdahl's law: serial code is the Achilles' heel. This is what Turbo Core/Turbo Boost is supposed to help with. If only they were more aggressive with the turbo clock speeds, so those serial steps could get done all the faster.
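Amdahl's law as a quick calculation, in a minimal sketch: `parallel_fraction` is the share of single-core runtime that can be split across cores. The `serial_boost` parameter is a hypothetical simplification of turbo (it speeds up only the serial part and leaves the parallel part at base clock), not how any real Turbo Core/Turbo Boost implementation behaves:

```python
def amdahl_speedup(parallel_fraction: float, cores: int, serial_boost: float = 1.0) -> float:
    """Speedup over one core at base clock, per Amdahl's law.

    parallel_fraction: share of runtime that can be split across cores (0..1).
    serial_boost: hypothetical clock multiplier applied only to the serial part,
        a crude stand-in for turbo when a single core is busy.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_time = (1.0 - parallel_fraction) / serial_boost  # cannot be spread across cores
    parallel_time = parallel_fraction / cores               # ideally split, no overhead
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 1000):
        print(n, round(amdahl_speedup(0.75, n), 2))
    print("4 cores, serial part 1.2x faster:", round(amdahl_speedup(0.75, 4, 1.2), 2))
```

With 75% parallel code the speedup creeps toward 1 / 0.25 = 4x no matter how many cores you add (about 2.29x on 4 cores), which is why speeding up the serial steps matters.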

bronxzv · Apr 7, 2013

postmortemIA said:
it is almost impossible to get multiple cores to the same utilization if the tasks given to them are not exactly the same.

Nope, you've got it wrong. The tasks are usually very different at any given time in a well-threaded application, and a very common way to achieve this is to use thread pools: http://en.wikipedia.org/wiki/Thread_pool_pattern
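A minimal thread-pool sketch using Python's `concurrent.futures`, assuming three made-up, deliberately unlike jobs; the point is only that one pool happily runs tasks that are not the same. Caveat: in CPython, pure-Python CPU-bound work in threads is serialized by the GIL, so for that kind of work `ProcessPoolExecutor`, which has the same interface, is the usual swap:

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical, deliberately different tasks: a pool does not need them to match.
def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compress_ratio(data: bytes) -> float:
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def count_words(text: str) -> int:
    return len(text.split())


def run_mixed_jobs(payload: bytes, workers: int = 4) -> dict:
    """Submit unlike jobs to one shared pool; idle workers pick up whatever is queued next."""
    jobs = {
        "sha256": (checksum, payload),
        "zlib_ratio": (compress_ratio, payload),
        "words": (count_words, payload.decode("utf-8")),
    }
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn, arg): name for name, (fn, arg) in jobs.items()}
        for fut in as_completed(futures):  # results arrive in completion order
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog " * 1000
    print(run_mixed_jobs(text.encode("utf-8")))
```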

cytg111 · Apr 7, 2013

Exophase said:
The CPU status registers tell the scheduler to prefer putting threads on separate physical cores before it puts them on the same physical core. I don't think it's that elaborate.

Yes but then what?

Thread1 -> core1 main thread
Thread2 -> core2 main thread
Thread3 -> core3 main thread
Thread4 -> core4 main thread
Thread5 -> core1 hyper thread

Now it just so happens that Thread5 is my most important thread of them all, and now it's chugging away at 20%...

mrle · Apr 7, 2013

cytg111 said:
Yes but then what?

Thread1 -> core1 main thread
Thread2 -> core2 main thread
Thread3 -> core3 main thread
Thread4 -> core4 main thread
Thread5 -> core1 hyper thread

Now it just so happens that Thread5 is my most important thread of them all, and now it's chugging away at 20%...

Then you would probably set Thread5 priority higher than the others to instruct the scheduler that it has to give it preferential treatment when assigning threads to cores.

cytg111 · Apr 7, 2013

mrle said:
Then you would probably set Thread5 priority higher than the others to instruct the scheduler that it has to give it preferential treatment when assigning threads to cores.

Yes, but again, that means that I as a programmer need to consider what kind of core my app/thread will be running on: real core, hyperthread, CMT...

bronxzv · Apr 7, 2013

cytg111 said:
Yes but then what?

Thread1 -> core1 main thread
Thread2 -> core2 main thread
Thread3 -> core3 main thread
Thread4 -> core4 main thread
Thread5 -> core1 hyper thread

Now it just so happens that Thread5 is my most important thread of them all, and now it's chugging away at 20%...

I don't get your 20%, 20% of what ?

postmortemIA · Apr 7, 2013

cytg111 said:
Yes, but again, that means that I as a programmer need to consider what kind of core my app/thread will be running on: real core, hyperthread, CMT...

If Thread 1 is not utilizing 100% of the CPU, I see no problem; Thread 5 will use what's left. Although Windows has thread priorities, they are more like suggestions. RTOSes are designed to solve scheduling problems.

bronxzv · Apr 7, 2013

mrle said:
Then you would probably set Thread5 priority higher than the others to instruct the scheduler that it has to give it preferential treatment when assigning threads to cores.

With Hyper-Threading on current CPUs (which lack any notion of hardware thread priority), specifying thread priorities will not change anything if there are fewer ready threads than hardware contexts (as in the example at hand, with 5 threads and 8 hardware contexts).

To ensure a thread gets a full core, you must use processor affinity masks.
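A Linux-only sketch of that approach, assuming the sysfs topology files (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`) are present: pick one logical CPU per physical core, then pin with `os.sched_setaffinity`. The helper names are hypothetical:

```python
import glob
import os


def parse_cpu_list(text: str) -> set[int]:
    """Parse Linux cpulist syntax such as "0,4" or "0-1,8-9" into CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def one_cpu_per_core(sibling_lists: list[str]) -> set[int]:
    """Keep the lowest-numbered logical CPU from each distinct group of SMT siblings."""
    groups = {frozenset(parse_cpu_list(s)) for s in sibling_lists}
    return {min(g) for g in groups if g}


def read_sibling_lists() -> list[str]:
    """Read every CPU's thread_siblings_list from sysfs (empty list if unavailable)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


def pin_current_thread(cpu: int) -> None:
    """Restrict the caller to one logical CPU (pid 0; Linux applies it to the calling thread)."""
    os.sched_setaffinity(0, {cpu})


if __name__ == "__main__":
    cores = sorted(one_cpu_per_core(read_sibling_lists()))
    print("one logical CPU per physical core:", cores)
```

Pinning one thread alone does not reserve the whole core: the OS can still schedule other threads on that core's sibling, so the other threads would also need masks that exclude it.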

cytg111 · Apr 7, 2013

20% I pulled out of my ...., but it's supposed to be the performance level you can expect from a hyperthread.

I know we have some apps/games that are actually hurt by HT, and I'm wondering if this is the cause?

Mark R · Apr 7, 2013

cytg111 said:
Yes but then what?
Now it just so happens that Thread5 is my most important thread of them all, and now it's chugging away at 20%...

Well, it would be more like Thread 1 at 60% of a dedicated core and Thread 5 at 60% of a dedicated core.

In reality, modern OS schedulers periodically shuffle threads between cores, so that all threads should end up with roughly balanced time on shared or dedicated cores.

Further, the latest OSs do recognise which logical cores relate to which physical core, which cache (some CPUs share caches between cores), which RAM (e.g. in multi-socket CPU systems) and can optimize the scheduling of threads, so that a thread which primarily uses RAM on CPU socket B, spends most of its time running on CPU socket B.

A similar technique can be used to bias threads towards unused physical cores (for performance) or bias threads towards partially loaded physical cores (to save power - Microsoft calls this "core parking"; if all the running threads can be placed onto 1 physical core without lagging, then all the other cores can be shut down completely, saving power until they are required again).
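A toy version of the power-saving bias (packing), assuming each thread's load is known as a fraction of a core and that a physical core can absorb a combined load of `capacity` without lagging. Both assumptions are stand-ins for the OS's real heuristics, and the function is hypothetical. Cores left empty are the ones that could be parked:

```python
def pack_threads(loads, num_cores, threads_per_core=2, capacity=1.0):
    """First-fit-decreasing packing of thread loads onto as few physical cores as possible.

    loads: estimated busy fraction of each thread (hypothetical inputs, 0..1).
    capacity: combined load one physical core is assumed to absorb without lagging.
    Returns (assignment, parked): assignment[c] lists thread indices on core c,
    parked lists cores with no threads at all.
    """
    assignment = [[] for _ in range(num_cores)]
    used = [0.0] * num_cores
    # Place the heaviest threads first so light ones fill the remaining gaps.
    for t in sorted(range(len(loads)), key=lambda i: loads[i], reverse=True):
        for c in range(num_cores):
            if len(assignment[c]) < threads_per_core and used[c] + loads[t] <= capacity:
                assignment[c].append(t)
                used[c] += loads[t]
                break
        else:
            raise ValueError(f"thread {t} does not fit on any core without lagging")
    parked = [c for c in range(num_cores) if not assignment[c]]
    return assignment, parked


if __name__ == "__main__":
    assignment, parked = pack_threads([0.3, 0.2, 0.1], num_cores=4)
    print("assignment:", assignment, "parkable cores:", parked)
```

For example, loads of 0.3, 0.2 and 0.1 on 4 cores (2 hardware threads each) land as [[0, 1], [2], [], []], so cores 2 and 3 could be parked.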

R0H1T · Apr 7, 2013

cytg111 said:
20% I pulled out of my ...., but it's supposed to be the performance level you can expect from a hyperthread.

I know we have some apps/games that are actually hurt by HT, and I'm wondering if this is the cause?

HT works better when the real cores aren't stressed to 100%, so in most apps/games, when the CPU is below that level, HT usually delivers a 10% or greater absolute gain, depending on the app/game in question! However, on a fully utilized/stressed core it causes thrashing and is counterproductive. This is why the original P4 didn't yield gains comparable to modern Intel processors with HT: on modern chips the overall load can be spread more efficiently across cores and threads, unlike the P4, which had a single core with HT.

bronxzv · Apr 7, 2013

cytg111 said:
20% I pulled out of my ...., but it's supposed to be the performance level you can expect from a hyperthread.

Both threads get exactly the same treatment with Hyper-Threading, so it will be more like 50%: less than 50% in practice if the threads fight heavily for some resources (typically the DL1 and L2 caches), but generally more than 50%, since unused execution slots are filled and branch mispredictions and cache misses have less impact. In other words, the IPC of each thread is better than half the IPC of a single thread running alone.
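The arithmetic behind that, as a small sketch with hypothetical IPC numbers: if a thread reaches some IPC alone and a lower IPC while sharing the core with an equal-priority sibling, its share of a dedicated core is the ratio, and the core's combined throughput is twice that:

```python
def smt_shares(ipc_alone: float, ipc_each_shared: float) -> tuple[float, float]:
    """Return (share of a dedicated core per thread, combined throughput vs. one thread alone).

    Assumes two equal-priority threads on one SMT core, each achieving
    ipc_each_shared, versus ipc_alone when running by itself.
    """
    share = ipc_each_shared / ipc_alone
    return share, 2 * share


if __name__ == "__main__":
    # Hypothetical numbers: IPC 2.0 alone, 1.2 each when sharing the core.
    share, combined = smt_shares(2.0, 1.2)
    print(f"each thread: {share:.0%} of a dedicated core, combined: {combined:.1f}x")
```

A share above 50% means the pair gets more done than one thread alone; a share below 50% is the heavy-contention case where HT hurts.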
