Question from earlier? Not sure I saw that. As for that: to qualify as a core, the core must be able to deliver almost 100% of its total performance at all times when fully utilized. Thus, when all cores are fully utilized, each core must be able to deliver the same performance. However, if the "core" does not behave in such a manner when all "cores" are fully utilized, then it is likely that the "core" is a virtual/logical core or hardware thread (either SMT or CMT).
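To make that testable in a rough way: compare the per-task wall time with one worker versus one worker per "core". The sketch below is only an illustration, not a proper benchmark; the worker count and the spin workload are placeholders I picked, and you would want to pin processes to specific cores for cleaner numbers.

```python
# Rough scaling check: per-task wall time with 1 worker vs. N workers.
# The worker count (4) and the spin workload are placeholders.
import time
from multiprocessing import Pool

def spin(n):
    acc = 0
    for i in range(n):          # integer-heavy loop as a stand-in for "fully utilized"
        acc += i * i
    return acc

def wall_time(workers, work=5_000_000):
    with Pool(workers) as p:
        t0 = time.perf_counter()
        p.map(spin, [work] * workers)   # one task per worker, all running in parallel
        return time.perf_counter() - t0

if __name__ == "__main__":
    one = wall_time(1)
    many = wall_time(4)                 # set to your number of "cores"
    print(f"1 worker : {one:.2f}s per task")
    print(f"4 workers: {many:.2f}s per task")
    # Real cores: per-task time stays roughly flat when all are loaded.
    # SMT/CMT siblings: per-task time grows noticeably.
```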
ok that is at least a more valid approach than what I understood from your previous remarks.
But to respond to that: a core will always have some performance loss when all the cores are loaded.
Even if you assume a perfect program, you still have a shared memory controller, a shared L3 cache, and in some CPUs a shared L2 cache. So that is nothing new.
I don't think anybody disagrees that a shared design has drawbacks, but it also has merits.
Of course, talking about cores is semantics, but we have had shared logic for a very long time now:
- first we had single cores in multiple sockets sharing the same northbridge
- then we had single cores in multiple sockets, each with its own northbridge, but with the RAM shared through the cores
- then the dual cores came with a shared northbridge
- then the dual cores came with a shared northbridge and a shared L2 cache
- then the quad cores came with a shared northbridge and a shared L3 cache
- now the dual cores come with a shared front end, FPU, L2 cache, L3 cache and northbridge
You can say it is HT-like in that it gives a bigger performance penalty than a conventional dual-core design. But if you shut down the core in an HT CPU, the second thread won't work either; if you shut down a core in BD, the other core will still work.
One thing to note about BD is that:
The threads are handled completely independently of each other.
The front end is shared, but it works on only one thread per cycle.
The FPU is shared, but the FPU scheduler only accepts one thread per cycle.
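A toy way to picture that idea (purely illustrative; the decode width, queue model and simple thread alternation are my own assumptions, not BD's actual arbitration):

```python
# Toy model: a shared front end that feeds only one thread per cycle,
# while two integer cores drain their own queues independently.
from collections import deque

def simulate(cycles=10, decode_width=4):
    queues = {0: deque(), 1: deque()}   # per-thread instruction buffers
    retired = {0: 0, 1: 0}
    for cycle in range(cycles):
        t = cycle % 2                   # front end serves exactly one thread this cycle
        queues[t].extend(range(decode_width))
        for core in (0, 1):             # each core executes from its own queue
            if queues[core]:
                queues[core].popleft()
                retired[core] += 1
    return retired

print(simulate())  # both threads keep making progress despite the shared front end
```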
Can you think about the following concept?
You look at it at the moment from this angle (correct?):
You take one core's performance, you add the other core, and the overall performance gains less than you would expect from a second core; so the cores are not real cores.
But try to look at it from another angle (the multithreaded angle):
You have two cores that give a certain performance, and when you only use one core, it becomes faster than you would expect.
Both scenarios are exactly the same, but they give a completely different impression of the result, IMO.
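As a concrete illustration with made-up numbers (the 100 and 180 are assumptions, not measurements of any real CPU):

```python
# Same numbers, two framings. Assume a lone thread on the module scores 100
# and two threads together score 180 (both values invented for illustration).
alone = 100
both_loaded = 180

# Angle 1: "the second core only adds 80% of a core, so it's not a real core."
print(f"second core adds {both_loaded - alone}% of a core")

# Angle 2: "two cores give 180; a lone thread gets 100 instead of its 90 share."
fair_share = both_loaded / 2
print(f"a lone thread runs at {alone / fair_share:.0%} of its shared-load rate")
```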
Also, I believe you overestimate the drawbacks of the shared design. While you can indeed stall shared resources, for real software this is not the case. In most cases, real-world software will never use all the execution resources. Most of the time, the most demanding software is the least execution-intensive software.