Required is an odd word to use. It is simpler to design a CPU with more cores than it is to work on scaling existing cores. But there is no inherent requirement in software to have more cores; it is actually detrimental to the software, making it much more difficult to program.
You misunderstood what I meant, or maybe we are simply seeing it from different angles.
I'm not looking at this from the perspective of either software or hardware (and as I said before, I agree parallel programming is a lot harder, and that, combined with workloads that aren't parallel by nature, is why multiple cores don't scale anywhere near perfectly). I'm looking at it from the perspective of getting work done.
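To put a rough number on the "doesn't scale perfectly" part, here is a minimal sketch using Amdahl's law. The parallel fractions are made-up illustrative values, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions, purely for illustration.
for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: 2 cores give {amdahl_speedup(p, 2):.2f}x")
```

Only a fully parallel workload gets the full 2x from a second core; anything with a serial part gets less. Note this models a single program, so it says nothing about several independent programs running side by side.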
And we can't ignore multi-tasking - we can easily have an OS, tons of drivers, a game, macros for the game, a music application, the anti-virus & firewall, VoIP, downloads, a web browser, etc., all running at the same time.
I disagree. I think it is significantly cheaper and faster to develop and design more, slower cores than fewer, faster cores. But I completely disagree that there are any physical effects that make more cores MORE efficient.
Remember that the THERMALS argument of 2 cores at 2.5 GHz versus 1 core at 3 GHz comes from the non-linear increase of power consumption with core clock speed and voltage. However, do remember that the dual core has nearly twice as many transistors.
A single-core design with the same number of transistors as the dual core at 2.5 GHz would be even more efficient overall... but requires significant changes from x86.
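For reference, the non-linear power argument can be sketched with the usual first-order approximation for dynamic power, P ≈ C · V² · f. The voltages below (1.0 V at 2.5 GHz, 1.2 V at 3 GHz) and the unit capacitance are assumptions picked only to illustrate the shape of the curve; they are not figures for any real CPU.

```python
def dynamic_power(freq_ghz: float, voltage: float,
                  cores: int = 1, capacitance: float = 1.0) -> float:
    """First-order dynamic power in arbitrary units: cores * C * V^2 * f."""
    return cores * capacitance * voltage ** 2 * freq_ghz


# Assumed operating points (hypothetical): a higher clock needs a higher voltage.
per_core_slow = dynamic_power(2.5, 1.0)   # one core at 2.5 GHz
per_core_fast = dynamic_power(3.0, 1.2)   # one core at 3.0 GHz

clock_ratio = 3.0 / 2.5
power_ratio = per_core_fast / per_core_slow

print(f"clock is {clock_ratio:.2f}x higher, per-core power is {power_ratio:.2f}x higher")
print(f"dual core at 2.5 GHz: {dynamic_power(2.5, 1.0, cores=2):.2f} units")
print(f"single core at 3.0 GHz: {per_core_fast:.2f} units")
```

Under these assumptions, per-core power grows much faster than the clock does, which is the whole thermals argument. The model ignores leakage and everything else outside the cores, so treat it as a sketch only.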
You can't ignore the fact that the dual core is only slightly slower than the single core at 3 GHz in single-threaded applications, but much faster than the single core whenever you have more than one workload.
We can't forget that in a normal PC we will have dozens of processes going on, not simply a single program with one or more threads.
A dual core at 2.5 GHz can potentially do significantly more work than a single core at 3 GHz, so it is only normal for it to use more power. How high would that single core need to be clocked to do the same amount of work? 5 GHz, 6 GHz?
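A back-of-the-envelope sketch of that question, assuming throughput scales linearly with clock speed and that each extra core adds some fraction of a core's worth of work (the `scaling_efficiency` parameter is a hypothetical knob, not a measured value):

```python
def equivalent_single_core_clock(cores: int, clock_ghz: float,
                                 scaling_efficiency: float = 1.0) -> float:
    """Clock a single core would need to match `cores` cores at `clock_ghz`.

    Assumes performance is proportional to clock speed and that every core
    beyond the first contributes `scaling_efficiency` of a full core.
    """
    effective_cores = 1 + (cores - 1) * scaling_efficiency
    return clock_ghz * effective_cores


# Perfect scaling (independent workloads keeping both cores busy).
print(equivalent_single_core_clock(2, 2.5))
# Hypothetical 80% benefit from the second core.
print(equivalent_single_core_clock(2, 2.5, scaling_efficiency=0.8))
```

With perfect scaling the answer is simply twice the clock; with less-than-perfect scaling it is lower, but still well above 3 GHz for the efficiencies assumed here.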
Additionally, you can't forget that these days both Intel and AMD have CPUs with more cores that are clocked as high as the CPUs with fewer cores, on the same process.
Performance: single core @ 2.5 GHz @ 2B transistors > dual core @ 2.5 GHz @ 2B transistors > single core @ 3 GHz @ 1B transistors
Development cost: single core @ 2.5 GHz @ 2B transistors < dual core @ 2.5 GHz @ 2B transistors < single core @ 3 GHz @ 1B transistors
Manufacturing cost: single core @ 2.5 GHz @ 2B transistors = dual core @ 2.5 GHz @ 2B transistors > single core @ 3 GHz @ 1B transistors
The above of course assumes that you have a good design and are not just inflating transistor counts with "more cache" or some such, but using them optimally (which might mean more cache, might mean other things, depending on the exact design).
Again, even if all your software is single-threaded, more likely than not you will have various programs running, hence the dual core will be better overall.
Note that both ATI and nVidia have such a design in their GPUs. They call SPs "cores", but they are not actual cores; they are parallel execution units, just like how a single "core" in x86 has multiple ALUs (arithmetic logic units). The software sees 1 GPU, not hundreds of nVidia CUDA cores. And the underlying architecture allows them to be utilized in parallel quite well. This is despite the fact that each SP is identical to every other SP, simply duplicated many times over.
GPUs, which are not hampered by x86, allow amazing "single core" scaling via massively parallel execution units. x86 has a very rigid structure which, while it allows a specific number of execution units per core, doesn't allow that number to be increased or decreased easily. Instead, they duplicate whole cores (the easiest option), with wastage... along with the wastage of the x86 instruction set itself, btw.
But that is why both Intel and AMD are interested in integrating GPU-type cores in their CPUs - using the most efficient tool for each specific workload.
And it isn't like there haven't been gains in single-core performance - iCore is faster than K10/Core2, which are faster than K8 and P4.
Those improvements simply aren't enough to keep up with current demand if you are only using a single core.
Would you rather have a single iCore at 5 GHz instead of 2 iCores at 4 GHz in your PC?
I wouldn't.