Agreed. But we already have options (multi-socket) for having 16-32-64 cores on a server. Granted, they are expensive (relatively speaking), but they do exist. It gets to the point where, with 20 cores per socket across 4 sockets, we start to bottleneck in other places of the system. Scalability of the x86 designs comes into play here. One can argue that if you need this horsepower, chips like Power7 and Itanium may be better options.
Depends, but there is clearly that demand. AMD is not making 12-core CPUs for your desktop; the 16-core BDs won't be desktop parts, nor will the 20-core to follow. They are meant to be flagship server CPUs, not what everyone in the world buys.
Personally, I would much rather see cores with more stages, greater SMT, better IPC, higher clocks, lower wattage, more instruction sets like FMA, etc., even if we have fewer of them. Yes, that means applications will need to be recompiled to take advantage of it, but they generally need to be redesigned to take advantage of more threads as well.
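To illustrate the recompile point with a toy C example (my own sketch, not anything specific to AMD's or Intel's parts): C99's fma() computes a*b + c with a single rounding, but a compiler only lowers it to an actual FMA instruction when you build for a target that has one (e.g. gcc -mfma); otherwise it falls back to a library call.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* a*b is exactly 1 - 2^-104, which rounds to 1.0 in double. */
    double a = 1.0 + 0x1p-52, b = 1.0 - 0x1p-52, c = -1.0;

    /* Separate multiply-then-add rounds twice and yields exactly 0.0;
     * fma() rounds once and keeps the tiny residual, -2^-104. */
    printf("a*b + c    = %.17g\n", a * b + c);
    printf("fma(a,b,c) = %.17g\n", fma(a, b, c));
    return 0;
}
```

Compile with something like gcc -O2 -mfma fma.c -lm to let the compiler use the hardware instruction.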
Higher clocks mean higher TDP, and Intel already learned their lesson about trying to push the clock speed limits. Clock speeds will not jump up.
Higher clocks with higher IPC will only raise the TDP faster, and increase the chances of missing those clock-speed targets. Neither AMD nor Intel is willing to have a single socket exceed 150W, either.
Lower wattage necessitates lower clocks, fewer transistors switching (less work done per clock), LVDS, and/or low-power manufacturing processes (which won't reach very high clocks).
You are asking for the impossible, unless you want to pay $500+ for low-end CPUs, with a fridge unit for cooling.
I am more upset with Intel and AMD, as they just throw more cores at us and charge more $$, instead of trying to make advances in IPC. The truth is that more cores do not help 90% of their customers. But they want you to think they do, and market it as such.
Intel has been improving both IPC and total per-thread performance every generation since the Pentium-M reset; AMD has been since the K6. Both companies have been increasing per-thread performance even as they add cores, and will continue to do so. Nobody in their right mind would intentionally create weaker cores for the sake of having a ton of them; if nothing else, Amdahl's law would bite them in the ass for CPU-bound work. The reality, though, is that we are well into diminishing returns on IPC improvements.
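To put a number on the Amdahl's law point: speedup = 1 / (s + (1 - s)/n) for serial fraction s and n cores. A quick C sketch (the 10% serial fraction is just an assumption for illustration):

```c
#include <stdio.h>

/* Amdahl's law: speedup = 1 / (s + (1 - s)/n), where s is the serial
 * fraction of the work and n is the number of cores. */
static double amdahl_speedup(double s, int n)
{
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void)
{
    const double s = 0.10; /* assumed serial fraction, for illustration */
    for (int n = 1; n <= 64; n *= 2)
        printf("%2d cores -> %5.2fx speedup\n", n, amdahl_speedup(s, n));
    return 0;
}
```

Even with only 10% serial work, 16 weak cores only get you about 6.4x, and 64 cores still under 9x, which is exactly why nobody ships a ton of deliberately weak cores.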
And the majority of businesses still want to keep latency down and aren't all that interested in throughput. Who cares if the server can handle 10k users at once if every one of those has to wait 1 second for a response?
Nobody, if latency matters and latency is sufficiently low without thousands of requests. However, if latency became a problem because someone chose a server with too little memory bandwidth, then that was poor planning. If latency is low for a small number of users, it's generally not hard to keep it low for a large number of users, so long as there is a planned maximum; after that, either you let it get slow, or you deny new connections. Which CPU config is best is a more specific problem than that, and many times it's easier and cheaper to buy some headroom than to actually figure out the answer (so long as you can come up with a good minimum estimate). If everyone were able to know exactly what was needed, and buy just that, server markets would become unrecognizable compared to today.
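For what that minimum estimate looks like, Little's law (requests in flight = arrival rate × response time) covers the back-of-the-envelope version. A C sketch with made-up numbers:

```c
#include <stdio.h>

int main(void)
{
    /* Little's law: L = lambda * W. All numbers here are invented
     * purely for illustration. */
    const double arrival_rate   = 10000.0; /* requests per second (assumed) */
    const double target_latency = 0.050;   /* 50 ms response-time target */
    double in_flight = arrival_rate * target_latency;

    /* If each core can comfortably hold, say, 25 requests in flight at
     * that latency, this gives a floor on core count to plan around. */
    const double per_core_capacity = 25.0; /* assumed */
    printf("requests in flight: %.0f\n", in_flight);
    printf("minimum cores:      %.0f\n", in_flight / per_core_capacity);
    return 0;
}
```

That only gives a floor; the headroom on top of it is the part people end up buying instead of computing.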
That shouldn't really be a problem for AMD, since x86 is still quite strong per core, but thinking that per-core performance is completely uninteresting is a bit too simple. There's a Google paper about that.
But there's a point (generally, wherever Intel is at with their fastest CPUs) where whether it is interesting or not doesn't matter, because you can't buy a faster one, and nobody has figured out how to produce a faster one that enough people would buy. It's not that people don't want faster single cores; it's that a single core is not fast enough. For a problem that can use many cores, the answer is simple: do the extra work needed to use more cores.
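As a minimal sketch of that "extra work" (thread count and workload are arbitrary, for illustration): split the data into chunks, run one chunk per thread, then join and combine.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4        /* arbitrary; would match core count in practice */
#define N (1 << 20)

static double data[N];

struct chunk { int begin, end; double partial; };

/* Each thread sums its own slice of the array. */
static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    c->partial = 0.0;
    for (int i = c->begin; i < c->end; i++)
        c->partial += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = 1.0;

    pthread_t tid[NTHREADS];
    struct chunk chunks[NTHREADS];
    int step = N / NTHREADS;

    /* The "extra work": partitioning, spawning, joining, combining. */
    for (int t = 0; t < NTHREADS; t++) {
        chunks[t].begin = t * step;
        chunks[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * step;
        pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += chunks[t].partial;
    }
    printf("sum = %.0f\n", total);
    return 0;
}
```

None of that is hard for a sum, but for real workloads the partitioning and combining is where the redesign effort goes.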
If there were CPUs that stayed within reasonable power budgets, available at low cost, with much higher performance per core, such that you would not need to scale out to more cores, obviously people would buy those. In fact, they do, because that's pretty much a comparison of the current Xeons against Opterons. OTOH, if the per-thread performance of the fastest-clocked Xeon isn't enough, and it's not an IO constraint, there is nowhere to go. It's not because anyone really wishes this were the case, but because real-world constraints make it the best option, and we've got to make the best of that.