I'd be interested in the opinion of someone who has a lot of experience with parallel programming. I could see core counts just getting higher and higher, but I'd guess the growth would hit many temporary plateaus if programmers couldn't keep up.
I'll take all the cores I can get.
I build Linux boxes exclusively as compute servers for parallel math computations coded in Haskell.
A good read on the issues one faces with parallelism is The Art of Multiprocessor Programming (Herlihy and Shavit). With more cores, keeping all those caches coherent generates ever more memory traffic, and that becomes a bottleneck. It is amazing what has to happen at the hardware level for this all to work.
Haskell, being a functional programming language, treats most memory as read-only (each value is written once, by the computation that creates it), so there is much less invalidating of cache lines. One pays a constant-factor efficiency loss, but I can often parallelize a program by adding a mere handful of lines of code, and it then scales near-linearly to all the cores I can get.
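To give a flavor of what that handful of lines looks like, here's a minimal sketch (collatzLen is just an illustrative stand-in for an expensive math computation, not anything from my real code): replacing a plain map with parMap from Control.Parallel.Strategies is often the entire change.

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- Illustrative stand-in for an expensive pure computation.
    collatzLen :: Int -> Int
    collatzLen 1 = 0
    collatzLen n
      | even n    = 1 + collatzLen (n `div` 2)
      | otherwise = 1 + collatzLen (3 * n + 1)

    main :: IO ()
    main = print (sum (parMap rdeepseq collatzLen [1 .. 200000]))

The sequential version is the same program with map in place of parMap rdeepseq; compile with ghc -O2 -threaded and run with +RTS -N, and GHC spreads the work across every core it sees.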
Perhaps only 95% of the work parallelizes, with 5% holding out as stubbornly serial; by Amdahl's law that serial slice caps the speedup. But that's for a small job, and computer scientists prefer to think in terms of rates of growth: make the job larger, and that 95% becomes 99.9%, and we're craving more than 20 cores.
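A quick back-of-the-envelope version of that argument, with Amdahl's law written as a Haskell one-liner (the name amdahl is mine, for illustration):

    -- Speedup on n cores when fraction p of the work parallelizes.
    amdahl :: Double -> Double -> Double
    amdahl p n = 1 / ((1 - p) + p / n)

    -- amdahl 0.95  20  ~= 10.3   (the serial 5% eats half the benefit)
    -- amdahl 0.999 20  ~= 19.6   (nearly linear; more cores still pay off)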
Haskell is pretty hard to learn, but the problem of using all these cores is also hard, and for now Haskell offers the cleanest solution I've found. I know a couple dozen programming languages, and if there were a better choice I'd be using it. Erlang, for example, takes a huge efficiency hit on message passing; its parallelism is designed not for speed but for fault tolerance, so an avalanche can take out a whole village of Scandinavian servers while the sysadmins sleep through the night and the rest of the network repairs itself automatically.
Functional programming is destined to become the norm as we cope with multiple cores. It will take a long time, with lots of kicking and screaming, because programming languages are like religions. This isn't reborn Lispers preaching what they'd like to believe; it comes down to the need for read-only memory to avoid cache thrashing, and it's begrudgingly acknowledged even by people who aren't pleased by the news.
The virtual cores of the Core i7 are a second-order benefit, a scheduling convenience. It used to be that using all cores would slow down a parallel computation, because stray background jobs would hold back one core. Now, using 5, 6, or 7 of the 8 virtual cores of a Core i7 2600K gives pretty much the same throughput, which amounts to cleanly using all 4 physical cores with no scheduling slowdown. In other words, where I used to use 3 cores of a Q6600, I now effectively use all 4 cores of a 2600K.
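On the GHC side that's just a runtime flag (MathProg.hs is a hypothetical program name, the core count illustrative):

    $ ghc -O2 -threaded MathProg.hs
    $ ./MathProg +RTS -N7    # 7 of the 8 virtual cores; stray jobs get the last one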