Folks are missing some of the finer nuances of process shrink.
For one, once we went below 40nm, additional circuitry became necessary to deal with leakage within the chips. ARM's chief engineer alluded to this a couple of years ago, saying that below 40nm we would see diminishing returns on further die shrinks.
Process shrinks are used to make smaller, lower-power chips - not faster ones. That's because there's a problem that hasn't been solved yet: a small process like 22nm (or even 28nm) lets you cram far more circuitry into a given area, but a large, high-density chip built that way requires even more power.
So, they shrink the die, and put more chips on a wafer. This lowers power consumption and increases profit margins (more chips per wafer). What it does not do is significantly increase IPC or overall performance.
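To put a rough number on the "more chips per wafer" point, here is a back-of-the-envelope sketch. It assumes ideal area scaling (die area shrinks with the square of the node ratio) and uses the common textbook approximation for gross dies per wafer. The 300 mm wafer and the 200 mm² die at 32nm are made-up illustration values, not figures for any real chip, and real shrinks scale worse than this because of the extra leakage circuitry mentioned above.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Textbook estimate of gross dies per wafer (ignores yield and scribe lines)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Partial dies lost around the circular edge of the wafer.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

def shrunk_area(die_area_mm2, old_node_nm, new_node_nm):
    """Die area after a shrink, ASSUMING ideal scaling with the square of the node ratio."""
    return die_area_mm2 * (new_node_nm / old_node_nm) ** 2

# Hypothetical example values, chosen only for illustration.
old_area = 200.0                           # mm^2 at 32nm
new_area = shrunk_area(old_area, 32, 22)   # same design at 22nm

old_count = dies_per_wafer(300, old_area)
new_count = dies_per_wafer(300, new_area)
print(f"32nm: {old_area:.0f} mm^2 -> {old_count} dies per wafer")
print(f"22nm: {new_area:.0f} mm^2 -> {new_count} dies per wafer")
```

Under those assumptions the same design yields a bit more than twice as many dies per wafer, which is where the margin comes from. Nothing in that math makes the chip any faster.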
Take a look here:
http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance
And that chart misses the last 2 years. What's happened in that time?
As for compute power - Sandy Bridge was the last decent bump, Ivy Bridge much less so, and Haswell appears to be very much a non-event in the performance arena.
If we look at just the last 3 years, we've gone from the ~21% per year shown in that chart to 5-10% per year.
That pales in comparison with the 51%/yr jumps in desktop compute power we had in the 1990s. With compounding, it now takes a bit over 4 years at 10%/yr, and more than 8 years at 5%/yr, to equal one year of advancement in compute power in the 1990s.
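For anyone who wants to check that arithmetic, here is a quick sketch. It assumes the yearly gains compound, so the number of slow years needed to match one fast year is log(1 + fast) / log(1 + slow). The function name is just mine; the rates are the ones quoted above.

```python
import math

def years_to_match(fast_rate, slow_rate):
    """Years of compounded growth at slow_rate needed to equal one year at fast_rate."""
    return math.log(1 + fast_rate) / math.log(1 + slow_rate)

# One 1990s-style year (51%) versus the more recent yearly rates.
for slow in (0.05, 0.10, 0.21):
    print(f"{slow:.0%}/yr: {years_to_match(0.51, slow):.1f} years")
```

That works out to roughly 8.4 years at 5%/yr, 4.3 years at 10%/yr, and 2.2 years at the 21%/yr from the chart.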
I also agree about the software part: multi-core simply hasn't been used effectively in most applications. That also contributes, as there is no force driving chip makers to add more cores than we've had for the past 5 years. Combined with negligible increases in IPC, we are stuck.