Gustafson's Law in no way refutes Amdahl's Law. They are merely different perspectives on the same concept.
Amdahl's Law is, "all that," for any application that has strict serial dependencies but more or less unbounded data and time: it fixes the problem size and asks how much faster N processors can finish it, so for a general-purpose algorithm it's the best speedup you'll ever get. Gustafson's Law instead fixes the execution time and asks how much more data N processors can handle, and that scalability may also be limited. The two are equivalent statements applied to different problem cases.

In Gustafson's case, scalability only increases linearly if the computation per processor is fixed. If the computation per processor grows as the data size grows, scalability won't be linear, but rather a curve that flattens out, approaching an asymptote as data size and/or processor count increases, just as with Amdahl's Law's typical applications.
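To make the fixed-size vs. scaled-size contrast concrete, here is a minimal sketch in Python of the two textbook formulas (my notation, not the parent's): s is the serial fraction, n the processor count.

    # Standard textbook forms of the two speedup laws (my sketch, not the
    # parent's). s = serial fraction of the work, n = processor count.

    def amdahl_speedup(s: float, n: int) -> float:
        # Fixed problem size: speedup is capped at 1/s no matter how big n gets.
        return 1.0 / (s + (1.0 - s) / n)

    def gustafson_speedup(s: float, n: int) -> float:
        # Fixed run time, problem size scaled with n: speedup grows linearly,
        # but only while the per-processor work stays fixed.
        return s + (1.0 - s) * n

    # With a 5% serial fraction, Amdahl flattens toward 20x while the
    # scaled (Gustafson) figure keeps climbing.
    for n in (1, 8, 64, 512):
        print(n, round(amdahl_speedup(0.05, n), 2),
                 round(gustafson_speedup(0.05, n), 2))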
The implication of Amdahl's Law is that some problems will never benefit from systems like GPUs, and instead need each individual processor to be faster. The implication of Gustafson's Law is that past a certain point it becomes worthwhile to do more processing rather than process more stuff (more in-depth data mining, real-time statistics, and so on, instead of finding more raw data to chew through); or, in today's world, to just go idle and save electricity.
It's worth keeping in mind that in 1967 there were far more problems that computers simply weren't fast enough for at all, and a simple, fast processor really was much faster than a complicated unit with many processors, so assuming unbounded data and time made much more sense than it did in 1988, by which time computers were common business items, able (and often required) to process data as fast as or faster than it could be presented to them.
Today, though, you should really be moving to Gunther's Universal Scalability Law, which encompasses both without the work of deriving one from the other:
http://en.wikipedia.org/wiki/Neil_J._Gunther#Universal_Law_of_Computational_Scalability
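For reference, here is the USL in code form (a sketch in the standard notation, which is my rendering, not the poster's): sigma models contention, kappa models coherency delay, and with kappa = 0 it reduces to Amdahl's Law with serial fraction sigma.

    # Sketch of the Universal Scalability Law (standard form; parameter
    # names are mine). sigma = contention, kappa = coherency delay.
    def usl_capacity(n: int, sigma: float, kappa: float) -> float:
        # Relative capacity at n processors. With kappa = 0 this reduces
        # to Amdahl's Law; with kappa > 0, throughput eventually peaks
        # and then falls off (retrograde scaling).
        return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

    # Example: 5% contention plus a small coherency term. Capacity rises,
    # peaks around n = 69, then declines.
    for n in (1, 8, 32, 128, 512):
        print(n, round(usl_capacity(n, 0.05, 0.0002), 2))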
Intel is at 2 threads per core for mainstream CPUs, and can feed 2 well. Go beyond 2 threads on those cores and we'd be back to the poor quality of Hyper-Threading on the P4; response time matters. The Alpha was going for maximum throughput: on real workloads, existing Alphas could max out their bus (one of several reasons for the integrated memory controller on the K8), and at the time they could keep on scaling well. That's apples and oranges. The guys behind the SPARC T-series figured more threads would help, for instance, and those CPUs are no slouches in the right setting; even at Oracle's prices (what can you pay, today?), they have managed to provide non-kool-aid drinkers with real value. There's no perfect universal thread count, nor a perfect way to implement multithreading. 4 may have been the ideal count for the super-wide Alpha-to-be; that does not make it a universal truth.
Actually, they are up to 4 threads per core, if you want to play that counting game. They use 2 on mainstream CPUs because we care about more than just keeping the ALUs busy. There have consistently been cases where turning HT off, going back to 1 thread per core, is an improvement; fewer cases with each generation, but all these years on, it still happens. As long as memory is not instant, it will keep happening, too, so long as threads share resources (as opposed to, say, fully partitioned SMT).
Where do you find a course that teaches you algorithms that either (a) cannot exist or (b) have not yet been created? Efficient parallel algorithms simply don't exist for a wide variety of real problems. And even when they do exist, the less-parallel versions are sometimes faster in practice, because the parallel versions carry so much overhead. Stones don't bleed.
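A toy illustration of that overhead point (my own sketch, nothing from the parent post): parallelizing a trivial reduction with Python's multiprocessing typically loses to the plain serial loop, because process startup and shipping the data to the workers cost more than the work itself.

    import time
    from multiprocessing import Pool

    def chunk_sum(chunk):
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))

        t0 = time.perf_counter()
        serial = sum(data)                      # the "less-parallel" version
        t1 = time.perf_counter()

        # Parallel version: split into 4 strided chunks, sum each in a
        # separate worker process, then combine.
        chunks = [data[i::4] for i in range(4)]
        with Pool(4) as pool:
            parallel = sum(pool.map(chunk_sum, chunks))
        t2 = time.perf_counter()

        assert serial == parallel
        print(f"serial:   {t1 - t0:.4f}s")
        print(f"parallel: {t2 - t1:.4f}s  (overhead usually makes this slower)")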