CPUarchitect
Senior member
- Jun 7, 2011
"Your hypothesis sounds very interesting. Do you have some numbers to back it up? I'd like to see if an AVX2 CPU is faster at different typical parallel compute tasks than a compute-optimized GPU (like GK110) at the same power and a comparable process node."
Neither Haswell nor GK110 is out yet. But who needs the numbers when the technology speaks for itself? GPUs achieve a high theoretical throughput by using wide vector units with gather and FMA support. AVX2 offers the exact same features, but integrates them into the CPU cores themselves, thus avoiding any heterogeneous latency overhead or bandwidth bottlenecks. The CPU also benefits from higher cache hit rates due to a lower thread count (in turn thanks to out-of-order execution).
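To make the gather and FMA point concrete, here is a minimal sketch of what those two operations do per vector lane. It is a plain Python model, not real AVX2 or GPU code: the names `gather`, `fma` and `lookup_fma` are hypothetical, and the 8-lane width assumes 32-bit elements in a 256-bit register.

```python
# Lane-by-lane model of a vector gather followed by a fused multiply-add.
# All names are illustrative; this does not call any real SIMD API.

LANES = 8  # assumption: eight 32-bit elements per 256-bit register


def gather(table, indices):
    """Load one element per lane from arbitrary (non-contiguous) positions."""
    return [table[i] for i in indices]


def fma(a, b, c):
    """Per-lane a * b + c. Hardware FMA rounds once; this model rounds twice."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def lookup_fma(table, indices, scale, offset):
    """Process the input in chunks of LANES, one 'vector instruction' pair each."""
    out = []
    for start in range(0, len(indices), LANES):
        lanes = slice(start, start + LANES)
        values = gather(table, indices[lanes])
        out.extend(fma(values, scale[lanes], offset[lanes]))
    return out


if __name__ == "__main__":
    table = [float(i) for i in range(16)]
    indices = [15, 0, 3, 3, 8, 1, 14, 7]
    print(lookup_fma(table, indices, [2.0] * 8, [1.0] * 8))
```

Without gather, the indexed loads in `lookup_fma` would have to be done one scalar at a time, which is the kind of loop that used to run well only on a GPU.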
Either way, I don't think GK110 can be included in a fair comparison. NVIDIA describes it as "The Fastest, Most Efficient HPC Architecture Ever Built". Clearly this behemoth is not aimed at the average consumer. AVX2, on the other hand, will be in every consumer desktop/laptop/ultrabook chip, starting with Intel but later also AMD, and there is a high likelihood of equivalent homogeneous throughput computing technology appearing for other architectures sooner or later.
Also note that a discrete GPU can't work by itself! It still needs a CPU, so the power consumption of that CPU has to be taken into account as well. Therefore a fair comparison should test against APUs instead. See for instance these results: Handbrake OpenCL. The i7-2820QM is running the OpenCL code on the CPU, and outperforms the A10-4600M. With AVX2's doubling of the throughput and addition of gather support, the CPU should be able to greatly increase its lead.
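The "doubling of the throughput" can be worked out as simple peak arithmetic. The sketch below is only that: the function name is hypothetical, and the unit counts (two 256-bit floating-point units per core, doing either a separate multiply and add or an FMA each) are assumptions about the cores being compared, not figures taken from the benchmark above.

```python
def peak_flops_per_cycle(vector_bits, element_bits, units, flops_per_lane):
    """Theoretical peak floating-point operations per cycle for one core."""
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_lane


# Assumption: 256-bit vectors, 32-bit floats, two vector units per core.
# Without FMA each unit retires one operation per lane (a multiply or an add).
without_fma = peak_flops_per_cycle(256, 32, units=2, flops_per_lane=1)

# With FMA each unit retires a multiply and an add per lane.
with_fma = peak_flops_per_cycle(256, 32, units=2, flops_per_lane=2)

if __name__ == "__main__":
    print(without_fma, with_fma, with_fma / without_fma)
```

This is peak throughput only; how much of it real code reaches depends on memory bandwidth and on how well the loop vectorizes.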
A fair comparison also requires running optimized code. A CPU with AVX2 can run OpenCL, but OpenCL is not aimed at homogeneous computing. It imposes restrictions so that the same code can also run on a GPU. AVX2 code can be optimized more aggressively because it doesn't have to live by OpenCL's restrictions. So homogeneous computing offers more capabilities than heterogeneous computing, enabling more applications than the latter would be able to support.