Also, I honestly do not understand why AVX2 always comes up in discussions of HSA/FSA/OpenCL. I see people continually framing it as Intel's AVX2 versus AMD's heterogeneous computing, as if each vendor were betting on only one of them, and I just do not think that is the case. We've already seen both Intel and AMD release OpenCL-compatible CPUs, and we will see both introduce AVX2-compatible CPUs.
Only one of these technologies will prevail. There is no room for both, since both try to cover the same need: general-purpose throughput computing. History shows that incompatible competing technologies cannot coexist. Think of AMD64 versus IA-64: Itanium is practically dead. Think of 3DNow! versus SSE: Bulldozer no longer supports 3DNow! at all.
So the question now is which is the superior throughput computing technology: homogeneous AVX2+ or heterogeneous GPGPU? And yes, both companies will support both for a while, but they have different ideas about where to focus. There's a lot at stake for AMD, since it's sacrificing CPU performance to make the GPU more powerful in an attempt to make GPGPU more attractive. Not just that: it's also sacrificing graphics performance. As NVIDIA's Fermi and Kepler illustrate, graphics and GPGPU call for different architectures; Fermi leaned toward compute at the expense of graphics efficiency, and Kepler swung back toward graphics. HSA leans very much toward GPGPU, which compromises graphics.
Intel doesn't have to make any sacrifices. It already has a superior CPU architecture, and it will be the first to add high-throughput performance to it using AVX2. Even when AMD implements AVX2, there will still be a big difference in computing density because of Bulldozer's shared SIMD cluster architecture. There's also no sign of Intel sacrificing graphics performance for the sake of GPGPU. And last but definitely not least, AVX2 is much easier for developers to adopt than GPGPU, and it will offer more consistent performance across system configurations.
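To make the adoption argument concrete, here's a minimal sketch of my own (illustrative only; saxpy_avx2 is a made-up name, and I'm assuming a Haswell-class CPU) of what throughput code looks like on the CPU: a handful of intrinsics in an ordinary C function. The equivalent OpenCL path needs platform and device enumeration, a context, a command queue, kernel source, runtime compilation, and buffer management before the first multiply even runs. Note that _mm256_fmadd_ps technically belongs to the FMA extension, which ships alongside AVX2 on Haswell.

```c
#include <immintrin.h>
#include <stddef.h>

/* Minimal sketch (not from any real codebase):
   y[i] = a * x[i] + y[i] over n floats, 8 lanes at a time.
   Assumes n is a multiple of 8 to keep the sketch short.
   Build with e.g. gcc -O2 -mavx2 -mfma. */
static void saxpy_avx2(float a, const float *x, float *y, size_t n)
{
    const __m256 va = _mm256_set1_ps(a);     /* broadcast a to all 8 lanes */
    for (size_t i = 0; i < n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);  /* load 8 floats from x */
        __m256 vy = _mm256_loadu_ps(y + i);  /* load 8 floats from y */
        /* fused multiply-add: va * vx + vy in one instruction
           (FMA ships alongside AVX2 on Haswell-class CPUs) */
        vy = _mm256_fmadd_ps(va, vx, vy);
        _mm256_storeu_ps(y + i, vy);         /* store 8 results */
    }
}
```

No driver, no kernel language, no buffer copies: the data never leaves the CPU's caches.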
The main benefit of HSA, as far as I can tell, is making it easier for developers to extract processing power out of a heterogeneous system.
Easier, yes, but it will never be easy. In fact, heterogeneous computing becomes harder as things scale up, so they're fighting an uphill battle. The only way to guarantee that it doesn't suffer from poor latency and bandwidth scaling is to fully merge the GPU's throughput technology into the CPU. And that's exactly what AVX2 already does!
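Here's a concrete illustration of that merging, again a sketch of my own (gather8 is a hypothetical helper): AVX2's gather instruction gives every SIMD lane its own memory address, which is precisely the per-work-item addressing GPUs are built around, and it's what allows SPMD-style code, the kind you'd write in an OpenCL kernel, to be compiled straight to CPU vector instructions.

```c
#include <immintrin.h>

/* Illustrative sketch: GPU-style per-lane memory addressing on the CPU.
   Each of the 8 SIMD lanes fetches table[idx[lane]] with a single
   AVX2 gather instruction; before AVX2 this access pattern forced
   scalar fallback code. */
static __m256 gather8(const float *table, const int *idx)
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx); /* 8 indices */
    /* scale = 4 because each index addresses a 4-byte float */
    return _mm256_i32gather_ps(table, vidx, 4);
}
```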
It's no coincidence that Intel's Knights Corner chip, which is pretty much a GPU architecture minus the graphics components, uses an instruction set that closely resembles AVX2.
So it's inevitable that things will converge into a single architecture. All general-purpose computing will happen on the CPU. Either the GPU becomes fully focused on graphics, or the programmable shaders are also processed on the CPU and the GPU decays into a set of fixed-function units: peripheral components that assist the CPU with graphics processing.
AMD desperately wants the CPU and GPU to remain heterogeneous, but in doing so it ironically brings them ever closer together, strengthening the case for AVX2 and its successors.