Not to mention that Intel is giving ARM a one-up over itself by not embracing the AVX/AVX2 ISA stack top-to-bottom on its own chips. HSA will be ubiquitous before you know it (even on ARM!), and Intel's AVX will only be a footnote in history.
I wonder if Intel is preparing for their own HSA-type system here. Supposedly, Intel is preparing for wider AVX systems - AVX-512 or higher. I wonder if they're going to shift the AVX work over to the integrated GPU instead?
The current Intel HD graphics architecture handles eight 32-bit floats per execution unit (EU), which is 256 bits, the same width as AVX, with a latency of 8 cycles. If you allow AVX latency to jump to 8 cycles (that's the key), then 8 HT cores could feed 1 HD graphics EU, or more likely 4 cores could feed it AVX-512 instructions. Then, if you allow AVX to extend over as many EUs as there are in the processor, you get a real equivalent to HSA without having to use OpenCL.
But what about Pentiums and Celerons? If they used the same system, they only have two cores, so they could only keep each EU half-occupied. Maybe that's what Intel's setting up here: a situation where Pentiums and Celerons are only half as fast as their other processors at vectorized work? Probably a stretch, but interesting to think about anyway.
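To make the back-of-the-envelope math above concrete, here's a rough sketch. It assumes a toy model that is my own simplification, not anything Intel has described: each core keeps exactly one vector instruction in flight per latency window, and an EU is "full" when its 256-bit lanes have work for all 8 cycles. The function names are made up for illustration.

```python
def cores_to_fill_eu(vector_bits, eu_width_bits=256, eu_latency_cycles=8):
    """Cores needed to keep one EU busy under the toy model.

    Assumption: each core contributes one vector instruction of
    `vector_bits` per latency window, and the EU needs
    eu_width_bits * eu_latency_cycles bits in flight to stay full.
    """
    bits_in_flight = eu_width_bits * eu_latency_cycles
    return bits_in_flight // vector_bits


def eu_occupancy(cores, vector_bits):
    """Fraction of one EU kept busy by `cores` cores (capped at 1.0)."""
    return min(1.0, cores / cores_to_fill_eu(vector_bits))


if __name__ == "__main__":
    print("256-bit AVX, cores per EU:", cores_to_fill_eu(256))
    print("AVX-512, cores per EU:", cores_to_fill_eu(512))
    print("2 cores on AVX-512, occupancy:", eu_occupancy(2, 512))
```

Under those assumptions the numbers line up with the guesses above: 8 cores per EU at 256 bits, 4 cores at AVX-512, and a two-core chip leaving the EU half-occupied. Real hardware would obviously be messier than this.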