For the others. The developers write native C/C++, Python, Java ... or other code (OpenCL if they want). The HSA runtime will run the application. Then they will write an OpenCL implementation for Intel and NVIDIA. Of course they don't need to do this. The HSA runtime will allow any application to run in legacy mode (this is the Intel compatible mode).
Erm...how's this HSA runtime going to run this application? That simply makes no sense, without reinventing so many wheels as to be unfathomable.
If they write a C++ program, they're generally going to compile it directly to the CPU ISA.
If they write a Python program, at most, it might be compatible with CPython (compiled to CPU's ISA), PyPy (compiled to CPU's ISA, but also JIT for a few), and Jython (JVM).
And so on.
They're going to have to change their programs to use HSA. Now in some cases, this may be easier than in others. For instance, given an HSA-enabled version of something like NumPy (if you think that's

, check out Magnum P.y.

), then just by using its data structures and functions for work on arrays, you could easily get some speed boosts. But, that's going to be the odd case.
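To make that odd case concrete, here's a rough sketch. It assumes a hypothetical HSA-enabled module that mirrors NumPy's interface; no such module is claimed to exist, so plain NumPy stands in for it. The functions `saxpy` and `normalize` are made-up names for illustration. The point is that the application code only talks to whatever array namespace it is handed, so swapping the backend wouldn't touch it.

```python
import numpy as np


def saxpy(xp, a, x, y):
    """Compute a*x + y using only functions from the array namespace xp."""
    x = xp.asarray(x, dtype=xp.float32)
    y = xp.asarray(y, dtype=xp.float32)
    return a * x + y


def normalize(xp, v):
    """Scale v to unit length, again touching only the xp namespace."""
    v = xp.asarray(v, dtype=xp.float64)
    return v / xp.sqrt(xp.sum(v * v))


# Plain NumPy is the backend here. A hypothetical HSA-enabled drop-in
# (an assumption, not a real package) would be passed in place of np.
result = saxpy(np, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
unit = normalize(np, [3.0, 4.0])
```

Anything that does its heavy lifting through array operations like these could, in principle, pick up the speed boost for free. Anything that loops over elements in plain Python could not.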
The problem now is that every HW maker has their own driver, their own libraries, with their own APIs, to implement their own special sauce, each of which also sees the computer's world through its own senses, shaped by how it works inside rather than by what would be the best way for others to make use of it. So, you've got hundreds of little alien universes trying to work together. It's a mess, wasting time and effort, in ways that negatively affect everyone but Intel, IBM, Oracle, etc. (and we generally don't care much about IBM or Oracle, these days). That is what HSA could help with.
For a bad car analogy, imagine if every auto maker used a different tail light pattern, for every series of car they made. So, you had to remember which light turning on what color meant what for every model of every make. It would be chaos, right? You don't care if they use burning filaments, gas, or LEDs, whether they are run by transistors or relays, or any of that, but you do care that blinking colored light on one side means turning that way, and a center red light means braking, and multiple white lights means backing up, etc.
It would be nice to be able to write some code to run on a generic DSP-like or GPU-like machine, and then have the system figure out the details, for the most part. You'd still have to write DSP-like or GPU-like code, and you'd still have to deal with all the threading intricacies to make it work well...you just wouldn't have to care exactly what it ran on. Outside of x86, these days, that's not an easy task, and it's not even common on x86, for OpenCL, yet.
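As a toy illustration of that split, here's a sketch where the kernel is written once as a per-work-item function and a dispatcher decides where it runs. Everything here is hypothetical: `dispatch`, `BACKENDS`, and the two CPU-only backends are stand-ins I made up, not anything from the HSA docs. A real runtime would be handing the work to actual hardware queues.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add(i, a, b, out):
    """Work-item body: each invocation handles exactly one index i."""
    out[i] = a[i] + b[i]


def run_serial(kernel, n, *args):
    """Reference backend: run every work-item in order on one thread."""
    for i in range(n):
        kernel(i, *args)


def run_threaded(kernel, n, *args, workers=4):
    """Stand-in 'device' backend: split the index space across threads."""
    def run_chunk(start):
        # Each worker takes a strided slice of the index space.
        for i in range(start, n, workers):
            kernel(i, *args)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(run_chunk, range(workers)))


# Hypothetical backend table; a real system would probe the hardware.
BACKENDS = {"serial": run_serial, "threads": run_threaded}


def dispatch(kernel, n, *args, backend="serial"):
    """Run n work-items of kernel on the named backend."""
    BACKENDS[backend](kernel, n, *args)


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out_serial = [0] * len(a)
out_threads = [0] * len(a)
dispatch(vector_add, len(a), a, b, out_serial, backend="serial")
dispatch(vector_add, len(a), a, b, out_threads, backend="threads")
```

The kernel still has to be written GPU-style (one independent work-item per index, no assumptions about ordering), but the caller never names the thing it runs on.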
Qualcomm, ARM and Samsung wouldn't put themselves at the mercy of AMD, and this is exactly what happens if they let HSA dominate their design development. Especially when every one of those you mentioned (except for ARM) will be competing among themselves on every market you can think of.
Dominating their design and development, by implementing features that are mostly sensible and desirable, putting them at the mercy of AMD? Not really.
While certainly as byzantine as expected by a committee of semi-competitors, what's in the HSA/HSAIL doc they have up is, for the most part, pretty sensible, and doesn't get too much into how it has to be done, by the hardware.
But so far nobody but AMD is designing silicon from the ground up with HSA in mind, and until I see the big guys doing this, I'll consider HSA as AMD's next pipe dream, just like GPGPU was Nvidia's pipe dream.
Because as we all know, nobody uses CUDA on Quadros, and nVidia hasn't sold a single Tesla. It's not a pipe dream for nVidia, but a successful revenue generator.