Existing APIs for GPGPU are not the easiest to use and have not had widespread adoption by mainstream programmers. In HSA we have taken a look at all the issues in programming GPUs that have hindered mainstream adoption of heterogeneous compute and changed the hardware architecture to address those. In fact the goal of HSA is to make the GPU in the APU a first class programmable processor as easy to program as today's CPUs. In particular, HSA incorporates critical hardware features which accomplish the following:
1. GPU Compute C++ support: This makes heterogeneous compute access a lot of the programming constructs that only CPU programmers can access today
2. HSA Memory Management Unit: This allows all system memory is accessible by both CPU or GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.
3. Unified Address Space for CPU and GPU: The unified address space provides ease of programming for developers to create applications. By not requiring separate memory pointers for CPU and GPU, libraries can simplify their interfaces
4. GPU uses pageable system memory via CPU pointers: This is the first time the GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference the data directly in the CPU domain. In all prior generations, data had to be copied between the two spaces or page-locked prior to use
5. Fully coherent memory between CPU & GPU: This allows for data to be cached in the CPU or the GPU, and referenced by either. In all previous generations GPU caches had to be flushed at command buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU share a high speed coherent bus
6. GPU compute context switch and GPU graphics pre-emption: GPU tasks can be context switched, making the GPU in the APU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience. As UI's are becoming increasing more touch focused, it is critical for applications trying to respond to touch input to get access to the GPU with the lowest latency possible to give users immediate feedback on their interactions. With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multi-users or multiple applications are either prioritized or equalized
As a result, HSA is a purpose designed architecture to enable the software ecosystem to combine and exploit the complementary capabilities of CPUs (sequential programming) and GPUs (parallel processing) to deliver new capabilities to users that go beyond the traditional usage scenarios. It may be the first time a processor company has made such significant investment primarily to improve ease of programming!
In addition on an HSA architecture the application codes to the hardware which enables user mode queueing, hardware scheduling and much lower dispatch times and reduced memory operations. We eliminate memory copies, reduce dispatch overhead, eliminate unnecessary driver code, eliminate cache flushes, and enable GPU to be applied to new workloads. We have done extensive analysis on several workloads and have obtained significant performance per joule savings for workloads such as face detection, image stabilization, gesture recognition etc…
Finally, AMD has stated from the beginning that our intention is to make HSA an open standard, and we have been working with several industry partners who share our vision for the industry and share our commitment to making this easy form of heterogeneous computing become prevalent in the industry. While I can't get into specifics at this time, expect to hear more about this in a few weeks at the AMD Fusion Developer Summit (AFDS).
So you see why HSA is different and why we are excited
from this anandtech article:
http://www.anandtech.com/show/5847/...geneous-and-gpu-compute-with-amds-manju-hegde