compute depends on latency, graphics on bandwidth
Compute is a generic term, and real-world workloads come in many different types, so you really shouldn't generalize like this, particularly when we are talking about exploiting data parallelism. Graphics is just compute, by the way, so some stages of the graphics pipeline, or maybe even the shaders, may want lower latency as well. The atomic and read-modify-write (RMW) operations of the render backends are great examples.
HBM working as L3 benefits both, but graphics more (working with textures), and HBM working as extra memory, well, just ask Xbox One developers; it breaks HSA
No, it won't. Okay, don't bring those hUMA slides to me; read the HSA Platform System Architecture Specification 1.0 Provisional instead. Does it mention even a single word about hUMA, the marketing hype of Kaveri? No, but it tells you, "yeah, you need these features of hUMA to be HSA compliant". So the way people interpreted hUMA while overlooking the last three letters can now be thrown away. Oh, by the way, Carrizo and those Project Skybridge APUs should be the first wave of full HSA platforms. Sorry, but Kaveri is not on the list*. Yep.
What else does it tell you? Discrete HSA devices with component local memory! Multi-node, multi-device topology discovery! So now, if a discrete GPU can be supported and covered by the spec, why would conceptually integrating a discrete GPU with its own pool of memory into a host processor suddenly break HSA?
Hmm. Not to mention that HSA (and the higher-level OpenCL 2 of the HSA SW stack) would already have broken itself in your sense, with the holy group memory, the merciful image memory and the virtue of the private memory segments.
P.S. Doesn't working as an L3 cache contradict the "graphics needs bandwidth" claim, by the way? You are burning bandwidth on cache management, while graphics is fine operating in a smaller chunk of memory (and 1GB is not too small for a GPU). Making it a dedicated pool instead guarantees that there is never a single cache miss and gives you the full bandwidth to the DRAM!
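To put rough numbers on the cache-versus-pool point, here is a toy throughput model. It is only a sketch under my own assumptions: every miss costs one extra fill write into the HBM plus one read from the backing DRAM, and tag lookups, writebacks and latency are ignored. The function names and the 512 GB/s and 32 GB/s figures are hypothetical, picked for illustration only.

```python
def cached_bandwidth(hbm_gbps, backing_gbps, hit_rate):
    """Useful GB/s when the HBM is used as a cache (toy model, assumptions above)."""
    miss_rate = 1.0 - hit_rate
    # Every useful byte is read from HBM once; a miss also writes the fill into HBM.
    hbm_limit = hbm_gbps / (1.0 + miss_rate)
    if miss_rate == 0.0:
        return hbm_limit
    # Every missed byte must first be fetched from the slower backing DRAM.
    backing_limit = backing_gbps / miss_rate
    return min(hbm_limit, backing_limit)


def dedicated_bandwidth(hbm_gbps):
    """Useful GB/s when the HBM is a dedicated pool: no misses, no fill traffic."""
    return hbm_gbps


if __name__ == "__main__":
    hbm, backing = 512.0, 32.0  # hypothetical GB/s figures
    for hit_rate in (1.0, 0.9, 0.5):
        print(hit_rate, cached_bandwidth(hbm, backing, hit_rate), dedicated_bandwidth(hbm))
```

In this model the cached setup only matches the dedicated pool at a 100% hit rate; as soon as misses appear, either the fill traffic or the backing DRAM becomes the limit.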
* Carrizo supports hard preemption of wavefronts, which is a requirement of the Full Profile of the HSA platform spec. So there was a reason why Kaveri was only marketed as... "first to support HSA features".