- May 11, 2008
In this speculation thread, I have used HBM as a generic term for High Bandwidth Memory, with no regard to the specific version, or generation, of the particular HBM specification.
In this discussion, since we are speculating about the future, I think we need to see the APU chip as a black box, and accept any form of integration within that black box, including MCM solutions.
MCM solutions based on new packaging technology and "chiplet" design are, it seems, an inevitable part of the industry's roadmap as monolithic die design hits its limits.
Well, that's merely what Intel did.
As a programmer by profession, I view HSA as the overarching vision for AMD's Fusion design philosophy — in particular, the simplification of the programming model and memory model of the system, with seamless and coherent memory access, reduced latency, pre-emptive task switching, and a standardised API and hardware-agnostic ISA for the heterogeneous compute units in the system.
As a programmer I face two kinds of algorithms: sequential and parallel. Some of the latter run very well on a GPU, but to this day, I have never taken advantage of that, even though my 20+-year-old software (road design) may be very well suited for it and could see massive speed and efficiency gains.
Why? It is not easy. There are some tools (such as C++ AMP) that are promising, and the C++ language standard committee is now doing great work to enable parallelisation seamlessly. But there is still a long way to go before running computations on a GPU is as simple as running them on the CPU.
For example, a game should seamlessly be able to use all hardware acceleration available on a system. Even if a discrete GPU card is present and runs the game's main graphics load, the game should take advantage of the gigaflops of parallel compute performance available in today's integrated GPUs. Simply disabling such a powerful coprocessor is a sad waste. In the future, maybe a game will use the integrated GPU to run AI while using the discrete GPU for the graphics load.
To bring my digression back on topic: my interest in APUs is not graphics, and not replacing the more powerful discrete cards that gamers and professional users need. I am interested in technology, and I'd like to see HSA become reality, so that programmers can simply and seamlessly make use of any acceleration available in their users' systems. A big part of that is removing bottlenecks in memory latency and bandwidth — which is what this discussion is about.
You know what has made me wonder for about two years now when thinking about APUs and HBM memory?
HBM2 for consoles and laptops.
Let's say there is 8 GB or even 16 GB of HBM2.
Since the HBM interface is made up of 128-bit channels totalling 1024 bits (8 channels of 128 bits each), it should be possible to divide the HBM memory into one pool only for the CPU and one pool only for the GPU, with both pools controlled by the same memory controller: one 128-bit channel set up as two 64-bit pseudo channels for the CPU, and seven 128-bit channels for the GPU.
Or two channels for the CPU and six for the GPU.
This would allow maximum usage, and since the channels work in parallel, the CPU and GPU would not have to wait for each other. And since everything is connected to the same memory controller, zero-copy tricks that reduce latency and memory traffic would still be possible. The memory is shared virtually as far as the system is concerned, but in hardware the two pools run in parallel.
That is, if this arrangement is possible at all. But if it is, there would no longer be any need for separate system RAM. That would allow flexible management of the memory, a massive increase in performance, and a reduction in the bill of materials, since system memory is no longer needed.
That would be a bonus for APUs.