HD4000 does OpenCL acceleration with dGPU installed

NTMBK

Lifer
Nov 14, 2011
10,435
5,785
136
http://techreport.com/news/24604/performance-boosting-intel-igp-drivers-are-out

The driver has other improvements in store, as well. Intel says users can now tap into the IGP for OpenCL acceleration—or use the processor's QuickSync video acceleration block—"even when Intel® HD Graphics is not the primary display adapter," provided the system is running Windows 8 and has the new driver installed.

Very nice! The iGPU running Havok Physics, while the dGPU does the graphics rendering, perhaps? This should be very useful for porting back GPGPU tricks from the new consoles which have unified memory.
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71

zlatan

Senior member
Mar 15, 2011
580
291
136
It's not that easy. Most people don't understand what the PS4 is; they just see the eight-core CPU and the GCN-based iGPU, but the new custom APU is much more than that. The killer feature is that the PS4 APU will share fully coherent memory and a unified address space between the iGPU and the CPU. This is the most important step forward, because GPGPU is a big deal, but the current APIs require the application to explicitly copy all input and output memory. That leads to a very big performance penalty, because copying the data can easily take longer than processing it on the CPU cores. So only small datasets or very expensive computations benefit from GPGPU. The PS4 technically makes it possible to use GPGPU for any algorithm with good parallelism. This is the most important advancement in the console, and sadly nobody cares about it.
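
A rough way to see the copy penalty is a back-of-the-envelope cost model. The sketch below is only an illustration: the function name, the bandwidth figure and the timings are all made-up assumptions, not measurements of any real CPU, GPU or bus.

```python
def offload_speedup(bytes_in, bytes_out, cpu_seconds, gpu_seconds,
                    bus_bytes_per_second, shared_memory=False):
    """Return CPU time divided by total GPU-path time (> 1 means offloading wins)."""
    # With a shared, coherent address space there is nothing to copy.
    if shared_memory:
        copy_seconds = 0.0
    else:
        copy_seconds = (bytes_in + bytes_out) / bus_bytes_per_second
    return cpu_seconds / (gpu_seconds + copy_seconds)


# Hypothetical workload: 80 MB in, 80 MB out, 10 ms on the CPU, 2 ms on the GPU,
# and an assumed 8 GB/s effective copy bandwidth.
with_copies = offload_speedup(80_000_000, 80_000_000, 0.010, 0.002, 8e9)
without_copies = offload_speedup(80_000_000, 80_000_000, 0.010, 0.002, 8e9,
                                 shared_memory=True)
print(f"explicit copies: {with_copies:.2f}x, shared memory: {without_copies:.2f}x")
```

With these invented numbers the GPU kernel is five times faster than the CPU, yet the explicit-copy path ends up slower than just staying on the CPU, while the shared-memory path keeps the full speedup.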

On the PC, Kaveri will be the first APU with exactly the same feature set. This is not really a good thing, because most of the PS4 GPGPU algorithms won't be directly portable to the PC, or only Kaveri will be able to handle them. Trinity and Richland may also be enough with their HSA-MMU units, but that is not a fast solution to the problem.
 

NTMBK

Lifer
Nov 14, 2011
10,435
5,785
136
Don't worry, I know. Take a look at a few of the threads about the PS4, and you'll see me arguing exactly the same thing. :) But Intel has enabled the sharing of common memory between their CPU and GPU with InstantAccess (and AMD already has it with ZeroCopy), so these algorithms can be run efficiently on their APU.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
ZeroCopy and InstantAccess are limited. They just share an uncacheable memory partition between the CPU and the iGPU, and the programmer has to manage all the data accesses, which is not easy.

AMD has GART, which is more advanced. This is a system facility that performs physical-to-physical translation of memory addresses within a graphics adapter. The problem is that it is a really slow solution. The HSA-MMU support (Trinity/Richland) is a more useful approach: the iGPU can access the CPU memory through the IOMMUv2 unit. This is safe and fast, but not fast enough. The next logical step is to integrate a TLB into the iGPU. That is an extremely fast solution to the problem, especially with fully shared coherent system memory.
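
To illustrate why a TLB helps, here is a toy model of address translation with a small cache of recent translations. It is only a sketch under stated assumptions: the `ToyTLB` class, the 4 KiB page size, the LRU policy and the page-table contents are all invented for the example and do not describe how any AMD or Intel hardware actually works.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyTLB:
    """Caches virtual-page -> physical-frame lookups with LRU eviction."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.capacity = capacity
        self.entries = OrderedDict()  # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # Stands in for the slow path (a full page-table walk).
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 0 -> frame 7, virtual page 1 -> frame 3.
tlb = ToyTLB({0: 7, 1: 3})
physical = [tlb.translate(a) for a in (0, 8, 4096, 16, 4100)]
print(physical, "hits:", tlb.hits, "misses:", tlb.misses)
```

Only the first touch of each page takes the slow path; repeated accesses to the same pages are served from the cache, which is the effect an on-GPU TLB is meant to provide.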