VRW - AMD Intel APUs Open Source GPGPU Apps and Performance

ARMD Supremacy · Feb 25, 2015

ArrayFire found in its benchmarks that AMDs (NASDAQ:AMD) integrated GPU in the Kaveri A10-7850K APU fares better at computational tasks than Intels (NASDAQ:INTC) Haswell HD 4600 series on the Core i7-4790K. The software company builds software libraries for GPU programming, with its products working on both Nvidia CUDA-based cards as well as AMDs OpenCL GPUs.

In its benchmarks, ArrayFire tested several GPU functions such as a bilateral filter, which is a non-linear filter that is used to smoothen out edges from images. The bilateral filter test showed AMDs integrated GPU achieving 156 fps at 720p and 69 fps at full-HD, more than twice the score managed by the HD 4600 at 62 and 28 fps.

http://www.vrworld.com/2015/02/24/amds-kaveri-holds-significant-edge-intel-arrayfire-gpu-benchmarks/

Full ArrayFire test part 1:

http://arrayfire.com/arrayfire-benchmarks-amd-kaveri-vs-intel-haswell-part-1/

Background on ArrayFire open source HPC accelerator library:

The library contains hundreds of functions for math, signal processing, image processing, and algorithms. Through its broad function set, nearly every engineering, science, or financial simulation can be accelerated today.

The Microsoft Xbox One with Kinect and recently revealed Hololens (with Intel) demonstrate how these algorithms are applied to create a computing environment that adapts to the individual and provide them the means to greatly expand their capabilities.

The apps that utilize these functions will start showing the full power and potential of APUs like the Kaveri and upcoming Carrizo.

krumme · Feb 25, 2015

What apps?

ARMD Supremacy · Feb 25, 2015

krumme said:
What apps?

Everything from voice and face recognition with login/command/control to image processing and rendering (ie Photoshop, Musemage, Luxrender) to financial analysis and spreadsheet acceleration (LibreOffice).

http://www.hardcoreware.net/amd-kaveri-a10-7850k-apu-overclocked/3/

monstercameron · Feb 25, 2015

AMD just needs some consumer facing software that can utilize HSA. Hopefully they can convince adobe to implement a few HSA enabled filters.

nismotigerwvu · Feb 25, 2015

GPGPU has the potential to be that "next big leap" in compute, perhaps bigger than the dawn of the multicore era or Intel's awakening (Conroe). That said, I'm not really sure it will happen on anyone but Intel's terms and right now they have a HUGE incentive to keep most compute tasks on x86. The day Intel closes the GPU gap is the day mainstream GPGPU becomes "a thing".

Enigmoid · Feb 25, 2015

Remarks

For most of the benchmarks the Intel system was outperformed by the AMD APU. We believe that we will be able to get more performance from the Intel system by modifying the kernels to use vector operations which will increase the resource utilization. Keep an eye out for a follow up post

Needs to be taken into account.

jhu · Feb 25, 2015

nismotigerwvu said:
GPGPU has the potential to be that "next big leap" in compute, perhaps bigger than the dawn of the multicore era or Intel's awakening (Conroe). That said, I'm not really sure it will happen on anyone but Intel's terms and right now they have a HUGE incentive to keep most compute tasks on x86. The day Intel closes the GPU gap is the day mainstream GPGPU becomes "a thing".

That's what Xeon Phi is for.

Gikaseixas · Feb 25, 2015

Unfamiliar territory for both brands, yet interesting perspective to what can still materialize

nismotigerwvu · Feb 25, 2015

Well yeah, Xeon Phi helps keep the workload on x86 in HPC, but HSA could help around the house or office not just around Argonne national lab.

krumme · Feb 25, 2015

ARMD Supremacy said:
Everything from voice and face recognition with login/command/control to image processing and rendering (ie Photoshop, Musemage, Luxrender) to financial analysis and spreadsheet acceleration (LibreOffice).

http://www.hardcoreware.net/amd-kaveri-a10-7850k-apu-overclocked/3/

It needs to be more specific for me to understand the user need?

This looks only interesting for a few in seldom situations.
I simply can not see the software driving the need for it.
Be it hsa or some funky Intel solution.

And when is this software relevant anyway?

(Edit: to me the examples here just shows the basic problem of amd strategy - and what it have been for years. You design for technical reasons without a customer in mind. Its like they do something that is smart and then they go looking for the need. A more balanced approach would help imo)

TheELF · Feb 26, 2015

krumme said:
This looks only interesting for a few in seldom situations.

This basically, and those few will know that they need this and buy a GPU that is at the exact level of their needs of GPGPU,instead of having to buy a relatively slow CPU with a good IGP.

beginner99 · Feb 26, 2015

nismotigerwvu said:
Well yeah, Xeon Phi helps keep the workload on x86 in HPC, but HSA could help around the house or office not just around Argonne national lab.

Well if the Phi would be HSA compatible it would work. But just imagine an Intel CPU with 4 big cores and higher count of Phi cores, like 16. Question remains how to assign the right taks to the right cores, IMHO should be done in hardware and the OS should not see the phi cores?

greatnoob · Feb 26, 2015

krumme said:
It needs to be more specific for me to understand the user need?

This looks only interesting for a few in seldom situations.
I simply can not see the software driving the need for it.
Be it hsa or some funky Intel solution.

If you don't understand the purpose of compute then please don't post drivel like "it needs to be more specific for me to understand the user need"

And when is this software relevant anyway?

(Edit: to me the examples here just shows the basic problem of amd strategy - and what it have been for years. You design for technical reasons without a customer in mind. Its like they do something that is smart and then they go looking for the need. A more balanced approach would help imo)

Hmm. I really hope you're trolling.

The 'customer in mind' is everybody except for people who are happy with mediocre last decade performance.

It's the market moving forward, whether you think it's important or not makes no difference to Intel, AMD, Nvidia, Qualcomm, Apple and (more here).

greatnoob · Feb 26, 2015

beginner99 said:
Well if the Phi would be HSA compatible it would work. But just imagine an Intel CPU with 4 big cores and higher count of Phi cores, like 16. Question remains how to assign the right taks to the right cores, IMHO should be done in hardware and the OS should not see the phi cores?

Parallel processing != serial processing.

You don't assign any task, you assign the same workload split up all at once to all units available.

therealnickdanger · Feb 26, 2015

If you're really serious about this type of processing, wouldn't you bypass IGP altogether and get a Phi or Tesla card/server?

ShintaiDK · Feb 26, 2015

GPGPU compute is something we still wait on to delivery. It havent moved much since the start. Faster yes, but its the same type of applications. And its also why HSA is utter deathweight and resources taken from more important issues.

AtenRa · Feb 26, 2015

People here forget that the vast majority of hardware will be mobile the coming years. Face/voice recognition (no need for passwords), GPU accelerated browsing, Gaming, Office/financial apps, video/photo editing, scientific apps (chemistry, physics, medical, Oil, seismic etc) all those can be GPU accelerated.

Your mobile phone/Laptop and Tablet will know you (voice/face recognition), you will be able to edit large 4K photos/videos on your mobile devices, more censors will be integrated in to mobile devices that will need GPU acceleration. Gaming will also benefit.
Mobile workstations will also benefit for people on the go, designers, architects, lab simulations, financials and more. Yes Servers will also be there but you will be able to have a small office workstation at your fingertips all day long everywhere you are, office, on your car, on the plane, on the boat (space ??? yea it will take more time for that unfortunately).

It is why everyone in the industry invest on iGPUs.

Nothingness · Feb 26, 2015

ShintaiDK said:
GPGPU compute is something we still wait on to delivery. It havent moved much since the start. Faster yes, but its the same type of applications.

Indeed it looks like GPU compute capabilities are only used for codec on the consumer market (cf. Intel and its H.265 support done with GPU acceleration). OTOH many 3d pro apps now use GPGPU for acceleration.

jhu · Feb 26, 2015

Nothingness said:
Indeed it looks like GPU compute capabilities are only used for codec on the consumer market (cf. Intel and its H.265 support done with GPU acceleration). OTOH many 3d pro apps now use GPGPU for acceleration.

Can confirm, Blender rendering is faster on NVidia CUDA than a 5960X. Well, at least $1000 worth of Nvidia GPU is significantly faster than $1000 worth of CPU.

DrMrLordX · Feb 26, 2015

TheELF said:
This basically, and those few will know that they need this and buy a GPU that is at the exact level of their needs of GPGPU,instead of having to buy a relatively slow CPU with a good IGP.

Mostly it has to do with sharing memory space between the CPU and GPU which is something currently not supported for PCI-e connected devices, such as dGPUs. You get around a lot of data copying overhead when you can just pass pointers to the iGPU instead of doing full memory copy.

There is also the matter of what happens when you are throwing a bunch of tight loops at the GPU instead of one big block of instructions and data for processing. PCI-e latency and zero support for GPU context switching can kick your butt. Admittedly, we won't see support for GPU context switching until Carrizo.

therealnickdanger said:
If you're really serious about this type of processing, wouldn't you bypass IGP altogether and get a Phi or Tesla card/server?

See above, though at least in the case of Phi, you could technically handle your entire workload on the card which would help you evade latency and mem copy issues. But then you would have those problems if you needed to use your high-IPC host processor for anything related to the task(s) offloaded to the Phi card.

Someday Intel is going to put QPI slots on their boards to host Phi cards so that they can make their compute units "first class citizens" on the board. Or at least, that's what I would expect. Or they're just going to stick Phi chips minus all the card logic/memory into LGA sockets.

jhu · Feb 26, 2015

DrMrLordX said:
Mostly it has to do with sharing memory space between the CPU and GPU which is something currently not supported for PCI-e connected devices, such as dGPUs. You get around a lot of data copying overhead when you can just pass pointers to the iGPU instead of doing full memory copy.

There is also the matter of what happens when you are throwing a bunch of tight loops at the GPU instead of one big block of instructions and data for processing. PCI-e latency and zero support for GPU context switching can kick your butt. Admittedly, we won't see support for GPU context switching until Carrizo.

See above, though at least in the case of Phi, you could technically handle your entire workload on the card which would help you evade latency and mem copy issues. But then you would have those problems if you needed to use your high-IPC host processor for anything related to the task(s) offloaded to the Phi card.

Someday Intel is going to put QPI slots on their boards to host Phi cards so that they can make their compute units "first class citizens" on the board. Or at least, that's what I would expect. Or they're just going to stick Phi chips minus all the card logic/memory into LGA sockets.

I thought Intel said they were doing this.

DrMrLordX · Feb 27, 2015

Yes, they did. I'm just waiting to see it happen so I can get a better grasp of how it works. If Phi chips will just drop into any ol LGA2011 socket . . . coooool. Something tells me it isn't that simple.

mrmt · Feb 27, 2015

DrMrLordX said:
Someday Intel is going to put QPI slots on their boards to host Phi cards so that they can make their compute units "first class citizens" on the board. Or at least, that's what I would expect. Or they're just going to stick Phi chips minus all the card logic/memory into LGA sockets.

According to Intel Knights Landing will be socketed.

Essence_of_War · Feb 27, 2015

Face/voice recognition (no need for passwords)

There are very good reasons to not have biometrics replace passwords. What do you do when someone steals a password? You change it. What do you do when someone steals a biometric?

scientific apps

Not every scientific application is amenable to GPU acceleration. Even for those that are, it is often WAY more productive to throw more traditional CPU cores at a problem using OMP/MPI than to re-write something using OpenCL/CUDA.

I think you're making a lot of concrete claims about the future that are not obvious at this time.

monstercameron · Feb 27, 2015

Essence_of_War said:
There are very good reasons to not have biometrics replace passwords. What do you do when someone steals a password? You change it. What do you do when someone steals a biometric?

Not every scientific application is amenable to GPU acceleration. Even for those that are, it is often WAY more productive to throw more traditional CPU cores at a problem using OMP/MPI than to re-write something using OpenCL/CUDA.

I think you're making a lot of concrete claims about the future that are not obvious at this time.

Hmm HSA kinda solves that, by directing what compute units do what computation. Serial tasks to the cpu and parallel ones to the gpu.

VRW - AMD Intel APUs Open Source GPGPU Apps and Performance

Member

Diamond Member

Member

Diamond Member

Golden Member

Platinum Member

Lifer

Platinum Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Senior member

Senior member

Lifer

Lifer

Diamond Member

Lifer

Lifer

Lifer

Lifer

Diamond Member

Platinum Member

Diamond Member