AMD & OpenCL: Faster compression, but without using GPU

cp8086

Junior Member
Oct 25, 2010
16
0
66
In the Trinity review, I found it very interesting that the latest WinZip version was on the growing list of GPU-accelerated consumer applications.

I tried to verify the OpenCL speedup with a supported discrete video card (HD 5830 with Catalyst 12.4 WHQL): the compression time was indeed reduced (with a better speedup than Llano/Trinity), but overall GPU usage was about zero (with GPU and memory clocks staying at their idle values).

I checked these results with various utilities, such as Task Manager, Resource Monitor, Sysinternals Process Explorer, GPU-Z and AMD System Monitor: WinZip's GPU usage was exactly zero!

I noticed that CPU usage was higher instead, so I suppose that, with the OpenCL option enabled, WinZip simply uses a better multi-threaded algorithm.

Task Manager is enough to confirm that WinZip's thread count is significantly higher, sometimes more than double, with the OpenCL code path.

I found the maximum speedup with JPEG photos (250%) and the minimum with PCM audio (30%).

I think it would be nice if other enthusiasts performed similar tests on different configurations, and perhaps with other "GPU accelerated" programs.
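
For anyone repeating the test, note that a percentage speedup can be read in two ways: as the ratio of old time to new time expressed in percent, or as the improvement over the baseline. A minimal sketch of both readings follows; the class and method names are made up for illustration, and the timings in the usage note are hypothetical, not measurements from this test.

```java
// Sketch only: two common ways to turn a pair of timings into a "speedup" figure.
final class Speedup {
    // Ratio of baseline time to accelerated time: 2.0 means "twice as fast".
    static double ratio(double baselineSeconds, double acceleratedSeconds) {
        return baselineSeconds / acceleratedSeconds;
    }

    // Improvement over the baseline in percent: a ratio of 2.0 gives 100%.
    static double percentFaster(double baselineSeconds, double acceleratedSeconds) {
        return (ratio(baselineSeconds, acceleratedSeconds) - 1.0) * 100.0;
    }
}
```

With hypothetical timings of 70 s without OpenCL and 20 s with it, ratio gives 3.5 and percentFaster gives 250, whereas 50 s against 20 s gives a ratio of 2.5, which some people would also call "250%". Reporting the raw times avoids the ambiguity.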
 

cp8086

I am afraid that I didn't understand your post.

I was sharing some observations about general-purpose lossless compression with OpenCL, not about video transcoding/encoding with auxiliary fixed-function pipelines.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
OpenCL exposes the CPU as a device as well. It could be that it's selecting the wrong device (the OpenCL API gives you a list of devices that includes the CPU). You may not, for example, have the AMD APP installed, which would make the app fall back to the CPU.

It could just be broken of course.
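
To make the fallback idea concrete, here is a minimal sketch in plain Java. It does not call the real OpenCL API: ClDevice, DeviceType and pickDevice are hypothetical names that only model a device list containing both a GPU and the CPU, and how WinZip actually chooses its device is not known here.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of an OpenCL device list; this is not the real OpenCL API.
final class DevicePicker {
    enum DeviceType { GPU, CPU }

    static final class ClDevice {
        final String name;
        final DeviceType type;

        ClDevice(String name, DeviceType type) {
            this.name = name;
            this.type = type;
        }
    }

    // First device of the requested type, if the list contains one.
    static Optional<ClDevice> firstOfType(List<ClDevice> devices, DeviceType wanted) {
        return devices.stream().filter(d -> d.type == wanted).findFirst();
    }

    // Prefer a GPU; fall back to the CPU device when no GPU is listed.
    static Optional<ClDevice> pickDevice(List<ClDevice> devices) {
        Optional<ClDevice> gpu = firstOfType(devices, DeviceType.GPU);
        return gpu.isPresent() ? gpu : firstOfType(devices, DeviceType.CPU);
    }
}
```

An application that takes the wrong branch here, or that never sees a GPU in the list, would still run its OpenCL kernels, just on the CPU.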
 

cp8086

BrightCandle said:
You may not, for example, have the AMD APP installed
I have the latest WHQL Catalyst 12.4, with "OpenCL 1.2 AMD-APP (923.1) FULL_PROFILE".
GPU Caps Viewer 1.16.0 detects one "CL platform" (AMD APP) with two "CL devices": Cypress and CPU, in this order of course.

BrightCandle said:
It could be that it's selecting the wrong device
It could be. If so, I suspect there may be a systematic problem in selecting the correct device (considering the speedup reported for Llano/Trinity).

It would be crucial for someone to carefully check my observations on different platforms (CPU Intel/AMD, GPU Evergreen/N.I./S.I., APU Llano/Trinity but also Zacate/Ontario).
 

BrightCandle

You may also find that, given a faster CPU, the overhead of copying to the GPU is so high that there is no benefit, or that it is even detrimental.
 

cp8086

BrightCandle said:
You may also find that, given a faster CPU, the overhead of copying to the GPU is so high that there is no benefit, or that it is even detrimental.

That's technically correct; indeed, it's theoretically possible for an APU to be faster than a discrete GPU with more compute power, thanks to on-die links and other optimizations.
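
The copy-overhead argument can be put as a back-of-the-envelope model: offloading only pays if the transfer time plus the GPU compute time is less than the CPU time. The sketch below is an assumption-laden illustration, not a measurement; OffloadModel and every throughput figure used with it are hypothetical.

```java
// Back-of-the-envelope model; every number passed in is an assumption, not a measurement.
final class OffloadModel {
    // Time to process the data on the CPU alone.
    static double cpuSeconds(double megabytes, double cpuMBperSec) {
        return megabytes / cpuMBperSec;
    }

    // Time on the GPU: copy the data to the device and back, plus the compute itself.
    static double gpuSeconds(double megabytes, double gpuMBperSec, double busMBperSec) {
        double transfer = 2.0 * megabytes / busMBperSec;
        double compute = megabytes / gpuMBperSec;
        return transfer + compute;
    }

    // True when the GPU path, including the copies, beats the CPU path.
    static boolean offloadPays(double megabytes, double cpuMBperSec,
                               double gpuMBperSec, double busMBperSec) {
        return gpuSeconds(megabytes, gpuMBperSec, busMBperSec)
                < cpuSeconds(megabytes, cpuMBperSec);
    }
}
```

For example, offloadPays(1000, 100, 1000, 4000) is true (1.5 s against 10 s), while offloadPays(1000, 400, 500, 1000) is false (4 s against 2.5 s): a faster CPU or a slower link flips the result. In this model, an APU with on-die links would simply correspond to a much higher busMBperSec.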

What I'm pointing out is that, in WinZip 16.5 with OpenCL enabled, GPU usage is exactly zero, despite a significant speed up.
 

cp8086

Even in After Effects CS6 only the following nVidia cards are supported:

• GeForce GTX 285
• GeForce GTX 470
• GeForce GTX 570
• GeForce GTX 580

From: http://www.adobe.com/products/aftereffects/tech-specs.html

Adobe's requirements are laughable: the official list rules out the "complete" Fermi GTX480, the GT200 GTX280/260 and the dual-GPU GTX295/GTX590, but includes the GTX470 (while the GTX465 and GTX560Ti 448 are banned!) and the GT200b GTX285 (but not the GTX275)...

I hope that this is only the result of oversights and omissions...

Anyway Photoshop CS6 "uses both the OpenGL and OpenCL frameworks. It does not use the proprietary CUDA framework from nVidia."
Tested video cards include GeForce 8000-500 and Radeon HD2000-7000
 

cp8086

It will be nice when Handbrake supports Open CL.

I fully agree, provided of course that it actually makes good use of a modern GPU.
Thankfully this seems very likely, since Anand showed GPU usage with AMD System Monitor during a transcode in the OpenCL-enhanced version of Handbrake.
 

Lorne

Senior member
Feb 5, 2001
874
1
76
You can use GPUSniffer and add your existing Nvidia 200-series or newer card to Adobe's list of CUDA-supported cards for CS5 or later, but don't expect much from cards with a low CUDA core count.

You will get better performance from OpenCL.
 

BrightCandle

I did some coding in OpenCL recently and I hated every minute of it. I honestly can't see that many developers wanting to go back to coding like it's 1980 all over again. It's a pretty painful language they are using for the development of OpenCL programs, and overall I wasn't impressed with the language or the API.

I doubt many developers will want to deal with it until the level of abstraction improves a bit.

As for WinZip, I think there is a good chance that the zip algorithm does not multi-thread very well. The algorithms used are hard to make parallel and are already highly optimised for a CPU. They have pretty much got it down to a disk bottleneck, so using the GPU is rarely going to help.
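
For reference, the usual way around the "hard to make parallel" problem on a CPU is to split the input into chunks and compress each chunk independently on its own thread. The sketch below shows that idea with the standard java.util.zip classes; it is an illustration under that assumption, not a description of what WinZip does, and ChunkedDeflate and its methods are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch only: independent chunks compressed on a thread pool (not WinZip's method).
final class ChunkedDeflate {
    // Compress one chunk as a self-contained zlib stream.
    static byte[] deflate(byte[] chunk) {
        Deflater deflater = new Deflater();
        deflater.setInput(chunk);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress one chunk produced by deflate().
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported chunk");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Split data into pieces of chunkSize (> 0) bytes and compress them concurrently, keeping order.
    static List<byte[]> compressParallel(byte[] data, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (int off = 0; off < data.length; off += chunkSize) {
                final byte[] chunk =
                        Arrays.copyOfRange(data, off, Math.min(data.length, off + chunkSize));
                Callable<byte[]> task = () -> deflate(chunk);
                pending.add(pool.submit(task));
            }
            List<byte[]> compressed = new ArrayList<>();
            for (Future<byte[]> f : pending) {
                compressed.add(f.get()); // blocks until that chunk is done
            }
            return compressed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Inflate every chunk and concatenate the results in order.
    static byte[] decompressAll(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) {
            byte[] plain = inflate(c);
            out.write(plain, 0, plain.length);
        }
        return out.toByteArray();
    }
}
```

The trade-off is that each chunk starts with an empty history, so redundancy that spans chunk boundaries is lost and the ratio drops a little as chunks get smaller. It would also show up in Task Manager as a higher thread count and higher CPU usage, with no GPU involved at all.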
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
cp8086 said:
I fully agree, provided of course that it actually makes good use of a modern GPU.
Thankfully this seems very likely, since Anand showed GPU usage with AMD System Monitor during a transcode in the OpenCL-enhanced version of Handbrake.
Anand says it's coming very soon for Handbrake. I like this OpenCL very much, but I like AVX2 even more. Personally, I can't wait until all the legacy crap has been recompiled and x86 can be shown the door. Intel wanted to leave x86 years ago, but AMD came along with AMD64 and held us captive, as MS had zero interest in anything that wasn't x86 back then. So it was AMD and MS that held back innovation, not Intel as the EU stated.
 

turn_pike

Senior member
Mar 4, 2012
316
0
71
BrightCandle said:
The algorithms used are hard to make parallel and are already highly optimised for a CPU. They have pretty much got it down to a disk bottleneck, so using the GPU is rarely going to help.

That's quite remarkable. Does it hold true for SSDs as well?
 

Riek

Senior member
Dec 16, 2008
409
14
76
Well, OpenCL isn't limited to GPU usage.

By programming in OpenCL or something like it, you are automatically thinking about threading and how to handle and program for it, e.g. what you can improve or do to make it better. This is in strong contrast to what you normally do in another language. So you will basically see speedups that are due to changes in how the code works and in how its optimization is thought about.
You basically approach the same problem in a completely different manner.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
OpenCL in WinZip is not really using the GPU. It makes nearly no difference whether you are using a low-end Evergreen card or the 7970.
 

pcm81

Senior member
Mar 11, 2011
581
9
81
It is probably faster because it now uses all the cores of the CPU, but it is still 100x slower than it could be if it used the GPU. At those rates, though, you'd be bottlenecked by the hard drive anyway.
 

Kippa

Senior member
Dec 12, 2011
392
1
81
Has anyone tried using the new OpenCL-enhanced WinZip to compress a relatively large file from a ramdisk to a ramdisk? Personally, I would like to see the results of that test if it has been done.
 

BrightCandle

OpenCL can be run on a computer without a GPU at all.

What OpenCL does is change your functions so that you don't iterate. Instead, you get indexes from the API and you only write the function that maps input to output.

It's kind of like programming exclusively in map functions. In Scala we would write something like
(1 to 1000).par.map { i => doStuff(i) }

and in OpenCL we are writing:
__kernel void f(__global const int *in, __global int *out) {
    int index = get_global_id(0); // this work-item's index, supplied by the runtime
    out[index] = doStuff(in[index]); // int is just an example element type
}

Compare both these approaches to standard Java/C style, where we would be writing the actual loop:

List<Integer> in = .....
List<Integer> output = new ArrayList<>(1000);
for (int i = 0; i < 1000; ++i) {
    output.add(doStuff(in.get(i))); // add, because set(i, ...) throws on an empty list
}

Written that way, the Java loop can only run on a single CPU core. In the Scala and OpenCL examples it runs in parallel because we haven't said how to iterate: we left that to an underlying implementation, and that implementation is going to run doStuff(i) in parallel because there are no interactions between the various inputs and outputs. This works for some classes of problems but not all.
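
For comparison, Java 8 and later can express the same "don't say how to iterate" idea with parallel streams. A minimal sketch, assuming doStuff is a side-effect-free function (here a stand-in that squares its input) and using made-up class and method names:

```java
import java.util.stream.IntStream;

// Sketch: the explicit loop next to a parallel map over the same indexes.
final class ParallelMap {
    // Stand-in for doStuff; any function without side effects would do.
    static int doStuff(int x) {
        return x * x;
    }

    // Classic loop: we spell out the iteration order ourselves.
    static int[] mapSequential(int[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; ++i) {
            out[i] = doStuff(in[i]);
        }
        return out;
    }

    // Parallel map: we only describe index -> output; the runtime splits the range.
    static int[] mapParallel(int[] in) {
        return IntStream.range(0, in.length)
                .parallel()
                .map(i -> doStuff(in[i]))
                .toArray();
    }
}
```

Both methods return the same array in the same order; only the second leaves the scheduling to the library, which is what makes it safe to spread across cores when doStuff calls don't interact.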
 

cp8086

BrightCandle said:
They have pretty much got it down to a disk bottleneck, so using the GPU is rarely going to help.

It doesn't seem so: I tried different disks on the same system, and I also tried using two different disks at the same time (source and destination), but compression times were the same (+/- 1 s).
 

cp8086

turn_pike said:
That's quite remarkable. Does it hold true for SSDs as well?

I don't think so, because...

Kippa said:
Has anyone tried using the new OpenCL-enhanced WinZip to compress a relatively large file from a ramdisk to a ramdisk? Personally, I would like to see the results of that test if it has been done.
Yes, I tried a RAM drive and checked both absolute performance and relative speedup, with and without OpenCL, on different data sets: the results were the same (and GPU usage was always zero!).

Perhaps with a faster CPU or a slower mechanical hard drive we could run into a disk bottleneck; of course, if a powerful GPU were fully exploited, a very fast SSD would be necessary!
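
A quick way to take the disk out of the picture without a RAM drive is to time the compression of a buffer that is already in memory. The sketch below does that with java.util.zip.Deflater; it measures Java's deflate, not WinZip, and InMemoryBench and its methods are hypothetical names, so treat any figure it prints as a rough reference point only.

```java
import java.util.zip.Deflater;

// Sketch: time deflate on an in-memory buffer so that no disk I/O is involved.
final class InMemoryBench {
    // Compress the buffer and return the number of compressed bytes produced.
    static long compressedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // Single-run throughput in MB/s; a serious benchmark would warm up and repeat.
    static double megabytesPerSecond(byte[] data) {
        long start = System.nanoTime();
        compressedSize(data);
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return (data.length / 1e6) / (elapsedNanos / 1e9);
    }
}
```

If the MB/s figure from a run like this is well below what the source and destination drives can sustain, the compressor is CPU-bound, which would be consistent with compression times not changing across disks.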
 

cp8086

Riek said:
Well openCL isn't limited to gpu useage.
BrightCandle said:
OpenCL can be run on a computer without a GPU at all.

That's technically correct, but according to the slide and info from
http://www.investorvillage.com/smbd.asp?mb=476&mn=235391&pt=msg&mid=11638341
or
http://www.geeks3d.com/20111217/win...for-ultra-fast-compression-and-decompression/

<<WinZip 16.5 is being optimized for AMD Fusion and Radeon Graphics
- Fast memory access on Fusion APUs
- Massively parallel operation favors the most powerful discrete graphics
- OpenCL allows workload to be spread across CPU, integrated GPU, and discrete graphics>>

Furthermore from http://apps.corel.com/lp/amd/index.html:

<<WinZip has been working closely with AMD to bring users a major leap in file compression and encryption technology. Available today, WinZip 16.5 uses OpenCL acceleration to take advantage of the significant power of the 2nd Gen AMD A-Series APUs and AMD Radeon GPUs.>>


If GPU usage is consistently zero, all those statements are misleading!