GPU and CPU calculation results not identical (OpenCL issue)

boren

Member
When I turn on OpenCL in DxO 8.1.5 (image processing application) the resulting files are slightly different than when OpenCL is turned off. The files are binary different, and I can even spot a few differences if I increase magnification significantly.

Is there an application I could use to validate that the OpenCL implementation of my card (AMD HD 7850 2GB) is correct and accurate? If it isn't, then I guess AMD is to blame and I'll contact their support and report this issue. If the results are accurate, then I report it to DxO support.
 
I didn't say it's necessarily AMD's fault. It's one of two possibilities:

Is there an application I could use to validate that the OpenCL implementation of my card (AMD HD 7850 2GB) is correct and accurate? If it isn't, then I guess AMD is to blame and I'll contact their support and report this issue. If the results are accurate, then I report it to DxO support.
The reason why I want to test it with a separate tool is to identify which possibility is the right one.
 
It may be a case where the GPU renderer is using single-precision floating point for speed, whereas the CPU is using double precision. The real question is: 'Is the post-processed image accurate enough for your purposes?'
 
Indeed. DxO wouldn't be the first or the last program to take a different codepath when using the GPU. Sometimes it's a precision thing, other times it's just the difference between an algorithm that works well in serial on a CPU, and a similar algorithm that's better suited for wide execution on a GPU.
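
To illustrate the precision point, here is a minimal sketch that accumulates the same values once in double precision and once with every intermediate result rounded to single precision. It says nothing about what DxO actually does internally; `to_float32` and `accumulate` are hypothetical helpers written for this example.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(values, single_precision: bool) -> float:
    """Sum values left to right, optionally rounding each partial sum to float32."""
    total = 0.0
    for v in values:
        total += v
        if single_precision:
            total = to_float32(total)
    return total

values = [0.1] * 1000
double_total = accumulate(values, single_precision=False)
# Assumption for illustration: the "GPU path" also stores its inputs as float32.
single_total = accumulate([to_float32(v) for v in values], single_precision=True)

print(double_total, single_total, abs(double_total - single_total))
```

The two totals differ by a small amount, which is the kind of discrepancy that would show up as a binary difference in an output file without necessarily being visible.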
 
It's not uncommon for different processors to give different results.

That is why, in applications such as distributed computing where the same work unit is sent to multiple machines, the work must be sent to machines of the same architecture.

Read the docs at wcgrid.org for more details.

Short version - there's not a problem.
 
Floating point calculations are expected to give different results on different architectures. Use integer calculations if you want files that are bit identical.
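
A small sketch of why integers behave differently here: floating-point addition is not associative, so regrouping the same operands can change the last bit, while integer addition gives the same answer in any grouping. The fixed-point scaling (values stored as integer thousandths) is an assumption made for this example only.

```python
# Same three operands, two groupings, IEEE 754 double precision.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# Hypothetical fixed-point version: the same quantities stored as integer thousandths.
ia, ib, ic = 100, 200, 300
int_left = (ia + ib) + ic
int_right = ia + (ib + ic)
print(int_left == int_right)  # True
```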
 
That is why we have a standard for floating point that ensures that the calculations are the same. But you need to enable strict floating point in the code path to ensure that is used and that has a performance cost.

I think it's more likely in this case that it's not the same algorithm, but a completely different algorithm designed to work on the GPU. OpenCL does not allow a developer to reuse existing code to make their GPU program; you have to write in a special form of C where the iteration is taken out of your hands and put into the API. It's very likely that this change is responsible, as the developers can no longer use their previous code and have had to create a separate path.
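
As a rough sketch of what "the iteration is taken out of your hands" means, compare a CPU-style function that owns its loop with a kernel-style function that only handles one element. This is plain Python, not OpenCL; `launch` is a hypothetical stand-in for the runtime, and the brightening operation is invented for the example.

```python
def brighten_serial(pixels, gain):
    """CPU style: the function owns the loop over all pixels."""
    out = []
    for p in pixels:
        out.append(min(255, int(p * gain)))
    return out

def brighten_kernel(gid, pixels, out, gain):
    """Kernel style: handles exactly one pixel, selected by its work-item id."""
    out[gid] = min(255, int(pixels[gid] * gain))

def launch(kernel, global_size, *args):
    """Hypothetical stand-in for the runtime, which decides how work items run.

    A real GPU runtime would run these in parallel and in no guaranteed order.
    """
    for gid in range(global_size):
        kernel(gid, *args)

pixels = [10, 100, 200, 250]
serial_result = brighten_serial(pixels, 1.5)
kernel_result = [0] * len(pixels)
launch(brighten_kernel, len(pixels), pixels, kernel_result, 1.5)
print(serial_result, kernel_result)
```

For an independent per-pixel operation like this one the two paths agree. Differences tend to appear once the rewritten code has to combine values across pixels, because the order of operations is no longer fixed.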
 
It might be fastest to contact the authors of DxO.

This could be:
- bugs in the OpenCL driver
- bugs in DxO
- floating-point differences (for example with an x86 CPU you might be using 4-, 8-, or 10-byte floats)
- algorithm differences. Even with the same floats, the way results are created could propagate or magnify floating errors differently.
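
Whichever of these it turns out to be, it helps to quantify the difference before reporting it. A minimal sketch, assuming both outputs have already been decoded to raw 8-bit samples of equal length (comparing compressed files byte by byte would not be meaningful); `compare_buffers` and the sample data are hypothetical.

```python
def compare_buffers(a: bytes, b: bytes):
    """Return (number of differing samples, largest absolute difference)."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    differing = sum(1 for d in diffs if d)
    max_diff = max(diffs, default=0)
    return differing, max_diff

# Made-up sample data standing in for decoded CPU and GPU output.
cpu = bytes([10, 20, 30, 40])
gpu = bytes([10, 21, 30, 38])
print(compare_buffers(cpu, gpu))  # (2, 2)
```

Many samples differing by 1 would point toward rounding or precision; a few samples differing by a lot would look more like a bug.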
 
This is the nature of DCT (discrete cosine transform) compression. It is lossy! If you parallelize the operation, the same work gets done in a different order.
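
The ordering effect can be shown without any image code. In this sketch the same four numbers are summed once in a single pass and once as two chunks whose partial sums are then combined, which is roughly how a parallel reduction splits the work. The values are chosen to exaggerate the effect, and `chunked_sum` is a hypothetical helper, not anything from DxO or OpenCL.

```python
def sequential_sum(values):
    """Add the values left to right in one pass."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, chunk_size):
    """Sum each chunk separately, then add the partial sums together."""
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # 1.0
print(chunked_sum(values, 2))  # 0.0
```

Neither answer matches the exact sum of 2.0; each ordering simply loses different low-order bits.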
 