
TensorFlow - Vega vs Volta

1.3 is the latest TensorFlow version available for ROCm. AMD's changes have not yet been merged upstream, so for now AMD has to keep them in sync manually.
 
Interesting results. So Vega can do matrix multiplication quite fast even without specialized units... ehm "Tensor Cores".
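As an aside, the TFLOPS figures these benchmarks quote come from timing a large matrix multiply and dividing the operation count by the elapsed time. Here is a minimal sketch of that measurement in plain Python; the matrix size and the naive triple loop are illustrative only, since real benchmarks use tuned GPU kernels (rocBLAS, cuBLAS) rather than interpreted loops:

```python
import random
import time

def matmul(a, b):
    # Naive n x n matrix multiply; transpose b first for simpler traversal.
    bt = [list(col) for col in zip(*b)]
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

n = 96  # small illustrative size; GPU benchmarks use thousands
random.seed(0)
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]

t0 = time.perf_counter()
c = matmul(a, b)
dt = time.perf_counter() - t0

# An n x n x n multiply performs n^3 multiply-add pairs = 2*n^3 FLOPs.
flops = 2 * n ** 3
print(f"{flops / dt / 1e9:.3f} GFLOP/s")
```

The same arithmetic is what turns a Vega or Volta kernel timing into the "effective TFLOPS" numbers shown in the charts.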
 
If there was any doubt that AMD built Vega as a chip oriented toward professional workloads rather than gaming, this should erase it.
 

Well, there's no question that they put a massive amount of focus on the ROCm stack and its related projects before Vega's release. Any semblance of OpenCL support among mainstream developers has faded, and because OpenCL demands a complicated driver, the platform became a maintenance burden for AMD rather than a benefit. AMD hasn't even submitted drivers for OpenCL CTS conformance in over two years (Polaris and Vega don't support OpenCL 2.0, unlike their predecessors), and only Intel GPUs support OpenCL 2.1. AMD learned the hard way how frustrating the other Khronos Group members can be, so they don't bother anymore and instead rolled their own compute software stack. AMD only throws OpenCL a bone when needed: they don't support any OpenCL standard past 1.2, and from now on whatever support remains will be layered on top of ROCm.

A big hurdle for Intel, if they truly are entering the high-end accelerator market, will be discovering that OpenCL alone isn't enough ...
 
One benchmark is hardly relevant on its own, though; here are others:


[Attachment: benchmark_fp32_with_tensorcore.png]

[Attachment: benchmark_fp32_simple.png]

http://blog.gpueater.com/en/2018/03/20/00006_tech_flops_benchmark_2/
 
For the first graph, please read the disclaimer below it on the original site 🙄
*It's not really ideal to make comparisons against FP32 where they should be made against FP16, and to give exaggerated descriptions the way NVIDIA's official site does, but please note that this graph does compare FP32 against TensorCore (FP16).
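The disclaimer's point is that FP32 and FP16 runs are genuinely different workloads, not just different speeds. A quick stdlib-only sketch makes this concrete: rounding every step of a dot product through IEEE-754 half precision (via `struct`'s `'e'` format) produces visibly more accumulation error than full double precision. The vector length and random seed here are arbitrary illustrative choices:

```python
import random
import struct

def to_fp16(x):
    # Round a Python float to the nearest IEEE-754 half-precision value.
    return struct.unpack('e', struct.pack('e', x))[0]

def dot(a, b, rnd=lambda x: x):
    # Dot product, rounding every intermediate through the given precision.
    acc = 0.0
    for x, y in zip(a, b):
        acc = rnd(acc + rnd(rnd(x) * rnd(y)))
    return acc

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(1024)]
b = [random.uniform(-1, 1) for _ in range(1024)]

exact = dot(a, b)               # full double precision throughout
half = dot(a, b, rnd=to_fp16)   # every step rounded to FP16
print(abs(half - exact))        # FP16 accumulation error is clearly nonzero
```

This is why TensorCore (FP16) numbers shouldn't be compared head-to-head with FP32 numbers: the faster path is also computing a less precise result.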
 

I agree that you can't just go off of one benchmark result, but both of the graphs you posted are far less relevant than the one in the OP. The first one isn't even using the updated software stack for the AMD cards (which is why the OP benchmark exists and is the newer result), so it isn't relevant to this thread at all. The second is better, but it's basically just a single-precision FLOPS benchmark, which the author himself states isn't very meaningful for this kind of comparison.

You basically just said to ignore the newer results that show better performance for the AMD cards, and that we should instead judge AMD by how it performed on an outdated software stack. 😕
 
Why am I not surprised?
 