Speculation: SYCL will replace CUDA


Poll: What does the future hold for heterogeneous programming models?

Total voters: 26

Vattila

Senior member
Oct 22, 2004
799
1,351
136
It looks more like SYCL is an extra abstraction layer on top of lower level GPU APIs like CUDA or OpenCL, than it is a replacement for either.

Although some implementations may rest upon other programming models and supporting frameworks — such as hipSYCL using HIP (AMD's CUDA dialect) to provide cross-platform support on top of AMD ROCm and Nvidia CUDA frameworks and toolchains — there is nothing in the SYCL 2020 standard that mandates a particular implementation. You can make a SYCL 2020 compliant implementation using pure machine code, if you should want to.

With regard to abstraction level, the SYCL programming model is similar to CUDA and HIP. All three provide a single-source programming model built on similar fundamental concepts (kernels, scheduling, memory access, etc.), and all are based on C++, compiling down to high-performance native machine code for the targeted system architecture. Porting between them is feasible and, with the help of automation, can be relatively straightforward, as AMD has proven with their numerous recent supercomputer wins.

For example, here is a blog post on porting from CUDA to HIP for the LUMI supercomputer:

"As mentioned, HIP is AMD’s answer to CUDA, however, whereas CUDA code can only run on Nvidia GPUs, programs using HIP can run on both AMD and Nvidia GPUs. The HIP API syntax is very similar to the CUDA API, and the abstraction level is the same meaning that porting between the two is easy and we will cover the practical ways this can be done below. [...] In the end converting CUDA code to HIP is usually quite straightforward, with the catch being that the most bleeding edge CUDA features are not supported but may be supported in the future. The AMD GPU software stack comes with tools that will significantly speed up the conversion process compared to doing it manually."
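To illustrate why near-identical API naming makes this kind of porting easy to automate, here is a toy sketch in plain C++. It is not the actual AMD tooling: the function name `hipify_sketch` is hypothetical, and the only thing it does is swap the `cuda` prefix on runtime identifiers (e.g. `cudaMalloc`, `cudaMemcpy`) for `hip`. Real migration tools also have to handle headers, library calls and unsupported features.

```cpp
#include <regex>
#include <string>

// Toy illustration only (hypothetical helper, not part of any AMD tool):
// rename CUDA runtime identifiers to their HIP counterparts by swapping
// the "cuda" prefix for "hip". This works for calls such as
// cudaMalloc -> hipMalloc and cudaMemcpy -> hipMemcpy because the HIP
// runtime API mirrors the CUDA runtime API names.
std::string hipify_sketch(const std::string& cuda_source) {
    // \b anchors the match at the start of an identifier, so names that
    // merely contain "cuda" (e.g. my_cudaHelper) are left untouched.
    // The capital letter after "cuda" restricts the match to API-style names.
    static const std::regex cuda_prefix(R"(\bcuda([A-Z]\w*))");
    return std::regex_replace(cuda_source, cuda_prefix, "hip$1");
}
```

For instance, passing the line `cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);` through this function yields `hipMemcpy(d, h, n, hipMemcpyHostToDevice);`. The "bleeding edge" features mentioned in the quote are exactly the cases where such a simple renaming has no HIP counterpart to map to.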


Intel provides similar tools in their oneAPI framework for automating the translation of CUDA code to SYCL.
 

Vattila

Recent Khronos SYCL Webinar, with presentations from all the major SYCL implementers (oneAPI/DPC++, ComputeCPP, triSYCL, hipSYCL and neoSYCL):


Below is a short presentation on SYCL in GROMACS (a heavily used library in the supercomputer world; apparently ~4-5% of the world's total supercomputing time is spent running GROMACS!). Kudos to Intel for doing the right thing here!

 

Vattila

Here are a few recent Intel articles on their support for SYCL:
Good to see Intel talk more about SYCL explicitly (not obscured by DPC++/oneAPI).

Here are your current options for deploying SYCL code:

[Image: diagram of current SYCL implementations and deployment options]

PS. Note that support for AMD GPUs in DPC++/oneAPI will arrive as well, as Codeplay Software has been contracted by US national labs to implement such support (see announcement). Also note that the diagram above does not show the full list of existing SYCL implementations; for example, Huawei has an implementation of SYCL for their recently announced Beiming architecture.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Does Nvidia support it yet?
No, but you can use hipSYCL or oneAPI's DPC++ to use SYCL on CUDA. The latter is being supported by Lawrence Berkeley National Laboratory and Argonne National Laboratory:


ANL also did a performance comparison with CUDA and hipSYCL:


 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
SYCL performance looks pretty good except on N-body. Interesting that it actually speeds up a few of those kernels from the first performance suite considerably (and slows down on some others). Shows that CUDA has some room for improvement.
 

Vattila

Intel is working on implementing the Blender renderer using SYCL:

"Opened up at the end of March is the work-in-progress Intel oneAPI back-end for Blender's Cycles renderer. This Intel GPU back-end focused for supporting the company's forthcoming Intel Arc graphics cards is targeting the open-source oneAPI Base Toolkit and making use of SYCL. There still is more code work needed, but it's good to see this coming together to complement Blender's NVIDIA CUDA and AMD HIP support. [...] With Blender 3.0 having removed OpenCL acceleration, at least until there is any viable Vulkan back-end developed it's been up to vendor-specific rendering back-ends with the NVIDIA CUDA/OptiX code leading the way followed by AMD HIP. (With Blender 3.2 this summer is where the AMD HIP acceleration on Linux will finally be in place.) Intel engineers recently sent out for review their code adding a Cycles back-end for Intel GPUs via oneAPI and the SYCL API. Via the industry standard SYCL, this back-end could potentially be used with other driver stacks in the future."

Blender Cycles Rendering Support For Intel Arc Via oneAPI + SYCL Under Review - Phoronix
 

Vattila

GROMACS 2022 ported to SYCL:

"As part of the oneAPI optimization work, Lindahl’s team ported GROMACS’ CUDA code, which only runs on Nvidia hardware, to SYCL using the Intel DPC++ Compatibility Tool (part of the Intel oneAPI Base Toolkit), which typically automates 90-95% of the code migration. This allowed the team to create a new, single portable codebase that is cross-architecture-ready, greatly streamlining development and providing flexibility for deployment in multiarchitecture environments."

GROMACS 2022 Advances Open Source Drug Discovery with oneAPI (hpcwire.com)
 

Vattila

Intel makes CUDA-to-SYCL porting easier with SYCLomatic, an open-source migration tool:

"There are many things Intel needs to get right with OneAPI if the chipmaker wants its multi-pronged compute strategy to work the same kind of magic Nvidia has conjured with CUDA, and the first thing it needs is an easy way for developers to port code from CUDA. Intel upped its CUDA migration efforts this month by open sourcing the technologies powering the Intel DPC++ Compatibility Tool, which is used for moving code from CUDA to OneAPI’s Data Parallel C++ language. But rather than herding developers into OneAPI, the new open source tool, called SYCLomatic, focuses on simply helping move that code to SYCL, the royalty-free, cross-architecture programming abstraction layer that underpins Intel’s parallel-friendly C++ implementation."

Intel Takes The SYCL To Nvidia’s CUDA With Migration Tool (nextplatform.com)


Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Intel is on a mission to tear down the walled garden that is CUDA.
 

NTMBK

Lifer
Nov 14, 2011
10,208
4,940
136
moinmoin said: "Honestly not sure what you want to imply there. Do you think Intel has a bad track record with open source or that Khronos does a bad job handling its abstraction layer standards in a manufacturer agnostic way?"

Codeplay were an independent provider of compilers, and they were the ones pushing SYCL for years, meaning SYCL wasn't just an Intel effort. Now they've been acquired, so SYCL is now more obviously an Intel thing.

The only implementations of SYCL listed on the Khronos website are Codeplay's, Intel's, and a few "research projects" which are nowhere near production ready.
 

moinmoin

But this is not a competition where independent implementations make the money. This is a royalty-free open standard which succeeds by seeing widespread uptake. Intel could maintain it as a CUDA-like solution for Intel hardware alone, but it's in Intel's own best interest that users of Nvidia and AMD hardware also want to use SYCL, as that increases mobility between competing hardware. AMD wants that as well; after all, who doesn't want a share of the CUDA cake? Nvidia obviously wants to keep that cake for itself.
 

NTMBK

An "open" standard where one vendor makes all the compilers? And where it just happens to work better on that vendor's hardware? Yeah sure okay...
 

moinmoin

Most standards start out with a single entity pushing them. Everything has to start somewhere.

Once again: Do you think Intel has a bad track record with open source or that Khronos does a bad job handling its abstraction layer standards in a manufacturer agnostic way?
 

Vattila

I wonder whether Codeplay was ever in talks with AMD about a takeover. I think it would have been more promising for SYCL as a standard if AMD had shown interest in contributing. But, alas, I think AMD is laser-focused on taking the path of least resistance to CUDA replacement, which is HIP. The latter seems to have achieved impressive acceptance in the supercomputing space (possibly due to the existence of hipSYCL, as well).

That said, Codeplay will now have significantly more resources for developing SYCL solutions:

"Intel wants to be the most open of the platform suppliers, and that is because it has to be. Nvidia, as the undisputed GPU compute leader (excepting a few big supercomputers in the United States and Europe), can build a moat around the CUDA platform and its libraries and just continue to make money by giving this software away for “free.” [...] Intel wants the oneAPI stack to not only be free, but open [...] because this will spur adoption of its software and de-risk the choice of Intel hardware to develop applications. Code developed in DPC++ atop SYCL and accessing the oneAPI libraries can run on GPUs from Intel, AMD, or Nvidia. [...] There is no question that Intel needs Codeplay to increase the odds of oneAPI being adopted outside of its own compute engines, but it is also true that Codeplay needed the might of Intel to expand its operations and make SYCL and DPC++ more pervasive than it could be alone."

To Cure Iron Anemia With SYCL, Intel Buys Codeplay (nextplatform.com)
 

moinmoin

Between Intel and AMD the former definitely has more resources, more experience with the software development involved, as well as more experience with handling open source projects and open standards. So in that regard I don't see the outcome as a bad match.

From the AMD side the focus on SYCL just hasn't been there before. This may change now with Xilinx and its experience both with SYCL and software development (which it takes over for the whole combined company), but obviously for this particular development this was too late.
 

Vattila

NTMBK said: An "open" standard where one vendor makes all the compilers? [...] Yeah sure okay...

Fear not (yet). Codeplay will run as a subsidiary, and the leading people are passionate about open standards and heterogeneous computing, in particular Codeplay founder and CEO Andrew Richards, and on the Intel side, SYCL evangelist and book author James Reinders. However, if either of those two should leave, alarm bells should go off.

"This is an exciting moment for the industry and will help enable the team to fulfil our vision of bringing open standards programming to all. Codeplay will continue to work in partnership with organisations across the industry to enable open standard software on the latest cutting edge processors. Being a catalyst for industry innovation is what Codeplay was created to do, and it is what gets us excited to come to work." (Andrew Richards, Codeplay CEO)

Scots software firm Codeplay acquired by Intel (dailybusinessgroup.co.uk)

"James Reinders believes the full benefits of the evolution to full heterogeneous computing will be best realized with an open, multivendor, multiarchitecture approach. Reinders rejoined Intel [in 2020], specifically because he believes Intel can meaningfully help realize this open future."

Solving Heterogeneous Programming Challenges with SYCL (hpcwire.com)

"Through the subsidiary structure, we plan to foster Codeplay’s unique entrepreneurial spirit and open ecosystem approach for which it is known and respected in the industry." (Joe Curley, VP of Intel’s Software and Advanced Technology Group)

Intel set to acquire Codeplay Software (insider.co.uk)

PS. And don't forget the ultimate goal of SYCL: to contribute to the support for heterogeneous system programming in ISO standard C++, which will benefit everyone, regardless of any possible detrimental Intel influence on SYCL. In this regard, a key person is, as I have mentioned before, Michael Wong, Vice President of Research and Development at Codeplay Software, Canadian Head of Delegation to the ISO C++ Standard, founding member of the ISO C++ Directions group, and holder of other C++ related positions.
 

moinmoin

Looking at the wiki entry for Codeplay, another aspect that makes Intel a good match is Codeplay's involvement with RISC-V development:
  • In 2020, Codeplay announced collaboration with NSITEXE and Kyoto Microcomputer to develop OpenCL and SYCL support for RISC-V
  • In 2021, Codeplay announced partnership with Andes to achieve Software First SoC design for AI-based applications using RISC-V Vector Processors
Codeplay also chairs the Datacentre SIG for RISC-V.
Intel, on the other hand, wants to offer RISC-V IP as well, as part of IFS 2.0.
 

Vattila

Here is a nice article by Neil Trevett — Khronos President, as well as Vice President of Developer Ecosystems at Nvidia — on the need for standards in the embedded space for the growing use of compute, vision and inference acceleration:

"SYCL (pronounced ‘sickle’), enables code for heterogeneous processors to be written together with the host application code in a “single-source” file using standard C++ for portable acceleration on a wide range of hardware. SYCL will typically compile the part of the C++ application to be offloaded onto an acceleration processor into a lower-level API such as OpenCL. [...]"

"Khronos has also recently formed the SYCL Safety-Critical Exploratory Forum to investigate industry requirements for a general parallel programming API for accelerated compute using SYCL’s standard C++ single source programming model in safety-critical markets. [...]"

"The integration of increasing use of compute, vision and inferencing acceleration in embedded products is resulting in significant new business opportunities and Khronos is developing a growing family of open, royalty-free API standards relevant to meet the rapidly evolving needs of embedded and safety-critical markets. [...]"


How Open Acceleration Standards are Driving Safety-Critical Development - Embedded Computing Design
 

Vattila

And here is a nice recent article from EEJournal, providing some background on SYCL and CUDA, with a specific focus on Intel's oneAPI role, as well as some commentary on Intel's recent acquisition of Codeplay:

"Nvidia’s first GPU, the GeForce 256, appeared in 1999. That’s the same year that Ian Buck started working on his PhD at Stanford University, where he developed Brook, a series of extensions for the C programming language that allowed software developers to harness programmable graphics hardware, specifically GPUs, to perform general-purpose computations. Buck’s Brook extensions transformed GPUs into streaming, highly parallel, vector coprocessors. Nvidia liked what it saw in Brook, hired Buck in 2004, and introduced a more generalized set of parallel programming language extensions called CUDA in 2006. [...]"

"Intel’s Raja Koduri announced the oneAPI initiative and its core programming language, Data Parallel C++ (DPC++), during Intel’s Architecture Day in 2018. DPC++ is based on ISO C++ and the Khronos Group’s SYCL standards. [...]"

"Codeplay currently offers SYCL compilers that can target either Nvidia or AMD GPUs. However, an Intel blog posted on June 1 announced that the company had signed an agreement to acquire Codeplay, so it’s likely that Intel’s GPUs will soon be supported as well. [...]"

"Codeplay used DPCT to convert N-body kernel code written in CUDA into SYCL code. The N-body kernel uses multidimensional vector math to simulate the motion of multiple particles under the influence of various physical forces. Codeplay compiled the resulting SYCL version of the N-body kernel directly, with no additional optimization or tuning. The original CUDA version of the N-body code kernel ran in 10.2 milliseconds while the converted DPC++ version of the N-body kernel ran in 8.79 milliseconds. Both results are for the same Nvidia GPU target. That’s a 14% performance improvement for machine-translated code. [...]"
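As a quick sanity check on the 14% figure in the quote above, here is the arithmetic, assuming the article means the relative reduction in run time (it does not spell out the definition):

```latex
\[
\frac{t_{\text{CUDA}} - t_{\text{SYCL}}}{t_{\text{CUDA}}}
  = \frac{10.2\,\text{ms} - 8.79\,\text{ms}}{10.2\,\text{ms}}
  = \frac{1.41}{10.2}
  \approx 0.138 \approx 14\%
\]
```

Expressed instead as a speedup ratio, 10.2 / 8.79 ≈ 1.16, so the converted kernel would be roughly 16% faster by that measure.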

"The contract to implement a DPC++ compiler to support AMD GPU-based supercomputers that Argonne National Laboratory in collaboration with Oak Ridge National Laboratory awarded to Codeplay in mid June is an encouraging sign that Codeplay and DPC++ will continue to support processors and accelerators from multiple vendors. It all looks very hopeful, and it will be interesting to see how this all plays out in the future."


Intel oneAPI and DPC++: One Programming Language to Rule Them All (CPUs, GPUs, FPGAs, etc) – EEJournal
 