How well does code with polymorphism and virtual functions run on those vector units? I can guarantee you it will bail on vectorization and fall back to scalar code.
"All of C++" isn't a terribly useful target for dense computational code, because a lot of those features will trash performance. On the fly memory allocation and deallocation, polymorphism, exceptions all spring to mind.
Suppose for a moment that what you said were true ...
Nvidia, on the other hand, evidently sees the contrary: they just made ANOTHER compiler, for a different programming language than their proprietary spec'd CUDA kernel language, so they must at least see some value in attempting to run standard C++ code ...
Here is a list of NVC++'s limitations for our entertainment (a sketch illustrating two of them follows the list):
C++ Parallel Algorithms and device function annotations (mixing the two violates the "one definition rule")
C++ Parallel Algorithms and CUDA Unified Memory (referencing CPU stack memory from GPU is forbidden)
C++ Parallel Algorithms and function pointers (function pointers can't be passed between the CPU and GPU)
Random-access iterators (using forward or bidirectional iterators results in compilation errors)
Interoperability with the C++ Standard Library (no use of I/O functions, or of any other function that calls into the OS, is permitted)
No exceptions in GPU code (this one is self-explanatory, and the same restriction plagues CUDA as well)
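And here is the promised sketch (mine, not from Nvidia's docs) of how two of these limitations bite in practice under nvc++ -stdpar=gpu; the first call is the shape that offloads cleanly, the commented-out variants hit the function pointer and CPU-stack restrictions above:

    #include <algorithm>
    #include <execution>
    #include <vector>

    double scale_fn(double x) { return x * 2.0; }  // an ordinary host function

    int main() {
        std::vector<double> v(1 << 20, 1.0);

        // Fine: lambda, random-access iterators, heap data (placed in
        // CUDA Unified Memory when compiled with -stdpar=gpu).
        std::transform(std::execution::par_unseq, v.begin(), v.end(),
                       v.begin(), [](double x) { return x * 2.0; });

        // Breaks: a named function decays to a function pointer, which
        // can't be passed between the CPU and GPU.
        // std::transform(std::execution::par_unseq, v.begin(), v.end(),
        //                v.begin(), scale_fn);

        // Breaks: by-reference capture of a CPU stack variable, which
        // the GPU is forbidden to reference.
        // double factor = 3.0;
        // std::transform(std::execution::par_unseq, v.begin(), v.end(),
        //                v.begin(),
        //                [&factor](double x) { return x * factor; });
    }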
For someone who supposedly believes Nvidia is modest about its C++ capabilities, they sure like to keep bragging about how much C++ code their GPUs can run ...
They see things like independent thread scheduling (especially this, since they badly want to run locks on GPUs, which will have horrendous performance there), libcu++, and NVC++ as major milestones for their competitive compute stack advantage ... (they even have a good chunk of representation on the C++ committee as well)
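For what it's worth, the lock pattern in question looks like this in plain C++ (libcu++ ships cuda::std atomics so the same shape can be written in device code); this is my sketch, not Nvidia's code, and the comments note why it needs independent thread scheduling at all:

    #include <atomic>

    std::atomic_flag lock = ATOMIC_FLAG_INIT;
    int shared_counter = 0;

    void locked_increment() {
        // Spin until the flag is acquired. On pre-Volta GPUs, a warp ran
        // in lockstep, so a thread holding the lock could be starved by
        // its own warp-mates spinning here -- a livelock. Independent
        // thread scheduling (Volta+) lets the holder progress and release.
        while (lock.test_and_set(std::memory_order_acquire)) { /* spin */ }
        ++shared_counter;                       // critical section
        lock.clear(std::memory_order_release);  // release the lock
    }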
The major irony in all of this is that their NVC++ compiler is only designed to accelerate a specific set of C++17 Parallel STL algorithms, while the big boy (conformant) C++ compilers on CPUs can run all of your C++17 PSTL algorithms on AVX-512 and can be used to exploit sources of parallelism outside that domain as well ...
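As a point of comparison, here is the same kind of PSTL call on a CPU, sketched under the assumption of GCC or Clang with TBB and something like -O3 -march=skylake-avx512 (illustrative flags), where par_unseq licenses both threading and AVX-512 vectorization:

    #include <execution>
    #include <functional>
    #include <numeric>
    #include <vector>

    double sum_squares(const std::vector<double>& v) {
        // par_unseq: the implementation may run this across threads AND
        // vectorize each thread's chunk with SIMD instructions.
        return std::transform_reduce(std::execution::par_unseq,
                                     v.begin(), v.end(), 0.0,
                                     std::plus<>{},
                                     [](double x) { return x * x; });
    }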
With AVX-512, no programmer would have to deal with nearly as much nonsense as comes out of GPUs and other similar forms of heterogeneous compute, and C++ might go in a direction Nvidia doesn't like, such as exposing transactional memory in a future C++ standard. Nvidia sure is envious (pun intended) of how CPUs can run standard C++, which is why they keep imitating them with GPUs; they truly seem to be in desperate need of purchasing a major CPU ISA licensee like ARM Ltd ...