Question Is there really any need for Intel to keep AVX-512 alive now?

Jul 27, 2020
24,361
16,953
146
Now that Intel has a very capable GPU with Xe, does anyone think that AVX-512 won't last long? It was a knee-jerk reaction to nVidia's compute prowess, but now Intel can just as easily switch their compute workloads over to their Xe GPU units and get better power efficiency too. Am I right or wrong?
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,028
3,800
136
Maybe true on their server/HEDT cores that have 2x512b AVX units. Cannonlake, IceLake, and TigerLake appear to have the same AVX unit config that Intel has used on the desktop since Haswell: 2x256b. AVX512 is handled via op fusion.
That's not quite correct. It's not op fusion; it's one 512-bit unit that can do 2x256-bit operations and has the issue ports, etc. The key difference is that there is no boundary at the 256-bit point like there is at 128-bit for AVX/AVX2.
 

semiman

Member
May 9, 2020
77
68
91
There are workloads which cannot be offloaded to GPUs, and some workloads are simply too small to be worth offloading. Copying 512 bits of data to the GPU, processing them there, then copying the result back to DRAM? It takes forever and doesn't really give us any benefit.
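As a rough sketch of that point (my own illustration, not from the post, using AVX-512F intrinsics from <immintrin.h> and a made-up function name): the entire "workload" below is one 512-bit load, multiply, and store, so a PCIe round trip to a GPU would cost far more than just doing it on the CPU.

```cpp
#include <immintrin.h>  // AVX-512 intrinsics (compile with e.g. -mavx512f)

// Hypothetical example: scale 16 floats (512 bits) in place.
// The whole job is one load, one multiply, and one store, so there is
// nothing for a GPU offload to amortize its transfer cost against.
void scale16(float* data, float factor) {
    __m512 v = _mm512_loadu_ps(data);               // load 16 floats
    v = _mm512_mul_ps(v, _mm512_set1_ps(factor));   // multiply by factor
    _mm512_storeu_ps(data, v);                      // store the result
}
```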
 

Jorgp2

Junior Member
Dec 19, 2018
21
11
81
For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.

From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."

AVX-512 is designed to improve net performance and perf/power, and it succeeds at both.

And as for Skylake-X AVX-512, it only downclocks the core that is using AVX-512. Intel has had per-core clocking since Broadwell Server, to improve consistency in virtualized scenarios.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years
...
...
increasing their clean income by 100%

They sell less product and make double the money, and they didn't even increase the cost of their CPUs (per core).
If that's shooting yourself in the foot, then please hand me the gun.
Why would you wanna turn this topic into a financial discussion?
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
Why would you wanna turn this topic into a financial discussion?
If you get down to it, this topic is a financial topic. The only reason Intel included AVX-512 was because its customers wanted it and would pay for it, and the only reason for them to stop using it would be if they ever go ahead and build that fabled CPU without all the legacy support.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
GPUs STILL can't run ALL standard C++ code and they don't have a conformant C++ compiler implementation either. The closest 'standard' C++ compiler for GPUs is NVC++ ...

Meanwhile on AVX-512, you can use it on ANY C++ kernel. Compared to GPUs, AVX-512 is more flexible from a programming model perspective and it offers latency advantages as well ...
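To illustrate the flexibility claim (my own sketch, not from the post): the function below is plain standard C++ with no kernel language, no device annotations, and no host/device copies, yet a conformant compiler such as GCC or Clang will typically auto-vectorize it to 512-bit operations when built with something like -O3 -march=skylake-avx512.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ordinary C++; nothing here is GPU- or vendor-specific.  Compiled for an
// AVX-512 target, this loop is usually vectorized to process 16 floats per
// iteration in zmm registers.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```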
 

NTMBK

Lifer
Nov 14, 2011
10,401
5,638
136
GPUs STILL can't run ALL standard C++ code and they don't have a conformant C++ compiler implementation either. The closest 'standard' C++ compiler for GPUs is NVC++ ...

Meanwhile on AVX-512, you can use it on ANY C++ kernel. Compared to GPUs, AVX-512 is more flexible from a programming model perspective and it offers latency advantages as well ...

How well does code with polymorphism and virtual functions run on those vector units? I can guarantee you it will be bailing on vectorization and falling back to scalar code.

"All of C++" isn't a terribly useful target for dense computational code, because a lot of those features will trash performance. On the fly memory allocation and deallocation, polymorphism, exceptions all spring to mind.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
How well does code with polymorphism and virtual functions run on those vector units? I can guarantee you it will be bailing on vectorization and falling back to scalar code.

"All of C++" isn't a terribly useful target for dense computational code, because a lot of those features will trash performance. On the fly memory allocation and deallocation, polymorphism, exceptions all spring to mind.

Suppose for a moment that what you said were true ...

Nvidia, on the other hand, sees the contrary, since they just made ANOTHER compiler for a different programming language than their proprietary spec'd CUDA kernel language, so they must at least see some value in attempting to run standard C++ code ...

Here is a list of NVC++ limitations, for our entertainment:

C++ Parallel Algorithms and device function annotations (in violation of the "one definition rule")
C++ Parallel Algorithms and CUDA Unified Memory (referencing CPU stack memory from GPU is forbidden)
C++ Parallel Algorithms and function pointers (function pointers can't be passed between the CPU and GPU)
Random-access iterators (using forward or bidirectional iterators results in compilation errors)
Interoperability with the C++ Standard Library (no usage of I/O functions or any functions that involve running the OS is permitted)
No exceptions in GPU code (this is self-explanatory, and the same thing plagues CUDA as well)

For someone who believes Nvidia is all so modest about their C++ capabilities, they sure like to keep bragging about how their GPUs can run the most C++ code ...

They see things like independent thread scheduling (especially this, since they badly want to run locks on GPUs, which will have horrendous performance over there), libcu++, and NVC++ as major milestones for their competitive compute stack advantage ... (they even have a good chunk of representation on the C++ committee as well)

The major irony in all of this is that their NVC++ compiler is only designed to accelerate a specific set of C++17 Parallel STL algorithms, while big boy (conformant) C++ compilers on CPUs can run all of your C++17 PSTL algorithms on AVX-512 and can be used to exploit sources of parallelism outside of that domain as well ...
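A minimal sketch of what that looks like on the CPU side (my example, not from the post; it assumes a standard library with parallel algorithm support, e.g. libstdc++ built against TBB or MSVC's STL, and an AVX-512 target such as -march=skylake-avx512):

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Plain C++17 parallel algorithm: no device annotations, no unified-memory
// rules, no iterator or exception restrictions.  The unsequenced policy
// permits the transform to be vectorized, e.g. with AVX-512 on a suitable
// target.
void scale(std::vector<float>& v, float a) {
    std::transform(std::execution::par_unseq, v.begin(), v.end(), v.begin(),
                   [a](float x) { return a * x; });
}
```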

With AVX-512, no programmer would have to deal with nearly as much nonsense as comes out of GPUs or other similar forms of heterogeneous compute, and C++ might go in a direction Nvidia doesn't like, such as exposing transactional memory in a future C++ standard. Nvidia sure are envious (pun intended) about how CPUs can run standard C++, which is why they like to keep imitating them with GPUs and why they truly seem to be in desperate need of purchasing a major CPU ISA licensee like ARM Ltd ...