Question Is there really any need for Intel to keep AVX-512 alive now?

igor_kavinski · Sep 20, 2020

Now that Intel has a very capable GPU with Xe, does anyone think that AVX-512 won't last long? It was a knee jerk reaction to nVidia's compute prowess but now Intel can just as easily switch their compute workloads over to their Xe GPU units and get better power efficiency too. Am I right or wrong?

itsmydamnation · Sep 21, 2020

DrMrLordX said:
Maybe true on their server/HEDT cores that have 2x512b AVX units. Cannonlake, IceLake, and TigerLake appear to have the same AVX unit config that Intel has used on the desktop since Haswell: 2x256b. AVX512 is handled via op fusion.

Thats not quite correct, its not op fusion, its one 512bit unit that can do 2x256 bit operations/has the issue ports etc. The key different is there is no boundary at the 256bit point like there is at 128 for avx/avx2.

semiman · Sep 21, 2020

There are workloads which cannot be offloaded to GPUs. Sometimes workloads are too small to be offloaded. Copying 512bit of data to GPU process them, then copying it back to DRAM? It takes forever and doesn't really give us any benefit.

Jorgp2 · Sep 21, 2020

igor_kavinski said:
For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.

From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."

AVX-512 is designed to improve net performance and perf/power, which it both succeeds in doing.

And as for Skylake-X AVX-512, it only down clocks the core that is using AVX-512. Intel has had per core clocking since Broadwell Server, to improve consistency in virtualized scenarios.

lobz · Sep 22, 2020

TheELF said:
Thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years
...
...
increasing their clean income by 100%

They sell less product and make double the money and they didn't even increase the cost of their cpus (per core) .
If that's shooting yourself in the foot then please hand me the gun.

Intel net income 2024| Statista

In 2024, Intel recorded a net loss of ***** billion U.S.

www.statista.com

Why would you wanna turn this topic into a financial discussion?

piokos · Sep 22, 2020

lobz said:
Why would you wanna turn this topic into a financial discussion?

He wasn't the one who started it, was he?

TheELF · Sep 22, 2020

lobz said:
Why would you wanna turn this topic into a financial discussion?

If you get down to it this topic is a financial topic,the only reason intel included avx 512 was because its customers wanted it and would pay for it and the only reason for them to stop using it would be if they ever go ahead and do that fabled CPU without all the legacy support.

DrMrLordX · Sep 22, 2020

itsmydamnation said:
Thats not quite correct, its not op fusion, its one 512bit unit that can do 2x256 bit operations/has the issue ports etc. The key different is there is no boundary at the 256bit point like there is at 128 for avx/avx2.

Huh! Interesting. Learn something new every day.

ThatBuzzkiller · Sep 25, 2020

GPUs STILL can't run ALL standard C++ code and they don't have a conformant C++ compiler implementation either. The closest 'standard' C++ compiler for GPUs is NVC++ ...

Meanwhile on AVX-512, you can use it on ANY C++ kernel. Compared to GPUs, AVX-512 is more flexible from a programming model perspective and it offers latency advantages as well ...

NTMBK · Sep 25, 2020

ThatBuzzkiller said:
GPUs STILL can't run ALL standard C++ code and they don't have a conformant C++ compiler implementation either. The closest 'standard' C++ compiler for GPUs is NVC++ ...

Meanwhile on AVX-512, you can use it on ANY C++ kernel. Compared to GPUs, AVX-512 is more flexible from a programming model perspective and it offers latency advantages as well ...

How well does code with polymorphism and virtual functions run on those vector units? I can guarantee you it will be bailing on vectorization and falling back to scalar code.

"All of C++" isn't a terribly useful target for dense computational code, because a lot of those features will trash performance. On the fly memory allocation and deallocation, polymorphism, exceptions all spring to mind.

ThatBuzzkiller · Sep 25, 2020

NTMBK said:
How well does code with polymorphism and virtual functions run on those vector units? I can guarantee you it will be bailing on vectorization and falling back to scalar code.

"All of C++" isn't a terribly useful target for dense computational code, because a lot of those features will trash performance. On the fly memory allocation and deallocation, polymorphism, exceptions all spring to mind.

Suppose for a moment that what you said were true ...

Nvidia on the other hand sees the contrary since they just made ANOTHER compiler for a different programming language compared to their proprietary spec'd CUDA kernel language so they must at least see some value in attempting to run standard C++ code ...

Here are a list of limitations of NVC++ for our entertainment:

C++ Parallel Algorithms and device function annotations (in violation of the "one definition rule")
C++ Parallel Algorithms and CUDA Unified Memory (referencing CPU stack memory from GPU is forbidden)
C++ Parallel Algorithms and function pointers (function pointers can't be passed between the CPU and GPU)
Random-access iterators (using forward or bidirectional iterators results in compilation errors)
Interoperability with the C++ Standard Library (no usage of I/O functions or any functions that involve running the OS is permitted)
No exceptions in GPU code (this is self explanatory to you and it's the same thing plagues CUDA as well)

For one that believes Nvidia is all so modest about their C++ capabilities, they sure like to keep bragging about how their GPUs can run the most amount of C++ code ...

They see things like independent thread scheduling (especially this since they want to badly run locks on GPUs which will have horrendous performance over there), libcu++, and NVC++ as major milestones to their competitive compute stack advantage ... (they even have a good chunk of representation on the C++ committee as well)

The major irony in all of this is that their NVC++ compiler is only designed to accelerate a specific set of C++17 Parallel STL algorithms while with big boy (conformant) C++ compilers on CPUs can run all of your C++17 PSTL algorithms on AVX-512 and can be used to exploit sources of parallelism outside of that domain as well ...

With AVX-512, no programmer would have to deal with nearly much nonsense that comes out of GPUs or other similar forms of heterogeneous compute and C++ might go in a direction Nvidia doesn't like such as exposing transactional memory in a future C++ standard. Nvidia sure are envious (pun intended) about how CPUs can run standard C++ which is why they like to keep imitating them with GPUs because they truly seem to be in desperate need to purchase a major CPU ISA licensee like ARM Ltd ...

Search

Question Is there really any need for Intel to keep AVX-512 alive now?

igor_kavinski

Lifer

itsmydamnation

Diamond Member

semiman

Member

Jorgp2

Junior Member

lobz

Platinum Member

Intel net income 2024| Statista

piokos

Senior member

TheELF

Diamond Member

DrMrLordX

Lifer

ThatBuzzkiller

Golden Member

NTMBK

Lifer

ThatBuzzkiller

Golden Member

TRENDING THREADS