Question Is there really any need for Intel to keep AVX-512 alive now?

Jul 27, 2020
16,313
10,336
106
Now that Intel has a very capable GPU with Xe, does anyone think that AVX-512 won't last long? It was a knee-jerk reaction to nVidia's compute prowess, but now Intel can just as easily switch their compute workloads over to their Xe GPU units and get better power efficiency too. Am I right or wrong?
 

Kryohi

Junior Member
Nov 12, 2019
16
17
81
As an intermediate C programmer (plus Python, Julia, and some others), I still have no clue how I'm supposed to write code that properly supports all newish GPUs, is not overly complicated and time-consuming to write, and actually provides a speed-up even on low-end GPUs.

Every time I read people saying that AVX is useless, I wonder if they are HPC programmers for some Nvidia supercomputer, writing code for that machine only.
Because otherwise, yes, in theory GPUs are great, but in practice how many desktop programs are GPU-accelerated? Why does x265 use AVX-512 but not CUDA or OpenCL?

Edit: there's also the problem that consumer gpus have criminally crippled FP64 performance.
 
Jul 27, 2020
16,313
10,336
106
As an intermediate C programmer (plus Python, Julia, and some others), I still have no clue how I'm supposed to write code that properly supports all newish GPUs, is not overly complicated and time-consuming to write, and actually provides a speed-up even on low-end GPUs.

Every time I read people saying that AVX is useless, I wonder if they are HPC programmers for some Nvidia supercomputer, writing code for that machine only.
Because otherwise, yes, in theory GPUs are great, but in practice how many desktop programs are GPU-accelerated? Why does x265 use AVX-512 but not CUDA or OpenCL?
Have you tried OpenCL? Maybe the developers of x265 were sponsored by Intel to write the AVX-512 codepath? I'm not saying that AVX-512 is useless, but maybe it's superfluous now that all of the major "powers that be" of the PC world have a competent GPU.

Also, you might find this interesting: http://www.cudahandbook.com/2017/06/ten-years-later-why-cuda-succeeded/
 

DrMrLordX

Lifer
Apr 27, 2000
21,633
10,845
136
CUDA is probably the best thing going for GPGPU right now. I'm loath to admit it, but nVidia has the best programming interface for GPUs. It remains to be seen if Intel can do any better with oneAPI. The main problem with GPGPU for consumer workloads is latency: how long does it take to build a kernel, send the kernel to the card, process the data, and return a result that is usable to the program/end user? AVX-512 (and SVE/SVE2 - see Fujitsu A64FX) has a pretty big advantage when latency like that is critical to the performance of the application.
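
To put rough shape on that round trip, here's a minimal sketch against the OpenCL C API (error checking omitted; the trivial kernel and buffer size are made up for illustration). Everything before the read-back - platform discovery, runtime kernel compilation, the copy over PCIe - is overhead an AVX-512 loop on the CPU never pays:

```cpp
// Hypothetical toy example: double 1024 floats on the GPU via OpenCL.
#include <CL/cl.h>
#include <cstdio>

// Kernel source, compiled at runtime - step 1 of the latency bill.
static const char *src =
    "__kernel void scale(__global float *x) {"
    "    x[get_global_id(0)] *= 2.0f;"
    "}";

int main() {
    float data[1024];
    for (int i = 0; i < 1024; ++i) data[i] = (float)i;

    cl_platform_id plat; clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx      = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q  = clCreateCommandQueue(ctx, dev, 0, NULL);

    // 1. Build the kernel (can cost milliseconds on first use).
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    // 2. Ship the data to the card.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);
    clSetKernelArg(k, 0, sizeof(buf), &buf);

    // 3. Launch, then 4. block until the result is copied back.
    size_t n = 1024;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

    printf("data[3] = %f\n", data[3]);
    return 0;
}
```

For a workload this small, the setup and transfers dwarf the actual compute - which is exactly the situation where the in-core vector path wins.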
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
Have you tried OpenCL? Maybe the developers of x265 were sponsored by Intel to write the AVX-512 codepath? I'm not saying that AVX-512 is useless, but maybe it's superfluous now that all of the major "powers that be" of the PC world have a competent GPU.
Yeah, it's superfluous... if you pay how much more, minimum?
How much does the cheapest GPU that could take over for AVX-512 cost right now?
Also, making software that needs CPU x or later is much better (easier to sell) than making software that needs CPU x and GPU y.
Until competent GPUs come for free, there will be a need for having it on the CPU - the same extremely niche need that has existed up to now, but still.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
GPGPU for consumer workflows is basically dead, because we never got a good API that was well supported across all GPUs. Hell, OpenCL has had to pull features back out in an effort to get it better supported - they've basically abandoned OpenCL 2 and reverted to building off 1.2.

CUDA is great for applications where you control the target platform, like running on a specific supercomputer, a workstation that is guaranteed to have a Quadro, or a device that you ship with an Nvidia GPU. But when the vast majority of consumer devices run integrated graphics, and a sizeable minority of the gamer market runs on AMD, the economics just don't work. Why spend developer time on a proprietary API that most users can't benefit from?
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
As an intermediate C programmer (plus Python, Julia, and some others), I still have no clue how I'm supposed to write code that properly supports all newish GPUs

As an alternative to low-level OpenCL, have a look at hipSYCL — "a modern SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing toolchains such as CUDA or HIP". A nice feature of SYCL is that it is an embedded domain-specific language (DSL) implemented in pure standard C++.

SYCL implementations
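
To illustrate, a minimal sketch in the SYCL 1.2.1 style that hipSYCL supports (the kernel name and toy workload are invented): the kernel is an ordinary C++ lambda in the same source file, and the runtime falls back to the host CPU when no GPU is available.

```cpp
// Hypothetical toy example: double a vector of floats with SYCL.
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>
namespace sycl = cl::sycl;

int main() {
    std::vector<float> data(1024, 1.0f);

    sycl::queue q{sycl::default_selector{}};  // GPU if present, else CPU
    {
        sycl::buffer<float, 1> buf{data.data(), sycl::range<1>{data.size()}};
        q.submit([&](sycl::handler &h) {
            auto acc = buf.get_access<sycl::access::mode::read_write>(h);
            h.parallel_for<class scale>(sycl::range<1>{1024},
                                        [=](sycl::id<1> i) {
                acc[i] *= 2.0f;  // the "kernel" is plain standard C++
            });
        });
    }  // buffer destructor waits and copies the result back into 'data'

    std::cout << "data[0] = " << data[0] << '\n';
}
```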
 
Jul 27, 2020
16,313
10,336
106
It will take a few years for AVX-512 to become prevalent in consumer PCs (it's only available in 10th-gen and newer mobile chips so far). Intel could still decide to nip it in the bud and keep it as a "feature" of its server CPUs, OR make it a separate drop-in chip (like the now-ancient maths co-processor) for people who really need it. I bet they could do a lot of performance-boosting stuff with the freed silicon real estate that helps everyone instead of just the specific computing workloads of a few.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
As an alternative to low-level OpenCL, have a look at hipSYCL — "a modern SYCL implementation targeting CPUs and GPUs,

OK, the graph is nice, but does it work in practice? What projects are actually using this?

And, as always, there's no Windows support, which by default relegates it to the server/HPC area - while you can run x265 on your Ice Lake or Tiger Lake laptop and profit from AVX-512.
 
  • Like
Reactions: Tlh97 and Vattila
Jul 27, 2020
16,313
10,336
106
Which variations would those be?
From Wikipedia:
  • Ice Lake: AVX-512 F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES
  • Tiger Lake: AVX-512 F, VL, BW, DQ, CD, VBMI, IFMA, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES, VP2INTERSECT
  • Rocket Lake:
  • Alder Lake:
By the way, thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years: https://www.statista.com/statistics/735904/worldwide-x86-intel-amd-market-share/

They need to get the Unreal and Unity engines accelerated with AVX-512 ASAP if they ever hope to reclaim their glory days.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Sorry, Igor. I didn't mean to come off as an ass earlier. I was genuinely curious because AVX512 units take up considerable space on a die. You may be right. I can see Intel going down that path in the future once they go chiplets as per their Silicon Days event or whatever their self-glory weekend with Raja spearheading the bs'ery was about.
 

piokos

Senior member
Nov 2, 2018
554
206
86
Now that Intel has a very capable GPU with Xe, does anyone think that AVX-512 won't last long? It was a knee jerk reaction to nVidia's compute prowess but now Intel can just as easily switch their compute workloads over to their Xe GPU units and get better power efficiency too. Am I right or wrong?
Confused.

AVX-512 is a CPU instruction set. GPUs are GPUs.
GPUs may or may not be available in a system. And they are targeted in a very different way - (currently) forcing a massive refactoring of a program - way beyond what's needed for AVX.

Sure, some CPU loads can be migrated to a GPU with a noticeable performance boost. That's why companies invest in an extra layer of abstraction - like Intel's oneAPI.
But many loads will still run better on a CPU - even with a GPU and oneAPI around. And AVX-512 can make them faster. So why not?
 
  • Like
Reactions: Tlh97 and lobz
Jul 27, 2020
16,313
10,336
106
But many loads will still run better on a CPU - even with a GPU and oneAPI around. And AVX-512 can make them faster. So why not?

For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.

From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."
 

piokos

Senior member
Nov 2, 2018
554
206
86
For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.
The question was whether AVX-512 makes sense, not whether it can be sustained for long periods.

Processors can't sustain their maximum clocks under every load, with or without AVX-512. It's normal. CPUs are designed for flexibility - especially those for the consumer segment.
I don't know why people on an "enthusiast forum" still don't get this.

Of course AVX-512 increases power consumption. And adding a GPU to the system doesn't?
Extra performance is not fueled by magic dust.
From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."
Is this supposed to be a joke?
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.

From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."

AVX-512 downclocking is much less of a problem on Ice Lake than it used to be: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html Hopefully this carries over to the server version of the core, too.
 
  • Like
Reactions: Tlh97 and moinmoin

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
By the way, thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years: https://www.statista.com/statistics/735904/worldwide-x86-intel-amd-market-share/
Thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years
...
...
increasing their net income by 100%

They sell less product and make double the money, and they didn't even increase the cost of their CPUs (per core).
If that's shooting yourself in the foot then please hand me the gun.
 
  • Like
Reactions: Magic Carpet

piokos

Senior member
Nov 2, 2018
554
206
86
AVX-512 downclocking is much less of a problem on Ice Lake than it used to be: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html Hopefully this carries over to the server version of the core, too.
True.

But overall, IMO the main reason so many people on forums criticize AVX-512 power consumption (besides just pure hatred towards Intel) is that they look at the wrong processors and the wrong benchmarks.
Because when a 300W HEDT CPU jumps to 500W because of AVX-512, it absolutely looks ridiculous from a consumer perspective.
But when a 100W workstation/server CPU jumps to 150W, it may not be a problem at all.

Anyway, this anti-AVX-512 movement is probably temporary. AMD will implement it at some point.
Just recall the initial reaction of AMD fans to hardware RTRT compared to what they say today - weeks before AMD-powered consoles make it a mainstream feature. And AFAIK it hasn't even been confirmed that AMD designed the actual ASIC. ;)
 
Jul 27, 2020
16,313
10,336
106
Anyway, this anti-AVX-512 movement is probably temporary. AMD will implement it at some point.

That might happen, but the way AMD chooses to implement it should be interesting. AVX-512 was born out of Tom Forsyth's need to run shaders on the CPU during the Larrabee project. So if mostly GPU-related instructions were devised to run on the CPU, then conceivably backporting them to the GPU may also be possible, and AMD might do it that way, running them across dozens of iGPU cores rather than the paltry few CPU cores in typical affordable CPUs.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
That might happen, but the way AMD chooses to implement it should be interesting. AVX-512 was born out of Tom Forsyth's need to run shaders on the CPU during the Larrabee project. So if mostly GPU-related instructions were devised to run on the CPU, then conceivably backporting them to the GPU may also be possible, and AMD might do it that way, running them across dozens of iGPU cores rather than the paltry few CPU cores in typical affordable CPUs.

As Larrabee demonstrated, running x86 instructions on a GPU isn't a terribly good idea! It's just not an instruction set that was designed for it, and it leads to inefficient hardware.
 

DrMrLordX

Lifer
Apr 27, 2000
21,633
10,845
136
I bet they could do a lot of performance-boosting stuff with the freed silicon real estate that helps everyone instead of just the specific computing workloads of a few.

Maybe true on their server/HEDT cores that have 2x512b AVX units. Cannon Lake, Ice Lake, and Tiger Lake appear to have the same AVX unit config that Intel has used on the desktop since Haswell: 2x256b. AVX-512 is handled via op fusion.
 

KompuKare

Golden Member
Jul 28, 2009
1,016
932
136
increasing their net income by 100%

They sell less product and make double the money, and they didn't even increase the cost of their CPUs (per core).
If that's shooting yourself in the foot then please hand me the gun.
Yes, good for shareholders, but why should consumers care?
Are they re-investing that extra profit into making stuff consumers can benefit from?
The low-volume, high-margin drive resulted in Intel being out of the mobile phone market.
Short-term good, long-term bad. And having belatedly thrown billions at contra-revenue didn't help them either.

Selling less for more is very good in the short term, but long-term market share matters. A lot. TSMC are where they are now because lots of lower-margin vendors went to them, and the guaranteed high volumes let them invest for the longer term. Apple paying a premium for risk production of the latest node isn't enough for this; having long-term, high-volume customers on lower margins willing to keep those fabs busy when Apple moves on to the next node is at least equally important.

That's not to say Intel's record profits aren't impressive for shareholders. Not many businesses can have security issues and see their customers buy almost the same percentage more of their stuff for each percentage of performance lost!
 
  • Like
Reactions: Tlh97 and moinmoin

piokos

Senior member
Nov 2, 2018
554
206
86
That might happen, but the way AMD chooses to implement it should be interesting. AVX-512 was born out of Tom Forsyth's need to run shaders on the CPU during the Larrabee project. So if mostly GPU-related instructions were devised to run on the CPU, then conceivably backporting them to the GPU may also be possible, and AMD might do it that way, running them across dozens of iGPU cores rather than the paltry few CPU cores in typical affordable CPUs.
Again: you can't assume a system has GPU cores. To make AVX-512 a mainstream solution, it has to be implemented on the CPU - at least as a usable fallback.
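
For illustration, a minimal sketch of such a fallback, assuming GCC or Clang (the `target` attribute and `__builtin_cpu_supports` are compiler-specific, and the function names are hypothetical): probe for AVX-512F once at runtime and dispatch to a vector or scalar path.

```cpp
// Hypothetical runtime dispatch: AVX-512 when the CPU has it, scalar otherwise.
#include <immintrin.h>
#include <cstddef>

// Compiled with AVX-512F enabled for this function only.
__attribute__((target("avx512f")))
static void scale_avx512(float *x, std::size_t n) {
    const __m512 two = _mm512_set1_ps(2.0f);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)  // 16 floats per 512-bit register
        _mm512_storeu_ps(x + i, _mm512_mul_ps(_mm512_loadu_ps(x + i), two));
    for (; i < n; ++i) x[i] *= 2.0f;  // scalar remainder
}

static void scale_scalar(float *x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= 2.0f;
}

// The "usable fallback": one CPUID-backed check, two code paths.
void scale(float *x, std::size_t n) {
    if (__builtin_cpu_supports("avx512f"))
        scale_avx512(x, n);
    else
        scale_scalar(x, n);
}
```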