Question Is there really any need for Intel to keep AVX-512 alive now?

Jul 27, 2020
16,313
10,336
106
Now that Intel has a very capable GPU with Xe, does anyone think that AVX-512 won't last long? It was a knee-jerk reaction to nVidia's compute prowess, but now Intel can just as easily switch their compute workloads over to their Xe GPU units and get better power efficiency too. Am I right or wrong?
 

Kryohi

Junior Member
Nov 12, 2019
16
17
81
As an intermediate C programmer (plus Python, Julia, and some others), I still have no clue how I'm supposed to write code that properly supports all newish GPUs, is not overly complicated and time-consuming to write, and actually provides a speed-up even on low-end GPUs.

Every time I read people saying that AVX is useless, I wonder if they are HPC programmers for some Nvidia supercomputer, writing code for that machine only.
Because otherwise, yes, in theory GPUs are great, but in practice how many desktop programs are GPU-accelerated? Why does x265 use AVX-512 but not CUDA or OpenCL?

Edit: there's also the problem that consumer gpus have criminally crippled FP64 performance.
 
Jul 27, 2020
16,313
10,336
106
As an intermediate C programmer (plus Python, Julia, and some others), I still have no clue how I'm supposed to write code that properly supports all newish GPUs, is not overly complicated and time-consuming to write, and actually provides a speed-up even on low-end GPUs.

Every time I read people saying that AVX is useless, I wonder if they are HPC programmers for some Nvidia supercomputer, writing code for that machine only.
Because otherwise, yes, in theory GPUs are great, but in practice how many desktop programs are GPU-accelerated? Why does x265 use AVX-512 but not CUDA or OpenCL?
Have you tried OpenCL? Maybe the developers of x265 were sponsored by Intel to write the AVX-512 codepath? I'm not saying that AVX-512 is useless, but maybe it's superfluous now that all of the major "powers that be" of the PC world have a competent GPU.

Also, you might find this interesting: http://www.cudahandbook.com/2017/06/ten-years-later-why-cuda-succeeded/
 

DrMrLordX

Lifer
Apr 27, 2000
21,633
10,845
136
CUDA is probably the best thing going for GPGPU right now. I'm loath to admit it, but nVidia has the best programming interface for GPUs. It remains to be seen if Intel can do any better with oneAPI. The main problem with GPGPU for consumer workloads is latency: how long does it take to build a kernel, send the kernel to the card, process the data, and return a result that is usable to the program/end user? AVX-512 (and SVE/SVE2 - see Fujitsu A64FX) has a pretty big advantage when latency like that is critical to the performance of the application.
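
To put rough shape on that round trip, here's a minimal sketch against the OpenCL C API (error checking omitted; the trivial kernel and buffer size are made up for illustration). Everything before the read-back - platform discovery, runtime kernel compilation, the copy over PCIe - is overhead an AVX-512 loop on the CPU never pays:

```cpp
// Hypothetical toy example: double 1024 floats on the GPU via OpenCL.
#include <CL/cl.h>
#include <cstdio>

// Kernel source, compiled at runtime - step 1 of the latency bill.
static const char *src =
    "__kernel void scale(__global float *x) {"
    "    x[get_global_id(0)] *= 2.0f;"
    "}";

int main() {
    float data[1024];
    for (int i = 0; i < 1024; ++i) data[i] = (float)i;

    cl_platform_id plat; clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx      = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q  = clCreateCommandQueue(ctx, dev, 0, NULL);

    // 1. Build the kernel (can cost milliseconds on first use).
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    // 2. Ship the data to the card.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);
    clSetKernelArg(k, 0, sizeof(buf), &buf);

    // 3. Launch, then 4. block until the result is copied back.
    size_t n = 1024;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

    printf("data[3] = %f\n", data[3]);
    return 0;
}
```

For a workload this small, the setup and transfers dwarf the actual compute - which is exactly the situation where the in-core vector path wins.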
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
Have you tried OpenCL? Maybe the developers of x265 were sponsored by Intel to write the AVX-512 codepath? I'm not saying that AVX-512 is useless, but maybe it's superfluous now that all of the major "powers that be" of the PC world have a competent GPU.
Yeah, it's superfluous... if you pay how much more, minimum?
How much does the cheapest GPU that could take over for AVX-512 cost right now?
Also, making software that needs CPU x or later is much better (easier to sell) than making software that needs CPU x and GPU y.
Until competent GPUs come for free, there will be a need for having it on the CPU - the same extremely niche need that has existed up to now, but still.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
GPGPU for consumer workflows is basically dead, because we never got a good API that was well supported across all GPUs. Hell, OpenCL has had to pull features back out in an effort to get it better supported - they've basically abandoned OpenCL 2 and reverted to building off 1.2.

CUDA is great for applications where you control the target platform, like running on a specific supercomputer, a workstation that is guaranteed to have a Quadro, or a device that you ship with an Nvidia GPU. But when the vast majority of consumer devices run integrated graphics, and a sizeable minority of the gamer market runs on AMD, the economics just don't work. Why spend developer time on a proprietary API that most users can't benefit from?
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
As an intermediate C programmer (plus Python, Julia, and some others), I still have no clue how I'm supposed to write code that properly supports all newish GPUs

As an alternative to low-level OpenCL, have a look at hipSYCL — "a modern SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing toolchains such as CUDA or HIP". A nice feature of SYCL is that it is an embedded domain-specific language (DSL) implemented in pure standard C++.

SYCL implementations
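
To illustrate, a minimal sketch in the SYCL 1.2.1 style that hipSYCL supports (the kernel name and toy workload are invented): the kernel is an ordinary C++ lambda in the same source file, and the runtime falls back to the host CPU when no GPU is available.

```cpp
// Hypothetical toy example: double a vector of floats with SYCL.
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>
namespace sycl = cl::sycl;

int main() {
    std::vector<float> data(1024, 1.0f);

    sycl::queue q{sycl::default_selector{}};  // GPU if present, else CPU
    {
        sycl::buffer<float, 1> buf{data.data(), sycl::range<1>{data.size()}};
        q.submit([&](sycl::handler &h) {
            auto acc = buf.get_access<sycl::access::mode::read_write>(h);
            h.parallel_for<class scale>(sycl::range<1>{1024},
                                        [=](sycl::id<1> i) {
                acc[i] *= 2.0f;  // the "kernel" is plain standard C++
            });
        });
    }  // buffer destructor waits and copies the result back into 'data'

    std::cout << "data[0] = " << data[0] << '\n';
}
```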
 
Jul 27, 2020
16,313
10,336
106
It will take a few years for AVX-512 to become prevalent in consumer PCs (it's only available in 10th-gen and newer mobile chips so far). Intel could still decide to nip it in the bud and keep it as a "feature" of its server CPUs, OR make it a separate drop-in chip (like the now-ancient maths co-processor) for people who really need it. I bet they could do a lot of performance-boosting stuff with the freed silicon real estate that helps everyone instead of just the specific computing workloads of a few.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
As an alternative to low-level OpenCL, have a look at hipSYCL — "a modern SYCL implementation targeting CPUs and GPUs,

OK, the graph is nice, but does it work in practice? What projects are actually using this?

And, as always, there's no Windows support, which by default relegates it to the server/HPC area - while you can run x265 on your Ice Lake or Tiger Lake laptop and profit from AVX-512.
 
  • Like
Reactions: Tlh97 and Vattila
Jul 27, 2020
16,313
10,336
106
Which variations would those be?
From Wikipedia:
  • Ice Lake: AVX-512 F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES
  • Tiger Lake: AVX-512 F, VL, BW, DQ, CD, VBMI, IFMA, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES, VP2INTERSECT
  • Rocket Lake:
  • Alder Lake:
By the way, thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years: https://www.statista.com/statistics/735904/worldwide-x86-intel-amd-market-share/

They need to get the Unreal and Unity engines accelerated with AVX-512 ASAP if they ever hope to reclaim their glory days.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Sorry, Igor. I didn't mean to come off as an ass earlier. I was genuinely curious because AVX512 units take up considerable space on a die. You may be right. I can see Intel going down that path in the future once they go chiplets as per their Silicon Days event or whatever their self-glory weekend with Raja spearheading the bs'ery was about.
 

piokos

Senior member
Nov 2, 2018
554
206
86
Now that Intel has a very capable GPU with Xe, does anyone think that AVX-512 won't last long? It was a knee jerk reaction to nVidia's compute prowess but now Intel can just as easily switch their compute workloads over to their Xe GPU units and get better power efficiency too. Am I right or wrong?
Confused.

AVX-512 is a CPU instruction set. GPUs are GPUs.
GPUs may or may not be available in a system. And they are targeted in a very different way - (currently) forcing a massive refactoring of a program - way beyond what's needed for AVX.

Sure, some CPU loads can be migrated to a GPU with a noticeable performance boost. That's why companies invest in an extra layer of abstraction - like Intel's oneAPI.
But many loads will still run better on a CPU - even with a GPU and oneAPI around. And AVX-512 can make them faster. So why not?
 
  • Like
Reactions: Tlh97 and lobz
Jul 27, 2020
16,313
10,336
106
But many loads will still run better on a CPU - even with a GPU and oneAPI around. And AVX-512 can make them faster. So why not?

For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.

From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."
 

piokos

Senior member
Nov 2, 2018
554
206
86
For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.
The question was whether AVX-512 makes sense, not whether it can be sustained for long periods.

Processors can't sustain their maximum clocks under every load, with or without AVX-512. It's normal. CPUs are designed for flexibility - especially those for the consumer segment.
I don't know why people on an "enthusiast forum" still don't get this.

Of course AVX-512 increases power consumption. And adding a GPU to the system doesn't?
Extra performance is not fueled by magic dust.
From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."
Is this supposed to be a joke?
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
For one thing, AVX-512 instructions eat into the power budget of the CPU, forcing it to downclock to avoid overheating and slowing everything else down until the workload finishes.

From https://www.mjr19.org.uk/IT/clocks.html: "Very few (no?) Intel CPUs can sustain their standard clock-speed when executing long, dense sequences of AVX-512 instructions."

AVX-512 downclocking is much less of a problem on Ice Lake than it used to be: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html Hopefully this carries over to the server version of the core, too.
 
  • Like
Reactions: Tlh97 and moinmoin

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
By the way, thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years: https://www.statista.com/statistics/735904/worldwide-x86-intel-amd-market-share/
Thanks to Intel shooting itself in the foot, their CPU market share dropped from a high of 82.5% to below 65% in less than 4 years
...
...
increasing their net income by 100%

They sell less product and make double the money, and they didn't even increase the cost of their CPUs (per core).
If that's shooting yourself in the foot then please hand me the gun.
 
  • Like
Reactions: Magic Carpet

piokos

Senior member
Nov 2, 2018
554
206
86
AVX-512 downclocking is much less of a problem on Ice Lake than it used to be: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html Hopefully this carries over to the server version of the core, too.
True.

But overall, IMO the main reason so many people on forums criticize AVX-512 power consumption (besides just pure hatred towards Intel) is that they look at the wrong processors and the wrong benchmarks.
Because when a 300W HEDT CPU jumps to 500W because of AVX-512, it absolutely looks ridiculous from a consumer perspective.
But when a 100W workstation/server CPU jumps to 150W, it may not be a problem at all.

Anyway, this anti-AVX-512 movement is probably temporary. AMD will implement it at some point.
Just recall the initial reaction of AMD fans to hardware RTRT compared to what they say today - weeks before AMD-powered consoles make it a mainstream feature. And AFAIK it hasn't even been confirmed that AMD designed the actual ASIC. ;)
 
Jul 27, 2020
16,313
10,336
106
Anyway, this anti-AVX-512 movement is probably temporary. AMD will implement it at some point.

That might happen, but the way AMD chooses to implement it should be interesting. AVX-512 was born out of Tom Forsyth's need to run shaders on the CPU during the Larrabee project. So if mostly GPU-related instructions were devised to run on the CPU, then conceivably backporting them to the GPU may also be possible, and AMD might do it that way, running them across dozens of iGPU cores rather than the paltry few CPU cores in typical affordable CPUs.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
That might happen, but the way AMD chooses to implement it should be interesting. AVX-512 was born out of Tom Forsyth's need to run shaders on the CPU during the Larrabee project. So if mostly GPU-related instructions were devised to run on the CPU, then conceivably backporting them to the GPU may also be possible, and AMD might do it that way, running them across dozens of iGPU cores rather than the paltry few CPU cores in typical affordable CPUs.

As Larrabee demonstrated, running x86 instructions on a GPU isn't a terribly good idea! It's just not an instruction set that was designed for it, and it leads to inefficient hardware.
 

DrMrLordX

Lifer
Apr 27, 2000
21,633
10,845
136
I bet they could do a lot of performance-boosting stuff with the freed silicon real estate that helps everyone instead of just the specific computing workloads of a few.

Maybe true on their server/HEDT cores that have 2x512b AVX units. Cannon Lake, Ice Lake, and Tiger Lake appear to have the same AVX unit config that Intel has used on the desktop since Haswell: 2x256b. AVX-512 is handled via op fusion.
 

KompuKare

Golden Member
Jul 28, 2009
1,016
932
136
increasing their net income by 100%

They sell less product and make double the money, and they didn't even increase the cost of their CPUs (per core).
If that's shooting yourself in the foot then please hand me the gun.
Yes, good for shareholders, but why should consumers care?
Are they re-investing that extra profit into making stuff consumers can benefit from?
The low-volume, high-margin drive resulted in Intel being out of the mobile phone market.
Short-term good, long-term bad. And having belatedly thrown billions at contra-revenue didn't help them either.

Selling less for more is very good in the short term, but long-term market share matters. A lot. TSMC are where they are now because lots of lower-margin vendors went to them, and the guaranteed high volumes let them invest for the longer term. Apple paying a premium for risk production of the latest node isn't enough for this; having long-term, high-volume customers on lower margins willing to keep those fabs busy when Apple moves on to the next node is at least equally important.

That's not to say Intel's record profits aren't impressive for shareholders. Not many businesses can have security issues and see their customers buy almost the same percentage more of their stuff for each percentage of performance lost!
 
  • Like
Reactions: Tlh97 and moinmoin

piokos

Senior member
Nov 2, 2018
554
206
86
That might happen, but the way AMD chooses to implement it should be interesting. AVX-512 was born out of Tom Forsyth's need to run shaders on the CPU during the Larrabee project. So if mostly GPU-related instructions were devised to run on the CPU, then conceivably backporting them to the GPU may also be possible, and AMD might do it that way, running them across dozens of iGPU cores rather than the paltry few CPU cores in typical affordable CPUs.
Again: you can't assume a system has GPU cores. To make AVX-512 a mainstream solution, it has to be implemented on the CPU - at least as a usable fallback.
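
For illustration, a minimal sketch of such a fallback, assuming GCC or Clang (the `target` attribute and `__builtin_cpu_supports` are compiler-specific, and the function names are hypothetical): probe for AVX-512F once at runtime and dispatch to a vector or scalar path.

```cpp
// Hypothetical runtime dispatch: AVX-512 when the CPU has it, scalar otherwise.
#include <immintrin.h>
#include <cstddef>

// Compiled with AVX-512F enabled for this function only.
__attribute__((target("avx512f")))
static void scale_avx512(float *x, std::size_t n) {
    const __m512 two = _mm512_set1_ps(2.0f);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)  // 16 floats per 512-bit register
        _mm512_storeu_ps(x + i, _mm512_mul_ps(_mm512_loadu_ps(x + i), two));
    for (; i < n; ++i) x[i] *= 2.0f;  // scalar remainder
}

static void scale_scalar(float *x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= 2.0f;
}

// The "usable fallback": one CPUID-backed check, two code paths.
void scale(float *x, std::size_t n) {
    if (__builtin_cpu_supports("avx512f"))
        scale_avx512(x, n);
    else
        scale_scalar(x, n);
}
```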