What common desktop applications are using AVX and AVX2?

cbn

Lifer
Mar 27, 2009
12,968
221
106
What common desktop applications are using AVX and AVX2?
 

tamz_msc

Diamond Member
Jan 5, 2017
3,698
3,547
136
AVX is aimed at improving data-level parallelism (SIMD), which, contrary to popular belief, is not the same thing as multi-threading. Most desktop applications are not suited to it, so outside specific use cases there are not that many applications that employ AVX.
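
To make the distinction concrete, here is a minimal Rust sketch. The function names `sum_threads` and `sum_lanes` are made up for illustration. The first splits the work across two threads; the second stays on a single thread but keeps eight independent accumulators, which is the shape of loop a compiler can map onto one 256-bit register. Whether AVX2 instructions are actually emitted depends on the compiler and target flags, so treat this as an illustration of the idea, not a benchmark.

```rust
use std::thread;

/// Thread-level parallelism: each half of the slice is summed on its own thread.
pub fn sum_threads(data: &[u32]) -> u32 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().fold(0u32, |acc, &x| acc.wrapping_add(x)));
        let r = s.spawn(|| right.iter().fold(0u32, |acc, &x| acc.wrapping_add(x)));
        l.join().unwrap().wrapping_add(r.join().unwrap())
    })
}

/// Data-level parallelism: one thread, eight independent lanes per step.
/// Eight u32 lanes fill exactly one 256-bit vector register.
pub fn sum_lanes(data: &[u32]) -> u32 {
    let mut lanes = [0u32; 8];
    let mut chunks = data.chunks_exact(8);
    for chunk in &mut chunks {
        for i in 0..8 {
            lanes[i] = lanes[i].wrapping_add(chunk[i]);
        }
    }
    // Elements that do not fill a whole group of eight are added one by one.
    let tail = chunks
        .remainder()
        .iter()
        .fold(0u32, |acc, &x| acc.wrapping_add(x));
    lanes.iter().fold(tail, |acc, &x| acc.wrapping_add(x))
}
```

Both functions return the same wrapping sum for the same input; the two techniques are independent and can be combined.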
 

cbn

One application that I am interested in is Firefox, which has been working on improving parallelization with the Servo engine.

The prototype seeks to create a highly parallel environment, in which many components (such as rendering, layout, HTML parsing, image decoding, etc.) are handled by fine-grained, isolated tasks. Source code for the project is written in the Rust programming language.

However, I couldn't find any info yet on whether Firefox (or future builds of Firefox) uses AVX. According to Wikipedia, Waterfox does use it.

EDIT: The following link from March 14, 2016 (for the Rust language) does mention AVX under notable changes:

https://this-week-in-rust.org/blog/2016/03/14/this-week-in-rust-122/
 

cbn

Perhaps for a future experiment, a Kaby Lake Pentium G4600 could be compared to a Kaby Lake Core i3 (with its multiplier lowered to 36) to see whether any difference in browsing speed can be found?

Reason: The Kaby Lake Pentiums don't have AVX (or AVX2), but the Kaby Lake Core i3s do.
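
Before running such a comparison, it may be worth confirming what each CPU actually reports. A minimal Rust sketch, assuming an x86 or x86-64 machine, is shown here; `detected_simd_features` is a hypothetical helper name, while `is_x86_feature_detected!` is the standard-library macro for runtime detection.

```rust
/// Returns which of "sse2", "avx" and "avx2" the current CPU reports.
/// On non-x86 targets the list is simply empty.
pub fn detected_simd_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            found.push("sse2");
        }
        if is_x86_feature_detected!("avx") {
            found.push("avx");
        }
        if is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
    }
    found
}
```

On a CPU without AVX, such as the Pentium mentioned above, the returned list would be expected to contain only `sse2`.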
 

Drazick

Member
May 27, 2009
53
70
91
Any application which uses Intel IPP / Intel MKL.
Any application which uses OpenBLAS, Apple's Accelerate framework, or Eigen.
Most signal / image / video processing filters (and image and video compression libraries).
Compression utilities.
So we get something like:
  1. Adobe Photoshop.
  2. Adobe After Effects.
  3. Adobe Premiere.
  4. Photoshop / After Effects / Premiere plug-ins.
  5. PeaZip / 7-Zip / WinRAR.
  6. x264 / FFmpeg (used by many desktop applications, for instance Firefox and VLC Player).
  7. Excel.
  8. MATLAB / Octave / Python (NumPy, SciPy, TensorFlow, etc.).
  9. Julia.
  10. Numeric libraries (used intensively in many applications).
I can go on and be even more specific.

It is much easier to vectorize code than to parallelize it.
Hence advancements like AVX and AVX2 are welcome.

Note that what efficient vectorization needs is good data locality.
Hence we need CPUs with larger and faster (or at least equally fast) cache systems.
We also need more main memory bandwidth.
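
The data locality point can be illustrated with two memory layouts. This is a minimal Rust sketch with made-up types (`ParticleAos`, `ParticlesSoa`), not code from any of the applications listed above. In the array-of-structs layout, consecutive `x` values are 12 bytes apart; in the struct-of-arrays layout they are contiguous, so eight of them can be loaded into one 256-bit register with a single load.

```rust
/// Array-of-structs: x, y and z of one particle sit next to each other,
/// so consecutive x values are strided through memory.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ParticleAos {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Struct-of-arrays: all x values are contiguous, which suits SIMD loads.
pub struct ParticlesSoa {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub z: Vec<f32>,
}

/// Touches every third f32 in memory; harder to vectorize efficiently.
pub fn shift_x_aos(particles: &mut [ParticleAos], dx: f32) {
    for p in particles.iter_mut() {
        p.x += dx;
    }
}

/// Touches one dense run of f32 values; a straightforward vectorization target.
pub fn shift_x_soa(particles: &mut ParticlesSoa, dx: f32) {
    for x in particles.x.iter_mut() {
        *x += dx;
    }
}
```

Both functions compute the same result; only the memory access pattern differs, and that pattern is what determines how much of each cache line fetched is actually used.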
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
There are some "hidden" usages as well. Java/C# virtual machines can generate AVX code automatically. There is also plenty of AVX code in GPU drivers: even if you are not gaming, the Windows desktop uses some of the SSE/AVX code paths in the graphics drivers.
 

RichUK

Lifer
Feb 14, 2005
10,320
672
126
What happens if AVX is not supported by the processor?
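
As far as I know, executing an AVX instruction on a CPU that lacks AVX raises an invalid-opcode fault, which the operating system usually reports as an "illegal instruction" crash. Software that wants to run everywhere therefore tends to check for the feature at runtime and fall back to a slower path. A minimal Rust sketch of that pattern follows; `add_slices`, `add_avx` and `add_scalar` are hypothetical names, and the AVX path is only compiled on x86-64.

```rust
/// Plain fallback that works on any CPU.
fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// AVX path: adds eight f32 values per instruction.
/// Must only be called after AVX support has been confirmed.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let mut i = 0;
    while i + 8 <= n {
        // SAFETY: i + 8 <= n, and the caller guarantees all slices have length n.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        }
        i += 8;
    }
    // Leftover elements that do not fill a whole vector.
    while i < n {
        out[i] = a[i] + b[i];
        i += 1;
    }
}

/// Picks the fastest available path and reports which one ran.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) -> &'static str {
    assert!(a.len() == out.len() && b.len() == out.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX was just detected and the lengths were checked above.
            unsafe { add_avx(a, b, out) };
            return "avx";
        }
    }
    add_scalar(a, b, out);
    "scalar"
}
```

On a CPU without AVX the function returns `"scalar"` and produces the same results, only more slowly. Programs compiled with AVX enabled unconditionally and no such check are the ones that would crash.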
 

tamz_msc

Are you sure that vanilla Adobe products can use AVX? I agree that you can write custom plugins for them that use AVX, but I can't seem to find any documentation that talks about AVX support.
 

Nothingness

Platinum Member
Jul 3, 2013
2,364
707
136
So essentially, the average user who primarily games has no need for AVX at present.
I wouldn't be surprised if GPU drivers were using AVX. But it remains to be seen if that'd make a difference :)

EDIT: I might be wrong as I found no AVX instruction in libnvidia-glcore.so.367.57...
 

Jan Olšan

Senior member
Jan 12, 2017
273
276
136
FFmpeg uses AVX2 in its VP9 decoder (which is used in recent Firefox releases AFAIK, and is much faster than the libvpx used by Chrome). AVX is used in the zscale filter I think, and there is probably more usage, but not really in video decoders/encoders, since AVX is for floating point.
x265 uses AVX2 for substantial speedups; x264 also has some AVX2 code, but not as much, so it helps less.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Don't know whether you consider PhysX a common desktop app, but it uses AVX for cloth simulation. And if I were a betting man, I would say that NVidia would use AVX2 for the CPU version of Flex whenever it is released, as it will use much more advanced physics effects like smoke and fluid simulation.
 

mayankleoboy1

Junior Member
Mar 19, 2017
1
1
41
Tom's Hardware did an article a few years back on PhysX.
They found that PhysX running on the CPU (or alongside a non-Nvidia GPU) used x87 instructions. x87 is ancient, and much slower than even SSE2.

Edit: http://www.tomshardware.com/reviews/nvidia-physx-hack-amd-radeon,2764-5.html
Also, this was checked by David Kanter.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,731
3,063
136
My understanding is that without effective scatter and gather operations it's still very hard for a compiler to auto-vectorize code outside of the "obvious" cases. Given that AVX and AVX2 lack scatter+gather, and given the ability of the average code monkey, it's little wonder that we only see limited cases of benefit.

That's the sad thing about AVX-512 on Skylake-X: many of the instructions that help with auto-vectorization and are missing from AVX/AVX2 are there, but at a vector width that most consumer/enterprise workloads and data structures don't care about. It will be interesting over the next few years to see what happens in:

1. the Intel consumer x86 AVX space
2. the AMD x86 AVX space
3. the ARM phone/tablet SVE space
4. the ARM server SVE space
 

mikk

Diamond Member
May 15, 2012
4,108
2,100
136

Carfax83

Forget this article, it hasn't used x87 instructions since PhysX 3 in 2011.

Thanks for pointing that out. It's hard to believe that crap about PhysX not supporting SSE2/SIMD instructions has persisted this long on the net.
 

Carfax83

AVX2 has the gather instruction, but it doesn't have scatter.
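
For illustration, here is a minimal Rust sketch of that asymmetry. The wrapper names `gather8` and `scatter8` are made up; `_mm256_i32gather_epi32` is the real AVX2 gather intrinsic. Gather loads eight `i32` values from arbitrary indices with one instruction, while the scatter direction has to be written as a scalar loop because AVX2 offers no equivalent instruction (scatter only arrived with AVX-512).

```rust
/// AVX2 path: one gather instruction loads table[idx[0]] .. table[idx[7]].
/// Must only be called after AVX2 is detected and the indices are validated.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn gather8_avx2(table: &[i32], idx: &[i32; 8]) -> [i32; 8] {
    use std::arch::x86_64::*;
    let mut out = [0i32; 8];
    // SAFETY: the caller has checked that every index is inside `table`.
    unsafe {
        let vidx = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
        // SCALE = 4: each index is multiplied by 4 bytes, the size of an i32.
        let v = _mm256_i32gather_epi32::<4>(table.as_ptr(), vidx);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    }
    out
}

/// Gathers eight values, using AVX2 when the CPU has it.
pub fn gather8(table: &[i32], idx: &[i32; 8]) -> [i32; 8] {
    assert!(idx.iter().all(|&i| i >= 0 && (i as usize) < table.len()));
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected and all indices were bounds-checked.
            return unsafe { gather8_avx2(table, idx) };
        }
    }
    let mut out = [0i32; 8];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = table[i as usize];
    }
    out
}

/// Scatter has no AVX2 instruction, so it stays a scalar loop here.
pub fn scatter8(table: &mut [i32], idx: &[i32; 8], values: &[i32; 8]) {
    for (&i, &v) in idx.iter().zip(values.iter()) {
        table[i as usize] = v;
    }
}
```

Whether the hardware gather is actually faster than eight scalar loads depends on the microarchitecture, so this shows only what the instruction set allows, not a guaranteed speedup.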