What common desktop applications are using AVX and AVX2?


Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Yes, I know it has gather, but I haven't seen many people who have tested it say good things about it, i.e. it's microcoded and very slow.

Depends on the CPU. From what I've read, Haswell has the slowest implementation and reputedly, Intel's priority with the gather instruction for Haswell wasn't performance, but adoption. In Broadwell-E the gather instruction is faster than in Haswell, and presumably, the Skylake implementation is faster than the Broadwell version.

So they are definitely improving it.

[Attached image: BroadwellCPU.png]
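For anyone who hasn't met the instruction: a gather loads several elements from non-contiguous locations, chosen by a vector of indices, into one vector register. Here is a rough scalar model in Python, purely to show the semantics. The function name `gather` is made up for this sketch; in C the AVX2 version is reached through intrinsics such as `_mm256_i32gather_ps`.

```python
def gather(base, indices):
    """Scalar model of a SIMD gather: one load per index, packed in lane order."""
    return [base[i] for i in indices]


# Eight lanes, like a 256-bit vector of 32-bit floats.
table = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
lanes = gather(table, [7, 0, 3, 3, 5, 1, 6, 2])
print(lanes)
```

Every lane is its own memory access, so how fast the instruction runs depends heavily on how the CPU issues those loads, which fits with the speed varying so much between generations.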
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
It's the sad thing about AVX-512 for Skylake-X: many of the instructions that help with auto-vectorization and are missing from AVX/AVX2 are there, but at a vector width that most consumer/enterprise workloads and data structures don't care about. It will be interesting over the next few years to see what happens in:

1. the Intel consumer x86 AVX space
Given how Intel keeps fusing off AVX on many of its SKUs years after it was introduced, I don't expect much from them.
2. the AMD x86 AVX space
3. the ARM phone/tablet SVE space
4. the ARM server SVE space.
SVE is definitely better than AVX. I hope ARM will push it in smaller chips even with reduced vector length as it will help vectorization a lot.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,012
136
SVE is definitely better than AVX. I hope ARM will push it in smaller chips even with reduced vector length as it will help vectorization a lot.

SVE will be great for HPC (like Fujitsu supercomputers), but I don't think it makes any sense for phones and tablets. The only mobile workloads which might benefit from it are better off running on DSPs, GPUs or fixed-function hardware.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,012
136
It's the sad thing about AVX-512 for Skylake-X: many of the instructions that help with auto-vectorization and are missing from AVX/AVX2 are there, but at a vector width that most consumer/enterprise workloads and data structures don't care about.

Actually, it looks like AVX-512 instructions should work fine on XMM and YMM (128-bit/256-bit) registers, the same way that AVX instructions worked on XMM registers; take a look in the Intel intrinsics guide. So you can use lovely stuff like masked operations and scatter even on shorter vector lengths.
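To make "masked operations and scatter" concrete, here is a small scalar model in Python of what they do on an 8-lane (256-bit) vector of 32-bit integers. The function names are made up for this sketch. The corresponding C intrinsics would be along the lines of `_mm256_mask_add_epi32` and `_mm256_mask_i32scatter_epi32`, and as far as I know the 128-bit and 256-bit forms need the AVX-512VL extension on top of AVX-512F.

```python
def masked_add(src, a, b, mask):
    """Merge-masking model: lane i gets a[i] + b[i] if bit i of mask is set, else src[i]."""
    return [x + y if (mask >> i) & 1 else s
            for i, (s, x, y) in enumerate(zip(src, a, b))]


def masked_scatter(memory, indices, values, mask):
    """Scatter model: store values[i] to memory[indices[i]] only for enabled lanes."""
    for i, (idx, val) in enumerate(zip(indices, values)):
        if (mask >> i) & 1:
            memory[idx] = val
    return memory


# Only the low four lanes are enabled; the rest keep the value from src.
summed = masked_add([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8], [10] * 8, 0b00001111)

# Lane 2 is masked off, so its store to memory[2] never happens.
mem = masked_scatter([0] * 6, [5, 0, 2, 2], [50, 60, 70, 80], 0b1011)
```

The mask is what helps auto-vectorization: a loop body containing an `if` can be turned into straight-line vector code where the condition just becomes the mask.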
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
SVE will be great for HPC (like Fujitsu supercomputers), but I don't think it makes any sense for phones and tablets. The only mobile workloads which might benefit from it are better off running on DSPs, GPUs or fixed-function hardware.
There are more places that can benefit from vectorization than you think, in particular if SVE adds instructions for integer operations in the future. Otherwise SIMD would already be pointless...
 

bobhumplick

Junior Member
Jun 29, 2018
8
1
41
If you enable an AVX offset and play any Unreal Engine game, you will see that the CPU stays throttled to the AVX offset almost the whole time, at least in PUBG and a couple of others I have tried.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,326
10,034
126
FFmpeg - it uses AVX2 in its VP9 decoder (which is used in recent Firefox releases AFAIK, much faster than libvpx in Chrome).
That's what I wanted to know. Trying to decide on a laptop for Mom, between an Intel Celeron 4205U (2C/2T, 1.80GHz, $209), a Pentium Gold 5405U (2C/4T, 2.30GHz, $269), or an i3-8145U (2C/4T, 2.10GHz, Turbo to 3.90GHz, $289).

The i3 has AVX/AVX2 support, and seems much faster in Passmark, especially the ST bench, mostly because of the beastly turbo clock (for mobile, at least).

If Firefox can utilize AVX/AVX2, for VP9 decoding especially, since that's kind of CPU-heavy, that would be a big improvement in watching web videos. Then again, the laptop(s) in question are all 768p, which isn't all that heavy to decode, I suspect.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,012
136
That's what I wanted to know. Trying to decide on a laptop for Mom, between an Intel Celeron 4205U (2C/2T, 1.80GHz, $209), a Pentium Gold 5405U (2C/4T, 2.30GHz, $269), or an i3-8145U (2C/4T, 2.10GHz, Turbo to 3.90GHz, $289).

The i3 has AVX/AVX2 support, and seems much faster in Passmark, especially the ST bench, mostly because of the beastly turbo clock (for mobile, at least).

If Firefox can utilize AVX/AVX2, for VP9 decoding especially, since that's kind of CPU-heavy, that would be a big improvement in watching web videos. Then again, the laptop(s) in question are all 768p, which isn't all that heavy to decode, I suspect.

Nice thread necromancy Larry!
 

Drazick

Member
May 27, 2009
53
70
91
Are you sure that vanilla Adobe products can use AVX? I agree that you can write custom plugins for them that use AVX, but I can't seem to find any documentation that talks about AVX support.

Yes, I'm sure.
Look in the Photoshop folder.
You'll see files from OpenCV and Intel IPP.
Both use AVX and AVX2 on supported CPUs.
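"On supported CPUs" means the library checks at run time what the processor can do and picks a code path to match. The sketch below shows that idea in Python by reading the feature flags Linux exposes in /proc/cpuinfo. It is not how OpenCV or IPP actually implement their dispatch, and the names `cpu_flags` and `pick_kernel` are invented for the example.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(flags):
    """Choose the widest code path the CPU reports, falling back to the SSE2 baseline."""
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "sse2"


# Sample text stands in for a real /proc/cpuinfo so the sketch runs anywhere.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
chosen = pick_kernel(cpu_flags(sample))
print(chosen)
```

On a Linux machine you could pass in `open("/proc/cpuinfo").read()` instead of the sample text to see which path your own CPU would get.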
 