Which DC projects/subprojects use AVX-512?

Markfw · Aug 18, 2023

I thought it was one of the PrimeGrid tasks and possibly more, but now I can't find it documented anywhere. Anything that supports it, I will run for the team, and leave my medical projects for the competition.

StefanR5R · Aug 19, 2023

The LLR2 application makes use of AVX-512. The following projects are built on LLR2:

All of the PrimeGrid LLR subprojects,
all SRBase subprojects except the GPU project "TF",
the LLR2 testing subproject at Private GFN Server.

The PRST application, which is similar to LLR2 and is run at Private GFN Server's subproject of the same name, supports AVX-512 too.
As does the genefer22 application, which is used by

PrimeGrid GFN-15…GFN-22, if they are run on CPUs instead of a GPU.¹

All of these projects are concerned with finding primes, or with proving/disproving conjectures connected with primes. For now, I haven't heard of any other active Distributed Computing projects whose applications benefit from AVX-512.

Perhaps Folding@Home's CPU-only FAHCore_a8 uses AVX-512, perhaps not. The older FAHCore_a7 most likely does not. Both are based on GROMACS which offers AVX-512 support, but the GROMACS builds in the F@H cores might not have it. If you enable a F@H CPU slot on an AVX-512 capable computer (probably with at most 64 logical CPUs to suit FAHCore_a8's limitations, IIRC), the client log will probably show you which SIMD flavor is being used.
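A quick way to check whether a Linux host is AVX-512 capable is to look at the CPU flags the kernel lists in /proc/cpuinfo. This is a minimal sketch that assumes a Linux x86 system and the kernel's flag spellings (avx512f for the foundation subset, avx512dq and friends for the others). It does not cover Windows, and it only tells you what the CPU supports, not which SIMD path a given application actually picks.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from /proc/cpuinfo-style text (first 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists features on lines like "flags\t\t: fpu vme ... avx512f ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_subsets(flags: set[str]) -> list[str]:
    """List the AVX-512 subsets present, e.g. avx512dq, avx512f."""
    return sorted(f for f in flags if f.startswith("avx512"))


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    flags = cpu_flags(path.read_text()) if path.exists() else set()
    print("AVX-512 foundation (avx512f):", "avx512f" in flags)
    print("AVX-512 subsets:", " ".join(avx512_subsets(flags)) or "none")
```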

________
¹) edited: GFN-15 is based on genefer22 too, since April.

Markfw · Aug 19, 2023

Thanks, Stefan!

StefanR5R · Sep 17, 2023

SRBase plans to migrate from LLR2 to PRST and has started an open beta test.

Edit: migration is done now.

mmonnin03 · Sep 17, 2023

I've been running the PRST tasks at Private GFN for WUProp hours. On my 7950X with 6 concurrent tasks, each takes 5-6 hours. Running 32 tasks at once, they slowed down quite a lot, to 12-13 hours each. With 6 tasks, running SPT on the remaining threads also slowed those 6 tasks down. I'm guessing the caches were being filled. Can these still be run multithreaded to fill up the threads with fewer tasks?
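Per-task times alone don't show throughput. A back-of-the-envelope comparison, taking the midpoints of the reported ranges (5.5 h and 12.5 h) as an assumption:

```python
def tasks_per_day(concurrent: int, hours_per_task: float) -> float:
    """Completed tasks per day when `concurrent` tasks each take `hours_per_task`."""
    return concurrent * 24.0 / hours_per_task


# Midpoints of the reported ranges (assumption: the post only gives ranges).
six = tasks_per_day(6, 5.5)           # 6 concurrent tasks at 5-6 h each
thirty_two = tasks_per_day(32, 12.5)  # 32 concurrent tasks at 12-13 h each

print(f"6 tasks:  {six:.1f} tasks/day")
print(f"32 tasks: {thirty_two:.1f} tasks/day ({thirty_two / six:.2f}x)")
```

With these numbers, 32 concurrent tasks still complete roughly 2.3x as many tasks per day as 6 do. The sketch ignores power draw and the SPT work that shared the CPU in the 6-task case.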

rebirther posted an update: PRST isn't matching the results of LLR, so testing has stopped.

StefanR5R · Dec 11, 2023

It's possible that the applications which we are talking about here could see another large speed increase going from AMD Zen 4 to Zen 5, per core and per clock. (Not sure if AVX2 applications, i.e. ones with 256-bit vectors, will be addressed by these core upgrades too. From how AMD handled such things in the past, I'd say yes, but who knows.)

But given that there will only be a minor update to the manufacturing node, such a speedup would also come at the cost of an almost proportional increase in power consumption.

Assimilator1 · Dec 13, 2023

I got this message from Asteroids@home about an AVX512 update:

Asteroids@home: New AVX512 application released
We are very proud to announce our new set of optimized applications that will utilize AVX512 instruction set capable engines or to be precise those, which support AVX512dq instructions!

These applications are built to support both Linux and Windows 64bit architecture OS. The development of this version was possible thanks to the great help provided by ahorek's team !

Unfortunately it turns out that BOINC client applications for Windows still do not report all processor options to the server correctly. It is because of a known bug and even after a lot of discussions in BOINC's channels it's still there. The good news is that thanks to ahorek's team a bugfix was already accepted and merged into the BOINC's repository and the fix will be applied when client version 7.26.0 is released. Till then in order to run the AVX512 application you might need to switch to the Anonymous platform.

We'd like to remind you that while the Boinc server is capable of finding the best performing application for every particular system taking into account multiple factors, after a while it will start sending the right one for every particular system. Which means that even if your CPU supports AVX512dq instructions it still might receive FMA or AVX tasks and there is nothing to be concerned about. In such a case you might want to give a try to the so-called Anonymous platform where your client will explicitly request the AVX512 application.

Happy crunching and thank you for your support!
Asteroids@home's team
More info here - https://asteroidsathome.net/boinc/forum_thread.php?id=988
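For hosts where the server keeps handing out FMA or AVX tasks, it can help to know which of the announced builds the CPU could run at all. This is a hypothetical sketch: the variant names come from the announcement, but the flag requirements (Linux /proc/cpuinfo spellings) are an assumption, and the real scheduler chooses by measured performance, not by this fixed ordering.

```python
from typing import Optional

# Hypothetical mapping: variant names from the announcement, required flags
# as spelled in Linux /proc/cpuinfo. Ordered from most to least capable.
VARIANTS = [
    ("AVX512", {"avx512f", "avx512dq"}),
    ("FMA", {"avx", "fma"}),
    ("AVX", {"avx"}),
]


def best_variant(flags: set[str]) -> Optional[str]:
    """Return the first (most capable) variant whose required flags are all present."""
    for name, required in VARIANTS:
        if required <= flags:  # subset test: every required flag is available
            return name
    return None  # none of the announced builds fits this CPU


# Example: a flag set with AVX-512DQ vs. an AVX2/FMA-only one.
print(best_variant({"avx", "avx2", "fma", "avx512f", "avx512dq"}))  # AVX512
print(best_variant({"avx", "avx2", "fma"}))                         # FMA
```

This only answers "can it run", which is the question that matters when deciding whether the Anonymous platform route is worth the trouble.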

StefanR5R · Dec 13, 2023

StefanR5R said:
Other posters in the CPU subforum have linked to it before, but anyway:
mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > Zen4's AVX512 Teardown
An analysis by the author of y-cruncher. One of the points to take home: Although theoretical peak throughput of Zen3/Zen4 AVX-256 and Zen4 AVX-512 is the same clock-for-clock, moving an application to AVX-512 on Zen4 can reduce bottlenecks of the CPU's frontend ( = utilize the execution units better), and also reduce energy spent in the CPU's frontend ( = spend respectively more of the overall power budget in the execution units and elsewhere).

The Genefer application for CPUs has got a command line switch which toggles between different instruction sets:
-x <implementation> set a specific implementation (i32, sse2, sse4, avx, fma, 512)
I tried the default AVX512 and also the Zen-3-style FMA3 (-x fma) on EPYC 9554P @ 400W with genefer -n 20 -b 2615062. This workunit gets 34,066.53 credit.
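A minimal harness for repeating this kind of comparison, assuming the binary is named genefer and is on PATH (the actual file name may differ), and using only the -x, -n and -b switches quoted above. Unlike the measurement below, it runs one task at a time and leaves thread count and affinity at genefer's defaults, so its timings are not comparable to the 8x8 tables.

```python
import shutil
import subprocess
import time

EXE = "genefer"  # assumption: the actual binary name/path may differ
IMPLEMENTATIONS = ["fma", "512"]  # two of the -x choices from the help text


def genefer_cmd(impl: str, n: int = 20, b: int = 2615062) -> list[str]:
    """Build a command line for one run with a forced implementation."""
    return [EXE, "-x", impl, "-n", str(n), "-b", str(b)]


def time_run(cmd: list[str]) -> float:
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    if shutil.which(EXE) is None:
        print(f"{EXE} not found on PATH; nothing to benchmark")
    else:
        for impl in IMPLEMENTATIONS:
            print(f"-x {impl}: {time_run(genefer_cmd(impl)):.0f} s")
```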
FMA3:

Code:

tasks x threads, affinity |  avg. task duration   | tasks/day | points/day | power | efficiency
--------------------------+-----------------------+-----------+------------+-------+------------
8x8, ascending            |   5:54:07 =   21247 s |      32.5 |  1,108,218 | 475 W | 2,330 PPD/W

AVX512:

Code:

tasks x threads, affinity |  avg. task duration   | tasks/day | points/day | power | efficiency
--------------------------+-----------------------+-----------+------------+-------+------------
8x8, ascending            |   5:09:43 =   18583 s |      37.1 |  1,267,104 | 474 W | 2,670 PPD/W

So in this specific case, AVX512 gives +14.3 % throughput and +14.6 % power efficiency over AVX2 FMA3.
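The derived columns follow from the measured durations: tasks/day = concurrent tasks × 86,400 s / task duration, points/day = tasks/day × credit per task, efficiency = points/day / watts. A short sketch that recomputes them from the table's inputs. Its points/day agree with the table to within a few tens of points, and it rounds tasks/day rather than truncating, so the AVX512 row shows 37.2 instead of 37.1.

```python
SECONDS_PER_DAY = 86_400
CREDIT = 34_066.53  # credit per genefer -n 20 -b 2615062 workunit


def daily_stats(concurrent_tasks: int, task_seconds: float, watts: float):
    """Return (tasks/day, points/day, points/day per watt) for one configuration."""
    tasks_day = concurrent_tasks * SECONDS_PER_DAY / task_seconds
    ppd = tasks_day * CREDIT
    return tasks_day, ppd, ppd / watts


fma = daily_stats(8, 21247, 475)     # 8x8, FMA3, 5:54:07 per task
avx512 = daily_stats(8, 18583, 474)  # 8x8, AVX512, 5:09:43 per task

for name, (t, p, e) in (("FMA3", fma), ("AVX512", avx512)):
    print(f"{name:7s} {t:5.1f} tasks/day {p:>12,.0f} PPD {e:7,.0f} PPD/W")

print(f"throughput: {avx512[1] / fma[1] - 1:+.1%}")  # +14.3%
print(f"efficiency: {avx512[2] / fma[2] - 1:+.1%}")  # +14.6%
```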

(Same workunit on RTX 4090 with Kaby Lake and Z270 PC: 1,771,460 PPD; 380 W; 4,700 PPD/W)

StefanR5R · Aug 22, 2024

I just noticed that the author of LLR2 has deprecated it in favor of PRST. I don't know whether PrimeGrid plans to replace LLR2 eventually.

On May 1 Pavel Atnashev said:
DEPRECATED: The source code is no longer maintained and no further releases are planned. Replaced by PRST utility.

(https://github.com/patnashev/llr2)

Ken g6 · Aug 23, 2024

I'm pretty sure PrimeGrid is already using PRST, or something very similar. They just couldn't be bothered to change the file names.

But the change happened around the time fast double-checks were introduced.

StefanR5R · Aug 23, 2024

The fast check was added by the LLR --> LLR2 transition (and genefer --> genefer22). ~~Right now, PRST is only installed for the Primorial prime search project.~~ (apps.php)

Whoops, you are right: stderr.txt from a random llrSR5 task:

Code:

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
PRST version 9.0.766, GWnum library version 30.12
Using all-complex AVX-512 FFT length 1152K, Pass1=128, Pass2=9K, clm=4, 4 threads.
Fermat probabilistic test of 62698*5^4906582+1, a = 3, complexity = 11525498.
Gerbicz-Li check enabled, L2 = 319*279.
Saving 128 proof points.
Testing complete.
62698*5^4906582+1 compressed 128 points to 7 products, time: 13.6 s.
Done.
15:14:25 (12780): called boinc_finish(0)

</stderr_txt>
]]>

But llr321, for example, is still on LLR2. Here is stderr from a random llr321 task:

Code:

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
BOINC PrimeGrid wrapper 2.02 (Nov 23 2020 23:35:38)
running ../../projects/www.primegrid.com/llr2_1.3.0_win64_220821.exe -v
LLR2 Program - Version 1.3.0, using Gwnum Library Version 30.9
running ../../projects/www.primegrid.com/llr2_1.3.0_win64_220821.exe -oGerbicz=1 -oProofName=proof -oProofCount=128 -oProductName=prod -oPietrzak=1 -oCachePoints=0 -pSavePoints -q3*2^22008906-1 -d -t4 -oDiskWriteTime=1
Gerbicz check is requested, switching to PRP.
Starting probable prime test of 3*2^22008906-1
Using AVX-512 FFT length 1200K, Pass1=640, Pass2=1920, clm=2, 4 threads, a = 3, L2 = 539*319, M = 171941
Compressed 128 points to 7 products.  Time : 41.962 sec.
Testing complete.
15:03:43 (11464): called boinc_finish(0)

</stderr_txt>
]]>
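Checking this by hand gets tedious across many tasks. A small sketch that pulls the application, version and FFT type out of a task's stderr. The regular expressions are written against the two samples above only, so other app versions, or stderr from tasks that did not use an AVX-512 FFT, may be worded differently.

```python
import re
from typing import Optional

# Patterns derived from the two stderr samples above (assumption: other
# versions print the same wording).
APP_PATTERNS = [
    ("PRST", re.compile(r"^PRST version ([^\s,]+)", re.M)),
    ("LLR2", re.compile(r"^LLR2 Program - Version ([^\s,]+)", re.M)),
]
FFT_PATTERN = re.compile(r"Using (?:all-complex )?(\S+) FFT length ([^\s,]+)")


def classify_stderr(text: str) -> dict[str, Optional[str]]:
    """Identify the app, its version, and the FFT type/length reported in stderr."""
    result: dict[str, Optional[str]] = {
        "app": None, "version": None, "fft": None, "fft_length": None,
    }
    for app, pattern in APP_PATTERNS:
        m = pattern.search(text)
        if m:
            result["app"], result["version"] = app, m.group(1)
            break
    m = FFT_PATTERN.search(text)
    if m:
        result["fft"], result["fft_length"] = m.group(1), m.group(2)
    return result
```

Applied to the llrSR5 sample, this reports PRST 9.0.766 with an AVX-512 FFT of length 1152K; applied to the llr321 sample, LLR2 1.3.0 with an AVX-512 FFT of length 1200K.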
