Info PrimeGrid Challenges 2024, sieve-free edition


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,049
15,191
136
Just in time for the heat! 90 here for the next several days, or more! I shut down 5 boxes; 8 are still running. But 2 of the 5 are the 9554s, so that's a lot of heat gone. And after I test my new 4090, I will shut down another 9554. Back to WCG, Rosetta and F@H! Waiting for the 9950x to build a replacement for the 7950x I gave my son.
 

StefanR5R

Elite Member
Dec 10, 2016
5,885
8,747
136
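   | Date        | Time UTC | Project(s)           | Best on | Challenge                                   | Duration
---+-------------+----------+----------------------+---------+---------------------------------------------+---------
 5 | 8-13 August | 08:08:00 | Factorial, Primorial | CPU?    | (Tentative) International Cat Day Challenge | 5 days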
"Primorial Sieve on GFN Server" has reached the point after which the "Primorial Prime Search Project" can start on PrimeGrid. The application for the latter project has been installed now and can be selected by users in their project preferences, but workunits have not been loaded onto the server yet. They'll get around to this soon enough though, which means the challenge is going to happen as planned.
 

crashtech

Lifer
Jan 4, 2013
10,573
2,145
146
It'd be nice to get a testing regimen in order for PRST, I don't think the currently available scripts will work without significant modification.
 

StefanR5R

My hope is that they will work after insignificant modifications. (Writing from my workplace, and it's not looking like it will get any better during the 17d13h remaining until the start of the challenge.)

Edit: stderr of several results from one of Tony's Ryzen 5000s says "Using Montgomery reduction FMA3 FFT length 2x288K". I guess this tells us that these tasks want 4.50 MBytes of cache each. ... And Ryzen 7000: "Using Montgomery reduction AVX-512 FFT length 2x288K".
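The 4.50 MByte guess can be reproduced with a little arithmetic. This is only a sketch under two assumptions that the thread does not confirm: that "2x288K" means 2 × 288 × 1024 elements, and that each element is an 8-byte double. The function name is made up for illustration.

```python
def fft_footprint_mib(factor: int, length_k: int, bytes_per_element: int = 8) -> float:
    """Rough working-set size of an FFT reported as '<factor>x<length_k>K'.

    Assumption: K = 1024 elements and 8 bytes per element (double precision).
    """
    elements = factor * length_k * 1024
    return elements * bytes_per_element / 2**20


print(fft_footprint_mib(2, 288))  # 4.5
print(fft_footprint_mib(2, 384))  # 6.0
```

Under the same assumptions, a "2x384K" transform would come to 6 MiB.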
 

Markfw

I won't have my 9950x by then, and it's hot here, so I may be out of this unless the team needs me.
 

crashtech

No, it's still at version 1.00 (mt), installed on July 19...
Interesting, because I got some early results that don't match up with my retesting of the same configuration. Doing all testing with the same work unit would be better, or I need to see if my setup is inconsistent somehow.
 

StefanR5R

Did you check run times locally, or in the results list on the web site? I suspect the latter's bookkeeping might still sometimes be wrong with multithreaded workunits.
 

crashtech

StefanR5R said:
Did you check run times locally, or in the results list on the web site? I suspect the latter's bookkeeping might still sometimes be wrong with multithreaded workunits.

I didn't know they had a problem, thanks for the heads up!
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,330
4,005
75
So, in case you hadn't noticed yet, PrimeGrid is having a sort of mini-challenge.

"More primorial sieving is required on GFN Server!"

They forgot to sieve a range and just now realized they might need it for the upcoming challenge. :oops:

Also, in case you hadn't noticed yet, I'm doing WUs a little faster than before. I upgraded from a GTX 1060 to an RTX 4070. :D
 

StefanR5R

crashtech said:
It'd be nice to get a testing regimen in order for PRST, I don't think the currently available scripts will work without significant modification.

StefanR5R said:
My hope is that they will work after insignificant modifications.

Hopes shattered… I started looking into this. Unfortunately, I found no way to extract a progress percentage or time remaining when prst runs in standalone mode. Therefore, one would have to complete an entire workunit in order to measure performance, which would make testing very time-consuming if done on the "main" tasks, or require several input files if done on "verification" tasks. (And who knows how well verification task performance reflects main task performance. It's surely the same transform, but the IO and cryptography parts certainly play a bigger role in verification tasks.)

So far it looks like "fraction done" reporting can only be had if prst is running in BOINC mode, which would require at least a minimal BOINC client derivative which sets up a shared memory interface to the task and whatnot.

One idea which I haven't started to work on yet: Current main tasks are configured to create 64 intermediate proof files. This can also be requested in standalone mode. Maybe a sensible approach would be to run a main task until a desired number of proof files was created, e.g. 4 proof files for an estimated 1/16th of the whole work.
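The proof-file idea boils down to a linear extrapolation. A minimal sketch, assuming that the proof files are written at evenly spaced points of the total work (the post only estimates this) and using a hypothetical helper name; the elapsed time in the example is invented, not a measurement:

```python
def estimate_full_runtime(elapsed_s: float, proofs_done: int, proofs_total: int = 64) -> float:
    """Extrapolate the full task duration from a partial run.

    Assumption: each proof file marks an equal share of the whole work,
    so 4 of 64 files correspond to 1/16th of the task.
    """
    if not 0 < proofs_done <= proofs_total:
        raise ValueError("proofs_done must be between 1 and proofs_total")
    return elapsed_s * proofs_total / proofs_done


# Invented example: 4 proof files appeared after 1000 seconds.
print(estimate_full_runtime(1000.0, 4))  # 16000.0
```

The more proof files one waits for, the less the startup phase and timer granularity should distort the estimate, at the cost of a longer test.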
 

Orange Kid

Elite Member
Oct 9, 1999
4,375
2,164
146
Ken g6 said:
So, in case you hadn't noticed yet, PrimeGrid is having a sort of mini-challenge.

"More primorial sieving is required on GFN Server!"

They forgot to sieve a range and just now realized they might need it for the upcoming challenge. :oops:

Also, in case you hadn't noticed yet, I'm doing WUs a little faster than before. I upgraded from a GTX 1060 to an RTX 4070. :D

Too hot right now, had to quit.
Cooling off next week and will be back on it. 🙂
Congrats on the upgrade.
 

StefanR5R

StefanR5R said:
I started looking into this. Unfortunately, I found no way to extract a progress percentage or time remaining when prst runs in standalone mode.

But then I did find a way after all. It may not be overly precise, so a sufficiently long test duration will be required, certainly quite a lot longer than needed with genefer, for example. I am trying a modified script just now.
 

StefanR5R

First quick run completed. Oops, I forgot that I need to reformat the summary table:
Code:
Summary for Intel(R) Xeon(R) CPU E3-1245 v3, test cutoff: 8 minutes
n  |       b       |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
---+---------------+--------------+---------------------------+-----------------------+-----------+-----------
4651711#-1 |      7,306.04 |     7,306.04 | 1x4, none                 |   5:20:00 =   19200 s |     4.500 |     32,877
4651711#-1 |      7,306.04 |     7,306.04 | 1x8, none                 |   5:20:00 =   19200 s |     4.500 |     32,877
4651711#-1 |      7,306.04 |     7,306.04 | 2x2, none                 |  14:48:53 =   53333 s |     3.240 |     23,671
4651711#-1 |      7,306.04 |     7,306.04 | 2x4, none                 |  14:50:44 =   53444 s |     3.233 |     23,620
This is a 4-core / 8-thread Haswell with 8 MB of inclusive level 3 cache. The candidate 4651711#-1 was tested with "Montgomery reduction FMA3 FFT length 2x384K".

Running two tasks at once causes throughput on this CPU to plummet. This indicates that the cache footprint of "2x384K" is indeed something like 6 MBytes: one such task fits into the 8 MB L3 cache, two do not.

I'll make a nicer table layout, try the first two tests again but with longer test duration for more precision, and then put the script to the usual place.

Update:
Code:
Summary for Intel(R) Xeon(R) CPU E3-1245 v3, test cutoff: 24 minutes
  candidate  |   credit   | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------+------------+---------------------------+-----------------------+-----------+-----------
  4651711#-1 |   7,306.04 | 1x4, none                 |   4:45:54 =   17154 s |     5.036 |     36,793
  4651711#-1 |   7,306.04 | 1x8, none                 |   4:56:17 =   17777 s |     4.860 |     35,507
On this Haswell, it seems marginally better to leave SMT unused.
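The tasks/day and points/day columns appear to follow directly from the task duration. The sketch below shows that relationship with made-up function names; the script's own figures can differ in the last digits, presumably because it works with unrounded durations.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks: int, task_duration_s: float) -> float:
    """Throughput when `concurrent_tasks` run side by side, each taking `task_duration_s`."""
    return concurrent_tasks * SECONDS_PER_DAY / task_duration_s


def points_per_day(concurrent_tasks: int, task_duration_s: float, credit_per_task: float) -> float:
    """Daily credit, assuming every task earns the same `credit_per_task`."""
    return tasks_per_day(concurrent_tasks, task_duration_s) * credit_per_task


# "1x4, none" row of the first quick run: one task at a time, 19200 s, 7,306.04 credits.
print(tasks_per_day(1, 19200))                    # 4.5
print(round(points_per_day(1, 19200, 7306.04)))   # 32877
```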

Another edit:
Before this, I also ran the workunit on this computer in BOINC. It took 21,550 seconds in "1x4, none" configuration, that is, quite a lot longer than the script estimated. However, the BOINC run was concurrent with quite some other stuff happening, like bloated web browsers and several, albeit comparatively short, standalone PRST runs. The scripted standalone runs, however, happened without anything else in parallel, except an X11 session with just a few shell terminals sitting there and nothing much else.
 

waffleironhead

Diamond Member
Aug 10, 2005
6,934
445
136
These runs are a few weeks old, but still might be useful for someone. WU size has risen quite a bit since I got these numbers.
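  CPU    | cores per WU |  time  | WU per day
---------+--------------+--------+-----------
  7940hs |            8 |   4224 |     20.454
  7940hs |            4 |   6617 |     26.114
  7940hs |            2 |  12014 |     28.766
  13620h |            6 |   5923 |     14.587
  13620h |            3 |  11234 |      15.38
  13620h |            2 |  15752 |      16.45
  7730u  |            8 |   7884 |      10.95
  7730u  |            4 |  13749 |      12.57
  7730u  |            2 |  36332 |      8.916
  5500u  |            6 |  11722 |       7.37
  5500u  |            3 |  47467 |       3.64
  5500u  |            1 | 137125 |      3.522
  6700   |            4 |   9575 |       9.07
  6700   |            2 |  33965 |       3.08
  6700   |            1 |  80195 |       4.24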
 

StefanR5R

Code:
Summary for AMD EPYC 9554P 64-Core Processor, test cutoff: 25 minutes
  candidate  |   credit   | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------+------------+---------------------------+-----------------------+-----------+-----------
  4651711#-1 |   7,306.04 | 32x2, ascending           |   6:01:05 =   21665 s |       127 |    932,360
  4651711#-1 |   7,306.04 | 32x4, ascending           |   5:27:18 =   19638 s |       140 |  1,028,602
  4651711#-1 |   7,306.04 | 16x4, ascending           |   3:10:36 =   11436 s |       120 |    883,161
  4651711#-1 |   7,306.04 | 16x8, ascending           |   3:06:10 =   11170 s |       123 |    904,195
The PPT limit was set to 400 W. I was present during the first two tests and got this from the power meter "at the wall":
32x2: 930 kPPD / 470 W = 2.0 kPPD/W
32x4: 1,030 kPPD / 505 W = 2.0 kPPD/W
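For anyone who wants to redo the efficiency figures with their own meter readings, here is a minimal sketch (the function name is hypothetical). Fed with the rounded numbers from this post, both configurations land at about 2.0 kPPD/W, with 32x4 very slightly ahead.

```python
def kppd_per_watt(kppd: float, wall_power_w: float) -> float:
    """Efficiency in thousand points per day per watt measured at the wall."""
    return kppd / wall_power_w


print(round(kppd_per_watt(930, 470), 2))   # 1.98  (32x2)
print(round(kppd_per_watt(1030, 505), 2))  # 2.04  (32x4)
```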