
Info PrimeGrid Challenges 2024, sieve-free edition

Just in time for the heat! 90 here for the next several days, or more! I shut down 5 boxes; 8 are still running. But 2 of the 5 are the 9554s, so that's a lot of heat gone. And after I test my new 4090, I will shut down another 9554. Back to WCG, Rosetta and F@H! Waiting for the 9950x to build a replacement for the 7950x I gave my son.
 
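 # | Date        | Time UTC | Project(s)           | Best on | Challenge                                   | Duration
---+-------------+----------+----------------------+---------+---------------------------------------------+---------
 5 | 8-13 August | 08:08:00 | Factorial, Primorial | CPU?    | (Tentative) International Cat Day Challenge | 5 days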
"Primorial Sieve on GFN Server" has reached the point after which the "Primorial Prime Search Project" can start on PrimeGrid. The application for the latter project has been installed now and can be selected by users in their project preferences, but workunits have not been loaded onto the server yet. They'll get around to this soon enough though, which means the challenge is going to happen as planned.
 
It'd be nice to get a testing regimen in order for PRST, I don't think the currently available scripts will work without significant modification.
 
My hope is that they will work after insignificant modifications. (Writing from my workplace, and it's not looking like it will get any better during the 17d13h remaining until the start of the challenge.)

Edit: stderr of several results from one of Tony's Ryzen 5000s says "Using Montgomery reduction FMA3 FFT length 2x288K". I guess this tells us that these tasks want 4.50 MBytes of cache each. ... And Ryzen 7000: "Using Montgomery reduction AVX-512 FFT length 2x288K".
 
I won't have my 9950x by then, and it's hot here, so I may be out of this unless the team needs me.
 
Did you check run times locally, or in the results list on the web site? I suspect the latter's bookkeeping might still sometimes be wrong with multithreaded workunits.
 
So, in case you hadn't noticed yet, PrimeGrid is having a sort of mini-challenge.

"More primorial sieving is required on GFN Server!"

They forgot to sieve a range and just now realized they might need it for the upcoming challenge. 😳

Also, in case you hadn't noticed yet, I'm doing WUs a little faster than before. I upgraded from a GTX 1060 to an RTX 4070. 😀
 
It'd be nice to get a testing regimen in order for PRST, I don't think the currently available scripts will work without significant modification.
My hope is that they will work after insignificant modifications.
Hopes shattered… I started looking into this. Unfortunately, I found no way to extract a progress percentage or time remaining when prst runs in standalone mode. Therefore, one would have to complete an entire workunit in order to measure performance. Which would make testing very time consuming if done on the "main" tasks, or require several input files if done on "verification" tasks. (And who knows how well verification task performance reflects main task performance. It's surely the same transform, but the IO and cryptography parts certainly play a bigger role in verification tasks.)

So far it looks like a "fraction done" reporting can only be had if prst is running in boinc mode, which would require at least a minimal boinc client derivative which sets up a shared memory interface to the task and whatnot.

One idea which I haven't started to work on yet: Current main tasks are configured to create 64 intermediate proof files. This can also be requested in standalone mode. Maybe a sensible approach would be to run a main task until a desired number of proof files was created, e.g. 4 proof files for an estimated 1/16th of the whole work.
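The proof-file idea amounts to a simple extrapolation. Here is a minimal sketch of it, assuming that the intermediate proof files are spaced evenly across the whole workunit (the post only estimates this). The function and parameter names are hypothetical and not part of prst, and the timing in the example is made up purely for illustration.

```python
def estimate_full_duration(elapsed_seconds: float,
                           proof_files_done: int,
                           proof_files_total: int = 64) -> float:
    """Extrapolate the whole-task duration from a partial run.

    Assumption: each proof file marks an equal share of the total work.
    """
    if not 0 < proof_files_done <= proof_files_total:
        raise ValueError("proof_files_done must be in 1..proof_files_total")
    fraction_done = proof_files_done / proof_files_total
    return elapsed_seconds / fraction_done


# Illustrative only: 4 of 64 proof files (1/16 of the work) after 20 minutes.
estimate = estimate_full_duration(20 * 60, 4)
print(f"estimated full task duration: {estimate:.0f} s")
```

A longer cutoff (more proof files) should reduce the extrapolation error, at the cost of a longer test.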
 
So, in case you hadn't noticed yet, PrimeGrid is having a sort of mini-challenge.

"More primorial sieving is required on GFN Server!"

They forgot to sieve a range and just now realized they might need it for the upcoming challenge. 😳

Also, in case you hadn't noticed yet, I'm doing WUs a little faster than before. I upgraded from a GTX 1060 to an RTX 4070. 😀
Too hot right now, had to quit.
Cooling off next week and will be back on it. 🙂
Congrats on the upgrade.
 
I started looking into this. Unfortunately, I found no way to extract a progress percentage or time remaining when prst runs in standalone mode.
But then I did find a way after all. It may not be overly precise, therefore a sufficient test duration will be required, certainly quite a lot longer than needed with genefer for example. I am trying a modified script just now.
 
First quick run completed. Oops, I forgot that I need to reformat the summary table:
Code:
Summary for Intel(R) Xeon(R) CPU E3-1245 v3, test cutoff: 8 minutes
n  |       b       |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
---+---------------+--------------+---------------------------+-----------------------+-----------+-----------
4651711#-1 |      7,306.04 |     7,306.04 | 1x4, none                 |   5:20:00 =   19200 s |     4.500 |     32,877
4651711#-1 |      7,306.04 |     7,306.04 | 1x8, none                 |   5:20:00 =   19200 s |     4.500 |     32,877
4651711#-1 |      7,306.04 |     7,306.04 | 2x2, none                 |  14:48:53 =   53333 s |     3.240 |     23,671
4651711#-1 |      7,306.04 |     7,306.04 | 2x4, none                 |  14:50:44 =   53444 s |     3.233 |     23,620
This is a 4 cores/ 8 threads Haswell with 8 MB inclusive level 3 cache. The candidate 4651711#-1 was tested with "Montgomery reduction FMA3 FFT length 2x384K".

Running two tasks at once causes throughput on this CPU to plummet. This indicates that the cache footprint of "2x384K" is indeed something like 6 MBytes.
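The footprint figures in this thread (4.50 MBytes for 2x288K, roughly 6 MBytes for 2x384K) are consistent with a simple rule of thumb: number of FFT elements times 8 bytes. A minimal sketch of that arithmetic follows. It assumes one 64-bit double per element and K = 1024; that assumption happens to match the numbers quoted here, but it is not taken from PRST documentation. The function name is made up for illustration.

```python
import re

BYTES_PER_ELEMENT = 8  # assumption: one 64-bit double per FFT element


def fft_footprint_mib(fft_length: str) -> float:
    """Estimate the working-set size in MiB for a string like '2x288K'."""
    match = re.fullmatch(r"(\d+)x(\d+)K", fft_length)
    if match is None:
        raise ValueError(f"unrecognized FFT length: {fft_length!r}")
    factor, kilo_elements = (int(group) for group in match.groups())
    elements = factor * kilo_elements * 1024
    return elements * BYTES_PER_ELEMENT / (1024 * 1024)


for length in ("2x288K", "2x384K"):
    print(f"{length}: {fft_footprint_mib(length):.2f} MiB")
```

By this estimate, two concurrent 2x384K tasks would need about 12 MiB, which does not fit into the 8 MB L3 cache of this Haswell.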

I'll make a nicer table layout, try the first two tests again but with longer test duration for more precision, and then put the script to the usual place.

Update:
Code:
Summary for Intel(R) Xeon(R) CPU E3-1245 v3, test cutoff: 24 minutes
  candidate  |   credit   | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------+------------+---------------------------+-----------------------+-----------+-----------
  4651711#-1 |   7,306.04 | 1x4, none                 |   4:45:54 =   17154 s |     5.036 |     36,793
  4651711#-1 |   7,306.04 | 1x8, none                 |   4:56:17 =   17777 s |     4.860 |     35,507
On this Haswell, it seems marginally better to leave SMT unused.
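For anyone who wants to check the summary columns by hand: tasks/day and points/day follow from the task duration, the number of concurrent tasks, and the credit per task. This is a minimal sketch of that arithmetic, not the actual script, and the function names are illustrative. Results can differ from the tables in the last digit, presumably because the tables print rounded durations.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks: int, task_seconds: float) -> float:
    """Throughput when `concurrent_tasks` tasks each take `task_seconds`."""
    return concurrent_tasks * SECONDS_PER_DAY / task_seconds


def points_per_day(concurrent_tasks: int, task_seconds: float,
                   credit_per_task: float) -> float:
    """Daily credit for the given configuration."""
    return tasks_per_day(concurrent_tasks, task_seconds) * credit_per_task


CREDIT = 7306.04  # credit per task, from the summary tables

# Two rows of the 8-minute table: (label, concurrent tasks, task duration in s)
for label, tasks, seconds in (("1x4", 1, 19200), ("2x2", 2, 53333)):
    print(f"{label}: {tasks_per_day(tasks, seconds):.3f} tasks/day, "
          f"{points_per_day(tasks, seconds, CREDIT):,.0f} points/day")
```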

Another edit:
Before this, I also ran the workunit on this computer in BOINC. It took 21,550 seconds in "1x4, none" configuration, that is, quite a lot longer than the script estimated. However, the BOINC run was concurrent with quite a bit of other activity, such as bloated web browsers and several, although comparatively short, standalone PRST runs. The scripted standalone runs, in contrast, happened without anything else in parallel, except an X11 session with just a few idle shell terminals.
 
These runs are a few weeks old, but still might be useful for someone. WU size has risen quite a bit since I got these numbers.
Cores per WU | Time (s) | WU per day

7940hs
8 | 4224 | 20.454
4 | 6617 | 26.114
2 | 12014 | 28.766

13620h
6 | 5923 | 14.587
3 | 11234 | 15.38
2 | 15752 | 16.45

7730u
8 | 7884 | 10.95
4 | 13749 | 12.57
2 | 36332 | 8.916

5500u
6 | 11722 | 7.37
3 | 47467 | 3.64
1 | 137125 | 3.522

6700
4 | 9575 | 9.07
2 | 33965 | 3.08
1 | 80195 | 4.24
 
Code:
Summary for AMD EPYC 9554P 64-Core Processor, test cutoff: 25 minutes
  candidate  |   credit   | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------+------------+---------------------------+-----------------------+-----------+-----------
  4651711#-1 |   7,306.04 | 32x2, ascending           |   6:01:05 =   21665 s |       127 |    932,360
  4651711#-1 |   7,306.04 | 32x4, ascending           |   5:27:18 =   19638 s |       140 |  1,028,602
  4651711#-1 |   7,306.04 | 16x4, ascending           |   3:10:36 =   11436 s |       120 |    883,161
  4651711#-1 |   7,306.04 | 16x8, ascending           |   3:06:10 =   11170 s |       123 |    904,195
The PPT limit was set to 400 W. I was present during the first two tests and got this from the power meter "at the wall":
32x2: 930 kPPD / 470 W = 2.0 kPPD/W
32x4: 1,030 kPPD / 505 W = 2.0 kPPD/W
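The efficiency figures above are simply points per day divided by wall power. A short sketch of that calculation, using the points/day values from the summary table and the wall readings from this post; the function name is illustrative.

```python
def kppd_per_watt(points_per_day: float, wall_watts: float) -> float:
    """Efficiency in thousand points per day per watt at the wall."""
    return points_per_day / 1000 / wall_watts


# (configuration, points/day from the table, measured wall power in W)
for label, ppd, watts in (("32x2", 932_360, 470), ("32x4", 1_028_602, 505)):
    print(f"{label}: {kppd_per_watt(ppd, watts):.1f} kPPD/W")
```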
 