Info PrimeGrid Challenges 2024, sieve-free edition

TennesseeTony · Jun 19, 2024

Ken g6 said:
... I discovered with my last WU that I could have run them faster, too.

Go on, we are listening. 🙂

Markfw · Jun 19, 2024

Just in time for the heat! 90 here for the next several days, or more! I shut down 5 boxes; 8 are still running. But 2 of the 5 are the 9554s, so that's a lot of heat gone. And after I test my new 4090, I will shut down another 9554. Back to WCG, Rosetta and F@H! Waiting for the 9950X to build a replacement for the 7950X I gave my son.

StefanR5R · Jun 19, 2024

@Ken g6, thanks for the stats, even though I'm not in them… My fault though. ;-)

Congrats TeAm to another victory.

Orange Kid · Jun 19, 2024

Thanks for all the stats😎

Ken g6 · Jun 19, 2024

TennesseeTony said:
Go on, we are listening. 🙂

I tried two configurations with 7 threads at a time (2 or 4 tasks on 28 threads). I didn't try any with 14 threads at a time until that last one.

Kiska · Jun 19, 2024

Ken g6 said:
@Kiska beat me by half a WU. 🙁 I discovered with my last WU that I could have run them faster, too.

Ooops?

StefanR5R · Jul 21, 2024

Ken g6 said:
# | Date        | Time UTC | Project(s)          | Best on | Challenge                                   | Duration
5 | 8-13 August | 08:08:00 | Factorial Primorial | CPU?    | (Tentative) International Cat Day Challenge | 5 days

"Primorial Sieve on GFN Server" has reached the point after which the "Primorial Prime Search Project" can start on PrimeGrid. The application for the latter project has been installed now and can be selected by users in their project preferences, but workunits have not been loaded onto the server yet. They'll get around to this soon enough though, which means the challenge is going to happen as planned.

mmonnin03 · Jul 21, 2024

Tasks are available.

crashtech · Jul 21, 2024

It'd be nice to get a testing regimen in order for PRST, I don't think the currently available scripts will work without significant modification.

StefanR5R · Jul 21, 2024

My hope is that they will work after insignificant modifications. (Writing from my workplace, and it's not looking like it will get any better during the 17d13h remaining until the start of the challenge.)

Edit: the stderr output of several results from one of Tony's Ryzen 5000s says "Using Montgomery reduction FMA3 FFT length 2x288K". I guess this tells us that these tasks want 4.50 MBytes of cache each. ... And Ryzen 7000: "Using Montgomery reduction AVX-512 FFT length 2x288K".

TennesseeTony · Jul 27, 2024

Now: "Using Montgomery reduction FMA3 FFT length 2x384K"

crashtech · Jul 27, 2024

Did the PRST app change since it was released?

Markfw · Jul 27, 2024

I won't have my 9950X by then, and it's hot here, so I may be out of this unless the team needs me.

StefanR5R · Jul 28, 2024

crashtech said:
Did the PRST app change since it was released?

No, it's still at version 1.00 (mt), installed on July 19.

TennesseeTony said:
Now: "Using Montgomery reduction FMA3 FFT length 2x384K"

…times 8 (for presumably 8-byte-wide coefficients) makes a 6.0 MByte footprint, *if* "2x" means that two sets of 384K coefficients are accessed in parallel; otherwise it is 3.0 MB.
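The footprint arithmetic above can be written out as a tiny helper. This is a sketch under the same assumptions as the post: 8-byte (double-precision) coefficients, "K" meaning 1024 coefficients, and both sets live at once unless told otherwise. `fft_footprint_mib` is a hypothetical name, not something prst provides.

```python
def fft_footprint_mib(fft_len_k: int, sets: int = 2, bytes_per_coeff: int = 8) -> float:
    """Rough FFT working-set size in MiB.

    Assumptions (not documented prst behavior): each coefficient occupies
    `bytes_per_coeff` bytes, "K" means 1024 coefficients, and all `sets`
    coefficient arrays are live at the same time.
    """
    coefficients = sets * fft_len_k * 1024
    return coefficients * bytes_per_coeff / (1024 * 1024)


if __name__ == "__main__":
    for label, k in (("2x288K", 288), ("2x384K", 384)):
        print(f"{label}: {fft_footprint_mib(k):.2f} MiB if both sets are live, "
              f"{fft_footprint_mib(k, sets=1):.2f} MiB otherwise")
```

For 2x288K this reproduces the 4.50 MBytes estimate, and for 2x384K the 6.0 / 3.0 MB pair.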

crashtech · Jul 28, 2024

StefanR5R said:
No, it's still at version 1.00 (mt), installed on July 19...

Interesting, because I got some early results that don't match up with my retesting of the same configuration. Doing all testing with the same work unit would be better, or I need to see if my setup is inconsistent somehow.

StefanR5R · Jul 28, 2024

Did you check run times locally, or in the results list on the web site? I suspect the latter's bookkeeping might still sometimes be wrong with multithreaded workunits.

crashtech · Jul 28, 2024

StefanR5R said:
Did you check run times locally, or in the results list on the web site? I suspect the latter's bookkeeping might still sometimes be wrong with multithreaded workunits.

I didn't know they had a problem, thanks for the heads up!

Ken g6 · Aug 2, 2024

So, in case you hadn't noticed yet, PrimeGrid is having a sort of mini-challenge.

"More primorial sieving is required on GFN Server!"

They forgot to sieve a range and just now realized they might need it for the upcoming challenge. 😳

Also, in case you hadn't noticed yet, I'm doing WUs a little faster than before. I upgraded from a GTX 1060 to an RTX 4070. 😀

StefanR5R · Aug 3, 2024

crashtech said:
It'd be nice to get a testing regimen in order for PRST, I don't think the currently available scripts will work without significant modification.

StefanR5R said:
My hope is that they will work after insignificant modifications.

Hopes shattered… I started looking into this. Unfortunately, I found no way to extract a progress percentage or time remaining when prst runs in standalone mode. Therefore, one would have to complete an entire workunit in order to measure performance. That would make testing very time-consuming if done on the "main" tasks, or would require several input files if done on "verification" tasks. (And who knows how well verification task performance reflects main task performance. It's surely the same transform, but the IO and cryptography parts certainly play a bigger role in verification tasks.)

So far it looks like "fraction done" reporting is only available when prst runs in BOINC mode, which would require at least a minimal BOINC client derivative that sets up a shared memory interface to the task and whatnot.

One idea which I haven't started to work on yet: Current main tasks are configured to create 64 intermediate proof files. This can also be requested in standalone mode. Maybe a sensible approach would be to run a main task until a desired number of proof files was created, e.g. 4 proof files for an estimated 1/16th of the whole work.
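If the 64 proof files are written at evenly spaced points of the run and the cost per stretch of work stays constant (both assumptions), the time to reach the k-th proof file extrapolates linearly to a full-task estimate. A sketch only; `estimate_task_seconds` is a hypothetical helper, and the timing in the usage example is made up for illustration.

```python
PROOF_FILES_PER_TASK = 64  # current main tasks are configured for 64 proof files


def estimate_task_seconds(elapsed_s: float, proof_files_done: int,
                          proof_files_total: int = PROOF_FILES_PER_TASK) -> float:
    """Linearly extrapolate the full task duration from partial progress.

    Assumes proof files are written at evenly spaced points of the run and
    that every stretch of the test takes the same amount of time.
    """
    if not 0 < proof_files_done <= proof_files_total:
        raise ValueError("proof_files_done must be between 1 and proof_files_total")
    fraction_done = proof_files_done / proof_files_total
    return elapsed_s / fraction_done


if __name__ == "__main__":
    # Hypothetical measurement: 4 of 64 proof files (1/16 of the work) after 1,100 s.
    print(estimate_task_seconds(1100, 4))  # 17600.0
```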

Orange Kid · Aug 3, 2024

Ken g6 said:
So, in case you hadn't noticed yet, PrimeGrid is having a sort of mini-challenge.

"More primorial sieving is required on GFN Server!"

They forgot to sieve a range and just now realized they might need it for the upcoming challenge. 😳

Also, in case you hadn't noticed yet, I'm doing WUs a little faster than before. I upgraded from a GTX 1060 to an RTX 4070. 😀

Too hot right now, had to quit.
Cooling off next week and will be back on it. 🙂
Congrats on the upgrade.

Markfw · Aug 3, 2024

Orange Kid said:
Too hot right now, had to quit.
Cooling off next week and will be back on it. 🙂
Congrats on the upgrade.

Same here

StefanR5R · Aug 3, 2024

StefanR5R said:
I started looking into this. Unfortunately, I found no way to extract a progress percentage or time remaining when prst runs in standalone mode.

But then I did find a way after all. It may not be overly precise, therefore a sufficient test duration will be required, certainly quite a lot longer than needed with genefer for example. I am trying a modified script just now.

StefanR5R · Aug 3, 2024

First quick run completed. Oops, I forgot that I need to reformat the summary table:

Code:

Summary for Intel(R) Xeon(R) CPU E3-1245 v3, test cutoff: 8 minutes
n  |       b       |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
---+---------------+--------------+---------------------------+-----------------------+-----------+-----------
4651711#-1 |      7,306.04 |     7,306.04 | 1x4, none                 |   5:20:00 =   19200 s |     4.500 |     32,877
4651711#-1 |      7,306.04 |     7,306.04 | 1x8, none                 |   5:20:00 =   19200 s |     4.500 |     32,877
4651711#-1 |      7,306.04 |     7,306.04 | 2x2, none                 |  14:48:53 =   53333 s |     3.240 |     23,671
4651711#-1 |      7,306.04 |     7,306.04 | 2x4, none                 |  14:50:44 =   53444 s |     3.233 |     23,620

This is a 4-core/8-thread Haswell with 8 MB of inclusive level 3 cache. The candidate 4651711#-1 was tested with "Montgomery reduction FMA3 FFT length 2x384K".

Running two tasks at once causes throughput on this CPU to plummet. This indicates that the cache footprint of "2x384K" is indeed something like 6 MBytes.
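The reasoning here boils down to comparing the aggregate footprint of all concurrent tasks with the shared L3. A rough sketch, assuming the ~6 MB per-task estimate and ignoring everything else that competes for the cache (code, OS, other processes); `max_tasks_in_l3` is a hypothetical name.

```python
def max_tasks_in_l3(l3_mb: float, footprint_mb: float) -> int:
    """How many tasks' FFT data fit in a shared L3 cache at the same time.

    Rough estimate only: ignores code, stack, OS and other data in the cache.
    """
    return int(l3_mb // footprint_mb)


if __name__ == "__main__":
    l3_mb = 8.0          # E3-1245 v3: 8 MB inclusive L3
    footprint_mb = 6.0   # estimated footprint of one "2x384K" task
    for tasks in (1, 2):
        need = tasks * footprint_mb
        verdict = "fits" if need <= l3_mb else "spills"
        print(f"{tasks} task(s): {need:.0f} MB needed vs {l3_mb:.0f} MB L3 -> {verdict}")
    print("max tasks:", max_tasks_in_l3(l3_mb, footprint_mb))  # max tasks: 1
```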

I'll make a nicer table layout, try the first two tests again but with longer test duration for more precision, and then put the script to the usual place.

Update:

Code:

Summary for Intel(R) Xeon(R) CPU E3-1245 v3, test cutoff: 24 minutes
  candidate  |   credit   | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------+------------+---------------------------+-----------------------+-----------+-----------
  4651711#-1 |   7,306.04 | 1x4, none                 |   4:45:54 =   17154 s |     5.036 |     36,793
  4651711#-1 |   7,306.04 | 1x8, none                 |   4:56:17 =   17777 s |     4.860 |     35,507

On this Haswell, it seems marginally better to leave SMT unused.
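The tasks/day and points/day columns follow from the task duration, the number of concurrent tasks and the per-task credit. A minimal sketch of that relation (the benchmark script's own code may differ; `daily_throughput` is a hypothetical name):

```python
SECONDS_PER_DAY = 86_400


def daily_throughput(concurrent_tasks: int, task_seconds: float,
                     credit_per_task: float) -> tuple[float, float]:
    """Return (tasks per day, points per day) for one tasks x threads setup."""
    tasks_per_day = concurrent_tasks * SECONDS_PER_DAY / task_seconds
    return tasks_per_day, tasks_per_day * credit_per_task


if __name__ == "__main__":
    # "1x4, none" row from the first E3-1245 v3 table: 1 task, 19200 s, 7,306.04 credit
    tasks, points = daily_throughput(1, 19_200, 7_306.04)
    print(f"{tasks:.3f} tasks/day, {points:,.0f} points/day")  # 4.500 tasks/day, 32,877 points/day
```

Setups with several concurrent tasks scale the same way: "2x2" at 53,333 s per task gives about 3.24 tasks/day.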

Another edit:
Before this, I also ran the workunit on this computer in BOINC. It took 21,550 seconds in the "1x4, none" configuration, that is, quite a lot longer than the script estimated. However, the BOINC run was concurrent with quite a lot of other activity, like bloated web browsers and several (although comparatively short) standalone PRST runs. The scripted standalone runs, on the other hand, happened with nothing else in parallel except an X11 session with just a few shell terminals sitting there and not much else.

waffleironhead · Aug 3, 2024

These runs are a few weeks old, but they still might be useful for someone. WU size has risen quite a bit since I got these numbers.

CPU    | Cores per WU | Time (s) | WUs per day
-------+--------------+----------+------------
7940HS |            8 |     4224 | 20.454
       |            4 |     6617 | 26.114
       |            2 |    12014 | 28.766
13620H |            6 |     5923 | 14.587
       |            3 |    11234 | 15.38
       |            2 |    15752 | 16.45
7730U  |            8 |     7884 | 10.95
       |            4 |    13749 | 12.57
       |            2 |    36332 | 8.916
5500U  |            6 |    11722 | 7.37
       |            3 |    47467 | 3.64
       |            1 |   137125 | 3.522
6700   |            4 |     9575 | 9.07
       |            2 |    33965 | 3.08
       |            1 |    80195 | 4.24
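The WUs per day column appears to equal (total cores ÷ cores per WU) × 86,400 ÷ WU time, with total cores equal to the largest cores-per-WU value tested; that reading matches the 7940HS and 13620H rows, though not every row. Under that assumption, picking the best setup for a CPU is a one-liner. `wus_per_day` and `best_setup` are hypothetical helpers.

```python
SECONDS_PER_DAY = 86_400


def wus_per_day(total_cores: int, cores_per_wu: int, wu_seconds: float) -> float:
    """Daily WU throughput when total_cores are split into equal-sized tasks."""
    concurrent = total_cores // cores_per_wu
    return concurrent * SECONDS_PER_DAY / wu_seconds


def best_setup(total_cores: int, runs: list[tuple[int, float]]) -> tuple[int, float]:
    """Return the (cores_per_wu, wu_seconds) run with the highest WUs per day."""
    return max(runs, key=lambda run: wus_per_day(total_cores, *run))


if __name__ == "__main__":
    runs_7940hs = [(8, 4224), (4, 6617), (2, 12014)]  # 7940HS rows from the table
    for cores, secs in runs_7940hs:
        print(f"{cores} cores/WU: {wus_per_day(8, cores, secs):.3f} WUs/day")
    print("best:", best_setup(8, runs_7940hs))  # best: (2, 12014)
```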

StefanR5R · Aug 8, 2024

Code:

Summary for AMD EPYC 9554P 64-Core Processor, test cutoff: 25 minutes
  candidate  |   credit   | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------+------------+---------------------------+-----------------------+-----------+-----------
  4651711#-1 |   7,306.04 | 32x2, ascending           |   6:01:05 =   21665 s |       127 |    932,360
  4651711#-1 |   7,306.04 | 32x4, ascending           |   5:27:18 =   19638 s |       140 |  1,028,602
  4651711#-1 |   7,306.04 | 16x4, ascending           |   3:10:36 =   11436 s |       120 |    883,161
  4651711#-1 |   7,306.04 | 16x8, ascending           |   3:06:10 =   11170 s |       123 |    904,195

The PPT limit was set to 400 W. I was present during the first two tests and got this from the power meter "at the wall":
32x2: 930 kPPD / 470 W = 2.0 kPPD/W
32x4: 1,030 kPPD / 505 W = 2.0 kPPD/W
