PrimeGrid Challenges 2021


Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,219
3,799
75
Preliminary final stats:

Rank___Credits____Username
1______51823408___Skillz
6______15085628___Icecold
7______12691648___crashtech
10_____10045948___xii5ku
14_____7809236____Pokey
38_____2746478____Orange Kid
40_____2648787____biodoc
67_____1510885____Skivelitis2
82_____1136299____Lane42
109____814316_____emoga
122____620730_____Ken_g6
141____530332_____Fardringle
344____12977______geecee

Rank__Credits____Team
1_____107476230___TeAm AnandTech
2_____75324865___SETI.Germany
3_____70641441___Antarctic Crunchers
4_____39117076___Czech National Team

Wow, that's a lot of work done! :)

In the challenge of Skillz vs. the rest of the TeAm...the rest of the TeAm won, barely, with 55652822 points. ;)
 

Skillz

Senior member
Feb 14, 2014
911
929
136
In the challenge of Skillz vs. the rest of the TeAm...the rest of the TeAm won, barely, with 55652822 points. ;)

I shut everything down this morning. ;)
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
The next challenge, starting 2½ weeks from now, will be at PSP-LLR. There are some similarities between this project and ESP-LLR, which was the project of the June challenge:
  • Both projects are "conjecture" projects: their point is not just to find prime numbers, but to find primes (or rule them out) in order to prove or disprove a conjecture. (A Sierpiński number is an odd k for which k·2^n+1 is composite for every n ≥ 1.) In particular, "What is the smallest prime Sierpiński number?" is the big, burning question behind PrimeGrid's PSP-LLR project.
  • Both are CPU-only projects, using v9.01 (mt) application versions from December 2020 and January 2021, respectively.
  • Performance characteristics of the application in these two projects should be quite similar. PSP-LLR's search space is at bigger numbers than ESP-LLR's, though. Current average CPU time is ≈160 hours at ESP-LLR and ≈200 hours at PSP-LLR.
During the ESP-LLR challenge, one job could still be coerced into one of the 16 MByte L3 cache segments (CCXs) which Zen 2 based CPUs have. PSP-LLR jobs will want more cache, so it remains to be tested whether spreading one job across two CCXs works better. (Actually, ≈1½…2 CCXs per job already worked better with the smaller ESP-LLR jobs if you didn't enforce CPU affinity yourself on an operating system without strictly cache-aware task scheduling.) I guess owners of Zen 3 based CPUs will see quite a bit better performance at PSP-LLR than those of us who are still on Zen 2.
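Side note for anyone who wants to experiment with the tasks-per-CCX ratio from the BOINC side: the threads-per-task count can be influenced with an app_config.xml in the project directory. Treat the snippet below as a rough sketch only; the project directory path, the short application name llrPSP, and whether <avg_ncpus> alone makes the wrapper spawn that many LLR threads are assumptions on my part. PrimeGrid's web preferences also have a multi-threading setting, which is the officially supported knob.

# Rough sketch (paths and names are assumptions, see above): ask for 8-thread
# llrPSP tasks, i.e. one task per two Zen 2 CCXs, and run at most 4 at a time.
cat > /var/lib/boinc/projects/www.primegrid.com/app_config.xml <<'EOF'
<app_config>
  <app>
    <name>llrPSP</name>
    <max_concurrent>4</max_concurrent>
  </app>
  <app_version>
    <app_name>llrPSP</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
  </app_version>
</app_config>
EOF
# Make the running client re-read config files:
boinccmd --read_cc_config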
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Current search space of PSP-LLR (source), and corresponding FFT sizes on Haswell and similar CPUs:

k in progress    n in progress                FMA3 FFT length    FFT data size
79,309           24,192,254...24,453,182      2304K              18 MB
79,817           24,051,623...24,453,119      2304K              18 MB
152,267          23,898,819...24,453,123      2304K…2400K        18…18.75 MB
156,511          24,146,184...24,452,328      2400K              18.75 MB
222,113          24,072,701...24,453,213      2400K…2560K        18.75…20 MB
225,931          24,019,616...24,452,696      2400K…2560K        18.75…20 MB
237,019          24,166,006...24,451,666      2560K              20 MB
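(If I am reading these right, the FFT data size is simply the FFT length times 8 bytes per double-precision element: 2304K × 8 B = 18 MB, 2400K × 8 B = 18.75 MB, 2560K × 8 B = 20 MB. That is the working set which one task would like to keep in cache.)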

Edit:
From a quick look through the results lists of PrimeGrid's top hosts, I found this example result:

k = 222,113 | n = 24,416,597 | credit = 40,937.20 cobblestones
on Skylake-X: all-complex AVX-512 FFT length 2520K (FFT data size 19.7 MB)
on Haswell: all-complex FMA3 FFT length 2560K (FFT data size 20 MB)​

That is, this example workunit is located near the upper end of the current search space and is therefore well suited for synthesizing some test runs.

Edit 2:
I thought I had put in a link to the result. Well, here it is: result ID 1246553261
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
I haven't benchmarked anything yet; too busy with work and whatnot. I am extrapolating from my llrESP measurements for the time being. (Though llrPSP will be a more or less different kettle of fish on Zen 2 because of the cache constraints.) And since this is a 10-day challenge, since I much prefer to overtake rather than be overtaken, and since the weekend is just ahead with a little more spare time, I am making a slow start again.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
* Note to self: *
How to view the processor topology of a computer, as seen by the operating system:

NUMA topology:
grep . /sys/devices/system/node/node*/cpulist

HyperThreading/SMT:
grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list

Level-3 cache sharing:
grep . /sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list
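For a condensed view, one line per L3 segment (CCX) instead of one line per CPU, something like this should do (just a sketch built on the same sysfs files as above):

# Deduplicate the per-CPU lists, leaving one line per shared L3 segment:
sort -un /sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list

(hwloc's lstopo paints the whole picture at once, if it happens to be installed.)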
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,219
3,799
75
Day 1 stats:

Rank___Credits____Username
2______1693049____Skillz
4______1326828____crashtech
7______931862_____Icecold
20_____321069_____xii5ku
21_____317091_____biodoc
27_____241564_____Orange Kid
28_____237429_____emoga
112____37197______SlangNRox
130____320________Skivelitis2

Rank__Credits____Team
1_____5137176____Antarctic Crunchers
2_____5106413____TeAm AnandTech
3_____3867709____SETI.Germany
4_____2074086____Czech National Team
5_____1802790____AMD Users

It's shaping up to be a close race. But also realize that certain cloud machines may not report until the very end.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
How to view the processor topology of a computer, as seen by the operating system:
PS,
on the Linux systems which I tested specifically for this so far, multiple LLR application instances were automatically scheduled in a NUMA-aware and HT/SMT-aware fashion, but not in a cache-segment-aware one.

The latter issue can be overcome either by enforcing processor affinity, e.g. with the taskset command (tested), or, on EPYC, by switching to multiple NUMA nodes per socket in the BIOS (speaking theoretically; I have not tested this yet).

Edit: I don't see it as a fault that the Linux process scheduler doesn't apply one or another cache-aware scheduling policy; after all, which policy is best depends on the application. In the case of LLR, threads of the same process should be scheduled on physical cores which share the same last-level cache (but use of "thread siblings" should be avoided). Other applications with less intensive data sharing might benefit from a different policy.
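To make the taskset part concrete, here is roughly the kind of thing I mean. It is only a sketch; the CPU lists are made-up examples for 4-core CCXs with SMT siblings (take the real ones from the shared_cpu_list files in my note-to-self above), and the "llr" process-name pattern is an assumption too (check with ps what the PrimeGrid wrapper actually launches):

# Sketch: pin each running LLR worker to its own CCX, round-robin.
ccx_list=("0-3,64-67" "4-7,68-71" "8-11,72-75" "12-15,76-79")   # example CCXs
i=0
for pid in $(pgrep -f llr); do
    # -c takes a CPU list, -a applies the mask to all threads of the process
    taskset -acp "${ccx_list[i % ${#ccx_list[@]}]}" "$pid"
    i=$((i + 1))
done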

20_____321069_____xii5ku
Uhm what, I'm in the top-20 while using only one computer?

(But as you said, these first 24 hours aren't very indicative of the race to come.)
 

Skillz

Senior member
Feb 14, 2014
911
929
136
@Markfw

Just letting you know we could use your help. I know you offered it during the last PrimeGrid challenge, but we won that one easily and you didn't need to join us.

For this one, on the other hand, we could probably use some help.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,482
14,434
136
@Markfw

Just letting you know we could use your help. I know you offered it during the last PrimeGrid challenge, but we won that one easily and you didn't need to join us.

For this one, on the other hand, we could probably use some help.
It's over 100 degrees here! I can donate my 24-core EPYC (48 threads), but it is HOT here! How long does this run? The heat wave should be over maybe Sunday, but for sure Monday.

Edit: It's running. It configured itself to 3 tasks of 16 CPUs each.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,482
14,434
136
It's doing 17 or Bust units. Is that correct for the current challenge?
 

Icecold

Golden Member
Nov 15, 2004
1,090
1,008
146
Is this correct now?
PrimeGrid 9.01 Prime Sierpinski Problem (LLR) (mt) llrPSP_380377879_1 00:05:29 (01:15:28) 1373.20 0.270 01d,09:43:45 9/3/2021 6:02:26 PM 16C Running EPYC 7401P
Yes, that is the correct project / task name.

Glad to have you join in on it! Hopefully the weather cooperates and you can fire up some more machines :)
 

crashtech

Lifer
Jan 4, 2013
10,521
2,111
146
Yeah, thanks @Markfw! I have a feeling the regulars over at PrimeGrid are wondering: what is this burr in the saddle, TeAm AnandTech? :)
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
During the ESP-LLR challenge, one job could still be coerced into one of the 16 MByte L3 cache segments (CCXs) which Zen 2 based CPUs have. PSP-LLR jobs will want more cache, so it remains to be tested whether spreading one job across two CCXs works better. (Actually, ≈1½…2 CCXs per job already worked better with the smaller ESP-LLR jobs if you didn't enforce CPU affinity yourself on an operating system without strictly cache-aware task scheduling.)
The gist of the llrESP tests in June and the llrPSP tests today on an EPYC 7452 (only the tests in which I enforced processor affinity; power efficiency relates to the complete computer's power draw "at the wall"):
  • llrESP
    • 1:1 ratio of tasks:CCXs = best throughput and best power efficiency for llrESP
    • 1:2 ratio of tasks:CCXs = -9% throughput, -7% efficiency
  • llrPSP
    • 1:2 ratio of tasks:CCXs = best throughput and best power efficiency for llrPSP
    • 1:1 ratio of tasks:CCXs = -8% throughput, -17% efficiency
The intensive RAM I/O which goes on when llrPSP exhausts the caches in the 1:1 configuration causes a notably higher power draw. This means that the RAM I/O for inter-thread communication across CCXs (llrPSP 1:2 case) is not as costly as the RAM I/O from cache deficit (llrPSP 1:1 case).
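(In concrete numbers for this EPYC 7452, assuming I have its topology right, i.e. 8 CCXs with 4 cores and 16 MB of L3 each: 1:1 means 8 tasks of 4 threads, and 1:2 means 4 tasks of 8 threads spanning two CCXs each.)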

— Edit: —
Nonetheless, communication across CCXs is very costly with the LLR application:
  • llrESP
    • 1:1 ratio of tasks:CCXs, tasks pinned to the CCXs = best throughput and best power efficiency for llrESP
    • 1:1 ratio of tasks:CCXs, tasks scheduled randomly by the Linux kernel = -30% throughput, -32% efficiency

I guess owners of Zen 3 based CPUs will see quite a bit better performance at PSP-LLR than those of us who are still on Zen 2.
I heard somewhere that the 5950X has about 1.3 times the llrPSP throughput of the 3950X.

Edit: This seems to align well with the above-mentioned 30% cost of inter-CCX comms. Though in the case of my llrESP tests, those 30% come from random scheduling vs. task pinning, whereas the 30% performance uplift of Zen 3 over Zen 2 in llrPSP is mostly from cache exhaustion on Zen 2. From what I understood, the 5950X/3950X figures which I saw were without task pinning.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,482
14,434
136
The stats say I have no points for today. Is that due to my config? (16 CPUs per task, 12 hours remaining on the first batch of 3.) Should I change that?