Info PrimeGrid Challenges 2024, sieve-free edition

Page 12

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,476
15,588
136
OK, I only have one 9554 running. It did one unit in 17 hours, but the next 8 have been running for 20 hours and are not done yet. These are LONG-running units! Almost 24 hours, it looks like (8C config). Should I try 16C tasks? Only a little over 2 days to figure this out.

 

Markfw

Moderator Emeritus, Elite Member
Well, crap. Somebody showed me how to hide them; now I can't find it again to unhide!

Help !
 

Markfw

Moderator Emeritus, Elite Member
@emoga Thanks! Done (I think), let me know!


Edit: Not the logical place to put it when you have multiple profiles; you don't expect one to be different from all the rest.
 

StefanR5R

Elite Member
Dec 10, 2016
6,125
9,254
136
Markfw said:
"OK, I only have one 9554 running, but it did one unit in 17 hours, but the next 8 have been running for 20 hours and not done yet. These are LONG running units ! almost 24 hours it looks like ! (8C config) Should I try 16 C tasks ? Only a little over 2 days to figure this out."
The "edit PrimeGrid preferences" web page says:

Recent average CPU time: 215:48:55
FFT sizes: 3200K to 3840K (uses up to 30720K cache per task)
This project has a 35% long job credit bonus and a 10% conjecture credit bonus.

Recent average 216 CPU-hours / 8 CPUs = 27 hours --> As expected, your 9554 ES completes them at above-average speed.

Up to 30 MBytes "cache per task" means that each of the eight 8c/16t CCDs of the 9554 can run 1 task at once, as each CCD = CCX has got 32 MB level 3 cache. It is crucial for performance that each task gets individually bound to the logical CPUs of one CCD exclusively for this task. Otherwise, a lot of time- and energy-wasting cross-CCX and memory traffic would occur.
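The per-CCD pinning described above can be done with `taskset`, Process Lasso, or a small script. Below is a minimal Python sketch using `os.sched_setaffinity` (Linux only). The CPU numbering is an assumption — first 64 logical CPUs = physical cores, 64–127 = their SMT siblings, 8 consecutive cores per CCD — verify with `lscpu -e` on your own host before relying on it; the PID is hypothetical.

```python
import os  # os.sched_setaffinity is Linux-only

CORES_PER_CCD = 8    # Zen 4 Genoa: 8 cores per CCD (= one CCX)
N_PHYS = 64          # physical cores on the 9554

def ccd_cpus(ccd, smt=False):
    """Set of logical CPUs belonging to one CCD (assumed enumeration)."""
    lower = set(range(ccd * CORES_PER_CCD, (ccd + 1) * CORES_PER_CCD))
    if smt:
        lower |= {cpu + N_PHYS for cpu in lower}
    return lower

# Pin a running LLR2 task (hypothetical PID 12345) to CCD 3,
# lower SMT threads only -- one task per CCX, as recommended above:
# os.sched_setaffinity(12345, ccd_cpus(3))
```

BOINC itself does not set per-CCD affinities, which is why an external tool or script like this is needed at all.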

25...30 MBytes cache per task is a hint that the current workunit sizes are quite variable. Therefore, to answer the question of how many logical CPUs of each CCD should be used (8 or 16 threads per task), one would either have to complete several workunits and note both the task duration _and_ the credit per task, then evaluate points per day for 8 vs. 16 threads. Or one could test "offline" (outside BOINC) with a single fixed workunit, which allows for much quicker testing based on completion rate rather than on total duration and credits. I'll take a look at home later today whether I have notes from similar tests, or perhaps run some offline tests on my 9554 myself.

An alternative to 8 tasks at once, 1 CCX/task would be to run only 4 tasks at once, 2 CCXs/task. Then you would obviously spend twice as many cores per task and reduce the task durations that way. But this would incur the cost of data traffic across two CCXs for each task. Therefore, and due to increased program overhead, this would not yield twice the speed per task. In other words, this would sacrifice throughput. It could be a viable option though if some CCXs would otherwise be idle during the last (maybe) 14+ hours (or something like that) of the challenge.
 

StefanR5R

Elite Member
The projects of the previous three challenges used the PRST program, but AFAICT llrPSP is still on LLR2. Hopefully the PrimeGrid admins don't do a last-minute stunt and update the application before the challenge... :-)
 

StefanR5R

Elite Member
I couldn't find a validated PSP result from a 3840K workunit. So I took a 3456K workunit from the results table of Pavel Atnashev's computer cluster instead. (That's 27.0 MB cache footprint of FFT coefficients.) It's the WU with the largest credit on this host when I looked about two hours ago. I ran this WU for 20 minutes per test and extrapolated total duration from the progress made until then.
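As a sanity check on the quoted cache figures: the footprint scales as roughly 8 bytes (one double) per FFT element, which reproduces both the "30720K cache per task" maximum from the preferences page and the 27.0 MB quoted here. This is a back-of-the-envelope sketch, not an exact accounting of the program's memory use.

```python
def fft_cache_kb(fft_size_k):
    """Cache footprint in KB, assuming one 8-byte double per FFT element."""
    return fft_size_k * 8

assert fft_cache_kb(3840) == 30720        # "uses up to 30720K cache per task"
assert fft_cache_kb(3456) / 1024 == 27.0  # the 3456K workunit used here
```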

workunit: 222113*2^34206293+1 for 82,165.65 credits
software: SuSE Linux, display-manager shut down, sllr2_1.3.0_linux64_220821
hardware: EPYC 9554P (Zen 4 Genoa 64c/128t), cTDP = PPT = 400 W, 12 channels of DDR5-4800

test | affinity | avg. duration | avg. tasks/day | avg. PPD | avg. core clock | host power | power efficiency
-----|----------|---------------|----------------|----------|-----------------|------------|-----------------
8×8 | none (random scheduling by Linux) | 35:49:20 (128960 s) | 5.4 | 0.440 M | 3.60 GHz | 370 W | 1.19 kPPD/W
8×8 | 1 task : 1 CCX, only lower SMT threads | 12:52:37 (46357 s) | 14.9 | 1.225 M | 3.34 GHz | 485 W | 2.53 kPPD/W
8×16 | 1 task : 1 CCX, all SMT threads | 13:02:32 (46952 s) | 14.7 | 1.210 M | 3.05 GHz | 500 W | 2.42 kPPD/W
4×16 | 1 task : 2 CCXs, only lower SMT threads | 8:35:14 (30914 s) | 11.1 | 0.919 M | 3.60 GHz | 480 W | 1.91 kPPD/W
4×32 | 1 task : 2 CCXs, all SMT threads | 8:39:42 (31182 s) | 11.0 | 0.911 M | 3.18 GHz | 490 W | 1.86 kPPD/W

Conclusions for this particular host:
  • CPU affinity is a must.¹ Surprise! ;-)
  • SMT doesn't help.
  • As I said, running tasks on 2 CCXs instead of on 1 CCX reduces throughput quite a bit, but may be useful towards the very end of the challenge if you don't mind doing some micro-management.
  • If you see one configuration cause the CPU running at higher clock than another one, then the reason may be that the CPU is simply waiting more often for memory accesses in the former config, that is, it's just twiddling its thumbs very fast instead of calculating fast.
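For reference, the throughput columns of the table follow directly from the number of concurrent tasks, the measured duration, and the credit of the quoted workunit. A small sketch reproducing the 8×8-with-affinity row:

```python
CREDIT = 82165.65  # credits of the benchmarked 3456K workunit

def throughput(n_parallel, seconds_per_task, host_watts):
    """Derive tasks/day, points/day, and points/day/W from a measured duration."""
    tasks_per_day = n_parallel * 86400 / seconds_per_task
    ppd = tasks_per_day * CREDIT
    return tasks_per_day, ppd, ppd / host_watts

# 8x8 with affinity: 8 concurrent tasks, 46357 s each, 485 W host power
tpd, ppd, eff = throughput(8, 46357, 485)
# tpd ≈ 14.9 tasks/day, ppd ≈ 1.225 M, eff ≈ 2.53 kPPD/W
```

Extrapolating from a 20-minute run assumes the task progresses at a constant rate, which holds well for LLR-style fixed-FFT workloads.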

@Markfw, since you are getting way more than 20 hours, either CPU affinities are not applied (or not correctly applied), or your ES has a rather restrictive frequency cap. I don't know what PPT limit your ES has, maybe it is 360 W like the production 9554's default, which would still be 90% of my increased PPT limit, so that's perhaps not the reason for your worse task durations.

Edit: Also, right now you have 13 valid proof tasks on your host. Their points per second are 1.57, 0.53, 1.18, 0.95, 1.40, 0.83, 1.50, 0.75, 0.72, 0.72, 1.54, 0.85, 0.84. That's very high variability, which hints at missing or suboptimal CPU affinities.
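To put a number on that variability, here is the coefficient of variation of the quoted figures, computed with Python's statistics module:

```python
import statistics

# points-per-second of the 13 valid proof tasks quoted above
pps = [1.57, 0.53, 1.18, 0.95, 1.40, 0.83, 1.50, 0.75,
       0.72, 0.72, 1.54, 0.85, 0.84]
mean = statistics.mean(pps)
cv = statistics.pstdev(pps) / mean  # coefficient of variation
# cv ≈ 0.34, i.e. ~34% spread around the mean
```

A host with correctly pinned tasks should show far more uniform points per second across tasks of similar FFT size.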

________
¹) On a Windows host, the performance drop from 8×8 with affinity to 8×8 without affinity may be even worse than on Linux (due to a dumber scheduler and more stuff going on in the background).
 

Markfw

Moderator Emeritus, Elite Member
So I turned it on, and it will stay on for the competition. But I will have "no new tasks" set shortly, to be ready for the competition. So ignore all my previous run times, and thanks a lot @StefanR5R , @emoga
 

Markfw

Moderator Emeritus, Elite Member
Well, stupid me. Glad it's back on: now 14.5 hours instead of 27, no other changes.
 

Markfw

Moderator Emeritus, Elite Member
So, after Lasso and a fresh set of tasks, is 17 hours looking good? I am ready for now: seven 7950Xs, a 9950X, four 64-core Genoas, and one 96-core Genoa, all ready for 11 PM tonight, including Lasso running and configured using llr2_ for the pattern. In a couple of days, I should have 128 cores of Turin going!
 

Markfw

Moderator Emeritus, Elite Member
3rd place, and only the 9950X has dumped?? By noon it will be a very different picture.

Looks like 21 more of the big units by noon PST,

and 40 more by 13 hours after that.
 

Markfw

Moderator Emeritus, Elite Member
There will be computers in this race which take more than a day for these tasks.
That's why I will need a big head start. Made number one; for how long, I don't know. Turin motherboard is on the truck, due in the next 3 hours. That may save my bacon.

AC is up to 80F and it's 50F outside! As much as I can, the house is open (it's a big windy storm outside).

Edit: Actually it's due in the next hour. Well, the 7742 with 8C units will take almost 2 days (about). Once the Turin is up and has finished one set of tasks, I will shut it off, so I know what you mean about slow computers. And yes, that is with pinning.

PrimeGrid 9.03 Prime Sierpinski Problem (LLR) (mt) llrPSP_590816579_0 11:54:00 (03d,20:02:01) 96.67 21.710 01d,18:52:22 20d,12:05:49 8C Running 7742 dual Titan V

 

Markfw

Moderator Emeritus, Elite Member
Damn FedEx! First it's supposed to be between 9:30 and 11:30. Well, as soon as they miss that, now it's "before 10 PM". At least the case is here and the PSU is in it already.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,438
4,270
75
Day 1 stats:

Rank___Credits____Username
1______6363622____markfw
3______4351398____w a h
5______2766489____Icecold
9______1393430____cellarnoise2
16_____805537_____crashtech
37_____343604_____ChelseaOilman
60_____165784_____mmonnin
66_____93579______Ken_g6

Rank__Credits____Team
1_____16283446___TeAm AnandTech
2_____4762054____SETI.Germany
3_____4496571____Czech National Team
4_____4484338____Romania
"Our team is now TWICE the points of 2nd place!"
Not quite four times now! If Mark was a team he'd be first (or second to the rest of us?); if @emoga was a team he'd be a strong 5th.