PrimeGrid Challenges 2022


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
@Markfw, my dual-7452 with PPT and TDP at 180 W, "ACPI SRAT L3 Cache As NUMA Domain" = "Enabled", runs 16 tasks at once per computer (= 8 tasks at once per socket) with 4 threads per task (i.e. I only use half of the logical CPUs to save a few watts). The average durations of the latest >30 completed tasks are:

Main tasks, 3*2^n+1 form, >9,100 credits — 7.4 h
Main tasks, 3*2^n-1 form, >8,700 credits — 7.5 h

Edit,
according to my scripted tests prior to the challenge, the exact same setup except without the NUMA tweak would result in 28% longer runtimes, IOW throughput would be reduced to 78%.

Edit 2,
for reference, a 7452 has got 8 CCXs, each one with 4c/8t and 16 MB L3$.
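
As a quick check of the arithmetic in the first edit: if every task takes 28% longer at the same task count, throughput scales with the inverse of the runtime. A minimal Python sketch (the function name is made up for illustration):

```python
def relative_throughput(runtime_increase):
    """Throughput relative to the baseline when each task takes
    (1 + runtime_increase) times as long at the same task count."""
    return 1.0 / (1.0 + runtime_increase)


# 28% longer runtimes -> 1 / 1.28 = 0.78125
print(f"{relative_throughput(0.28):.0%}")  # prints 78%
```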
My 7452 is doing 4.5 hours/task ! I will look at messing with the 7742 later based on the above.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
OK, if I forget the older units, the 2 new units will finish in 6 hours on the 7742. And that's with 8 threads (x8 each) running !
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
OK, if I forget the older units, the 2 new units will finish in 6 hours on the 7742. And that's with 8 threads (x8 each) running !
Forget that.... New ETA is 17.5 hours.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
I took measurements on a dual-7452. These measurements will be representative for a single-7452 as well. (The dual-7452 spends a part of the energy budget for the Infinity Fabric link between the two sockets, but it should not be much because inter-socket traffic is low in this sort of workload. Also, I increased PPT and TDP to 7452's possible maximum of 180 W in the BIOS, default is 155 W.)

I performed the measurements with a fixed workunit, in a scripted testbed. That is, the workloads consisted of multiple tasks from the same workunit running in parallel, and the same workunit being re-used in all test scenarios with different parallel task count and thread count. The consequence is that these tests are very precise, repeatable, and quick.

In contrast, observations of random workunits coming from PrimeGrid are not as conclusive, especially because there are two types of "main tasks" which have not only different durations but also different PPD, as I mentioned in #153.

Which particular tests I ran and what the results were is posted in a private section of the teamanandtech.org forum. However, I spilled the beans in #153 already.

One 7452 has got 8 CCXs, each CCX made up of 4c/8t and 16 MB L3$. Here is how I am configuring them for the challenge:
  • In the BIOS, "Advanced" --> "ACPI Settings", I am switching "ACPI SRAT L3 Cache As NUMA Domain" from "Auto" to "Enabled". (That's how it is labeled in Supermicro's AMI BIOS.)
    The effect of this is that the firmware will present each CCX as a NUMA node to the operating system. A NUMA aware OS like Linux will attempt to keep multithreaded processes within a NUMA node. Declaring a CCX a NUMA node is a bit of a hack as a replacement for cache-aware scheduling. The latter is more problematic than NUMA-aware scheduling, and I am not sure whether cache topology plays a role at all in current Linux scheduler decisions. (There is a related scheduler change in Linux 5.16 which I mentioned elsewhere, but this shouldn't affect all-core loads.)
    In the output of the "lscpu -e" command, the NODE column will show the effect of this change. A dual-7452 system will then have the nodes 0…15 instead of nodes 0…1.

  • For the first 4+ days of the 5 challenge days, I will run 8 tasks in parallel on each 7452 (that is, 16 tasks at once on a dual-7452 computer) = a 1:1 ratio of # tasks to # CCXs.
    • I choose to let each task have 4 program threads.
    • However, the CPU could of course give 8 hardware threads to each task, since there are 8 threads in each CCX. And indeed, using all threads would increase performance by a small percentage. It would also increase power draw, though, and by more than the performance gain.
    • So, since EPYC Rome isn't a power hog in the first place, you might prefer to go with 8 threads per task and spend that little bit of extra electric energy.

  • During the last day, if I find the time, I may switch to fewer tasks at once combined with more program threads per task. This will sacrifice throughput but decrease run times. That way, the last hours of the challenge will be better filled out.
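
To check the effect of the BIOS switch described above, the NODE column can be summarized per node. This is only a sketch in Python, assuming lscpu from util-linux is installed and that "lscpu -e=CPU,NODE" prints a header row followed by two integer columns; the function names are made up for illustration.

```python
import subprocess
from collections import defaultdict


def cpus_by_node(lscpu_text):
    """Group logical CPU numbers by NUMA node.

    Expects text in the shape of `lscpu -e=CPU,NODE` output:
    a header row, then one "CPU NODE" pair of integers per line.
    """
    nodes = defaultdict(list)
    for line in lscpu_text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not two integers.
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue
        cpu, node = (int(f) for f in fields)
        nodes[node].append(cpu)
    return dict(nodes)


def read_lscpu():
    """Run lscpu and return its CPU/NODE table as text (Linux only)."""
    result = subprocess.run(
        ["lscpu", "-e=CPU,NODE"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling cpus_by_node(read_lscpu()) on a dual-7452 with the option enabled should then give 16 nodes with 8 logical CPUs each, going by the CCX layout described above (4c/8t per CCX, 8 CCXs per socket).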
________
Edit: There is one thing though which I haven't considered yet at all. It's whether @cellarnoise is to be classified as impish or as admirable.
I have an ASRock Rack and I cannot find anything close to that in the BIOS. I guess I have to live with 17.5 hours.....
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
@Markfw, my dual-7452 with PPT and TDP at 180 W, "ACPI SRAT L3 Cache As NUMA Domain" = "Enabled", runs 16 tasks at once per computer (= 8 tasks at once per socket) with 4 threads per task (i.e. I only use half of the logical CPUs to save a few watts). The average durations of the latest >30 completed tasks are:
That's with 8 tasks at once.

Main tasks, 3*2^n+1 form, >9,100 credits — 7.4 h — 26.0 tasks/day/socket
Main tasks, 3*2^n-1 form, >8,700 credits — 7.5 h — 25.7 tasks/day/socket


My 7452 is doing 4.5 hours/task !
The stderr.txt says this is with 16-threaded tasks. If boinc had 100% of the CPUs, that would be 4 tasks at once. But in #172 you said it's only 3 tasks at once.

From http://www.primegrid.com/results.php?hostid=1092034:
Main tasks, 3*2^n+1 form, >9,070 credits — 4.5 h – 16.0 tasks/day/socket
Main tasks, 3*2^n-1 form, >8,600 credits — 4.8 h – 15.0 tasks/day/socket

(Tasks per day per socket are calculated for 3 tasks at once. Would be 21.3 t/d/s and 20.0 t/d/s with 4 tasks at once; or actually still fewer because there would be more cache contention, more memory controller contention, and higher power draw with 4 at once compared to 3 at once.)
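
For anyone who wants to redo these numbers with their own durations, here is the calculation as a small Python sketch. It assumes every task takes the same time and that the duration does not change with the number of tasks at once, which, as said above, is optimistic for the 4-at-once case. The function name is made up for illustration.

```python
def tasks_per_day(hours_per_task, tasks_at_once):
    """Completed tasks per day for one socket, assuming every task takes
    hours_per_task while tasks_at_once tasks run concurrently."""
    return 24.0 / hours_per_task * tasks_at_once


for label, hours in (("3*2^n+1", 4.5), ("3*2^n-1", 4.8)):
    for at_once in (3, 4):
        rate = tasks_per_day(hours, at_once)
        print(f"{label}: {at_once} at once -> {rate:.1f} tasks/day/socket")
```

This prints 16.0 and 21.3 for the +1 form, and 15.0 and 20.0 for the -1 form.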

BTW, it seems you still had quite a few tasks in your work buffer which were downloaded before the challenge started. It would have been better to abort all of these at the beginning of the challenge, then download new work. The results of work downloaded before the challenge do not count.


(ACPI SRAT L3 Cache As NUMA Domain)
I have an ASRock Rack and I cannot find anything close to that in the BIOS. I guess I have to live with 17.5 hours.....
Supermicro has this under Advanced/ ACPI Settings, but it could be anywhere, or indeed not present at all. Related to this option, AMD's EPYC Rome workload tuning guide describes another ACPI setting in which the user could modify the "NUMA distance" of the L3$ domains. Supermicro's BIOS does not expose such a setting in the BIOS setup menu. So it's reasonable to assume that other BIOS vendors expose even fewer of AMD's tuning options.

A remedy would then be to pin tasks to dedicated CPUs explicitly. But setting this up is a bit involved. (Actually, "manual" task pinning performs even better than the NUMA trick.)
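
To give a rough idea of what explicit pinning could look like on Linux, here is a Python sketch. It is not the setup used for the results above. The mapping of CCXs to logical CPU numbers has to come from the actual machine, and finding the PIDs of the running tasks, as well as re-pinning whenever a new task starts, is left out. All names here are hypothetical.

```python
import os


def assign_tasks_to_groups(pids, cpu_groups):
    """Round-robin: map each task PID to the CPU set of one group.

    cpu_groups: {group_id: [logical CPU numbers]}, e.g. one group per CCX.
    The CPU numbers are an assumption to be filled in per machine.
    """
    group_ids = sorted(cpu_groups)
    return {
        pid: set(cpu_groups[group_ids[i % len(group_ids)]])
        for i, pid in enumerate(pids)
    }


def pin_process(pid, cpus):
    """Pin every existing thread of a process to the given CPUs (Linux only).

    sched_setaffinity acts on a single thread ID, so walk /proc/<pid>/task
    to cover all threads of a multithreaded task.
    """
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)


def pin_all(assignment):
    """Apply an assignment produced by assign_tasks_to_groups."""
    for pid, cpus in assignment.items():
        pin_process(pid, cpus)
```

With 8 tasks and 8 CCX groups per socket, each task would get one CCX to itself. Pinning processes owned by another user (such as the boinc user) needs root privileges.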
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
That's with 8 tasks at once.

The stderr.txt says this is with 16-threaded tasks. Assuming you gave boinc 100% of the CPUs, that would be 4 tasks at once.

From http://www.primegrid.com/results.php?hostid=1092034:
Main tasks, 3*2^n+1 form, >9,070 credits — 4.5 h
Main tasks, 3*2^n-1 form, >8,600 credits — 4.8 h

BTW, it seems you still had quite a few tasks in your work buffer which were downloaded before the challenge started. It would have been better to abort all of these at the beginning of the challenge, then download new work. The results of work downloaded before the challenge do not count.
The only easy way I have found to do that is to remove the project and re-add it. On 9 computers, what a pain. The 7452 is saving 2% for the video card. The 8 tasks at once is what happened after I added your config and rebooted. There is still 1/2 the threads available.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
The 8 tasks at once is what happened after I added your config and rebooted.
This is sufficient in such cases:
sudo service boinc-client restart

(Or also sufficient:
Switch "[ ] Leave non-GPU tasks in memory while suspended" off.
Modify app_config.xml.
Use "Options/ Read config files" in boincmgr.
Use "Activity/ Suspend" in boincmgr.
Use "Activity/ Run based on preferences" in boincmgr.
But although this does indeed switch the existing tasks to the new app_config parameters, boincmgr will show a wrong CPU count of the existing tasks then. Hence, client restart is probably preferable.)


The 7452 is saving 2% for the video card.
The computers which I have in this challenge are all headless. Except for one Xeon with 3 GPUs, but I left these GPUs idle. It's rare that I use CPUs and GPUs together; it's just too much heat if I start up everything.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
As a reminder, for points in this challenge, the 321-LLR subproject needs to be selected (CPUs only, with multithreading support), work must be downloaded after Monday March 21, 03:21 UTC = today, Sunday March 20, 23:21 EDT, 20:21 PDT, [...]
If you have work in the buffer which was downloaded earlier than that, just abort this work and update the project.
It's too late to bring this up now of course, but it may help in future PrimeGrid Challenges:

One way to do this is with BoincTasks.
  • In "Extras" -> "BoincTasks settings" -> "Tasks", enable "Show column" -> "Received".
  • Go to the Tasks tab.
  • Double-click the respective task rows to expand them.
  • Based on what the "Received" column is showing, select tasks which are too old.
  • Right mouse button -> "Abort". Confirm.
  • Right mouse button -> "Project Update".
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,242
3,829
75
Day 2 stats:

Rank___Credits____Username
4______4135573____xii5ku
7______3027608____crashtech
11_____1991138____Galumpkis
18_____1518768____parsnip soup in a clay
22_____1260548____emoga
28_____952768_____mmonnin
60_____450173_____Orange Kid
64_____431483_____cellarnoise2-TAAT
72_____375134_____Fardringle
87_____285858_____waffleironhead
99_____249905_____markfw
126____169334_____Skivelitis2
205____54815______Ken_g6

Rank__Credits____Team
1_____14903111___TeAm AnandTech
2_____13600501___Ukraine
3_____12740372___Antarctic Crunchers
4_____12527372___Czech National Team

Wow, we've jumped into the lead! :eek:
 

mmonnin03

Senior member
Nov 7, 2006
214
213
116
TAAT 1st by 5mil over Ukraine.
The big jump for TAAT today is Galumpkis with 3.1m today. 2.5m more than yesterday. parsnip also with almost 2.5m today and nearly nothing yesterday.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Had some issues with the 12700F. So I shut it down. Going to blow everything away tomorrow and load Win 10 and Linux again. It was at 312 watts after a video card change, and when I changed back it was still there. So it's down until further notice.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Linux Mint 20.3 has a nice new option: "reinstall mint 20.3" !! And it's going now. 91 watts while installing.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
And it's back up, with all cores. Let's see what happens to the time with e-cores enabled. It was 2:20

It's 2:03 now, but the wattage is 313 ! And it can't be AVX-512, since the e-cores are enabled.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
Aren't there BIOS options for short term and long term power limit? 300+ W are insane, and I bet that the chip is only marginally faster than if it was operating at, let's say, 120 W (which would still be borderline obscene for an 8+4 core desktop CPU).

Apparently, BIOS vendors of DIY mainboards tend to make some very silly choices for their default parameters.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Aren't there BIOS options for short term and long term power limit? 300+ W are insane, and I bet that the chip is only marginally faster than if it was operating at, let's say, 120 W (which would still be borderline obscene for an 8+4 core desktop CPU).

Apparently, BIOS vendors of DIY mainboards tend to make some very silly choices for their default parameters.
OK, I could not find PL1 and PL2 limits. I found something about turbo performance and disabled it. Then another setting mentioning turbo. I disabled it. I changed video cards, as I could not even see the power or menu icons, to a very tiny video card (no external power cables at all). Now it's running P cores only and 78 watts !!!! Question is, now how fast will it run. Current ETA says over 6 hours. Maybe I will reenable those turbo buttons.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
OK, after the 3rd reinstall, it's running fast, like 2:03 a unit, but at 270 watts, and now with just a little video card that only has a heatsink and no external power, so that's not using any power. But e-cores are enabled. I may disable those and see what happens to the power usage. I can't find what would be PL1 and PL2 in the BIOS.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
I disabled the e-cores again.... 252 watts, so 18 for the e-cores, that's low. But why 252, when before I started messing with it, it was 162 and 2:20 ? Not sure of the current ETA, have to leave it for a while.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
I gave up on Linux; it's running Windows 11 now with an IGP-sized video card that has no fan on the heatsink and no power cables, fully supported, and all 20 threads doing units almost as fast as the 5950X in Linux, but using 267 watts ! Not thrilled with Alder Lake. About 2:15 for a 321 regular unit.

The 7742 is doing 8 at a time in 21-24 hours, so that's effectively less than 3 hours per unit, vs. 2:10 on a 5950X, whose clock speed is about double (3.91 vs 2.0 GHz), so not bad.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,242
3,829
75
Wow, I missed a day! Day 4 stats:

Rank___Credits____Username
4______9912084____xii5ku
5______9890930____parsnip soup in a clay
9______6330605____crashtech
11_____5442476____Galumpkis
25_____2501109____emoga
30_____1974844____mmonnin
62_____951929_____Orange Kid
69_____804533_____markfw
77_____681710_____Fardringle
81_____640231_____cellarnoise2-TAAT
83_____615298_____waffleironhead
125____371669_____Skivelitis2
222____109762_____Ken_g6
269____58185______kiska

Rank__Credits____Team
1_____40285370___TeAm AnandTech
2_____39030531___Ukraine
3_____30955597___SETI.Germany
4_____26851462___Antarctic Crunchers

Looks like a souper messy fight for 4th place!
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
Looks like a souper messy fight for 4th place!
Those who can't see blood need to look away now; it's going to be more like this than a veggie meal... boss of SETI.Germany and vaughan of AMD Users apparently want some of that too.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
Kudos to @Icecold for taking 1st individually with monstrous 36 M credits, way ahead of the next two top users (19 M from tng, 15 M from MiHost).

Also, while TeAm AnandTech made it to second in the teams contest with 50 M credits, those of TeAm AnandTech who helped team Ukraine win this one at 52 M credits contributed >44 M to that victory. Congrats to you guys!
 

VirtualLarry

No Lifer
Aug 25, 2001
56,326
10,034
126
I did too, although I don't know if my rigs will hold together for another CPU race; they are sometimes very unstable when CPU mining.
 

cellarnoise

Senior member
Mar 22, 2017
711
394
136
Good to read you TinyToesoup!!!

Hope you can join "us" for the Pent!

Bigfoot has never looked better!

Hope the "windings" are going in your direction!!!! We all can't live without alts or positive generators!!!