A personal paradigm shift... Or from "DC on a gaming rig" to "gaming on a DC rig"


voodoo5_6k

Senior member
Jan 14, 2021
No, I haven't. But interesting idea. The last time I did CPU folding was still with the old 16-thread Xeon, and that was of course rather pointless. It was a mere "all-in for fun" measure during the last folding race we had here ;)

Currently, the system's still emptying its WCG queue and work buffer (I set the "no new work" flag far too late). I was about to switch it over to Rosetta, but this test you're suggesting sounds really interesting. I'll give it a try and see how it works!

Side note: When I installed the FAHClient on Debian, it auto-configured the CPU slot to 79 threads, whereas on Windows it had set it to 31 or 32 (I don't remember exactly). And I'm talking about the "whoohoo-we-support-server-cpus-and-up-to-4-sockets" edition... What a bust... :D Well, maybe it'll work OK when I have a gaming session sometime again ;)
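
If you want to double-check what the client auto-configured without firing up FAHControl, a minimal sketch along these lines should do. It assumes the client's local command socket is on the default 127.0.0.1:36330 and that localhost is allowed to connect; adjust for your setup.

Code:
import socket

# Minimal sketch: ask the local FAHClient for its slot configuration.
# Assumes the default command interface on 127.0.0.1:36330 (the same
# socket FAHControl talks to); adjust host/port if yours differs.
HOST, PORT = "127.0.0.1", 36330

with socket.create_connection((HOST, PORT), timeout=5) as sock:
    sock.recv(4096)                   # greeting/prompt from the client
    sock.sendall(b"slot-info\n")      # list configured slots (CPU/GPU, thread count, ...)
    data = b""
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    except socket.timeout:
        pass                          # stop once the client goes quiet
    print(data.decode(errors="replace"))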
 

voodoo5_6k

Senior member
Jan 14, 2021
Still waiting for the WCG queue to empty. 13 tasks and ~1.5h left. Then I'll try to test F@H with the standard config it self-applied (CPU slot 79, which leaves 1 thread for the GPU). After a few units, I'll pause the GPU slot and change the CPU slot to 80.
 

voodoo5_6k

Senior member
Jan 14, 2021
The first WU assigned to the 79 thread CPU slot threw the following error:

07:50:20:WU01:FS00:0xa8:ERROR:79 OpenMP threads were requested. Since the non-bonded force buffer reduction is prohibitively slow with more than 64 threads, we do not allow this. Use 64 or less OpenMP threads.

It seemed to run, but very slowly. It also gave a "watchdog triggered, requesting soft shutdown" error after a few minutes. I paused the slot and changed it to 64 threads. After letting the slot resume its work, it quickly recovered (restarted the WU) and oscillated between 27 and 30 seconds TPF. PPD for the slot was estimated at roughly 850,000. The actual WU finished in a little more than 40 minutes.

After that, I changed the slot back to 79 threads. It got a WU assigned, but that threw the same error as above. Pausing, reconfiguring for 64 threads, and unpausing got it back to 28-30 s TPF for this WU; the PPD estimate (for the slot) is currently 810,000.

Both WUs are project 16815.

@StefanR5R: Do you know whether the 64-thread limit is specific to that project, or is this a general thing now? If it's general, going to 80 threads wouldn't seem to add any value currently, or do you think I have missed something in my config?

Anyhow, an 800k+ CPU slot is not too bad; that's roughly what my current GPU was worth in the pre-CUDA days. And with the current config, there are still 16 threads left over (or a full 8-core CPU)...
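
If anyone wants to play with the numbers: here's a rough back-of-the-envelope sketch of how TPF relates to WU wall time and PPD. The base credit, k-factor and deadline below are made-up placeholders (not the real project 16815 parameters), and the bonus term is just the commonly cited quick-return-bonus formula.

Code:
import math

def wu_stats(tpf_seconds, base_credit, k_factor, deadline_days, frames=100):
    """Estimate WU wall time from TPF, then credit with quick return bonus.

    Commonly cited QRB formula: credit = base * max(1, sqrt(k * deadline / elapsed)).
    All project parameters here are placeholders, not real project 16815 values.
    """
    elapsed_days = tpf_seconds * frames / 86400.0       # a WU is typically 100 frames
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / elapsed_days))
    credit = base_credit * bonus
    ppd = credit / elapsed_days                          # WUs per day * credit per WU
    return elapsed_days * 24 * 60, credit, ppd

# Example with made-up parameters: 28 s TPF, 3,000 base credit, k = 1, 2-day deadline.
minutes, credit, ppd = wu_stats(28, base_credit=3000, k_factor=1, deadline_days=2)
print(f"wall time {minutes:.0f} min, credit/WU {credit:,.0f}, PPD {ppd:,.0f}")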
 

mmonnin03

Senior member
Nov 7, 2006
79 is a prime, so it's hard to split the task up into even segments. Try some other core counts like 80 or 76. You may have to run a 64-thread task plus a 16-thread task.
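
A quick way to see which counts factor nicely (and which don't) is just to print the prime factorizations; the candidate list below is only an example:

Code:
def prime_factors(n):
    """Prime factorization of n as a list, smallest factors first."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Example candidates for an 80-thread box; small factors split up more evenly.
for count in (79, 80, 76, 64, 40, 16, 14):
    print(f"{count:3d} = {' * '.join(map(str, prime_factors(count)))}")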
 

StefanR5R

Elite Member
Dec 10, 2016
@voodoo5_6k, there were definitely times during which I had 128-threaded and 88-threaded slots running. The latter sometimes encountered startup failures due to an inability to decompose the computational domain across 88 threads. (88 factors into 2^3 * 11, and it evidently stumbled over that 11 sometimes.)

Now that you bring it up, I also remember that I eventually got the error message you quoted. The fact that it began to run, but very slowly, was because it dumbed itself down to a low thread count instead of giving up entirely. It might even have gone into a single-threaded fallback mode. This 64-thread limit is either something new in a recent FahCore revision, or it is a property of some workunit batches. I remember this limit only from our December 2020 folding race, not from earlier races.

The 64-thread limit was a nuisance, because one 128-thread slot (when it worked) gave considerably higher PPD than two 64-thread slots combined.

On an 80-thread computer without a GPU, one 80-thread slot would be ideal, but obviously that is not supported by the work which F@h submits right now. From what I remember of my 88-threaded computers, the next best option for total PPD on an 80-threaded computer would be two uneven slots: one 64-threaded and one 16-threaded. I seem to remember that this gave higher PPD than two slots of the same size, i.e. 40+40 on your computer. (With a GPU in such a system, set aside as many threads as necessary to keep the GPU working near its peak F@h performance. This should be figured out individually for a given host, but 2 threads set aside might already be sufficient.)
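
To make the uneven-slot idea a bit more concrete, here is a small sketch that enumerates two-slot splits under the 64-thread cap and keeps only those whose halves factor into small primes. The small-prime threshold and the ordering are my own heuristic, not anything F@h prescribes:

Code:
def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def nice(n, largest_ok=7):
    # "Nice" = built only from small primes; the threshold is a guess, not an F@h rule.
    return n > 1 and max(prime_factors(n)) <= largest_ok

def two_slot_splits(total_threads, gpu_reserve=0, max_per_slot=64):
    """Big+small CPU slot splits under the 64-thread cap, biggest slot first."""
    budget = total_threads - gpu_reserve
    return [(big, budget - big)
            for big in range(min(max_per_slot, budget - 1), budget // 2 - 1, -1)
            if nice(big) and nice(budget - big)]

print(two_slot_splits(80))                 # no GPU: (64, 16) comes out first
print(two_slot_splits(80, gpu_reserve=2))  # 2 threads for the GPU: (64, 14) first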
 

voodoo5_6k

Senior member
Jan 14, 2021
@StefanR5R: Thanks for all the information! When you mentioned the folding race, I also recalled that issue. Skillz had even brought the 128 thread CPU slot question over to their forum: Folding Forum • View topic - 128 threads and low PPD

But nothing ever materialized. It seems to be an a8 limitation, currently. So your suggestion of asymmetric slots sounds like the best approach. I'm about to finish the WUs I had running in a symmetrical test configuration (39 threads each, 2 threads for the GPU; the combined PPD estimate was about 2.3 million, only slightly higher than the single 64-thread CPU slot + GPU). As soon as the slots are finished, I'll reconfigure to 64/14(/2) and give that a try.
 

StefanR5R

Elite Member
Dec 10, 2016
Thanks for the refresher. :-) Weird how much I have already forgotten about this. :-O

Apropos that folding forum thread and its side discussion on "dumping" of workunits, which then sit in limbo until they time out: in the December race, I did not dump A8 workunits. Rather, I made them fail with an actual error, which then got reported to the workserver immediately. (At least in theory, the workservers could then give this work to other hosts right away.) This way, my computers remained busy with 88-/128-threaded A7 work for almost the entire time I had them folding. According to my stats back then, the quick return bonus for A7 work was still given, i.e. the portion of errored A8 work was low enough, or perhaps even irrelevant in this regard.

(Based on F@h's statements that credits including QRB reflect what the accomplished work is worth to the project owners, my allowing only 88-/128-thread capable A7 work on the computers at my disposal must have been the scientifically optimal use of these computers. What I cannot be sure about though is whether or not the aborted+reported A8 work was indeed requeued quickly; this depends on the project's work scheduler.)
 

voodoo5_6k

Senior member
Jan 14, 2021
I had a few WUs running yesterday using several CPU slot configurations: 1x 64 threads plus 2 for the GPU, the rest idle; 2x 39 threads plus 2 for the GPU; and 1x 64 threads plus 1x 14 threads plus 2 for the GPU. I didn't get really good (i.e. comparable) results, as the WUs differed too much and the testing period was rather short. Anyhow, the testing pushed me very close to 1.6 million points for yesterday, and I learned a few interesting things. But for now, the CPUs are back on Rosetta. I'll look into this again, with a larger sample size, when we have the next folding race.
 

voodoo5_6k

Senior member
Jan 14, 2021
Almost forgot to add one final thing (well, as final as a PC can ever be ;))...

I haven't found a really good software solution to monitor everything. I settled for monitoring the CPU/MB sensors via IPMI, and the cooling system separately. I have installed an aquaero 6 XT and configured it according to my needs. Now I can see all cooling system information by simply looking over at the case (the top drive bay is high enough that I can see it when looking over the edge of my desk). As it is fully independent of the OS, I don't lose any functionality when running the system under Debian (= 99.9% of the time :D). It shows me individual fan speeds, temperatures from the connected sensors, and the flow rate, and I have configured a few warnings and alarms (e.g. too little flow, fan failure, exceeded temperature thresholds, etc.). For the little I'm doing with it, it is actually wasted potential, but it is the best solution I could find to monitor all those proprietary sensors and also use it independently of the OS.
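
For the IPMI part, something along these lines is basically all it takes if you want to poll the readings yourself. This is only a sketch: it assumes ipmitool is installed and "ipmitool sensor" can reach the BMC locally (sensor names vary by board; add -I lanplus -H/-U/-P for a remote BMC).

Code:
import subprocess

def read_sensors():
    """Dump BMC sensor readings via ipmitool and keep temperature/fan lines."""
    out = subprocess.run(["ipmitool", "sensor"],
                         capture_output=True, text=True, check=True).stdout
    readings = {}
    for line in out.splitlines():
        # "ipmitool sensor" output is pipe-separated: name | value | unit | status | ...
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or fields[1] in ("", "na"):
            continue
        name, value, unit = fields[0], fields[1], fields[2]
        if unit in ("degrees C", "RPM"):
            readings[name] = (float(value), unit)
    return readings

if __name__ == "__main__":
    for name, (value, unit) in sorted(read_sensors().items()):
        print(f"{name:25s} {value:8.1f} {unit}")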

Also, I think it looks good ;)

(Attached photos: DSC01048.JPG, DSC01049.JPG)