
Weekly DC Stats - 11APR2020


StefanR5R

Elite Member
Dec 10, 2016
4,093
4,528
136
Try the following, if you haven't yet:
In FAHControl, go to Configure -> Expert, and there add an option with name client-type and value advanced, then save.
Ok, will do, what does that do?
When scientists release batches of work, it gets fed into one of several separate queues:

First, there are different queues for different FahCore types: currently that's the CPU cores A7 and A8, and the GPU core 22. You can see work availability per core at https://apps.foldingathome.org/serverstats.

Second, there are presumably three queues per core: the normal queue, the "advanced" queue, and the "beta" queue. A given new work batch goes into only one of these queues, presumably based on how stable the admins consider the batch. There is no distinction of these queues at serverstats. (Edit: There is a distinction of the beta queue, but none between the normal and advanced queues.) The work release announcements at foldingforum often, but not always, say whether a batch was sent to "ADVANCED", or "ADV" for short. Perhaps "released to FAH" means released to the normal queue. I don't know how beta releases are handled; they could be in a separate forum which is hidden from visitors without a login.

By default, a client receives work only from the normal queue. An "advanced" client receives work from the normal and the advanced queue. A "beta" client receives work from the beta, advanced, and normal queue.
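
As a rough sketch of the tiering described above (based on the same hearsay, not on official documentation), the eligibility rule can be written as a small lookup. The names `QUEUES_BY_CLIENT_TYPE` and `can_receive`, and the label "normal" for the default client type, are hypothetical and not taken from the F@H client or servers.

```python
# Hypothetical model of the queue tiers described in this post.
# The labels are illustrative; they are not F@H identifiers.
QUEUES_BY_CLIENT_TYPE = {
    "normal": {"normal"},
    "advanced": {"normal", "advanced"},
    "beta": {"normal", "advanced", "beta"},
}


def can_receive(client_type: str, release_queue: str) -> bool:
    """Return True if a client of this type is eligible for the given queue."""
    return release_queue in QUEUES_BY_CLIENT_TYPE[client_type]


if __name__ == "__main__":
    for client_type in QUEUES_BY_CLIENT_TYPE:
        eligible = [
            queue
            for queue in ("normal", "advanced", "beta")
            if can_receive(client_type, queue)
        ]
        print(f"{client_type:>8}: {', '.join(eligible)}")
```

Under this model, a batch released only to the advanced queue is invisible to a default client, which would explain a client idling while work exists.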

Users are asked to set clients to "beta" only if they actively participate in the beta testing program. This requires, among other things, a foldingforum account and readiness to submit proper error reports for troubleshooting. In contrast, I am not aware of any requirement on users who set clients to "advanced".

This is all merely hearsay, plus my interpretations and assumptions. I am not aware of proper official documentation of all this anywhere at the F@H web site.

Edit: Summary –
Setting the special option "client-type" = "advanced" should increase the chances of the F@h client to receive work during times when its work requests are not fulfilled by the F@h work servers. The background is that scientists sometimes release new work batches exclusively at the "advanced" level, instead of making it available to the normal level.
 

Assimilator1

Elite Member
Nov 4, 1999
23,830
331
126
Re your 1st point, roger that, I posted that link earlier ;).
My client ran out of work, yet the servers had plenty of tasks. On the 2 occasions it stalled (that I saw), there were over 200k GPU tasks available. So, I don't get it.

Anyway, it's working now, and I've set up the advanced client.
Thanks for the info :).
 

voodoo5_6k

Member
Jan 14, 2021
140
157
86
After running WCG almost exclusively this week (TeAm challenge), I'll get back to Rosetta for the next few weeks, I guess.
I've got a lot of bad WUs in Rosetta lately. Annoying. My rig is now back at WCG again, after a little more than 2 days at Rosetta. No more "wasted" runtime. Also, the TeAm needs more points in WCG anyhow ;)
 

StefanR5R

Elite Member
Dec 10, 2016
4,093
4,528
136
I've got a lot of bad WUs in Rosetta lately.
Is it a common problem, or just on your host? *If* the latter, WCG MIP is using basically the same application as Rosetta@home. Should be worthwhile to check whether or not failures occur with MIP too. If so, then bad RAM (or bad RAM settings) is one likely candidate for the cause. Both Rosetta@home and WCG MIP lean towards the RAM access heavy side, compared with most other DC applications. (Some are heavier in this regard, e.g. some of the PrimeGrid subprojects.)
 

voodoo5_6k

Member
Jan 14, 2021
140
157
86
Is it a common problem, or just on your host? *If* the latter, WCG MIP is using basically the same application as Rosetta@home. Should be worthwhile to check whether or not failures occur with MIP too. If so, then bad RAM (or bad RAM settings) is one likely candidate for the cause. Both Rosetta@home and WCG MIP lean towards the RAM access heavy side, compared with most other DC applications. (Some are heavier in this regard, e.g. some of the PrimeGrid subprojects.)
Since the switch to Linux, this host has returned over 4,150 results for WCG without a single error; ~600 of those were MIP (also no failures during the brief Windows intermezzo). For the moment, I'd lean more towards the "common problem" option, as I've seen others (who share their device statistics on the Rosetta@home site) also having an increased failure rate lately.

Edit: I see failures with the miniprotein_relax8_... WUs and pre_helical_bundles... WUs (these fail at various times, computation error 11), and with the ...abinitio_1_abinitio... WUs (these usually fail within 2 seconds, computation error 1).
 

crashtech

Diamond Member
Jan 4, 2013
9,781
1,601
126
I've also had Rosetta errors across many AMD and Intel hosts, Windows and Linux.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
I've also had Rosetta errors across many AMD and Intel hosts, Windows and Linux.
I have about 150 between my 4 EPYC boxes, all ECC ram, all stock settings. All 4 boxes have a bunch. All running linux.

The dual 7601 RETAIL CPUs box is the worst: 7 WCG errors after about 4 hours, and over 100 Rosetta errors at either 0 or 2 seconds in. The 3 7742s only have about 10 each, from 7 to 15 seconds of runtime.
 

crashtech

Diamond Member
Jan 4, 2013
9,781
1,601
126
I have about 150 between my 4 EPYC boxes, all ECC ram, all stock settings. All 4 boxes have a bunch. All running linux.
It seems as if high error rates are the price of running Rosetta at the moment.
 

voodoo5_6k

Member
Jan 14, 2021
140
157
86
Setting the special option "client-type" = "advanced" should increase the chances of the F@h client to receive work during times when its work requests are not fulfilled by the F@h work servers. The background is that scientists sometimes release new work batches exclusively at the "advanced" level, instead of making it available to the normal level.
Just a quick heads-up. Although I did not have any problems with F@h WU assignment, I thought I'd check this out. In the past, the standard client-type worked fine for me. However, since changing this to advanced, my GPU picks up a constant flow of 180k WUs. It's just been two days, but the PPD uplift is there. I'd estimate it at 100k - 150k currently. The GPU is now running almost exactly at 1.5 million PPD (before that, somewhere between 1.25 and 1.35 million PPD). Great tip, thank you :)
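
For reference, here is the arithmetic on the rounded PPD figures quoted above, taken at face value. The variable names are illustrative, and two days of data is a small sample, so treat the result as a rough range rather than a measurement.

```python
# Rounded figures quoted in the post (points per day).
before_low, before_high = 1_250_000, 1_350_000  # range before the change
after = 1_500_000                               # value after the change

gain_min = after - before_high  # smallest gain consistent with the figures
gain_max = after - before_low   # largest gain consistent with the figures
pct_min = 100 * gain_min / before_high
pct_max = 100 * gain_max / before_low

print(f"gain: {gain_min:,} to {gain_max:,} PPD ({pct_min:.0f}% to {pct_max:.0f}%)")
# prints: gain: 150,000 to 250,000 PPD (11% to 20%)
```

The rounded before/after values imply a somewhat larger range than the 100k - 150k estimate, which is to be expected when all of the inputs are approximate.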

[Attached screenshot: F@h.PNG]

@Assimilator1: How's this working for your GPU? Is it getting WUs constantly now, or at least more regularly?
 

Assimilator1

Elite Member
Nov 4, 1999
23,830
331
126
I haven't seen it stall myself, but looking back through the logs, the last stall was from ~4am to 9am on the 16th (some time after I switched to the advanced client).
So it's still having problems getting WUs, although from about the 12th it was mostly stalling for only about 10 minutes at a time.


03:54:55:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
03:54:55:WU01:FS01:Connecting to 18.218.241.186:80
03:54:56:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
03:54:56:ERROR:WU01:FS01:Exception: Could not get an assignment
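
A minimal sketch for pulling failed assignment requests out of a log like the excerpt above, so stalls can be found without scrolling through the log by hand. The regular expression is based only on the four lines shown here, so other client versions or message variants may need a different pattern. The names `FAILED` and `failed_assignments` are illustrative.

```python
import re

# Matches WARNING lines of the form shown in the excerpt, e.g.
# 03:54:55:WARNING:WU01:FS01:Failed to get assignment from '<server>': <reason>
FAILED = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Failed to get assignment from '(?P<server>[^']+)': (?P<reason>.+)$"
)


def failed_assignments(lines):
    """Yield (time, slot, server, reason) for each failed assignment request."""
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            yield match["time"], match["slot"], match["server"], match["reason"]


sample = [
    "03:54:55:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration",
    "03:54:55:WU01:FS01:Connecting to 18.218.241.186:80",
    "03:54:56:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration",
    "03:54:56:ERROR:WU01:FS01:Exception: Could not get an assignment",
]

results = list(failed_assignments(sample))

if __name__ == "__main__":
    for time, slot, server, reason in results:
        print(f"{time} {slot} {server}: {reason}")
```

To scan a real log, pass an open file object instead of `sample`; the log file location depends on the installation and is not assumed here.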
 

voodoo5_6k

Member
Jan 14, 2021
140
157
86
Not nice. I can remember having periods like that too (but not lately), because the client increases the interval between server contacts step by step, until at some point there are hours between checks. If this happens during the day, OK, I'll usually catch it fast enough (and at least get back to minutes between checks by pausing and then unpausing the slot). But when it happens during the night (like in your example), I won't notice until morning.
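
To illustrate why a growing retry interval can turn a short outage into an overnight stall, here is a sketch of a capped exponential backoff. This is an assumption for illustration only: the client's actual retry schedule is not documented in this thread, and the function name, the 60-second first delay, the doubling factor, and the 4-hour cap are all made up.

```python
def retry_delays(first_delay_s=60, factor=2.0, max_delay_s=4 * 3600, attempts=10):
    """Return the wait in seconds before each retry under capped exponential backoff.

    All default parameters are hypothetical, not the real client's values.
    """
    delays = []
    delay = float(first_delay_s)
    for _ in range(attempts):
        delays.append(min(delay, max_delay_s))
        delay *= factor
    return delays


if __name__ == "__main__":
    elapsed = 0.0
    for attempt, delay in enumerate(retry_delays(), start=1):
        elapsed += delay
        print(f"retry {attempt:2d}: wait {delay / 60:6.1f} min, elapsed {elapsed / 3600:5.2f} h")
```

With these made-up parameters, the first eight waits already add up to more than four hours, so a slot that starts failing in the evening would be checking only every few hours by morning. Pausing and unpausing the slot, as described above, restarts the cycle.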
 
