Weekly DC Stats - 11APR2020

StefanR5R · Apr 14, 2021

StefanR5R said:
Try the following, if you haven't yet:
In FAHControl, go to Configure -> Expert, and there add an option with name client-type and value advanced, then save.

Assimilator1 said:
OK, will do. What does that do?

When scientists release batches of work, it gets fed into one of several separate queues:

First, there are different queues for different FahCore types: currently that's the CPU cores A7 and A8, and the GPU core 22. You can see work availability per core at https://apps.foldingathome.org/serverstats.

Second, there are presumably three queues per core: the normal queue, the "advanced" queue, and the "beta" queue. A given new work batch goes into only one of these queues, presumably based on how stable the admins consider the batch to be. Serverstats does not distinguish between these queues. (Edit: It does show the beta queue separately, but it does not distinguish between the normal and advanced queues.) However, the work release announcements at foldingforum often, but not always, say whether a batch was sent to "ADVANCED" (abbreviated "ADV"). Perhaps "released to FAH" means released to the normal queue. I don't know how beta releases are handled; they could be announced in a separate forum which is hidden from visitors who aren't logged in.

By default, a client receives work only from the normal queue. An "advanced" client receives work from the normal and the advanced queue. A "beta" client receives work from the beta, advanced, and normal queue.

Users are asked to set clients to "beta" only if they actively participate in the beta testing program. This requires, among other things, a foldingforum account and readiness to submit proper error reports for troubleshooting. In contrast, I am not aware of any requirements for users who set clients to "advanced".

This is all merely hearsay, plus my interpretations and assumptions. I am not aware of proper official documentation of all this anywhere at the F@H web site.

Edit: Summary –
Setting the special option "client-type" = "advanced" should increase the chances that the F@h client receives work during times when its work requests are not being fulfilled by the F@h work servers. The background is that scientists sometimes release new work batches exclusively at the "advanced" level instead of making them available at the normal level.
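To make the queue logic above concrete, here is a toy model in Python. It only encodes the hearsay described in this post (normal clients see the normal queue, advanced clients see normal and advanced, beta clients see all three); the names `ClientType`, `eligible_queues` and `can_get_work` are hypothetical and not part of FAHClient, and the queue counts are made up.

```python
from enum import IntEnum

# Toy model of the queue eligibility described above (hearsay, not official F@h behaviour).
# Queue levels are ordered from most to least conservative.
QUEUES = ("normal", "advanced", "beta")


class ClientType(IntEnum):
    # Hypothetical names; the value is how many queue levels "up" a client may reach.
    NORMAL = 0
    ADVANCED = 1
    BETA = 2


def eligible_queues(client_type: ClientType) -> set:
    """Queues a client of this type draws work from: its own level and every level below it."""
    return set(QUEUES[: client_type + 1])


def can_get_work(client_type: ClientType, work_available: dict) -> bool:
    """True if any eligible queue currently holds work (work_available: queue name -> WU count)."""
    return any(work_available.get(q, 0) > 0 for q in eligible_queues(client_type))


if __name__ == "__main__":
    # Made-up snapshot: a new batch was released to the advanced queue only.
    snapshot = {"normal": 0, "advanced": 500, "beta": 0}
    for ct in ClientType:
        print(ct.name, sorted(eligible_queues(ct)), can_get_work(ct, snapshot))
```

In this model, an advanced-only release leaves a normal client idle while an advanced client keeps folding, which is exactly the scenario the summary describes.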

Assimilator1 · Apr 14, 2021

Re your 1st point, roger that, I posted that link earlier 😉.
My client ran out of work, yet the servers had plenty of tasks: on the two occasions it stalled (that I saw), there were over 200k GPU tasks available. So I don't get it.

Anyway, it's working now, and I've set up the advanced client type.
Thanks for the info 🙂.

voodoo5_6k · Apr 15, 2021

voodoo5_6k said:
After running WCG almost exclusively this week (TeAm challenge), I'll get back to Rosetta for the next few weeks, I guess.

I've got a lot of bad WUs in Rosetta lately. Annoying. My rig is now back at WCG again, after a little more than 2 days at Rosetta. No more "wasted" runtime. Also, the TeAm needs more points in WCG anyhow 😉

StefanR5R · Apr 15, 2021

voodoo5_6k said:
I've got a lot of bad WUs in Rosetta lately.

Is it a common problem, or just on your host? *If* the latter: WCG MIP uses basically the same application as Rosetta@home, so it should be worthwhile to check whether failures occur with MIP too. If they do, then bad RAM (or bad RAM settings) is one likely cause. Both Rosetta@home and WCG MIP lean towards the RAM-access-heavy side compared with most other DC applications. (Some are even heavier in this regard, e.g. some of the PrimeGrid subprojects.)

voodoo5_6k · Apr 15, 2021

StefanR5R said:
Is it a common problem, or just on your host? *If* the latter: WCG MIP uses basically the same application as Rosetta@home, so it should be worthwhile to check whether failures occur with MIP too. If they do, then bad RAM (or bad RAM settings) is one likely cause. Both Rosetta@home and WCG MIP lean towards the RAM-access-heavy side compared with most other DC applications. (Some are even heavier in this regard, e.g. some of the PrimeGrid subprojects.)

Since the switch to Linux, this host has returned over 4,150 results for WCG without a single error, ~600 of which were MIP (there were also no failures during the brief Windows intermezzo). For the moment, I'd lean more towards the "common problem" option, as I've seen others (who share their device statistics on the Rosetta@home site) also having an increased failure rate lately.

Edit: I see failures with the miniprotein_relax8_... WUs and pre_helical_bundles... WUs (these fail at various times with computation error 11), and with the ...abinitio_1_abinitio... WUs (these usually fail within 2 seconds with computation error 1).

Assimilator1 · Apr 15, 2021

Now you mention it, I see I've had 22 WUs that have errored.

crashtech · Apr 15, 2021

I've also had Rosetta errors across many AMD and Intel hosts, Windows and Linux.

Markfw · Apr 15, 2021

crashtech said:
I've also had Rosetta errors across many AMD and Intel hosts, Windows and Linux.

I have about 150 between my 4 EPYC boxes, all ECC RAM, all stock settings. All 4 boxes have a bunch. All running Linux.

The box with dual retail 7601 CPUs is the worst: 7 WCG errors after about 4 hours, and over 100 Rosetta errors at either 0 or 2 seconds in. The three 7742 boxes only have about 10 each, at 7 to 15 seconds of runtime.

crashtech · Apr 15, 2021

Markfw said:
I have about 150 between my 4 EPYC boxes, all ECC RAM, all stock settings. All 4 boxes have a bunch. All running Linux.

It seems as if high error rates are the price of running Rosetta at the moment.

voodoo5_6k · Apr 16, 2021

crashtech said:
It seems as if high error rates are the price of running Rosetta at the moment.

Indeed.

Thanks for all the feedback, everybody. Good to know I'm really not the only one seeing this.

voodoo5_6k · Apr 17, 2021

StefanR5R said:
Setting the special option "client-type" = "advanced" should increase the chances that the F@h client receives work during times when its work requests are not being fulfilled by the F@h work servers. The background is that scientists sometimes release new work batches exclusively at the "advanced" level instead of making them available at the normal level.

Just a quick heads-up. Although I did not have any problems with F@h WU assignment, I thought I'd check this out. In the past, the standard client-type worked fine for me. However, since changing it to advanced, my GPU picks up a constant flow of 180k WUs. It's only been two days, but the PPD uplift is there; I'd estimate it at 100k - 150k currently. The GPU is now running at almost exactly 1.5 million PPD (before that, it was somewhere between 1.25 and 1.35 million PPD). Great tip, thank you 🙂

@Assimilator1: How's this working for your GPU? Is it getting WUs now constantly or at least in a more regular fashion?

Assimilator1 · Apr 18, 2021

I've not seen it stall, but looking back through the logs, the last stall I had was from ~4am to 9am on the 16th (some time after I had enabled the advanced client type).
So it's still having problems getting WUs, although from about the 12th it was mostly only stalling for about 10mins.

03:54:55:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
03:54:55:WU01:FS01:Connecting to 18.218.241.186:80
03:54:56:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
03:54:56:ERROR:WU01:FS01:Exception: Could not get an assignment
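For going back through the logs after a night like that, a small Python script can tally how often each work server refused an assignment. This is a minimal sketch that assumes the line shape seen in the excerpt above (`HH:MM:SS:[LEVEL:]WUxx:FSxx:message`); real FAHClient logs contain other line shapes, which it simply skips, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt: HH:MM:SS:[WARNING:|ERROR:]WUxx:FSxx:message
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):"   # timestamp
    r"(?:(WARNING|ERROR):)?"        # optional severity
    r"(WU\d+):(FS\d+):(.*)$"        # work unit slot, folding slot, message
)
SERVER_RE = re.compile(r"from '([^']+)'")

SAMPLE_LOG = """\
03:54:55:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
03:54:55:WU01:FS01:Connecting to 18.218.241.186:80
03:54:56:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
03:54:56:ERROR:WU01:FS01:Exception: Could not get an assignment
"""


def parse_line(line):
    """Split one log line into fields, or return None if it doesn't fit the assumed shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hh, mm, ss, level, wu, fs, msg = m.groups()
    return {
        "seconds": int(hh) * 3600 + int(mm) * 60 + int(ss),  # seconds since midnight
        "level": level or "INFO",
        "wu": wu,
        "fs": fs,
        "msg": msg,
    }


def summarize_failures(lines):
    """Count 'No WUs available' refusals per server, 'Could not get an assignment' errors,
    and the first/last refusal time (does not handle logs that span midnight)."""
    refusals = Counter()
    gave_up = 0
    first = last = None
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if "No WUs available" in rec["msg"]:
            m = SERVER_RE.search(rec["msg"])
            refusals[m.group(1) if m else "?"] += 1
            if first is None:
                first = rec["seconds"]
            last = rec["seconds"]
        elif rec["level"] == "ERROR" and "Could not get an assignment" in rec["msg"]:
            gave_up += 1
    return refusals, gave_up, first, last


if __name__ == "__main__":
    refusals, gave_up, first, last = summarize_failures(SAMPLE_LOG.splitlines())
    for server, n in refusals.items():
        print(f"{server}: {n} refusal(s)")
    print(f"attempts that gave up: {gave_up}, refusals between t={first}s and t={last}s")
```

Feeding it the full log from the night of the 16th would show how long the refusals lasted and which servers were involved.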

voodoo5_6k · Apr 19, 2021

Not nice. I remember having periods like that too (though not lately), because the client increases the interval between server contacts step by step until, at some point, there are hours between checks. If this happens during the day, OK, I'll usually catch it fast enough (and can at least get back to minutes between checks by pausing and then unpausing the slot). But when it happens during the night (like in your example), I won't notice until morning.
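The growing gap between checks behaves like a capped exponential backoff. Here is a minimal Python sketch of that idea; the starting delay (60 s), growth factor (2) and cap (4 h) are assumptions for illustration, not FAHClient's actual values, and `reset()` models the observation that pausing and unpausing the slot brings the interval back down.

```python
class Backoff:
    """Capped geometric retry delay; base/factor/cap are assumed values, not FAHClient's."""

    def __init__(self, base=60, factor=2, cap=4 * 3600):
        self.base = base      # first wait in seconds (assumed)
        self.factor = factor  # growth per failed attempt (assumed)
        self.cap = cap        # longest wait in seconds (assumed)
        self.reset()

    def reset(self):
        """Back to the shortest delay, like pausing and unpausing the slot appears to do."""
        self.delay = self.base

    def next_delay(self):
        """Return the wait before the next server contact, then grow it for the attempt after."""
        current = self.delay
        self.delay = min(self.delay * self.factor, self.cap)
        return current


if __name__ == "__main__":
    b = Backoff()
    delays = [b.next_delay() for _ in range(10)]
    print("waits (s):", delays)
    print("hours spent in 10 failed attempts:", sum(delays) / 3600)
```

With these made-up numbers, once the wait reaches the cap, work that appears just after a failed check could go unclaimed for up to four hours, which would be consistent with an overnight stall going unnoticed.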
