Weekly Stats-18FEB2018


Kiska

Golden Member
Apr 4, 2012
1,016
290
136
Right, because lots of other projects with overcrowded servers would be affected by such a race condition. Though maybe Cosmology@Home has some file shuffling of its own going on in the background, such that it introduces a race condition of its own.

I think this may be an incorrect assumption, but I'll say it anyway.
Projects with high volumes of connections/tasks tend to use RAM disks to temporarily store incoming results. And because C@H uses a quorum of 1, a task gets added to the validator queue and is validated immediately. What I'm assuming happens to some of my submitted tasks is that the validator doesn't wait for the file handler to move the result from the RAM disk to storage. This wouldn't be a problem if the invalid:valid ratio were low, but 35% is way too much.

That is the basis on which I think they are creating their own race condition: they use a quorum of 1, whereas most other projects I know of, such as PG or S@H, use 2 or more.
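
To make the suspected race concrete, here is a minimal Python sketch of the failure mode I have in mind. None of this is C@H's actual server code; the directory names, the 0.5 s move delay, and the validator logic are all invented for illustration:

Code:
import os
import shutil
import threading
import time

RAMDISK = "/tmp/ramdisk"   # hypothetical upload staging area (RAM disk)
STORAGE = "/tmp/storage"   # hypothetical permanent result storage

def file_handler(result_name, delay=0.5):
    """Move an uploaded result from the RAM disk to permanent storage.
    The artificial delay stands in for whatever background shuffling the
    project does; during that window the file exists only on the RAM disk."""
    time.sleep(delay)
    shutil.move(os.path.join(RAMDISK, result_name),
                os.path.join(STORAGE, result_name))

def validator(result_name):
    """Quorum-of-1 validator that runs as soon as the scheduler marks the
    task finished.  If it looks in permanent storage before the file
    handler has moved the file, the result gets marked invalid."""
    path = os.path.join(STORAGE, result_name)
    try:
        with open(path, "rb") as f:
            f.read()
        print(result_name, "-> valid")
    except (FileNotFoundError, PermissionError) as err:
        print(result_name, "-> invalid:", err)

if __name__ == "__main__":
    os.makedirs(RAMDISK, exist_ok=True)
    os.makedirs(STORAGE, exist_ok=True)
    # Simulate an uploaded result landing on the RAM disk.
    with open(os.path.join(RAMDISK, "result_1"), "wb") as f:
        f.write(b"science output")
    # File handler and validator start at (almost) the same time.
    threading.Thread(target=file_handler, args=("result_1",)).start()
    validator("result_1")   # races the move and usually loses

The point of the sketch is just that with a quorum of 1 there is no second result to wait for, so nothing naturally forces the validator to run only after the file handler has finished the move.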

EDIT: I may have found a config tag that specifies the delay between the scheduler declaring the task finished and the validator running, and I think C@H have set it to 0. Meaning that as soon as your scheduler request goes through, it tries to validate.
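
I'm not certain of the exact tag, so take this as a guess: a BOINC project's config.xml lists the validator as a daemon, and the stock validators accept a --sleep_interval argument (how long the daemon sleeps between passes). Whether that is the knob I saw, whether C@H really runs it at 0, and the app name below are all assumptions. A purely hypothetical fragment:

Code:
<!-- Hypothetical config.xml fragment for a BOINC project.  The daemon
     layout follows BOINC's usual scheme, but the app name and the
     zero-second sleep interval are only my guesses at what C@H might
     be running. -->
<boinc>
  <daemons>
    <daemon>
      <cmd>sample_trivial_validator --app camb_boinc2docker --sleep_interval 0</cmd>
    </daemon>
  </daemons>
</boinc>

With the validator effectively polling without pause, the window for the file handler to finish its move shrinks to nothing, which would match the behaviour I'm seeing.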
 

StefanR5R

Elite Member
Dec 10, 2016
5,562
7,927
136
I have stopped LHC from running on the machine that was getting the most errors. Here's a log file from one of the failed tasks:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=176988322
There is stuff about "object not found" and "invalid object state" and "access denied" and whatnot in my log, whereas yours says "access denied" only.
@crashtech, incidentally I discovered a faulty WU on my host (which still has only a small fraction of failures) with an error log very similar to yours:
https://www.cosmologyathome.org/result.php?resultid=65190118

So, contrary to what I thought, this E_ACCESSDENIED failure mode is not tied to whether a host experiences many or few failures.