Question Rosetta@home now requires virtual box on ubuntu ?? linux/virtualbox experts needed to get more than 16 tasks !

Markfw · Nov 24, 2021

I tried to start Rosetta back up after the WCG 17th Birthday event, but I get the error in red "virtualbox is not installed". So after my google research failed.... I typed:

sudo apt-get install virtualbox and rebooted. Now its working, but says "rosetta python projects (vbox64)"

So the question is, why is it now required, and is it faster or slower ? And what a pain to have to do this on every box that runs Rosetta now.

Markfw · Feb 16, 2022

So now that the 64 core EPYC's are out of work for WCG, I enabled Rosetta. Out of 384 possible threads, I have ONE Rosetta task.

Skillz · Feb 16, 2022

You could run PrimeGrid this month for Tour de Primes. lol

StefanR5R · Feb 17, 2022

Markfw said:
So now that the 64 core EPYC's are out of work for WCG, I enabled Rosetta. Out of 384 possible threads, I have ONE Rosetta task.

You'd probably have to wait for another "bump" for the classic workqueue happening like the ones you can see in the green ready-to-send (rts) graph in Rosetta results - by week, the ones which are paired with a rise of the blue tasks-in-progress graph. But even then you will only get work if your computers don't stop sending work requests. (Classic work, that is. The unstable and resource hungry vbox based work is readily available all the time of course.)

StefanR5R · Feb 20, 2022

I started a new experiment right now, on one out of two computers of otherwise same configuration:
– I shut down boinc-client.
– Made a backup of /var/lib/boinc/slots.
– Created a new slots directory, mounted a tmpfs on it, and copied the old slots subdirectories into it.
– Restarted boinc-client.

That is, I am now storing the active tasks' data on a RAM disk instead of an SSD. I am curious to see if this brings any improvement to the "postponed" tasks issue, or/ and to the "infinite" tasks issue.

From the computer with this modification:

Code:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi        20Gi       111Gi        38Gi       119Gi       190Gi
Swap:            0B          0B          0B
$ sudo du -sh /var/lib/boinc/slots/
39G     /var/lib/boinc/slots/
$ df -h /var/lib/boinc/slots/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           126G   39G   88G  31% /var/lib/boinc/slots

The other computer without the modification:

Code:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi        20Gi       169Gi        31Mi        61Gi       229Gi
Swap:            0B          0B          0B
$ sudo du -sh /var/lib/boinc/slots/
39G     /var/lib/boinc/slots/
$ df -h /var/lib/boinc/slots/
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  868G  228G  639G  27% /var

The NVME device is a PCIe v3 2-lane attached SanDisk Extreme Pro.
As you can see, both computers have swap space disabled.

Each computer runs 16 simultaneous Rosetta tasks (currently half vbox, half classic) and 48 simultaneous QuChemPedIA tasks, with 64 cores/ 128 threads at its disposal.

StefanR5R · Feb 20, 2022

Well, this experiment ended quickly.
There was already 1 occurrence of a "postponed" task and 1 of an "infinite" task.

Markfw · Feb 20, 2022

Well, I am lucky I guess. 586 Rosetta threads running and 4-5 thousand queued.

StefanR5R · Feb 20, 2022

For the time being, not only is Rosetta 4 work being generated, but there are also many replica tasks sent because they have 100% (?) failure rate on Windows. (If you look at the tasks names on any computer, I bet that many end with _1 instead of _0; the _1 showing that this is a replica in a workunit which already had a failed result. And if you click these workunits in your tasks list on the website, you'll see that these failures came from Windows hosts. There aren't any _2 or _3 etc. tasks because Rosetta@home's server is configured to give up on a workunit after two failures.)

Therefore, the total contributors' GFLOPS of the Rosetta v4 workqueue shrank to that of Linux contributors, while all (?) Windows contributors are left out in the cold.

But even though, sever_status.php keeps showing zero or near zero tasks ready to send in the Rosetta v4 workqueue because total work demand still outpaces work generation.

Markfw · Feb 20, 2022

So here I have just 2 of the _1 tasks ? running ? The rest are _0 This is a linix host.

cellarnoise · Feb 20, 2022

Good to know about the Windows wu failure issue with Rosseta. I tried to devote some resource to it and all the tasks, that I observed anyway, failed in under 30 seconds. I thought i might be my puter, but it works fine on everything else.

StefanR5R · Feb 23, 2022

There is a shipload of classic tasks available again since last night.
But the failures on Windows are still happening.

Markfw · Feb 24, 2022

This is wacky , first no Rosetta "regular" tasks. Then there is a lot... for 2 days. Then they are back with a bunch. Now they are out again...

Consistency please.....

StefanR5R · Feb 24, 2022

I'd say Rosetta@home is very consistently operating in exactly the way which you describe.

https://munin.kiska.pw/munin/rosetta-month.html

Markfw · Feb 24, 2022

Consistently inconsistent....Maybe if I set my computers for 5 days or 10 days instead of 2.... Snatch them while hot. But the 64 core EPYCs get 1000 for 2 days. That will be 5000 for 10 days.

StefanR5R · Feb 24, 2022

Some R@h users appear to be doing something similar. I am routinely seeing workunits which had an earlier failed task on somebody else's computer which timed out. (Remember: The reporting deadline is 3 days. Hence, don't buffer more than can be completed in 3 days.)

Tip: Increase the "Target CPU run time" from the default ~~8 h~~ to more.
Rosetta@home preferences
Then the same number of downloaded tasks will last you longer. While browsing top_hosts.php, I saw that one of the prolific users set it to 24 h even (which may be a bit much, but works too of course).

Some notes on the process of changing "Target CPU run time":

1. It used to be that the boinc client was oblivious to these changes which are made at the server, and continued to put the old task duration as estimated task durations of new tasks. (In effect, the client would buffer too much after the target run time was increased, or vice versa would buffer too few after the target run time was decreased.) The client's estimation only gradually converged to the new run time while more and more tasks completed with the new setting.

I don't know if this problem still exists.

2. When said change is made on the website, which tasks exactly are affected by the new setting? Well, somebody once wrote:

After a scheduler request of the client (e.g. a project update), the target run time of the following tasks is being modified:

+ tasks which hadn't been started yet (and are started after the scheduler request), this of course also includes tasks which are yet to be downloaded,
+ tasks which were suspended to disk (and are resumed after the scheduler request).

The target run time of the following tasks are not modified:

–tasks which are running during the scheduler request,
–tasks which were suspended to RAM during the scheduler request.

Edit, I forgot:
3. For the time being, "Target CPU run time" is recognized only by the "Rosetta" and "Rosetta mini" applications, not by "rosetta python projects".

Markfw · Feb 24, 2022

StefanR5R said:
Some R@h users appear to be doing something similar. I am routinely seeing workunits which had an earlier failed task on somebody else's computer which timed out. (Remember: The reporting deadline is 3 days. Hence, don't buffer more than can be completed in 3 days.)

Tip: Increase the "Target CPU run time" from the default 8 h to more.
Rosetta@home preferences
Then the same number of downloaded tasks will last you longer. While browsing top_hosts.php, I saw that one of the prolific users set it to 24 h even (which may be a bit much, but works too of course).

Some notes on the process of changing "Target CPU run time":

1. It used to be that the boinc client was oblivious to these changes which are made at the server, and continued to put the old task duration as estimated task durations of new tasks. (In effect, the client would buffer too much after the target run time was increased, or vice versa would buffer too few after the target run time was decreased.) The client's estimation only gradually converged to the new run time while more and more tasks completed with the new setting.

I don't know if this problem still exists.

2. When said change is made on the website, which tasks exactly are affected by the new setting? Well, somebody once wrote:

After a scheduler request of the client (e.g. a project update), the target run time of the following tasks is being modified:

+ tasks which hadn't been started yet (and are started after the scheduler request), this of course also includes tasks which are yet to be downloaded,
+ tasks which were suspended to disk (and are resumed after the scheduler request).
The target run time of the following tasks are not modified:

–tasks which are running during the scheduler request,
–tasks which were suspended to RAM during the scheduler request.

Edit, I forgot:
3. For the time being, "Target CPU run time" is recognized only by the "Rosetta" and "Rosetta mini" applications, not by "rosetta python projects".

But.... on my 64 core EPYC boxes I have not changed the time, and they run in about 3:30.

StefanR5R · Feb 24, 2022

Actually on all of your computers the runtime is about 3 h. And now that I looked around more, the same is true on several other users' computers.

Apparently 3 h is the current default target runtime, not 8 h anymore.

This has been changed by the project admins a few times over the years.

(Years ago, admins might have made announcements about changes like this. But more recently, there is no communication from the project to the contributors anymore. Scientists feed work into Rosetta on their end, users complete work on their end, but in between there is deafening silence. Admins and moderators have vanished without a trace.)

Markfw · Feb 24, 2022

StefanR5R said:
Actually on all of your computers the runtime is about 3 h. And now that I looked around more, the same is true on several other users' computers.

Apparently 3 h is the current default target runtime, not 8 h anymore.

This has been changed by the project admins a few times over the years.

(Years ago, admins might have made announcements about changes like this. But more recently, there is no communication from the project to the contributors anymore. Scientists feed work into Rosetta on their end, users complete work on their end, but in between there is deafening silence. Admins and moderators have vanished without a trace.)

where would you change this anyway ? looked at every preference, I don't see it:

StefanR5R · Feb 24, 2022

It's not in "Computing preferences", but in "Rosetta@home preferences".

Markfw · Feb 24, 2022

StefanR5R said:
It's not in "Computing preferences", but in "Rosetta@home preferences".

Thanks ! It says defaults to 8, but we know its 3. Anyhow I changed it to 12. Will that give higher ppd ?

StefanR5R · Feb 25, 2022

PPD are meant to stay the same. I never checked this systematically. (Longer target run time means more simulations of the same model but varied starting parameters are performed in a given task. Proportionally more credit should be given to the result.)

Markfw · Mar 4, 2022

Up to 281 running and about 1500 queued on 5 computers today. VERY disparate of sending out work, but at least something is happening. My power bill should be really low this month. (like $300-400)

Markfw · Mar 15, 2022

I now have 6 different machines that have the same problem with Rosetta. 3 are EPYC (ECC memory) and 3 are 5950x, all work fine on primegrid and WCG. But on Rosetta, every unit comes back with "computation error". Below is one unit.

Rosetta@home 4.20 Rosetta
preetham_gen_66546_0001_0001_0_SAVE_ALL_OUT_2912149_151_0 00:00:31 (00:00:14) 45.26 100.000 - 3/18/2022 4:46:21 PM Computation error 5950x living room

Any ideas ? 6 computers, 3 of which are EPYC can not be corrupt.

cellarnoise · Mar 16, 2022

I have had two windows ) AMD computers with same issue. Though both either ram or CPU under volted some. Very stable long time on other projects, but both giving same error as you.???

I've switched to other projects and again no issues.

Something not right, but I don't think it is entirely on my end. Will see if I need to update BOiNc or the project tomorrow?

StefanR5R · Mar 16, 2022

Going by other posts in the R@h message board, the "preetham_*" batch is faulty:
– March 2022 - WU error rates
– Constant computation errors.

Looks the same on Linux and Windows computers: The task terminates after less than a minute, and stderr.txt always contains the message "Error in simple_cycpcp_predict app read_sequence() function! The minimum number of residues for a cyclic peptide is 4. (GenKIC requires three residues, plus a fourth to serve as an anchor)."

Question Rosetta@home now requires virtual box on ubuntu ?? linux/virtualbox experts needed to get more than 16 tasks !

Moderator Emeritus, Elite Member

Moderator Emeritus, Elite Member

Golden Member

Elite Member

Elite Member

Elite Member

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Attachments

Senior member

Elite Member

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Moderator Emeritus, Elite Member

Senior member

Elite Member