Question: Rosetta@home now requires VirtualBox on Ubuntu?? Linux/VirtualBox experts needed to get more than 16 tasks!


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
I tried to start Rosetta back up after the WCG 17th Birthday event, but I got the error in red: "VirtualBox is not installed". After my Google research failed, I typed:

sudo apt-get install virtualbox

and rebooted. Now it's working, but the tasks show up as "rosetta python projects (vbox64)".

So the question is: why is it required now, and is it faster or slower? And what a pain to have to do this on every box that runs Rosetta now.
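(In case anyone else hits this: the minimal sequence that worked here, plus a quick sanity check. A sketch assuming Ubuntu's stock packages and the Debian-style boinc-client service name; details may differ on other distros.)

Code:
sudo apt-get install virtualbox        # the package the red error message wants
VBoxManage --version                   # confirm VirtualBox itself is usable
sudo systemctl restart boinc-client    # restart the client so it re-detects VirtualBox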
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
So now that the 64-core EPYCs are out of work for WCG, I enabled Rosetta. Out of 384 possible threads, I have ONE Rosetta task.
 

StefanR5R

Elite Member
Dec 10, 2016
Markfw said:
So now that the 64-core EPYCs are out of work for WCG, I enabled Rosetta. Out of 384 possible threads, I have ONE Rosetta task.
You'd probably have to wait for another "bump" of the classic work queue, like the ones you can see in the green ready-to-send (rts) graph on "Rosetta results - by week": the bumps which are paired with a rise of the blue tasks-in-progress graph. But even then you will only get work if your computers don't stop sending work requests. (Classic work, that is. The unstable and resource-hungry vbox-based work is readily available all the time, of course.)
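(Keeping the requests going can be as simple as poking the project periodically. A rough sketch, assuming boinccmd is installed; the interval is my choice, so be reasonable and don't hammer the servers:)

Code:
# Ask the Rosetta@home scheduler for work every 10 minutes.
# Run from the BOINC data directory (or pass --passwd) so boinccmd can authenticate.
watch -n 600 boinccmd --project https://boinc.bakerlab.org/rosetta/ update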
 

StefanR5R

Elite Member
Dec 10, 2016
I started a new experiment right now, on one out of two computers of otherwise same configuration:
– I shut down boinc-client.
– Made a backup of /var/lib/boinc/slots.
– Created a new slots directory, mounted a tmpfs on it, and copied the old slots subdirectories into it.
– Restarted boinc-client.

That is, I am now storing the active tasks' data on a RAM disk instead of an SSD. I am curious to see whether this brings any improvement to the "postponed" tasks issue and/or to the "infinite" tasks issue.
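In concrete terms, the steps were roughly the following (a sketch, assuming the Debian/Ubuntu boinc-client service, the default /var/lib/boinc data directory, and a client running as user "boinc"; exact paths and names may differ):

Code:
sudo systemctl stop boinc-client
sudo mv /var/lib/boinc/slots /var/lib/boinc/slots.bak        # keep the old slot data as a backup
sudo mkdir /var/lib/boinc/slots
sudo mount -t tmpfs tmpfs /var/lib/boinc/slots               # tmpfs defaults to half of RAM (the 126G below)
sudo cp -a /var/lib/boinc/slots.bak/. /var/lib/boinc/slots/  # copy the old slot subdirectories over
sudo chown -R boinc:boinc /var/lib/boinc/slots
sudo systemctl start boinc-client

Note that a plain mount like this does not survive a reboot unless it is also added to /etc/fstab.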

From the computer with this modification:
Code:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi        20Gi       111Gi        38Gi       119Gi       190Gi
Swap:            0B          0B          0B
$ sudo du -sh /var/lib/boinc/slots/
39G     /var/lib/boinc/slots/
$ df -h /var/lib/boinc/slots/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           126G   39G   88G  31% /var/lib/boinc/slots

The other computer without the modification:
Code:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi        20Gi       169Gi        31Mi        61Gi       229Gi
Swap:            0B          0B          0B
$ sudo du -sh /var/lib/boinc/slots/
39G     /var/lib/boinc/slots/
$ df -h /var/lib/boinc/slots/
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  868G  228G  639G  27% /var
The NVMe device is a SanDisk Extreme Pro, attached via two PCIe v3 lanes.
As you can see, both computers have swap space disabled.

Each computer runs 16 simultaneous Rosetta tasks (currently half vbox, half classic) and 48 simultaneous QuChemPedIA tasks, with 64 cores/128 threads at its disposal.
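(For reference, one way to pin a project to a fixed number of concurrent tasks is an app_config.xml in the project directory. This is just an illustration of the mechanism, not necessarily how these hosts are set up; the path assumes the default /var/lib/boinc data directory:)

Code:
# Cap concurrent Rosetta tasks at 16, then restart the client
# (or use "Options > Read config files" in BOINC Manager).
sudo tee /var/lib/boinc/projects/boinc.bakerlab.org_rosetta/app_config.xml <<'EOF'
<app_config>
    <project_max_concurrent>16</project_max_concurrent>
</app_config>
EOF
sudo systemctl restart boinc-client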
 

StefanR5R

Elite Member
Dec 10, 2016
For the time being, not only is Rosetta 4 work being generated, but many replica tasks are also being sent out, because these workunits have a 100% (?) failure rate on Windows. (If you look at the task names on any computer, I bet that many end with _1 instead of _0; the _1 shows that this is a replica in a workunit which already had a failed result. And if you click these workunits in your tasks list on the website, you'll see that those failures came from Windows hosts. There aren't any _2 or _3 etc. tasks because Rosetta@home's server is configured to give up on a workunit after two failures.)

Therefore, the total contributed GFLOPS of the Rosetta v4 work queue shrank to that of the Linux contributors, while all (?) Windows contributors are left out in the cold.

Even so, server_status.php keeps showing zero or near-zero tasks ready to send in the Rosetta v4 work queue, because total work demand still outpaces work generation.
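(A quick way to see the _0/_1 split on a host from the command line; a sketch that assumes boinccmd's usual "name: ..." output lines:)

Code:
# Count replica (_1) vs. original (_0) tasks currently on this client.
boinccmd --get_tasks | grep -c '^ *name: .*_1$'
boinccmd --get_tasks | grep -c '^ *name: .*_0$'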
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
So here I have just 2 of the _1 tasks, running(?). The rest are _0. This is a Linux host.

[attached screenshot: task list]
 


cellarnoise

Senior member
Mar 22, 2017
Good to know about the Windows WU failure issue with Rosetta. I tried to devote some resources to it, and all the tasks (that I observed, anyway) failed in under 30 seconds. I thought it might be my 'puter, but it works fine on everything else.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
This is wacky: first no Rosetta "regular" tasks. Then there are a lot... for 2 days. Then they are back with a bunch. Now they are out again...

Consistency, please...
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Consistently inconsistent... Maybe if I set my computers to 5 days or 10 days instead of 2... snatch them while they're hot. But the 64-core EPYCs get 1000 for 2 days; that would be 5000 for 10 days.
 

StefanR5R

Elite Member
Dec 10, 2016
Some R@h users appear to be doing something similar. I am routinely seeing workunits whose earlier task failed on somebody else's computer because it timed out. (Remember: the reporting deadline is 3 days. Hence, don't buffer more than can be completed in 3 days.)
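As a back-of-the-envelope check (the numbers are only illustrative):

Code:
# Maximum tasks a host can finish before the 3-day deadline:
# threads crunching Rosetta * deadline hours / target run time hours
THREADS=128; TARGET_H=8; DEADLINE_H=72
echo $(( THREADS * DEADLINE_H / TARGET_H ))   # -> 1152 with these numbers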

Tip: Increase the "Target CPU run time" from the default 8 h to a higher value.
Rosetta@home preferences
Then the same number of downloaded tasks will last you longer. While browsing top_hosts.php, I saw that one of the prolific users even set it to 24 h (which may be a bit much, but of course works too).

Some notes on the process of changing "Target CPU run time":

1. It used to be that the BOINC client was oblivious to these changes, which are made at the server, and continued to use the old task duration as the estimated duration of new tasks. (In effect, the client would buffer too much after the target run time was increased, or, vice versa, too little after it was decreased.) The client's estimate only gradually converged to the new run time as more and more tasks completed with the new setting.

I don't know if this problem still exists.

2. When said change is made on the website, which tasks exactly are affected by the new setting? Well, somebody once wrote:

After a scheduler request of the client (e.g. a project update), the target run time of the following tasks is modified:
+ tasks which hadn't been started yet (and are started after the scheduler request); this of course also includes tasks which are yet to be downloaded,
+ tasks which were suspended to disk (and are resumed after the scheduler request).
The target run time of the following tasks is not modified:
– tasks which are running during the scheduler request,
– tasks which were suspended to RAM during the scheduler request.

Edit, I forgot:
3. For the time being, "Target CPU run time" is recognized only by the "Rosetta" and "Rosetta mini" applications, not by "rosetta python projects".
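(To make that scheduler request happen right away instead of waiting for the next automatic one, a project update can be forced; a sketch, assuming boinccmd is installed:)

Code:
# Force a scheduler request so the client picks up the changed preference.
boinccmd --project https://boinc.bakerlab.org/rosetta/ update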
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
StefanR5R said:
Tip: Increase the "Target CPU run time" from the default 8 h to a higher value. [...]
But... on my 64-core EPYC boxes I have not changed the time, and they run in about 3:30.
 

StefanR5R

Elite Member
Dec 10, 2016
Actually on all of your computers the runtime is about 3 h. And now that I looked around more, the same is true on several other users' computers.

Apparently 3 h is the current default target runtime, not 8 h anymore.

This has been changed by the project admins a few times over the years.

(Years ago, admins might have made announcements about changes like this. But more recently, there is no communication from the project to the contributors anymore. Scientists feed work into Rosetta on their end, users complete work on their end, but in between there is deafening silence. Admins and moderators have vanished without a trace.)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
StefanR5R said:
Apparently 3 h is the current default target runtime, not 8 h anymore. [...]
Where would you change this anyway? I looked at every preference and I don't see it:

[attached screenshots: preference pages]
 

StefanR5R

Elite Member
Dec 10, 2016
PPD is meant to stay the same; I never checked this systematically. (A longer target run time means more simulations of the same model, with varied starting parameters, are performed in a given task. Proportionally more credit should be given to the result.)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Up to 281 running and about 1500 queued on 5 computers today. The work is being sent out VERY erratically, but at least something is happening. My power bill should be really low this month (like $300-400).
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
I now have 6 different machines that have the same problem with Rosetta. 3 are EPYC (ECC memory) and 3 are 5950X; all work fine on PrimeGrid and WCG. But on Rosetta, every unit comes back with "computation error". Below is one unit:

Rosetta@home 4.20 Rosetta
preetham_gen_66546_0001_0001_0_SAVE_ALL_OUT_2912149_151_0 00:00:31 (00:00:14) 45.26 100.000 - 3/18/2022 4:46:21 PM Computation error 5950x living room

Any ideas? 6 computers, 3 of which are EPYC, can not all be corrupt.
 

cellarnoise

Senior member
Mar 22, 2017
I have had two Windows AMD computers with the same issue, though both have either the RAM or the CPU undervolted a bit. They have been very stable for a long time on other projects, but both give the same error as you???

I've switched them to other projects and again no issues.

Something's not right, but I don't think it is entirely on my end. I will see if I need to update BOINC or the project tomorrow.
 

StefanR5R

Elite Member
Dec 10, 2016
Going by other posts in the R@h message board, the "preetham_*" batch is faulty:
March 2022 - WU error rates
Constant computation errors.

Looks the same on Linux and Windows computers: The task terminates after less than a minute, and stderr.txt always contains the message "Error in simple_cycpcp_predict app read_sequence() function! The minimum number of residues for a cyclic peptide is 4. (GenKIC requires three residues, plus a fourth to serve as an anchor)."
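(If you want to check the stderr of a failing task yourself: for already-reported tasks it is in the "Stderr output" section of the task's page on the website; for tasks still held by the client, something like the following works. A sketch, assuming the default /var/lib/boinc data directory:)

Code:
# Dump the stderr recorded for finished-but-not-yet-reported tasks.
sudo sed -n '/<stderr_txt>/,/<\/stderr_txt>/p' /var/lib/boinc/client_state.xml | head -n 60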
 