News Rosetta's role in fighting coronavirus

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,717
136
OK... this news item is not so new anymore, but worth reposting here nevertheless.

On 24 Feb 2020 boinc.bakerlab.org said:
Rosetta's role in fighting coronavirus

Thank you to all R@h volunteers for your contributions to help accurately model important coronavirus proteins. The collective computing power that you provide through R@h helps academic research groups world wide model important protein structures like these.

From a recent IPD news post:

"We are happy to report that the Rosetta molecular modeling suite was recently used to accurately predict the atomic-scale structure of an important coronavirus protein weeks before it could be measured in the lab. Knowledge gained from studying this viral protein is now being used to guide the design of novel vaccines and antiviral drugs."

Since the release of SARS-CoV-2 genome sequences in late January, a number of important corona virus proteins like the one described above have been modeled on R@h volunteer computers. A list of these proteins is provided by the Seattle Structural Genomics Center for Infectious Disease (SSGCID).

24 Feb 2020, 18:19:59 UTC

As many of you know, Rosetta@home is a Distributed Computing project very similar to Folding@home. Differences are that it uses the BOINC client for work distribution, and that its science applications are limited to CPU processing (while Folding@home can use GPUs, and is in fact much faster on commonly available discrete GPUs than even on the largest server CPUs).

Right now, I have got Folding@home running myself on 9 GPUs since the AnandTech vs. Tom's race was announced, but I have been running Rosetta@home on a little 4-core/8-thread Haswell Xeon E3 already throughout the last weeks, or actually months. I now remembered to switch the host CPU of my biggest GPU computer, a Xeon E5 v4, over to Rosetta too. It's a 22c/44t processor, and I set 36 threads for Rosetta, leaving the rest to drive this computer's 3 GPUs and what else goes on in the background.

But back to the news item above: If you wonder whether this concluded Rosetta@home's coronavirus-related research, then no, they are still at it. Of the tasks which I currently have queued here, about half say in their name that they are part of COVID-19 research. (I haven't taken the time yet to look up what the others are about. Similar to Folding@home, Rosetta@home pursues several projects in parallel, yet the computing preferences don't offer options to choose which ones we want to support in particular.)

If you are new to BOINC or to Rosetta@home: TeAm AnandTech is of course active as a BOINC team at Rosetta too. When you create an account for yourself at Rosetta@home, or at any later point in time, you can join your account with TeAm AnandTech through the Rosetta web site --- http://boinc.bakerlab.org/rosetta/.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
Well, time to switch a few hundred cores back to Rosetta.

Added 2 EPYC boxes and a 3950X to Rosetta: 288 tasks.
 

biodoc

Diamond Member
Dec 29, 2005
6,257
2,238
136
Some of the Rosetta tasks I'm working on are clearly COVID-19 related:

2rg4se8m_3h3_design3_COVID-19_SAVE_ALL_OUT_902883_2
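
If you want to know what share of your queued tasks carry the tag, you can scan the task names. Here is a minimal Python sketch of that idea. It assumes you have already copied the task names out of the BOINC manager into a list; the second name below is made up for illustration, and the helper name covid_share is hypothetical.

```python
def covid_share(task_names):
    """Return (tagged, total) for task names that contain 'COVID-19'."""
    tagged = [name for name in task_names if "COVID-19" in name.upper()]
    return len(tagged), len(task_names)


names = [
    "2rg4se8m_3h3_design3_COVID-19_SAVE_ALL_OUT_902883_2",  # task name from this post
    "example_structure_prediction_123456_1",                 # made-up name, illustration only
]

tagged, total = covid_share(names)
print(f"{tagged} of {total} tasks are tagged COVID-19")
```

Keep in mind that an untagged task is not necessarily unrelated to COVID-19; the name is only a hint.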
 

biodoc

How is the boinc client on Linux? I might switch the cpu on my folding box to Rosetta.

It looks the same as the windows client.

For Linux Mint and perhaps other Debian-based systems:

sudo apt install boinc-client boinc-manager   # installs the client and the manager
sudo usermod -a -G boinc $(whoami)            # adds your username to the boinc group (log out and back in for it to take effect)
 

StefanR5R

How is the boinc client on Linux? I might switch the cpu on my folding box to Rosetta.
I don't know about Rosetta@home in particular, but several BOINC-based projects have applications which perform better on Linux than on Windows. At a small minority of BOINC projects, the Windows version is superior to the Linux version (due to the toolchains and libraries used by the application developers on the respective platforms), but this is really a vanishing minority. There was even the occasional project with a Windows application but no Linux application, but I don't recall any of those being active right now.

The BOINC client itself works alright on Linux. Many Linux distributions (basically all mainstream distributions) offer it right there in their package manager.

Windows users who migrate to Linux may occasionally run into difficulties when they want to fine-tune their BOINC setup, since the default BOINC installations run the client under a dedicated user ID and set correspondingly restrictive permissions on the BOINC data directory. But that's totally OK if you keep in mind that both Linux and Win NT are true multi-user operating systems; on NT and its descendants this is just not applied consistently, catering to folks who came there from DOS and Win 95... :-) Though indeed, things like file permissions become relevant only once you reach deeper into the innards of the BOINC client setup.
 

Markfw

Just to show you what happened when I switched my dual 7601 (128 threads) to Rosetta:
dDFccb9.png
 

blckgrffn

Diamond Member
May 1, 2003
9,110
3,028
136
www.teamjuchems.com
Is there an easy way to set the number of threads other than the "% of CPUs" setting? I'd like to declare it directly, if possible...

I think I figured it out. Percentages are hard. Also, when Rosetta/BOINC is running hard on the CPUs, this PC is unusable. :D
 

StefanR5R

blckgrffn said:
Is there an easy way to set the number of threads other than the "% of CPUs" setting? I'd like to declare it directly, if possible...
The boinc-client only takes percentages, and percentages are hard for us humans --- there is no way around either of these.

But there is a little trick which some people use:
Tell boinc-client that your computer has 100 CPUs. Then, each percent equals one CPU.
  • Suspend computation. (This step avoids overwhelming the computer during the next steps.)
  • Open C:\ProgramData\BOINC\cc_config.xml or /var/lib/boinc*/cc_config.xml in an editor.
    On Linux, you'd need sudo or gksudo unless you have taken measures to make your primary user able to access the boinc data files:
    sudo nano -w /var/lib/boinc*/cc_config.xml
    or
    gksudo gedit /var/lib/boinc*/cc_config.xml
    (I am writing boinc* because some installations have it at /var/lib/boinc, others in /var/lib/boinc-client by default.)
  • Within the options section, add ncpus or edit the existing ncpus entry. (Some installations have a fully populated cc_config.xml, others an empty one or none at all. If you don't have one and create it, make sure that it is a plain text file named exactly cc_config.xml, and that the boinc pseudo-user has permission to read it.)
    Edit it like so:
    <cc_config>
      <!-- various stuff -->
      <options>
        <!-- maybe more stuff -->
        <ncpus>100</ncpus>
      </options>
    </cc_config>
  • In boincmgr's advanced view, use Options -> Read config files.
    Tools -> Event Log should show you the effect immediately.
  • In Options -> Computing Preferences, set "Use at most __ % of the CPUs" to the number of threads you want BOINC to use, and Save. Again, the event log should show the effect.
  • Resume computation.
Of course if you are nuts like Mark and build computers with 128 or even more hardware threads, this doesn't work so well anymore.

PS, documentation on cc_config.xml: wiki link
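
If you would rather keep the real CPU count, the percentage can be worked out by hand. Here is a minimal Python sketch of the arithmetic. It assumes the client turns the percentage into a thread count by multiplying with the CPU count and rounding down (with a minimum of 1); that rounding rule is my assumption, so check the event log to confirm what your client actually does. Both function names are hypothetical.

```python
def percent_for_threads(threads_wanted, ncpus):
    """Smallest whole percentage that yields at least threads_wanted threads."""
    if not 1 <= threads_wanted <= ncpus:
        raise ValueError("threads_wanted must be between 1 and ncpus")
    # Integer ceiling of 100 * threads_wanted / ncpus, without floating point.
    return (100 * threads_wanted + ncpus - 1) // ncpus


def threads_for_percent(percent, ncpus):
    """Thread count under the ASSUMED rule: round down, but never below 1."""
    return max(1, ncpus * percent // 100)


# 36 of 44 threads, as on the Xeon E5 v4 mentioned earlier in this thread.
pct = percent_for_threads(36, 44)
print(pct, threads_for_percent(pct, 44))

# With ncpus faked to 100, the percentage is simply the thread count.
print(percent_for_threads(36, 100))
```

With the real count of 44 this gives 82 %, which rounds down to 36 threads; with ncpus set to 100 it is just 36 %, which is the whole point of the trick.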
 

Howdy

Senior member
Nov 12, 2017
572
480
136
@StefanR5R
Thanks for posting this, my TRs are running this now.

Hopefully a better upload and download experience than F@H.
 

StefanR5R


Howdy

Electricity will be kept in check: I'm switching over to Rosetta and, sadly, closing down F@H. My F@H machine is sitting with completed work and failing to upload, much less download (all understandable). Rosetta is working flawlessly, as you state (although this could obviously change). I believe power consumption for the TR will be slightly less than for my GPUs. Either way, I feel I need to do something with this equipment, even if I'm limiting my participation.
 

biodoc

blckgrffn said:
Is there an easy way to set the number of threads other than the "% of CPUs" setting? I'd like to declare it directly, if possible...

I think I figured it out. Percentages are hard. Also, when Rosetta/BOINC is running hard on the CPUs, this PC is unusable. :D


I limit the number of threads for a project by creating an app_config.xml file in that project's directory under the BOINC data directory.

<app_config>
<project_max_concurrent>16</project_max_concurrent>
</app_config>

Just replace the number 16 with the number of threads you want to allocate to Rosetta tasks. Then save the file, and in the BOINC manager go to 'Options' and choose 'Read config files' in the drop-down menu.

The problem with this method is that each project folder needs an app_config.xml file and sometimes I forget about it and wonder why I'm only crunching 14 tasks on a 16 thread processor. :)
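
Since every project folder needs its own file, generating it from a script can save some typing. Here is a minimal Python sketch, using only the standard library, that builds the same XML as above. The function names are hypothetical, and the project directory is passed in as a parameter because its location differs between installations.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_app_config(max_concurrent):
    """Return app_config.xml content that limits a project's concurrent tasks."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


def write_app_config(project_dir, max_concurrent):
    """Write app_config.xml into project_dir (the caller supplies the path)."""
    target = Path(project_dir) / "app_config.xml"
    target.write_text(build_app_config(max_concurrent) + "\n")
    return target


print(build_app_config(16))
```

After writing the file you still need 'Read config files' in the BOINC manager, and on Linux the script needs write permission for the project directory.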
 

StefanR5R


StefanR5R

From the other thread:
Btw, anyone know how Rosetta have been handling new donors? And how many they're getting?
See the tweet above in #17, or the all-users credit graph in #14.

Current count of active users according to boincstats:

Screenshot_20200327_131300.png

The server can probably take more. :-)

--------
Edited to show active users, not total users.
Note, the y-axis of the graph begins at 21k, not at zero.
 

StefanR5R

From the other thread:
I saw a post on the Rosetta forums that they have had their active users more then double and they were running out of work units. They are working to adjust to the new demand.

Sounds familiar , eh?
Where does it say they are running low on work? I haven't found it in their forum, but didn't dig deeply.

BTW, Rosetta@home has a feature not seen at many other BOINC projects: The target run time of the work units can be configured at the user page in Rosetta@home preferences. Default is 8 hours, but it can be set as high as 24 hours. The client then issues server requests less often, obviously. (I have set up mine to 16 hours currently.) A single work unit may analyze several models, one after another, and increasing the target run time means more models are tried by the same work unit until next server contact.
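
To get a feeling for how the target run time changes the number of tasks a host goes through, here is a rough Python sketch. It assumes every task runs for exactly the target run time, which real tasks only approximate, and it counts tasks rather than server requests, since the client may fetch several tasks per request. The function name is hypothetical; the 36 threads are the ones I mentioned for my Xeon E5 above.

```python
def tasks_per_day(threads, target_hours):
    """Tasks finished per day if each task runs exactly target_hours on one thread."""
    return threads * 24 / target_hours


for hours in (8, 16, 24):
    print(f"{hours:>2} h target: {tasks_per_day(36, hours):.0f} tasks per day")
```

So going from the 8 hour default to 16 hours halves the number of tasks this host has to download and report, from 108 to 54 per day under these assumptions.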

BTW,
February 26: 23 k tasks ready to send, 332 k tasks in progress
March 26: 20 k tasks ready to send, 1.2 M tasks in progress
(source: @Kiska's server status tracker)

Edit:
I saw a few interesting forum posts by the researchers.
Admin said:
Message 92229 - Posted: 24 Mar 2020, 18:19:41 UTC
Last modified: 25 Mar 2020, 1:58:58 UTC

We are currently trying our best considering what is happening around here with COVID-19. We are in Seattle, which is among the worst hit regions. Some are working in the lab related to COVID-19 and supporting staff even at risk of exposure, for example, the IPD core production group has been busy making COVID-19 proteins, an antibody known to bind, and the human ACE2 receptor target for research groups around the world. Relating to this project, we are trying to update Rosetta to the latest source so we can start running protein interface design jobs to design binders to COVID-19. This has already been done at a smaller scale on our local computing resources and we have some hopeful candidates that are already being worked on within the wet lab. The hope is to have many more candidates as that increases the chance of having a legitimate binder with the potential to develop and optimize for a therapeutic.

ALL COVID-19 related jobs that come out of our lab are named with the COVID-19 tag. Jobs from other institutions running through our structure prediction service, Robetta, may or may not be labeled, which is at the discretion of the individual researchers which we have no control over.

Thank you everyone for your contributions!

bcov said:
Message 92237 - Posted: 24 Mar 2020, 20:45:37 UTC
Last modified: 24 Mar 2020, 20:49:50 UTC

Hey, I'm one of the people on the other end here using Rosetta @ Home to do science for COVID-19.

The first thing I should say is that we only recently had the idea to use Rosetta @ Home for design calculations. Previously we've been focused on structure prediction. It's quite a bit more complicated to send out jobs that need to run once (design calculations) than it is to send out jobs that need to run thousands of times (structure prediction). This requires a lot more data to be sent as well as a lot more time spent book-keeping to make sure that everything is going through.

With that in mind, step 1 right now is to update the version of Rosetta on Rosetta @ Home to use the latest and greatest version. The structure prediction side of things is rather mature and doesn't need new features, whereas the interface design is brand new, unpublished, and rapidly being developed. We need new code that I and others have written in order to perform these calculations.


So the first question is, is the science benefitting from using Rosetta @ Home? To which the answer is a definite yes. Interface design requires first making scaffold proteins followed by designing them to make interfaces. Right now I'm designing scaffolds at an unprecedented scale with the help of Rosetta @ Home. To give you an idea, as soon as I switched, I gained 10x-100x more compute than what I had before.

The second question is, will more users help? And the answer there is definitely yes again. For all of this work, more sampling = better results. The more scaffold proteins I can make, the better the best scaffolds will be. And then, once we get interface design going, the more interfaces I can design, the better the best designs will be, and the higher the experimental success rate will be.

Additionally, we have a long list of target proteins we're interested in for COVID-19 (some of which are COVID-19 itself). The more compute we have, the faster and better we can work through these targets and the faster we'll be able to make new therapeutics. Now, I can't guarantee that we'll end up making a therapeutic that goes through clinical trials and makes it to the people, but I can guarantee that we'll make binding proteins to interesting target proteins which have the potential to be therapeutics against COVID-19.

bcov said:
Message 92277 - Posted: 25 Mar 2020, 16:00:45 UTC
Last modified: 25 Mar 2020, 16:04:34 UTC

I just wanted to add a comment here about GPU. We'd love to use GPUs and we're well aware of how powerful they are, but making Rosetta work on GPUs isn't as easy as just updating the website and the BOINC infrastructure.

Running on GPUs requires code that at the very CPU instruction level is doing the same thing millions of times. Like rendering every pixel on a screen or calculating the hash of a cryptocurrency function millions of times.

The problem with the Rosetta code base (which I should remind is 3M lines of C++ code), is that it was written well before the thought of ever using GPUs. Rosetta is this beautiful object-oriented programming playground where everything is just a pointer away with elegant recursive functions to make everything work and a super-fast monte-carlo simulated annealer to pick new amino acids. The problem is that everything I just described is inherently single-CPU, and trying to change this would involve rewriting the core of Rosetta.

Now, with that in mind, some of the big names in Rosetta code development started a project 2 years ago to get Rosetta to run on GPUs. As I said, this is a full rewrite from the ground up. They've made great progress and have gotten the score function and minimizer to work. But I think the packer gave them a bit of trouble. As I said, the packer is a long series of trial and error that simply doesn't parallelize well. I think they eventually settled on trying to design like 100 proteins at once rather than making it truly parallelizable.

But yes, we want GPUs just as bad as you do. And while we are making progress with the GPU Rosetta, they've only recreated about 1% of the whole infrastructure so far.

bcov said:
Message 92291 - Posted: 25 Mar 2020, 19:49:35 UTC - in response to Message 92284.
Laurent said:
I do OpenCL coding for a living and I'm between jobs.

Any 2-3 weeks tasks?
Unfortunately I don't think so. This is very much a long slog by the same people that wrote Rosetta the first two times. Much of the challenge here is trying to figure out how to redo the core algorithms but in parallel fashion (with performance being #1).

There might be GPU Rosetta jobs running someday, but they'll likely be doing very different tasks from what you see here. One of the goals of getting the GPUs going was for fast data-transfer to machine learning algorithms. So with that in mind, you might see some sort of deep-learning Rosetta thing in a few years.

The other issue with the GPU version is that like I said, there's a whole stack of science built on-top of the core code that would take thousands of hours to reproduce. The GPU version is very much a fork with different goals.
 

StefanR5R

It was referenced in this post:


i haven’t looked much for an official announcement, however.
This user may be mistaken. From @Kiska's tracker:
minimum tasks ready to send: 17.20 k today, 17.20 k this week, 17.20 k this month, 8.08 k this year.

Edit,
for what it's worth, I checked my boinc logs.
  • My Xeon E3 had several "No tasks sent" messages in it from January, one from March 24, and one from March 26.
  • My Xeon E5 has one such message from March 26.
  • Three i7-7700K which I more recently switched to Rosetta have no such message yet.
Each of the three occasions of "No tasks sent" in March was immediately followed by a client retry, which succeeded right away each time.
 

ZipSpeed

Golden Member
Aug 13, 2007
1,302
169
106
Just a quick FYI for those crunchers that 'set it and forget it'. I noticed these new Rosetta WUs are a bit more memory hungry. I actually had a few WUs stall because they ran out of memory. On my dedicated DC machines, I usually allow BOINC to use 80% of the RAM when the computer is in use, but there were a few rigs where I forgot to bump it up from 50 to 80.
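
To illustrate why the memory percentage matters, here is a small Python sketch that estimates how many tasks fit under the limit. All numbers in it are hypothetical: the 64 GB of RAM and the 1.5 GB per task are placeholders, not measurements, and the function name is made up. Plug in your own values.

```python
def tasks_that_fit(total_ram_gb, ram_percent, gb_per_task):
    """Number of tasks that fit into the share of RAM that BOINC may use."""
    budget_gb = total_ram_gb * ram_percent / 100
    return int(budget_gb // gb_per_task)


TOTAL_RAM_GB = 64   # hypothetical machine
GB_PER_TASK = 1.5   # hypothetical per-task memory use, not a measured value

for percent in (50, 80):
    fit = tasks_that_fit(TOTAL_RAM_GB, percent, GB_PER_TASK)
    print(f"{percent} % of RAM: room for about {fit} tasks")
```

With these placeholder numbers, 50 % leaves room for about 21 tasks and 80 % for about 34, so a machine with more threads than that would have tasks waiting for memory.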
 

Markfw

ZipSpeed said:
Just a quick FYI for those crunchers that 'set it and forget it'. I noticed these new Rosetta WUs are a bit more memory hungry. I actually had a few WUs stall because they ran out of memory. On my dedicated DC machines, I usually allow BOINC to use 80% of the RAM when the computer is in use, but there were a few rigs where I forgot to bump it up from 50 to 80.
Thanks! I just checked and my 7742 was set to 50%. I changed it to 90% and now it's using 96 GB!
 

Markfw

Question... I have ONE box where, when I enable Rosetta, the F@H GPU client says "Interrupted" and restarts.

BOINC is configured to use 80% of the CPUs, and NO GPU is allowed.

And why does this only happen on this box, when I have others with Rosetta on CPU and F@H on GPU with no problem?

Ideas?