Question: Rosetta@home now requires VirtualBox on Ubuntu?? Linux/VirtualBox experts needed to get more than 16 tasks!


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
I tried to start Rosetta back up after the WCG 17th Birthday event, but I got the error in red: "VirtualBox is not installed". After my Google research failed, I typed

sudo apt-get install virtualbox

and rebooted. Now it's working, but it says "rosetta python projects (vbox64)".

So the question is: why is it now required, and is it faster or slower? And what a pain to have to do this on every box that runs Rosetta now.
 

Endgame124

Senior member
Feb 11, 2008
954
669
136
Overprovisioning sounds tempting. I learned from @Endgame124 that this is even possible in a single client instance, by configuring >100% allowed RAM usage via an edited global_prefs_override.xml. *However*, based on what I saw at other projects which use vboxwrapper (I haven't tested this new Rosetta application yet), you definitely want to maintain a very responsive system at all times. Otherwise you might frequently see tasks which get stuck at "Postponed: VM job unmanageable, restarting later". Such tasks can then only be aborted, IME, with the corresponding loss of the task's computation up to that point. (Edit: There may be workarounds to get such tasks back up and running, but they don't seem reliable or feasible to automate.) Therefore I have doubts that overprovisioning RAM is the way to go with a vboxwrapper-based application like this.
Note that overprovisioning RAM works best when combined with compressed swap via zram on Linux. It prevents tremendous slowdowns or system crashes by dynamically converting part of RAM into compressed swap space in RAM. It has a performance penalty, but that only kicks in when your tasks exceed physical memory. It used to be the only reliable way to get 4 Rosetta tasks running on a 4 GB Pi.
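
For reference, one way to set zram up on Debian/Ubuntu-based systems, including Raspberry Pi OS, is the zram-tools package; a minimal sketch, with the caveat that package and service names vary by distribution:

Code:
# assumes Debian/Ubuntu with the zram-tools package
sudo apt install zram-tools
# size and compression settings live in /etc/default/zramswap
sudo systemctl restart zramswap
# verify that the zram device is active as swap
swapon --show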
 

Icecold

Golden Member
Nov 15, 2004
1,090
1,008
146
I would assume that with the changes, the Pis now don't get any work, since the only work available is VirtualBox Python tasks? I know that's not totally relevant to what you posted about zram, but I was curious whether the Pis are still usable on Rosetta.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
Note that overprovisioning RAM works best when combined with compressed swap via zram on Linux. It prevents tremendous slowdowns or system crashes by dynamically converting part of RAM into compressed swap space in RAM. It has a performance penalty, but that only kicks in when your tasks exceed physical memory. It used to be the only reliable way to get 4 Rosetta tasks running on a 4 GB Pi.
I got the "Postponed: VM job unmanageable, restarting later". on 2 jobs. This was on an EPYC with only 16 tasks running and 128 threads, and 128 gig ECC registered memory.. I just deleted the project from all my computers. To hell with Rosetta.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Note that overprovisioning RAM works best when combined with compressed swap via zram on Linux. It prevents tremendous slowdowns or system crashes by dynamically converting part of RAM into compressed swap space in RAM. It has a performance penalty, but that only kicks in when your tasks exceed physical memory. It used to be the only reliable way to get 4 Rosetta tasks running on a 4 GB Pi.
I had a very positive experience with overprovisioned RAM, yet without zram, at Amicable Numbers. But right, perhaps the lag which @Icecold described can be eliminated with zram.

I got the "Postponed: VM job unmanageable, restarting later". on 2 jobs. This was on an EPYC with only 16 tasks running and 128 threads, and 128 gig ECC registered memory.
Maybe this is because the client monitors the RAM usage while the tasks are running, and can then only react *after* RAM usage went too high, i.e. after a situation with large latency due to swapping to disk already began.

At Cosmology@home with its vboxwrapper-based camb_boinc2docker application, these "Postponed..." incidents happen all the time (even though there is no problem with RAM requirements there). Whenever I ran Cosmology@home, I had a monitoring script running which detected such tasks in a simplistic way and automatically aborted them. This enabled me to run Cosmology@home quite effectively, and unattended.

I just deleted the project from all my computers. To hell with Rosetta.
I just had a look at https://boinc.bakerlab.org/rosetta/prefs.php?subset=project and saw that there is no way to deselect the "rosetta python projects" application. :-(

It can be blocked, however, by either not installing VirtualBox in the first place, or by setting <dont_use_vbox>1</dont_use_vbox> in the <options> section of cc_config.xml. (Requires a client restart.)
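
For reference, a minimal cc_config.xml sketch (it goes into the BOINC data directory, e.g. /var/lib/boinc on many Linux installs):

Code:
<cc_config>
   <options>
      <!-- don't use VirtualBox, even if it is installed -->
      <dont_use_vbox>1</dont_use_vbox>
   </options>
</cc_config>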

That doesn't help though if Rosetta doesn't queue any work for the classic application... :-|
 

biodoc

Diamond Member
Dec 29, 2005
6,257
2,238
136
It can be blocked, however, by either not installing VirtualBox in the first place, or by setting <dont_use_vbox>1</dont_use_vbox> in the <options> section of cc_config.xml. (Requires a client restart.)
There is another way that I've never seen before. If you go to your account, then for "Computers on this account" choose "View". For each computer, choose "Details", and at the bottom you will see "VirtualBox VM jobs", where you can choose the "Skip" button. I'm not sure if it works, though.
That doesn't help though if Rosetta doesn't queue any work for the classic application... :-|
That is a problem. :(
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
There is another way that I've never seen before. If you go to your account, then for "Computers on this account" choose "View". For each computer, choose "Details", and at the bottom you will see "VirtualBox VM jobs", where you can choose the "Skip" button. I'm not sure if it works, though.

That is a problem. :(
All this started when I put my 4 regular Rosetta boxes back on, and for 2 days: NO work units. Also, the Skip button does NOT work. Goodbye, Rosetta.
 

biodoc

Diamond Member
Dec 29, 2005
6,257
2,238
136
Each task uses 2 cpu threads for a total of 14 threads
As it turns out, the default number of threads per rosetta python vbox task is 1. Several months back, I was experimenting with an app_config.xml and had set 2 threads per task. That app_config.xml file was still in place when I downloaded the new tasks in my post above. I decided to set up 2 boinc instances:

Primary boinc instance: 24 threads running WCG tasks
Second boinc instance: 8 threads running 8 rosetta python vbox tasks. I also added an app_config.xml to limit total project tasks to 8, in case Rosetta releases more tasks for the legacy application (see the sketch below).
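
In case anyone wants to replicate this, a minimal app_config.xml sketch; it goes into the projects/boinc.bakerlab.org_rosetta directory of the instance, and the app name below is an assumption (check the client's event log if it complains about an unknown app):

Code:
<app_config>
   <!-- run at most 8 tasks of this project at a time -->
   <project_max_concurrent>8</project_max_concurrent>
   <!-- threads per vbox task; app name is assumed, verify locally -->
   <app_version>
      <app_name>rosetta_python_projects</app_name>
      <plan_class>vbox64</plan_class>
      <avg_ncpus>1</avg_ncpus>
   </app_version>
</app_config>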
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Do you happen to know whether the RAM footprint changes with the thread count of the tasks?


(E.g. in the case of the vboxwrapper-based version of LHC@home's "ATLAS Simulation", the memory size of the VMs increases with their configured thread count, but sublinearly. Therefore a higher thread count per task allows for better CPU utilization at the same overall memory usage across all simultaneous tasks.)
 

biodoc

Diamond Member
Dec 29, 2005
6,257
2,238
136
Do you happen to know whether the RAM footprint changes with the thread count of the tasks?
I didn't collect RAM usage data when I was running 2 threads per task. At just over 50% complete, each single-threaded task has reserved 1.4 GB of RAM.

EDIT: A few of the single-threaded tasks just completed. The run time is the same whether I run single-threaded or 2-threaded tasks. CPU time is 2x the run time with 2-threaded tasks, but the points awarded are the same, so it makes sense to stick with single-threaded tasks.
 

biodoc

Diamond Member
Dec 29, 2005
6,257
2,238
136
Does this application perhaps use the same user-configurable runtime limit as the classic application?
No, unless there's a minimum run time of 6 hours. I have the run time for the legacy apps set at 4 hours, but the python tasks take ~6 hours. I've had a couple of "runners" that show longer run times but use essentially no CPU time, so I aborted those.
 

DrTechnical

Junior Member
Dec 7, 2021
1
0
6
I am running BOINC with Rosetta and WCG as my only two projects. My machine is an AMD Ryzen 2700 with 48 GB of RAM.

I was surprised the other day to see that BOINC has apparently auto-created 3 VirtualBox VMs to run the Rosetta tasks. I currently have 45 Rosetta tasks under BOINC, with only 3 currently running. Each of the 3 BOINC Rosetta VMs is configured for 6144 MB of RAM.

Is this the expected behavior for BOINC clients who are running Rosetta?
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
I am running BOINC with Rosetta and WCG as my only two projects. My machine is an AMD Ryzen 2700 with 48 GB of RAM.

I was surprised the other day to see that BOINC has apparently auto-created 3 VirtualBox VMs to run the Rosetta tasks. I currently have 45 Rosetta tasks under BOINC, with only 3 currently running. Each of the 3 BOINC Rosetta VMs is configured for 6144 MB of RAM.

Is this the expected behavior for BOINC clients who are running Rosetta?
I read that there are currently workunits which request 3, 6, or 8 GB RAM per task. So yes, that's currently expected behavior. There are still some workunits generated for the native Rosetta application, which does not use vbox and does not have this extreme RAM footprint, but those few workunits are hard to come by.

As others mentioned, it may be possible to run a few more tasks at once than there is physical RAM to fit them all in, if the client is tweaked to assume that more RAM is there (either edit global_prefs_override.xml to say that boinc is permitted to use >100% of RAM, or run multiple client instances which together are allowed to use >100% of RAM), and perhaps if you don't just rely on swap space on disk but use zram (on Linux). The reason why this may work is that the application within the virtual machine actually accesses quite a bit less RAM than is allocated for the virtual machine.
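
To make the >100% tweak concrete, here is a minimal global_prefs_override.xml sketch (the 120% value is purely illustrative; the client re-reads the file after "Read local prefs file" or a restart):

Code:
<global_preferences>
   <!-- pretend boinc may use more RAM than is physically present -->
   <ram_max_used_busy_pct>120</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>120</ram_max_used_idle_pct>
</global_preferences>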

A less risky way is to mix work from different projects (e.g. 2 "rosetta python projects" tasks plus X World Community Grid tasks). But this isn't quite easy to configure either. If I want to do things like that, I use separate boinc client instances for each active project, because the boinc client just won't do the right thing for multiple active projects if left to its own devices. :-(

PS: Welcome to the forum! :-)
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
It can be blocked, however, by either not installing VirtualBox in the first place, or by setting <dont_use_vbox>1</dont_use_vbox> in the <options> section of cc_config.xml. (Requires a client restart.)

That doesn't help though if Rosetta doesn't queue any work for the classic application... :-|
There are now more tasks queued for Rosetta 4.20 again than for rosetta python projects, according to server_status.

If you block vbox work with the mentioned cc_config option (I haven't tried it myself yet), it may be worthwhile to set "Target CPU run time" in the Rosetta@home preferences to no less than 8 hours, so that you don't need too many tasks to remain busy on Rosetta 4.20 alone.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
I have been running Rosetta on one computer since this morning.

When rosetta python projects tasks turn into the "Postponed: VM job unmanageable, restarting later" state, they can be recovered by a shutdown + restart of the client. They then proceed from where they stopped and eventually finish... or may soon turn into "Postponed..." zombies again.

The frequency at which this happened on this computer was ridiculous. It happened orders of magnitude more frequently than at Cosmology@home when I last ran it.

Conclusion: This application doesn't work for me.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
I have been running Rosetta on one computer since this morning.

When rosetta python projects tasks turn into the "Postponed: VM job unmanageable, restarting later" state, they can be recovered by a shutdown + restart of the client. They then proceed from where they stopped and eventually finish... or may soon turn into "Postponed..." zombies again.

The frequency at which this happened on this computer was ridiculous. It happened orders of magnitude more frequently than at Cosmology@home when I last ran it.

Conclusion: This application doesn't work for me.
Ditto. After being split 50/50 between WCG and Rosetta (CPU-wise) for years, I quit. If an application does not work right, they don't get my free CPU cycles.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
You can uninstall VirtualBox, or set <dont_use_vbox>1</dont_use_vbox> in cc_config.xml. I will run Rosetta with the latter on one computer for a while now.

(On a computer which is never supposed to go idle, a backup project should of course be configured for the possibility that the Rosetta 4.20 work supply dries up again. But this is no different from the times before the introduction of the vboxwrapper-based application.)
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
The current status of Rosetta@home appears to be that there are
  • batches of classic Rosetta work becoming available at some "happy hours" almost every day recently,¹ which are quickly slurped up by users who have their clients set up to pull Rosetta work whenever possible (on hosts without vbox, or with vbox disabled),
  • batches of the newer vbox-based work available at all times of day, but supported by far fewer contributors than the classic Rosetta workqueue.
¹) Or perhaps "happy minutes", rather.

That is, those who want to run classic Rosetta still can, but need to jump through hoops with work buffering due to its intermittent availability. I haven't tried this myself, but the steps to take in order to run classic Rosetta are evident:
  • Do not install VirtualBox, or set <dont_use_vbox>1</dont_use_vbox> in cc_config.xml.
  • Set a large enough work buffer. But don't overdo it; the reporting deadline is 3 days.
  • Do not run any other project in the same client, or if you do, set the other project(s) to 0 % resource share, or set up a limit on tasks in progress for these projects. (The latter is only possible at a few projects as a web preferences option.)
  • Run a script which periodically reminds the client to 'update' the Rosetta project (I haven't investigated what period would be good to set; see the sketch after this list), or run a client with modified source code which does not prolong the request backoff period too much (I never tested this approach myself).
  • Optionally, set the "Target CPU run time" at the Rosetta@home web preferences to more than the default 8 hours, such that the tasks which you can get last longer.
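
For the periodic 'update' reminder mentioned in the list, a minimal sketch, assuming boinccmd is in the PATH and talks to a local client; the 10-minute period is an arbitrary, untested guess:

Bash:
#!/bin/bash
# Periodically ask the client to contact Rosetta@home for work.
project_url="https://boinc.bakerlab.org/rosetta/"
while true
do
	boinccmd --project "${project_url}" update
	sleep 600	# adjust the period to taste
done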

If you go for such a setup but want some safeguards against idle time, you could activate one or another additional project in the same client but set them to 0 % resource share, such that they don't fill the work buffer which you want to reserve for Rosetta. Or, speaking theoretically, you could set up another client instance and have a watchdog script monitor the number of running tasks in the Rosetta instance and reduce or increase the number of active CPUs in the other instance accordingly.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

So the above was all about running the classic Rosetta application under the current circumstances. But now about running the newer VirtualBox based Rosetta application, a.k.a. "rosetta python projects":

I had my computers off during the week but reactivated two of them yesterday. CPDN would have been my project of choice but doesn't have Linux work at this time, and so I decided to go for a mix of QuChemPedIA and the awful Rosetta Python Projects.
So the main issues with the latter are that these tasks require a larger-than-usual amount of RAM, that they may make the computer less responsive, and that they have a tendency to fall over into the notorious "Postponed: VM job unmanageable, restarting later" state. I believe the latter happens especially if (or rather: when) the computer is not highly responsive.

Right now I am using two Linux computers for the QuChem + Rosetta mix, both with the same hardware and boinc settings. Each computer has 64 cores/128 threads, 256 GB RAM, and NVMe flash storage. The nice thing about bigger computers is that there is more flexibility in how they can be used for parallel workloads.

Using two client instances
I run QuChem's native Linux "nwchem" application in one client instance, and "rosetta python projects" in another client instance. This lets me control the number of running tasks and the number of downloaded tasks for each of these projects separately. (I do like being in control a lot.)
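
For those who haven't run multiple instances before: the client supports this out of the box. A minimal sketch with an illustrative data directory and RPC port (both are assumptions, not my actual setup):

Code:
# start a second client with its own data directory and GUI RPC port
mkdir -p /var/lib/boinc2
boinc --dir /var/lib/boinc2 --gui_rpc_port 31418 --allow_multiple_clients &

# manage it by pointing boinccmd at that port, e.g. to attach a project:
boinccmd --host localhost:31418 --project_attach https://boinc.bakerlab.org/rosetta/ YOUR_ACCOUNT_KEY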

No swapping to disk
The computers have a swap disk, but I switched swap off for the time being, in order to avoid huge latencies in case memory pages were swapped out to disk.
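
For the record, switching swap off on a running Linux system is a one-liner; it stays off until reboot or until swapon -a:

Code:
sudo swapoff -a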

RAM utilization
QuChem's nwchem tasks occupy about 150 MB of resident memory each. "rosetta python projects" tasks take about 1.5...1.8 GB of resident memory each at this time. This fits easily into the mentioned 256 GB RAM of my computers, but obviously the operating system needs RAM for other purposes too. By far the largest chunks of RAM are used for filesystem caches, as long as there aren't more pressing uses for it.

With 56 running QuChem tasks + 8 running Rosetta vbox tasks:
Code:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi        20Gi       147Gi        15Mi        84Gi       229Gi
Swap:            0B          0B          0B
With 48 running QuChem tasks + 16 running Rosetta vbox tasks:
Code:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi        32Gi        74Gi        15Mi       144Gi       217Gi
Swap:            0B          0B          0B
Note that "buff/cache" memory is also part of the total "available" memory here. However, since I want to keep the computers highly responsive, I do want to give the tasks as much "buff/cache" as they want.

As you can tell, I could still increase the number of Rosetta vbox tasks before I run out of RAM for filesystem cache.

Furthermore, I am obviously running merely 64 tasks in total on a 64c/128t computer. This is an arbitrary decision, favoring (a) high responsiveness of course, and (b) throughput per task, over total machine throughput. In particular, these are dual-socket computers, set to 180 W socket power limit, hence 5.6 W average power budget per core (and since I run 64 tasks, 5.6 W power budget per task). That's not a very high power budget, and hence I don't expect the loss of machine throughput to be very high.

Even if I wanted to go for a higher task count, I certainly would not run 128 concurrent tasks, but would rather leave a few spare hardware threads in reserve for whenever the various wrapper processes, which both "nwchem" and "rosetta python projects" keep around, wake up.

I still want to refine my setup for Rosetta a bit
To do:
  • Implement a watchdog script which detects Rosetta tasks in "Postponed: VM job unmanageable, restarting later" state, and shuts down and restarts the boinc client instance in such cases.
    IME, client shutdown and restart is both necessary and sufficient to get such tasks going again.
    I have been running "rosetta python projects" work for somewhat more than a day now and got a few of these incidents on both computers during this time, despite the measures which I described above.
  • Implement a watchdog script which detects Rosetta tasks with unusually long elapsed time, and aborts those.
    During the 1+ day of running this, I got one task with now 23+ hours elapsed time and another with now 12+ hours elapsed time.
    Of the 100+ already completed tasks (all valid), the run times on my computers are
    3.3 h on average,
    2.3 h as the 10th percentile, 4.3 h as the 90th percentile,
    1.8 h min, 5.5 h max.
    Therefore, aborting a task after maybe 10 h seems like a good idea. But a much better idea would be: abort tasks whose current CPU time is considerably less than their elapsed time.
    That's because the fundamental property of these very long running tasks is that they are using almost no CPU time.
  • Unrelated to Rosetta: Have a watchdog script abort QuChem tasks with unusually long elapsed time. Occasionally, this application simply doesn't converge but isn't able to abort its iterations, from what I understand. Such tasks will blissfully continue to run forever if left alone.
 

crashtech

Lifer
Jan 4, 2013
10,521
2,111
146
I haven't spent any time on the Rosetta forums, but it seems as if this new app is going to drive away all but the most dedicated participants. I hope they are working on a fix.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
I haven't spent any time on the Rosetta forums, but it seems as if this new app is going to drive away all but the most dedicated participants. I hope they are working on a fix.
Yes, I am number 29 on the project, and they lost me. Half a million points per day, gone. I am not coming back until VirtualBox is not required. I remember ages ago I upgraded all my boxes so that each thread could have 2 GB. They require up to 8 GB now for VirtualBox? Only 14 of the top 29 are doing ANYTHING now. That must hurt. And of those, I could be number 2 in points per day worldwide if it were the old days.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Only 14 of the top 29 are doing ANYTHING now.
BTW, the top producers still run the classic application only.

I am not coming back until VirtualBox is not required.
It is only a "soft" requirement.

As I said, new work for it is still generated each day, but as soon as it is ready to send, it is already downloaded by hosts which happen to request work at that time.
server_status shows 1500 users of the classic application in the last 24 hours, and 950 of the vbox application. (My user ID counts toward both, because while I allow reception of vbox work, I also receive some classic work occasionally.)

AFAIK, different teams of scientists submit work to the Rosetta platform whenever they have some. So perhaps Rosetta@home, with its classic Rosetta workqueue and its new vbox-based workqueue, is somewhat similar to WCG: there is a common platform which hosts several scientific projects. (Though, very much unlike WCG, we contributors are never informed which project goes live when, and for how long.) — From this viewpoint, those among Rosetta@home's projects which still feed the classic Rosetta workqueue are very much like WCG's OpenPandemics project: blessed with much more contributed computer capacity than there is work to submit.

They require up to 8 GB now for VirtualBox?
The vbox work currently consumes ~1.5...1.8 GB peak working set size (i.e. resident memory = actual physical RAM), ~3.3 GB peak swap size (i.e. virtual memory), and ~8 GB peak disk usage. These figures are from Linux; I haven't looked them up on Windows.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
To do:
  • Implement a watchdog script which detects Rosetta tasks in "Postponed: VM job unmanageable, restarting later" state, and shuts down and restarts the boinc client instance in such cases.
This can actually be simplified: Just restart the client twice a day, no matter what.

On my clients with 16 running Rosetta tasks, there will always be several 'postponed' tasks after half a day of operation. Hence, just let e.g. cron issue a service boinc-client restart twice per day (see the sketch below).
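
For example, as a minimal /etc/cron.d sketch (the file name, times, and service name are assumptions; adjust them to your distribution):

Code:
# /etc/cron.d/boinc-restart: restart the client at 06:00 and 18:00
0 6,18 * * *	root	service boinc-client restart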

(One positive thing to say about this application is that checkpointing works alright.)
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
To do: [...]
  • Implement a watchdog script which detects Rosetta tasks with unusually long elapsed time, and aborts those. [Or rather:] Abort tasks whose current CPU time is considerably less than their elapsed time. That's because the fundamental property of these very long running tasks is that they are using almost no CPU time.
Here is a script. It is not very well tested yet, since I have been receiving a good deal of classic Rosetta work along with the vbox-based work lately, so the frequency of tasks which exhibit the ultra-low-CPU-time condition is currently rather low. But so far the script does what it is meant to do.

A fairly recent "boinccmd" program version is required for this script to work. I only know that the boinccmd bundled with client version 7.16.6 does *not* work (it doesn't show the elapsed time of tasks), but the one bundled with client version 7.16.17 does. (That's only important for boinccmd; the boinc client which actually runs Rosetta@home can be an older one.) The script doesn't have any dependencies except boinccmd.

Bash:
#!/bin/bash

# Edit this:
#    a list of hosts, each optionally with GUI port number appended
#    (may be just a single host, or dozens of hosts)
hosts=(
	"computer_a"
	"computer_b"
	"computer_c:31420"
)

# Edit this:
#    the password from gui_rpc_auth.cfg
#    This script expects the same password on all hosts.
#    Can be set to "" if you have empty gui_rpc_auth.cfg's.
password="$(cat /var/lib/boinc/gui_rpc_auth.cfg)"

# Edit this if you want to apply this to a different project.
project_url="https://boinc.bakerlab.org/rosetta/"

# Change this from "abort" to "suspend" if you prefer.
task_op="abort"

# Until a task has been executing for some time, its task stats
# may still be imprecise.  The script therefore does not touch any
# tasks which haven't been executing for at least this many seconds.
# You can use integer numbers here, but not floating point numbers.
# E.g.: 5 * 60 for 5 minutes.
min_elapsed_time=$((5 * 60))

# After tasks were aborted, boinc-client may cease to request
# new work due to "Communication deferred". To avoid this, should a
# project update be forced after one or more tasks were aborted?
# Set to 1 for yes, 0 for no.
force_project_update=1

# Loop intervals.
# You probably don't need to edit these.
check_every_n_minutes=10
timestamp_every_n_minutes=120

# That's it; there is probably no need to edit anything from here on.
delay=$((${check_every_n_minutes}*60/${#hosts[*]}+1))
ts=${timestamp_every_n_minutes}

echo "Monitoring ${hosts[*]}."
for ((;;))
do
	(( (ts += check_every_n_minutes) >= timestamp_every_n_minutes )) && { date; ts=0; }

	for host in ${hosts[*]}
	do
		# Edit this if you run on Cygwin:
		#    boinccmd="/cygdrive/c/Program*Files/BOINC/boinccmd --host ${host} --passwd ${password}"
		if [ -n "${password}" ]
		then
			boinccmd="boinccmd --host ${host} --passwd ${password}"
		else
			boinccmd="boinccmd --host ${host}"
		fi

		tasks=$(${boinccmd} --get_tasks) || { sleep ${delay}; continue; }

		unset name url state ett cct
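		# Parse boinccmd's --get_tasks output: each task is printed as a
		# numbered header like "1) -----------", followed by indented
		# "key: value" lines which the case patterns below pick apart.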
		while read line
		do
			case ${line} in
			                [1-9]* )     i=${line%)*};;
			             "name: "* )  name[$i]=${line#*"name: "};;
			      "project URL: "* )   url[$i]=${line#*"project URL: "};;
			"active_task_state: "* ) state[$i]=${line#*"active_task_state: "};;
			"elapsed task time: "* )   tmp=${line#*"elapsed task time: "}; ett[$i]=${tmp%.*};;
			 "current CPU time: "* )   tmp=${line#*"current CPU time: "};  cct[$i]=${tmp%.*};;
			esac
		done <<< "${tasks}"

		n=0
		for j in ${!name[*]}
		do
			# Skip tasks
			#   - which do not belong to this project,
			#   - which are not currently running,
			#   - which have been running for less than $min_elapsed_time seconds,
			#   - which have a CPU time of more than 50% of elapsed time.
			[ "${url[$j]}"   != "${project_url}" ] && continue
			[ "${state[$j]}" != "EXECUTING"      ] && continue
			e=${ett[$j]}; ((e < min_elapsed_time)) && continue
			c=${cct[$j]}; ((e < 2*c)) && continue

			printf "${host}: ${task_op} ${name[$j]}\t"
			printf "(elapsed: %02d:%02d:%02d," $((e/3600)) $((e%3600/60)) $((e%60))
			printf " CPU: %02d:%02d:%02d)\n"   $((c/3600)) $((c%3600/60)) $((c%60))
			${boinccmd} --task "${project_url}" "${name[$j]}" "${task_op}"
			((n++))
		done

		((force_project_update && n)) && { sleep 1; ${boinccmd} --project "${project_url}" update; }

		sleep ${delay}
	done
done

PS: Besides aborting these infinitely long running tasks, I am not aware of any other means to heal this kind of task. E.g. client shutdown and restart does not help.

When the aborted task is reported to the server, the server sends a replica to another host, and there is a good chance that the other host completes it properly.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
server_status.php is still showing 0 tasks available for the classic application's workqueue. Yet more than half of the tasks in the work buffers of my two computers are now for the classic application. And that's even though these computers don't request work often: they have a few "postponed" vbox tasks sitting there much of the time, and then won't request new work at all.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Lately I noticed that it might be possible to enable/disable VirtualBox support per project, per client instance — to a certain degree.

In a project in which you want to disable VirtualBox, log in to your account on the project's web server, go to the host details, and look for a button which is labeled "VirtualBox VM jobs [Skip]". This button is implemented at Rosetta@home, but I don't see it at any of the other projects which use VirtualBox for some of their applications (Cosmology@home, LHC@home, QuChemPedIA).

I haven't actually tried the [Skip] button, therefore can only speculate what it does precisely. :-)