Sometimes you are working on a less-than-stable BOINC project which gives you tasks that won't finish and that clog up your host. Short of abandoning such a project, there are some measures you can take:
- Take leave from your day job, check your rig all day for faulty tasks, and clean them out as they occur.
- Rent a monkey from the local zoo and train it to do the job for you.
- Run a script which keeps watching for such tasks and deals with them.
Suspend endless Universe@Home tasks
The problem:
Last year in November, during the Formula Boinc sprint at Universe@Home, I and others frequently encountered tasks which ran normally for a while and then got stuck: they kept running without making any more progress, i.e. the progress bar in boincmgr no longer advanced.
The workaround:
Many of these tasks would get going again if they were suspended and then resumed while the option "Leave non-GPU tasks in memory while suspended" was switched off. Sometimes they would get stuck again after resumption, in which case they should be aborted.
I have not checked whether Universe@Home is still plagued by this problem. (Recent forum posts indicate that it is.) For reference, and as inspiration for what can be done by scripting, I am reposting a bash script which Luigi R. of team BOINC.Italy posted at the Universe@Home forum in September. (I have modified the following code marginally.)
The script detects Universe@Home tasks which no longer increase their progress percentage, suspends them, and logs those tasks.
Code:
#!/bin/bash
# Slightly modified from Luigi R.'s script,
# https://universeathome.pl/universe/forum_thread.php?id=199&postid=2406
#
# Usage: ./suspend_endless_universe_tasks.sh [host[:port] [password [interval]]]
host=${1:-localhost}
password=${2:-mysupersecurepassword}
boinccmd="boinccmd --host ${host} --passwd ${password}"
universeathome_url="https://universeathome.pl/universe/"
if [[ -z `echo $($boinccmd --get_simple_gui_info)` ]]
then
    echo "BOINC is not running. Exit..."
    exit
fi
name_old=()
fraction_done_old=()
faulty_tasks=()
start_time=$(date +%s)
iter=0
interval=${3:-600}
while true; do
    # Time vars
    time=$(date +%H':'%M':'%S)
    now_time=$(date +%s)
    script_time=$(echo "$now_time - $start_time" | bc)
    script_time_str=$(printf '%03dd:%02dh:%02dm:%02ds\n' $(($script_time/86400)) $(($script_time%86400/3600)) $(($script_time%3600/60)) $(($script_time%60)))
    iter=$((iter+1))
    reset
    echo -e "${host} | Time: ${time} | Execution time: ${script_time_str} | Iteration N.${iter} | Interval: ${interval}s\n"
    ###
    # BOINC vars
    name=(`echo $($boinccmd --get_tasks | grep -v 'WU name'| grep 'name' | awk '{print $2}') | cut -d " " -f 1-`)
    project_url=(`echo $($boinccmd --get_tasks | grep 'project URL' | awk '{print $3}') | cut -d " " -f 1-`)
    active_task_state=(`echo $($boinccmd --get_tasks | grep 'active_task_state' | awk '{print $2}') | cut -d " " -f 1-`)
    fraction_done=(`echo $($boinccmd --get_tasks | grep 'fraction done' | awk '{print $3}') | cut -d " " -f 1-`)
    ###
    # Loop vars
    ntasks=${#name[@]}
    noldtasks=${#name_old[@]}
    tmp_name=() # U@H names
    tmp_fraction_done=() # U@H fractions done
    ###
    # Loop
    for (( i = 0; i < ntasks; i++ )) do
        if [ ${active_task_state[$i]} == "EXECUTING" ]; then
            if [ ${project_url[$i]} == $universeathome_url ]; then
                #if [ "$noldtasks" == "0" ]; then # Case 1: no old tasks
                # echo -e "${name[$i]} | \e[1;32mOK\e[0m"
                #fi
                name_not_found=1
                for (( j = 0; j < noldtasks; j++ )) do # Case 2: old tasks exist
                    if [ ${name[$i]} == ${name_old[$j]} ]; then # Case 2a: executing task still in old tasks
                        name_not_found=0
                        if [ ${fraction_done[$i]} == ${fraction_done_old[$j]} ]; then
                            $boinccmd --task $universeathome_url ${name[$i]} suspend
                            echo -e "${name[$i]} | \e[1;31mFAULT\e[0m"
                            faulty_tasks+=("${name[$i]}")
                        else
                            echo -e "${name[$i]} | \e[1;32mOK\e[0m"
                        fi
                        break
                    fi
                done
                if [ "$name_not_found" == "1" ]; then # noldtasks == 0 is TRUE => name_not_found == 1 is TRUE
                    echo -e "${name[$i]} | \e[1;32mOK\e[0m" # Case 2b: new executing task, no match in old tasks
                fi
                tmp_name+=("${name[$i]}")
                tmp_fraction_done+=("${fraction_done[$i]}")
            else
                echo -e "${name[$i]} | \e[1;33mNot U@H\e[0m"
            fi
        fi
    done
    name_old=("${tmp_name[@]}")
    fraction_done_old=("${tmp_fraction_done[@]}")
    ###
    # Print faulty U@H tasks
    if [ ${#faulty_tasks[@]} -gt 0 ]; then
        echo -e "\n\e[0;31mFaulty U@H tasks (${#faulty_tasks[@]}):\e[0m\n"$( echo ${faulty_tasks[@]} | sed 's/ /,\n/g' )
    fi
    ###
    sleep $interval
done
The user still needs to resume or abort these tasks manually, but this is something which needs only infrequent attention.
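For that manual step, a small helper can save some typing. The following is only a sketch: it uses the same "boinccmd --task <url> <name> <operation>" form as the script, with resume and abort in place of suspend. The names task_action, BOINCCMD and UAH_URL are made up for this example, and the task name in the usage comment is a placeholder.

```shell
#!/bin/bash
# Sketch: resume or abort a task that the watcher script has suspended.
# BOINCCMD, UAH_URL and task_action are hypothetical names for this example.
# Add "--host ... --passwd ..." to BOINCCMD if you talk to a remote client,
# or set BOINCCMD="echo boinccmd" for a dry run that only prints the command.
BOINCCMD=${BOINCCMD:-boinccmd}
UAH_URL="https://universeathome.pl/universe/"

# task_action <task_name> <suspend|resume|abort>
task_action() {
    case "$2" in
        suspend|resume|abort) ;;
        *) echo "unknown action: $2" >&2; return 2 ;;
    esac
    # BOINCCMD is left unquoted on purpose so that extra options are split.
    $BOINCCMD --task "$UAH_URL" "$1" "$2"
}

# Usage (placeholder task name):
#   task_action some_task_name_0 resume
#   task_action some_task_name_0 abort
```

With the memory option mentioned above switched off, a resume restarts the task from its last checkpoint. If it stalls again, the same helper can abort it.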
There is a little bug in the script: in a single loop iteration, it calls "boinccmd --get_tasks ..." four times in order to collect the name, project, state, and completion percentage of each task. It should call boinccmd only once per iteration; otherwise there is no guarantee that task names match task progress and so on, because the task list can change between the calls. However, while I used this script, it worked perfectly for me despite this rather theoretical problem.
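One possible way around this, as a sketch rather than a tested replacement: take a single snapshot of the task list and let awk print one line per task, so that all four values come from the same call. It assumes that each task block in the output starts with a "name:" line and contains the "project URL:", "active_task_state:" and "fraction done:" lines which the script greps for. The helper name parse_tasks is made up for this example.

```shell
#!/bin/bash
# Sketch: turn ONE snapshot of "boinccmd --get_tasks" (read from stdin) into
# one line per task: <name> <project URL> <active_task_state> <fraction done>.
# A field that is missing from a task block is printed as "-".
# parse_tasks is a hypothetical helper name; the field labels are the ones
# the original script greps for.
parse_tasks() {
    awk '
        # Print the task collected so far, if there is one.
        function emit() {
            if (name != "") print name, url, state, frac
        }
        # A "name:" line starts a new task block ("WU name:" has $1 == "WU").
        $1 == "name:"                     { emit(); name = $2; url = state = frac = "-" }
        $1 == "project" && $2 == "URL:"   { url = $3 }
        $1 == "active_task_state:"        { state = $2 }
        $1 == "fraction" && $2 == "done:" { frac = $3 }
        END { emit() }
    '
}

# Usage:
#   boinccmd --get_tasks | parse_tasks |
#   while read -r name url state frac; do
#       [ "$state" = "EXECUTING" ] && echo "$name ($url) is at $frac"
#   done
```

Because every output line is built from one task block, a field that is absent for some task cannot shift the values of the other tasks, which is what could happen with four separately collected arrays.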
Kudos to Luigi R. for this script; it was a huge help to me when I ran Universe@Home.
Next up: A script of my own to deal with a different problem in Cosmology@Home.