News: Rosetta's role in fighting coronavirus


StefanR5R

Elite Member
There is no option to ignore disk space. There are only the three limits, which can optionally be set in the web preferences or locally. I haven't found documentation of the client's behavior if none of the limits is configured explicitly.

What does this show?
grep disk_m /var/lib/boinc*/global_prefs*
 

Markfw

Moderator Emeritus, Elite Member
StefanR5R said:
What does this show?
grep disk_m /var/lib/boinc*/global_prefs*
mark@dual-EPYC-7601:~$ grep disk_m /var/lib/boinc*/global_prefs*
/var/lib/boinc-client/global_prefs_override.xml: <disk_max_used_gb>0.000000</disk_max_used_gb>
/var/lib/boinc-client/global_prefs_override.xml: <disk_max_used_pct>100.000000</disk_max_used_pct>
/var/lib/boinc-client/global_prefs_override.xml: <disk_min_free_gb>0.000000</disk_min_free_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_gb>0</disk_max_used_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_min_free_gb>1</disk_min_free_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_pct>90</disk_max_used_pct>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_gb>8.0</disk_max_used_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_min_free_gb>4.0</disk_min_free_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_pct>10.0</disk_max_used_pct>
/var/lib/boinc/global_prefs_override.xml: <disk_max_used_gb>0.000000</disk_max_used_gb>
/var/lib/boinc/global_prefs_override.xml: <disk_max_used_pct>100.000000</disk_max_used_pct>
/var/lib/boinc/global_prefs_override.xml: <disk_min_free_gb>0.000000</disk_min_free_gb>
/var/lib/boinc/global_prefs.xml:<disk_max_used_gb>0</disk_max_used_gb>
/var/lib/boinc/global_prefs.xml:<disk_min_free_gb>1</disk_min_free_gb>
/var/lib/boinc/global_prefs.xml:<disk_max_used_pct>90</disk_max_used_pct>
/var/lib/boinc/global_prefs.xml:<disk_max_used_gb>8.0</disk_max_used_gb>
/var/lib/boinc/global_prefs.xml:<disk_min_free_gb>4.0</disk_min_free_gb>
/var/lib/boinc/global_prefs.xml:<disk_max_used_pct>10.0</disk_max_used_pct>
mark@dual-EPYC-7601:~$
 

StefanR5R

Elite Member
global_prefs.xml looks weird. It has each of the three limits occurring twice. I wouldn't have expected that.

global_prefs_override.xml looks better on paper, but the values there really just mean "accept the web settings", as far as I understand.

(global_prefs.xml reflects what the client read from the web settings via scheduler requests, from the most recently consulted project, AFAIK. global_prefs_override.xml shows what was set locally via boincmgr's advanced-view computing preferences.)

Try this: In boincmgr, set
[x] Use no more than 1000 GB
[x] Leave at least 10 GB free
[x] Use no more than 90 % total

If you make all three limits explicit there, then we can be sure that any web preferences at any of the projects to which the client is attached are ignored, and these locally configured limits are honored.
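
For reference, the same three limits can be set without the GUI by writing global_prefs_override.xml directly. A minimal sketch, assuming the Debian-style data directory seen in the grep output above; boinccmd's --read_global_prefs_override option is standard BOINC. Note this overwrites any other locally overridden preferences in that file, so the boincmgr route is the safer one:

# Write all three disk limits locally, then have the client re-read them.
sudo tee /var/lib/boinc-client/global_prefs_override.xml >/dev/null <<'EOF'
<global_preferences>
   <disk_max_used_gb>1000.000000</disk_max_used_gb>
   <disk_min_free_gb>10.000000</disk_min_free_gb>
   <disk_max_used_pct>90.000000</disk_max_used_pct>
</global_preferences>
EOF
boinccmd --read_global_prefs_override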
 

Markfw

Moderator Emeritus, Elite Member
StefanR5R said:
Try this: In boincmgr, set
[x] Use no more than 1000 GB
[x] Leave at least 10 GB free
[x] Use no more than 90 % total
OK, I set up both boxes that had the errors like that: the two big 64-core EPYC machines (dual 7601 and dual 7742).
 

Howdy

Senior member
Not to complain, but I will anyway. They need to get their points figured out: 10.33 hrs of crunching for 20 points. Or maybe it's just my system getting hammered? I'm having YoYo syndrome!!!
 

Assimilator1

Elite Member
Bizarre! Hmm, I haven't checked my PPD.....
Task time is set to 24 hrs, running on my Ryzen @ ~3.7 GHz. I don't have more valid tasks, as I was running a little LHC and I've got a bunch of R@H WUs partway through.
But with these I seem to be mostly getting ~1k credits for ~24 hrs (~42 cr/hr), although one of them was only 236 credits! :p
What is everyone else getting?


All tasks below ran on computer 1761696, and every one is "Completed and validated":

Task       | Work unit  | Sent                    | Reported                 | Run time (s) | CPU time (s) | Credit   | Application
1140765563 | 1026921387 | 6 Apr 2020, 2:18:11 UTC | 7 Apr 2020, 7:03:22 UTC  | 16,563.12    | 16,299.84    | 158.40   | Rosetta v4.12 windows_x86_64
1140780561 | 1026917366 | 6 Apr 2020, 2:16:25 UTC | 6 Apr 2020, 14:59:37 UTC | 41,681.22    | 41,523.16    | 478.02   | Rosetta v4.12 windows_intelx86
1140781660 | 1026917504 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 2:30:43 UTC  | 87,050.34    | 86,353.00    | 945.78   | Rosetta v4.12 windows_intelx86
1140763241 | 1026919695 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 2:27:51 UTC  | 86,963.73    | 86,244.50    | 880.41   | Rosetta v4.12 windows_intelx86
1140763259 | 1026919731 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 2:22:15 UTC  | 86,684.68    | 85,970.84    | 958.20   | Rosetta v4.12 windows_intelx86
1140763260 | 1026919733 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 4:49:40 UTC  | 85,113.77    | 84,549.01    | 1,202.19 | Rosetta v4.12 windows_x86_64
1140763262 | 1026919737 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 2:25:21 UTC  | 86,823.68    | 86,090.80    | 898.10   | Rosetta v4.12 windows_intelx86
1140763314 | 1026919732 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 2:28:41 UTC  | 87,030.25    | 86,301.02    | 945.56   | Rosetta v4.12 windows_intelx86
1140763315 | 1026919734 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 2:25:21 UTC  | 86,858.68    | 86,123.00    | 236.20   | Rosetta v4.12 windows_x86_64
1140763316 | 1026919736 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 4:50:52 UTC  | 86,676.17    | 86,099.90    | 1,192.48 | Rosetta v4.12 windows_x86_64
1140778965 | 1026916979 | 6 Apr 2020, 2:16:25 UTC | 7 Apr 2020, 4:44:35 UTC  | 86,973.83    | 86,447.16    | 1,043.72 | Rosetta v4.12 windows_intelx86
 

Assimilator1

Elite Member
Just wondering whether to cut task time to 12 hrs or less after seeing this:
(And I wonder if the task time affects the credits/hr?)

Computing status / Work:
Tasks ready to send: 29,791
 

StefanR5R

Elite Member
Credits per hour are designed to be independent of the target CPU time per task, according to the R@h message board post cited in #58. How well this works is hard to verify, given the wide variation in computational difficulty between models.

When you check server_status for "tasks ready to send", remember to scroll to the bottom to see x86 jobs ("Rosetta" and "Rosetta Mini") listed separately from ARM jobs ("Rosetta for Portable Devices"). So far it looks good; the work generator seems to keep up.

I am wondering, though, how far "Workunits waiting for assimilation" can still increase without affecting work generation, downloads, uploads, or validation. Check out the bottom graph at https://munin.kiska.pw/munin/rosetta-week.html.
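
For anyone who wants to watch those counters from a shell rather than the web page: BOINC servers usually expose the same numbers as XML. A rough sketch; the URL is R@h's standard server_status page, but the exact tag names are an assumption and may differ by server version:

# Pull "ready to send" and "waiting for assimilation" from the XML status export
# (tag names assumed; inspect the actual XML output first).
curl -s 'https://boinc.bakerlab.org/rosetta/server_status.php?xml=1' \
    | grep -E 'results_ready_to_send|workunits_waiting_for_assimilation'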
 

StefanR5R

Elite Member
I shouldn't need to introduce you to the finer points of assimilation ;-) but here you go:
davea said:
Completed jobs are handled by programs called assimilators. These are generally application-specific: they might copy output files from the BOINC upload directory to a permanent location, or they might parse the output files and insert results into a database.
(source)

Jorden said:
The assimilator handles workunits that are 'completed': that is, that have a canonical task or for which an error condition has occurred. Handling a successfully completed task might involve recording tasks in a database and perhaps triggering the generation of more work.
(source)

Mod.Sense said:
The "queued jobs" work units shown on the homepage is a queue of work coming from Robetta. Let's call those "jobs". But jobs must be processed in to BOINC "work units" before they can be sent out. The "buffer length" shown on the homepage refers to how far ahead BOINC work units should be created before the create work task goes to sleep. Suffice it to say that the make work task for Rosetta has not slept in a very long time. The task for portables reached the buffer max. and went to sleep.

I'm just trying to explain the numbers you see, please do not bother to offer advice about how to multi-thread or otherwise optimize the servers. There are many resources required to support the life of a WU, such as validation and assimilation, not to mention all of the disk space consumed. Work is being created as fast as possible.
(source)
 

StefanR5R

Elite Member
PS: in other words, I understand that a steadily increasing number of workunits waiting for assimilation means steady inflation of the BOINC server's database and ongoing depletion of the BOINC server's file storage space.
 

StefanR5R

Elite Member
I had two dual 14-core Xeon E5 v4 machines download tasks on April 6 at ~4:50 UTC. The two computers have the same hardware and the same operating system.

At the time, I had a 16 h target CPU time per task configured. After the first round of 16 h tasks finished, I set it back to the default 8 h and updated the project on both computers.

Today I am looking at completed results:
  • The first computer has 130 valid results + 15 errors.
    The valid results had 5.3 / 13.6 / 16.1 min/avg/max hours CPU time, and 18 / 26 / 47 min/avg/max credits per hour.
    I'll try to make a scatter plot.
  • The second computer has 53 valid results and 5 errors.
    The valid results had 20.02 / 20.03 / 20.05 min/avg/max hours CPU time, and 20.00 / 20.00 / 20.00 min/avg/max credits per hour.
Now what's up with that?
  • The first computer ran "Rosetta v4.12 x86_64-pc-linux-gnu".
  • The second computer ran "Rosetta v4.12 i686-pc-linux-gnu".
Two other computers had exclusively good "Rosetta v4.12 x86_64-pc-linux-gnu" jobs. And two further computers had a mixture of good "Rosetta v4.12 x86_64-pc-linux-gnu" jobs and bad "Rosetta v4.12 i686-pc-linux-gnu" jobs.

I'm going to make a report over at the R@h forum. But before that, the scatterplot...

Update:
What happens is that the faulty tasks never finish their first "decoy", so the watchdog kills them 4 hours after the target CPU time. I was referred to Ralph@home, where Rosetta v4.15 (released on April 6) is waiting to be tested.
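
In case anyone wants to reproduce such min/avg/max figures for their own host: a small sketch, assuming a hypothetical two-column text file (CPU seconds, credit per task) pasted out of the web results table; the file name and layout are made up for illustration:

# results.txt (hypothetical): one valid task per line, "cpu_seconds credit"
awk '{
    h = $1 / 3600                          # CPU hours for this task
    cph = $2 / h                           # credits per CPU hour
    if (NR == 1 || h < minh)   minh = h
    if (h > maxh)              maxh = h
    if (NR == 1 || cph < minc) minc = cph
    if (cph > maxc)            maxc = cph
    sumh += h; sumc += cph
} END {
    printf "CPU hours  min/avg/max: %.2f / %.2f / %.2f\n", minh, sumh / NR, maxh
    printf "credits/h  min/avg/max: %.2f / %.2f / %.2f\n", minc, sumc / NR, maxc
}' results.txt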
 

StefanR5R

Elite Member
Assimilator1 said:
(And I wonder if the task time affects the credits/hr?)
Here are the 130 results of the dual-14core computer mentioned in the previous post. All tasks downloaded on April 6, in three requests at 4:49:15, 4:49:28, and 4:49:40 UTC. Tasks completed and reported on April 6 ... April 7.

[Attached scatterplot: credits per hour 2020-04-06+07.png]

My conclusion: There is no correlation between target CPU time and credits per CPU time.
 

StefanR5R

Elite Member
Here is another scatterplot, taken from 294 x86-64 results from the same fetch and completion/reporting period, but from a dual 32-core computer:

[Attached scatterplot: credits per hour 2020-04-06+07.png]

And now for something different: Power efficiency.

dual 14c/28t Broadwell-EP @ 3.2 GHz | 34.6 kPPD | ≈380 W | ≈90 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz | 57.7 kPPD | ≈420 W | ≈140 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz | 50.2 kPPD | ≈420 W | ≈120 PPD/W
dual 32c/64t Rome @ ≈2.6 GHz | 82.8 kPPD | ≈340 W | ≈240 PPD/W

PPD figures were obtained by averaging credits per run time over more than 100 x86-64 results from each computer, all from said period, and are per host (not per task; each host runs as many concurrent tasks as it has hardware threads). Power draw was measured at the wall, taking three readings ≈20 hours apart and using the median. The first and last computers showed very little variation in power draw; the second varied between 400 and 435 W. The third computer didn't have a power meter in line, so I assumed its draw to be the same as the second's, since the hardware is identical.
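
To make the arithmetic concrete: per-host PPD is the mean credit per run-time second across the sampled tasks, scaled to a day and multiplied by the number of concurrent tasks; dividing by wall power gives PPD/W. A sketch with illustrative round numbers picked to land near the Rome host's figures (the per-task values are assumptions, not measurements):

# Assumed sample: 1,100 credits/task, 73,500 s run time, 64 concurrent tasks, 340 W
awk 'BEGIN {
    ppd = (1100 / 73500) * 86400 * 64      # credit/s per task -> per-host credits/day
    printf "PPD: %.0f   PPD/W: %.1f\n", ppd, ppd / 340
}'
# Prints roughly: PPD: 82756   PPD/W: 243.4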
 

Markfw

Moderator Emeritus, Elite Member
StefanR5R said:
dual 14c/28t Broadwell-EP @ 3.2 GHz | 34.6 kPPD | ≈380 W | ≈90 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz | 57.7 kPPD | ≈420 W | ≈140 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz | 50.2 kPPD | ≈420 W | ≈120 PPD/W
dual 32c/64t Rome @ ≈2.6 GHz | 82.8 kPPD | ≈340 W | ≈240 PPD/W
Yup... that's why I started with the EPYC cores: way more efficient.
 

Assimilator1

Elite Member
Wow! That Rome CPU efficiency is amazing compared to the others! :openmouth::cool:
And lots more useful info again :).
My Ryzen 3600 pulls ~134 W at the wall running 12 threads of R@H, although when GPU crunching I leave 2 threads spare for that. Running 10 threads of R@H (and no GPU crunching) it pulls, err, ~134 W! It must be clocking up: at 10 threads it runs ~3.75 GHz, at 12 threads ~3.7 GHz. Yep! No idea what its PPD is, though; it hasn't been running R@H long enough to work that out.
In case you're wondering why its power draw is on the low side, that's mainly because I reduced its PPT limit for temperature reasons (and our 230 V mains helps a little too).
 

Endgame124

Senior member
For these Rosetta mobile WUs, does anyone know if they will run on a Raspberry Pi? I've got several in the house that are generally idle (they have Kodi installed to stream HDHomeRun video occasionally), and I wouldn't mind setting them up for Rosetta if they would work.

Edit: looks like they will, but my old Pi 2s won't work (a Pi 4 with 4 GB is preferred), and it doesn't run on Raspbian (so no doubling up with Kodi).

 

Endgame124

Senior member
Quote:
Wouldn't they get too hot? What sort of cooling do they have?

I edited my post with the details I found on Tom's.

Pis come with just the board; you choose the case. For my Pi 2s, I was putting VGA heatsinks on the Pi's chip. For my Pi 3s, I moved to a Flirc aluminum heatsink case.


With that heatsink case, my Pi 3s will run 24x7 without an issue.
 

Endgame124

Senior member
I'm also currently trying to talk myself out of trying to install BOINC on my FreeNAS 11.2 box. It's on newer hardware than some of my other stuff (i3-8100), but it's also fairly mission-critical to the house. Breaking it or causing stability issues would be bad... but it's just... sitting... there... idle (aaaaaa)
 

Endgame124

Senior member
Quote:
Lol :D, critical how?

Backup for all the family pictures (downtime tolerable), backups for home systems such as Kodi boxes (downtime tolerable), and DVR media storage for my 3-year-old: Clifford, Sesame Street, Peppa Pig, etc. (mission critical, HA probably needed :D). I don't want to imagine the fallout if we couldn't watch Clifford at the appointed time :eek:
 

ZipSpeed

Golden Member
Endgame124 said:
Backup for all the family pictures (downtime tolerable), backups for home systems such as Kodi boxes (downtime tolerable), and DVR media storage for my 3-year-old: Clifford, Sesame Street, Peppa Pig, etc. (mission critical, HA probably needed :D). I don't want to imagine the fallout if we couldn't watch Clifford at the appointed time :eek:

Not Peppa Pig! I know how you feel. We try to limit screen time, but sometimes you just have to put something on to shut them up. I feel bad for my kids. My 6 YO daughter doesn't quite understand why she can't go to school or see her cousins and friends anymore. My 3 YO son, OTOH, is constantly bothering his sister because he doesn't have anyone else to play with besides her.
 