News Rosetta's role in fighting coronavirus

StefanR5R · Apr 6, 2020

There is no option to ignore space. There are only the three limits which can optionally be set in the web preferences or locally. I haven't found documentation of the client's behavior if none of the limits are configured explicitly.

What does this show?
grep disk_m /var/lib/boinc*/global_prefs*

Markfw · Apr 6, 2020

StefanR5R said:
There is no option to ignore space. There are only the three limits which can optionally be set in the web preferences or locally. I haven't found documentation of the client's behavior if none of the limits are configured explicitly.

What does this show?
grep disk_m /var/lib/boinc*/global_prefs*

mark@dual-EPYC-7601:~$ grep disk_m /var/lib/boinc*/global_prefs*
/var/lib/boinc-client/global_prefs_override.xml: <disk_max_used_gb>0.000000</disk_max_used_gb>
/var/lib/boinc-client/global_prefs_override.xml: <disk_max_used_pct>100.000000</disk_max_used_pct>
/var/lib/boinc-client/global_prefs_override.xml: <disk_min_free_gb>0.000000</disk_min_free_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_gb>0</disk_max_used_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_min_free_gb>1</disk_min_free_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_pct>90</disk_max_used_pct>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_gb>8.0</disk_max_used_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_min_free_gb>4.0</disk_min_free_gb>
/var/lib/boinc-client/global_prefs.xml:<disk_max_used_pct>10.0</disk_max_used_pct>
/var/lib/boinc/global_prefs_override.xml: <disk_max_used_gb>0.000000</disk_max_used_gb>
/var/lib/boinc/global_prefs_override.xml: <disk_max_used_pct>100.000000</disk_max_used_pct>
/var/lib/boinc/global_prefs_override.xml: <disk_min_free_gb>0.000000</disk_min_free_gb>
/var/lib/boinc/global_prefs.xml:<disk_max_used_gb>0</disk_max_used_gb>
/var/lib/boinc/global_prefs.xml:<disk_min_free_gb>1</disk_min_free_gb>
/var/lib/boinc/global_prefs.xml:<disk_max_used_pct>90</disk_max_used_pct>
/var/lib/boinc/global_prefs.xml:<disk_max_used_gb>8.0</disk_max_used_gb>
/var/lib/boinc/global_prefs.xml:<disk_min_free_gb>4.0</disk_min_free_gb>
/var/lib/boinc/global_prefs.xml:<disk_max_used_pct>10.0</disk_max_used_pct>
mark@dual-EPYC-7601:~$

StefanR5R · Apr 6, 2020

global_prefs.xml looks weird. It has each of the three limits occurring twice. I wouldn't have expected that.

global_prefs_override.xml looks better on paper, but the values there really just mean "accept the web settings", as far as I understand.

(global_prefs.xml reflects what the client read via scheduler requests from the web settings, from the latest consulted project AFAIK. global_prefs_override.xml shows what was set locally via boincmgr's advanced view computing preferences.)

Try this: In boincmgr, set

[x] Use no more than 1000 GB
[x] Leave at least 10 GB free
[x] Use no more than 90 % total

If you make all three limits explicit there, then we can be sure that any web preferences at any of the projects to which the client is attached are ignored, and these locally configured limits are honored.
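As a rough illustration of how the three limits could interact, here is a minimal Python sketch. It assumes that the client applies whichever limit is most restrictive and that a value of 0 GB means "not set"; neither assumption is confirmed in this thread. The function name `allowed_disk_gb` and the host figures (2000 GB disk, 1500 GB free, 20 GB already used by BOINC) are hypothetical.

```python
def allowed_disk_gb(total_gb, free_gb, boinc_used_gb,
                    max_used_gb, min_free_gb, max_used_pct):
    """Disk space BOINC may use, in GB, taking the most restrictive limit.

    Assumed behaviour, not taken from BOINC documentation.
    """
    limits = [
        # "Use no more than N % total"
        total_gb * max_used_pct / 100.0,
        # "Leave at least N GB free": what BOINC holds now plus free space,
        # minus the amount that has to stay free
        boinc_used_gb + free_gb - min_free_gb,
    ]
    if max_used_gb > 0:  # assumption: 0 means "no absolute limit"
        # "Use no more than N GB"
        limits.append(max_used_gb)
    return max(0.0, min(limits))


# Hypothetical host: 2000 GB disk, 1500 GB free, BOINC currently using 20 GB.
host = dict(total_gb=2000, free_gb=1500, boinc_used_gb=20)

# The explicit local limits suggested above.
print(allowed_disk_gb(**host, max_used_gb=1000, min_free_gb=10, max_used_pct=90))

# The second set of values that showed up in global_prefs.xml.
print(allowed_disk_gb(**host, max_used_gb=8.0, min_free_gb=4.0, max_used_pct=10.0))
```

Under these assumptions the absolute GB limit is the binding one in both cases, which is why it matters which set of values the client actually applies.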

Markfw · Apr 6, 2020

StefanR5R said:
global_prefs.xml looks weird. It has each of the three limits occurring twice. I wouldn't have expected that.

global_prefs_override.xml looks better on paper, but the values there really just mean "accept the web settings", as far as I understand.

(global_prefs.xml reflects what the client read via scheduler requests from the web settings, from the latest consulted project AFAIK. global_prefs_override.xml shows what was set locally via boincmgr's advanced view computing preferences.)

Try this: In boincmgr, set

[x] Use no more than 1000 GB
[x] Leave at least 10 GB free
[x] Use no more than 90 % total

If you make all three limits explicit there, then we can be sure that any web preferences at any of the projects to which the client is attached are ignored, and these locally configured limits are honored.

OK, I set up both boxes that had the errors like that: the 2 big 64-core EPYC boxes (dual 7601 + 7742).

Howdy · Apr 6, 2020

Not to complain, but I will anyway. They need to get their points figured out- 10.33hrs crunching for 20 points. Or maybe it's just my system getting hammered? I'm having YoYo syndrome!!!

Assimilator1 · Apr 7, 2020

Bizarre! Hmm, I haven't checked my ppd.....
Task time set to 24hrs, it's running on my Ryzen @~3.7 GHz. Don't have more valid tasks as I was running a little LHC & I've got a bunch of R@H WUs part way through.
But with these I seem to be mostly getting ~1k credits for ~24hrs (~42cr/hr), although one of them was only 236 credits!

What is everyone else getting?

Task	Work unit	Computer	Sent	Time reported or deadline	Status	Run time (sec)	CPU time (sec)	Credit	Application
1140765563	1026921387	1761696	6 Apr 2020, 2:18:11 UTC	7 Apr 2020, 7:03:22 UTC	Completed and validated	16,563.12	16,299.84	158.40	Rosetta v4.12 windows_x86_64
1140780561	1026917366	1761696	6 Apr 2020, 2:16:25 UTC	6 Apr 2020, 14:59:37 UTC	Completed and validated	41,681.22	41,523.16	478.02	Rosetta v4.12 windows_intelx86
1140781660	1026917504	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 2:30:43 UTC	Completed and validated	87,050.34	86,353.00	945.78	Rosetta v4.12 windows_intelx86
1140763241	1026919695	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 2:27:51 UTC	Completed and validated	86,963.73	86,244.50	880.41	Rosetta v4.12 windows_intelx86
1140763259	1026919731	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 2:22:15 UTC	Completed and validated	86,684.68	85,970.84	958.20	Rosetta v4.12 windows_intelx86
1140763260	1026919733	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 4:49:40 UTC	Completed and validated	85,113.77	84,549.01	1,202.19	Rosetta v4.12 windows_x86_64
1140763262	1026919737	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 2:25:21 UTC	Completed and validated	86,823.68	86,090.80	898.10	Rosetta v4.12 windows_intelx86
1140763314	1026919732	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 2:28:41 UTC	Completed and validated	87,030.25	86,301.02	945.56	Rosetta v4.12 windows_intelx86
1140763315	1026919734	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 2:25:21 UTC	Completed and validated	86,858.68	86,123.00	236.20	Rosetta v4.12 windows_x86_64
1140763316	1026919736	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 4:50:52 UTC	Completed and validated	86,676.17	86,099.90	1,192.48	Rosetta v4.12 windows_x86_64
1140778965	1026916979	1761696	6 Apr 2020, 2:16:25 UTC	7 Apr 2020, 4:44:35 UTC	Completed and validated	86,973.83	86,447.16	1,043.72	Rosetta v4.12 windows_intelx86
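For comparing results like these, credits per CPU hour can be worked out directly from the "CPU time (sec)" and "Credit" columns. A minimal Python sketch, using four rows copied from the table above; the function name `credits_per_cpu_hour` is made up for this example.

```python
# (CPU time in seconds, credit) copied from four rows of the table above.
rows = [
    (16299.84, 158.40),
    (86353.00, 945.78),
    (84549.01, 1202.19),
    (86123.00, 236.20),
]


def credits_per_cpu_hour(cpu_seconds, credit):
    """Credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


for cpu_s, credit in rows:
    rate = credits_per_cpu_hour(cpu_s, credit)
    print(f"{cpu_s / 3600:5.1f} h  {credit:8.2f} cr  {rate:5.1f} cr/h")
```

The same function works on run time instead of CPU time; for these tasks the two columns differ by less than 2 %.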

Assimilator1 · Apr 7, 2020

Just wondering whether to cut task time to 12hrs or less after seeing this :-
(And I wonder if the task time affects the credits/hr?)
Computing status, Work: Tasks ready to send: 29791

StefanR5R · Apr 7, 2020

Credits/hour are designed to be independent of target CPU time per task, according to the R@h message board post cited in #58. How well this works is hard to verify, due to the wide variation of computational difficulty between models.

When you check server_status for "tasks ready to send", remember to scroll to the bottom to see x86 jobs ("Rosetta" and "Rosetta Mini") separate from ARM jobs ("Rosetta for Portable Devices"). So far it looks good; the work generator seems to keep up.

I am wondering though how far "Workunits waiting for assimilation" may still increase without affecting work generation/ downloads/ uploads/ validation. Check out the bottom graph at https://munin.kiska.pw/munin/rosetta-week.html.

Assimilator1 · Apr 7, 2020

What does Workunits waiting for assimilation mean?

StefanR5R · Apr 7, 2020

I shouldn't need to introduce you to the finer points of assimilation, ;-)
but here you go:

davea said:
Completed jobs are handled by programs called assimilators. These are generally application-specific: they might copy output files from the BOINC upload directory to a permanent location, or they might parse the output files and insert results into a database.

(source)

Jorden said:
The assimilator handles workunits that are 'completed': that is, that have a canonical task or for which an error condition has occurred. Handling a successfully completed task might involve record tasks in a database and perhaps triggering the generation of more work.

(source)

Mod.Sense said:
The "queued jobs" work units shown on the homepage is a queue of work coming from Robetta. Let's call those "jobs". But jobs must be processed in to BOINC "work units" before they can be sent out. The "buffer length" shown on the homepage refers to how far ahead BOINC work units should be created before the create work task goes to sleep. Suffice it to say that the make work task for Rosetta has not slept in a very long time. The task for portables reached the buffer max. and went to sleep.

I'm just trying to explain the numbers you see, please do not bother to offer advice about how to multi-thread or otherwise optimize the servers. There are many resources required to support the life of a WU, such as validation and assimilation, not to mention all of the disk space consumed. Work is being created as fast as possible.

(source)

StefanR5R · Apr 7, 2020

PS, in other words, I understand that a steadily increasing number of workunits waiting for assimilation means steady inflation of the boinc server's database, and ongoing depletion of the boinc server's file storage space.

StefanR5R · Apr 7, 2020

I had two dual-14core Xeon E5-v4's download tasks on April 6, ~4:50 UTC. The two computers have same hardware and same operating system.

At the time, I had 16 h target CPU time/task configured. After the first round of 16 h tasks finished, I configured to default 8 h target CPU time/task and updated the project on both computers.

Today I am looking at completed results:

The first computer has 130 valid results + 15 errors.
The valid results had 5.3 / 13.6 / 16.1 min/avg/max hours CPU time, and 18 / 26 / 47 min/avg/max credits per hour.
I'll try to make a scatter plot.
The second computer has 53 valid results and 5 errors.
The valid results had 20.02 / 20.03 / 20.05 min/avg/max hours CPU time, and 20.00 / 20.00 / 20.00 min/avg/max credits per hour.

Now what's up with that?

The first computer ran "Rosetta v4.12 x86_64-pc-linux-gnu".
The second computer ran "Rosetta v4.12 i686-pc-linux-gnu".

Two other computers had good "Rosetta v4.12 x86_64-pc-linux-gnu" jobs exclusively. And two further computers had a mixture of good "Rosetta v4.12 x86_64-pc-linux-gnu" jobs and bad "Rosetta v4.12 i686-pc-linux-gnu" jobs.

I'm going to make a report over at the R@h forum. But before that, the scatterplot...

Update,
what happens is that the faulty tasks never finish their first "decoy", and the watchdog hits 4 hours after the target CPU time. I was referred to Ralph@home, where Rosetta v4.15 (released on April 6) is waiting to be tested.
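The update also explains the 20.0 h CPU times: 16 h target plus 4 h until the watchdog acts. A small Python sketch for spotting such results in a task list follows. The 4 h figure is from the update above; the function name `hit_watchdog` and the 0.1 h tolerance are assumptions made for this example.

```python
WATCHDOG_GRACE_H = 4.0  # per the update above: watchdog hits 4 h after the target


def hit_watchdog(cpu_hours, target_hours, tolerance_h=0.1):
    """True if a task's CPU time sits at target + 4 h, within a tolerance.

    The tolerance is an arbitrary choice for this sketch.
    """
    return abs(cpu_hours - (target_hours + WATCHDOG_GRACE_H)) <= tolerance_h


# Second computer: 20.02 / 20.03 / 20.05 h with a 16 h target.
print([hit_watchdog(h, 16) for h in (20.02, 20.03, 20.05)])

# First computer: 5.3 / 13.6 / 16.1 h with a 16 h target.
print([hit_watchdog(h, 16) for h in (5.3, 13.6, 16.1)])
```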

StefanR5R · Apr 7, 2020

Assimilator1 said:
(And I wonder if the task time affects the credits/hr?)

Here are the 130 results of the dual-14core computer mentioned in the previous post. All tasks downloaded on April 6, in three requests at 4:49:15, 4:49:28, and 4:49:40 UTC. Tasks completed and reported on April 6 ... April 7.

My conclusion: There is no correlation between target CPU time and credits per CPU time.
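Anyone who wants to put a number on "no correlation" for their own results could compute a Pearson coefficient over (CPU hours, credits per hour) pairs. A minimal Python sketch; the 130 results themselves are not reproduced in this thread, so the sample pairs below are taken from four rows of the task table posted earlier (a different host, and actual rather than target CPU time) and only show how the function is called.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Placeholder input: (CPU seconds, credit) from the earlier task table,
# not from the 130 results discussed here.
sample = [(16299.84, 158.40), (41523.16, 478.02),
          (86353.00, 945.78), (84549.01, 1202.19)]
cpu_hours = [s / 3600.0 for s, _ in sample]
credits_per_hour = [c / (s / 3600.0) for s, c in sample]
print(pearson(cpu_hours, credits_per_hour))
```

A value near 0 over a large set of results would support the conclusion above; four rows are far too few to say anything.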

Assimilator1 · Apr 7, 2020

Useful stats & graph, thanks, good to know, I'll return the CPU task time to default.

StefanR5R · Apr 7, 2020

Here is another scatterplot, taken from 294 x86-64 results from the same fetch period and completion/ reporting period, but a dual 32-core computer:

And now for something different: Power efficiency.

dual 14c/28t Broadwell-EP @ 3.2 GHz ... 34.6 kPPD ... ≈380 W ..... ≈90 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz ... 57.7 kPPD ... ≈420 W ... ≈140 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz ... 50.2 kPPD ... ≈420 W ... ≈120 PPD/W
dual 32c/64t Rome @ ≈2.6 GHz ............. 82.8 kPPD ... =340 W ... ≈240 PPD/W

PPD were obtained by averaging credits per run time over more than 100 x86-64 results of each computer, all from said period, and are per-host (not per-task; each host running as many concurrent tasks as there are hardware threads). Power draw was measured "at the wall", taking three readings ≈20 hours apart, and using the median. The first and last computer showed very little variation in power draw. The 2nd computer varied between 400...435 W. The third computer didn't have a power meter in the line; I estimated its power draw to be the same as the 2nd due to same hardware.
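The PPD/W column is simply PPD divided by wall power, rounded to the nearest 10. A short Python sketch that recomputes it from the kPPD and W figures listed above; the function name `ppd_per_watt` is made up for this example.

```python
# (host, kPPD, watts at the wall) as listed above.
hosts = [
    ("dual 14c/28t Broadwell-EP @ 3.2 GHz", 34.6, 380),
    ("dual 22c/44t Broadwell-EP @ 2.8 GHz", 57.7, 420),
    ("dual 22c/44t Broadwell-EP @ 2.8 GHz", 50.2, 420),  # power estimated, not measured
    ("dual 32c/64t Rome @ ~2.6 GHz", 82.8, 340),
]


def ppd_per_watt(kppd, watts):
    """Points per day per watt, from kPPD and wall power."""
    return kppd * 1000.0 / watts


for name, kppd, watts in hosts:
    print(f"{name:38s} {ppd_per_watt(kppd, watts):6.1f} PPD/W")
```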

Markfw · Apr 7, 2020

StefanR5R said:
Here is another scatterplot, taken from 294 x86-64 results from the same fetch period and completion/ reporting period, but a dual 32-core computer:

View attachment 19293

And now for something different: Power efficiency.

dual 14c/28t Broadwell-EP @ 3.2 GHz ... 34.6 kPPD ... ≈380 W ..... ≈90 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz ... 57.7 kPPD ... ≈420 W ... ≈140 PPD/W
dual 22c/44t Broadwell-EP @ 2.8 GHz ... 50.2 kPPD ... ≈420 W ... ≈120 PPD/W
dual 32c/64t Rome @ ≈2.6 GHz ............. 82.8 kPPD ... =340 W ... ≈240 PPD/W

PPD were obtained by averaging credits per run time over more than 100 x86-64 results of each computer, all from said period, and are per-host (not per-task; each host running as many concurrent tasks as there are hardware threads). Power draw was measured "at the wall", taking three readings ≈20 hours apart, and using the median. The first and last computer showed very little variation in power draw. The 2nd computer varied between 400...435 W. The third computer didn't have a power meter in the line; I estimated its power draw to be the same as the 2nd due to same hardware.

Yup... That's why I started with the EPYC cores, way more efficient.

Assimilator1 · Apr 7, 2020

Wow! That Rome CPU efficiency is amazing compared to the others!

And lots more useful info again.

My Ryzen 3600 pulls ~134 W at the wall running 12 threads of R@H, although with GPU crunching I leave 2 spare for that. Running 10 threads of R@H (& no GPU crunching) it pulls, err, ~134 W! It must be upclocking... with 10 threads it's ~3.75 GHz, with 12 threads ~3.7 GHz, yep! No idea what its PPD is though; it hasn't been running R@H long enough to work that out.
In case you're wondering why its power draw is on the low side, that's mainly because I reduced its PPT power for temperature reasons (and our 230 V mains helps a little too).

Endgame124 · Apr 8, 2020

For these Rosetta mobile WUs, does anyone know if they will run on a Raspberry Pi? I've got several in the house that are generally idle (they have Kodi installed to stream HDHomeRun video occasionally) and I wouldn't mind setting them up for Rosetta if they would work.

edit: looks like you can, but my old Pi 2s will not work (Pi 4 4 GB preferred) and it does not work on Raspbian (no doubling up with Kodi):

How to Fight Coronavirus With Your Raspberry Pi

Donate Pi’s processing power to medical research.

www.google.com

Assimilator1 · Apr 8, 2020

Wouldn't they get too hot? What sort of cooling do they have?

Endgame124 · Apr 8, 2020

Assimilator1 said:
Wouldn't they get too hot? What sort of cooling do they have?

I edited my post with details I found on Toms.

Pis come with just the board; you choose the case. For my Pi 2s, I was putting VGA heatsinks on the Pi chip. For my Pi 3s, I moved to a Flirc aluminum heat sink case:

https://flirc.tv/more/raspberry-pi-case

With that heat sink case, my Pi 3s will run 24x7 without an issue.

Endgame124 · Apr 8, 2020

I'm also currently trying to talk myself out of trying to install BOINC on my FreeNAS 11.2 box. It's on newer hardware than some of my other stuff (i3-8100) but it's also fairly mission critical to the house. Breaking it or causing stability issues would be bad... but it's just... sitting... there... idle (aaaaaa)

Assimilator1 · Apr 8, 2020

Lol, critical how?

Endgame124 · Apr 8, 2020

Assimilator1 said:
Lol, critical how?

Backup for all the family pictures (downtime tolerable), backups for home systems, such as Kodi boxes (downtime tolerable), DVR media storage for my 3 year old such as Clifford, Sesame Street, Peppa Pig, etc. (mission critical, HA probably needed). I don’t want to imagine the fallout if we couldn’t watch Clifford at the appointed time.

TennesseeTony · Apr 8, 2020

NAS. I imagine Endgame124 would like to have access to his files without interruption, or possible loss.

edit: Ohhh, I got ninja'd.

ZipSpeed · Apr 8, 2020

Endgame124 said:
Backup for all the family pictures (downtime tolerable), backups for home systems, such as Kodi boxes (downtime tolerable), DVR media storage for my 3 year old such as Clifford, Sesame Street, Peppa Pig, etc. (mission critical, HA probably needed). I don’t want to imagine the fallout if we couldn’t watch Clifford at the appointed time.

Not Peppa Pig! I know how you feel. We try to limit screen time, but sometimes you just have to put something on just to shut them up. I feel bad for my kids. My 6 YO daughter doesn't quite understand why she can't go to school, or see her cousins or friends anymore. My 3 YO son OTOH is constantly bothering his sister because he doesn't have anyone else to play with besides her.
