Announcement WCG update 9/9/2022 ** still messed up ***


StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
OK, almost done. So I have tried Rosetta, next to nothing happening. No WCG, so what other CPU medical project is there that has work?
Check out post #4. :-D


(Apart from these and F@H, there are no other active medical projects currently. OK, and GPUGrid, intermittently.)

Oh, one more edit: SiDock@home's project infrastructure is based in Russia, if that sort of thing matters to anybody. But their science is done in Slovenia, which is an EU member. (SiDock team page)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
UPDATE: I added Rosetta (actually, just resumed it) on the three 64-core boxes, and they are running! Not all are maxed to 1,000 tasks, but a couple hundred are ready to start. So I added it to other machines that were idle. I am up to 588 running.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
Yep, looks like there is a batch of classic Rosetta tasks available again. You could try to buffer some, but keep in mind that the reporting deadline is just 3 days.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Yep, looks like there is a batch of classic Rosetta tasks available again. You could try to buffer some, but keep in mind that the reporting deadline is just 3 days.
The slowest boxes I have are the 64-core 2 GHz boxes, which will do these in 7 hours. I will NOT try to buffer anything. If it stops delivering, then so be it.

Progress: [screenshot attached]
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
The runtime of (classic) Rosetta tasks is mostly independent of CPU speed. It is configured at the website in your account -> Rosetta@home preferences -> Target CPU run time. I believe the default is 8 h currently, but I am not sure.

Within each task, the application runs several trial simulations of the same molecule in succession, for as long as the target runtime isn't exceeded. Faster CPUs get some more of these runs done, slower CPUs some fewer. (How many runs can be accomplished within the target runtime depends on the complexity of the workunit, besides CPU speed.)

After a task finishes, the number of runs which were accomplished is logged in stderr.txt of the task in a line like "This process generated ... decoys from ... attempts". After the result was reported to the server, this can be seen in an individual task's details in the user's or host's tasks tables on the website.

If you look at several results with similar batch names but from computers with different per-core speed, you will see distinctively different numbers of "decoys" reported in relation to runtime. Besides, if you e.g. browse top_hosts.php, you will see that some users set a different runtime than the default.

Currently distributed workunits end up with hundreds of "decoys" within each task, so I guess the currently researched models aren't very complex.
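If anyone wants to tally those decoy counts automatically from a pile of saved task outputs, a quick sketch along these lines should do it. (The directory name is made up, and the exact wording of the summary line might differ between application versions, so treat the pattern as a starting point.)

```python
import re
from pathlib import Path

# Hypothetical folder holding copies of stderr.txt from finished Rosetta tasks
# (copied from the BOINC slots/ directories or pasted from the website).
STDERR_DIR = Path("rosetta_stderr")

# Matches the summary line quoted above:
# "This process generated N decoys from M attempts"
PATTERN = re.compile(r"This process generated\s+(\d+)\s+decoys\s+from\s+(\d+)\s+attempts")

for path in sorted(STDERR_DIR.glob("*.txt")):
    text = path.read_text(errors="ignore")
    m = PATTERN.search(text)
    if m:
        decoys, attempts = map(int, m.groups())
        print(f"{path.name}: {decoys} decoys from {attempts} attempts")
    else:
        print(f"{path.name}: no decoy summary line found")
```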
 

cellarnoise

Senior member
Mar 22, 2017
711
394
136
I think this last post confirms Stefan is an A.I. Bot and we are all just doing "its" bidding. ;)
At least it appears that "Stefan" is on our side, for now...

Crap? The forum AI (much smarter than I) won't let me insert a photo on mobile? Only attach? We are doomed? ;)

Thanks Stefan! :)
 

Attachments

  • HAL9000_Case.svg.png

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
I know. I was just saying that it's happening. Instead of over 700 tasks running at once on my boxes, I am down to a couple hundred. They will run out in a couple of days. Time to find something to keep them busy.
My rig ran out sometime in the early hours (GMT) of Wednesday morning; you did well to keep your fleet going that long!

I hear you on the cold basement. I put my 5950 on MilkyWay - it seems silly to use it on projects that also use GPU, but there aren't many projects left that just use CPU.
There's LHC, although I find it sometimes causes GUI lag on my system now; for the moment I'm sticking with it.
With the problems Rosetta's Python WUs caused, I'm sadly staying away from them; I would've liked to rejoin it!

I tried to run Asteroids, and found out it was still down from last year! lol, what happened to them?
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
LHC lag is worse than I thought, plus getting many errored WUs and warnings from some WUs about needing to clean the VM environment is just doing my head in! So I'm going to stop the last VM-based LHC project and run SixTrack; I just hope I get enough WUs for that!
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
LHC@home's applications are all difficult to run. They are resource hungry to varying degrees (disk, network; some have VM overhead with big RAM requirements and the responsiveness issues which you mentioned) and are not very well integrated with BOINC. (E.g., they use LHC's distributed filesystem for most of their file transfers, not BOINC's own file download and upload mechanism.)

Even Sixtrack is difficult: Runtimes can vary wildly. It may happen that a host receives a large number of tasks with very short runtimes; the BOINC client would then reduce its runtime estimation for this application radically; and when the host started to receive normal Sixtrack tasks again, the client would abort them before they could finish because they took much longer than the client's newly adjusted timeout would permit.
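To illustrate that failure mode with made-up numbers (this is only a toy model; the real BOINC client has its own estimation and abort logic, and every constant below is an assumption):

```python
# Toy model: the client keeps a smoothed runtime estimate per application and
# aborts any task whose elapsed time exceeds that estimate by a safety factor.
estimate = 6.0       # hours; assumed initial Sixtrack estimate
alpha = 0.5          # assumed smoothing weight for newly observed runtimes
abort_factor = 10.0  # assumed safety margin before the client gives up

# A burst of very short tasks followed by normal-length ones.
runtimes = [0.05] * 20 + [6.0] * 5

for needed in runtimes:
    limit = estimate * abort_factor
    if needed > limit:
        print(f"task needing {needed:.2f} h aborted (limit was {limit:.2f} h)")
        continue
    # Only finished tasks update the estimate, so it collapses after the burst.
    estimate = (1 - alpha) * estimate + alpha * needed
    print(f"task finished in {needed:.2f} h, estimate now {estimate:.3f} h")
```

With these numbers, twenty 0.05 h tasks drag the estimate down to roughly 0.05 h, so the following 6 h tasks blow past the 10x limit and get aborted, which is the behaviour described above.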

On a more general note: Empirically, all applications of physics or chemistry DC projects — and a certain subset of astronomy projects — are difficult to run; some more than others. This is fundamentally the nature of numeric simulations in physics. Even if there is programmer talent and time available to get the application program in best shape, issues like SixTrack's bimodal runtime, or other LHC applications' large file data, or QuChemPedIA's frequent lack of convergence will remain.

So there will always be issues like these left for us DC volunteers to cope with. But we all have limited spare time that we can and want to dedicate to monitoring and controlling our computers. Sometimes there is just no alternative but to switch over to an easy and reliable project, of which there are a few in the math/number theory section of DC. Other times we might be able to invest some time upfront into implementing various workarounds which then allow us to keep a difficult project running with a lot less supervision.

Back to LHC@home: Personally, I like the subject matter of this project. But I haven't invested as much personal time and computer time on it as on other physics/astrophysics/chemistry related projects. LHC donated a sizable amount of their own datacenter capacity to medical research at some point, and are still doing so, and that left me wondering whether they really need volunteer computer time at all. (Perhaps at most during peak demand, but apparently not over the long term.)
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
I've disabled all but the SixTrack app now, and there are no WUs, lol.
So I've started Universe@Home, no problems there so far :).

I really like the subject matter of LHC too; their research and discoveries have always fascinated me. But I'm not going through that mammoth post of LHC's about how to fix the VM. I don't mind spending a bit of time tweaking a few settings, but I'm not spending hours going through that!
LHC@H has been more problematic (since about 1 yr ago) than any other DC project I have ever run. So I won't be doing much of that :( (bar any SixTrack work I get).
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Well, WCG is totally dead, and Rosetta (without VirtualBox) is out of work. So, for the first time in years, I only have 8 video cards going, and I have had to close up the windows in the house! It's only 48°F outside.
 

cellarnoise

Senior member
Mar 22, 2017
711
394
136
TN-Grid also seems to be having problems accepting finished WUs, and is largely running out of work.

Maybe just run F@H on the CPU for a while?

I think I am going to suspend running SiDock WUs because of recent geopolitics... :(

Getting hard to find medical tasks.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
The runtime of (classic) Rosetta tasks is mostly independent of CPU speed. It is configured at the website in your account -> Rosetta@home preferences -> Target CPU run time. I believe the default is 8 h currently, but I am not sure.

Within each task, the application runs several trial simulations of the same molecule in succession, for as long as the target runtime isn't exceeded. Faster CPUs get some more of these runs done, slower CPUs some fewer. (How many runs can be accomplished within the target runtime depends on the complexity of the workunit, besides CPU speed.)

After a task finishes, the number of runs which were accomplished is logged in stderr.txt of the task in a line like "This process generated ... decoys from ... attempts". After the result was reported to the server, this can be seen in an individual task's details in the user's or host's tasks tables on the website.

If you look at several results with similar batch names but from computers with different per-core speed, you will see distinctively different numbers of "decoys" reported in relation to runtime. Besides, if you e.g. browse top_hosts.php, you will see that some users set a different runtime than the default.

Currently distributed workunits end up with hundreds of "decoys" within each task, so I guess the currently researched models aren't very complex.
I changed it from the default 3 to 12 hours. My PPD went down from 506k to 147k. I am back to 4 hours.
 

wayliff

Lifer
Nov 28, 2002
11,718
9
81
Hello fellow crunchers and anandtech members!

I have also been crunching WCG for many years... came to check what the alternatives are using BOINC. Thanks for the posts.
I used to have dedicated machines, but now I just do mostly overnight runs or while I am not working.

Registered: May 1, 2007
Run Time: 124:283 years:days
Points: 193,401,015
Results: 439,574
Last Result: February 21, 2022

:)
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
(Rosetta v4.20)
I changed it from the default 3 to 12 hours. My PPD went down from 506k to 147k. I am back to 4 hours.
Hmm. But how much of this was from the preferences change, and how much perhaps from "downtime" when Rosetta didn't send work?

PPH for individual tasks on three of your hosts:

Results from host 6179426 (Ryzen 9 5950X, 111 valid results when I collected these data):
10 results with 3.0 h duration, 54...62 points/hour
70 results with 5.5...6.6 h duration, 22...30 points/hour
30 results with 9.0...9.9 h duration, 25...28 points/hour
1 result with 12.0 h duration, 72 points/hour
---> That is, points/hour are all over the place.

Results from host 6179428 (EPYC 7452, 203 valid results when I looked):
6 results with 6.1...6.2 h duration, 36...41 points/hour
62 results with 6.5...7.2 h duration, 31...38 points/hour
56 results with 7.6...8.0 h duration, 34...43 points/hour
79 results with 11.6...12.1 h duration, 32...46 points/hour
---> That is, points/hour are consistent between 6h, 7h, 8h, and 12h. But we don't have 3h results on this host.

Results from host 6179463 (Ryzen 9 5950X, 88 valid results when I looked):
12 results with 3.0 h duration, 49...76 points/hour
8 results with 3.8...4.0 h duration, 47...72 points/hour
50 results with 8.0 h duration, 52...70 points/hour
4 results with 11.0...11.5 h duration, 8.3 points/hour *
14 results with 11.9...12.1 h duration, 50...67 points/hour
---> That is, points/hour are consistent between 3h, 4h, 8h, and 12h tasks (ignoring the four special tasks which obviously had issues).
________
*) stderr.txt of these tasks shows that only 10 decoys or less were generated within these tasks. Tasks with normal credit had an order of magnitude more decoys.
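(For anyone who wants to reproduce this kind of tally, a small sketch of the bookkeeping is below. It assumes you have copied runtime/credit pairs out of a host's tasks table on the website by hand; the sample numbers are invented.)

```python
from collections import defaultdict

# Invented sample data: (runtime in hours, granted credit) per valid result,
# as copied by hand from a host's tasks table on the Rosetta@home website.
results = [(3.0, 190.0), (6.2, 210.0), (8.0, 460.0), (8.1, 520.0), (12.0, 640.0)]

buckets = defaultdict(list)
for hours, credit in results:
    buckets[round(hours)].append(credit / hours)  # points per hour

for target in sorted(buckets):
    pph = buckets[target]
    print(f"~{target} h tasks: {min(pph):.1f}...{max(pph):.1f} points/hour "
          f"({len(pph)} results)")
```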


If Rosetta@home had a longer streak of Rosetta v4 work availability, then a "scientific" experiment would be to set up two, three, or four BOINC client instances on a single computer, configure a different target CPU run time for each client instance, and then have them work for a while on whatever random work batches come along. After all instances have amassed a good deal of valid results, compare the PPH of the differently configured tasks.
But due to the frequent gaps in work availability, such an experiment is harder to do. We would want to take data only from tasks which ran while the host was fully utilized. An only partially busy host would have higher per-core performance due to a higher power budget/higher thermal budget/lower cache pressure/higher memory bandwidth per busy core.

(Edit: An alternative to several client instances on a single host would be several hosts with same hardware. But it's hard to have same hardware: CPUs from a very similar V/F bin would be required, or the CPUs should be tweaked to run at a fixed clock.)
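A rough sketch of how such a multi-client setup could be launched is below. The directory layout is invented, and the --allow_multiple_clients and --gui_rpc_port options should be double-checked against your client version; each instance would also need its own venue or account on the website so that it can pick up a different target CPU run time.

```python
import subprocess
from pathlib import Path

# Invented layout: one BOINC data directory and one GUI RPC port per instance.
instances = {
    "rosetta_3h": 31420,
    "rosetta_8h": 31421,
    "rosetta_12h": 31422,
}

for name, rpc_port in instances.items():
    data_dir = Path.home() / "boinc_experiment" / name
    data_dir.mkdir(parents=True, exist_ok=True)
    # Assumed client options; verify with `boinc --help` before relying on them.
    subprocess.Popen(
        ["boinc",
         "--dir", str(data_dir),
         "--allow_multiple_clients",
         "--gui_rpc_port", str(rpc_port)]
    )
```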
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Really odd. I guess I can't tell anything until Rosetta gets back to normal.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
Latest update on the currently static World Community Grid site:
2022-04-13: Validation of the WCG Environment Enters Final Stages
Workunit management, website and forums, and APIs have been subjected to unit tests, integration testing, and manual inspection to validate previous WCG functionality. Over 80% of tests and checks have been completed successfully in the Krembil QA environment to date.
https://www.worldcommunitygrid.org/news
Lots of other short articles about the past and current WCG were posted there as well.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,243
3,831
75
Today would be a great day to resurrect WCG! Probably too much to hope for, though.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
Ouch! AFAICT, May 9 is later than May 5, for instance. Still earlier than May 19 though. ;-)


PS, WCG's continuity and openness in public communication are exemplary, and a good sign for things to come even after the ownership change.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
The latest update: No ETA yet
 

cellarnoise

Senior member
Mar 22, 2017
711
394
136
I dream about Where we Could Go anymore ;). My mostly retrained and falling water is waiting... ;)?