WCG problems


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,418
16,283
136
WCG stopped giving out work. I asked Igor, and his reply was that they are still fixing things.
 

Markfw

Now I can't even upload; it must be totally down.

1761350653093.png
 

Markfw

OK, WCG is apparently dead. No tasks for 11 computers that are actively pinging them for updates, and I'm down to 20 tasks left on the last computer of the 11. Those will be gone before most of you read this. Yesterday at this time I had over 1000 still downloaded and in the active queue.

edit: I also emailed Igor this morning and got no reply.
 

cellarnoise

Senior member
Mar 22, 2017
862
453
136
You must be getting cold soon? You running NO puter stuff?
 

Markfw

As of a few hours ago, over 170 million PPD on F@H.
1762671174073.png
Warm now. And I think I still have five 4080s idle, and maybe a 4090.
 

Skillz

Golden Member
Feb 14, 2014
1,231
1,244
136
You'd probably be better off just running other projects with your computers. The team is making a push on SiDock, which is a medical project, and spacious@home, which is a new astronomy project.
 

Markfw

Is SiDock CPU or GPU?

And I just looked: it is COVID, which is over.

COVID.SI is a citizen science project to fight against SARS-CoV-2 by distributed computing.

Not sure I want to contribute to COVID; I want cancer research.
 

Skillz

Then develop your own cancer research application and start crunching it, or accept that the options are limited and they don't have work. Also, COVID.SI isn't even what I said. I said SiDock, as in SiDock@home, which is working on other medical-related research.
 

Markfw

From the distributed computing project list:


SiDock@home
COVID.SI is a citizen science project to fight against SARS-CoV-2 by distributed computing.
In our project, we are looking for ligands – small molecules that can successfully bind to protein targets and modulate a specific process that is crucial for the virus biochemistry. Together we have developed software that can be easily installed on your computer to help participants help find the cure for today’s invisible enemy. Based on molecular docking, the ideal ligand should be complementary in shape and properties to the binding site of the target biomolecule. However, the complementarity of small molecules is only one prerequisite for the use of a molecule as a drug.

I quoted that first line in my last post.

And it appears to be Russian. I don't do Russian.

I have Rosetta queued, if I ever get work. F@H till then.
 

StefanR5R

Elite Member
Dec 10, 2016
6,856
11,038
136
"sidock is CPU or GPU?"
At this time, SiDock has got CPU-only work queued. The application is single-threaded, and a task takes roughly half a day depending on CPU speed.

"and I just looked is covid, which is over"
They are now researching the Ebola virus. They currently simulate molecular docking to an RNA-dependent RNA polymerase (RdRp) target of the virus; the batch before this had an Ebola glycoprotein (GP) target.
https://www.sidock.si/sidock/server_status.php ("Research Status" section)
https://www.sidock.si/sidock/forum_forum.php?id=1

"and it appears to be russian. I don't do russian."
The scientific team is located in Slovenia (once part of Yugoslavia; EU and NATO member state for over 20 years now). The BOINC server, however, is located in Russia (specifically in Karelia, a republic of Russia at the border to Finland) and is operated by the same team which also runs the RakeSearch project.

My personal opinion about whether or not to contribute to a project like this is that I am largely undecided, but since there are various active projects clearly outside of Russia, I simply concentrate on those instead. (Or would, if my Internet link weren't still broken.)
 

Markfw

OK, based on that, I am running it on one of my 9755s, 256 tasks at a time. Here are a few of them. Looks like about a 5-hour ETA at 2.3 GHz (3 min per 1% done, very early estimate).

SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081497_r3_s-24.0_0 00:03:53 (00:03:56) 100.96 1.493 02d,14:36:31 03d,23:55:40 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081498_r3_s-24.0_0 00:03:54 (00:03:45) 96.03 1.330 02d,14:42:48 03d,23:55:39 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081509_r4_s-24.0_0 00:03:54 (00:04:02) 103.31 1.307 02d,14:43:35 03d,23:55:39 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081528_r4_s-24.0_0 00:03:18 (00:03:16) 99.29 1.271 02d,14:44:57 03d,23:56:09 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081504_r4_s-24.0_0 00:04:16 (00:03:57) 92.72 1.245 02d,14:45:52 03d,23:55:12 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081530_r2_s-24.0_0 00:03:20 (00:03:21) 100.33 1.212 02d,14:47:19 03d,23:56:09 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081506_r4_s-24.0_0 00:03:55 (00:03:48) 96.77 1.191 02d,14:48:09 03d,23:55:40 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081502_r3_s-24.0_0 00:03:53 (00:04:08) 106.39 1.181 02d,14:48:30 03d,23:55:39 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081507_r4_s-24.0_0 00:03:52 (00:03:54) 100.91 1.152 02d,14:49:31 03d,23:55:40 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081514_r1_s-24.0_0 00:03:24 (00:03:13) 94.43 1.117 02d,14:50:56 03d,23:56:07 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081542_r4_s-24.0_0 00:02:55 (00:02:49) 96.39 0.897 02d,14:59:20 03d,23:56:37 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081541_r2_s-24.0_0 00:02:55 (00:03:08) 107.24 0.872 02d,15:00:18 03d,23:56:38 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081523_r2_s-24.0_0 00:03:16 (00:03:14) 99.25 0.862 02d,15:00:40 03d,23:56:09 Running Turin 9755 EPYC 128c
SiDock@home 2.02 CurieMarieDock 0.2.0 long tasks ebola_RdRp_v1_sidock_00081509_r2_s-24.0_0 00:03:54 (00:03:42) 94.51 0.858 02d,15:00:39 03d,23:55:39 Running Turin 9755 EPYC 128c
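The ETA quoted above is plain linear extrapolation: elapsed time divided by the fraction done. Here is a minimal Python sketch of that arithmetic. It assumes the columns in the listing are elapsed time, CPU time in parentheses, CPU %, and progress %, in that order; that column reading is an assumption drawn from the paste, not something the poster confirmed, and the function names are made up for illustration.

```python
import re


def hms_to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def projected_total_hours(elapsed_hms: str, progress_percent: float) -> float:
    """Linearly extrapolate the total runtime from elapsed time and progress."""
    if progress_percent <= 0:
        raise ValueError("progress must be positive to extrapolate")
    total_seconds = hms_to_seconds(elapsed_hms) / (progress_percent / 100.0)
    return total_seconds / 3600.0


# Assumed column layout: "HH:MM:SS (HH:MM:SS) cpu% progress%"
ROW = re.compile(r"(\d{2}:\d{2}:\d{2}) \((\d{2}:\d{2}:\d{2})\) ([\d.]+) ([\d.]+)")


def estimate_from_row(row: str) -> float:
    """Pull elapsed time and progress out of one pasted row and extrapolate."""
    match = ROW.search(row)
    if match is None:
        raise ValueError("row does not match the assumed column layout")
    elapsed, _cpu_time, _cpu_percent, progress = match.groups()
    return projected_total_hours(elapsed, float(progress))


if __name__ == "__main__":
    # 3 minutes per 1% done, as in the post
    print(f"{projected_total_hours('00:03:00', 1.0):.1f} h")
```

With 3 minutes per 1% this comes out at 5 hours, matching the early estimate. Extrapolating from a few minutes of runtime is rough, since progress is not guaranteed to be linear over the whole task.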
 

Markfw

Later estimate: about 6 hours per task on SiDock.
 

Markfw

Very odd.... Now I have WCG tasks on 2 computers.....

edit: now on 4 computers, but going to bed.

edit2: it's the morning and back to only 2 computers.
 

StefanR5R

They are still fixing stuff.
November 11, 2025
  • Database maintenance over Friday/Saturday completed without issue. We have resolved an issue with the backup scripts, effectively increased memory used to service database queries and added some new indices. We expect better performance from the BOINC database going forward.
  • However, the disk remains slower than initial benchmarking when we stood up the database. We will monitor and reach out to hosting to see if the Ceph placement group expansion (that caused the stuck blocks of that particular disk when the placement group the result table lives on) got stuck in a "peering" state. We were informed that we should expect temporary, possibly intermittent slow IO during this Ceph maintenance window. If we can get faster disks for the BOINC database (which would require restoring the database to a new volume as we did to migrate) we will consider a maintenance window. Right now, we are optimistic the issues revealed in the new system by hanging database queries and database crashes can all be resolved with patches to the new BOINC daemons, and current performance will be sufficient.
  • As mentioned, this event identified several issues with the new BOINC daemons.
  • MCM1 workunit creation proceeds in the Kafka topic even though the database is down: the mcm1_create_work daemon for its Kafka partition on science01...science06 tries to commit its part of the batch, the database isn't there, so it doesn't do anything, but it does commit its offset/pointer into the batch plan topic and moves on to consume the next batch plan. That means every 10-15 minutes while the database is down, a batch is effectively skipped. We were able to fix that, and have restarted MCM1 batch creation at roughly 5:00 p.m. EST, November 10th, 2025.
  • We believe we have finally architected a fix for the pending validation backlog issue. This requires some non-trivial plumbing in the MCM1 batch assimilator, a Kafka connector deployed on the BOINC database node, and transitioner code changes.
  • Workunit supply may remain artificially lower while we roll out the new batch assimilator builds and monitor the transitioner -> Kafka event consumption and result table interaction.
  • We were able to resolve the issue with computing preferences not being updated from the website to BOINC client and vice versa. Generally, when the BOINC database goes down, so does the event listener that handles these messages on the webserver.
  • We are still working on resolving the validation backlog from over the break. With the result table bricked during the Ceph maintenance, we architected a "trust the filesystem" solution, and we are hopeful that this issue will be resolved this week.
  • MAM1 was initially planned to be resumed in beta30 last week, to see if 7.07 fairly schedules work and respects --nthreads, which is a blocking issue in promoting the beta application to production. Depending on the error rate and behaviour on BOINC clients, we would then consider the stable code paths for the first production batches. Given our increased control over batch parameters with the new Kafka topic that uses a protobuf schema to fill out the workunit and result table entries, we intend to run work in production on Linux as soon as the beta30 application is stable with an error rate lower than MCM1 excepting the GLIBC dependency, which is typically the only repeated error we see from clients on the current LibTorch code path. We will then rely on iterating the beta30 application to 7.08 and 7.09 to get GPU and Windows support, and Parquet IO for input and uploaded results.
(from https://www.cs.toronto.edu/~juris/jlab/wcg.html - Operational Status)
BoincStats hasn't received credit updates yet.
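
The skipped-batch problem in the MCM1 item of the status update comes down to advancing a consumer offset even though the database write never happened. Below is a toy Python sketch of that failure mode and of one common way to avoid it (advance the offset only after a successful write). It uses plain lists rather than a real Kafka client, every name in it is invented for illustration, and the status update does not say how WCG actually fixed it.

```python
class DatabaseDown(Exception):
    """Raised by the fake database while it is unavailable."""


class FakeDatabase:
    """Stand-in for the BOINC database; purely illustrative."""

    def __init__(self) -> None:
        self.up = True
        self.rows = []

    def insert_batch(self, batch: str) -> None:
        if not self.up:
            raise DatabaseDown(batch)
        self.rows.append(batch)


def consume_buggy(topic, offset, db):
    """Advance the offset whether or not the write succeeded."""
    if offset < len(topic):
        try:
            db.insert_batch(topic[offset])
        except DatabaseDown:
            pass  # the write is silently dropped
        offset += 1  # the batch plan is skipped for good
    return offset


def consume_fixed(topic, offset, db):
    """Advance the offset only after the write succeeded."""
    if offset < len(topic):
        try:
            db.insert_batch(topic[offset])
        except DatabaseDown:
            return offset  # retry the same batch plan on the next pass
        offset += 1
    return offset


def run(consume, outage_steps):
    """Drive a consumer for ten passes; the database is down on outage_steps."""
    topic = ["batch-1", "batch-2", "batch-3"]
    db = FakeDatabase()
    offset = 0
    for step in range(10):
        db.up = step not in outage_steps
        offset = consume(topic, offset, db)
    return db.rows


if __name__ == "__main__":
    print("buggy:", run(consume_buggy, {1}))
    print("fixed:", run(consume_fixed, {1}))
```

With the database down for one pass, the first variant loses the batch that was due during the outage, while the second picks it up again once the database is back.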
 

StefanR5R

Can you see somewhere deep down in your userpages if your recent results are being validated now?
(I would test it myself... if I had Internet at home.)

From their status update, there is still a large number of older results which are not validated yet because they are still working on these. *Maybe* they want to accomplish this first before they resume stats exports, for both older and current results.
 

Markfw

I would if I knew how to do that. I have literally millions of tasks done since stats went down.

Actually only 79,039 results, but 41,321,086 points (since 10/29/2025).
 

mmonnin03

Senior member
Nov 7, 2006
365
285
136
Your entire results list can be downloaded as a CSV.

I still have some pending validation from August and late Oct into early Nov.
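
Once the CSV is downloaded, a quick tally of one column shows how many results sit in each state. This is a minimal Python sketch. The column name "status" and the sample rows are placeholders made up for illustration, not the real WCG export format, so check the header row of your own file and pass the actual column name.

```python
import csv
import io
from collections import Counter


def tally_by_column(csv_file, column: str) -> Counter:
    """Count how often each value appears in one column of a CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"no column named {column!r}; found {reader.fieldnames}")
    # Missing cells come back as None, so fall back to an empty string
    return Counter((row[column] or "").strip() for row in reader)


if __name__ == "__main__":
    # Made-up sample data; a real file would be opened with
    # open(path, newline="") and passed in the same way.
    sample = io.StringIO(
        "name,status\n"
        "task_a,Valid\n"
        "task_b,Pending Validation\n"
        "task_c,Valid\n"
    )
    for value, count in tally_by_column(sample, "status").most_common():
        print(f"{value}: {count}")
```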
 

Markfw

When I click Results, it just does this, like forever. If I go to History, I can download the CSV, but how do I know what is validated?

Here is the top of the CSV; the results pic is below that.
1763154818617.png

1763154549714.png