WCG problems


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
wow, quick response on my latest email, and here it is using gmail

1758834280780.png
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
I just reported (with pictures) that WCG was still broken, and I got a personal thank you from Igor !

1759407616229.png
 
  • Like
Reactions: Assimilator1

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,698
4,659
75
Now getting "Another scheduler instance is running for this host". Which might be an improvement.
 

mmonnin03

Senior member
Nov 7, 2006
339
272
136
October 3, 2025
We are aware of the issue with the scheduler returning "Another scheduler instance is running for this host" and have identified the cause in the config.xml template we adapted for the new containerized environment. We will fix it once we have confirmed that the new event-driven validation and assimilation pipelines are working correctly.

Uploads are being processed normally. We've confirmed the new architecture for the containerized file_upload_handler pool behind Apache is correctly producing to the per-application Kafka (Redpanda) topics, storing the event and result data in separate queues on the local brokers partition.

As a result, there will be at least one more weekend sprint. Tentatively, we expect to be producing new workunits next week for MCM1, ARP1, and MAM1 beta version 7.07. Validations should resume over the weekend, and initial releases of batches will be intermittent.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
Igor's reply to my email to him today:

almost there, i hope:

scheduler fix/open, some downloads sent, new validations should be the
endpoint in a few hours, then Dylan will get to work trying to create new
workunits for MCM1 and MAM1, once MCM1 is steady and we have beta mams
going out to hopefully promote 7.07 as the first production MAM1 release .
 

Skillz

Golden Member
Feb 14, 2014
1,189
1,199
136
They should probably focus on fixing the scheduler issue before even thinking of trying to release a new app.
 
  • Like
Reactions: Ken g6

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
Well, all the bad error messages went away, but no tasks. Here is the latest comment from Igor.


Igor Jurisica

Wed, Oct 8, 7:10 AM (1 day ago)
to me

Thank you Mark - yes - no work has been going out yet - we wanted to
gracefully start, monitor - update, increase - hopefully - all will start
coming up shortly.
 
  • Like
Reactions: Assimilator1

StefanR5R

Elite Member
Dec 10, 2016
6,697
10,609
136
November
11/06 - 11/09 (19:00 UTC) Boinc Games 2025 UCI Indoor Cycling World Championships sprint (project TBA, 3 days)
11/16 - 11/23 (00:00 UTC) World Community Grid 21st Birthday Challenge (7 days) — tentative; maybe not happening
11/16 - 11/23 (12:00 UTC) PrimeGrid UNESCO Anniversary Challenge (CUL/WOO-LLR, 7 days)
WCG Birthday Challenge status changed from "maybe not happening" to "most likely not happening".
Besides the recent technical issues, they point out that the submission rate of new work was lacking even at periods without other technical problems. (Reminds me of TN-Grid.) Of course there is a slim possibility that it gets better after Krembil fixed what broke during the move to other infrastructure, but nobody is holding their breath.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
Not exactly WCG, but I do have 182 Rosetta@home tasks !

Small update on WCG though (from https://www.cs.toronto.edu/~juris/jlab/wcg.html)

  • October 15, 2025
    • Testing the validators right now, been a lot of iterations on these.
    • As soon as the validator works, we will deploy across the six partitions and clear the backlog. Then we can check the transitioner interaction. If that is all good, we can finally start sending new work.
    • Going to finalize object storage for the archive - instead of previous tape backup.
 
  • Like
Reactions: Assimilator1

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
Well, that did not last long. The bigger computers are not getting any work: they have finished their queues and are now asking for work, and none is available. In less than a day, I will be out of work.


Found this in their document. I probably got 8-10,000 of those sent out.

  • October 21, 2025
    • Finally stress testing rather than correctness testing.
    • Sent a batch of 100,000 workunits (fast running, not full size, in case something crashed).
    • Thank you for your patience and continued support.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
Well, something happened. I am getting a number of tasks for my 2 9755's, the only ones out of work that are turned on.

Pretty sure we are past the 100,000 mark in total tasks, and I have completed thousands, but not a single task has shown up on free-dc. I don't know if it's looking at the wrong server to get tasks, or if WCG is not filling in the stats file or ??? @Skillz , can you tell which is the case ?

 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
I also saw a batch of MCM tasks. The problem is, I do not have MCM selected, nor do I have "accept other tasks" selected. :)
Well, I have 11 boxes fired up, and if they have fewer than 1000 tasks, they are downloading (as best I can tell). So it's really up now !

1761167235647.png
 

Skillz

Golden Member
Feb 14, 2014
1,189
1,199
136
WCG only updates their stats twice a day: once around the end/start of the day and once around the middle of the day (in UTC time).

It just doesn't look like their stats have processed. IIRC, WCG's internal stats setup will go "offline" for half an hour to an hour each time this happens, and you'll see your own history of these points on your account. Does it show this, or was the last history day back before they went down?
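
A rough way to work out when the next stats run should be due. This is only a sketch: it assumes the two daily runs land at exactly 00:00 and 12:00 UTC, while all that is known is "around" the start and middle of the UTC day. The helper name `next_stats_update` and the constant are made up for illustration and are not anything WCG or Free-DC provides.

```python
from datetime import datetime, timedelta, timezone

# Assumption: exact hours are a guess; the real runs are only "around" these times.
ASSUMED_UPDATE_HOURS_UTC = (0, 12)


def next_stats_update(now: datetime) -> datetime:
    """Return the next assumed stats run strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Today's assumed runs, plus tomorrow's first run as a fallback.
    candidates = [midnight + timedelta(hours=h) for h in ASSUMED_UPDATE_HOURS_UTC]
    candidates.append(midnight + timedelta(days=1))
    return min(c for c in candidates if c > now)
```

If work was returned more than two of these windows ago and still nothing shows, the delay is more likely on the WCG side than a matter of timing.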
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
I keep checking and there is nothing, and I returned work at least 24 hours ago. The stats worked fine before this, but I suspect they are now on a different server, since the downtime was supposedly for a move to all-new hardware.

Not sure what's up with this:
Pinging www.worldcommunitygrid.org [199.241.161.110] with 32 bytes of data:
Request timed out.

A note on load characteristics: the 9755 has all 256 cores 100% loaded and runs at 2.7 (max per spec), but temps are only 50°C or less.
1761173522768.png
 

Skillz

Golden Member
Feb 14, 2014
1,189
1,199
136
Free-DC can't process stats until WCG processes them. Ask them.

The timeout on the ping request is probably because they block that.
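
If ping (ICMP) is being dropped, a TCP connect is a better check of whether the host is actually up. A minimal sketch, assuming the site answers on port 443 (HTTPS); `tcp_reachable` is just an illustrative helper name, not part of BOINC or any WCG tooling.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (needs network access):
# tcp_reachable("www.worldcommunitygrid.org")
```

A True here with ping still timing out would point to ICMP being filtered rather than the server being down.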
 
  • Like
Reactions: Ken g6

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,269
16,120
136
BTW, I have 1042 cores (or threads by some people's standards) running WCG right now.

1761193794617.png
 

mmonnin03

Senior member
Nov 7, 2006
339
272
136
I surely don't have that much running, just a few threads, but all the new tasks are pending validation. No credit yet.