August 22, 2005 - 18:00 UTC
We are currently in the middle of the first of the scheduled daily 3-hour outages to clear out the large number of "antique" results. Some numbers will be in an addendum at the bottom of this post when the outage is over. Until then, here's a fun FAQ about the current situation:
Q: Where did all these antique results come from?
A: The client finishes a workunit and sends the result to the file upload handler, which writes it to our file system. The file upload handler has no connection to the database. Then the client contacts the scheduling server saying, "I just uploaded result XYZ, please give me more work." The scheduling server, which does have a connection to the database, normally updates the result entry and sends more work. However, if the result being sent back is so far beyond its deadline that the result entry in the database has already been purged, nothing happens and the file remains on the system. This problem is currently being fixed. One additional theory is that as more people start using BOINC (especially now that all new users have to use the BOINC version), more slow/busy computers are being employed which can't return results before the deadline.
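To make the failure path concrete, here is a minimal sketch (in Python, not actual BOINC code) of the scheduler-side logic described above. The names result_db and handle_scheduler_request are hypothetical; the point is simply that once the database row has been purged, nothing is left to mark the uploaded file for validation or deletion.

```python
# Hedged sketch of how an "antique" gets stranded. Purged DB rows are
# modeled as missing dictionary keys; real BOINC uses a MySQL database.
result_db = {"XYZ": {"state": "in_progress"}}  # purged rows are simply absent

def handle_scheduler_request(result_name):
    """Client reports: 'I just uploaded result <name>, give me more work.'"""
    row = result_db.get(result_name)
    if row is None:
        # Result is so far past deadline that its DB row was purged:
        # nothing gets updated, and nothing ever schedules the uploaded
        # file for deletion. It sits on disk as an antique.
        return "no DB row; %s stays on disk as an antique" % result_name
    row["state"] = "over"  # normal path: validator/assimilator/deleter take over
    return "recorded; %s will be validated, assimilated, then deleted" % result_name

print(handle_scheduler_request("XYZ"))    # normal case
print(handle_scheduler_request("OLD42"))  # antique case
```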
Q: So why is this a problem?
A: Because so many antique files were being left on our server, validation slowed down. We calculated that about 40% of the result files in the upload directories are antiques. To validate (or assimilate, or delete) any result, the file system needs to do a "directory lookup" to find the files before it can read them. The more files there are in a single directory, the longer each lookup takes.
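Here is a rough, self-contained sketch of why oversized directories hurt. It times a full directory listing as the entry count grows, which scales with directory size on any filesystem; on filesystems of this era (e.g. ext3 without dir_index), individual name lookups degraded the same way, which is what slowed the validator. Counts and filenames are illustrative, and the creation step takes a few seconds at the largest size.

```python
import os, tempfile, time

def listing_time(n_files):
    """Create n_files empty files in one directory and time a full listing."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, "result_%06d" % i), "w").close()
        start = time.perf_counter()
        os.listdir(d)  # walk every directory entry, like a worst-case lookup
        return time.perf_counter() - start

for n in (1_000, 10_000, 100_000):
    print("%7d files: listing took %.4f s" % (n, listing_time(n)))
```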
Q: Are there other reasons the directories got so huge?
A: Actually, yes. We had a master science database crash a month ago. It was recoverable, but for several days the assimilators had to be shut off because they couldn't write to the database. This forced regular non-antique files to linger on disk longer than they normally would. Then we discovered a logic problem in the file deleter (the process that removes files from disk after they finish assimilation). Workunits and results were being deleted from disk at the same rate, when results should have been deleted at four times that rate (since there are four results for each workunit). So result deletion was being throttled by the workunit deletion rate. These problems have all been fixed, and the regular file deletion queue has been slowly but steadily dropping. However, this compounds the current validator backlog problem.
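A minimal sketch of the deleter fix, assuming a simple one-pass-per-iteration design. The 4:1 result-to-workunit ratio comes from the post; the queue structure and names are invented for illustration:

```python
from collections import deque

# Hypothetical deletion queues: roughly 4 result files per workunit file.
workunit_queue = deque("wu_%d" % i for i in range(100))
result_queue = deque("result_%d" % i for i in range(400))

RESULTS_PER_WORKUNIT = 4  # delete results 4x as fast as workunits

def deleter_pass():
    """One pass of the fixed deleter: one workunit, four results.
    The buggy version deleted one of each per pass, so results piled up."""
    if workunit_queue:
        workunit_queue.popleft()      # remove one workunit file from disk
    for _ in range(RESULTS_PER_WORKUNIT):
        if result_queue:
            result_queue.popleft()    # remove result files at 4x the rate

while workunit_queue or result_queue:
    deleter_pass()
print("both queues drain at the same time")
```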
Q: How could you possibly get one million results behind in validation? Doesn't this mean SETI@home/BOINC is a complete failure?
A: Here's a perspective check: in classic SETI@home, we only validate results after we give users credit. At this point in time, classic SETI@home is roughly 50 million results behind in validation. At other times the number has been in the hundreds of millions. Nobody notices because in classic, users get their credit first. The BOINC validation system is actually faster and more elegant, and the current situation was brought on by the several problems listed above, all of which are fixed or being fixed.
Q: Why do you need an outage to delete antiques?
A: It's much faster that way. When the system is running full bore, there are too many processes accessing the upload directories to make antique file deletion worthwhile - it would just slow everything down.
Addendum: Some fun numbers: All the uploaded results are randomly distributed into 1024 subdirectories. Last Thursday we removed 235,666 antique results from 44 subdirectories, and today 560,755 more from 105 subdirectories (796,421 results from 149 subdirectories so far). So about 14.6% of the subdirectories have been cleaned up in roughly 4 total hours of outage time.
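For the curious, the fan-out works roughly like this sketch: hash the filename and use it to pick one of the 1024 subdirectories, so uploads spread evenly and no single directory grows huge. The MD5-mod-fanout hash here is an assumption for illustration, not necessarily the exact function our upload handler uses.

```python
import hashlib

FANOUT = 1024  # number of upload subdirectories (from the post)

def subdir_for(filename):
    """Map a result filename to one of FANOUT subdirectories.
    Hash choice (MD5 mod FANOUT) is illustrative, not the project's exact scheme."""
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return "%03x" % (h % FANOUT)

for name in ("result_123_0", "result_123_1", "result_456_0"):
    print(name, "-> upload/%s/%s" % (subdir_for(name), name))
```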