
Latest News and Technical News for SETI@home 29.07.2008.

Rattledagger

Elite Member
Latest News:
July 25, 2008
We've started creating work for SETI@home's new Astropulse application. At first we will just create a small amount, but we expect to enter full production next week.
Technical News:
May I Have Another (Jul 28 2008)

Wow. What a weird weekend. A lot of little minor things went wrong causing a bunch of "perfect storms" in succession. I have a technical term for this which I can't say in public. Anyway, I'll spell some of it out in no particular order and in varying amounts of detail.

Our workunit storage server filled up again. We got the warnings too late, as mounting problems were keeping the server status scripts from running, which obscured a rather large assimilator queue backlog. When results stay on disk waiting to be assimilated, so do their respective workunits. Plus, with Astropulse ramping up, those giant workunits were filling up the storage faster than usual. Eric had already put in code for the splitter (which generates the workunits) to check for a full disk before attempting to write anything. Of course, this fix had only been deployed in beta so far. The result: there are about 20,000 workunits of zero length, which will cause annoying errors for all clients trying to download them, but they should pass through like kidney stones before too long. For a while I stopped the splitters to reduce the disk usage. Today we put the updated splitter in the main project.

We've been having general scheduler problems over the last week as BOINC code updates were made in preparation for Astropulse. We hadn't built a new scheduler process in a while, which brought to light several problems, mostly due to our database schema being outdated and therefore out of sync with what the code expected. This didn't cause any data corruption, but it did cause random hosts to be unable to connect. For no good reason, a lot of the hosts reporting problems were Macs, which added to the difficulty of diagnosis - we thought it was an architecture-dependent issue at first. In any case, we got to understand those problems late last week and planned to clean it all up early this week. There was some miscommunication, though, and the new "broken" scheduler was turned on again last Friday for about a day.

On Sunday our bandwidth dropped to zero. At this point we threw up our hands and decided we'd figure it out when we were all in the lab together on Monday (today). Remember, we do have a policy that it is perfectly okay for our project to be down for a day or two, as this is BOINC and people can crunch on other projects in the meantime. Nevertheless, we don't want to be too cavalier about that, as we know a lot of people just crunch SETI data. Still, given our meager resources our average uptime is quite good, so a day or two of occasional downtime is acceptable. But I digress... It turns out apache was the problem on this server (once again a problem obscured by alerts not running due to mounting issues) and we had to kick it a couple of times (including a full system reboot due to messed-up shared memory segments) to get it going again. Once it was going, both download servers choked, so I had to kick both of them as well.

Then we ran out of work. Remember how I said we put a fix in the splitter to keep it from writing if the workunit storage server was full? Well, it was being extra cautious and not writing if the storage server was over 90% full. So as I write this paragraph we're low on work to send out, but Eric gave me permission to turn file deletion on in beta, so that'll clear up space soon enough and we'll generate fresh work.

And oh yeah... we were slashdotted again on Sunday.

That's enough for today. We'll have the usual outage tomorrow (may be slightly longer than normal) and maybe start splitting some more Astropulse workunits to send out!

- Matt

Due to the various problems, there haven't been any new stats dumps generated, which means there are no new stats to post... So expect a jump, possibly after today's normal outage...


 
I had a new install that was already low on work units when all of this stuff hit the fan.

I made a comment in the last stats thread but when nobody replied having the same problems I started thinking I was having some sort of firewall problem. So all that was wasted effort. 🙁

All systems and work units are good to go right now. 🙂
 
Thanks for the info, Rattledagger!
Of course, such things happen. I still think the outages now are a breeze compared to those in SETI@home Classic...
 
Originally posted by: Smoke
I had a new install that was already low on work units when all of this stuff hit the fan.

I made a comment in the last stats thread but when nobody replied having the same problems I started thinking I was having some sort of firewall problem. So all that was wasted effort. 🙁

All systems and work units are good to go right now. 🙂
AFAIK, by the time I checked the thread again, you'd already posted that you'd successfully contacted the servers, so there wasn't much to comment on...

While SETI@home's own status page is a good indication when the servers are turned off, it's not always possible to connect even when everything shows green. Also, the Cricket graph is often just as informative as the status page, since when it flat-lines, there's a good chance there's some kind of problem...
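The flat-line heuristic above can be sketched in a few lines. This is a hypothetical illustration - the sample list and window size are assumptions, not anything read from SETI@home's actual Cricket graphs:

```python
def looks_flatlined(samples, window=5, floor=0.0):
    """Return True if the last `window` bandwidth samples are all at or
    below `floor` - i.e. traffic has flat-lined and something is
    probably wrong, even if a status page still shows green."""
    if len(samples) < window:
        return False
    return all(s <= floor for s in samples[-window:])
```

A run of zero-traffic samples is a stronger downtime signal than any single failed connection attempt, which could just be your own firewall.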

 
Looks like SETI finally updated. What a huge day! I imagine tomorrow will be big too as it goes through backed up WUs.

I also never even realized that SETI had an actual outage; I thought it was just problems with updating the stats. I guess the cache on my machines was too big.
 
Slashdot is amazing. It's where I get almost all my news. They do generate enough traffic to bring a good number of sites down, though.
 