Technical News and SETI@home daily stats for 11.04.2007 to 18.04.2007.

Rattledagger

Elite Member
Technical news for last week...
11.04.2007
So as it turns out the donation screwup I briefly mentioned in yesterday's thread totally hosed the replica database. Lame but true. So we're recovering that now, or trying to. We're operating without the replica for the time being. In the future we'll set up the replica so that updates to its data are impossible except from the slave update/insert thread. Anyway, this also explains why various statistics on the web site weren't updating.
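
For the curious, on MySQL this kind of protection is usually just the read_only flag - ordinary client connections (without SUPER privilege) can't write, while the replication thread still applies updates from the master. A generic sketch, not their actual replica configuration:

-- Generic MySQL sketch, not the actual SETI@home replica setup:
-- reject ordinary client writes while replication keeps applying updates.
SET GLOBAL read_only = 1;
SHOW VARIABLES LIKE 'read_only';   -- confirm the setting took effect
-- To make it stick across restarts, add "read-only" under [mysqld] in my.cnf.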

I mostly spent the day working on a revised php script with Dave that will send "reminder" e-mails to lapsed users, or those who failed to send in any work whatsoever. This actually required a new database table and me discovering "group by ... having ..." syntax to make more eloquent and efficient mysql queries. Hopefully these e-mails will help get some of our user base back on track.
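
Roughly the kind of query he's describing - the table and column names below are invented for illustration, not the real SETI@home schema, but they show how GROUP BY ... HAVING lets you filter on an aggregate:

-- Hypothetical example: find users who never returned a single result.
-- Table/column names are made up for illustration.
SELECT u.id, u.email_addr
FROM user u
LEFT JOIN result r ON r.userid = u.id
GROUP BY u.id, u.email_addr
HAVING COUNT(r.id) = 0;   -- zero returned results = candidate for a "nag" e-mail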

- Matt
12.04.2007
Okay - I messed up. My workunit zombie cleanup process was querying against the replica database, unbeknownst to me (even though I wrote the script). So when the replica went offline my script started errantly removing workunits. That meant many users were getting "file not found" errors when trying to download work. Of course I'm smart enough to not actually delete files of such importance, and upon discovering the exact problem I was able to immediately move the mistakenly removed files back into place (they were simply moved into an analogous directory one level up). So all's well there, more or less. The good news is the replica issues of yesterday (and earlier) were fixed sometime last night/this morning, so we have both servers online and caught up.

Once that workunit fire was put out I wrapped up work on the "nag" scripts and am now sending e-mails to users who signed up relatively recently but have failed to successfully send any work back. Directions for getting help were included in the e-mail.

The validator queue has been a little high - not at panic levels but not really shrinking either. I believe this has to do with the extra stress the validators have now that there is less redundancy. They have to process results 25% faster than before (as long as work is continually coming in/going out). I just added two extra validators to the backend. Let's see if that helps.

- Matt
16.04.2007
The new fan arrived to replace my broken/noisy graphics card fan, so I installed it first thing in the morning. I ended up getting a Zerotherm fan per suggestions in an earlier thread. It's great, but I didn't realize how damn big it was, and my desktop is a tiny little Shuttle. Long story short, it worked, but I had to move a bunch of cables out of the way that were brushing up against the fan spindles, and one of the flanges on the heat pipe is pressed up against part of the case and slightly bent. I swear if it ended up not working I would have sold all my post-1900 technology and moved into the woods. But as it stands it's super quiet, and now that my desktop doesn't sound like a helicopter my blood pressure is returning to normal levels.

The db_dump process (which updates all the stats for third-party pages) has been failing for the past week now. I thought this was due to some configuration on the replica that's timing out the long queries. I pointed the process at the master database this morning, but that timed out too. So we decided to run the process directly on the replica server itself (jocelyn). I recompiled it there, then ran into NFS lock issues, which Eric and I cleared up. It's running now. Let's see if it keeps running and actually generates useful output. Looks good so far (at the time of writing).

[Edit: Nope. Didn't work - will try again tomorrow...]

Meanwhile, while sending out e-mails to long-lost users who were never able to get SETI@home working, I found that php broke for some reason on the system sending out the mails. I had to reinstall php/libxml, which was annoying, especially as I'm still not sure why it broke. The reinstall fixed the problem, but then froze a few apache instances around our lab (which choked on php changing underneath them). So one of the public web servers was offline for a minute or two this morning. Oy.

- Matt
17.04.2007
The BOINC web server (isaac) had its root partition fill up this morning. No big deal but the site was down for a bit as Eric cleaned that up.

During the outage we cleaned up the remaining master/replica database discrepancies and finally put sidious on UPS. Yup - it was running without a net for the past however many weeks. Well, not a direct net - we always had a replica database that was on UPS, as well as recent backup dumps. The "reorg" part took much longer than last week - perhaps due to the result/workunit tables being exercised by the new quorum settings.

While sidious was powered down I replaced the keyboard (it was using a flaky USB keyboard salvaged from a first-generation iMac) and removed the case to inspect its RAM (so we have exact specs in the event of upgrade). I popped open one of the memory banks and found that, at some point, a spider had taken up residence inside. Not really a wise choice on its part. The webs and carcass of the long deceased critter were removed before putting the memory back.

Once again, db_dump is running at the time of writing, seemingly successfully. There were some mysql configuration settings we were experimenting with last week, and though it's not obvious why, one of these may have been forcing the long db_dump queries to time out. Anyway, we shall see... it just wrapped up the user table sans hitch.
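
For anyone who wants to chase the same kind of problem on their own MySQL box, the usual suspects can be listed like this (generic MySQL, not the actual settings they were experimenting with):

-- Generic sketch: list the server variables that commonly cut off
-- long-running queries or slow dump connections.
SHOW VARIABLES LIKE '%timeout%';
-- net_write_timeout and wait_timeout are typical culprits for long dumps;
-- raising them just for the dump session is one possible workaround:
SET SESSION net_write_timeout = 3600;
SET SESSION wait_timeout = 28800;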

- Matt
Lastly, some other news, 09.04.2007:
Hello, generous donators of CPU time.

As we ramp down the current data analysis to make way for the hot, fresh multibeam data, we are going to change the quorums for result validation very soon. Perhaps tomorrow. Currently we generate four results per workunit, though three are enough for validation. We're changing this to three and two.
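
In BOINC terms those two numbers are the workunit's target_nresults (how many results get generated) and min_quorum (how many matching results are needed for validation). Purely as an illustration - not anything from the actual project database - this is how one could check what new workunits are being created with, assuming the standard BOINC schema:

-- Illustrative only: redundancy settings on workunits created in the last day.
SELECT target_nresults, min_quorum, COUNT(*) AS workunits
FROM workunit
WHERE create_time > UNIX_TIMESTAMP() - 86400
GROUP BY target_nresults, min_quorum;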

This means less redundancy, which has the wonderful side effect of consuming less of the world's power to do our data analysis. However, be warned that this may cause us to run out of work to process from time to time over the coming weeks, especially if we lower the quorum levels even more (two and two).

There's concern that "no work available" will cause us to lose valuable participants. Maybe it won't. In any case, hopefully we can get the multibeam data going before the well drains completely.

- Matt
Is the target to go for 2/2 for the quorum?

Yes, but probably not until we have multibeam workunits (or astropulse workunits) to send out as well. 3/2 is a good "bridge" until then.
Matt, you've certainly got more patience than me, both for the sys admin and the PR!

I wish I had all the time in the world to address all the questions. One thing is for certain: If I'm writing on the message boards, I'm procrastinating. At least it's productive procrastination.

- Matt

#____Total Work Done____Today's WD____AWD________Overtake_______Team name
01______242.645.549______836.825______823.714______impossible______SETI.USA
02______223.753.001______434.872______414.410______impossible______SETI.Germany
03_______96.400.624_______75.490_______76.846______impossible______BroadbandReports.com Team Starfire
04_______87.198.262______259.946______250.979______impossible______L'Alliance Francophone
05_______79.192.700______168.993______160.454______impossible______BOINC Synergy
06_______78.877.592______104.072_______99.643______impossible______Czech National Team
07_______67.554.820______112.571______107.541______impossible______SETI@Netherlands
08_______43.797.380_______-3.130_________-168____260.699 days______OcUK - Overclockers UK
09_______43.164.759______172.840______168.327______impossible______The Knights Who Say Ni!
10_______32.704.314______-24.610______-24.675______1.325 days______BOINC.Italy
11_______29.868.319_______40.024_______42.628______impossible______Overclockers.com
12_______27.949.263_______88.966_______87.900______impossible______Team Art Bell
13_______24.980.037_______49.208_______52.606______impossible______Team 2ch
14_______20.884.979_______58.120_______54.862______impossible______The Planetary Society
15_______15.631.148_______28.515_______28.151______impossible______Ars Technica
16_______11.771.370______110.818______107.907______impossible______Team MacNN
17________2.370.738________8.032________4.101______impossible______Universe Examiners
18_______48.912.484_______87.148_______83.019______notanoption_____TeAm AnandTech
19_______-2.407.202________5.684________5.848________412 days______Phoenix Rising
20_______-5.996.358_______41.438_______40.000________150 days______SETI@Taiwan
21_______-9.898.435_______-4.510_______-5.092______impossible______Hewlett-Packard
22_______-9.990.248______-13.739______-11.242______impossible______Amateur Radio Operators
23______-10.543.889______-53.663______-50.777______impossible______Planet 3DNow!
24______-10.936.837______-29.842______-28.047______impossible______PC Perspective Killer Frogs
25______-11.814.307_______-9.713_______-7.952______impossible______Canada
26______-12.200.871______-35.487______-34.354______impossible______2CPU.com
27______-12.379.320_______44.063_______34.377________360 days______SETI@China
28______-13.053.602______115.565______105.674________124 days______Team Starfire World BOINC
29______-14.677.554_______22.385_______20.936________701 days______Dutch Power Cows
30______-14.928.156_________-220__________771_____19.362 days______Team MacAddict
31______-15.923.774______-22.373______-18.978______impossible______Team NIPPON
32______-17.503.566______-20.342______-19.504______impossible______BOINC SETI@home RUSSIA
33______-17.604.995______-31.994______-31.716______impossible______BOINC@Denmark
34______-17.706.879______-47.580______-44.909______impossible______Portugal@Home
35______-18.307.609______-67.358______-63.623______impossible______Picard
36______-19.019.399_______-8.050_______-6.834______impossible______Hungary
37______-19.827.184______-59.772______-56.641______impossible______LittleWhiteDog
38______-21.622.672_______12.513_______12.523______1.727 days______UK BOINC Team
39______-23.122.730______-36.674______-33.929______impossible______SETI@klamm.de
40______-23.215.306______-52.451______-49.227______impossible______Team EDGE
41______-23.271.080_______14.359_______12.870______1.808 days______US NAVY
42______-23.583.700______-16.009______-13.553______impossible______U.S.Air Force
43______-23.728.591_______15.006________1.760_____13.482 days______BOINC@AUSTRALIA
44______-24.000.706______-37.387______-35.078______impossible______HispaSeti & BOINC
45______-24.555.370______-14.806______-11.871______impossible______SETI.hr
46______-25.177.294______-41.627______-38.712______impossible______SETI Sverige [Sweden]
47______-26.976.823______-42.237______-37.660______impossible______BOINC UK
48______-27.428.005______-65.523______-58.246______impossible______The Final Front Ear
49______-27.909.347______-52.456______-49.674______impossible______SETI@Home Poland
50______-28.114.375______-55.980______-54.612______impossible______World Wide S.E.T.I.

Apart from Anandtech's own row, the Total Work Done column shows how much more or less each team has than Anandtech.
The Overtake column shows, based on Average Work Done (AWD), how many days it will take Anandtech to overtake the team ahead, or to be overtaken by a team behind.


petrusbroder

Elite Member
Thanks a lot for the info and the stats, Rattledagger! :D
Good to know that everything is progressing - although one of the systems had a real bug! :) ;)