Rosetta@Home - 03/26: Stats

CyGoR · Mar 27, 2006


TA[R@H] is 124 members strong with 60 active members (48.39%).

No new members today...

Need some stats? Take a look at Free-DC, BOINC-Stats and BOINC-Synergy.

Got some questions? Try the Official R@H FAQ (work in progress).

Yesterday's production for TA: 78,155 (day before: 80,537, a change of -2,382). Go up!! 😉

TA's RAC 88,536

TA's total score 9,121,386

TA's world rank is: #4

Upcoming overtakes:

Milestone Makers:

No milestones yesterday.

Yesterday's top 25 producers:

TeAm Enterprise (12,543 creds)
Rebel Alliance (10,373 creds)
TA_TheReaper (5,041 creds)
BadThad (4,827 creds)
OhioDude (4,498 creds)
Sofa King (4,415 creds)
Dalephi (4,352 creds)
TA_JC (3,810 creds)
ken008 (3,805 creds)
TA_GeoffS (3,750 creds)
Fardringle (2,784 creds)
jw.middleton (2,592 creds)
Kelemvor (1,684 creds)
biodoc (1,417 creds)
Insidious (1,028 creds)
Strikermike (986 creds)
uallas5 (968 creds)
chessaroo (897 creds)
mk (785 creds)
CyGoR (658 creds)
g.wizard (496 creds)
mondobyte (484 creds)
TA_EvilWobbles (424 creds)
GimpyOne (387 creds)
Fox (373 creds)

Rosetta News and Probs:

Rosetta is still looking for beta testers to advance and improve Rosetta. Interested? Take a look at RALPH here.
Do you have a "stuck at 1%" WU? Then please make sure to read the following thread BEFORE you abort that WU, and help get rid of that problem once and for all: Official 1% thread
Do you have a WU that errored out due to Maximum CPU Time Exceeded? Then please report it here
Do you have a WU that's stuck or was aborted for any other reason? Then please report it here
Do you want to help the Rosetta project to get more participating members? Share your thoughts and ideas here (Thanks RobertE)

Insidious · Mar 27, 2006

thanks CyGoR :beer:

-Sid

up uP UP UP UP (<-- You probably had something more tangible in mind 😀 )

Wolfsraider · Mar 27, 2006

Originally posted by: Insidious
thanks CyGoR :beer:

-Sid

up uP UP UP UP (<-- You probably had something more tangible in mind 😀 )

More up, More up, More up, More up!!!!!
Thanks CyGoR :beer:

winr · Mar 27, 2006

I had a stuck 1% WU I didn't notice on one machine.
On another machine I had 64 hours at 54%.

I have only had one stuck, so I haven't been watching them closely enough.😱

🙂

BadThad · Mar 27, 2006

I think I must have a bunch of machines stuck. Watching over my home fleet for the past week or so, I've had about 10-15 WUs stuck at 1%. This is really hard to spot on my work machines since they all have HT enabled. I can check the Rosetta website and see that they are reporting, but they may only be running a single thread with the other one stuck at 1%.....arrrrggggg.

Thanks CyGoR! 🙂

Freewolf · Mar 27, 2006

Originally posted by: winr
I had a stuck 1% WU I didn't notice on one machine.
On another machine I had 64 hours at 54%.

I have only had one stuck, so I haven't been watching them closely enough.😱

🙂

Reboot the machine that's at 54% and watch what happens to it.

Insidious · Mar 27, 2006

I don't have one (stuck) to watch.... what happens?

-Sid

Freewolf · Mar 27, 2006

Originally posted by: Insidious
I don't have one (stuck) to watch.... what happens?

-Sid

At least in the three or four cases I've had, if you leave it alone it never finishes. If you either exit BOINC or restart the machine, the time spent computing resets to an hour or so and starts counting again, but the time to finish keeps climbing and the percent completed never changes until you give up and abort the work unit. I've lost a few thousand points' worth of computing time on those units.

Insidious · Mar 27, 2006

🙁 OUCH! 🙁

My only suggestion is based on this: all of the 4.82 WUs are supposed to be functioning under the time constraint set in the Rosetta project preferences (Target CPU run time).

If you haven't set a value for this preference, the project presently defaults to 2 hours.

So, if you ever see a WU that has run much longer than this value, it sounds like it should be aborted immediately.
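A minimal sketch of the rule of thumb above, assuming you read a WU's CPU time off BOINC Manager by hand. The function name `looks_stuck` and the `slack` multiplier (how far past the target we tolerate before calling it stuck) are hypothetical choices for illustration, not anything the Rosetta project defines; only the 2-hour default comes from the post.

```python
def looks_stuck(cpu_hours: float, target_hours: float = 2.0, slack: float = 3.0) -> bool:
    """Return True when a WU has run far past its Target CPU run time.

    target_hours: the Rosetta "Target CPU run time" preference (2 h default per the post).
    slack: hypothetical safety margin, in multiples of the target.
    """
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return cpu_hours > target_hours * slack


# With the default target and slack, anything over 6 CPU hours gets flagged.
for hours in (1.5, 8.0, 64.0):
    print(hours, looks_stuck(hours))
```

If you raised your target run time in the preferences, pass that value as `target_hours`; the check only compares CPU time, so it won't notice a WU whose percent complete is frozen but which hasn't yet used much CPU.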

Hopefully knowing this will help people cut their losses some.

-Sid

PS: Have you been reporting these to the Rosey team? They are very eager to get whatever insights they can from our experience with these borked WUs.

Dalephi · Mar 27, 2006

Thanks CyGoR.

Fardringle · Mar 27, 2006

My production was down the past two days because most of my computers are turned off for the weekend. I should be back up around 5-7K again tomorrow.

As far as the "stuck WU" issue goes, I found a workaround for my network. All of my computers (at work and at home) run the service install, so I can't use any of the normal BOINC network monitoring apps that connect to the GUI RPC for status. However, by watching my individual PC stats at www.free-dc.org and matching the point totals there with the names of my computers listed on the Rosetta site, I can get a pretty good idea of which computers might be stuck (no results in the last few days).

Once I have tracked down the offending computer (it has happened a LOT lately), I can use the Server Management tool on my domain controller at the office to remotely administer the services on the stuck computer and stop and restart the BOINC service. This won't help if the WU really is bad and just gets stuck again, but for the ones mentioned in the 1% WU thread linked at the bottom of CyGoR's posts that do run properly after a restart, it forces the WU to restart and might get it unstuck. So far only two WUs out of several dozen that got stuck have actually run properly after restarting, but it's much easier to try than physically aborting the WU at the PC when I am only in that office maybe two or three times a month.

If that doesn't work, the only fix I have found is to physically go to the problem PC (or use remote access to log in to the machine), open the BOINC Manager, and abort the WU. One side effect is that it has 'forced' me to finally get around to setting up VNC on some of the machines that didn't have it before. 🙂

It's a pain in the butt to take care of, and I've had to abort about a dozen WUs this past week, but at least I don't have to sit and watch (and be annoyed by) quite so many zeroes in my stats because of these 1% WUs.
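The "no results in the last few days" check described above can be sketched as a small filter, assuming you have copied each host's last-returned time from the Rosetta account page (or Free-DC) into a dict by hand. The function `silent_hosts`, the 2-day threshold, and the host names in the example are hypothetical; no Rosetta or Free-DC API is used here.

```python
from datetime import datetime, timedelta


def silent_hosts(last_returned: dict[str, datetime], now: datetime,
                 max_quiet: timedelta = timedelta(days=2)) -> list[str]:
    """Return, sorted by name, the hosts whose latest returned result is older than max_quiet."""
    return sorted(host for host, when in last_returned.items()
                  if now - when > max_quiet)


# Hypothetical hosts and times, entered by hand from the account page.
now = datetime(2006, 3, 28, 12, 0)
last = {
    "office-xeon": datetime(2006, 3, 20, 9, 0),   # quiet for about 8 days
    "home-athlon": datetime(2006, 3, 28, 8, 30),  # reported this morning
    "work-p4": datetime(2006, 3, 25, 23, 0),      # quiet for about 2.5 days
}
print(silent_hosts(last, now))
```

As the HT and dual-CPU posts in this thread point out, a host can keep reporting from one CPU while the other is stuck at 1%, so a check like this only catches machines that have gone completely silent.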

CyGoR · Mar 28, 2006

This really is a big problem with this project, unfortunately.
On the other hand, I haven't experienced a single WU that got stuck at 1%.
Lately I often check the Rosetta account page to see how long ago each machine returned a WU. That's probably the best method to see if any of the PCs have one of these messed-up WUs.

I did, however, lose a lot of points (over a day's worth on my fastest machine) because I didn't have enough free HD space :s
Looks like a dual-boot OS on a 36GB (WD Raptor) is getting a little tight 😉

I really hope we don't lose any members because of these problems; there will probably be a solution soon.

up uP UP UP UP (<-- You probably had something more tangible in mind 😀)

It'll do just fine 😀

networkman · Mar 28, 2006

In a little under 18 days at my current rate of production I'll be crossing the 1 million mark in the E@H project. At that time, I'll take another look at this and other BOINC projects. 🙂

biodoc · Mar 28, 2006

Originally posted by: Fardringle

It's a pain in the butt to take care of, and I've had to abort about a dozen WU's this past week, but at least I don't have to sit and watch (and be annoyed) at having quite so many Zeroes in my stats because of these 1% WU's.

I'm down 1K+ points a day on Rosetta since I moved 3 boxes to Ralph@Home to help with the debugging. This 1% error is a severe barrier to any growth/recruiting on the project & TeAm because of the frustration of so many of us losing valuable CPU time & $$ for electricity. Looks like the most time we lose now is 24h (still too much). See below for today's comments by David Baker:

Posted 28 Mar 2006 4:50:51 UTC by David Baker on Rosetta@Home

"First, after consultations with many of you on the message boards, we have set the maximum allowed run time to close to 24 hours. No jobs should be getting stuck for much longer than this time. The maximum time that you can increase the length of your work units for has correspondingly dropped to 24 hours. This is of course a temporary fix as we are still hot on the trail of the "1%" bug. It is clear that it is not an infinite loop within Rosetta, as it only occurs when rosetta is run within boinc. With better backtracking in the latest version on ralph, and a new version of boinc soon to be released that should make error tracing even more straightforward, we are optimistic about solving the problem in the near future."

"Our RALPH tests have shown that Rom's fixes have solved most of the other problems and the error rates are way down on Windows machines in particular. You haven't seen the new version on Rosetta@home yet because every time we think we have a version ready to release a small problem has shown up on a different platform which David has had to go back and fix. The new version included more frequent updating of the "percent complete" which will allow you (and us) to localize more precisely where in the folding process work units are getting stuck."

Thanks for the Stats CyGoR!!:thumbsup:

Fardringle · Mar 28, 2006

Originally posted by: biodoc
Looks like the most time we lose now is 24h (still too much). See below for today's comments by David Baker:

(snip)
"First, after consultations with many of you on the message boards, we have set the maximum allowed run time to close to 24 hours. No jobs should be getting stuck for much longer than this time. The maximum time that you can increase the length of your work units for has correspondingly dropped to 24 hours. This is of course a temporary fix as we are still hot on the trail of the "1%" bug. It is clear that it is not an infinite loop within Rosetta, as it only occurs when rosetta is run within boinc. With better backtracking in the latest version on ralph, and a new version of boinc soon to be released that should make error tracing even more straightforward, we are optimistic about solving the problem in the near future."
(snip)

I hope that actually works. I killed a 1% WU last night that was stuck for 197 hours on my Xeon server. Fortunately, only one of the CPUs on the server was stuck on this project and the other one was still cranking away (probably why I didn't notice it was stuck), but I lost more than 4000 points of production from the machine while that WU was hung at 1%. I didn't even bother trying to restart that one, I just aborted it. 🙁

BadThad · Mar 28, 2006

Originally posted by: networkman
In a little under 18 days at my current rate of production I'll be crossing the 1 million mark in the E@H project. At that time, I'll take another look at this and other BOINC projects. 🙂

Sweet! We need more people if we want to ever have any chance of moving up. Right now, we are firmly cemented in 4th place.

BadThad · Mar 28, 2006

Originally posted by: Fardringle

Originally posted by: biodoc
Looks like the most time we lose now is 24h (still too much). See below for today's comments by David Baker:

(snip)
"First, after consultations with many of you on the message boards, we have set the maximum allowed run time to close to 24 hours. No jobs should be getting stuck for much longer than this time. The maximum time that you can increase the length of your work units for has correspondingly dropped to 24 hours. This is of course a temporary fix as we are still hot on the trail of the "1%" bug. It is clear that it is not an infinite loop within Rosetta, as it only occurs when rosetta is run within boinc. With better backtracking in the latest version on ralph, and a new version of boinc soon to be released that should make error tracing even more straightforward, we are optimistic about solving the problem in the near future."
(snip)


I hope that actually works. I killed a 1% WU last night that was stuck for 197 hours on my Xeon server. Fortunately, only one of the CPUs on the server was stuck on this project and the other one was still cranking away (probably why I didn't notice it was stuck), but I lost more than 4000 points of production from the machine while that WU was hung at 1%. I didn't even bother trying to restart that one, I just aborted it. 🙁

What a waste. 🙁

I'm scared to go to one of my remote machines, it's been running stuck for over 3 months now....that's over 2400 HRS of CPU time flushed down the toilet. :|

Freewolf · Mar 28, 2006

Just found a 1% running on my opty. Only been running for 8 hours tho.

petrusbroder · Mar 28, 2006

No 1% WUs for a long time ... 🙂 - I like that.

I just passed 100 000 credits in Rosetta - and I like that better. 😀

And 700 000 credits in combined BOINC! 😀 :beer: 😀

Insidious · Mar 28, 2006

Originally posted by: networkman
In a little under 18 days at my current rate of production I'll be crossing the 1 million mark in the E@H project. At that time, I'll take another look at this and other BOINC projects. 🙂

That is GREAT news NWM. As you can see in this thread, it does take some babysitting atm.

I hope it turns out that you can help all of us beleaguered babysitters! 😀

Originally posted by: petrusbroder
No 1% WUs for a long time ... 🙂 - I like that.

I just passed 100 000 credits in Rosetta - and I like that better. 😀

And 700 000 credits in combined BOINC! 😀 :beer: 😀

That's a couple mighty fine marks Peter! :thumbsup:

-Sid

I've only had 2 bad WUs in the past several weeks (one of those might not have been stuck.... <-- itchy trigger finger 😱)

networkman · Mar 28, 2006

I usually end up laying hands on the machines in my house once every day or two, so baby-sitting shouldn't be too much of a problem. For the machines not located in my house, those may need to stay running E@H as they are since it's a VERY stable project/client.

Insidious · Mar 28, 2006

Originally posted by: networkman
I usually end up laying hands on the machines in my house once every day or two, so baby-sitting shouldn't be too much of a problem. ...

A man after my own heart

That's the spirit! :beer: :beer:

bluestrobe · Mar 28, 2006

I've been getting hammered with the 1% WUs (30+ on 10 processors). For a free project you would think they would have sorted this out by now.

BadThad · Mar 29, 2006

I've been getting hammered as well. Sure hope they get this figured out QUICK before I find a new project.

TAandy · Mar 29, 2006

Strange, I think I've only had one stuck at 1%. What processors are you guys working with?
I'm running 2 on Athlon XPs.
