BOINC Problems and Issues

mondobyte

Senior member
Jun 28, 2004
I have a "few" ;) computers that are now crunching 4 BOINC projects:

SETI
Rosetta
CPDN
SIMAP

The problem is that SIMAP seems to have taken over and is the only project being processed, although I have given it less than 5% of the resources ...

Can anyone tell me how I can fix this so that resource allocations actually work?

I am considering adding PrimeGrid to the mix but I need to sort out the SIMAP problem first ...

Thanks

BTW ... I figured out how to do the MSI transforms with ORCA and distribute the client using Active Directory ... it turns out to be relatively simple but time-consuming.

I've even figured out how to distribute the optimized clients to the correct processors using Active Directory to my "few" clients ...

Tomorrow promises to be a fun day ... I hope to have an operational Dual Core Opteron up by EOD ... it should be an interesting addition to the hordes ... I already have a Dual Core Athlon 64 4400+ running Rosetta ... that is really cool ...

More later on the hordes ... lots of hardware changes in the hordes over the next few weeks ... I'll let you know which is better ... the Opteron or the 64 ... it should be a really interesting comparison.

For those of you who might be interested ... my lovely bride surprised me with a week in Hawaii (on the Kona coast of Hawaii, the Big Island) for Christmas ... 10 years together ... First Class airfare, a week at the Waikoloa Hilton ... (a HUGE and wonderful resort) ... Snorkeling, Whale Watching (Humpbacks and Pygmy Sperm Whales), Fishing (Marlin, Sail, Sword, Tuna, Mahi-Mahi), Porpoise watching, Green Flash Sunset(s), a trip up the mountain to the observatories on Mauna Kea ... generally kicking back and enjoying the wonderful climate. Coming back was ... well ... cold and harsh ...

Later

Mondo
 

bluestrobe

Platinum Member
Aug 15, 2004
Did you set it up in BOINC to use less than 5% of the resources, or is this set up somewhere else? I've never had a problem with the allocation in the actual BOINC clients. Another thing you might try is suspending work on the project to see whether the client then favors another project the way it favors SIMAP now. If so, it might be a bad client install. Just a thought.
 

BlackMountainCow

Diamond Member
May 28, 2003
I've heard of that problem with SIMAP before, but I can't remember where. Did you check the official SIMAP forums?
 

mondobyte

Senior member
Jun 28, 2004
I went ahead and posted on the SIMAP forums ...

Not very pretty from here ... suspending every client could be a nightmare for me ... I think I might have over 100 clients at this point in my "beta" test before rolling BOINC out to all the computers ... he he he
 

rise

Diamond Member
Dec 13, 2004
I also had the same problem, Mondo, and I haven't seen a resolution yet. It's not as big of a deal for me as it is for you, so I hadn't really looked into it much.

I was told that BOINC will run everything based on the deadline, which in my case I guess makes sense, as all the SIMAP units have much quicker deadlines than the Rosetta ones.

But SETI BOINC and Rosetta played differently, splitting the time pretty much as I allocated it, IIRC, back in November.
 

petrusbroder

Elite Member
Nov 28, 2004
The problem is, as you say, Mondo, the allocation of resources. I have read on the dev mailing lists that BOINC uses two mechanisms for allocating resources: one is the Resource share, which the cruncher specifies (100 or 50 or 500 ...); the other is the "best-before time": if the deadline for a WU falls before the next scheduled connection time, then the resources are allocated according to that deadline.

E.g.: Say you have set up your computers to connect to the project every three days. Each project then downloads WUs for three days (which, with 4 projects, adds up to 12 days of crunching). If a WU has a deadline that expires before those 12 days have passed, that WU will be crunched first. Some projects (Predictor, LHC, SIMAP) have short deadlines, and therefore these projects get crunched first.

One solution, if you have 4 or 5 projects, is to set a short value for the "Connect to network about every ..." setting (e.g. 1 day or 0.5 days). Then the total cache will be 3 - 5 days of crunching anyhow. The deadlines will then not fall within those 5 days, and the WUs will get crunched according to the Resource share you have specified.
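To make the arithmetic in that example concrete, here is a minimal Python sketch of the rule as described above. It assumes each project fetches roughly one connect interval's worth of work, so the total cache is interval × number of projects, and any project whose WU deadline falls inside that window gets crunched first. The deadlines below are made-up numbers for illustration, not the projects' real deadlines.

```python
def total_cache_days(connect_interval_days, n_projects):
    # Per the description above: each project downloads about one
    # connect interval's worth of work, so the per-project caches add up.
    return connect_interval_days * n_projects


def crunched_first(deadlines_days, connect_interval_days):
    """Return the projects whose WU deadline falls inside the total cache.

    deadlines_days maps project name -> days until its WU deadline.
    """
    cache = total_cache_days(connect_interval_days, len(deadlines_days))
    return sorted(p for p, d in deadlines_days.items() if d < cache)


# Hypothetical deadlines, for illustration only.
deadlines = {"SETI": 14, "Rosetta": 14, "CPDN": 365, "SIMAP": 7}

print(total_cache_days(3, 4), crunched_first(deadlines, 3))  # 12 ['SIMAP']
print(total_cache_days(1, 4), crunched_first(deadlines, 1))  # 4 []
```

Dropping the interval from 3 days to 1 day shrinks the window from 12 days to 4, so in this toy example no deadline falls inside it and the resource shares take over again.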

Take care, thanks for your massive crunching for the TeAm and may you have a really Good New Year 2006!
 

rise

Diamond Member
Dec 13, 2004
Bah, I forgot RD's recommendation to lower the "connect to" setting like you suggest, Peter. Thanks for the refresher. I'll do that on my son's machine later :)
 

mondobyte

Senior member
Jun 28, 2004
Originally posted by: petrusbroder
...Take care, thanks for your massive crunching for the TeAm ...

The vast majority of the hordes are still quiesced. I am awaiting the results of this pilot to determine if or when I will call them to action ... ;)

mondo

 

Freewolf

Diamond Member
Feb 15, 2001
Dual Core Opteron: I'm waiting for one of these myself (a 165). It will be my first dual core and I'm looking forward to it.
Sounds like you and yours had a great time!
I'm 100% Rosetta myself, so I'm no help with your problem. Looking forward to seeing the hordes back in action.
 

mondobyte

Senior member
Jun 28, 2004
Originally posted by: Freewolf
Dual Core Opteron: I'm waiting for one of these myself (a 165). It will be my first dual core and I'm looking forward to it.
Sounds like you and yours had a great time!
I'm 100% Rosetta myself, so I'm no help with your problem. Looking forward to seeing the hordes back in action.

Opteron 175 here ...

Such a problem to have ;) The BOINC pilot will be somewhere in the range of 120 to 150 computers; at least, that is my SWAG at an approximation.

I'll give the pilot about a month to settle in and see what happens before I decide what to do with the rest of the hordes. I have several activities to get through.

1. Rolling out a software update via Active Directory
2. Removing a project via Active Directory startup script.
3. Adding a project via Active Directory (either as a software update via transform or via a startup script).
4. Tweaking the parameters so that it plays nice on the network.

Can anyone think of any other testing that I should do ???

Thanks everyone for the very useful and informative responses.

mondo
 

Rattledagger

Elite Member
Feb 5, 2001
The client crunches in 2 modes, "round robin" and "nearest deadline mode".
"Round robin" is the normal mode: for every project the client currently has work downloaded for, you'll basically follow your resource shares, so if, for example, A is 33% and B is 66%, you'll crunch 2h B, 1h A, 2h B, 1h A in an endless loop.
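A minimal Python sketch of that round-robin split, assuming the client simply divides each cycle in proportion to the resource shares. This is a simplified model of the description, not the client's actual scheduling code; shares of 100 and 200 stand in for the 33%/66% example.

```python
def round_robin_cycle(shares, cycle_hours):
    """Split one cycle of crunching time in proportion to resource shares."""
    total = sum(shares.values())
    return {project: cycle_hours * share / total for project, share in shares.items()}


def round_robin_timeline(shares, cycle_hours, cycles):
    """Repeat the per-cycle split, largest share first (2h B, 1h A, ...)."""
    split = round_robin_cycle(shares, cycle_hours)
    ordered = sorted(split.items(), key=lambda item: -item[1])
    return [slot for _ in range(cycles) for slot in ordered]


shares = {"A": 100, "B": 200}  # A ~33%, B ~66%
print(round_robin_timeline(shares, cycle_hours=3, cycles=2))
# [('B', 2.0), ('A', 1.0), ('B', 2.0), ('A', 1.0)]
```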

In "nearest deadline mode", on the other hand, just like the name implies, you'll crunch whichever result is closest to its deadline, regardless of resource shares. The most common reason for the client to switch to deadline mode is a result whose deadline is closer than 2x the cache setting. Another reason is if the cached work needs more than 80% of the time remaining until the deadline.
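The two triggers can be sketched as a simple check. This is a minimal Python sketch under stated assumptions, not the client's actual code: a single CPU, results given as (days until deadline, remaining CPU days), and my reading of the 80% rule as "the work due by a deadline needs more than 80% of the time left until that deadline".

```python
def needs_deadline_mode(results, cache_days):
    """Return True if either trigger from the description above fires.

    results: list of (days_until_deadline, remaining_cpu_days) on one CPU.
    """
    work_due = 0.0
    for days_left, remaining in sorted(results):
        # Trigger 1: deadline closer than 2x the cache setting.
        if days_left < 2 * cache_days:
            return True
        # Trigger 2 (assumed reading): all work due by this deadline
        # needs more than 80% of the time left until it.
        work_due += remaining
        if work_due > 0.8 * days_left:
            return True
    return False


print(needs_deadline_mode([(10, 1.0), (20, 2.0)], cache_days=1.5))  # False
print(needs_deadline_mode([(2, 0.1)], cache_days=1.5))              # True
print(needs_deadline_mode([(10, 5.0), (12, 5.0)], cache_days=0.1))  # True
```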


In deadline mode you definitely will not follow your resource shares. For this reason the client keeps track of "long-term debt", and all requests for new work are based on this "long-term debt".

So, while short-term you can be crunching one project exclusively in deadline-mode, long-term the client will still try to follow your resource-shares.
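The long-term debt idea can be sketched like this. A minimal Python sketch of a simplified model (assumed, not the client's exact formula): each project is owed its share of the elapsed time minus the time it actually ran, and the project with the largest debt is the one asked for work next. The 5/95 shares echo the SIMAP case at the top of the thread.

```python
def update_debts(debts, shares, hours_run, elapsed_hours):
    """Add each project's owed time (its share of elapsed) minus time actually run."""
    total = sum(shares.values())
    for project in debts:
        owed = elapsed_hours * shares[project] / total
        debts[project] += owed - hours_run.get(project, 0.0)
    return debts


def next_work_request(debts):
    """The project the client is most 'behind' on gets asked for work first."""
    return max(debts, key=debts.get)


shares = {"SIMAP": 5, "Rosetta": 95}
debts = {"SIMAP": 0.0, "Rosetta": 0.0}
# Ten hours in deadline mode, all spent on SIMAP:
update_debts(debts, shares, hours_run={"SIMAP": 10.0}, elapsed_hours=10.0)
print(debts)                     # {'SIMAP': -9.5, 'Rosetta': 9.5}
print(next_work_request(debts))  # Rosetta
```

So even after a stretch of deadline-mode crunching on one project, the next work fetch in this model goes to the project that fell behind its share.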


With a large cache more time is spent in deadline mode, and this makes it harder for the client to "correct" afterwards toward your desired resource shares. Also, as already mentioned, with many projects you risk downloading too much total work before the client stops asking for more work...

So, if you're permanently connected and run multiple projects, the default 0.1-day cache is normally good enough.
 

GimpyOne

Senior member
Aug 25, 2004
One thing to keep an eye on with Rosetta is that it likes the "left in memory" option set to Yes.

Some of the other projects, however, supposedly like it set to No. Looking through the forums, it appears as if Rosetta can hang with it set to No, and other projects can hang with it set to Yes. It might be easier splitting the projects up with one group doing 2-3 projects that are happy with Yes, and a second with 2-3 projects that like No. Just a thought that may or may not be an issue for you.

BTW, at least you are heading to St. Louis after the cruise. I just left UMR and will be working in Minneapolis; now that's a shock from Hawaii!
 

RaySun2Be

Lifer
Oct 10, 1999
Originally posted by: mondobyte
For those of you who might be interested ... my lovely bride surprised me with a week in Hawaii (on the Kona coast of Hawaii, the Big Island) for Christmas ... 10 years together ... First Class airfare, a week at the Waikoloa Hilton ... (a HUGE and wonderful resort) ... Snorkeling, Whale Watching (Humpbacks and Pygmy Sperm Whales), Fishing (Marlin, Sail, Sword, Tuna, Mahi-Mahi), Porpoise watching, Green Flash Sunset(s), a trip up the mountain to the observatories on Mauna Kea ... generally kicking back and enjoying the wonderful climate. Coming back was ... well ... cold and harsh ...

Dayum, that was an excellent Xmas gift! :Q :D First Class and all! :D

She's a keeper. :thumbsup:
 

Rattledagger

Elite Member
Feb 5, 2001
Originally posted by: mondobyte
1. Rolling out a software update via Active Directory
2. Removing a project via Active Directory startup script.
3. Adding a project via Active Directory (either as a software update via transform or via a startup script).
4. Tweaking the parameters so that it plays nice on the network.

While I haven't tested doing any of this via Active Directory, I would expect:
1; Just update your installation package with the new client's *.msi and re-run the exact same script as for the initial installation... well, not if you've been messing with client_state.xml...
2; The startup script runs on each individual computer, yes? If so, it should look something like this:
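rem Delete the project's account file ("project-url" stands for the project's URL)
del "c:\program files\boinc\account_project-url.xml"
rem Restart the BOINC service so the client picks up the change
net stop boinc
net start boinc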
3; Similar to #2, except you add a file instead of deleting one. ;)


 

Rattledagger

Elite Member
Feb 5, 2001
Originally posted by: GimpyOne
One thing to keep an eye on with Rosetta is that it likes the "left in memory" option set to Yes.

Some of the other projects, however, supposedly like it set to No. Looking through the forums, it appears as if Rosetta can hang with it set to No, and other projects can hang with it set to Yes. It might be easier splitting the projects up with one group doing 2-3 projects that are happy with Yes, and a second with 2-3 projects that like No. Just a thought that may or may not be an issue for you.

Rosetta@home decidedly needs "left in memory" set to Yes, since otherwise the application often crashes. Also, don't use a BOINC client that's too old, since some of the older ones switch Rosetta@home out during the benchmarking that happens every 5 days, regardless of the "left in memory" setting.


As long as you're not very short on memory and swap space, and aren't running Win9x, I can't remember any problems with other projects... So my recommendation is to use Yes for the other projects too, since that way you don't lose the work crunched since the last checkpoint.
 

mondobyte

Senior member
Jun 28, 2004
Thanks for the advice folks ... and especially Rattledagger.

I now have the cache set to 1.5 days and retain in memory set to Yes ...

I generally use startup scripts (rather than logon scripts) via Active Directory keeping in mind that some WMI stuff doesn't work at startup. Your scripts were just about what I was planning for the deletion ...

For the addition of a project, I was going to create a new transform and then redeploy the MSI installation ... too bad there isn't a function with MSI to delete files!

My suggestion, though ... is to do the net stop ... manipulate the client files ... then do the net start just so I don't run into any "in-use" files with locks.
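That stop, change the files, start order, covering both removing (#2) and adding (#3) a project, could look roughly like this minimal Python sketch. The install path and the boinc service name are taken from the net stop/net start lines earlier in the thread; the account file name and the source folder are hypothetical placeholders, and the command runner is passed in so the sequence can be dry-run without touching a real service.

```python
import shutil
import subprocess
from pathlib import Path

BOINC_DIR = Path(r"C:\Program Files\BOINC")  # install path used in the thread
SERVICE = "boinc"                             # service name from the net stop/start lines


def change_project(account_file, action, source_dir=None,
                   boinc_dir=BOINC_DIR, run=subprocess.run):
    """Stop BOINC, add or remove one account file, then start BOINC again.

    account_file is a placeholder name such as "account_project-url.xml".
    """
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    if action == "add" and source_dir is None:
        raise ValueError("adding a project needs a source_dir")
    target = Path(boinc_dir) / account_file
    run(["net", "stop", SERVICE], check=True)  # release any in-use files first
    try:
        if action == "remove":
            target.unlink(missing_ok=True)  # Python 3.8+
        else:
            shutil.copy2(Path(source_dir) / account_file, target)
    finally:
        run(["net", "start", SERVICE], check=True)  # restart even if the file step failed
```

On a real machine the startup script would call it with the default runner; the try/finally means the service gets started again even if the copy or delete fails.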

I'm still looking at setting up the remote RPC stuff and trying to get BOINCVIEW up and running. I haven't quite puzzled out exactly what to do to make the remote cfg file work and such ... The wiki is helpful but vague on EXACTLY what the file should look like when I'm finished ... do I delete the long string of numbers, replace them, or add to them ...

TeAM rocks!

mondo
 

Freewolf

Diamond Member
Feb 15, 2001
Originally posted by: mondobyte
I'm still looking at setting up the remote RPC stuff and trying to get BOINCVIEW up and running.

Sid has BoincView working and said he is going to work up a how-to for it.
 

Rattledagger

Elite Member
Feb 5, 2001
Originally posted by: mondobyte
I'm still looking at setting up the remote RPC stuff and trying to get BOINCVIEW up and running. Haven't quite puzzled out exactly what to do to make the remote cfg file work and such ... The WIKI is helpful but vague on EXACTLY what the file should look like when I finish ... do I delete the long string of numbers or do I replace them or do I add to it ...

Edit gui_rpc_auth.cfg and set whatever password you want (deleting the random password).
Create a file called remote_hosts.cfg.
On each line of this file, put the computer name or IP address of a computer you want to monitor from.

So, if you're only going to control all BOINC installations from, for example, Mondos_computer, you only need one line with one name in this file.
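Both files are plain text, so generating them for a rollout is easy to script. A minimal Python sketch, assuming the files go in the BOINC directory passed in; the password and host name in the example are hypothetical.

```python
from pathlib import Path


def write_remote_rpc_files(boinc_dir, password, monitor_hosts):
    """Write gui_rpc_auth.cfg (password only) and remote_hosts.cfg (one host per line)."""
    if not password or "\n" in password:
        raise ValueError("password must be a single non-empty line")
    boinc_dir = Path(boinc_dir)
    # Replaces the random password the client generated.
    (boinc_dir / "gui_rpc_auth.cfg").write_text(password)
    # Each line: name or IP of a computer allowed to monitor/control this client.
    (boinc_dir / "remote_hosts.cfg").write_text("".join(h + "\n" for h in monitor_hosts))


# Example with one monitoring machine, as in the Mondos_computer case:
# write_remote_rpc_files(r"C:\Program Files\BOINC", "hypothetical-password", ["Mondos_computer"])
```

The same pair of files can then be copied to every client, since every client is monitored from the same machine.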



Anyway, the first "Account Manager" website will likely launch early next year. This should mean you can add/remove the different projects you want to run on a website instead of "attaching" to each single project on each computer or running a script.
 

Spacehead

Lifer
Jun 2, 2002
Originally posted by: mondobyte
For those of you who might be interested ... my lovely bride surprised me with a week in Hawaii (on the Kona coast of Hawaii, the Big Island) for Christmas ... 10 years together ... First Class airfare, a week at the Waikoloa Hilton ... (a HUGE and wonderful resort) ... Snorkeling, Whale Watching (Humpbacks and Pygmy Sperm Whales), Fishing (Marlin, Sail, Sword, Tuna, Mahi-Mahi), Porpoise watching, Green Flash Sunset(s), a trip up the mountain to the observatories on Mauna Kea ... generally kicking back and enjoying the wonderful climate. Coming back was ... well ... cold and harsh ...

Later

Mondo
Sounds like a lot of fun. I'd like to witness one of those green flashes sometime... I've only seen pictures. I suppose the critter watching was fun too... ;)


Good luck with the BOINC testing :)
 

Insidious

Diamond Member
Oct 25, 2001
I think BoincView will ease your job, Mondo (assuming firewall issues don't prevent you from using it).

You would only need to do network installs of the BOINC manager, plus the two text files you create: 'gui_rpc_auth.cfg', which contains a password of your choosing (replacing the one that is generated automatically), and 'remote_hosts.cfg', which contains the IP of the machine(s) you will be running BoincView on. You can put the same file pair on every client.

Once you have Boinc and BoincView installed you can perform every function of the Boinc manager from a single location. (attach, detach, update, monitor, etc.) BoincView lets you run a command on every client with a single click (or any group of clients you select).

You will have to sit down and manually add each client to BoincView, but that is a one time deal... well worth it.

-Sid

edit: I think you would like having a single screen where, at a glance, you can see any client misbehaving, along with the commands to rectify the issue.

Another afterthought: BoincView is very chatty. When you add each client, be sure to change the update rate to something longer than the 5-second default (probably several minutes with so many clients running).
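A quick back-of-the-envelope check of why that default matters, as a Python sketch. It assumes BoincView contacts each client once per update interval (an assumption about how the update rate works), and uses the 150-computer upper end of Mondo's pilot estimate.

```python
def polls_per_second(n_clients, update_interval_s):
    """Client contacts per second if each client is polled once per interval."""
    return n_clients / update_interval_s


def interval_for_rate(n_clients, max_polls_per_second):
    """Shortest update interval (seconds) that keeps polling under a target rate."""
    return n_clients / max_polls_per_second


print(polls_per_second(150, 5))    # 30.0 at the 5-second default
print(polls_per_second(150, 300))  # 0.5 at 5 minutes
print(interval_for_rate(150, 1))   # 150.0 seconds for at most 1 poll/s
```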