Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
as of 1972, my BBC world has come to a crashing end... (taking BOINC with it of course)

It was a good run. but now it's dun :(

-Sid
 

petrusbroder

Elite Member
Nov 28, 2004
13,348
1,155
126
Hmm, that is sad. With my WUs I am in 1998 and 1972 respectively - not ending yet ... :)
Soon the prediction part will come ... then it will be interesting. I think many WUs will crash when they get past 2010 ... ;)
 

kb3edk

Senior member
Jul 11, 2004
494
0
0
Originally posted by: Insidious
as of 1972, my BBC world has come to a crashing end... (taking BOINC with it of course)

It was a good run. but now it's dun :(

-Sid

Ohh man, 1972? I guess Nixon dropped The Bomb on Hanoi and the Russkies retaliated and wiped out the entire US East Coast.

Then nuclear winter must have set in with a giant ice cap growing down from the poles. The Brits at the BBC started burning all their T. Rex and Mott The Hoople records to keep warm but it was not enough...

Amazing what they can simulate with computers these days ain't it?
 

rise

Diamond Member
Dec 13, 2004
9,116
46
91
rose.gif


:beer::p
 

Rattledagger

Elite Member
Feb 5, 2001
2,994
19
81
Even if CPDN is crashing it shouldn't have taken BOINC with it in the fall... :confused:

1972... roughly 33% done? Sorry for your loss.


Petrusbroder reached 1972 and 1998, excellent work. :beer:


As for my own models, all except one have by now reached 15%, and still have an expected finish-time in October/November. :p The last is paused while waiting on the "Seasonal Attribution"-work to be finished in 2-3 weeks...

The only other notable is some more BOINC-alpha-builds, among these a couple "bad" builds, and also re-formatting and OS re-install on all computers, but everything is still crunching-along...

 

TAandy

Diamond Member
Oct 24, 2002
3,218
0
0
All the carboniferous fuel (Vodka :) ) ran out :(

Edit:
That should have been COWboniferous :D
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
all condolences are appreciated
rose.gif


I had managed to convince myself I had one that was going to go all the way. :confused:
(but I knew from our team mates that wasn't really a given)

@RD, I don't know why it always takes BOINCmanager with it when it goes. It did on every crash (I think I was up to 5 or 6 :( ) It gave the error 'Boinc manager cannot communicate with localhost'.

I used add/remove programs to uninstall BOINC then re-installed. The remaining projects took up where they left off so it isn't so bad.

But look at it this way.. it took Nuclear Winter and a dehydrated VodCOW to doom me.... :p

-Sid

:shocked: I OWNED a T.Rex album when I was in HS. I had a friend named Jeff who we all called Jake and he ate crap every time the song 'Jungle Faced Jake' came on. (yes, Jeff had some serious acne :laugh: )
 

TAandy

Diamond Member
Oct 24, 2002
3,218
0
0
Originally posted by: Insidious
all condolences are appreciated
rose.gif


I had managed to convince myself I had one that was going to go all the way. :confused:
(but I knew from our team mates that wasn't really a given)

@RD, I don't know why it always takes BOINCmanager with it when it goes. It did on every crash (I think I was up to 5 or 6 :( ) It gave the error 'Boinc manager cannot communicate with localhost'.

I used add/remove programs to uninstall BOINC then re-installed. The remaining projects took up where they left off so it isn't so bad.

But look at it this way.. it took Nuclear Winter and a dehydrated VodCOW to doom me.... :p

-Sid

:shocked: I OWNED a T.Rex album when I was in HS. I had a friend named Jeff who we all called Jake and he ate crap every time the song 'Jungle Faced Jake' came on. (yes, Jeff had some serious acne :laugh: )

Jeff, Jake, "Jungle Faced Jake", crap, acne????????????
You're upping the stakes a bit here, aren't you?
I don't think I can afford all that :(
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Watch out where the VODcows go........... Don't you eat that yellow snow! :shocked:

:laugh:
 

TAandy

Diamond Member
Oct 24, 2002
3,218
0
0
Originally posted by: Insidious
Watch out where the VODcows go........... Don't you eat that yellow snow! :shocked:

:laugh:

Mmmmmmmmmmmmmm, yellow snow!!!!!
 

Assimilator1

Elite Member
Nov 4, 1999
24,165
524
126
eww :p......it's not lemon juice that's been 'spilt' you know? ;)

Sorry to hear it crashed again Sid :(
Btw my mother had that same BOINC problem running SETI, had to re-install too.

Why do they have to have such massive WUs anyway?? :confused: Surely that's just asking for trouble?
 

Rattledagger

Elite Member
Feb 5, 2001
2,994
19
81
Originally posted by: Assimilator1
Why do they have to have such massive WUs anyway?? :confused: Surely that's just asking for trouble?
They were initially planning to split hindcast/forecast in 2, but decided against it since different CPUs/OSs will give different results...
Also, you'd most likely get much bigger uploads/downloads than now...

As for the very long model, well, at the start-date you've also got a 200-year spin-up model as a basis, since the ocean needs a very long initialization...
Also, the 80-year hindcast-period is run to see how close to reality a particular model is. If the hindcast is very close to reality, the probability that the forecast is also correct is higher than if a model leads to a new ice-age in 1972...
 

petrusbroder

Elite Member
Nov 28, 2004
13,348
1,155
126
:( I am very sad to report that one of my two comps with BBC-CCE crashed due to "Beräkningsfel" (in Swedish): Computation error, at exactly 50.0%; the date, some 32 minutes before the crash: April 15, 2006. I am saddened by the crash - it crunched for such a long time and I hoped it would go on to the end.
I shall not start a new WU on this comp.
I have one more comp crunching. Those 2 WUs are at about 26% --- if those crash too, I will not start a new WU either. With these WUs you really need a fast computer with a lot of RAM; the one which crashed was an AMD A64 3800+ with 768 MB RAM, the one which goes on is an Intel P4 with HT at 2.8 GHz, with 1.5 GB RAM, and is much slower.

The good in all that bad: The comp uploaded a ZIP-file with the results sometime today - must have been some hours before the crash. So maybe the scientists can get something out of the data... :Q
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Sorry to hear it Peter.

It is disappointing to have so much time go up in smoke.

I think this project is perhaps simply not competent.

-Sid

(not sorry I tried.... :thumbsup: )
 

petrusbroder

Elite Member
Nov 28, 2004
13,348
1,155
126
Originally posted by: Insidious
Sorry to hear it Peter.

It is disappointing to have so much time go up in smoke.

I think this project is perhaps simply not competent.

-Sid

(not sorry I tried.... :thumbsup: )

I am not so sure about the lack of competence in that project. I think that climate prediction is the hardest stuff to do in DC since the models can go haywire anytime. I think that the models often crash because they run into some constraints (limits), e.g. that the oceans start boiling or freezing. Then the model obviously is wrong ... and the crunching is aborted.
OTOH: I think that the researchers need to know which models are wrong - negative results show what you do not have to crunch again.
That is important knowledge too.
Being a scientist myself, I can directly say that I have learned as much (or perhaps more) from the experiments which did not lead to any conclusion - because I now know what can go wrong, etc. Analysis of these experiments has led me to the correct ones, but these unsuccessful results never get published, and that is a great pity, since other scientists can thus repeat my unsuccessful experiments and waste resources.
I am sorry that my BBC-CCE-WU crashed because I wanted to contribute with a viable WU/result, not with one which was not realistic.
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0

I meant that the WUs are not viable (pronounced: ill-conceived).

It is not unrealistic to expect soft, controlled exits for models that reach unrealistic conclusions. I really don't believe competent work units would crash the BOINC manager. I also don't believe it is unrealistic to expect the vast majority of WUs to complete as opposed to the current vast majority crashing.

-Sid
 

petrusbroder

Elite Member
Nov 28, 2004
13,348
1,155
126
Yeah - from that point of view you are right. Had not thought of that possibility: a "soft" controlled exit, using the data collected until the time point the process reached the constraints.
Well, since the devs did not build that in, the project and its WUs are of course not well conceived and thus we are a bunch of beta-testers :( without knowing it.
The four (three very early in the crunching- like 5 - 20 seconds, one today) crashed WUs did not take down BOINC in my case.
But I have seen that your comps suffered that fate. That is absolutely not acceptable.
I am considering shutting down the other two WUs too. It takes such a long time (1 150 hours to get to 25% - expecting thus a crunching time of 4600 hours = 4½ more months) - that is a long time to block a computer while not knowing at all if the model works out ... I'll consider it until tomorrow night and then decide. QMC, Seti and SIMAP could really use that processor.
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
I think that is one of my favorite aspects of BOINC DC'ing. When one project doesn't fit anyone's personal goals, there is another to try. I can't imagine anyone not finding a great BOINC project to run if they so desired.

Besides, it will be fun to come back to CPDN someday, when they have evolved (<-- PUNN :shocked: ) and recall the dark days of the BBC disaster.

Happy Easter! (or Sunday... whichever applies)

-Sid
 

Rattledagger

Elite Member
Feb 5, 2001
2,994
19
81

petrusbroder

Elite Member
Nov 28, 2004
13,348
1,155
126
Thanks Rattledagger. I now found the code - but at the time when the comp had crashed no explanation was at hand. I could not read at the forums, because they were (understandably) down for some hour(s?) or so after I found the killed model.
It would have been good if the devs had a more understandable explanation for the codes when WUs crash/end/are killed. Now it said "Unrecoverable error for the ... exit code 99 (0x63)" and - on the results list - "Client error". That is - if anything - a misnomer. It should say: "WU-error, process killed by trickle-up" or something.
Sorry, I am somewhat frustrated (but not at you RD and not at any cruncher) - it has taken more than 4 000 hours of crunching and for limited benefit.
I almost feel like leaving the project. OTOH: as I said above: Climate prediction is the hardest stuff to do in DC since the models can go haywire anytime and I will continue contributing, but at a lower rate.
 

Assimilator1

Elite Member
Nov 4, 1999
24,165
524
126
Originally posted by: Rattledagger
Originally posted by: Assimilator1
Why do they have to have such massive WUs anyway?? :confused: Surely that's just asking for trouble?
They were initially planning to split hindcast/forecast in 2, but decided against it since different CPUs/OSs will give different results...
Also, you'd most likely get much bigger uploads/downloads than now...

As for the very long model, well, at the start-date you've also got a 200-year spin-up model as a basis, since the ocean needs a very long initialization...
Also, the 80-year hindcast-period is run to see how close to reality a particular model is. If the hindcast is very close to reality, the probability that the forecast is also correct is higher than if a model leads to a new ice-age in 1972...

Yea I see the hindcasting bit, makes sense, but I still don't see why they have to be such big WUs.
I don't see why different OSs or CPUs would give different results, they don't on all the other DC projects *shrug*
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Originally posted by: Assimilator1

...

I don't see why different OSs or CPUs would give different results, they don't on all the other DC projects *shrug*

I was struck by the very same notion Assimilator.

Hearing that the project devs. expect different results, even with all else equal, simply from different hardware/OS running their program??? I find this highly questionable. Just exactly WHAT are they testing with these work units?

Honestly, so far we know they have a difficult time writing software that will function w/o a crash, and it will not run the same way twice??

What am I missing?

-Sid
 

Rattledagger

Elite Member
Feb 5, 2001
2,994
19
81
Originally posted by: Assimilator1
I don't see why different OSs or CPUs would give different results, they don't on all the other DC projects *shrug*

Actually, in most instances the results are different between, for example, AMD & Intel, and Windows/Linux, even though they're using the same IEEE 754 standard. But for example in SETI@Home it doesn't really matter if a spike is reported at 12,345678 with power 53,46632 or if it's reported at 12,345677 with power 53,46635 or something. Actually, for SETI@Home even if you re-run the exact same WU on the exact same computer, but in one of the runs stop & continue, you'll likely get a slightly different result...
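
For anyone wondering how two IEEE 754 machines can disagree at all: one common cause is that floating-point addition is not associative, so a compiler or math library that merely reorders the same operations can change the last bits of the result. A tiny Python sketch of that effect (an illustration only, not code from any of these projects):

```python
# Same three numbers, same IEEE 754 doubles, different grouping.
# A different compiler, optimization flag, or CPU instruction set can
# effectively pick a different grouping for the "same" source code.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(repr(left_to_right))
print(repr(right_to_left))
print("identical:", left_to_right == right_to_left)
print("difference:", abs(left_to_right - right_to_left))
```

The two sums differ only in the last bit or so, which is harmless for something like a reported spike power but matters once the error gets amplified over many steps.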

But, in some projects, even an "insignificant" rounding-difference between CPUs or OSs can become significant. This is for example true in LHC@Home, and Predictor@home is using Homogeneous Redundancy just because computational differences across CPU/platform can give markedly different results.

As for CPDN, which is partially chaotic, a small difference here can at least theoretically be the difference between a new ice-age and a significantly hotter climate... Meaning, switching computers "mid-track" increases the uncertainty about whether a "good" hindcast also gives a "good" forecast.
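
To make the "small difference, big outcome" point concrete, here is a toy sketch using the logistic map, a standard textbook chaotic system. It has nothing to do with the real CPDN climate model; the function name, the parameter r = 3.9, and the starting values are all assumptions chosen for illustration. Two runs start 1e-15 apart, roughly the size of a rounding difference.

```python
def logistic_trajectory(x0, steps, r=3.9):
    """Iterate the logistic map x -> r*x*(1-x), a toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting states that differ by about one rounding error.
run_a = logistic_trajectory(0.2, 200)
run_b = logistic_trajectory(0.2 + 1e-15, 200)

gaps = [abs(a - b) for a, b in zip(run_a, run_b)]
early_gap = gaps[5]      # still tiny after a few steps
largest_gap = max(gaps)  # comparable to the size of the values themselves

print("gap after 5 steps:", early_gap)
print("largest gap over 200 steps:", largest_gap)
```

After a handful of steps the two runs still agree to many digits, but later on they are unrelated, which is the kind of behaviour that makes swapping hardware halfway through a model run risky.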

BTW, there's also another reason for CPDN: if you split models up into, for example, 10 years, you'll likely need much bigger uploads/downloads, and while some don't have any problems handling this, others can't handle it and must drop out. Also, CPDN likely doesn't have the capacity either...


As for some quotes mentioning numerical differences due to different cpu/OS in DC-projects:
LHC@Home news
5.4.2005 8:06 UTC
The new sixtrack application seems to run fine but we will have to do a final analysis on the results.
We are not yet using the new BOINC api, because this requires a lot of changes to the way we do graphics.
Before we made these changes we wanted to test the new physics code. Version 4.66 uses the CRlibm library. This will hopefully remove the differing results we have seen from different platforms.
LHC@Home FAQ
Please note that different CPU architectures may yield different floating point results. This is especially true between Pentium and Athlon XP CPUs.

BOINC, Dealing with numerical discrepancies
Most numerical applications produce different outcomes for a given workunit depending on the machine architecture, operating system, compiler, and compiler flags. For some applications these discrepancies produce only small differences in the final output, and results can be validated using a 'fuzzy comparison' function that allows for deviations of a few percent.

Other applications are 'divergent' in the sense that small numerical differences lead to unpredictably large differences in the final output. For such applications it may be difficult to distinguish between results that are correct but differ because of numerical discrepancies, and results that are erroneous. The 'fuzzy comparison' approach does not work for such applications.
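
A rough idea of what such a 'fuzzy comparison' could look like, sketched in Python. This is not BOINC's actual validator code or API: the function name `results_agree` and the 1% tolerance are assumptions for illustration, and the sample numbers are the SETI-style spike values mentioned earlier in this thread.

```python
import math

def results_agree(result_a, result_b, rel_tol=0.01):
    """Hypothetical fuzzy check: two result lists agree if they have the
    same length and every pair of values is within rel_tol of each other."""
    if len(result_a) != len(result_b):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol)
        for a, b in zip(result_a, result_b)
    )

# Two hosts report the same spike with slightly different digits.
host_1 = [12.345678, 53.46632]
host_2 = [12.345677, 53.46635]
# A divergent result, as a chaotic application might produce.
host_3 = [12.345678, 97.10000]

print(results_agree(host_1, host_2))  # small discrepancy is accepted
print(results_agree(host_1, host_3))  # large discrepancy is rejected
```

This works for the first kind of application in the quote; for 'divergent' applications no tolerance separates a correct-but-different result from a wrong one, which is why the quote says the approach fails there.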
BOINC, Eliminating discrepancies
Given all this I was delighted, until I started finding small numerical difference in a small percentage of runs. This was relatively easy to spot, as even a difference of 1 in the least significant bit of the mantissa of an IEEE floating-point number, will be magnified as the SixTrack particles pass through ~10,000 computational steps of each of up to one million turns.
I even stumbled over a small Folding@home quote indicating discrepancies here too, Folding-forum, Vijay Pande
The differences between CPUs are not important when trajectories are taken statistically. As it is, any single trajectory isn't all that useful, but as a whole, we can learn a lot.

 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
It sounds to me like some of these projects are taking "algorithmic shortcuts" that require huge samples to overcome (make the errors due to shortcuts statistically insignificant in the end). That seems to be what I was missing.

IMO, an example of the fallacy: "we don't need to hire programmers, we can do it ourselves. We have PhDs ya know"