SETI: Corrupted seti.exe file (BEOWULF) - Losing serious WU crunching power.

LANMAN

Platinum Member
Oct 10, 1999
Anyone been having troubles with their clients not wanting to grab another WU from their SETI Q?

I found a workaround by deleting the original SETI.exe file and copying in a fresh one, but having to do that all the time just plain sucks.
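Before swapping seti.exe every time, it can help to confirm the installed copy really differs from a known-good one. A minimal Python sketch, assuming you keep a known-good seti.exe somewhere; the paths in the example comment are hypothetical, and the client should be stopped first, since Windows won't let you overwrite a running .exe:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh_if_corrupted(installed: Path, known_good: Path) -> bool:
    """Copy known_good over installed only if the two files differ.

    Returns True if the file was replaced (or was missing), False if it
    already matched the known-good copy.
    """
    if installed.exists() and sha256_of(installed) == sha256_of(known_good):
        return False
    shutil.copy2(known_good, installed)
    return True


# Example (hypothetical paths; stop the SETI client first so the .exe isn't locked):
# refresh_if_corrupted(Path(r"C:\seti\seti.exe"), Path(r"D:\seti-good\seti.exe"))
```

If the same box keeps returning True, something on that machine is damaging the file, which is worth chasing down rather than just recopying.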

Any ideas?

--LANMAN
 

Smoke

Distributed Computing Elite Member
Jan 3, 2001
This could be one of several things ... isn't that enlightening? :p lol

Were these installations that were working and have now run into trouble? Or were these new Seti installations?

Let me conjecture in case these are old installations that have just quit working properly. You could have a defective WU. If you are using the Cli Client without a caching program (besides the SetiQueue), a defective WU would clog up the works. Instead of replacing the Cli Client ... try deleting the WU. I'm not sure how you do that with a Cli Client running by itself, since I have always used either SetiDriver or SetiHide. A clue would be to look in the SetiQueue LOGS and see if there are any problems.
 

Rattledagger

Elite Member
Feb 5, 2001
Well, the only problems I've had getting new WUs with the SetiQueue / FireDaemon SETI service combination are:
1. If running SetiQueue on NT4, the SetiQueue service times out after a reboot, so it must be started manually.
2. If using the computer name as the proxy setting and you haven't got a network card... Use 127.0.0.1 instead.
3. If using dial-up, the computer wants to dial out instead of contacting the SetiQueue running on the same machine... Fixed by running the service as a user that doesn't have a dial-up setting configured...

Uhm, and an unplugged network cable, or the queue machine being down. :eek:

The only other problem I can think of at the moment is if the IP address has changed...
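Most of these boil down to the client not reaching the queue at the address it's configured with. A quick way to test that from the client machine is a plain TCP connect. A minimal Python sketch; the port number in the example comment is a hypothetical placeholder for whatever port your SetiQueue actually listens on:

```python
import socket


def queue_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout.

    This only proves that *something* is listening on that port; it does not
    check that it is actually SetiQueue.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host, ...
        return False


# Example (QUEUE_PORT is hypothetical; use your SetiQueue's configured port):
# QUEUE_PORT = 5517
# for host in ("127.0.0.1", socket.gethostname()):
#     print(host, queue_reachable(host, QUEUE_PORT))
```

Comparing the result for 127.0.0.1 with the result for the computer name is one way to spot the proxy-setting case in item 2; a False for the queue machine's address points at the cable, the queue machine itself, or a changed IP.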

Defective WU? SetiQueue checks for and deletes corrupted WUs when downloading or before delivering them to a client... So unless you've had a machine crash or a disk problem, the WU shouldn't be corrupted... Of course some 1-minute WUs show up, but that's another problem. ;)


If the seti.exe is corrupted, the first thing to look for would be a virus, the second disk corruption...
 

Soggysocks

Golden Member
Jun 20, 2001
I've had wu's that finish but when I go to upload...........they won't go. :frown:

So I swallow the loss, delete the WU and start over. Don't know if that helps any.

I also have had corrupted SETI installs. I deleted the folders, copied and pasted a good folder from another machine, deleted the WUs and restarted the CLI.

That worked for me ;)

Might be worth a try.
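The copy-a-good-folder-then-delete-the-WUs routine can be done in one step by skipping the work-unit files during the copy. A minimal Python sketch; the list of per-WU file names is an ASSUMPTION about the classic CLI client's folder layout (only result.sah appears elsewhere in this thread), so check it against your own install first:

```python
import shutil
from pathlib import Path

# ASSUMPTION: per-work-unit file names of the classic SETI@home CLI client.
# Check them against your own install before trusting this list.
WU_FILES = ("work_unit.sah", "state.sah", "result_header.sah", "outfile.sah", "result.sah")


def clone_install(good_dir, new_dir, skip=WU_FILES):
    """Copy a known-good SETI folder to new_dir, leaving out the WU files.

    shutil.copytree refuses to overwrite an existing new_dir, so move the
    broken folder out of the way first. Returns the copied top-level names.
    """
    shutil.copytree(good_dir, new_dir, ignore=shutil.ignore_patterns(*skip))
    return sorted(p.name for p in Path(new_dir).iterdir())
```

Leaving the old folder in place under another name, rather than deleting it, keeps a way back if the assumed file list turns out to be wrong.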
 

BadThad

Lifer
Feb 22, 2000
Originally posted by: Soggysocks
I've had wu's that finish but when I go to upload...........they won't go. :frown:

So I swallow the loss, delete the WU and start over. Don't know if that helps any.

I also have had corrupted SETI installs. I deleted the folders, copied and pasted a good folder from another machine, deleted the WUs and restarted the CLI.

That worked for me ;)

Might be worth a try.

Rather than mess around, that's exactly what I do when a client is hosed.

 

Smoke

Distributed Computing Elite Member
Jan 3, 2001
I just saw the following in the LOG of my SetiQueue. It appears you tried to send in a couple of dups and also register the 1st WU (of a new CLIENT?) through me and it was rejected. Here are the pertinent lines in my LOG:


4:36pm: Q0082 Work unit 06se03ab.14564.22258.947138.88 given to NCC1701A.LANMAN.3.3.i386-winnt-cmdline
4:35pm: Q0002 Work Unit 10no03aa.10853.6946.498588.233 added to the queue
4:35pm: s@h_wrk Sending result 18oc03aa.20188.1120.115896.83
4:35pm: s@h_wrk Sending result 02oc03aa.25261.29554.542342.126
4:35pm: Q0082 Work unit 04se03aa.14141.9634.673578.102 given to NCC1701A.LANMAN.3.3.i386-winnt-cmdline
4:34pm: s@h_wrk Sending result 27no03aa.25808.16002.404820.56
4:34pm: s@h_wrk Waiting 10s before trying again
4:34pm: Q0082 Seti returned: Duplicate result
4:34pm: s@h_wrk Sending Result: Seti@home status: Duplicate result
4:34pm: s@h_wrk Sending result 28no03aa.7910.401.940902.115
4:34pm: s@h_wrk Sending result 28no03aa.7910.4322.567312.130
4:34pm: s@h_wrk Waiting 10s before trying again
4:34pm: Q0082 Seti returned: Duplicate result
4:34pm: s@h_wrk Sending Result: Seti@home status: Duplicate result
4:34pm: s@h_wrk Sending result 28no03aa.7910.401.940902.115
4:34pm: s@h_wrk Sending result 28no03aa.7910.7842.211072.222
4:34pm: s@h_wrk Sending result 28no03aa.7910.9984.1047168.79
4:34pm: s@h_wrk Sending result 27no03ab.1701.3665.236066.162
4:34pm: Q0082 Work unit 10oc03aa.23581.9634.679838.146 given to NCC1701A.LANMAN.3.3.i386-winnt-cmdline
4:34pm: Q0079 Work unit 18no03ab.5521.8034.254826.173 given to rrcs-midsouth-24-172-76-57.biz.rr.com
4:34pm: Q0079 Note work-unit 18oc03aa.20188.1120.115896.83 was not dispatched by SetiQueue
4:34pm: Q0079 Result 18oc03aa.20188.1120.115896.83 completed by rrcs-midsouth-24-172-76-57.biz.rr.com [9h22m]
4:33pm: s@h_wrk Sending result 28no03aa.7910.786.361066.111
4:33pm: Q005D Work Unit 19no03aa.20606.4753.23560.201 added to the queue
4:33pm: s@h_wrk Sending result 05oc03aa.23332.5778.254826.64
4:33pm: s@h_wrk Returning passing through request response
4:33pm: s@h_wrk Passthrough: Seti@home status: ErrorCode 0x00000064 100
4:33pm: s@h_wrk Seti@home message: 'Corrupt user ID. Please exit SETI@Home, delete user_info.sah and result.sah and resta...
4:33pm: s@h_wrk Passing through request send_result_get_user_stats
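SetiQueue's log lists the newest line first, which makes duplicates easy to misread. The sketch below walks the lines oldest-first and pins each "Seti@home status: Duplicate result" line on the last "Sending result" before it. That pairing is a heuristic inferred from the log lines above, not documented SetiQueue behavior:

```python
import re
from collections import Counter

SEND_RE = re.compile(r"s@h_wrk Sending result (\S+)")
DUP_RE = re.compile(r"Seti@home status: Duplicate result")


def duplicate_results(log_lines):
    """Count 'Duplicate result' answers per result name.

    The log is newest-first, so it is walked in reverse (oldest-first) and
    each duplicate status is pinned on the most recent 'Sending result' line
    before it. This pairing is a heuristic, not documented SetiQueue behavior.
    """
    counts = Counter()
    last_sent = None
    for line in reversed(log_lines):
        sent = SEND_RE.search(line)
        if sent:
            last_sent = sent.group(1)
        elif last_sent is not None and DUP_RE.search(line):
            counts[last_sent] += 1
    return counts
```

Fed the 4:34pm lines above in the order they were pasted, it reports 28no03aa.7910.401.940902.115 as the result that came back "Duplicate result", twice.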


Your Queue Info on the Crazee Smoke SetiQueue:

Queue Q0075
Seti User Name TA_Beowulf
Seti User ID 4313102
Seti@home Platform i386-winnt-cmdline
Seti@home Version 3.08
Seti@home clients 1
Child SetiQueue Clients 88
Current Min, Max 1, 2
Queued WUs 63

--------------------------------------------------------------------------------

WU downloaded 221
Results uploaded 166
Total Time 1.19251 years
Total CPU Time 37 days 20 hr 16 min
Average Time 2 days 14 hr 56 min
Average CPU Time 5 hr 28 min (8% efficient)
Connections 325
Last connection Fri 2004 Mar 05 4:32:02pm (2 hr 21 min ago)
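The figures in that queue info hang together if the averages are per uploaded result and "efficient" is CPU time divided by wall-clock time. That's a reading of the numbers, not something taken from SetiQueue's documentation, and it assumes SetiQueue counts a year as 365 days. A quick Python check:

```python
def to_minutes(days=0, hours=0, mins=0):
    """Convert a days/hours/minutes duration to minutes."""
    return days * 1440 + hours * 60 + mins


def split_minutes(total):
    """Round to whole minutes and split into (days, hours, minutes)."""
    total = round(total)
    days, rest = divmod(total, 1440)
    hours, mins = divmod(rest, 60)
    return days, hours, mins


total_cpu = to_minutes(37, 20, 16)    # "Total CPU Time 37 days 20 hr 16 min"
total_wall = 1.19251 * 365 * 1440     # "Total Time 1.19251 years" (ASSUMES 365-day years)
results = 166                         # "Results uploaded 166"

avg_cpu = split_minutes(total_cpu / results)    # (0, 5, 28)  -> "5 hr 28 min"
avg_wall = split_minutes(total_wall / results)  # (2, 14, 56) -> "2 days 14 hr 56 min"
efficiency = total_cpu / total_wall             # about 0.087
```

Both averages match the displayed values, and the efficiency comes out at about 8.7%, so the displayed "8%" seems to be truncated rather than rounded.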


Have things started working?
 

Smoke

Distributed Computing Elite Member
Jan 3, 2001
Originally posted by: Rattledagger

Defective WU? SetiQueue checks for and deletes corrupted WUs when downloading or before delivering them to a client......

RD, the DEFECTIVE WUs to which I refer are the ones that were sent out long ago during the slowdown and final crash of one of the main hard drives on the S@H Server. These WUs were distributed far and wide and are still out there in the Ethernet. When a CLIENT attempts to send one of these DEFECTIVE WUs (after crunching) through a SetiQueue, the WU gets refused/rejected by S@H, and the SetiQueue reports that the WU was defective and also rejects it. No replacement WU is sent to the CLIENT.

Depending on the installation on the CLIENT MACHINE the following happens:

A. If the installation is either SetiDriver or SetiHide, the CLIENT COMPUTER just crunches away on the next WU, and when that WU is completed both the newly crunched (good) WU and the still-untransmitted (defective) WU are sent on to their SetiQueue. The SetiQueue forwards both WUs to S@H, but once again S@H rejects the DEFECTIVE WU and the SetiQueue once more reports that the WU was defective and also rejects it. This situation will go on forever ... until the DEFECTIVE WU is either FIXED or DELETED. Deleting is easier. Non-defective WUs continue to be crunched and completed, and for all practical purposes things go on as if nothing is wrong. The BAD part is that the SetiQueues get burdened with an increase in bandwidth demand, for they try over and over to submit the Defective WUs each time the CLIENT sends them in along with other non-defective WU results.

B. If the installation is one of our Service Installs that does not have a cache and only works on one WU at a time, the following happens: When the DEFECTIVE WU has been crunched, the CLIENT COMPUTER tries to send it in. Because S@H rejects it and the SetiQueue dutifully reports the same, the CLIENT never gets back (1) a status report from S@H via the SetiQueue OR (2) a NEW WU. The CLI CLIENT is stuck in a "Catch 22" situation and for all practical purposes is DEAD.

The fix for "B" is the same as "A": Either repair the DEFECTIVE WU or DELETE IT.

In situation "A", the trouble is more a waste of bandwidth and time but in situation "B", the trouble is terminal until rectified.
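To make the difference between A and B concrete, here is a toy Python model, not how any of the real clients are coded: each WU is either good or defective, every pending result is resubmitted whenever a new one finishes, and a client without a cache gets nothing back once a defective result is pending.

```python
def run(cached: bool, wus: list) -> tuple:
    """Toy model of a client working through a list of WUs.

    wus: True for a good WU, False for a DEFECTIVE one.
    Returns (results accepted, rejected upload attempts, client stuck?).
    """
    accepted = rejected = 0
    unsent = []                      # finished results S@H hasn't accepted yet
    for good in wus:
        unsent.append(good)          # the client finishes crunching this WU
        still_unsent = []
        for result in unsent:        # every pending result is (re)submitted
            if result:
                accepted += 1
            else:
                rejected += 1        # defective: rejected again, stays pending
                still_unsent.append(result)
        unsent = still_unsent
        if unsent and not cached:
            # Case B: with a defective result pending, no new WU comes back.
            return accepted, rejected, True
    return accepted, rejected, False
```

With `[True, False, True, True]`, the cached client returns `(3, 3, False)`: every good result gets through, but the defective one is resent (and rejected) three times, which is the wasted bandwidth in A. The uncached client returns `(1, 1, True)` and never sees the last two WUs, which is B.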

:)
 

Rattledagger

Elite Member
Feb 5, 2001
Well, since none of these defective WUs ever managed to get distributed through my SetiQueue, I haven't had any problems returning WUs to the SetiQueue either, so... :confused:
Of course, WUs that weren't downloaded from a SetiQueue can be corrupted, but my comment wasn't about that, and I've only had this problem in BOINC. ;)



Anyway, from the log: a pass-through should normally only happen when someone is trying to set up either a new account or a new install without user_info.sah, so it looks like a corrupted user_info.sah is your problem here...
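The S@H message in the log (cut off at "resta...") asks you to exit SETI@Home and delete user_info.sah and result.sah. A minimal Python sketch of that step, which renames the two files to .bak instead of deleting them so they can be put back if needed; the function name is made up for this example, and it should only be run with the client stopped:

```python
from pathlib import Path


def set_aside(seti_dir, names=("user_info.sah", "result.sah")):
    """Rename the named files in seti_dir to <name>.bak instead of deleting them.

    Returns the names that were actually found and renamed.
    """
    renamed = []
    for name in names:
        path = Path(seti_dir) / name
        if path.exists():
            path.replace(path.with_name(name + ".bak"))  # overwrites an older .bak
            renamed.append(name)
    return renamed
```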