SETI WU that will not finish?

Poof · Feb 11, 2001

JWM - the VLARs are being processed but NOT with the win CLI. I hope this is made clear.

When the first big run of VLARs went out under the early 3.x clients last fall, people who were aware of the processing inconsistencies shunted them to clients for OTHER operating systems, e.g., Linux.

I'm not sure if it's been made clear (at least over here at Anandtech), but hopefully I can do so now:

For EVERY client OTHER THAN the win 3.x CLI (and that includes the win 3.x GUI), VLARs RUN FASTER THAN MID ANGLE RANGES AND SLIGHTLY SLOWER THAN HIGH ANGLE RANGES (with the VHARs running FASTER than mids).

As a hypothetical example, if you run a 0.417 (mid) in say 11 hours, the SAME machine should run a 0.033 (VLAR) in maybe 10 hours... NOT 12 hours!! And similarly, the SAME machine would run an 11 (VHAR) in about 8 hours.

Once again: VLARS RUN SLOWER ONLY ON THE WINDOWS TEXT CLIENT.

Your production running VLARS is SUFFERING because THE WIN CLI IS BROKEN. If the win CLI WORKED correctly, you would WISH for VLARs! They would be considered FAST WUs!!!

Understand?

Very little analysis happens at VLAR and VHAR angle ranges, so the slowdown is NOT due to any extra "science". All of the science occurs at the mid ranges, which is why the TLC benchmark unit was changed from 6.718 (VHAR) to 0.417 (mid AR): to test the hardware fairly on a WU where EVERYTHING gets analyzed.

Please understand this.

I hate to say it, but to grasp the full meaning of this, along with the proof (including CpF charts produced by Roelof Engelbrecht, whom you would know as the author of SetiSpy), you'd have to hunt through at least 3-4 threads where we TESTED this. That's why I created a separate VLAR thread, to get it all consolidated and make people aware.

It has now been ACKNOWLEDGED as a problem by one of the SETI porters who posted in that thread, "Lawrence" (Lawrence Kirby of alt.sci.seti fame), who even said he had written a little test "fix" for himself based on some reports about the Solaris port.

Robor · Feb 11, 2001

Poof: Yes, all of my NT machines (Server & Workstation) are patched to Service Pack 6. They pretty much run 24/7 with very few reboots and give me no problems outside of those created by the end users. 🙂

Rob

Poof · Feb 11, 2001

Cool Robor.

(and /me hopes you mean SP 6a and not SP 6... LOL 😉)

JWMiddleton · Feb 11, 2001

Poof: I hear what you are saying, but...

<< JWM - the VLARs are being processed but NOT with the win CLI. I hope this is made clear. >>

Boycott does not mean queue/WU management! A lot of us on this forum noticed a big increase in VLAR WUs in the late Nov-Dec time frame. I suspect they were due to them being recycled and given to those who were not culling our queues. I'd hate to see this start again!

I know very well the characteristics of the various types of WUs. And I stated months ago that it appears to be a bug, of sorts, in that it only affects certain OSes.

I will send you my data, as I said before, because I'd really like to resolve this!

Robor · Feb 11, 2001

Poof: Yep, they are all on SP6a. I've got 4 Win2K boxes arriving late next week. They will be the first of what I expect we will be moving to in the near future.

Rob

Poof · Feb 11, 2001

JWM - Appreciate anything you have! 🙂

Also - maybe I can make it even clearer:

VLAR + VHAR = "Fast WU"
Mids = "slow WU"

This is the way it's SUPPOSED to be. With EVERY OTHER client, the above is true. It is ONLY the win CLI where VLAR = "slow WU". If the CLI was fixed, then VLAR = "fast WU". 🙂

<< I suspect they were due to them being recycled and given to those who were not culling our queues. I'd hate to see this start again! >>

A really easy way to tell whether these were "recycled" from earlier runs is to look at the date the data in the WU was originally recorded at Arecibo. I believe SetiSpy lists the date the data was collected.

And think carefully about the following: I am CERTAIN that if they are recycling anything, the recycled WUs are coming from those 1,000,000+ BOGUS results that were recently thrown out after the client-hack fiasco was revealed, and are now being redistributed to be reprocessed correctly. Those WUs would include both the large VLAR run from late last year AND the subsequent VHAR "mistake" WUs (generated by the problematic slewing of the telescope, which caused a lot of 11 & 12 AR WUs).

The problem with the hacked 3.00 CLI client (and later the 3.03 CLI) was that it downloaded LEGITIMATE WUs (including VLARs) but didn't do any analysis on them, and the SETI@home folks never caught that at the time. So the WUs run with that hacked client, and their results, had been kept on file. These must now be sent out again.

If in fact they are redistributing those 1,000,000 WUs right now, in the same order they were distributed before, then I would predict that after the VLAR run there would be a corresponding VHAR run. (I do recall them saying that they managed to delete a number of those VHARs before they went out, but A LOT were still sent out and processed, and are in their database for verification re-runs.)

Please don't think that a handful of people who might be holding onto and/or shunting no more than a hundred or so VLAR WUs for later analysis could account for more than the tiniest fraction of a fraction of a percent of the total WUs sent out. As it is, based on Roelof's charting of the WU distribution, only about 10% of all WUs sent out fall in the "VLAR" category, which we know means WUs with AR < 0.1.

What gets us queuers is that we tend to pull down a lot at once, and neither the VLARs nor the VHARs are random enough (they come in piles) for us to avoid getting dumped on.
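To make the angle-range categories concrete, here is a minimal Python sketch that buckets a WU by its AR. Only the VLAR cutoff (AR < 0.1) comes from this thread; `VHAR_MIN` is a HYPOTHETICAL boundary chosen for illustration, since the thread only gives VHAR examples (6.718, 11, 12) and a mid example (0.417). The "fast WU"/"slow WU" labels follow the claim in this thread about every client except the Windows 3.x CLI; `classify_ar` is a made-up helper name, not part of SetiSpy or any SETI@home tool.

```python
# Angle-range (AR) buckets as described in this thread.
# VLAR_MAX comes from the thread: a VLAR is a WU with AR < 0.1.
# VHAR_MIN is a HYPOTHETICAL cutoff; the thread gives VHAR examples
# (6.718, 11, 12) and a mid example (0.417), but no exact boundary.
VLAR_MAX = 0.1
VHAR_MIN = 1.0

# Expected speed on every client except the Windows 3.x CLI, per the thread.
EXPECTED_SPEED = {"VLAR": "fast WU", "mid": "slow WU", "VHAR": "fast WU"}


def classify_ar(angle_range: float, vhar_min: float = VHAR_MIN) -> str:
    """Return 'VLAR', 'mid', or 'VHAR' for a work unit's angle range."""
    if angle_range < VLAR_MAX:
        return "VLAR"
    if angle_range >= vhar_min:
        return "VHAR"
    return "mid"


if __name__ == "__main__":
    for ar in (0.033, 0.417, 6.718, 11.0):
        bucket = classify_ar(ar)
        print(f"AR {ar}: {bucket} ({EXPECTED_SPEED[bucket]})")
```

Making `vhar_min` a parameter lets you move the (assumed) VHAR boundary without touching the VLAR rule the thread actually states.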

[EDIT: speeling 😉]

JWMiddleton · Feb 12, 2001

Poof: The SetiSpy logs don't help. The time it reports to complete a WU is just slightly less than the delta time between log entries; the difference is the latency of downloading a WU. I suspect that is what was reported on ARS, rather than clock time vs. CPU time. So, does anything report CPU time? The only option I see is to monitor Wintop, which would be a real pain!
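To illustrate why wall-clock and CPU time diverge, here is a minimal modern Python sketch (not something the 2001 clients offered, and not how SetiSpy or Wintop work). `time.perf_counter()` measures elapsed wall-clock time, while `time.process_time()` counts only the CPU time used by the current process, so any waiting (simulated here with `time.sleep`, standing in for a WU download or upload) inflates the first but not the second. `busy_work` and `crunch_then_wait` are hypothetical stand-ins.

```python
import time
from typing import Callable, Tuple


def busy_work(n: int) -> int:
    """CPU-bound loop standing in (hypothetically) for crunching a WU."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def crunch_then_wait() -> None:
    busy_work(200_000)
    # Sleeping consumes wall-clock time but essentially no CPU time,
    # much like waiting on a download or upload between WUs.
    time.sleep(0.5)


def measure(func: Callable[[], None]) -> Tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process only
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure(crunch_then_wait)
    print(f"wall-clock: {wall:.2f} s, CPU: {cpu:.2f} s")
```

A log that records only the gap between entries is measuring the wall-clock number; the CPU number is the one that reflects how hard the client actually worked.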

As for VLAR, VHAR and the others: if the TeraFLOPs reported by SetiSpy are correct, then one should be able to predict the completion time for each WU very easily. But with Win9x clients it isn't linear. I have even seen a well-formed sine wave when graphing WUs in the .5 to 1.0 range. Makes no sense!
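The "should be easy to predict" expectation is just a linear model: completion time proportional to the WU's TeraFLOP count. A minimal Python sketch, assuming a single per-machine calibration constant `hours_per_teraflop` (a hypothetical name, e.g. measured from one reference WU); if the relative deviation plotted against AR is not flat, as with the sine-wave pattern described here, the linear model is not holding on that client.

```python
def predicted_hours(teraflops: float, hours_per_teraflop: float) -> float:
    """Linear model: completion time scales with the WU's FLOP count."""
    return teraflops * hours_per_teraflop


def relative_deviation(observed_hours: float, teraflops: float,
                       hours_per_teraflop: float) -> float:
    """Fractional gap from the linear prediction (0.1 means 10% slower)."""
    expected = predicted_hours(teraflops, hours_per_teraflop)
    return observed_hours / expected - 1.0


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(predicted_hours(3.0, 2.0))           # linear expectation in hours
    print(relative_deviation(6.6, 3.0, 2.0))   # about 0.1, i.e. ~10% slower
```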

Poof · Feb 12, 2001

JWM - Hellburner was using an alternative task-monitoring program (I forget the name offhand, but he'll probably pop in, bug me... and post it...😀)

One issue I've pretty much confirmed is that the discrepancies in time are exaggerated when some SP or hotfix isn't installed. I'd consider this problem #2 in a series of problems that are sort of separate from the general VLAR problem. I need to research it a little more (someone has volunteered to check 2K without SP1 to confirm; my own 2K box is so loaded up, and has been running just fine for 130+ days, that I'm afraid to patch and reboot it to test... 😉)

It seems a problem #3 is cropping up, where people report WUs that either complete but won't upload or crash just short of completing... It gets curiouser and curiouser... 🙁

<< I have even seen a well-formed sine wave when graphing WUs in the .5 to 1.0 range. Makes no sense! >>

You mean like this (most fascinating thing that I've ever seen)????!!!!!!

🙂

I saw the above first reported in this thread. And back then, I thought it was cool as hell!!!!

(these things get me excited!.... Uh... errr... not that way of course....😛)

Sukhoi · Feb 12, 2001

Hmm, I'm getting a problem similar to this on the computer I assimilated at my school. When I first set it up, all I had in the SETI folder besides the normal program files was my user_info.sah, since I was going to install SETI as a service and couldn't interact with it to log it in to the servers. I also added the proxy switch to use Orange Kid's proxy, because port 80 (the normal SETI servers) is blocked, but port 5001 (Orange Kid's server) doesn't seem to be.

SETI installed fine as a service, automatically connected to Orange Kid's server, and downloaded a WU to work on. It processed the WU fine and actually finished it during class today while I was watching the computer. But it wouldn't upload the WU! It seemed to try to connect and fail, and then SETI just sat there idle. Right before class ended I rebooted the machine, but I didn't have time to see if that fixed anything. I'll check tomorrow morning. I hope they didn't block port 5001. 🙁 It's a PII 400 running WinNT 4.0 SP6, if anyone cares.

JWMiddleton · Feb 12, 2001

Poof: Yep, it was like that one; a downward-sloping sine-wave type form. That defies definition! This is why I like RC5! If you have an 800 MHz CU or CU128 (Celly2), you know exactly what to expect, down to the block! With SETI, the 800 MHz CU will do better than the 800 MHz CU128. If the RC5 machine has a 66 MHz bus, so what? It still performs just like the 133 MHz bus box. I realize that comes down to the executable's working set, but it is very nice to have it perform as predicted!

Sukhoi · Feb 12, 2001

Hmm, it seems that restarting the computer at the end of class may have worked. There's an odd connection under my account in Orange Kid's logs, which may be the school computer. I'll check tomorrow at school.

Poof · Feb 12, 2001

<< Yep, it was like that one; a downward-sloping sine-wave type form. That defies definition! >>

I STILL think it's the coolest thing to look at! 😀 And yeah, RC5 IS much more predictable... Speaking of which, I benched my 433 MHz Alpha (21164) and it can do about 700K keys/s... not wonderful, but then it is an older machine. Alphas aren't really known for their integer speed anyway...

And Sukhoi, let us know how it goes with that machine and the WU. 🙂

Assimilator1 · Feb 13, 2001

I also had a client stall on me when it tried to get a WU from the old SETIQ, but for some unknown reason it couldn't. 🙁 What was weird was that the auto-dial-up window had popped up (despite my NIC's IP being shown in the SETI window), so it had presumably tried to connect to the Net! 😕 WTF??? It did this again after I rebooted the PC, but then it managed to connect to SETIQ.

Poof · Feb 13, 2001

Assim1, I've had that happen every once in a while. I usually end up stopping and starting the client a few times, and if that doesn't work, then about 99% of the time stopping and restarting SetiQueue lets the client upload just fine...

But what some people are reporting are random 3.03 WUs that refuse to upload anywhere at all, including directly to SETI. 🙁

Sukhoi · Feb 13, 2001

Well, it seems that when I rebooted the PC at the end of class, it then uploaded the WU. I'm not sure why it didn't upload the WU when it finished it. I'm interested to see if the next WU that's finished uploads automatically.

Wiz · Feb 13, 2001

I'm using SetiDriver and have had this on one machine for hours now:

"Sending result - connecting to server."

Another machine on my LAN has sent results since this started, so it's just this one that's hanging. What do I do if it can't send its results? I've stopped and started SetiDriver several times and rebooted the machine: same thing. 🙁🙁

Sukhoi · Feb 14, 2001

Hmm, I dunno Wiz, sorry. 🙁

Poof, the machine automatically connected when the WU finished today. I really have no idea why it didn't autoconnect for the first WU. I hope the next one connects fine too.

JWMiddleton · Feb 14, 2001

I had two boxes doing RC5 that I moved over to SETI yesterday. They are on a LAN with the system that has SetiQ. I put SetiDriver on both and loaded 2 WUs into each queue. When I woke this morning, I found that one had completed its WU and had started on the second. Like a good little ship, it had transmitted the results to SetiQ. The other box was sitting at 100% complete and doing nothing! 🙁:| I stopped transmit on that ship and restarted SetiQ on the server. It still wouldn't transmit. I stopped and restarted SetiDriver, and the WU was transmitted. But it caused SetiQ to dial into the Internet and dump/upload a WU?!?!? I wasn't at a threshold, so I don't know why that happened. It had done the passthru thing when I first started it?!?!

Well, the wayward ship looked like it was starting the new WU, but %Complete stayed at zero for a long time (10 minutes or so). I stopped SetiDriver and rebooted the machine, and all was well! :Q

Robor: Was this the same thing you were seeing??

This does not appear to be an isolated incident. How do we report such stuff to S@H? Do they listen? Or, is this project too big for them to handle?

Robor · Feb 14, 2001

I'm not using SetiDriver on my fleet, but it does sound similar. I use SetiLog in combination with the CL client. My SetiWatch reported the WU as 99.97% complete. Restarting the client and SetiQ didn't help. I had to blow away the \Data directory (where I run SETI from) and copy over fresh copies of SetiLog and seti.exe (I rename the client with the /s switch in SetiLog). Since then everything has been working smoothly. Oh, I'm using the .78 SetiQ at work.

Rob

Poof · Feb 14, 2001

JWM et al... It seems a bunch of people have been reporting random results hanging while trying to connect to either a queue or directly to SETI, so much so that I had kind of considered it problem #3 with 3.03. Rebooting seems to help. As an alternative: while troubleshooting one particular result of mine that wouldn't upload, I sent the zipped-up directory with that WU to Lawrence Kirby (who posts on alt.sci.seti and is a porter for one of the clients), and he was able to get it to go. In this case, I think it may have worked simply because it was sent from a different machine....

I am not a programmer, but I'm now wondering whether something is going on with the Windows network sockets, similar to what had been happening with the folding@home client: sockets weren't closing after each WU upload, so after a while there were none left to upload the next WU. Rebooting would clear it out (or, alternatively, you could upload the result from another Windows machine to at least get some credit).

I guess this may be something to investigate further. I don't recall this problem happening to such a degree even with 3.0... 🙁
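The socket-leak theory above is only a guess about what the client does, but the general pattern it describes (open a connection per upload, never close it, eventually run out) has a standard remedy: release the socket on every path, including errors. A minimal Python sketch of that pattern; `upload_result`, the host, and the port are hypothetical, and this is not SETI@home or SetiQueue code.

```python
import socket


def upload_result(host: str, port: int, payload: bytes) -> socket.socket:
    """Send one result, releasing the socket even if sendall() raises."""
    sock = socket.create_connection((host, port), timeout=10)
    with sock:  # the with-block closes the socket on exit, error or not
        sock.sendall(payload)
    return sock  # returned only so a caller can confirm it is closed


if __name__ == "__main__":
    # Local stand-in for a queue server, so the sketch runs offline.
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen()
        port = server.getsockname()[1]
        for n in range(5):
            client = upload_result("127.0.0.1", port, b"result %d" % n)
            conn, _ = server.accept()
            with conn:
                conn.recv(1024)
            print(f"upload {n}: client socket closed = {client.fileno() == -1}")
```

Because each upload closes its own socket, repeated uploads don't accumulate open handles; a leaking version would skip the `with` block and keep one socket per upload alive until the process exits or the machine reboots.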

Wiz · Feb 15, 2001

I have tried sending my stuck WU from two machines, one W2K Pro and the other W2K Server. I've tried it over and over, with reboots etc., without doing any good. I will try it from a W98 box next and post again after that.

Wiz · Feb 15, 2001

It does the same on Win98. Sits there saying:
"Sending result - connecting to server"
forever.
