Dupe stubs: possible pproxy network protocol problem?

LeBlatt

Golden Member
Dec 8, 1999
Joe O noticed that I had dupe stubs in my OGR logs. Scanning my pproxy log (thx, Excel), here's what I found:

11/06/00 10:28:18,Uplink: [1] The perproxy says: "[pp] TheCool1's Anandtech Baby Bovine pProxy"
[...]
11/06/00 10:28:42,server: Received ogr stub 25/8-3-4-27-18-10

and later:

11/06/00 10:38:22,Uplink: [1] The perproxy says: "[pp] TheCool1's Anandtech Baby Bovine pProxy"
[...]
11/06/00 10:37:32,server: Received ogr stub 25/8-3-4-27-18-10
11/06/00 10:38:03,Uplink: [1] 24.177.191.77 disconnected (reading).

What's strange here is that the log has a line dated 10:37 between lines dated 10:38. I wonder what causes this, but that's not the issue for now.
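A backwards timestamp is easy to spot by eye in a short excerpt, but not across a full day of logs. As a minimal sketch (Python 3, assuming every timestamped line starts with `MM/DD/YY HH:MM:SS,` as in the excerpt above; the function name `out_of_order` is hypothetical), the following flags each line whose timestamp is earlier than that of the previous timestamped line. It only detects the symptom; it says nothing about the cause.

```python
from datetime import datetime


def out_of_order(lines):
    """Yield (previous_line, current_line) pairs where the timestamp goes backwards.

    Assumes timestamped lines start with 'MM/DD/YY HH:MM:SS,' (as in the excerpt).
    """
    prev_ts = prev_line = None
    for line in lines:
        stamp = line.split(",", 1)[0]
        try:
            ts = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp, e.g. "[...]"
        if prev_ts is not None and ts < prev_ts:
            yield prev_line, line
        prev_ts, prev_line = ts, line


excerpt = [
    '11/06/00 10:38:22,Uplink: [1] The perproxy says: "[pp] TheCool1\'s Anandtech Baby Bovine pProxy"',
    "[...]",
    "11/06/00 10:37:32,server: Received ogr stub 25/8-3-4-27-18-10",
    "11/06/00 10:38:03,Uplink: [1] 24.177.191.77 disconnected (reading).",
]
for before, after in out_of_order(excerpt):
    print("went backwards:", before[:17], "->", after[:17])
# prints: went backwards: 11/06/00 10:38:22 -> 11/06/00 10:37:32
```

To scan a real log, pass an open file object instead of the `excerpt` list.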

Obviously the BB sent me the same stub twice. Either the two copies belonged to different passes of OGR-25 (unlikely; have we even started pass 2?), or the BB didn't get acknowledgement that I had received the stub correctly and reissued it. That may be because my pproxy got a huge flush-and-fetch request from one of my comps (10K in and out) during the connection with the BB, causing a timeout.
In the end, the stub was issued to 2 comps on my LAN and counted twice in my log, while it was probably counted only once by dnet.

The problem, I think, is that we may end up doing work that will not be credited if a pproxy uplink connection is interfered with by a client request.

Ideas, anyone ?
 

TwoFace

Golden Member
May 31, 2000
LeBlatt, bummer on the stub! :(

I've got one idea, but I'm not sure it works... you could run two instances of the pproxy and have scheduled updates between them...

Say, one pproxy for client connections and one for outgoing connections to the BBs... That way, whenever the outgoing instance fetches/flushes to the BBs, it won't be receiving blocks from the other pproxy at the same time, since both can run on scheduled updates... This way you'll also know exactly when to look for your blocks at Mika's :)

Dunno if this works, but I think it should!

I know you'll get other ideas too!

With love and respect your fellow TA member

Two-Face
 

Joe O

Senior member
Oct 11, 1999
JonB, but LeBlatt had five different pairs of duplicate stubs! That's statistically unlikely to the 5th power!!!!!
 

JonB

Platinum Member
Oct 10, 1999
Perhaps it's time for LeBlatt to play the French equivalent of the lottery! Or he can head to the casinos. They are close by for him.

Most likely, somebody has re-used some "inbuffers" that they thought they deleted. I've done it before. I'm not proud of it.
 

imported_Thunder

Senior member
Oct 14, 1999
LeBlatt,

I can think of two times I've seen something happen that might explain this. Both times, I was shutting down a Win9x machine and got an error message saying the dnetc client had not exited properly. The machine shut down, and when it was restarted it recovered the checkpoint file (just like it should) and resumed counting the stub. Here's the odd part: the first machine this happened on is stand-alone, with its own buffers. When it finished the stub it had recovered from the checkpoint file, it then loaded the exact same stub from buff-in, partially done (up to the point where the lockup occurred on shutdown), and counted it again. The second time, it was on a machine that shares its buffer (via remote buffers rather than actual "shared" buffers). While the first machine was cracking the stub recovered from the checkpoint, the second machine loaded that same stub (again partially done) and finished it.

I don't know if this is what happened to you, but I thought it might help explain things.

-Brian
 

LeBlatt

Golden Member
Dec 8, 1999
Thunder: This wasn't a client fault, as my pproxy got the stub twice. After that it worked normally, i.e. it handed the stub to 2 different clients, which processed and reported it.

I haven't yet looked at the logs for the 4 other dupes, but I already know those 4 were each processed twice by the same machine. There might be another problem there. I'll let you know what comes out.

What I understand is that my pproxy gave priority to the client connection, causing a timeout on the BB uplink. The BB, not receiving an ack for the stub it had sent, assumed it was lost and kept it in its inbuff, reissuing it to my pproxy a few minutes later when the pproxy asked for more.

So the real problem here is that my pproxy, not being able to acknowledge that the stub was received, should not have put it in its inbuff, since the BB was going to reissue it. Though I don't know whether the pproxy has a way to know that the acknowledgement was not sent, as I don't write network layers ;).
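To make the suspected failure concrete, here is a toy Python 3 model. It is an assumption, not the real pproxy/keyserver protocol: the class and function names and the stub IDs `stub-A`/`stub-B` are made up. The keyserver keeps a stub pending until it sees an ack, so a lost ack gets the stub reissued. With `commit_on_ack=False` the proxy buffers each stub on receipt and ends up holding the first one twice; with `commit_on_ack=True` it buffers a stub only once its ack has gone out, as suggested above.

```python
class Keyserver:
    """Toy keyserver: a stub stays pending until an ack for it is received."""

    def __init__(self, stubs):
        self.pending = list(stubs)

    def fetch(self):
        # Always hands out the oldest un-acked stub, so a lost ack means a re-issue.
        return self.pending[0] if self.pending else None

    def ack(self, stub):
        self.pending.remove(stub)


def sync(commit_on_ack, ack_fails_once=True):
    """Run one fetch session and return what ends up in the proxy's inbuff."""
    ks = Keyserver(["stub-A", "stub-B"])  # hypothetical stub IDs
    inbuff = []
    fail_next = ack_fails_once
    while (stub := ks.fetch()) is not None:
        if not commit_on_ack:
            inbuff.append(stub)  # buffer on receipt, before the ack
        ack_sent = not fail_next  # simulate one ack lost to a timeout
        fail_next = False
        if ack_sent:
            ks.ack(stub)
            if commit_on_ack:
                inbuff.append(stub)  # buffer only after the ack went out
    return inbuff


print(sync(commit_on_ack=False))  # ['stub-A', 'stub-A', 'stub-B']
print(sync(commit_on_ack=True))   # ['stub-A', 'stub-B']
```

The second mode only helps if the pproxy can actually tell that its ack failed to go out, which is exactly the open question. If the ack did arrive but the pproxy believed it had failed, discarding the stub would lose it rather than duplicate it.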

If someone is bored enough to have a look at the log, here is a zip (22K)

/edit Darn GeoCities... the direct link doesn't load! Well, you can download PP-20001106.zip from here
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
Humm, this could be grave indeed.:( With RC5, most stubs aren't big enough to worry about, but OGR is a different story. Hopefully it's just a rare occurrence due to bad shutdowns; otherwise, we may be losing a good chunk of work to a bug.:Q
 

Jator

Golden Member
Jun 14, 2000
Anyone with enough knowledge wanna write a script to look for dupes in logs? I don't have the cranial capacity for it, but I'm sure someone here does.

Jay
 

LeBlatt

Golden Member
Dec 8, 1999
Here's my walkthrough for doing it manually, if someone is willing to automate it. It's quite easy in Excel:

- I keep all the OGR (pproxyogr-yyyymmdd.log) and RC5 (pproxyrc5-yyyymmdd.log) logs on the pproxy machine, and just delete the other pproxy logs after a while since they're quite big.

- Run COPY PPROXYOGR*.* STUBS.TXT
This concatenates all the logs into one file.

- This next step may be optional with a US version of Excel; my French version doesn't handle the mmddyy date format the logs use very well.
Open STUBS.TXT in a text editor with a search/replace function (Notepad 2k is OK, older versions aren't). Search for /00 and replace it with /00, to separate the dates from the times.

- Start Excel and open STUBS.TXT. The Text Import Wizard pops up. Choose Delimited (variable-length) ANSI, not Fixed width.

- Choose comma as separator.

- Set the format for the first column as Date MDY.

- Import.

- You will want to set the stub size column format as integer in order not to have exponent notation.

- Given that the stub column is E, copy this formula into J1: =IF(E1=E2;"Dupe";"") (a US version of Excel uses commas instead of semicolons), then copy and paste it onto all lines. This marks a stub that equals the one on the next line as a dupe.

- Sort the table by column E.

- Browse column J to look for dupes. You'll have the dates, sizes and IP.
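For the script Jay asked for, here is a minimal Python 3 sketch of the same idea: group rows by stub and report every stub seen more than once. A dictionary keyed by stub replaces the sort-then-compare-adjacent step in Excel. It assumes the pproxyogr logs are comma-separated with the stub in the fourth field (column D of the raw file, which becomes column E once the date is split from the time). The real layout may differ, so check `stub_col` against your own logs. The function names `read_logs` and `find_dupes` are hypothetical.

```python
import csv
import glob
from collections import defaultdict


def read_logs(pattern="pproxyogr-*.log"):
    """Yield the lines of every matching log, in file-name (i.e. date) order."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="latin-1") as f:
            yield from f


def find_dupes(lines, stub_col=3):
    """Return {stub: [rows]} for every stub that appears on more than one row.

    stub_col is 0-based; 3 assumes the stub is the 4th comma-separated field.
    """
    rows_by_stub = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) > stub_col and row[stub_col].strip():
            rows_by_stub[row[stub_col].strip()].append(row)
    return {stub: rows for stub, rows in rows_by_stub.items() if len(rows) > 1}


if __name__ == "__main__":
    for stub, rows in sorted(find_dupes(read_logs()).items()):
        print(f"Dupe: {stub} ({len(rows)} times)")
        for row in rows:
            print("   ", ",".join(row))  # full row: dates, sizes, IP, ...
```

Every occurrence of a duplicated stub is listed together with its full row, so the dates, sizes and IPs can be compared directly.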
 

Joe O

Senior member
Oct 11, 1999
Jay, that doesn't look like a dupe: the node counts are different! Was this from a pproxy log or a client log? If it was from a client log, it could be a shutdown at xx% followed by the completion at 100%.
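Joe's point can be folded into a dupe check. A minimal Python 3 sketch, assuming you can pull (stub, node count) pairs out of a log (that record shape and the function name `classify` are hypothetical): a stub whose repeats all report the same node count is flagged as a real dupe, while differing counts are flagged for a closer look, since they may be a partial result at xx% followed by the 100% completion.

```python
from collections import defaultdict


def classify(records):
    """records: iterable of (stub, node_count) pairs (hypothetical shape).

    Returns {stub: "dupe" | "check partial"} for stubs seen more than once.
    """
    counts_by_stub = defaultdict(list)
    for stub, nodes in records:
        counts_by_stub[stub].append(nodes)
    verdicts = {}
    for stub, counts in counts_by_stub.items():
        if len(counts) < 2:
            continue  # seen once: nothing to compare
        # Identical node counts: the same work really was reported twice.
        # Different counts: possibly a shutdown at xx% plus the 100% completion.
        verdicts[stub] = "dupe" if len(set(counts)) == 1 else "check partial"
    return verdicts
```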