Help: How can I make this perl script faster??

uCsDNerd

Senior member
Mar 3, 2001
Hi All,
I made this little Perl script below to examine two files, both log files created by a packet sniffer. Mainly what I'm doing is searching each file for the same packet ID number (the src and dst addresses and ports are already known to match) and taking the difference between the timestamps recorded by the sniffer at each endpoint. The problem is that the way it's set up now is VERY SLOW, since it loops through the whole second file each time looking for a match. Does anyone know how I can improve on this? Thank you for your help!

#####################################################
# Log file names from the command line.
$local_log  = $ARGV[0];
$remote_log = $ARGV[1];

open( rm_txt, ">remote_minus_local.txt" ) or die "Cannot access remote_minus_local.txt";
open( lm_txt, ">local_minus_remote.txt" ) or die "Cannot access local_minus_remote.txt";
open( file1, $local_log ) or die "Cannot open $local_log";

while (<file1>)
{
    @l_in = split(' ', $_);

    # Re-open and re-scan the whole remote log for every local line --
    # this is the slow part.
    open( file2, $remote_log ) or die "Cannot open $remote_log";
    while (<file2>)
    {
        @r_in = split(' ', $_);

        # Field 7 is the packet ID; a match means the same packet.
        if ( $l_in[7] == $r_in[7] )
        {
            $temp = $r_in[1] - $l_in[1];
            print rm_txt "$l_in[1] $l_in[2] $l_in[3] $l_in[5] $l_in[6] $l_in[7] $l_in[10] $temp\n";
            $temp2 = $l_in[1] - $r_in[1];
            print lm_txt "$l_in[1] $l_in[2] $l_in[3] $l_in[5] $l_in[6] $l_in[7] $l_in[10] $temp2\n";

            last;
        }
    }
}

close (file1);
close (file2);
close (rm_txt);
close (lm_txt);
##########################################################
 

oog

Golden Member
Feb 14, 2002
in your current version, you search through file2 once for every record in file1. that's a lot of passes through file2. i haven't looked that closely at what you're doing, but if you read the records from each file into some kind of sorted list, you could compare them in a single pass.

for instance, if one file contained A, D, E, G and the other contained B, C, D, F and you knew that both lists were sorted, then you could compare elements from each list one at a time and move to the next element when you don't have a match.

in the example, you would compare A to B. since they don't match and A < B, you move forward through list1 and compare D to B. since those don't match and D > B, you move forward through list2 and compare D to C. continuing this way, you'll come to the point where D is compared with D and you have a match.

it's just a matter of getting the records in each file in a known sort order. creating the sorted list should take no more than n * log(n) time, and comparing the two lists should take only n time. this gives you a net result of n * log(n) time. in your current algorithm you take n^2 time.
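in perl, the comparison pass might look something like this (just a rough sketch; @list1 and @list2 are placeholder arrays assumed to already be sorted numerically on the key):

my ($i, $j) = (0, 0);
while ($i < @list1 && $j < @list2) {
    if ($list1[$i] == $list2[$j]) {
        print "match: $list1[$i]\n";   # found the same key in both lists
        $i++; $j++;                    # advance both lists past the match
    } elsif ($list1[$i] < $list2[$j]) {
        $i++;                          # list1 is behind, move it forward
    } else {
        $j++;                          # list2 is behind, move it forward
    }
}

since every comparison advances at least one index, the loop runs at most len(list1) + len(list2) times.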

<edit> typo </edit>
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
Sounds simple enough for grep/sed/awk. Can you paste an example of the contents of the files, and what you want to get out of them? (I didn't totally understand your explanation.)
 

oog

Golden Member
Feb 14, 2002
from your post, it looks like you should be sorting on the packet id number.
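perl's sort is built in and takes a custom comparison block, so assuming the packet id is field 7 when a line is split on whitespace, something like this should get one log's lines into packet-id order (rough sketch; @lines would hold the raw lines of that log):

my @sorted = sort { (split ' ', $a)[7] <=> (split ' ', $b)[7] } @lines;

(re-splitting inside the comparison block is wasteful, but it's the simplest way to write it.)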
 

uCsDNerd

Senior member
Mar 3, 2001
Hi,
thanks for the quick responses. sorting on the packet id sounds good, but is there a quick (hopefully built-in) function for doing such a thing?

here's a sample of what the data file looks like (IP addresses changed, of course)

if  first seen         source address   destination address  pro  src port  dst port  ip id  IP len  TCP win  ack delay
--  -----------------  ---------------  -------------------  ---  --------  --------  -----  ------  -------  ---------
0   1056650175.133949  xxx.xxx.xxx.134  xxx.xxx.xxx.190      423  22        2513      47224  200     -        0.001857
0   1056650193.296072  xxx.xxx.xxx.40   xxx.xxx.xxx.134      323  3222      18661     132            -        0.000462


The remote log looks just like the local log except it has a different timestamp for when the packet was received/sent.

The output that I want is simply: ( local.log timestamp - remote.log timestamp ), for each matching packet.

I hope this makes sense. I'm not familiar with awk and grep's sorting capabilities; are they the solution to my problem?

Thanks again for your help!
 

notfred

Lifer
Feb 12, 2001
let me look at this... BTW, comments are your friend. I'm formatting it correctly now; give me a few minutes.

Also, links to sample files would be nice.
 

notfred

Lifer
Feb 12, 2001
Try this:

#####################################################

# Get names of local and remote log files from the command line.
$local_log  = $ARGV[0];
$remote_log = $ARGV[1];

# Open output files for recording the differences.
open( rm_txt, ">remote_minus_local.txt" ) or die "Cannot access remote_minus_local.txt";
open( lm_txt, ">local_minus_remote.txt" ) or die "Cannot access local_minus_remote.txt";

# Open the local log file for reading.
open( file1, $local_log ) or die "Cannot open $local_log";

# Read the remote file into RAM once, so we don't keep re-reading it
# off the hard drive, which is slow.
open( file2, $remote_log ) or die "Cannot open $remote_log";
@remote_file = <file2>;
close (file2);

# For each line in the local log:
while (<file1>) {

    # Split the current local line into fields; scope the list to this iteration.
    my @l_in = split(' ', $_);

    # For each line in the remote log, do our comparisons.
    foreach my $line (@remote_file) {
        my @r_in = split(' ', $line);

        # Some note on what the hell array[7] is might be nice....
        if ( $l_in[7] == $r_in[7] ) {
            print rm_txt "$l_in[1] $l_in[2] $l_in[3] $l_in[5] $l_in[6] $l_in[7] $l_in[10] ", $r_in[1] - $l_in[1], "\n";
            print lm_txt "$l_in[1] $l_in[2] $l_in[3] $l_in[5] $l_in[6] $l_in[7] $l_in[10] ", $l_in[1] - $r_in[1], "\n";

            # Skip the rest of the remote lines for this local line.
            last;
        }
    }
}

# Close all the files that are still open.
close (file1);
close (rm_txt);
close (lm_txt);
##########################################################
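
if the ip id in field 7 is unique within each log, you can go one step further and drop the inner loop completely by indexing the remote log with a hash. rough, untested sketch (it reuses the rm_txt/lm_txt output handles from the script above, and if an id shows up more than once in the remote log, the last occurrence wins):

#####################################################
# Index the remote log by field 7 (the ip id): one hash lookup
# per local line instead of a scan of the whole remote log.
my %remote_time;
open( file2, $remote_log ) or die "Cannot open $remote_log";
while (<file2>) {
    my @r_in = split(' ', $_);
    $remote_time{ $r_in[7] } = $r_in[1];   # remember the remote timestamp
}
close (file2);

open( file1, $local_log ) or die "Cannot open $local_log";
while (<file1>) {
    my @l_in = split(' ', $_);
    next unless exists $remote_time{ $l_in[7] };   # no matching packet id

    my $r_time = $remote_time{ $l_in[7] };
    print rm_txt "$l_in[1] $l_in[2] $l_in[3] $l_in[5] $l_in[6] $l_in[7] $l_in[10] ", $r_time - $l_in[1], "\n";
    print lm_txt "$l_in[1] $l_in[2] $l_in[3] $l_in[5] $l_in[6] $l_in[7] $l_in[10] ", $l_in[1] - $r_time, "\n";
}
close (file1);
#####################################################

that makes the whole thing one pass over each file, so it's linear instead of n^2.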