Perl: Help debug my script!

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
This script should be simple, but it's not working for me. It takes each line of file1 and compares it to each line of file2. When they match, it should increment a count. It's working fine with the first line of file1, but not with the rest.

File 1 is just 0-9, one number per line. File 2 is a bunch of 0-9's, in there kind of randomly, one number per line.

 

Red and black

Member
Apr 14, 2005
152
0
0
I can't tell exactly what problem you're trying to solve (the above code looks rather strange), but it sounds like grep and/or sort would do it.

BTW: when using variables in Perl code you don't have to put quotes around them, unless you're trying to build a bigger string by interpolating. Also, "use strict" is your friend.
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
Originally posted by: Red and black
I can't tell exactly what problem you're trying to solve (the above code looks rather strange), but it sounds like grep and/or sort would do it.

BTW: when using variables in Perl code you don't have to put quotes around them, unless you're trying to build a bigger string by interpolating. Also, "use strict" is your friend.

grep and sort won't work for this. There'd be too many false positives. I've used it for similar things and it takes FOREVER to get through this much data.

I know about the quotes, sh habits. :p

I don't know what use strict does, I'm just starting to learn perl.
 

Bulldog13

Golden Member
Jul 18, 2002
1,655
1
81
I don't think I have actually ever seen you ask a question. Ever. What are you using to learn perl ? book or online tutorials ?
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
Originally posted by: Bulldog13
I don't think I have actually ever seen you ask a question. Ever. What are you using to learn perl ? book or online tutorials ?

"Perl for Dummies." :p

I've asked a few, most things are googleable though.
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
based on my reading of the first set of code that you wrote, you open both files, you iterate through the first and then see if that line exists in the second. however, the problem is that you don't reset your inner loop. your inner loop (while(<DSTFILE> ) ...) can only go through the file once because you do nothing to reset it to the beginning of the file. once DSTFILE is at the end, it's at the end and subsequent calls to try to start that loop exit immediately. that explains why only the first line is matching.

the second code you wrote fixes this by reopening the file every time you iterate on the outer loop.
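
A minimal sketch of the other way to reset the inner loop, rewinding the handle with seek instead of reopening the file. The original script isn't shown in this thread, so the variable names and the in-memory sample data below are assumptions made only so the sketch runs on its own:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for file1 and file2, held in memory so this runs as-is.
my $src_text = "1\n2\n3\n";
my $dst_text = "2\n1\n2\n3\n2\n";

open( my $src, '<', \$src_text ) or die $!;
open( my $dst, '<', \$dst_text ) or die $!;

my %count;
while ( my $want = <$src> ) {
    chomp $want;
    $count{$want} = 0;

    # Rewind the inner handle. Without this, the inner loop only does
    # any work for the first line of the outer file.
    seek( $dst, 0, 0 ) or die $!;

    while ( my $line = <$dst> ) {
        chomp $line;
        $count{$want}++ if $line eq $want;
    }
}

print "$_: $count{$_}\n" for sort keys %count;
```

With real files you'd open the two paths instead of the scalar references; the seek call stays the same.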

if the files are not large and you prefer to avoid the IO, you could always read the 2nd file into memory so that you don't have to read the file over and over. you may even be able to put it into a hash so that the search for a match is quicker.
 

Red and black

Member
Apr 14, 2005
152
0
0
As for grep: The following works for me with NetBSD grep: `grep -Fx -f file1 file2 | sort | uniq -c`

BTW: the Perl Cookbook is great for learning by example.

"use strict" requires you to declare variables, rather than having undeclared variables default to globals.
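
A small sketch of what that buys you, assuming a typo'd variable name (both names here are made up for illustration). The snippet is compiled at run time with a string eval only so the failure can be caught and printed; in a normal script the same error would stop the program at compile time:

```perl
use strict;
use warnings;

# Compile a snippet at run time so the strict failure can be caught and shown.
my $ok = eval q{
    use strict;
    my $count = 0;
    $cuont++;    # typo: $cuont was never declared
    1;
};
my $error = $@;

print "strict rejected the typo: $error" unless $ok;
```

Without "use strict", $cuont would quietly become a new global and $count would stay at 0.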

Also, as oog said, unless the first file is very large, doing the counting with a hash is probably the best way:

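#!/usr/pkg/bin/perl -w
use strict;

# First pass: register every line of the key file with a count of zero.
# Three-argument open keeps odd characters in a filename from being read as a mode.
open( KEYS, '<', $ARGV[0] ) or die $!;

my %counts;

while ( <KEYS> ) { $counts{ $_ } = 0; }

close KEYS;

# Second pass: count only the lines that are already known keys.
# Lines keep their trailing newline in both files, so they compare as-is.
open( DATA, '<', $ARGV[1] ) or die $!;

while ( <DATA> ) { $counts{ $_ }++ if exists $counts{ $_ } }

# Report every key that was seen at least once; the key still ends in a newline.
for my $key ( keys %counts )
{
    next unless $counts{ $key };
    print "$counts{$key}\t$key";
}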
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
nice answer. it's been a while since i've written much perl. also, don't forget to close DATA.
 

Red and black

Member
Apr 14, 2005
152
0
0
Originally posted by: oog
don't forget to close DATA.

Actually, there's no real need to close either file, as they'll get closed automatically at the end of the program.

Also, to the OP: congrats on starting in on learning Perl! It's super duper useful. I most often use it as a sort of ueber-sed. Check out what `perl -ple` and `perl -nle` do. Also, `perl -pi -e`.

Oh, for a good time replace the last loop with a one-liner:
print "$counts{$_}\t$_" for grep { $counts{$_} } keys %counts;
 

statik213

Golden Member
Oct 31, 2004
1,654
0
0
how big are these files? how many files are you going to compare like this?
If you are going to be doing this on a huge number of small files, you might want to sort one of the files (the smaller one) and do a binary search on the sorted 'file' (store it as an array?), for each line of the unsorted file. u'll get n log(n) performance as opposed to n^2, you should see a pretty significant boost in performance.... but probably not worth ur time if you don't have a HUGE number of files to process....
no idea abt the perl stuff though, got to learn it. g'luck.
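
A rough sketch of the sort-then-binary-search idea in Perl, in case it helps. The helper name in_sorted and the sample lists are made up for illustration; real input would come from the two files:

```perl
use strict;
use warnings;

# Return true if $target is in the array referenced by $sorted,
# which must already be sorted in string order.
sub in_sorted {
    my ( $sorted, $target ) = @_;
    my ( $lo, $hi ) = ( 0, $#{$sorted} );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my $cmp = $target cmp $sorted->[$mid];
        return 1 if $cmp == 0;
        if   ( $cmp < 0 ) { $hi = $mid - 1 }
        else              { $lo = $mid + 1 }
    }
    return 0;
}

my @keys = sort qw(7 3 9 1);    # hypothetical contents of the smaller file
my @data = qw(3 3 8 1 9 3);     # hypothetical contents of the larger file

# One binary search per line of the larger file.
my %count;
for my $line (@data) {
    $count{$line}++ if in_sorted( \@keys, $line );
}

print "$count{$_}\t$_\n" for sort keys %count;
```

For exact-match counting like this, a hash lookup is usually simpler in Perl; the sorted array mostly pays off when memory is tight or you also need the keys in order.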
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
This time the bigger file was about 6MB. Small for some of the files I've been messing with. The work is probably done by now (I haven't been in to work to check yet), so I hopefully won't ever have to do this again. Although, it is a nice little thing that gives me a goal for perl scripts. It's been making learning perl easier.

Thanks for the help, suggestions, and encouragement! :)