Perl: Help debug my script!

n0cmonkey · Oct 15, 2005

This script should be simple, but it's not working for me. It takes each line of file1 and compares it to each line of file2. When they match, it should increment a count. It's working fine with the first line of file1, but not with the rest.

File 1 is just 0-9, one number per line. File 2 is a bunch of 0-9's, in there kind of randomly, one number per line.

n0cmonkey · Oct 15, 2005

Ok, this seems to work. Any other suggestions, hints, etc.?

Red and black · Oct 16, 2005

I can't tell exactly what problem you're trying to solve (the above code looks rather strange), but it sounds like grep and/or sort would do it.

BTW: when using variables in Perl code you don't have to put quotes around them, unless you're trying to build a bigger string by interpolating. Also, "use strict" is your friend.

n0cmonkey · Oct 17, 2005

grep and sort won't work for this. There'd be too many false positives. I've used it for similar things and it takes FOREVER to get through this much data.

I know about the quotes, sh habits.

I don't know what use strict does, I'm just starting to learn perl.

Bulldog13 · Oct 17, 2005

I don't think I have actually ever seen you ask a question. Ever. What are you using to learn Perl? A book or online tutorials?

n0cmonkey · Oct 17, 2005

"Perl for Dummies."

I've asked a few, most things are googleable though.

oog · Oct 17, 2005

based on my reading of the first set of code that you wrote, you open both files, you iterate through the first and then see if that line exists in the second. however, the problem is that you don't reset your inner loop. your inner loop (while(<DSTFILE> ) ...) can only go through the file once because you do nothing to reset it to the beginning of the file. once DSTFILE is at the end, it's at the end and subsequent calls to try to start that loop exit immediately. that explains why only the first line is matching.
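A minimal sketch of the point above, assuming the original script looked roughly like a nested pair of `while` loops (the original code isn't shown here, so the handle names and sample data are made up). It uses in-memory filehandles so it runs on its own, and rewinds the inner handle with `seek` instead of reopening the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-ins for file1 and file2 (hypothetical sample data).
my $src_data = "1\n2\n3\n";
my $dst_data = "2\n3\n3\n1\n3\n";

open( my $src, '<', \$src_data ) or die $!;
open( my $dst, '<', \$dst_data ) or die $!;

my %count;
while ( my $want = <$src> ) {
    # Rewind the inner handle. Without this, the inner loop only
    # produces lines for the first line of the outer file.
    seek( $dst, 0, 0 ) or die $!;
    $count{$want} = 0;
    while ( my $line = <$dst> ) {
        $count{$want}++ if $line eq $want;
    }
}

# Keys still carry their trailing newline, so none is added here.
print "$count{$_}\t$_" for sort keys %count;
```

Commenting out the `seek` line should reproduce the symptom from the first post: only the first key gets a non-zero count.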

the second code you wrote fixes this by reopening the file every time you iterate on the outer loop.

if the files are not large and you prefer to avoid the IO, you could always read the 2nd file into memory so that you don't have to read the file over and over. you may even be able to put it into a hash so that the search for a match is quicker.
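One way the in-memory idea could look, as a rough sketch only: tally the second file into a hash once, then look each line of the first file up in it. The sample data and variable names are invented for illustration, and it assumes the second file fits comfortably in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files.
my $keys_data = "1\n2\n4\n";
my $big_data  = "2\n3\n3\n1\n2\n";

# Pass 1: read the second file once and tally every line.
open( my $big, '<', \$big_data ) or die $!;
my %seen;
while ( my $line = <$big> ) {
    chomp $line;
    $seen{$line}++;
}
close $big;

# Pass 2: one hash lookup per line of the first file, no re-reading.
open( my $keys, '<', \$keys_data ) or die $!;
my %result;
while ( my $key = <$keys> ) {
    chomp $key;
    $result{$key} = $seen{$key} || 0;
}
close $keys;

print "$result{$_}\t$_\n" for sort keys %result;
```

Each file is read exactly once, so the work grows with the total number of lines rather than with the product of the two file sizes.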

Red and black · Oct 18, 2005

As for grep: The following works for me with NetBSD grep: `grep -Fx -f file1 file2 | sort | uniq -c`

BTW: the Perl Cookbook is great for learning by example.

"use strict" requires you to declare variables, rather than having undeclared variables default to global.
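A small illustration of what that buys you, as a sketch: the misspelled variable name is made up, and it is compiled through a string `eval` only so the script can print the error instead of refusing to start.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared with "my", so strict is happy.
my $count = 0;
$count++ for 1 .. 3;
print "count is $count\n";

# A typo ($cuont instead of $count). Without strict this would quietly
# create a new global; with strict it is a compile-time error. The
# string eval inherits "use strict" from the surrounding scope.
my $ok = eval q{ $cuont = 5; 1 };
print "strict caught: $@" unless $ok;
```

In a normal script the typo would simply stop the program from compiling, which is the point: the mistake shows up immediately rather than as a count that never changes.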

Also, as oog said, unless the first file is very large, doing the counting with a hash is probably the best way:

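#!/usr/pkg/bin/perl -w
use strict;

# Seed a zero count for every line of the first file.
open( KEYS, $ARGV[0] ) or die $!;

my %counts;

while ( <KEYS> ) { $counts{ $_ } = 0; }

close KEYS;

# Count the lines of the second file that match a seeded key.
open( DATA, $ARGV[1] ) or die $!;

while ( <DATA> ) { $counts{ $_ }++ if exists $counts{ $_ } }

# Keys keep their trailing newline, so no "\n" is needed when printing.
for my $key ( keys %counts )
{
    next unless $counts{ $key };
    print "$counts{$key}\t$key";
}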

oog · Oct 18, 2005

nice answer. it's been a while since i've written much perl. also, don't forget to close DATA.

Red and black · Oct 18, 2005

Originally posted by: oog
don't forget to close DATA.

Actually, there's no real need to close either file, as they'll get closed automatically at the end of the program.

Also, to the OP: congrats on starting in on learning Perl! It's super duper useful. I most often use it as a sort of ueber-sed. Check out what `perl -ple` and `perl -nle` do. Also, `perl -pi -e`.

Oh, for a good time replace the last loop with a one-liner:
print "$counts{$_}\t$_" for grep { $counts{$_} } keys %counts;

statik213 · Oct 18, 2005

How big are these files? How many files are you going to compare like this?
If you are going to be doing this on a huge number of small files, you might want to sort one of the files (the smaller one) and do a binary search on the sorted 'file' (store it as an array?) for each line of the unsorted file. You'll get n log(n) performance as opposed to n^2, so you should see a pretty significant boost in performance... but it's probably not worth your time if you don't have a HUGE number of files to process.
No idea about the Perl stuff though, got to learn it. Good luck.
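For what it's worth, the sort-plus-binary-search idea could be sketched in Perl roughly like this. The `in_sorted` helper and the sample lists are hypothetical, and the lines are compared as strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $target is in the string-sorted array ref, else 0.
sub in_sorted {
    my ( $sorted, $target ) = @_;
    my ( $lo, $hi ) = ( 0, $#{$sorted} );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my $cmp = $target cmp $sorted->[$mid];
        return 1 if $cmp == 0;
        if ( $cmp < 0 ) { $hi = $mid - 1 }
        else            { $lo = $mid + 1 }
    }
    return 0;
}

# Hypothetical contents: the smaller file, sorted once up front...
my @small = sort qw( 7 3 9 1 );
# ...and the larger, unsorted file.
my @big = qw( 3 3 8 1 9 3 );

my %count;
for my $line (@big) {
    $count{$line}++ if in_sorted( \@small, $line );
}

print "$count{$_}\t$_\n" for sort keys %count;
```

In Perl specifically, a hash lookup like the one posted earlier in the thread is usually simpler and takes roughly constant time per line, so this is mostly useful as a picture of the technique.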

n0cmonkey · Oct 18, 2005

This time the bigger file was about 6MB. Small for some of the files I've been messing with. The work is probably done by now (I haven't been in to work to check yet), so I hopefully won't ever have to do this again. Although, it is a nice little thing that gives me a goal for perl scripts. It's been making learning perl easier.

Thanks for the help, suggestions, and encouragement!
