Perl and WWW Mechanize

CrazyLazy

Platinum Member
Jun 21, 2008
Hey, AT Programming, I'm looking for your wonderful assistance again. I am trying to write a Perl script that will go to a given web page, search for a link with a given anchor text, and then return the URL of that link. Here is what I have so far:

#!/usr/bin/perl -w

use WWW::Mechanize;

my $agent = WWW::Mechanize->new();
$agent->get("http://www.google.com");

my @links = $mech->find_all_links(
tag => "a", text_regex => qr/\bStuff\b/i );

Now, this doesn't even compile, and I really don't think it's even remotely close to being correct. I have no experience with Perl, and the tutorials and resources I found online were less than helpful. Any links or help you could provide would be great.
 

esun

Platinum Member
Nov 12, 2001
First, make sure you actually have WWW::Mechanize installed (only a subset of packages on CPAN are installed by default, and I don't believe WWW::Mechanize is one of them).

Assuming you are on a UNIX box and have admin privileges, type at a shell:

perl -MCPAN -e shell

That will start another shell. Into that shell, type 'install WWW::Mechanize' and answer "yes" at any prompts that come up (if this is your first time running the CPAN shell, you'll first have to do some initial setup by following the instructions it prints).
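If you'd rather confirm from Perl itself that the install worked, here is a minimal sketch. The helper name module_available() is made up for illustration; it simply tries to load a module at run time and reports whether that succeeded, so a missing module gives a message instead of a compile error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 if not.
sub module_available {
    my ($module) = @_;
    # require with a string expects a file name, so WWW::Mechanize -> WWW/Mechanize.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

if (module_available('WWW::Mechanize')) {
    print "WWW::Mechanize ", WWW::Mechanize->VERSION, " is installed\n";
}
else {
    print "WWW::Mechanize is missing: run 'install WWW::Mechanize' in the CPAN shell\n";
}
```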

As for your code, I haven't used WWW::Mechanize myself, so I'm not 100% sure of its usage. However, you definitely need to replace $mech with $agent on line 5 (excluding blank lines).
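Putting that fix together with the original goal, a corrected script might look like the sketch below. It assumes WWW::Mechanize is installed. So that it runs offline, it writes a small made-up HTML page to a temp file and loads it through a file:// URL; for real use, pass the live address (e.g. http://www.google.com) to get() instead. The \bStuff\b regex is the one from the first post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in page so the sketch runs without network access.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} <<'HTML';
<html><body>
  <a href="/stuff">Stuff</a>
  <a href="http://example.com/more">More Stuff here</a>
  <a href="/other">Stuffing</a>
</body></html>
HTML
close $fh;

# Use one variable name throughout: the original created $agent
# but then called find_all_links on the undeclared $mech.
my $agent = WWW::Mechanize->new( autocheck => 1 );
$agent->get( URI::file->new_abs($path)->as_string );

# find_all_links returns a list of WWW::Mechanize::Link objects whose
# anchor text matches the regex (\b keeps "Stuffing" from matching).
my @links = $agent->find_all_links(
    tag        => 'a',
    text_regex => qr/\bStuff\b/i,
);

# Each element is an object, so ask it for its absolute URL.
my @urls = map { $_->url_abs->as_string } @links;
print "$_\n" for @urls;
```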

 

CrazyLazy

Platinum Member
Jun 21, 2008
Originally posted by: esun
First, make sure you actually have WWW::Mechanize installed (only a subset of packages on CPAN are installed by default, and I don't believe WWW::Mechanize is one of them).

Assuming you are on a UNIX box and have admin privileges, type at a shell:

perl -MCPAN -e shell

That will cause a another shell to come up. Into that shell, type 'install WWW::Mechanize' and type "yes" at any prompts that come up (if this is your first time running the CPAN shell, you'll have to do some initial setup by following some instructions).

As for your code, I haven't used WWW::Mechanize myself, so I'm not 100% sure of its usage. However, you definitely need to replace $mech with $agent on line 5 (excluding blank lines).

Yes, I have installed WWW::Mechanize and confirmed that it works. I just don't know how to do anything with it.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
Definitely look at the cookbook (WWW::Mechanize::Cookbook on CPAN) if you haven't already, although your code already looks similar to some of its examples.

Also, when writing Perl, always "use strict;". With it, perl would have refused to run your script and reported "Global symbol "$mech" requires explicit package name...", i.e. $mech was used without being declared.
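To see that message for yourself, this small sketch compiles a made-up snippet that mirrors the first script's bug under strict, using a string eval so the error is captured in $@ instead of stopping the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up snippet mirroring the first script's bug: $agent is declared,
# but $mech is used without ever being declared.
my $buggy = <<'CODE';
use strict;
my $agent = 'declared';
my @links = ($mech);
CODE

# A string eval compiles the snippet at run time, so the strict error
# lands in $@ instead of aborting this demo.
eval $buggy;
my $error = $@;

print "use strict caught it:\n$error" if $error;
```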
 

CrazyLazy

Platinum Member
Jun 21, 2008
I took a look at the documentation and cookbook, but frankly I'm still a little lost. Here is the new code:

#!/usr/bin/perl -w

use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

$mech->get("http://google.com");
print $mech->find_link( text => 'Business Solutions');


In theory this should print the link titled "Business Solutions" on Google's homepage, but it doesn't print anything, although it does seem to compile. I'll take a look at the documentation again and see if I can figure it out.
 

IronWing

No Lifer
Jul 20, 2001
Originally posted by: CrazyLazy
$mech->get("http://google.com");

Try this in place of the line above:

$mech->get('http://google.com');

I changed the double quotes to single quotes so Perl won't try to interpolate the string. (Strictly speaking, this URL contains no $ or @, so the double-quoted version already produces the same string; single quotes are just a good habit for literals.)
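For reference, here is a quick demonstration of what interpolation actually changes; the $host variable is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $host = 'google.com';

# Double quotes interpolate variables such as $host...
my $interpolated = "http://$host";
# ...while single quotes keep every character literally.
my $literal = 'http://$host';

# With no $ or @ in the string, both quoting styles give the same value.
my $same = "http://google.com" eq 'http://google.com';

print "$interpolated\n$literal\n", ($same ? "identical\n" : "different\n");
```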
 

esun

Platinum Member
Nov 12, 2001
Originally posted by: CrazyLazy
$mech->get("http://google.com");
print $mech->find_link( text => 'Business Solutions');

This code works fine aside from what you're printing.

find_link() returns a WWW::Mechanize::Link object (which it says pretty explicitly in the CPAN documentation). A WWW::Mechanize::Link object has the following methods:

url()
text()
name()
tag()
base()
attrs()
URI()
url_abs()

So if you wanted it to print the absolute URL, you'd do:

print $mech->find_link(text => "Business Solutions")->url_abs();
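One caveat: find_link() returns undef when no link matches, so chaining ->url_abs() directly dies with "Can't call method "url_abs" on an undefined value" if the text isn't on the page. A slightly more defensive sketch, assuming WWW::Mechanize is installed, uses a made-up local page in place of google.com and a hypothetical helper describe_link() that checks the result first and shows url() next to url_abs():

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical local page standing in for google.com so this runs offline.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} qq{<html><body><a href="/services/">Business Solutions</a></body></html>\n};
close $fh;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( URI::file->new_abs($path)->as_string );

# find_link returns a WWW::Mechanize::Link object, or undef when nothing
# matches, so check before calling methods on it.
sub describe_link {
    my ($mech, $text) = @_;
    my $link = $mech->find_link( text => $text );
    return "no link with text '$text'" unless $link;
    # url() is the href as written; url_abs() resolves it against the page's base.
    return sprintf "%s -> %s (absolute: %s)", $link->text, $link->url, $link->url_abs;
}

print describe_link($mech, 'Business Solutions'), "\n";
print describe_link($mech, 'Nonexistent'), "\n";
```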
 

CrazyLazy

Platinum Member
Jun 21, 2008
Originally posted by: esun
So if you wanted it to print the absolute URL, you'd do:

print $mech->find_link(text => "Business Solutions")->url_abs();

Thank you, that worked and explained it perfectly.