perl regex fix needed

danzigrules

Golden Member
Apr 20, 2000
1,255
0
76
I have no programming experience at all, so any help would be appreciated.

I am not sure how to post code on FuseTalk, so I shall post it here, be bashed for the formatting, and take suggestions after the fact.

while ($line = <FIL>)
{
    chop $line;
    # 1/14/04 12:32 AM 124 <A HREF="/data/123072.txt">123072.txt</A><br>
    #1/15/04 1:59 AM 6572 <A HREF="/data/123097.txt">123097.txt</A><br>
    if($line =~ /(\d{1,2})\/(\d{1,2})\/(\d{2})\s+(\d{1,2}):(\d{1,2})\s(\w{2})\s+(\d+) </)
    {
        #print "MATCH: \n";

        if($6 eq 'PM' && $4 < 12) {
            $mil_hour = $4 + 12; # i.e. 12:55 PM = 12:55 while 11:55 PM = 23:55
        } elsif($6 eq 'AM' && $4 == 12) {
            $mil_hour = $4 - 12; # i.e. 12:55 AM = 00:55
        } else {
            $mil_hour = $4;
        }
        $date_stamp = sprintf("20%02d%02d%02d%02d%02d", $3,$1,$2,$mil_hour,$5); # YYYYMMDDhhmm
        $day_stamp = sprintf("20%02d%02d%02d", $3,$1,$2); # YYYYMMDD
        $filesize = $7;
    }

    if($DEBUG_MODE > 1)
    {
        printf "D: $date_stamp,$day_stamp,$filesize\n"; # First 2 lines is just html and won't match regexp above
    }

    # CHECK if this feed is old, or valid for this reset (it should never be == as news don't really start till day 2/3)
    if($day_stamp >= $reset_start)
    {

        if($line =~ /(\d{6}\.txt)/)
        {
            #print "Bingo!\n";
            $feed = $1;
I am sure it has to do with the regex at the top, but changing that will also affect the variables (I think that is what they are called), the $1, $2, $3. The site that the script gets its info from has changed the way it stores the date. The lines now look like this:

Thursday, November 13, 2008 11:00 PM 496 <A HREF="/data/165455.txt">165455.txt</A><br>

Thanks for any help

danzigrules

 

ScottMac

Moderator, Networking, Elite member
Mar 19, 2001
5,471
2
0
What's the input look like?

What's it doing wrong now and

What does "right" look like?
 

danzigrules

Originally posted by: ScottMac
What's the input look like?

<pre><A HREF="/">[To Parent Directory]</A><br>
<br>
Thursday, November 13, 2008 2:52 PM 124 <A HREF="/data/165446.txt">165446.txt</A><br>
Thursday, November 13, 2008 3:56 PM 248 <A HREF="/data/165447.txt">165447.txt</A><br>

Originally posted by: ScottMac
What's it doing wrong now and

DBI::db=HASH(0x897780)->disconnect invalidates 1 active statement handle (either destroy statement handles or call finish on them before disconnecting) at ./e3_data2.pl line 263.

Originally posted by: ScottMac
What does "right" look like?

The same as the first part. It used to be this way:

1/14/04 12:32 AM 124 <A HREF="/data/123072.txt">123072.txt</A><br>
1/15/04 1:59 AM 6572 <A HREF="/data/123097.txt">123097.txt</A><br>
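
A minimal sketch of why the script stopped working, assuming the pattern from the first post is unchanged: it still matches the old numeric date lines but finds nothing in the new ones, so $date_stamp, $day_stamp and $filesize are never set. The variable names $old_pattern, $old_line and $new_line are made up for this illustration.

```perl
use strict;
use warnings;

# Pattern copied from the script in the first post (old numeric date format).
my $old_pattern = qr/(\d{1,2})\/(\d{1,2})\/(\d{2})\s+(\d{1,2}):(\d{1,2})\s(\w{2})\s+(\d+) </;

# One sample line in each format, taken from this thread.
my $old_line = '1/14/04 12:32 AM 124 <A HREF="/data/123072.txt">123072.txt</A><br>';
my $new_line = 'Thursday, November 13, 2008 2:52 PM 124 <A HREF="/data/165446.txt">165446.txt</A><br>';

for my $line ($old_line, $new_line) {
    my $result = ($line =~ $old_pattern) ? 'matches' : 'does not match';
    print "$result: $line\n";
}
```

The first line is reported as a match and the second is not, because the new format has no digits/digits/digits date for the pattern to find.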

 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,815
75
Hey, Dan,

I see the problem. You used to search for a date with a number, and now it starts with letters.

From:
/(\d{1,2})\/(\d{1,2})\/(\d{2})...
try:
/,\s+([A-Z][a-z]{2})[a-z]*\s+(\d{1,2}),\s*\d{2}(\d{2})...

With everything after that (in this line) the same.

See, that's a comma, some spaces, a capital letter, two lower-case letters (which are captured), some more lower-case letters (which are ignored), space(s), a day, a comma, some spaces, the first two numbers in the year (which are ignored) and the last two (which aren't).

Now, your next problem is that the month (formerly $1) is now three letters instead of one or two digits. You need a hash like:
%monthNum=('Jan'=>1, 'Feb'=>2, [etc.]);
Finally, change all instances of $1 to $monthNum{$1} and you should be all set.
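
Putting the two pieces together, here is a rough sketch of how the new date part of the pattern, the unchanged time and size part, and the month hash might combine on a single line. The helper name parse_listing_line is hypothetical and not part of the original script; the AM/PM rule is the one from the first post.

```perl
use strict;
use warnings;

# Month abbreviation => month number, as suggested above.
my %monthNum = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# parse_listing_line is a hypothetical helper name used only for this sketch.
# Returns (date_stamp, day_stamp, filesize), or nothing if the line does not match.
sub parse_listing_line {
    my ($line) = @_;
    return unless $line =~ /,\s+([A-Z][a-z]{2})[a-z]*\s+(\d{1,2}),\s*\d{2}(\d{2})\s+(\d{1,2}):(\d{1,2})\s(\w{2})\s+(\d+) </;
    my ($mon, $day, $yy, $hour, $min, $ampm, $size) = ($1, $2, $3, $4, $5, $6, $7);
    return unless exists $monthNum{$mon};

    # Same 12-hour to 24-hour rule as the original script.
    if    ($ampm eq 'PM' && $hour < 12)  { $hour += 12 }
    elsif ($ampm eq 'AM' && $hour == 12) { $hour = 0 }

    my $day_stamp  = sprintf("20%02d%02d%02d", $yy, $monthNum{$mon}, $day);   # YYYYMMDD
    my $date_stamp = sprintf("%s%02d%02d", $day_stamp, $hour, $min);          # YYYYMMDDhhmm
    return ($date_stamp, $day_stamp, $size);
}

my ($date_stamp, $day_stamp, $filesize) = parse_listing_line(
    'Thursday, November 13, 2008 11:00 PM 496 <A HREF="/data/165455.txt">165455.txt</A><br>'
);
print "$date_stamp,$day_stamp,$filesize\n";   # prints 200811132300,20081113,496
```

The capture groups keep their old numbers ($1 month, $2 day, $3 two-digit year, $4 to $7 as before), so the rest of the script only needs the month lookup.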
 

Ken g6

The hash is a constant, so it can go before the while (... line. Be sure to fill it out with the other months, too.
 

danzigrules

Yup, I did that, but I am getting new errors now:

danzigrules@x64:~/thalesbots$ ./e3_data2.pl
Operator or semicolon missing before %monthNum at ./e3_data2.pl line 179.
Ambiguous use of % resolved as operator % at ./e3_data2.pl line 179.
syntax error at ./e3_data2.pl line 179, near ")
%"
Execution of ./e3_data2.pl aborted due to compilation errors.



while ($line = <FIL>)
%monthNum=('Jan'=>1, 'Feb'=>2, 'Mar'=>3, 'Apr'=>4, 'May'=>5, 'Jun'=>6, 'Jul'=>7, 'Aug'=>8, 'Sep'=>9, 'Oct'=>10, 'Nov'=>11, 'Dec'=>12);
{


if($line =~ /,\s+([A-Z][a-z]{2})[a-z]*\s+(\d{1,2})\/(\d{1,2})\/(\d{2})\s+(\d{1,2}):(\d{1,2})\s(\w{2})\s+(\d+) </)
{


$date_stamp = sprintf("20%02d%02d%02d%02d%02d", $3,$monthNum{$1},$2,$mil_hour,$5); # YYYYMMDDhhmm
$day_stamp = sprintf("20%02d%02d%02d", $3,$monthNum{$1},$2); # YYYYMMDD
$filesize = $7;
 

ScottMac

while ($line = <FIL>) { <<<<<- I think you missed the opening brace for your while statement
%monthNum=('Jan'=>1, 'Feb'=>2, 'Mar'=>3,
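
For reference, a sketch of one layout that should compile: the hash is built once before the loop, and the opening brace follows the while condition directly. Two assumptions here: an in-memory filehandle ($fil) stands in for the script's real FIL handle, and the pattern replaces the old numeric date part with the new one rather than keeping both.

```perl
use strict;
use warnings;

# Build the lookup table once, before the loop.
my %monthNum = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Sample listing from this thread, read through an in-memory filehandle
# as a stand-in for the real FIL handle (assumption for this sketch).
my $listing = <<'END';
<pre><A HREF="/">[To Parent Directory]</A><br>
<br>
Thursday, November 13, 2008 2:52 PM 124 <A HREF="/data/165446.txt">165446.txt</A><br>
Thursday, November 13, 2008 3:56 PM 248 <A HREF="/data/165447.txt">165447.txt</A><br>
END
open(my $fil, '<', \$listing) or die "cannot open in-memory listing: $!";

my @day_stamps;
while (my $line = <$fil>) {    # nothing may sit between the condition and the brace
    chomp $line;
    if ($line =~ /,\s+([A-Z][a-z]{2})[a-z]*\s+(\d{1,2}),\s*\d{2}(\d{2})\s+(\d{1,2}):(\d{1,2})\s(\w{2})\s+(\d+) </) {
        push @day_stamps, sprintf("20%02d%02d%02d", $3, $monthNum{$1}, $2);   # YYYYMMDD
    }
}
close $fil;

print "$_\n" for @day_stamps;
```

The two HTML lines at the top of the listing do not match and are skipped; each of the two data lines yields the day stamp 20081113.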
 

danzigrules

Of course I didn't read the reply from Ken right and put the %monthNum line "after" the while, instead of BEFORE it like he said.

I am such a dummy.

Thanks Ken!

danzigrules