perl regex

danzigrules

Golden Member
Apr 20, 2000
1,255
0
76
13 November 2008 14:52 124 165446.txt


/[\d{1,2}]\s(\w+)\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\d+) </)


Thanks in advance
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: danzigrules
13 November 2008 14:52 124 165446.txt


/[\d{1,2}]\s(\w+)\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\d+) </)


Thanks in advance

I'm going to go with 'no', it's not right. Smilicons are rarely part of a correct regex.

Also, you haven't told us what the regex is supposed to do.
 

danzigrules

while ($line = <FIL>)
{
    chomp $line; # chomp only strips the trailing newline; chop removes any last character
    # 1/14/04 12:32 AM 124 <A HREF="/data/123072.txt">123072.txt</A><br> Mehul's Format
    # Thursday, November 13, 2008 11:00 PM 496 165455.txt First OMAC news format
    # 13 November 2008 14:52 124 165446.txt OMAC's new Format
    if($line =~ /[\w ]+,\s(\w+)\s(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\w{1,2})\s+(\d+) </)
    {
        #print "MATCH: \n";

        my %fix_month = ('jan', '1', 'feb', '2', 'mar', '3', 'apr', '4',
                         'may', '5', 'jun', '6', 'jul', '7', 'aug', '8', 'sep', '9',
                         'oct', '10', 'nov', '11', 'dec', '12');

        if($6 eq 'PM' && $4 < 12) {
            $mil_hour = $4 + 12; # i.e. 12:55 PM = 12:55 while 11:55 PM = 23:55
        } elsif($6 eq 'AM' && $4 == 12) {
            $mil_hour = $4 - 12; # i.e. 12:55 AM = 00:55
        } else {
            $mil_hour = $4;
        }
        $date_stamp = sprintf("%02d%02d%02d%02d%02d",$3,$fix_month{lc(substr($1,0,3))},$2,$mil_hour,$5); # YYYYMMDDhhmm
        $day_stamp = sprintf("%02d%02d%02d", $3,$fix_month{lc(substr($1,0,3))},$2); # YYYYMMDD
        $filesize = $7;
    }
}


Yes I know I will have to fix variables, but that I can handle :)
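
For reference, the AM/PM branch in the loop above can be pulled out into a small function. This is only a sketch: the name to_24h is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): convert a 12-hour
# clock hour plus an AM/PM marker to a 24-hour clock hour.
sub to_24h {
    my ($hour, $ampm) = @_;
    $ampm = uc $ampm;
    return $hour + 12 if $ampm eq 'PM' && $hour < 12;   # 11 PM -> 23
    return 0          if $ampm eq 'AM' && $hour == 12;  # 12 AM -> 0
    return $hour;                                       # 12 PM and 1-11 AM unchanged
}
```

The new listing format already uses a 24-hour clock (14:52), so this step should only be needed for the two older formats.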
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,815
75
I see three very different patterns there, demonstrated in the comments, that you might be trying to match. The expression you have looks like a mix of two or more of them that won't work at all. It's possible to match them all with one expression, but practically impossible to get match variables out at the same time.

I would use a coding pattern like:

my $foundAMatch = 0;
if(/pattern1/) {
    # Set key variables from matched pattern
    $foundAMatch = 1;
}
elsif(/pattern2/) {
    # Set key variables from different positions in the matched pattern
    $foundAMatch = 1;
}
elsif(/pattern3/) {
    # Set key variables from different positions in the matched pattern
    $foundAMatch = 1;
}

if($foundAMatch) {
    # Do the other calculations and prints you were doing.
}
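
A possible concrete version of that pattern, as a rough sketch only. The three regexes are guesses built from the sample lines in the code comments, the function name parse_listing_line and the hash keys are made up for illustration, and treating a two-digit year as 20xx is an assumption. Instead of a flag, the function returns nothing when no format matches.

```perl
use strict;
use warnings;

# Hypothetical helper: try each known listing format in turn and return
# a hash ref of named fields, or nothing if no format matches.
sub parse_listing_line {
    my ($line) = @_;
    my %f;
    if ($line =~ m!^(\d{1,2})/(\d{1,2})/(\d{2})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+(\d+)\s!) {
        # 1/14/04 12:32 AM 124 ...   (ASSUMPTION: two-digit year means 20xx)
        @f{qw(month day year hour min ampm size)} = ($1, $2, 2000 + $3, $4, $5, $6, $7);
    }
    elsif ($line =~ /^\w+,\s+(\w+)\s+(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+(\d+)\s/) {
        # Thursday, November 13, 2008 11:00 PM 496 ...
        @f{qw(month day year hour min ampm size)} = ($1, $2, $3, $4, $5, $6, $7);
    }
    elsif ($line =~ /^(\d{1,2})\s+(\w+)\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+(\d+)\s/) {
        # 13 November 2008 14:52 124 ...   (24-hour clock, no AM/PM field)
        @f{qw(day month year hour min size)} = ($1, $2, $3, $4, $5, $6);
    }
    else {
        return;   # no known format matched
    }
    return \%f;
}
```

Because every branch stores its captures under the same names, the code after the match does not need to know which capture group number held the year or the size.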
 

danzigrules

# Thursday, November 13, 2008 11:00 PM 496 165455.txt First OMAC news format

if($line =~ /[\w ]+,\s(\w+)\s(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\w{1,2})\s+(\d+) </)

The code worked fine when the site at http://a.swirve.com/data/ listed the files in the above format. As you can see, they changed the format to: 13 November 2008 14:52 124 165446.txt

If I can get the regex right, and I change the variables, I assume it will work fine again.

Thanks
 

Ken g6

Remember, [] matches a single character: any one of the characters listed inside.
() is for grouping.
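
A quick way to see the difference, using the [\d{1,2}] from the first post. This is just an illustrative sketch and the variable names are made up:

```perl
use strict;
use warnings;

my $line = '13 November 2008 14:52 124 165446.txt';

# [\d{1,2}] is a character class: it matches exactly ONE character that
# is a digit or one of the literal characters { 1 , 2 }.
my $class_matches_brace  = ('{'  =~ /^[\d{1,2}]$/) ? 1 : 0;  # a lone brace matches
my $class_matches_two    = ('13' =~ /^[\d{1,2}]$/) ? 1 : 0;  # two characters do not

# (\d{1,2}) is a group: one or two digits, captured for later use.
my ($day) = $line =~ /^(\d{1,2})\s/;
```

So inside the brackets the {1,2} is not a repeat count at all, and nothing is captured.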

Let me try my hand at a regular expression for this string:
/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+(\d+)\.txt/

That seems to work.
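
As a sketch of how the captures from that expression could feed the stamps from the earlier script: the function name stamps_from_line and the returned hash keys are made up for illustration, and the month table is simply rebuilt here. The new format is already on a 24-hour clock, so no AM/PM conversion is applied.

```perl
use strict;
use warnings;

my %month_num = (jan => 1, feb => 2, mar => 3, apr => 4,  may => 5,  jun => 6,
                 jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

# Hypothetical helper: build the stamps for one line in the new format,
# or return nothing if the line does not match.
sub stamps_from_line {
    my ($line) = @_;
    return unless $line =~
        /([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+(\d+)\.txt/;
    # New capture order: day, month name, year, hour, minute, size
    my ($day, $mon_name, $year, $hour, $min, $size) = ($1, $2, $3, $4, $5, $6);
    my $mon = $month_num{ lc substr($mon_name, 0, 3) } or return;  # unknown month name
    return {
        date_stamp => sprintf('%04d%02d%02d%02d%02d', $year, $mon, $day, $hour, $min), # YYYYMMDDhhmm
        day_stamp  => sprintf('%04d%02d%02d', $year, $mon, $day),                      # YYYYMMDD
        filesize   => $size,
    };
}
```

For the sample line 13 November 2008 14:52 124 165446.txt this gives a date_stamp of 200811131452, a day_stamp of 20081113 and a filesize of 124.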

Edit: That was sad. :( And I rewrote it.

If you want to read from the page source, I think:

/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+<[Aa]\s/

does it.
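
A small check of which expression fits which input. The plain-text line is the sample from this thread. The page-source line is an assumption, modelled on the anchor markup shown in the comment for the oldest format, so the real page source may differ. The helper name size_of is made up.

```perl
use strict;
use warnings;

# The two expressions from the post above: one for the rendered text,
# one for the HTML page source.
my $plain_re  = qr/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+(\d+)\.txt/;
my $source_re = qr/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+<[Aa]\s/;

my $plain_line  = '13 November 2008 14:52 124 165446.txt';
# ASSUMPTION: the new page source keeps the old <A HREF=...> markup.
my $source_line = '13 November 2008 14:52 124 <A HREF="/data/165446.txt">165446.txt</A><br>';

# Hypothetical helper: return the file size (capture 6) or undef on no match.
sub size_of {
    my ($re, $line) = @_;
    return $line =~ $re ? $6 : undef;
}
```

Under that assumption each expression matches only its own kind of line, so it matters whether the script reads the rendered text or the raw HTML.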