perl regex

danzigrules · Feb 16, 2009

13 November 2008 14:52 124 165446.txt

/[\d{1,2}]\s(\w+)\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\d+) </)

Thanks in advance

degibson · Feb 17, 2009

Originally posted by: danzigrules
13 November 2008 14:52 124 165446.txt

/[\d{1,2}]\s(\w+)\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\d+) </)

Thanks in advance

I'm going to go with 'no', it's not right. Smilicons are rarely part of a correct regex.

Also, you haven't told us what the regex is supposed to do.

danzigrules · Feb 17, 2009

while ($line = <FIL>)
{
chop $line;
# 1/14/04 12:32 AM 124 <A HREF="/data/123072.txt">123072.txt</A><br> Mehul's Format
# Thursday, November 13, 2008 11:00 PM 496 165455.txt First OMAC news format
# 13 November 2008 14:52 124 165446.txt OMAC's new Format
if($line =~ /[\w ]+,\s(\w+)\s(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\w{1,2})\s+(\d+) </)
{
#print "MATCH: \n";

my %fix_month = ('jan', '1', 'feb', '2', 'mar', '3', 'apr', '4',
'may', '5', 'jun', '6', 'jul', '7', 'aug', '8', 'sep', '9',
'oct', '10', 'nov', '11', 'dec', '12');

if($6 eq 'PM' && $4 < 12) {
$mil_hour = $4 + 12; # i.e. 12:55 PM = 12:55 while 11:55 PM = 23:55
} elsif($6 eq 'AM' && $4 == 12) {
$mil_hour = $4 - 12; # i.e. 12:55 AM = 00:55
} else {
$mil_hour = $4;
}
$date_stamp = sprintf("%02d%02d%02d%02d%02d",$3,$2,$fix_month{lc(substr($1,0,3))},$mil_hour,$5); # YYYYMMDDhhmm
$day_stamp = sprintf("%02d%02d%02d", $3,$fix_month{lc(substr($1,0,3))},$2); # YYYYMMDD
$filesize = $7;
}

Yes, I know I will have to fix the variables, but that I can handle.

Ken g6 · Feb 17, 2009

I see three very different patterns there, demonstrated in the comments, that you might be trying to match. The expression you have looks like a mix of two or more of them that won't work at all. It's possible to match them all with one expression, but practically impossible to get match variables out at the same time.

I would use a coding pattern like:

my $foundAMatch = 0;
if(/pattern1/) {
# Set key variables from matched pattern
$foundAMatch = 1;
}
elsif(/pattern2/) {
# Set key variables from different positions in the matched pattern
$foundAMatch = 1;
}
elsif(/pattern3/) {
# Set key variables from different positions in the matched pattern
$foundAMatch = 1;
}

if($foundAMatch) {
# Do the other calculations and prints you were doing.
}
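As a rough illustration of that skeleton, here is a minimal sketch filled in for the three sample lines quoted in the script comments. The helper name parse_listing_line, the three patterns, and the reading of 1/14/04 as month/day/two-digit-year are all assumptions made for this example, not the original script.

```perl
use strict;
use warnings;

# Month-name lookup, keyed on the first three letters in lower case.
my %month_num = (jan => 1, feb => 2, mar => 3, apr => 4, may => 5, jun => 6,
                 jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

# Hypothetical helper (not from the original script).
# Returns { stamp => 'YYYYMMDDhhmm', size => N }, or nothing if no format matches.
sub parse_listing_line {
    my ($line) = @_;
    my ($year, $mon, $day, $hour, $min, $size);
    my $ampm = '';          # stays empty for the 24-hour format
    my $foundAMatch = 0;

    # Thursday, November 13, 2008 11:00 PM 496 ...
    if ($line =~ /[A-Za-z]+,\s+([A-Za-z]+)\s+(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+(\d+)\s/) {
        ($mon, $day, $year, $hour, $min, $ampm, $size) = ($1, $2, $3, $4, $5, $6, $7);
        $mon = $month_num{ lc(substr($mon, 0, 3)) };
        $foundAMatch = 1;
    }
    # 13 November 2008 14:52 124 ...
    elsif ($line =~ /(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+(\d+)\s/) {
        ($day, $mon, $year, $hour, $min, $size) = ($1, $2, $3, $4, $5, $6);
        $mon = $month_num{ lc(substr($mon, 0, 3)) };
        $foundAMatch = 1;
    }
    # 1/14/04 12:32 AM 124 ...  (assumed month/day/two-digit year)
    elsif ($line =~ m!(\d{1,2})/(\d{1,2})/(\d{2})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+(\d+)\s!) {
        ($mon, $day, $year, $hour, $min, $ampm, $size) = ($1, $2, 2000 + $3, $4, $5, $6, $7);
        $foundAMatch = 1;
    }

    # Give up if nothing matched or the month name was not recognised.
    return unless $foundAMatch && defined $mon;

    # Shared post-processing: convert 12-hour times to 24-hour.
    $hour += 12 if $ampm eq 'PM' && $hour < 12;
    $hour  = 0  if $ampm eq 'AM' && $hour == 12;

    return {
        stamp => sprintf("%04d%02d%02d%02d%02d", $year, $mon, $day, $hour, $min),
        size  => $size,
    };
}

my $entry = parse_listing_line('13 November 2008 14:52 124 165446.txt');
print "$entry->{stamp} $entry->{size}\n" if $entry;
```

Each branch only copies its own capture groups into named variables, so the AM/PM handling and the sprintf can stay in one place regardless of which format matched.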

danzigrules · Feb 17, 2009

# Thursday, November 13, 2008 11:00 PM 496 165455.txt First OMAC news format

if($line =~ /[\w ]+,\s(\w+)\s(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\w{1,2})\s+(\d+) </)

The code worked fine when the site http://a.swirve.com/data/ listed the files in the above format. As you can see, they changed the format to: 13 November 2008 14:52 124 165446.txt

If I can get the regex right and change the variables, I assume it will work fine again.

Thanks

esun · Feb 17, 2009

I would recommend using a regex builder (e.g., http://gskinner.com/RegExr/ or http://renschler.net/RegexBuilder/ or http://www.dhtmlgoodies.com/sc...gular-expression.html) to debug. I would also recommend using a multiline regex and commenting it.
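For reference, a multiline, commented regex in Perl uses the /x modifier, which ignores whitespace in the pattern and allows # comments. The sketch below is one possible pattern for the new "13 November 2008 14:52 124 165446.txt" format; the variable name $new_format and the choice of capture groups are assumptions made for illustration. Note that comments inside the pattern should avoid $, @ and the delimiter character, because Perl still scans them for interpolation.

```perl
use strict;
use warnings;

# Hypothetical pattern for lines like: 13 November 2008 14:52 124 165446.txt
my $new_format = qr/
    (\d{1,2})   \s+     # group 1: day of month
    ([A-Za-z]+) \s+     # group 2: month name
    (\d{4})     \s+     # group 3: four-digit year
    (\d{1,2}) : (\d{2}) # groups 4 and 5: hour and minute, 24-hour clock
    \s+ (\d+)   \s+     # group 6: file size
    (\S+)               # group 7: file name
/x;

my $line = '13 November 2008 14:52 124 165446.txt';
if ($line =~ $new_format) {
    print "day=$1 month=$2 year=$3 time=$4:$5 size=$6 file=$7\n";
}
```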

You may also want to look into using this package: http://search.cpan.org/~roode/.../Regexp/Common/time.pm

Ken g6 · Feb 18, 2009

Remember, [] matches a single character out of the set listed inside.
() is for grouping.
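A small sketch of the difference, using the [\d{1,2}] from the original attempt. Inside brackets the {1,2} is not a quantifier, so the class matches exactly one character: a digit, '{', ',' or '}'. The variable names are made up for this example.

```perl
use strict;
use warnings;

my $class = qr/\A[\d{1,2}]\z/;   # ONE character: a digit, '{', ',' or '}'
my $group = qr/\A(\d{1,2})\z/;   # one or two digits, captured as a group

for my $text ('13', '7', '{') {
    printf "%-3s class=%s group=%s\n", $text,
        ($text =~ $class ? 'yes' : 'no'),
        ($text =~ $group ? 'yes' : 'no');
}
```

Here '13' matches only the group version, '7' matches both, and '{' matches only the character class.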

Let me try my hand at a regular expression for this string:
/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+(\d+)\.txt/

That seems to work.

Edit: That was sad.

And I rewrote it.

If you want to read from the page source, I think:

/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+<[Aa]\s/

does it.
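A quick way to try the page-source version is sketched below. The HTML sample line is an assumption: it combines the new date format with the <A HREF=...> markup shown in the older format's comment, since the actual page source is not quoted in the thread. The helper name source_fields is also made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (day, month, year, hour, minute, size),
# or an empty list if the line does not match.
sub source_fields {
    my ($line) = @_;
    my @fields = $line =~
        /([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+<[Aa]\s/;
    return @fields;
}

# Assumed shape of a line in the page source.
my $html = '13 November 2008 14:52 124 <A HREF="/data/165446.txt">165446.txt</A><br>';
my @f = source_fields($html);
print join('|', @f), "\n" if @f;
```

Because the pattern ends in <[Aa]\s, it will not match the plain-text listing line, only a line where the size is followed by an anchor tag.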
