
perl regex

Originally posted by: danzigrules
13 November 2008 14:52 124 165446.txt


/[\d{1,2}]\s(\w+)\s+(\d{4})\s+(\d{1,2})🙁\d{1,2})\s+(\d+) </)


Thanks in advance

I'm going to go with 'no', it's not right. Smilicons are rarely part of a correct regex.

Also, you haven't told us what the regex is supposed to do.
 
while ($line = <FIL>)
{
    chop $line;
    # 1/14/04 12:32 AM 124 <A HREF="/data/123072.txt">123072.txt</A><br> Mehul's Format
    # Thursday, November 13, 2008 11:00 PM 496 165455.txt First OMAC news format
    # 13 November 2008 14:52 124 165446.txt OMAC's new Format
    if($line =~ /[\w ]+,\s(\w+)\s(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\w{1,2})\s+(\d+) </)
    {
        #print "MATCH: \n";

        my %fix_month = ('jan', '1', 'feb', '2', 'mar', '3', 'apr', '4',
                         'may', '5', 'jun', '6', 'jul', '7', 'aug', '8', 'sep', '9',
                         'oct', '10', 'nov', '11', 'dec', '12');

        if($6 eq 'PM' && $4 < 12) {
            $mil_hour = $4 + 12; # i.e. 12:55 PM = 12:55 while 11:55 PM = 23:55
        } elsif($6 eq 'AM' && $4 == 12) {
            $mil_hour = $4 - 12; # i.e. 12:55 AM = 00:55
        } else {
            $mil_hour = $4;
        }
        $date_stamp = sprintf("%02d%02d%02d%02d%02d",$3,$2,$fix_month{lc(substr($1,0,3))},$mil_hour,$5); # YYYYMMDDhhmm
        $day_stamp = sprintf("%02d%02d%02d", $3,$fix_month{lc(substr($1,0,3))},$2); # YYYYMMDD
        $filesize = $7;
    }


Yes I know I will have to fix variables, but that I can handle 🙂
 
I see three very different patterns there, demonstrated in the comments, that you might be trying to match. The expression you have looks like a mix of two or more of them that won't work at all. It's possible to match them all with one expression, but practically impossible to get match variables out at the same time.

I would use a coding pattern like:

my $foundAMatch = 0;
if(/pattern1/) {
    # Set key variables from matched pattern
    $foundAMatch = 1;
}
elsif(/pattern2/) {
    # Set key variables from different positions in the matched pattern
    $foundAMatch = 1;
}
elsif(/pattern3/) {
    # Set key variables from different positions in the matched pattern
    $foundAMatch = 1;
}

if($foundAMatch) {
    # Do the other calculations and prints you were doing.
}
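To make that concrete, here is a rough sketch of the same if/elsif pattern applied to the three formats shown in the comments. The names parse_listing_line, to_24h and %month_num are made up for illustration. The two-digit year in the oldest format is assumed to mean 20xx. Each pattern stops at the whitespace after the file size, so it should accept both the rendered text and the page source, but treat it as a starting point rather than tested production code.

```perl
use strict;
use warnings;

# Month-name lookup, keyed by the first three letters in lower case.
my %month_num = (jan => 1, feb => 2, mar => 3, apr => 4, may => 5, jun => 6,
                 jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

# Convert a 12-hour clock hour plus AM/PM into a 24-hour hour.
sub to_24h {
    my ($hour, $ampm) = @_;
    return $hour + 12 if $ampm eq 'PM' && $hour < 12;
    return 0          if $ampm eq 'AM' && $hour == 12;
    return $hour;
}

# Hypothetical helper: returns a hash ref with year/month/day/hour/min/size,
# or undef when none of the three patterns matches.
sub parse_listing_line {
    my ($line) = @_;
    my %f;
    my $found = 0;

    # 1/14/04 12:32 AM 124 ...   (assumption: two-digit year means 20xx)
    if ($line =~ m!(\d{1,2})/(\d{1,2})/(\d{2})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+(\d+)\s!) {
        @f{qw(month day year hour min size)} =
            ($1, $2, 2000 + $3, to_24h($4, $6), $5, $7);
        $found = 1;
    }
    # Thursday, November 13, 2008 11:00 PM 496 ...
    elsif ($line =~ /\w+,\s+(\w+)\s+(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+(\d+)\s/) {
        @f{qw(month day year hour min size)} =
            ($month_num{ lc substr($1, 0, 3) }, $2, $3, to_24h($4, $6), $5, $7);
        $found = 1;
    }
    # 13 November 2008 14:52 124 ...   (already a 24-hour clock)
    elsif ($line =~ /(\d{1,2})\s+(\w+)\s+(\d{4})\s+(\d{1,2}):(\d{2})\s+(\d+)\s/) {
        @f{qw(day month year hour min size)} =
            ($1, $month_num{ lc substr($2, 0, 3) }, $3, $4, $5, $6);
        $found = 1;
    }

    return $found ? \%f : undef;
}

# Usage: build the YYYYMMDDhhmm stamp once, whichever pattern matched.
my $rec = parse_listing_line('13 November 2008 14:52 124 165446.txt');
printf "%04d%02d%02d%02d%02d size=%d\n",
    @{$rec}{qw(year month day hour min size)} if $rec;
# prints: 200811131452 size=124
```

The point of doing it this way is that each branch maps its own capture positions onto the same named fields, so the stamp calculation after the match only has to be written once.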
 
# Thursday, November 13, 2008 11:00 PM 496 165455.txt First OMAC news format

if($line =~ /[\w ]+,\s(\w+)\s(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{1,2})\s+(\w{1,2})\s+(\d+) </)

The code worked fine when that site (http://a.swirve.com/data/) listed the files in the above format. As you can see, they changed the format to: 13 November 2008 14:52 124 165446.txt

If I can get the regex right and change the variables, I assume it will work fine again.

Thanks
 
Remember, [] matches a single character: any one of the characters listed inside.
() is for grouping (and capturing).
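A quick way to see the difference, as a small sketch (the variable names are only for illustration): inside [] the quantifier {1,2} is not a quantifier at all, just a few more literal characters in the set.

```perl
use strict;
use warnings;

my $day = '13';

# Character class: exactly ONE character out of \d, '{', '1', ',', '2', '}'.
my $class_matches_13    = ($day =~ /^[\d{1,2}]$/) ? 1 : 0;   # 0: '13' is two characters
my $class_matches_brace = ('{'  =~ /^[\d{1,2}]$/) ? 1 : 0;   # 1: '{' is in the set

# Group: one or two digits, captured for later use.
my ($captured) = $day =~ /^(\d{1,2})$/;                      # '13'

print "class vs 13: $class_matches_13, class vs {: $class_matches_brace, group: $captured\n";
```

So [\d{1,2}] at the start of the expression would consume only one character of the day and capture nothing.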

Let me try my hand at a regular expression for this string:
/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+(\d+)\.txt/

That seems to work.

Edit: That was sad. 🙁 And I rewrote it.

If you want to read from the page source, I think:

/([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+<[Aa]\s/

does it.
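For what it's worth, here is roughly how that page-source expression could be wired into the stamp calculation from the original loop. This is only a sketch: the sample $line is a guess at what the page source looks like now (the new date layout followed by the same <A HREF=...> markup as the old listing). The stamps follow the YYYYMMDDhhmm and YYYYMMDD layouts named in the original comments. Since the new format is already on a 24-hour clock, the AM/PM handling drops out.

```perl
use strict;
use warnings;

my %fix_month = (jan => 1, feb => 2, mar => 3, apr => 4, may => 5, jun => 6,
                 jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

# Assumed page-source line for the new format (not copied from the real site).
my $line = '13 November 2008 14:52 124 <A HREF="/data/165446.txt">165446.txt</A><br>';

my ($date_stamp, $day_stamp, $filesize);
if ($line =~ /([0-3]?\d)\s+([A-Z][a-z]+)\s+(20\d{2})\s+([012]?\d):([0-5]\d)\s+(\d+)\s+<[Aa]\s/) {
    # Copy the captures into named variables straight away.
    my ($day, $month_name, $year, $hour, $min, $size) = ($1, $2, $3, $4, $5, $6);
    my $month = $fix_month{ lc substr($month_name, 0, 3) };

    $date_stamp = sprintf("%04d%02d%02d%02d%02d", $year, $month, $day, $hour, $min); # YYYYMMDDhhmm
    $day_stamp  = sprintf("%04d%02d%02d", $year, $month, $day);                      # YYYYMMDD
    $filesize   = $size;
}

print "$date_stamp $day_stamp $filesize\n" if defined $date_stamp;
# prints: 200811131452 20081113 124
```

Copying $1..$6 into named variables first makes the "fix the variables" step much less error-prone than renumbering every $n by hand.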
 