pythonic way to write this?

Borkil

Senior member
Sep 7, 2006
248
0
0
I'm reading a file and grabbing data. The file is split into sections, each formatted differently, so I check each line for a header. When a header is found I set a flag, and while that flag is set I apply more regexes to find the data. When the next header is encountered I change the flags accordingly. The code I have works, but I was curious whether there is a more "pythonic" way, or an easy one-liner, to set these flags. I'm working on version 2.4.3.

Code:
for line in file:
    match1 = re.match(r'header1',line)
    match2 = re.match(r'header2',line)
    match3 = re.match(r'header3',line)
    if match1 is not None:
        flag1 = True
        flag2 = False
        flag3 = False
    if match2 is not None:
        flag1 = False
        flag2 = True
        flag3 = False
    if match3 is not None:
        flag1 = False
        flag2 = False
        flag3 = True

......... rest of code
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
One general way to do it without lots of OOP boilerplate (you could go full OOP, and it would look neater, I'll grant) would be to create a parser management class. An object of that class can take any number of parsers at runtime, and doesn't really do anything but hold dictionaries that map a named header type to its functions.

Code:
class MrParseText(object):
    def __init__(self):
        # what else goes here, if anything, is on you
        self.DctIsMatch = {}     # format name -> header-matching function
        self.DctParseLines = {}  # format name -> section-parsing function

    def AddFormat(self, FormatName, IsMatch, ParseLines):
        self.DctIsMatch[FormatName] = IsMatch
        self.DctParseLines[FormatName] = ParseLines

    def LineMatches(self, line):
        # return the name of the first format whose header matches, else False
        for Format in self.DctIsMatch:
            if self.DctIsMatch[Format](line):
                return Format
        return False

    def ParseLines(self, lines, Format):
        return self.DctParseLines[Format](lines)
Then:
Code:
McParser = MrParseText()
Then something like:
Code:
def IsMatch1( line):
    ... # returns True if it matches a header of this type

def ParseLines1( lines ):
    ... # whatever you currently do when parsing lines within a section of a given type

McParser.AddFormat( "Type1", IsMatch1, ParseLines1 )
For each format.

Then, when actually running through the file:
Code:
buffer = []
format = False  # assuming the 1st line is a header; anything before it is thrown away

for line in file:
    header = McParser.LineMatches( line )

    if not header:  # another line from the current section
        buffer.append( line )
    else:  # encountered new header
        if format:  # nothing to parse before the first header
            something = McParser.ParseLines( buffer, format )  # something is wherever the parsed data goes, here
        buffer = []
        format = header  # reset buffer and change formats

if format:  # the last section has no header after it, so parse it here
    something = McParser.ParseLines( buffer, format )
That's all rough and off the top of my head, so it may not even be syntactically correct, but hopefully you get the idea. The "something = " line is also where the extra logic for what happens with the data goes, and the whole thing can be extended fairly easily to support different destinations by type and whatnot, too. If you want it to be more dynamic for future format inclusion, or user extension, making a factory wouldn't be a bad idea, but for a small list of known formats it would add boilerplate and lines without being any more readable (unless you consider lots of decorators and "hidden" class properties more readable).

The more OOP way would probably be to create a class that has the match functions and, on a match to a new header, returns an object of a class that has the parsing function and the line list. The way OOP is tacked on to Python, I tend to dislike both writing and reading such implementations (and some may very well not even work with such an old Python version). You get into having to deal with decorators, __subclasses__, etc., for what ought to have been basic language features.
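
A rough sketch of what that OOP shape could look like, without decorators or __subclasses__. Every name here (`Section`, `Type1Section`, `Type2Section`, `section_for`, `parse_file`) is made up for illustration, the header patterns are the placeholders from the original question, and the `parse` bodies are stand-ins for whatever is really done per section:

```python
import re

class Section(object):
    """Holds the lines of one section and knows how to parse them."""
    header = None  # compiled header regex, set by each subclass

    def __init__(self):
        self.lines = []

    def add(self, line):
        self.lines.append(line)

    def parse(self):
        raise NotImplementedError

class Type1Section(Section):
    header = re.compile(r'header1')  # placeholder pattern

    def parse(self):
        # stand-in parser: assumes every line looks like key=value
        return dict([line.strip().split('=', 1) for line in self.lines])

class Type2Section(Section):
    header = re.compile(r'header2')  # placeholder pattern

    def parse(self):
        # stand-in parser: whitespace-separated fields
        return [line.split() for line in self.lines]

SECTION_TYPES = [Type1Section, Type2Section]

def section_for(line):
    """Return a new, empty section object if line is a header, else None."""
    for cls in SECTION_TYPES:
        if cls.header.match(line):
            return cls()
    return None

def parse_file(lines):
    """Parse each section with its own class; lines before the first header are ignored."""
    results = []
    current = None
    for line in lines:
        new_section = section_for(line)
        if new_section is not None:
            if current is not None:
                results.append(current.parse())
            current = new_section
        elif current is not None:
            current.add(line)
    if current is not None:  # the last section has no header after it
        results.append(current.parse())
    return results
```

The explicit `SECTION_TYPES` list is the price of avoiding class-discovery tricks: a new format means one new subclass plus one entry in that list.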
 

esun

Platinum Member
Nov 12, 2001
2,214
0
0
For just the code you wrote, the code below seems simpler (it assumes that only one header will match at a time). For performance you could also compile the regular expressions before the loop, though I'm not sure whether you care about that.

Code:
headers = ['header1', 'header2', 'header3']
for line in file:
    flags = [re.match(header, line) is not None for header in headers]

You could also use map to set the flags, but most people prefer list comprehensions.
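
A sketch of the compiled variant, assuming the same three placeholder patterns. One thing to watch: a list rebuilt on every line comes out all False on non-header lines, whereas the original loop keeps the last flag set until the next header. The sketch below only replaces the flags when some header matched, which is an assumption about the intended behaviour; `track_flags` is a hypothetical helper name.

```python
import re

# Placeholder patterns from the question, compiled once outside the loop.
headers = [re.compile(p) for p in ['header1', 'header2', 'header3']]

def track_flags(lines):
    """Yield (line, flags) pairs; flags marks the most recent header seen."""
    flags = [False] * len(headers)
    for line in lines:
        matched = [h.match(line) is not None for h in headers]
        if True in matched:  # only change state on a header line
            flags = matched
        yield line, flags
```

The map spelling of the same comprehension would be `map(lambda h: h.match(line) is not None, headers)`, which returns a list on Python 2.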
 

BigDH01

Golden Member
Jul 8, 2005
1,631
88
91
I would probably prefer to build a dict in the list comprehension from (header, matched) tuples, with the header as the key.

Code:
dct = dict([(header, re.match(header, line) is not None) for header in headers])
if dct['header1']:
    pass  # etc.
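
If only one section is ever active, another option is to drop the flags and just remember the name of the last header seen. A minimal sketch, assuming the same placeholder patterns; `group_by_section` is a hypothetical helper name, and collecting lines per section is only one guess at what the rest of the code does:

```python
import re

headers = ['header1', 'header2', 'header3']  # placeholder patterns

def group_by_section(lines):
    """Collect non-header lines under the name of the last header seen."""
    sections = {}
    current = None  # no header seen yet
    for line in lines:
        for header in headers:
            if re.match(header, line):
                current = header
                break
        else:
            # not a header: file the line under the current section, if any
            if current is not None:
                sections.setdefault(current, []).append(line)
    return sections
```

With this shape, `if current == 'header1':` takes the place of `if dct['header1']:`, and the state survives across non-header lines. Sections that repeat a header end up merged under the same key.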