pythonic way to write this?

Borkil · Jun 12, 2013

I'm reading a file and grabbing data. The file is divided into sections that are formatted in different ways, so I check each line for a header. When a header is found I set a flag, and while that flag is set I apply more regexes to find the data. When the next header is encountered I change the flags accordingly. The code I have works, but I was curious whether there is a more "pythonic" way, or an easy one-liner, to set these flags. I'm working on Python 2.4.3.

Code:

for line in file:
    match1 = re.match(r'header1',line)
    match2 = re.match(r'header2',line)
    match3 = re.match(r'header3',line)
    if match1 is not None:
        flag1 = True
        flag2 = False
        flag3 = False
    if match2 is not None:
        flag1 = False
        flag2 = True
        flag3 = False
    if match3 is not None:
        flag1 = False
        flag2 = False
        flag3 = True

......... rest of code

Cerb · Jun 12, 2013

One general way to do it without lots of OOP boilerplate (you could go that route, and it would look neater, I'll grant) would be to create a parser management class. An object of that class could dynamically take any number of parsers, and it wouldn't really do anything but hold dictionaries whose keys map each named header type to its functions.

Code:

class MrParseText(object):
    def __init__(self):
        # one dictionary per role, both keyed by format name
        self.DctIsMatch = {}
        self.DctParseLines = {}
        ... #what else goes here, if anything, is on you

    def AddFormat( self, FormatName, IsMatch, ParseLines ):
        self.DctIsMatch[ FormatName ] = IsMatch
        self.DctParseLines[ FormatName ] = ParseLines

    def LineMatches( self, line ):
        # return the name of the first format whose header test matches, else False
        for FormatName, IsMatch in self.DctIsMatch.items():
            if IsMatch( line ):
                return FormatName
        return False

    def ParseLines( self, lines, Format ):
        return self.DctParseLines[ Format ]( lines )

Then:

Code:

McParser = MrParseText()

Then something like:

Code:

def IsMatch1( line):
    ... #returns True if it matches a header of this type

def ParseLines1( lines ):
    ... # whatever you currently do when parsing lines within a section of a given type

McParser.AddFormat( "Type1", IsMatch1, ParseLines1 )

For each format.

Then, when actually running through the file:

Code:

buffer = []
format = False #no section yet; assumes the 1st line is a header--if not, well, that's for you to figure out

for line in file:
    header = McParser.LineMatches( line )

    if not header: # another line from the current section
        buffer.append( line )
    else:  # encountered new header
        if format: # nothing to parse before the first header
            something = McParser.ParseLines( buffer, format ) # something is wherever the parsed data goes, here
        buffer = []
        format = header #reset buffer and change formats

if format: # the last section has no header after it, so parse it here
    something = McParser.ParseLines( buffer, format )

That's all rough and off the top of my head, so it may not even be syntactically correct, but hopefully you get the idea. The "something = " line is also where the extra logic for what happens with the data goes, and the whole thing can be extended fairly easily to support different destinations by type and whatnot. If you want it to be more dynamic for future formats or user extension, making a factory wouldn't be a bad idea. For a small list of known formats, though, it would add boilerplate and lines without being any more readable (unless you consider lots of decorators and "hidden" class properties more readable).
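For a small, fixed list of formats, the same idea can be sketched with plain functions and no manager class. This is only an illustration under assumptions: `parse_sections`, the `FORMATS` list, and the two toy parsers are hypothetical names, and results are collected per format name to show the "different destinations by type" idea. It sticks to constructs that should also be available on Python 2.4.

```python
import re

def parse_sections(lines, formats):
    """Split lines into sections at each header and parse every section.

    formats is a list of (name, is_match, parse_lines) tuples (hypothetical layout).
    Returns a dict mapping each format name to a list of parsed sections.
    """
    parsers = {}
    for name, is_match, parse_lines in formats:
        parsers[name] = parse_lines

    results = {}
    current = None   # name of the format whose section is being collected
    pending = []     # lines seen since the last header

    for line in lines:
        header = None
        for name, is_match, parse_lines in formats:
            if is_match(line):
                header = name
                break
        if header is None:
            pending.append(line)   # another line of the current section
            continue
        if current is not None:    # finish the previous section, if there was one
            results.setdefault(current, []).append(parsers[current](pending))
        current = header           # lines before the first header are dropped here
        pending = []

    if current is not None:        # the last section has no header after it
        results.setdefault(current, []).append(parsers[current](pending))
    return results


# Toy formats, standing in for the real header tests and parsers.
def is_header1(line):
    return re.match(r'header1', line) is not None

def is_header2(line):
    return re.match(r'header2', line) is not None

def parse_numbers(section_lines):
    return [int(tok) for l in section_lines for tok in l.split()]

def parse_pairs(section_lines):
    return dict([l.strip().split('=', 1) for l in section_lines if '=' in l])

FORMATS = [
    ("numbers", is_header1, parse_numbers),
    ("pairs", is_header2, parse_pairs),
]

sample = ["header1", "1 2", "3", "header2", "a=1", "b=2", "header1", "4"]
parsed = parse_sections(sample, FORMATS)
```

Here `parsed["numbers"]` holds one list per `header1` section and `parsed["pairs"]` one dict per `header2` section. Adding a format means adding one tuple to `FORMATS`.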

The more OOP way would probably be to create a class that has the match functions and that, on a match to a new header, returns an object of a class holding the parsing function and the line list. Given the way OOP is tacked on to Python, I tend to dislike both writing and reading such implementations (and some may very well not even work with such an old Python version). You get into having to deal with decorators, __subclasses__, etc., for what ought to have been basic language features.
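A bare-bones version of that shape might look like the sketch below. It is an assumption-laden illustration, not a reference design: `Section`, `SectionMatcher`, and `read_sections` are hypothetical names, and the version shown gets by with plain new-style classes, so it should not need decorators or `__subclasses__`.

```python
import re

class Section(object):
    """The lines of one section plus the function that knows how to parse them."""
    def __init__(self, name, parse_lines):
        self.name = name
        self.parse_lines = parse_lines
        self.lines = []

    def parse(self):
        return self.parse_lines(self.lines)


class SectionMatcher(object):
    """Holds the header patterns and hands back a fresh Section on a match."""
    def __init__(self):
        self.formats = []  # list of (name, compiled pattern, parse function)

    def add_format(self, name, pattern, parse_lines):
        self.formats.append((name, re.compile(pattern), parse_lines))

    def match(self, line):
        for name, regex, parse_lines in self.formats:
            if regex.match(line):
                return Section(name, parse_lines)
        return None


def read_sections(lines, matcher):
    """Return the Section objects found in lines, in file order."""
    sections = []
    for line in lines:
        section = matcher.match(line)
        if section is not None:
            sections.append(section)          # header line: start a new section
        elif sections:
            sections[-1].lines.append(line)   # data line: belongs to the latest section
        # lines before the first header are ignored
    return sections


matcher = SectionMatcher()
matcher.add_format("upper", r'header1', lambda ls: [l.upper() for l in ls])
matcher.add_format("count", r'header2', len)

found = read_sections(["skip", "header1", "a", "b", "header2", "x"], matcher)
summary = [(s.name, s.parse()) for s in found]
```

Each `Section` carries its own parser, so the reading loop never needs to know which format it is looking at.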

esun · Jun 13, 2013

For just the code you wrote, the below code seems simpler (assumes that only one header will match at a time). For performance you could also compile the regular expressions before the loop, but not sure if you actually care about that or not.

Code:

headers = ['header1', 'header2', 'header3']
for line in file:
    flags = [re.match(header, line) is not None for header in headers]

Could also use map to set the flags but most people prefer list comprehensions.
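One thing to watch with the list-comprehension version: the flags are recomputed for every line, so they are all False on the data lines between headers, whereas the original code keeps a flag set until the next header shows up. A minimal sketch that keeps the current section and compiles the patterns once, assuming a hypothetical `HEADERS` list and `label_lines` helper:

```python
import re

# Compile once, outside the loop. The names are placeholders for the real headers.
HEADERS = [
    ("header1", re.compile(r'header1')),
    ("header2", re.compile(r'header2')),
    ("header3", re.compile(r'header3')),
]

def label_lines(lines):
    """Pair every non-header line with the name of the section it falls in."""
    current = None   # stays None until the first header is seen
    labelled = []
    for line in lines:
        for name, regex in HEADERS:
            if regex.match(line):
                current = name     # header line: switch sections
                break
        else:
            # no pattern matched, so this is a data line in the current section
            labelled.append((current, line))
    return labelled

example = label_lines(["x", "header1", "a", "header3", "b", "c"])
```

A single `current` value replaces the three booleans, so the per-section regex work becomes `if current == "header1": ...` and so on.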

BigDH01 · Jun 13, 2013

esun said:
For just the code you wrote, the below code seems simpler (assumes that only one header will match at a time). For performance you could also compile the regular expressions before the loop, but not sure if you actually care about that or not.

Code:

headers = ['header1', 'header2', 'header3']
for line in file:
    flags = [re.match(header, line) is not None for header in headers]

Could also use map to set the flags but most people prefer list comprehensions.

I would probably prefer to build a dict from (header, matched) tuples in the list comprehension, with the header as the key.

Code:

dct = dict([(header, re.match(header, line) is not None) for header in headers])
if dct['header1']:
   etc
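With real header regexes the patterns themselves make awkward dict keys, so one option is to key the dict by a readable name instead. The sketch below assumes hypothetical names (`PATTERNS`, `header_flags`, `flags_per_line`) and only swaps in new flags when a header line is seen:

```python
import re

# Hypothetical names and patterns; the real header regexes would go here.
PATTERNS = {
    "summary": re.compile(r'header1'),
    "details": re.compile(r'header2'),
    "footer": re.compile(r'header3'),
}

def header_flags(line):
    """Return {name: True/False} saying which header pattern the line matches."""
    return dict([(name, regex.match(line) is not None)
                 for name, regex in PATTERNS.items()])

def flags_per_line(lines):
    """Yield (flags, line) for data lines, keeping the last header's flags."""
    flags = dict.fromkeys(PATTERNS, False)
    for line in lines:
        new_flags = header_flags(line)
        if True in new_flags.values():
            flags = new_flags      # a header line: switch sections
        else:
            yield flags, line      # a data line: report it under the current flags

seen = [(flags["details"], line)
        for flags, line in flags_per_line(["header2", "a", "header3", "b"])]
```

The caller can then write `if flags["details"]:` in place of `if dct['header2']:`.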
