Regular expression help

Childs · Feb 17, 2015

I'm trying to break down output that looks like this:

Code:

param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0
param1:  2  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 0  param7: 0  param8: 2  param9: 0
param1:  3  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0
param1:  4  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 0  param7: 0  param8: 2  param9: 0

I don't quite see an efficient way to format the pattern to split it up. I can do it with:

Code:

fields = [ x.rstrip(' ') for x in re.split(r'(\w*):\s*(\W*)\s*', line) if x ]

But it seems like I'm missing an easier solution. Any help is appreciated. Regular expressions always kick my butt.

Ken g6 · Feb 17, 2015

What's the goal here? "re.split('\s*(\w*):\s*', line)" will probably get you each param name followed by each value with no spaces.
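
For reference, a rough sketch of what that split returns on one of the sample lines. The variable names `line` and `parts` are only illustrative. Because the line starts with a param name, the list begins with an empty string.

```python
import re

line = ('param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) '
        'param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0')

# The captured group (\w*) is kept in the result, so names and values alternate.
parts = re.split(r'\s*(\w*):\s*', line)

print(parts[:5])
# ['', 'param1', '1', 'param2', '1000 (1000/ 3000)']
```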

postmortemIA · Feb 17, 2015

Not sure if it's relevant, but this free online tool seems like a good aid for newbs:
https://www.regex101.com/

Childs · Feb 17, 2015

Ken g6 said:
What's the goal here? "re.split('\s*(\w*):\s*', line)" will probably get you each param name followed by each value with no spaces.

Ideally, I'd like to split on each param and value, but that's where I'm stuck. So either an array of strings like "param1: 1", "param2: 100 (1000/ 3000)", etc, or a list of tuples ('param1', '1'), ('param2', '100 (1000/3000)'). Instead, I'm splitting everything and just looking at the param name and grabbing the next item in the array.

postmortemIA said:
Not sure if it's relevant, but this free online tool seems like a good aid for newbs:
https://www.regex101.com/

I'm not even sure how I'd go about specifying a delimiter to split the lines based on 'param: value param: value'. If I knew what the technique or phrase was called for doing something like this, I could look it up. These lines are not delimited in a traditional way.
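
One technique that seems to fit this kind of line is a lookahead assertion: split on whitespace only where the text that follows looks like a param name plus a colon. A minimal sketch, assuming names are made of word characters and values never contain something shaped like `name:`; `chunks` and `pairs` are illustrative names.

```python
import re

line = ('param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) '
        'param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0')

# (?=\w+:) is a lookahead: it checks that "name:" comes next without consuming it,
# so only the whitespace in front of each param is used as the delimiter.
chunks = re.split(r'\s+(?=\w+:)', line)
print(chunks[:3])
# ['param1:  1', 'param2:  1000 (1000/ 3000)', 'param3:  1000 (50/ 30000)']

# Split each chunk once on ':' and trim both halves to get (name, value) tuples.
pairs = [tuple(part.strip() for part in chunk.split(':', 1)) for chunk in chunks]
print(pairs[:2])
# [('param1', '1'), ('param2', '1000 (1000/ 3000)')]
```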

Ken g6 · Feb 18, 2015

Childs said:
Ideally, I'd like to split on each param and value, but that's where I'm stuck. So either an array of strings like "param1: 1", "param2: 100 (1000/ 3000)", etc, or a list of tuples ('param1', '1'), ('param2', '100 (1000/3000)'). Instead, I'm splitting everything and just looking at the param name and grabbing the next item in the array.

So, make it a two-step process.

https://stackoverflow.com/questions/4998427/how-to-group-elements-in-python-by-n-elements
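
A minimal sketch of the two-step idea: split first, then pair up neighbouring items. It assumes every line starts with a param name, so the first list item is an empty string that can be dropped. The names `flat`, `pairs`, and `field_dict` are illustrative.

```python
import re

line = ('param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) '
        'param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0')

# Step 1: split into a flat list that alternates name, value, name, value, ...
flat = re.split(r'\s*(\w*):\s*', line)[1:]  # [1:] drops the leading empty string

# Step 2: pair the even-indexed names with the odd-indexed values.
pairs = list(zip(flat[0::2], flat[1::2]))
field_dict = dict(pairs)

print(pairs[1])
# ('param2', '1000 (1000/ 3000)')
```

If a line could ever start with something other than a param name, the `[1:]` slice would misalign the pairs, so it is worth checking that the first item really is empty.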

veri745 · Feb 23, 2015

Use a regex to match each param:value pair, and then use groups to get the param and value:

https://docs.python.org/2/library/re.html#re.MatchObject

Try this regex:

Code:

(\w+):([\w\(\)\/ ]+(?=$| \w+:))

It uses lookahead to avoid matching the next param.

Code:

'(' start of group
  '\w+' one or more word chars
')' end of group
':' literal :
'(' start of group
  '[\w\(\)\/ ]+' one or more of: word character, literal (, literal ), literal /, or space
  '(?=' start of lookahead
    '$' end of line
    '|' or
    ' \w+:' space, then one or more word characters, then a literal ':'
  ')' end of lookahead
')' end of group
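
The same breakdown can be written into the pattern itself with `re.VERBOSE`, which lets a regex carry whitespace and comments. This is only a sketch: the named groups `name` and `value` are additions for readability and are not part of the pattern above. A literal space has to be written as `[ ]` in this mode, since bare whitespace outside a character class is ignored.

```python
import re

PAIR = re.compile(r'''
    (?P<name>\w+)      # group: one or more word characters
    :                  # literal colon
    (?P<value>         # group: the value text
        [\w()/ ]+      #   word characters, parentheses, slash, or space
        (?=            #   lookahead, checked but not consumed
            $          #     end of line
            |          #     or
            [ ]\w+:    #     space, word characters, then a colon
        )
    )
''', re.VERBOSE)

line = ('param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) '
        'param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0')

# The value group keeps its leading spaces, so strip them when building the dict.
field_dict = {m.group('name'): m.group('value').strip() for m in PAIR.finditer(line)}

print(field_dict['param3'])
# 1000 (50/ 30000)
```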

Childs · Feb 23, 2015

Thanks for replying, guys. I totally forgot about this. Since it looked like it was always going to be a two-step process, I just kept doing what I was already doing and moved on. In case anyone comes across this later, here is the example code for each solution:

Code:

import re

string = 'param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0'

# my original solution

fields = [x.rstrip(' ') for x in re.split(r'(\w*):\s*(\W*)\s*', string) if x]
field_dict = {}
for field in fields:
	if field == 'param1':
		field_dict['param1'] = fields[fields.index(field) + 1]
	elif field == 'param5':
		field_dict['param5'] = fields[fields.index(field) + 1]
	elif field == 'param6':
		field_dict['param6'] = fields[fields.index(field) + 1]
print(field_dict)

# solution 2 based on Ken_g6

N = 2
field_list = [fields[n:n+N] for n in range(0, len(fields), N)]
field_dict = {}
field_dict['param1'] = field_list[0][1]
field_dict['param5'] = field_list[4][1]
field_dict['param6'] = field_list[5][1]
print(field_dict)

# solution 3 based on veri745

field_dict = {}
field_dict['param1'] = re.search(r"param1:\s*(\w*)", string).group(1)
field_dict['param2'] = re.search(r"param2:\s*(\w*\s*\([^()]+\))\s*", string).group(1) # including because pattern took time to figure out
field_dict['param5'] = re.search(r"param5:\s*(\w*)", string).group(1)
field_dict['param6'] = re.search(r"param6:\s*(\w*)", string).group(1)
print(field_dict)

veri745 · Feb 25, 2015

I was thinking something a little more automatic:

Code:

import re

string = 'param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0'

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r'(\w+):([\w\(\)\/ ]+(?=$| \w+:))')

field_dict = {}

# findall returns one (name, value) tuple per match because the pattern has two groups.
for name, value in pattern.findall(string):
    field_dict[name] = value.strip()

print(field_dict)
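
To run the same pattern over the whole output rather than one line, something along these lines might work. It is a sketch: `parse_line`, `template`, `lines`, and `rows` are hypothetical names, and the four sample lines from the first post are rebuilt from a template to keep the example short.

```python
import re

PAIR = re.compile(r'(\w+):([\w\(\)\/ ]+(?=$| \w+:))')

def parse_line(line):
    """Return a dict mapping each param name to its stripped value text."""
    return {name: value.strip() for name, value in PAIR.findall(line)}

# Rebuild the four sample lines; only param1 and param6 differ between them.
template = ('param1:  {0}  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) '
            'param4:   3826  param5: 0  param6: {1}  param7: 0  param8: 2  param9: 0')
lines = [template.format(n, flag) for n, flag in [(1, 1), (2, 0), (3, 1), (4, 0)]]

# One dict per line; each line is matched on its own, so $ means end of that line.
rows = [parse_line(line) for line in lines]

for row in rows:
    print(row['param1'], row['param6'])
```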
