Regular expression help

Childs

Lifer
Jul 9, 2000
11,313
7
81
I'm trying to break down output that looks like this:

Code:
param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0
param1:  2  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 0  param7: 0  param8: 2  param9: 0
param1:  3  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0
param1:  4  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 0  param7: 0  param8: 2  param9: 0

I don't quite see an efficient way to format the pattern to split it up. I can do it with:

Code:
fields = [ x.rstrip(' ') for x in re.split(r'(\w*):\s*(\W*)\s*', line) if x ]

But it seems like I'm missing an easier solution. Any help is appreciated. Regular expressions always kick my butt.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,837
4,817
75
What's the goal here? "re.split(r'\s*(\w*):\s*', line)" will probably get you each param name followed by its value, with the surrounding whitespace already stripped.
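For reference, here is a minimal sketch of what that split returns, written for Python 3. Because the pattern contains a capture group, re.split keeps the matched names in the result, so names and values alternate after a leading empty string. The variable names (line, parts, pairs) are only illustrative, and the sample line is shortened from the output posted above.

```python
import re

# Shortened sample line, taken from the output posted above.
line = 'param1:  1  param2:  1000 (1000/ 3000)  param5: 0'

# The capture group makes re.split keep each param name in the result.
parts = re.split(r'\s*(\w*):\s*', line)

# parts[0] is the (empty) text before the first name; after that,
# names sit at odd indexes and values at even indexes.
pairs = list(zip(parts[1::2], parts[2::2]))
print(pairs)
```

Pairing the slices this way assumes every name is followed by exactly one value, which holds for the sample lines but may not for other output.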
 

Childs

Ideally, I'd like to split on each param and value, but that's where I'm stuck. So either an array of strings like "param1: 1", "param2: 1000 (1000/ 3000)", etc., or a list of tuples ('param1', '1'), ('param2', '1000 (1000/ 3000)'). Instead, I'm splitting everything, then looking for the param name and grabbing the next item in the array.

Not sure if relevant, but this free online tool seems like a good aid for newbs:
https://www.regex101.com/

I'm not even sure how I'd go about specifying a delimiter to split the lines based on 'param: value param: value'. If I knew what the technique or phrase was called for doing something like this, I could look it up. These lines are not delimited in a traditional way.
 

veri745

Golden Member
Oct 11, 2007
1,163
4
81
Use a regex to match each param:value pair, and then use groups to get the param and value:

https://docs.python.org/2/library/re.html#re.MatchObject

Try this regex:
Code:
(\w+):([\w\(\)\/ ]+(?=$| \w+:))
It uses lookahead to avoid matching the next param.

Code:
'(' start of group
  '\w+' one or more word chars
')' end of group
':' literal :
'(' start of group
  '[\w\(\)\/ ]+' one or more of: word character, literal (, literal ), literal /, or space
  '(?=' start of lookahead
    '$' end of line
    '|' or
    ' \w+:' space, then one or more word characters, then a literal ':'
  ')' end of lookahead
')' end of group
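
As a rough sketch (Python 3), the same pattern can be written with re.VERBOSE so the breakdown above lives next to the regex itself. The names PAIR, line and pairs are made up for illustration, and the sample line is shortened from the original output. In verbose mode unescaped whitespace in the pattern is ignored, so the literal space in the lookahead is written as [ ].

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each piece
# can carry its own comment.
PAIR = re.compile(r"""
    (\w+)                 # group 1: the param name
    :                     # literal colon
    (                     # group 2: the value
      [\w()/ ]+           #   word chars, parentheses, slash, or space
      (?= $ | [ ]\w+: )   #   lookahead: end of line, or a space and the next name
    )
""", re.VERBOSE)

line = 'param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0'

# finditer yields one match object per name/value pair.
pairs = [(m.group(1), m.group(2).strip()) for m in PAIR.finditer(line)]
print(pairs)
```

This gives the list-of-tuples form asked for earlier in the thread. It assumes values only ever contain word characters, parentheses, slashes and spaces, as in the sample.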
 

Childs

Thanks for replying, guys; I totally forgot about this. Since it looked like it was always going to be a two-step process, I just kept doing what I was already doing and moved on. In case anyone comes across this later, here is the example code for each solution:

Code:
import re

string = 'param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0'

# my original solution: split into a flat list of names and values,
# then take the item that follows each name I care about

fields = [x.rstrip(' ') for x in re.split(r'(\w*):\s*(\W*)\s*', string) if x]
field_dict = {}
for field in fields:
    if field == 'param1':
        field_dict['param1'] = fields[fields.index(field) + 1]
    elif field == 'param5':
        field_dict['param5'] = fields[fields.index(field) + 1]
    elif field == 'param6':
        field_dict['param6'] = fields[fields.index(field) + 1]
print(field_dict)

# solution 2 based on Ken_g6: chunk the flat list from above into
# [name, value] pairs and pick pairs by position

N = 2
field_list = [fields[n:n+N] for n in range(0, len(fields), N)]
field_dict = {}
field_dict['param1'] = field_list[0][1]
field_dict['param5'] = field_list[4][1]
field_dict['param6'] = field_list[5][1]
print(field_dict)

# solution 3 based on veri745: one search per param, value in group 1

field_dict = {}
field_dict['param1'] = re.search(r"param1:\s*(\w*)", string).group(1)
field_dict['param2'] = re.search(r"param2:\s*(\w*\s*\([^()]+\))\s*", string).group(1) # including because pattern took time to figure out
field_dict['param5'] = re.search(r"param5:\s*(\w*)", string).group(1)
field_dict['param6'] = re.search(r"param6:\s*(\w*)", string).group(1)
print(field_dict)
 

veri745

I was thinking something a little more automatic:

Code:
import re

string = 'param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0'

# raw string so the backslashes reach the regex engine untouched
pattern = re.compile(r'''(\w+):([\w\(\)\/ ]+(?=$| \w+:))''')

field_dict = {}

# findall returns one (name, value) tuple per match
for match in pattern.findall(string):
    field_dict[match[0].strip()] = match[1].strip()

print(field_dict)
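
If it helps anyone landing here later, the same lookahead idea can also be used as a split delimiter, which gives the "name: value" strings directly. Below is a sketch for Python 3 that applies it to several lines of output at once. parse_line, output and records are hypothetical names, and only the first two sample lines from the top of the thread are included. It assumes a value never contains a word followed directly by a colon.

```python
import re

# The first two lines of the sample output from the top of the thread.
output = """\
param1:  1  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 1  param7: 0  param8: 2  param9: 0
param1:  2  param2:  1000 (1000/ 3000)  param3:  1000 (50/ 30000) param4:   3826  param5: 0  param6: 0  param7: 0  param8: 2  param9: 0
"""

def parse_line(line):
    """Return a dict of param name -> value for one line (hypothetical helper)."""
    # Split on whitespace only where a "name:" token follows (lookahead),
    # so whitespace inside a value such as "1000 (1000/ 3000)" is left alone.
    chunks = re.split(r'\s+(?=\w+:)', line.strip())
    # partition(':') splits each chunk at its first colon.
    return {name.strip(): value.strip()
            for name, _, value in (chunk.partition(':') for chunk in chunks)}

# One dict per non-empty line of output.
records = [parse_line(line) for line in output.splitlines() if line.strip()]
for record in records:
    print(record['param1'], record['param5'], record['param6'])
```

Keeping every param in the dict and picking out param1, param5 and param6 afterwards avoids hard-coding positions, at the cost of parsing fields that are never used.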