Is there a better way to do this re pattern?

Childs

Lifer
My pattern:

Code:
import re

string = "Encoded: 1920 x 1080 px | Display: 1920 x 1080 px | Square | Top First | 29.97 fps"
pattern = re.compile(r'''(?![ \w]+: ) ([\w \.]+) ''')
print(pattern.findall(string))

Which produces:

Code:
['1920 x 1080 px', '1920 x 1080 px', 'Square', 'Top First', '29.97']

Which is pretty much what I want. I just want the values, delimited by a pipe. When I tried to use '|' as a delimiter it would pick up everything except the last item, probably because there was no terminating delimiter. Since I couldn't use a delimiter, this feels hacky. Is there a better way to do this? Ideally I'd strip off that ' px' as well, but doing that created match objects with one or more null elements.
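For what it's worth, the missing last item can be reproduced with a small sketch. The patterns and the variable names `with_pipe_only` and `with_pipe_or_end` below are illustrative assumptions, not the pattern that was originally tried: a pattern that requires a trailing '|' can never match the final field, while one that accepts either a pipe or the end of the string picks it up.

```python
import re

string = "Encoded: 1920 x 1080 px | Display: 1920 x 1080 px | Square | Top First | 29.97 fps"

# Requiring a trailing pipe: the last field has no pipe after it, so it is skipped.
with_pipe_only = re.findall(r'\s*([^|]+?)\s*\|', string)

# Accepting a pipe OR end of string as the terminator keeps the last field.
with_pipe_or_end = re.findall(r'\s*([^|]+?)\s*(?:\||$)', string)

print(len(with_pipe_only))    # 4 fields
print(len(with_pipe_or_end))  # 5 fields
```

This only separates the fields; the 'Encoded:' labels and the units are still attached at this point.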
 
Well, there's split, but split doesn't strip each entry.

So I'd say the clean way to do it is with split and a list comprehension:

Code:
import re

string = "Encoded: 1920 x 1080 px | Display: 1920 x 1080 px | Square | Top First | 29.97 fps"
l = string.split('|')
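# Inner sub strips trailing lowercase words (units) and spaces; outer sub strips the "label:" prefix.
print([re.sub(r'^([^:]*:)? *', '', re.sub(r'( [a-z]+)* *$', '', x)) for x in l])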

Or I can do it all in one ugly regex:

Code:
import re

string = "Encoded: 1920 x 1080 px | Display: 1920 x 1080 px | Square | Top First | 29.97 fps"
pattern = re.compile(r'''(?![ \w]+: ) *([A-Z0-9](?:[\w\.]+ (?:[a-z] )?[A-Z0-9])*[\w\.]+) ''')
print(pattern.findall(string))
 
Yeah, initially I did it with split, then subbed out the stuff I didn't want, but then I thought this would be good regex practice. I use it for one thing, then don't for a few months and forget it all. Thanks. I'll play with your example and figure out what's going on.
 
really wish i knew regex better than i do. pretty much every time i need to do it i google the regex cheat sheet, and i still feel lost lol. just haven't had to do too much of it.
 
Yeah, that's where I'm at, but for some of the trickier things the cheat sheet doesn't quite cut it.

What is your ideal output? Do you just want to get rid of the "title:" pieces?

Ideal output is

Code:
['1920 x 1080', '1920 x 1080', 'Square', 'Top First', '29.97']

Just want the values between the pipes without anything else. So remove the 'title:', ' px', and ' fps' pieces.
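One way to get that exact list from a single findall is sketched below. It assumes the only units to drop are 'px' and 'fps' and that labels look like 'Word:'; both are guesses based on the sample line, and the name `VALUE` is just for illustration.

```python
import re

string = "Encoded: 1920 x 1080 px | Display: 1920 x 1080 px | Square | Top First | 29.97 fps"

# re.VERBOSE lets the pattern be spread out and commented;
# whitespace inside the pattern is ignored, so \s is used for real spaces.
VALUE = re.compile(r'''
    \s*                   # leading whitespace after a pipe
    (?:\w+:\s*)?          # optional "Label:" prefix
    ([^|]+?)              # the value itself, as short as possible
    (?:\s+(?:px|fps))?    # optional trailing unit (assumed: only px and fps)
    \s*(?:\||$)           # field ends at a pipe or at end of string
''', re.VERBOSE)

values = VALUE.findall(string)
print(values)
```

The lazy `+?` on the value group is what lets the optional unit and the terminator claim the rest of the field.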
 
Well I guess it comes down to optimization. Do you do this a billion times and care that the regex might take 4x longer to run to get exactly what you want?

As "Ken g6" showed, you can always process the line assuming you know the line will be in the same format you showed:
Code:
r'^\w+:\s*(\d+.+)\s+px.+:\s+(\d+.+)\s+px\s+\|\s+(\S.*)\s+\|\s+(\S.*)\s+\|\s+(\S.*)\s+fps'
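Applied with re.match, a whole-line pattern along these lines could look like the sketch below. It assumes every line has exactly this five-field layout; the name `LINE` and the lazy quantifiers are my own choices. Note that a literal pipe has to be written as `\|`, since a bare `|` means alternation.

```python
import re

string = "Encoded: 1920 x 1080 px | Display: 1920 x 1080 px | Square | Top First | 29.97 fps"

# One capture group per field; the pattern is split across adjacent
# raw strings (one per field) purely for readability.
LINE = re.compile(
    r'^\w+:\s*(\d+.+?)\s+px\s+\|'     # "Encoded: <w x h> px |"
    r'\s+\w+:\s*(\d+.+?)\s+px\s+\|'   # "Display: <w x h> px |"
    r'\s+(\S.*?)\s+\|'                # pixel aspect, e.g. "Square"
    r'\s+(\S.*?)\s+\|'                # field order, e.g. "Top First"
    r'\s+(\S.*?)\s+fps$'              # frame rate
)

m = LINE.match(string)
# If the line does not have the assumed layout, fall back to an empty list.
values = list(m.groups()) if m else []
print(values)
```

The trade-off is that this fails outright on any line that deviates from the layout, whereas the split-based approach degrades more gracefully.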
 
I'm actually fine with the way I originally did it. It's easy enough to just strip off what I don't want when I assign those values to variables. I was just curious if there was a more efficient way of doing it.
 