
Easy question for a regex expert

DT4K

I need an expression that will match anything that falls within brackets in a string.
I'm using the Regex.Replace function in .Net, which is straightforward enough. I just pass the string, the pattern to search for, and the replacement string to replace everything that matches the pattern.
The trouble I'm having is trying to get the right regex pattern.

For example, I have a string that looks like:

bob397[C242], john235[B928]

and I want to turn it into this:

bob397, john235

The first pattern I came up with was \[.*\]

This works fine if there is only one occurrence of the bracketed values in the string. But in the above example, it replaces everything from the first opening bracket to the last closing bracket, like this:
bob397
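Here is a minimal Python sketch of the behaviour described above. Python's `re.sub` stands in for .NET's `Regex.Replace`; the greedy `.*` works the same way in both for this input.

```python
import re

s = "bob397[C242], john235[B928]"

# .* is greedy: it runs to the end of the string, then backtracks only
# as far as the LAST "]", so a single match swallows everything in between.
greedy = re.sub(r"\[.*\]", "", s)
print(greedy)  # bob397
```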

Can anyone give me a pattern that will match anything that is within brackets?

Thanks
 
Maybe there's a better way to do it, but I think this works if anyone is interested:

\[[^[]*\]

The [^[] in the middle is what prevents the problem I had with it matching everything from the first opening bracket all the way to the last closing bracket. The pattern matches strings that start with an opening bracket and end with a closing bracket, but don't have another opening bracket inside them.
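The same check with the negated-class pattern, again sketched in Python's `re` rather than .NET. The `[` inside the class is escaped here; Python treats that the same as the unescaped `[^[]` from the post.

```python
import re

s = "bob397[C242], john235[B928]"

# [^\[]* matches any run of characters containing no "[", so a match
# can never extend past the start of the next bracketed value.
no_open_bracket = re.sub(r"\[[^\[]*\]", "", s)
print(no_open_bracket)  # bob397, john235
```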

If everything within the brackets was guaranteed to be alphanumeric, this would also work:
\[([a-z]|[A-Z]|[0-9])*\]
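A quick Python check of the alphanumeric-only pattern. It also shows the trade-off: a made-up value such as `[C-242]` contains a hyphen, so it is no longer removed.

```python
import re

# The alphanumeric-only pattern from the post.
alnum = r"\[([a-z]|[A-Z]|[0-9])*\]"

clean = re.sub(alnum, "", "bob397[C242], john235[B928]")
# Hypothetical input: "-" is not in any of the classes, so nothing matches.
untouched = re.sub(alnum, "", "bob397[C-242]")

print(clean)      # bob397, john235
print(untouched)  # bob397[C-242]
```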
 
A few comments. I don't know about .Net regex specifically, but you should have some metasymbols you can use, like \d for digits (like [0-9]), \w for word characters (alphanumeric and _), and so on.

Also, instead of [a-z]|[A-Z]|[0-9] you can simply put [a-zA-Z0-9]. Inside [] you define a class of characters to match, so [sox] is the same as [xos] or [oxs]: it matches any one of the characters in the class.
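In Python's `re`, all four spellings below give the same result on the example string. This is only a sketch; note that `\w` also accepts `_` (and, in Python 3, other Unicode letters and digits), so it is slightly broader than `[a-zA-Z0-9]`.

```python
import re

s = "bob397[C242], john235[B928]"

patterns = [
    r"\[([a-z]|[A-Z]|[0-9])*\]",  # alternation of three classes
    r"\[[a-zA-Z0-9]*\]",          # one class listing all three ranges
    r"\[[0-9A-Za-z]*\]",          # order inside a class does not matter
    r"\[\w*\]",                   # \w: word characters (letters, digits, _)
]

results = [re.sub(p, "", s) for p in patterns]
print(results)  # every entry is 'bob397, john235'
```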

But really what you were struggling with is the greedy behaviour of regex. There should be a non-greedy modifier you can use, such as: \[.*?\]

Note the ? after the .*:
*? matches 0 or more minimally
+? matches 1 or more minimally
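A small Python sketch of the non-greedy fix, plus the difference between `*?` and `+?`. The second input, `a[]b[c]`, is made up for illustration.

```python
import re

s = "bob397[C242], john235[B928]"

# .*? is lazy: it grows one character at a time and stops at the
# FIRST "]" that lets the rest of the pattern match.
lazy = re.sub(r"\[.*?\]", "", s)
print(lazy)  # bob397, john235

# *? allows zero characters, so an empty "[]" is matched on its own.
# +? needs at least one, so it consumes the "]" of "[]" and keeps going
# until it reaches another "]".
star_matches = re.findall(r"\[.*?\]", "a[]b[c]")
plus_matches = re.findall(r"\[.+?\]", "a[]b[c]")
print(star_matches)  # ['[]', '[c]']
print(plus_matches)  # ['[]b[c]']
```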

Check the .Net documentation on regex, it should explain the metacharacters and quantifiers.
 
Onund's right. Short answer: you need to make the wildcard non-greedy. It's greedy by default in many implementations.
 
I would do (using non-greedy quantifiers, as Onund mentioned):

$string =~ s/\[.*?\]//g;

If your programming language uses a different syntax (maybe a function like regexp_replace), it might look more like this:

regexp_replace(string, '\[.*?\]', '', 'g');
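The /g in the Perl line, like the g flag in the regexp_replace call, means "replace every match" rather than only the first one. The Python sketch below makes that difference visible; `regexp_replace` here is a hypothetical helper written for this example (not the database function of that name), and it only understands the `g` flag.

```python
import re


def regexp_replace(string: str, pattern: str, replacement: str, flags: str = "") -> str:
    """Hypothetical helper loosely modeled on a SQL-style regexp_replace.

    Replaces every match when flags contains "g" (like Perl's s///g),
    otherwise only the first match. No other flags are supported.
    """
    count = 0 if "g" in flags else 1  # re.sub treats count=0 as "replace all"
    return re.sub(pattern, replacement, string, count=count)


s = "bob397[C242], john235[B928]"
print(regexp_replace(s, r"\[.*?\]", "", "g"))  # bob397, john235
print(regexp_replace(s, r"\[.*?\]", ""))       # bob397, john235[B928]
```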
 
Originally posted by: DT4K
\[[^[]*\]

This breaks on the example: "[abc] ]" because there is no extra opening bracket to block the machine from continuing.
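Sketched in Python, the failure on `[abc] ]` looks like this: `]` is not excluded by the class, so the greedy `[^\[]*` runs on to the last `]`.

```python
import re

# The earlier pattern: no "[" allowed between the brackets, but "]" is.
no_open = r"\[[^\[]*\]"

# The class happily consumes "abc] ", so the match ends at the LAST "]".
matches = re.findall(no_open, "[abc] ]")
print(matches)  # ['[abc] ]']
```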

Better would be \[[^][]*\] as this prevents any brackets from appearing in the match text between the brackets. I'm not sure what the intended behavior is in the case of "[ab [c]". My regex will only match "[c]".
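The same inputs with both brackets excluded from the class, as a Python sketch. Both brackets are escaped inside the class here, which avoids depending on where `]` sits in the class.

```python
import re

# Neither "[" nor "]" may appear between the outer brackets.
no_brackets = r"\[[^\[\]]*\]"

first = re.findall(no_brackets, "[abc] ]")   # stops at the first "]"
nested = re.findall(no_brackets, "[ab [c]")  # only the inner pair matches
cleaned = re.sub(no_brackets, "", "bob397[C242], john235[B928]")

print(first)    # ['[abc]']
print(nested)   # ['[c]']
print(cleaned)  # bob397, john235
```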

Syntax may depend on your implementation. Emacs supports the above regex, where a ] that comes first in a bracket expression (right after the [ or [^) matches a literal ]. YMMV.
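Python's `re` module follows the same convention as Emacs here (its documentation says a `]` at the beginning of a set is literal), so the post's `[^][]` spelling can be checked against a fully escaped one. This is a sketch for Python only; other engines may parse `[^][]` differently.

```python
import re

# "]" right after "[^" is taken literally, then "[" is a literal too,
# and the following "]" closes the set.
leading_bracket = re.compile(r"\[[^][]*\]")
escaped = re.compile(r"\[[^\[\]]*\]")

samples = ["bob397[C242], john235[B928]", "[abc] ]", "[ab [c]"]
same = [leading_bracket.findall(t) == escaped.findall(t) for t in samples]
print(same)  # [True, True, True]
```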
 
Originally posted by: Onund
But really what you were struggling with is the greedy behaviour of regex. There should be a non-greedy modifier you can use, such as: \[.*?\]

Thanks.
I hadn't really touched regular expressions since college, so I had no idea what greedy and non-greedy meant. That makes sense now.

Originally posted by: dinkumthinkum
This breaks on the example: "[abc] ]" because there is no extra opening bracket to block the machine from continuing.

Yep, you're right. It worked in my case because my data always has the same pattern. It's a comma-separated list where each item has a batch number followed by the material type in brackets. I just needed to remove the material types from the list, so no worries about brackets within brackets.
 