Regex question

Thraxen

Diamond Member
Dec 3, 2001
4,683
1
81
This is likely very easy, but I don't use regex very often and I'm obviously unclear on how to use wildcards in regex.

How do I trim everything starting with a specific match? In other words, say I have a string and in that string I want to trim everything starting from the first instance of ABCXYZ (including ABCXYZ). And there is no specific/consistent number of characters to the right or left of ABCXYZ.

Thanks!
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
So you have a string like "JKDJFKDJFABCXYZJKDJFKDJF" and you want to match on everything from ABCXYZ and earlier?
 

Thraxen

Well, given your example of "JKDJFKDJFABCXYZJKDJFKDJF", I was wanting an expression that would allow me to return just "JKDJFKDJF". I would do that by deleting everything from "ABCXYZ" onward (e.g. REReplace with "").
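For the simple case described above (drop the marker and everything after it), here is a minimal sketch in Python rather than ColdFusion. The function name `trim_from_marker` is made up for illustration; the same pattern idea (literal marker followed by `.*`) should carry over to REReplace, but that has not been tested here.

```python
import re

def trim_from_marker(text, marker="ABCXYZ"):
    """Return text with the first occurrence of marker, and everything after it, removed."""
    # re.escape treats the marker as literal text; ".*" with DOTALL then
    # consumes the rest of the string, including any newlines.
    pattern = re.escape(marker) + r".*"
    return re.sub(pattern, "", text, count=1, flags=re.DOTALL)

print(trim_from_marker("JKDJFKDJFABCXYZJKDJFKDJF"))  # JKDJFKDJF
```

If the marker does not occur at all, the string comes back unchanged.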

But now that I've examined the string further, it appears I need to do something else. To be more specific, I'm trying to extract the SMTP e-mail addresses from the proxyAddresses attribute field in Active Directory. Here is what a typical proxyAddresses value looks like:

SMTP:user@exampledomain.com;smtp:user1@exampledomain.com;X400:c=US\;a= \;p=First Organizati\;o=Exchange\;s=lastname\;g=firstname\;i=middleinitial\;

As you can see there are two addresses listed, but some users will have more. I want to be able to extract those from the results of an LDAP query, but I don't want any of the rest of the information... in this case the X400 onward.

But I just noticed that the order is different for some users... like this:

X400:c=US\;a= \;p=First Organizati\;o=Exchange\;s=lastname\;g=firstname\;i=middleinitial\;;SMTP:user@exampledomain.com;smtp:user1@exampledomain.com

So given that, I clearly can't simply trim everything starting with X400. So now I need something that will match and delete everything EXCEPT the SMTP addresses (I also don't need the "SMTP:" prefix). Or do the reverse: match the e-mail addresses and delete everything else. But I really have no idea how to do that. From doing some Googling I found this expression, which is supposed to be able to match most e-mail addresses:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

But I'm not sure how to apply that to my situation.
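One way to apply that expression is to skip the deleting altogether and collect every match instead. A rough sketch in Python follows; `find_emails` is a hypothetical helper name, and the case-insensitive flag is an addition, needed because the pattern only lists uppercase letters while the sample addresses are lowercase. Note that this picks up anything that looks like an address, not only entries prefixed with SMTP:, and the `{2,4}` limit will miss longer top-level domains.

```python
import re

# The pattern quoted above, compiled case-insensitively.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def find_emails(proxy_addresses):
    """Return every e-mail-looking substring, in order of appearance."""
    return EMAIL.findall(proxy_addresses)

# Sample value with the X400 part first, as in the second ordering above.
sample = (r"X400:c=US\;a= \;p=First Organizati\;o=Exchange\;s=lastname\;"
          r"g=firstname\;i=middleinitial\;;"
          r"SMTP:user@exampledomain.com;smtp:user1@exampledomain.com")

print(find_emails(sample))
# ['user@exampledomain.com', 'user1@exampledomain.com']
```

Because the matches are collected rather than trimmed around, the order of the X400 and SMTP parts does not matter.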
 

Nothinman

SMTP:(.*);X400:c=US;a= ;p=First Organizati;o=Exchange;s=lastname;g=firstname;i=middleinitial;

I believe that should return everything between SMTP: and X400, at least it did here in perl.
 

Thraxen

I guess I should have indicated that "lastname", "firstname", and "middleinitial" are all variables in my example. In each user's proxyAddresses attribute field those are actually their real names. Also, as I stated above, some of the fields are flipped for some reason where the X400 is first and the actual addresses are at the end.
 

Nothinman

Yea but if you know even a little bit of regexp you can add in whatever you need to match them too. Hell just put like "s=.*;g=.*;i=.*" and it'll match anything. The swapping of SMTP and X400 is a little more complicated though.
 

Onund

Senior member
Jul 19, 2007
287
0
0
You could split the line into an array on ; then process each element in a loop. If an element contains smtp at the start then it's an email address. This way you don't have to worry how many addresses there are.
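The split-and-loop idea could look something like this in Python. This is only a sketch: `smtp_addresses` is an invented name, and it assumes the whole attribute arrives as one semicolon-separated string, as in the samples earlier in the thread.

```python
def smtp_addresses(proxy_addresses):
    """Split on ';' and keep the elements that start with 'smtp:' (any case)."""
    prefix = "smtp:"
    addresses = []
    for element in proxy_addresses.split(";"):
        element = element.strip()
        # The X400 pieces (including the ones cut at escaped semicolons)
        # never start with "smtp:", so they are simply skipped.
        if element.lower().startswith(prefix):
            addresses.append(element[len(prefix):])
    return addresses

value = r"X400:c=US\;a= \;s=lastname\;;SMTP:user@exampledomain.com;smtp:user1@exampledomain.com"
print(smtp_addresses(value))
# ['user@exampledomain.com', 'user1@exampledomain.com']
```

No e-mail regex is needed here, which sidesteps the problem of trying to describe every valid address.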

Alternatively, if your programming language allows it, you can match case-insensitively on something like:
smtp:([^;]*)
do some processing on what's returned, then find the next instance of the match. I believe Java allows something like this.
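The match-then-find-the-next-match loop might be sketched as follows in Python, where `re.finditer` hands back one match at a time. The pattern stops at the next semicolon instead of using a greedy `.*`, which would otherwise run on to the last semicolon in the string; `iter_smtp` is a hypothetical name.

```python
import re

# "smtp:" in any case, then everything up to (not including) the next ';'.
SMTP_ENTRY = re.compile(r"smtp:([^;]*)", re.IGNORECASE)

def iter_smtp(proxy_addresses):
    """Yield the address part of each smtp: entry, one match at a time."""
    for match in SMTP_ENTRY.finditer(proxy_addresses):
        # Per-match processing would go here; this sketch just yields
        # the captured group.
        yield match.group(1)

for address in iter_smtp("SMTP:user@exampledomain.com;smtp:user1@exampledomain.com;X400:c=US"):
    print(address)
```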

The difficulty with writing one regex that will match all emails is that there are about a million variations of email addresses. I remember reading a thread on some other message board a while ago where there were about 15 suggestions of regex patterns to match all emails and there were 16 examples of valid emails that failed to match the patterns.
 

Skeeedunt

Platinum Member
Oct 7, 2005
2,777
3
76
Do you specifically need to remove all non-SMTP sections using one regex, or can you just grab the SMTP parts and reassemble? Onund's suggestion would be good for that. This seems to work in perl as well:

@emails = ($string =~ /(smtp:[^;]*;*)/gi);

print $emails[0] . $emails[1]; # or whatever, @emails should have all the smtp:*; parts in it.
 

statik213

Golden Member
Oct 31, 2004
1,654
0
0
Like others have pointed out, you should try to "extract" the portion of the string you want. In most languages you can match on an expression like: SMTP:(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) and the resulting match will be an array. The [0] index would be the complete match (i.e. SMTP:foo@bar.com) and [1] would be the contents of the first parenthesized group (foo@bar.com).
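To illustrate the full-match versus group distinction, here is a small Python sketch. In Python the result is a match object rather than an array, but index 0 and index 1 mean the same thing. `first_smtp` is an invented helper, and the case-insensitive flag is an assumption added so that lowercase addresses and the lowercase smtp: prefix also match.

```python
import re

# Prefix outside the parentheses, address inside the capturing group.
PATTERN = re.compile(
    r"SMTP:(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", re.IGNORECASE
)

def first_smtp(text):
    """Return (complete match, first group) for the first SMTP entry, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(0), match.group(1)

print(first_smtp("X400:c=US;SMTP:foo@bar.com;"))
# ('SMTP:foo@bar.com', 'foo@bar.com')
```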