Requesting some perl expression help

chuck2002 · Jul 27, 2006

I am working on locking down the web sites some computers on my domain can access. The software (IEURLLock) I'm using is taking expressions via perl for URLs and I am lost. He says look at this web page for syntax, but that isn't helping.

http://www.pcre.org/

Here are his examples and an explanation of what's going on:

Locations that the user navigates to will get checked against the regular expressions in this list, permitting the navigation as soon as it finds a regular expression that matches the location. If none of the regular expressions match, IEURLLock blocks access to that location.

IEURLLock uses Perl-Compatible Regular Expressions through the PCRE library. More information on this library and how to construct regular expressions exists at http://www.pcre.org/ and http://gnuwin32.sourceforge.net/packages/pcre.htm.

Put descriptive names for each regular expression into the Value Name field and put each corresponding regular expression into the Value field.

Example:

Value Name: Microsoft

Value: ^http://(www\.)?microsoft\.com(/|$)

Case-Insensitive Example:

Value Name: Sourceforge.net Project Web Sites

Value: (?i)^http://(\w)+\.(sf|sourceforge)\.net(/|$)
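The allow-list behaviour described above (permit on the first matching expression, block if none match) can be sketched roughly like this. This is only an illustration in Python's re module, which is close to PCRE for these patterns but is not the PCRE library itself. The dictionary layout and the is_allowed name are made up for the example and are not how IEURLLock stores or checks its list.

```python
import re

# The two patterns are copied from the examples above. Storing them in a
# dict keyed by Value Name is an assumption made for this sketch only.
ALLOW_LIST = {
    "Microsoft": r"^http://(www\.)?microsoft\.com(/|$)",
    "Sourceforge.net Project Web Sites": r"(?i)^http://(\w)+\.(sf|sourceforge)\.net(/|$)",
}

def is_allowed(url, allow_list=ALLOW_LIST):
    """Permit the URL as soon as any pattern matches; block otherwise."""
    return any(re.search(pattern, url) for pattern in allow_list.values())

for url in ("http://www.microsoft.com/",
            "http://something.microsoft.com",
            "HTTP://Foo.SF.NET/"):
    print(url, "->", "allow" if is_allowed(url) else "block")
```

With only these two entries, http://something.microsoft.com is blocked, because the Microsoft pattern accepts nothing but an optional "www." in front of the domain.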

Can someone expand his examples to include urls like http://something.microsoft.com as an example for me please?

Thank you.
-Chuck

Nothinman · Jul 27, 2006

^http://.*\.microsoft\.com(/|$)

. matches any single character (which is why you have to escape it with \ when you want a literal period) and * tells it to match the previous expression zero or more times, as many as possible, so .* will match any string of any size.
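A quick way to see the difference between a bare dot, an escaped dot, and .* is to try them on a few strings. The sketch below uses Python's re module as a stand-in for PCRE (the two agree on these constructs); the matches helper is just a name invented for the example.

```python
import re

def matches(pattern, text):
    """True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# Unescaped dot: any single character will do, so "X" is accepted.
print(matches(r"microsoft.com", "microsoftXcom"))    # True
# Escaped dot: only a literal period is accepted.
print(matches(r"microsoft\.com", "microsoftXcom"))   # False
# ".*" soaks up any run of characters, dots included.
print(matches(r"^http://.*\.microsoft\.com(/|$)", "http://a.b.microsoft.com"))  # True
```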

If you're going to be using regexps you should probably get one of the O'Reilly books on them, it's a pretty complicated subject.

chuck2002 · Jul 27, 2006

That works like a champ! Thank you very much Nothinman.

chuck2002 · Jul 27, 2006

Actually, I spoke too soon. I had the url filter app disabled.
It still is not working. It blocks everything to microsoft.com now.
Thanks.

Nothinman · Jul 27, 2006

Not sure what to tell you, the regexp works fine here.

$ cat blah.txt
http://www.microsoft.com
http://something.microsoft.com
http://www.symantec.com

$ egrep "^http://(www\.)?microsoft\.com(/|$)" blah.txt
http://www.microsoft.com

$ egrep "^http://.*\.microsoft\.com(/|$)" blah.txt
http://www.microsoft.com
http://something.microsoft.com

chuck2002 · Jul 27, 2006

Eh. I don't know either. I guess I will try some other software.
Thanks again.

cleverhandle · Jul 27, 2006

Originally posted by: chuck2002
It still is not working. It blocks everything to microsoft.com now.

You mean to any server in the microsoft.com domain, or to "http://microsoft.com" itself? You've got to be specific when you're talking regular expressions. ^http://.*\.microsoft\.com(/|$) will match (in layman's terms) http://*.microsoft.com, but not http://microsoft.com. If you also want to match (allow) http://microsoft.com, change the expression to ^http://.*\.?microsoft\.com(/|$) - the added ? makes the "\." optional. That expression could be improved somewhat, because it will also allow http://ilovemicrosoft.com, which doesn't seem to be your intention. I leave the fix for that as an exercise for the reader.
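For anyone who wants to check the ilovemicrosoft.com point, here is a rough sketch in Python's re module (close to PCRE for these patterns). LOOSE is the optional-dot expression from this post, written with the dot before "com" escaped. STRICT is one possible tightening, not the only one and not something taken from the IEURLLock docs: it only lets whole dot-terminated labels appear in front of the domain.

```python
import re

# The optional-dot pattern from the post (dot before "com" escaped).
LOOSE = r"^http://.*\.?microsoft\.com(/|$)"
# One possible fix (an assumption for illustration): zero or more complete
# "label." pieces made of word characters or hyphens, then the domain.
STRICT = r"^http://([\w-]+\.)*microsoft\.com(/|$)"

def matches(pattern, url):
    return re.search(pattern, url) is not None

for url in ("http://microsoft.com",
            "http://www.something.microsoft.com",
            "http://ilovemicrosoft.com",
            "http://evil.com/?x=.microsoft.com"):
    print(url, "loose:", matches(LOOSE, url), "strict:", matches(STRICT, url))
```

The last URL is there because .* is happy to run across slashes, so the loose pattern also accepts a different host that merely mentions the domain later in the address.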

Originally posted by: chuck2002
Eh. I don't know either. I guess I will try some other software.

That's right... when something is difficult, give up. Come on - regular expressions are not supposed to be easy, but they're really useful once you learn them. You can do it.

edit: Reread the OP and fixed up my use of "match" and "allow". From the OP, if the site request matches the expression, it's allowed. If it doesn't match, it's denied.

chuck2002 · Jul 27, 2006

Ahh. I will try that.
I am not just giving up on the first hint of trouble. I have been trying to get this to work for 2 days now. I'm just really frustrated at this point.....

chuck2002 · Jul 27, 2006

Nope. That didn't work either. Now you feel my frustration....

cleverhandle · Jul 27, 2006

What precisely are you trying to allow and deny? Give some examples. Also, double check your typing - it's really easy to enter a regular expression and miss a period or put a slash in the wrong direction. Regexp's are unforgiving biatches.

chuck2002 · Jul 27, 2006

I am using microsoft as an example, but I am wanting to allow all urls for a given site.
I want to allow:
http://www.microsoft.com/something
http://microsoft.com/something
http://something.microsoft.com
http://www.something.microsoft.com

Also https:// urls as well, but I figured that isn't anything more than putting https in where http is.
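Rather than keeping a second entry for https, the scheme can be made to accept both by writing https? (the ? makes the "s" optional). The sketch below checks that idea against the four URLs listed above using Python's re module as a stand-in for PCRE. The host part of the pattern is an assumption chosen for the example, not an expression taken from the software's documentation.

```python
import re

# "https?" accepts http or https. The "([\w-]+\.)*" host prefix is an
# illustrative choice: any number of subdomain labels, including none.
PATTERN = r"^https?://([\w-]+\.)*microsoft\.com(/|$)"

WANTED = [
    "http://www.microsoft.com/something",
    "http://microsoft.com/something",
    "http://something.microsoft.com",
    "http://www.something.microsoft.com",
]

def allowed(url):
    return re.search(PATTERN, url) is not None

for url in WANTED:
    secure = url.replace("http://", "https://", 1)
    print(url, allowed(url), "|", secure, allowed(secure))
```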

I am using this software to block all web sites and then allow only a handful.
Since we started working with microsoft as the example, I have been copying and pasting the text examples as given and testing it with microsoft urls.

cleverhandle · Jul 27, 2006

As I read the description in the OP, the expression I gave should work...

^http://.*\.?microsoft\.com(/|$)

...though, as mentioned, it can be improved on.

It's also possible that the filter is looking to match the entire site request (i.e. the site plus everything after the /), though it doesn't look that way from the examples. You might try...

^http://.*\.?microsoft\.com(/.*|$)

...as well. Try to see what the difference is from what's been said so far.
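The difference only shows up if the filter insists that the expression cover the whole URL. Whether IEURLLock does that is not known from this thread. The sketch below just illustrates the two behaviours with Python's re module, where re.search looks for a match anywhere and re.fullmatch requires the pattern to consume the entire string. Both patterns are written with the dot before "com" escaped.

```python
import re

PREFIX = r"^http://.*\.?microsoft\.com(/|$)"    # stops at the first "/"
WHOLE = r"^http://.*\.?microsoft\.com(/.*|$)"   # also covers the path

url = "http://www.microsoft.com/something"

# Unanchored search: both patterns accept the URL.
print(re.search(PREFIX, url) is not None)     # True
print(re.search(WHOLE, url) is not None)      # True

# Whole-string matching: only the second pattern still accepts it.
print(re.fullmatch(PREFIX, url) is not None)  # False
print(re.fullmatch(WHOLE, url) is not None)   # True
```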

If neither of those work, then the problem isn't in the regexp. Maybe you need to restart the program or do something else to get it to reread its config. Check the docs.

chuck2002 · Jul 28, 2006

That didn't work either. I appreciate your help. I think I will change gears and try a web proxy to accomplish the blocking goal.
