Detecting URLs in a string?

CurseTheSky

Diamond Member
Hi guys,

I need to find a way to detect and remove all URLs from a string using PHP. The string contents are unknown, and the URLs may or may not be "correct" - for example, a URL may or may not specify the scheme (http://, https://, ftp://, mailto:, etc.), a subdomain (www), and could have any valid TLD (.com, .org, .gov, etc.). Though it's possible, it's highly unlikely that any URLs will contain ports (:8080), queries (?var1=a&var2=b), or fragments (#top).

What's the best way to tackle this? I'm thinking a regular expression using preg_match, but unfortunately I'm not good with regular expressions. I could use stripos and a massive nested if as a brute force approach, but that would be a last resort.

Examples of URLs that would need to be removed include:
http://www.google.com
www.google.com
google.com
ftp://examplewebsite.net
https://www.examplewebsite.net/dir/dir2/pagename.html
etc.

The best example I found so far was this: http://www.reddit.com/r/PHP/comments/75dzi/how_do_i_find_a_url_in_a_string_of_input_using/c05q1so. I wrote a recursive function around it that removes URLs from the input string and returns the cleaned string at the end, but unfortunately it only matches a "correct" URL, one that includes the scheme, etc.

Any ideas?

Thanks. 🙂
 
What about:
http://subdomain.domain.tld/dir/rea...ouldintheoryjustkeepongoingandgoingwithoutend
?

The problem with removing URLs is identifying them. Any string is a URL. E.g., 'localhost' is a URL in the current (implied) domain. So is 'steve', though usually 'steve' is an invalid URL.

I guess what I'm driving at is that you need to be more concrete about which URLs you want to remove.
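To show what "more concrete" could look like, here is a rough sketch, not a definitive answer. It assumes a URL is: an optional http, https, or ftp scheme, one or more dot-separated host labels, a TLD from a short allowlist, then an optional port and path. The function name removeUrls and the TLD list are made up for illustration, and mailto: links are deliberately left out.

```php
<?php
// Hypothetical helper: strips anything that looks like a URL under the
// narrow definition described above. The TLD allowlist is an assumption;
// extend it to whatever you actually need to catch.
function removeUrls(string $text, array $tlds = ['com', 'org', 'net', 'gov', 'edu']): string
{
    $tldPattern = implode('|', array_map('preg_quote', $tlds));

    $pattern = '~
        (?<![\w@.-])                             # do not start mid-word or inside an e-mail address
        (?:(?:https?|ftp)://)?                   # optional scheme
        (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+   # host labels, each followed by a dot
        (?:' . $tldPattern . ')                  # TLD from the allowlist
        (?![a-z0-9-])                            # the TLD must end here
        (?::\d{1,5})?                            # optional port
        (?:/[^\s<>"\']*)?                        # optional path, query, fragment
    ~ix';

    // preg_replace returns null on failure; fall back to the input unchanged.
    return preg_replace($pattern, '', $text) ?? $text;
}

echo removeUrls('Visit http://www.google.com today'), PHP_EOL;
```

Under this definition, bare words such as 'localhost' or 'steve' are left alone because they have no allowlisted TLD. Known gaps: a host like example.com.au only loses the example.com part, and punctuation directly after a path gets swallowed along with it.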
 
What is this for? Just about anything can be a valid internet address: an IP address, or even a plain number. For example, give this link a try: http://3493972330 (not dangerous, I promise; Firefox doesn't like it).
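For anyone wondering why a bare number works as a host: some browsers treat it as a 32-bit IPv4 address written in decimal. A small sketch using PHP's long2ip shows the conversion. The function name decimalHostToIp is made up for illustration, and this assumes 64-bit PHP so the value fits in an int.

```php
<?php
// Hypothetical helper: interprets an all-digit host as a 32-bit IPv4
// address and returns it in dotted form, or null if it does not qualify.
// Assumes 64-bit PHP, where values up to 4294967295 fit in an int.
function decimalHostToIp(string $host): ?string
{
    if (!ctype_digit($host)) {
        return null; // not purely decimal digits
    }
    $value = (int) $host;
    if ($value > 0xFFFFFFFF) {
        return null; // larger than any IPv4 address
    }
    return long2ip($value);
}

echo decimalHostToIp('3493972330'), PHP_EOL; // prints 208.65.201.106
```

So a URL remover that only looks for dots and TLDs would miss this form entirely.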
 