CurseTheSky
Diamond Member
Hi guys,
I need to find a way to detect and remove all URLs from a string using PHP. The string contents are unknown, and the URLs may or may not be "correct" - for example, a URL may or may not specify the scheme (http://, https://, ftp://, mailto:, etc.), a subdomain (www), and could have any valid TLD (.com, .org, .gov, etc.). Though it's possible, it's highly unlikely that any URLs will contain ports (:8080), queries (?var1=a&var2=b), or fragments (#top).
What's the best way to tackle this? I'm thinking a regular expression using preg_match, but unfortunately I'm not good with regular expressions. I could use stripos and a massive nested if as a brute force approach, but that would be a last resort.
Examples of URLs that would need to be removed include:
http://www.google.com
www.google.com
google.com
ftp://examplewebsite.net
https://www.examplewebsite.net/dir/dir2/pagename.html
etc.
The best example I found so far was this: http://www.reddit.com/r/PHP/comments/75dzi/how_do_i_find_a_url_in_a_string_of_input_using/c05q1so. I wrote a recursive function around it that removes URLs from the input string and returns the cleaned string at the end, but unfortunately it only matches a "correct" URL, i.e. one that includes the scheme.
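To make the goal concrete, here is a rough sketch of the kind of thing I'm after. It assumes a single preg_replace call with a loose pattern where the scheme is optional. This is not the code from the reddit comment; the function name remove_urls and the pattern itself are just made up for illustration. It will almost certainly produce false positives (for example, a bare file name like "pagename.html" looks like a domain to it), and it doesn't check whether the TLD actually exists.

```php
<?php
// Hypothetical helper (name and pattern are illustrative assumptions):
// removes anything that loosely looks like a URL, with or without a scheme.
function remove_urls(string $text): string
{
    $pattern = '~
        (?<![\w@])                                   # do not start mid-word or right after an @
        (?:
            (?:(?:https?|ftp)://|mailto:)[^\s<>"\']+ # explicit scheme: take everything up to whitespace
          |
            (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+   # one or more dot-terminated labels (www., google.)
            [a-z]{2,}(?![\w@])                       # something TLD-shaped, not followed by more word chars
            (?::\d+)?                                # optional port
            (?:/[^\s<>"\']*)?                        # optional path, query, fragment
        )
        (?<![.,;:!?)])                               # leave trailing sentence punctuation in place
    ~ix';

    // preg_replace returns null on failure; fall back to the original text.
    $stripped = preg_replace($pattern, '', $text) ?? $text;

    // Collapse the runs of spaces left behind where URLs were removed.
    return trim(preg_replace('~ {2,}~', ' ', $stripped) ?? $stripped);
}

echo remove_urls('Visit http://www.google.com today'), "\n";
```

Since preg_replace already replaces every match in one pass, a recursive wrapper shouldn't be needed with this approach.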
Any ideas?
Thanks. 🙂