
Detecting URL within an HTML string

I am trying to detect URLs in a string so that I can hyperlink them.

I produced the following code:

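[code=php]
// Split on spaces and wrap every token that starts with http:// in a link.
$array = explode(' ', $string);
foreach ($array as $key => $val) {
    if (substr($val, 0, 7) == 'http://') {
        $array[$key] = '<a href="' . $val . '" target="_blank">' . $val . '</a>';
    }
}
// Rejoin the modified array (not the last loop value).
$string = implode(' ', $array);
[/code]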

This works fine until you factor in the various HTML tags placed within the string output from my CMS. If the link is next to, for example, a <p> or <br /> tag, then there is no space in between for the explode function to isolate just the link.

I have experimented with further exploding each array value around the '<' and '>' symbols, which seems like it should work (well, in my head, anyway), but it doesn't.

The code I now have is as follows…

[code=php]
$content2 = explode(' ', $content3);
foreach ($content2 as $key => $val) {
    $subcontent = explode('<', $val);
    foreach ($subcontent as $key => $val) {
        $subsubcontent = explode('>', $val);
        foreach ($subsubcontent as $key => $val) {
            if (substr($val, 0, 7) == 'http://') {
                $subsubcontent[$key] = '<a href="' . $val . '" target="_blank">' . $val . '</a>';
            }
        }
        $val = implode('x', $subsubcontent);
    }
    $val = implode('x', $subcontent);
}
$content = implode(' ', $content2);
[/code]

However, now it does not detect any of the links.

Is anyone able to suggest why this is not working, and perhaps demonstrate how I might be better off doing it? Or, of course, if there is a better method than this?
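
One likely reason the nested version finds nothing: each implode() result is assigned to the loop variable $val, which is only a copy, so nothing is ever written back into $content2. The inner loops also reuse $key and $val, and joining with 'x' would replace the angle brackets instead of restoring them. A minimal sketch of an alternative follows, assuming that anything between < and > can be treated as a tag. The function name linkify_tokens is hypothetical, and the approach keeps the original "starts with http://" test rather than doing full URL validation.

```php
<?php
// Hypothetical helper, not part of the original post.
function linkify_tokens(string $html): string
{
    // Split on whitespace runs and on anything that looks like an HTML tag,
    // keeping the delimiters so the string can be rebuilt unchanged.
    $parts = preg_split('/(\s+|<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // Only plain-text tokens that start with http:// are wrapped.
        if (substr($part, 0, 7) === 'http://') {
            $parts[$i] = '<a href="' . $part . '" target="_blank">' . $part . '</a>';
        }
    }

    // The delimiters are still in $parts, so a plain join restores the markup.
    return implode('', $parts);
}

echo linkify_tokens('<p>See http://example.com/page<br />for details.</p>');
```

Because tags are captured whole, a URL inside an existing attribute is left alone. Trailing punctuation directly after a URL would still end up inside the link, so this is a starting point rather than a complete solution.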

PHP

7 Comments

@aj_nsc (Mar 27, 2010): Sounds like it's time for you to learn regular expressions.

Here's what you're looking for:

http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/

@abimpson_google (author, Mar 27, 2010): Ah, I was aware that regular expressions might do the trick, but I've never really been able to get my head around them!

[code=php]$content = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>', $content);[/code]

It seems to work to an extent, but as the comments on that page suggest, it stops matching when it meets a character such as -, + or &.

I can see there is the _ part in the code: [\w/_\.]

But I wouldn't know how to add additional characters in the right format.

What would I do to add an additional character to search for within this function?

Thanks very much for the help!
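
In a regex character class, every character between the square brackets is an alternative, so extra characters can simply be added inside the same brackets. The sketch below assumes the pattern was meant to contain \w, \d, \? and \S (the forum software appears to have dropped the backslashes), and that -, +, & and % are the extra characters wanted in the path. The sample $content string is invented for illustration.

```php
<?php
// Sample input invented for illustration.
$content = 'Docs: http://example.com/a-b+c/d&e%20f.html end';

// Same overall shape as the snippet pattern, with extra characters added to
// the path class. The hyphen goes last so it is read literally, not as a range.
$pattern = '@(https?://([-\w.]+)+(:\d+)?(/([\w/_.+&%-]*(\?\S+)?)?)?)@';

$content = preg_replace($pattern, '<a href="$1" target="_blank">$1</a>', $content);

echo $content;
```

Inside a class, characters such as +, & and % need no escaping. The hyphen is the one to watch, which is why it sits at the end of the class here.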

@donatello (Mar 27, 2010): Bookmark for later

@abimpson_google (author, Mar 27, 2010): @donatello What?

OK, after some playing around I discovered that checking for an additional symbol isn't too bad, but there's still an issue with what I produced.

[code=php]@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*[\w/+\.]*[\w/\-\.]*[\w/&\.]*[\w/%\.]*(\?\S+)?)?)?)@[/code]

I copied what was there and put in some additional symbols, separated by *, which works, except that the symbols have to appear in the order listed in the expression. If it gets to an ampersand (&), for example, then any plus (+) symbols after that are not recognised, because they appeared earlier in the expression.

So what method should I use instead?
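
The order dependence comes from writing several classes one after another: [A]*[B]* means "a run of characters from A, then a run from B", so once matching has moved on to the next class it cannot return to an earlier one. A small sketch comparing the two forms, using an invented path and deliberately shortened patterns:

```php
<?php
// Invented path in which '+' appears after '&'.
$path = '/a&b+c';

// Chained classes: each class gets one turn, left to right.
$chained = '@^/[\w/+.]*[\w/&.]*@';
// Single merged class: any listed character may appear in any order.
$merged = '@^/[\w/+&.]*@';

preg_match($chained, $path, $m1);
preg_match($merged, $path, $m2);

echo $m1[0], "\n"; // the chained pattern stops before the '+'
echo $m2[0], "\n"; // the merged pattern covers the whole path
```

The chained pattern gives up at the + because the only class that allows + has already had its turn, while the merged class accepts every listed character at every position.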

@abimpson_google (author, Mar 27, 2010): I think it needs to be something along the lines of...

@(https?://([-\w\.]+)+(:\d+)?(/([\w/ _ OR + OR & OR % \.]*(\?\S+)?)?)?)@

Is this possible?

@abimpson_google (author, Mar 29, 2010): I guess it's not possible then. Any other potential workarounds?

@abimpson_google (author, May 16, 2010): Unfortunately I still haven't been able to solve this issue, and am coming back to it now in the hope that I may still be able to implement it.

Is anybody able to build on the preg_replace pattern mentioned?

[code=php]@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*[\w/+\.]*[\w/\-\.]*[\w/&\.]*[\w/%\.]*(\?\S+)?)?)?)@[/code]

As I say, it needs to be tweaked to match any one of a set of characters rather than stepping through each set in sequence. It's the bit within the quotes below that I think needs to work:

@(https?://([-\w\.]+)+(:\d+)?(/([\w/ " _ OR + OR & OR % " \.]*(\?\S+)?)?)?)@

Thanks very much!