[RESOLVED] Strip all html.

Hi,

I have a search function that searches a DB and returns the data, and that works great. BUT in the DB I store the info with its layout, so all the data comes back formatted with HTML: p's and div's and even some tables in the older content.

I have searched and found this code, posted by Bokeh and NogDog years ago, but it doesn't strip everything. I just want the plain text outside of the <> tags.

Any ideas?

[url]http://www.asktom.co.nz/search[/url]

[code=php]
function replace_links($text)
# convert HTML links to textual representations
# "<a href="http:/a.b.com/">test</a>" -> "test (http:/a.b.com/)"
{
    # define regexp components for main regexp:
    $start  = '<a\s[^>]*href=';                 # start of A link
    $mail_q = '[\'"]mailto:([^\'"]+)[\'"]';     # quoted mailto
    $mail_u = 'mailto:([^\s>]+)';               # unquoted mailto
    $link_q = '[\'"](h?[ft]tp:[^\'"]+)[\'"]';   # quoted http or ftp link
    $link_u = '(h?[ft]tp:[^\s>]+)';             # unquoted http or ftp link
    $end    = '[^>]*>(.+)<\/a>';                # end of A link

    $search  = array("/$start(?:$mail_q|$mail_u|$link_q|$link_u)$end/i",
                     '/<a\s[^>]*>(.*)<\/a>/i'); # local file or other non-match
    $replace = array('\5 (\1\2\3\4)', '\1');
    return preg_replace($search, $replace, $text);
}
function CleanUp($input)
{
    // list of allowed tags. Edit to taste
    // (guarded so that calling CleanUp() more than once does not redefine the constant)
    if (!defined('__HTML__')) {
        define('__HTML__', 'a|b|br|i|img|p|span');
    }

    // list of allowed attributes. Edit to taste
    if (!defined('__ATTRIBUTES__')) {
        define('__ATTRIBUTES__', 'src|alt|href|title|class');
    }

    if (!function_exists('DisallowedTagsCallback'))
    {
        function DisallowedTagsCallback($input)
        {
            $input[0] = strip_tags($input[0]);
            return htmlentities($input[0]);
        }
    }

    if (!function_exists('DisallowedAttributesCallback'))
    {
        function DisallowedAttributesCallback($input)
        {
            $regex = '/\s*\b(?!(?:' . __ATTRIBUTES__ . '))[a-z]+\b\s*[=]\s*([\'"])' .
                     '(((?!\1).)|((?<=[\\\\])\1))*\1/is';
            return preg_replace($regex, '', $input[0]);
        }
    }

    // strip out any javascript ( <script>, onclick etc, and href="javascript:" )
    $regex = array('/<script\b[^>]*>((?!<\/script\b[^>]*>).)*<\/script\b[^>]*>/is',
                   '/\s*\bon[a-z]+\s*[=]\s*(["\'])(((?!\1)[^\\\\])|((?<!\\\\)(?:\\\\\\\\)*\\\\\1)|(?!(?:\\\\)*\1)\\\\)*\1/i',
                   '/href+\s*[=]\s*(["\'])((?!\1).)*javascript((?!\1).)*\1/is');
    $replace = array('', '', 'href="#"');
    $input = preg_replace($regex, $replace, $input);

    // strip disallowed tags
    $regex = '@(((?<=^)|(?<=[>]))(?![<]/?(' . __HTML__ . ')\b[^>]*[>])' .
             '([^<]|((?![<]/?(' . __HTML__ . ')\b[^>]*[>])[<]))+)@i';
    $input = preg_replace_callback($regex, 'DisallowedTagsCallback', $input);

    // strip disallowed attributes
    $regex = '/(?<=[<])[^>]+(?=[>])/';
    return preg_replace_callback($regex, 'DisallowedAttributesCallback', $input);
}
function bold($tag, $line) {
    $line = replace_links($line);
    $line = CleanUp($line);
    $line = htmlentities($line);
    $line = substr(str_replace($tag, '<strong class="highlight">' . $tag . '</strong>', $line), 30);
    return $line;
}
[/code]

Thanks for taking the time


3 Comments

@patenaudemat, May 13, 2007: So, let me get this straight: you just want to get rid of all HTML tags? Nothing fancier?

Try this: http://php.net/strip_tags

Hope it helps!

-Matt
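
For anyone landing here later, a minimal sketch of the strip_tags() approach suggested above. The helper name plain_text() and the sample string are made up for illustration and are not from the original search code; the entity decoding and whitespace cleanup are optional extras that may or may not suit your data.

```php
<?php
// Hypothetical helper: reduce stored HTML to plain text for a search result.
function plain_text(string $html): string
{
    // strip_tags() removes the tags themselves but keeps the text between them.
    $text = strip_tags($html);
    // Decode entities such as &amp; so the excerpt reads as normal text.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse the runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample row content, invented for this example.
$stored = "<div><p>Hello <b>world</b> &amp; friends</p>\n<table><tr><td>cell</td></tr></table></div>";
echo plain_text($stored), "\n";
```

Two things to keep in mind: strip_tags() keeps the text inside <script> and <style> elements, and text from adjacent elements runs together when there is no whitespace between them in the stored HTML. If the result is echoed back into a page, it is worth passing it through htmlspecialchars() first.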
@Sheldon (author), May 13, 2007: Thanks. Geez, I knew there was an easier way; I should have known it.
@patenaudemat, May 13, 2007: No problem, happy to help.

-Matt