Remove html tags using ereg_replace

Hi All

I’ve got a reasonable amount of know-how with php now and in my time I’ve seen these sceeeery scripts resembling something the enigma encrypter would churn out (for example, $text = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]","<a href=\"\\0\">\\0</a>", $text); – omgwtf 😮 runawaaayyy!!!)

Anyway – I find myself in the situation where I’m reading a webpage source code into a string and I need to strip the html tags out of it and leave just the displayable content. I realise I need to use ereg_replace or preg_replace (do I?) but I’m jiggered if I know how to use these expressions and the manuals aren’t much help!

For example if I read this into a string…

[code=html]
<html>
<head>
<title>Something wicked this way comes</title>
</head>
<body>
This is the interesting stuff I want to extract
</body>
</html>
[/code]

I want to end up with just this:

[code=html]Something wicked this way comes This is the interesting stuff I want to extract[/code]

I’d massively appreciate being given the line of code that performs this little piece of magic and if you can explain exactly what it’s doing then I might even pop round and give you a hug.

Thanks loads already

Sol

12 Comment(s)

@scragar, Mar 30, 2005:
[code=php]
$txt = "<html>
<head>
<title>Something wicked this way comes</title>
</head>
<body>
This is the interesting stuff I want to extract
</body>
</html>";
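// The /.../ delimiters are PCRE syntax, so this pattern needs preg_replace;
// ereg_replace patterns take no delimiters.
$text = preg_replace("/<([^<>]*)>/", "", $txt);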
[/code]

@solomon (author), Mar 30, 2005:
hmm... doesn't quite seem to work fully. It looks like it would probably work on the simplified example of html I gave but not on 'real world' html. Perhaps I wasn't specific enough :o

Basically, if anything appears between a '<' and a '>' it must be removed (along with the '<' and '>')... whether it's a character, a number, a space, a slash, equals, quotes, javascript.... anything!

Any other ideas anyone?

Thanks for a damn good stab tho, scragar.

@solomon (author), Mar 30, 2005:
This is me eating my words:

:eek:[SIZE=1]my words[/SIZE]

It looks like I owe you a bit of an apology, scragar. Sorry. It seems to work just fine. I obviously had difficulty squeezing it into my code - I was getting results that simply weren't working and it would appear that it was my fault. Thanks for going to the effort of uploading that bit of script for me - you da man.

Anyway - grovelling aside - could you be wonderful enough to explain how that special line of code is working? Cheers.
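
For anyone wondering the same thing, here is a rough breakdown of a tag-stripping pattern of that shape. This is only a minimal sketch, not necessarily scragar's exact line: it assumes `preg_replace` (the `/.../` delimiters are PCRE syntax), and the sample string in `$txt` is made up for illustration.

```php
<?php
// Illustrative input; not taken from the thread.
$txt = '<p class="x">Hello <b>world</b></p>';

// /       opening delimiter (required by the preg_* functions)
// <       a literal "<" that starts a tag
// [^<>]*  zero or more characters that are neither "<" nor ">"
// >       the literal ">" that ends the tag
// /       closing delimiter
$pattern = '/<[^<>]*>/';

// Replace every match with an empty string, i.e. delete each tag.
$text = preg_replace($pattern, '', $txt);

echo $text; // prints: Hello world
```

The parentheses in the posted pattern only capture the tag's contents; since the replacement is an empty string, they can be dropped without changing the result.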

@Jona, Mar 30, 2005:
Hate to make things easier for you guys, but...

[code=php]
$htmlStr = striptags($htmlStr);
[/code]

@MarkL, Mar 30, 2005:
[QUOTE]
Hate to make things easier for you guys, but...

[code=php]
$htmlStr = striptags($htmlStr);
[/code]
[/QUOTE]


Darn those predefined functions, they take all the fun out of scripting!

BTW - I have always used this function as strip_tags(), not sure if both syntaxes are valid.

@Jona, Mar 30, 2005:
[QUOTE]Darn those predefined functions, they take all the fun out of scripting!

BTW - I have always used this function as strip_tags(), not sure if both syntaxes are valid.[/QUOTE]

Why, so it [i]is[/i] strip_tags()! Darn those predefined functions, you never know when they have an underscore or where that underscore may be!
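
To make the `strip_tags()` suggestion concrete, here is a small sketch run against the sample page from the first post. The whitespace clean-up step is an addition for illustration, since `strip_tags()` leaves the line breaks between the removed tags in place.

```php
<?php
// The sample page from the original question.
$htmlStr = "<html>\n<head>\n<title>Something wicked this way comes</title>\n</head>\n"
         . "<body>\nThis is the interesting stuff I want to extract\n</body>\n</html>";

// Remove the tags; the text between them is kept.
$text = strip_tags($htmlStr);

// Collapse the leftover newlines into single spaces and trim the ends.
$text = trim(preg_replace('/\s+/', ' ', $text));

echo $text;
// prints: Something wicked this way comes This is the interesting stuff I want to extract
```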

@Stephen_Philbin, Mar 30, 2005:
I find that [url=http://www.zend.com/phpfunc/]this little scamp[/url] is quite a help.

@solomon (author), Mar 30, 2005:
[QUOTE]
Hate to make things easier for you guys, but...

[code=php]
$htmlStr = striptags($htmlStr);
[/code]
[/QUOTE]

*THWACK* (sound of palm rapidly meeting forehead)

You also just inspired me here - I was trying to figure out how to get ereg_replace to strip out all html entities with no joy. After hunting through php.net I discover html_entity_decode() !!

So far, my little portion of script looks like this:
[code=php]
<?php
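// Fetch the page and drop the HTML tags.
$words = strip_tags(file_get_contents('http://'.$url));
// Turn entities such as &amp; back into plain characters.
$words = html_entity_decode($words);
// Delete punctuation characters.
$words = preg_replace("/[[:punct:]]/", "", $words);
// Replace each whitespace character (tab, newline, ...) with a plain space.
$words = preg_replace("/[[:space:]]/", " ", $words);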
?>
[/code]
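
One possible way to tidy that up, offered as a sketch rather than the definitive version: wrap the steps in a function so they can be run on any string. The name `extract_words()` and the sample input are hypothetical, the type declarations assume PHP 7 or later, and the page fetch is left out so the example runs without a network connection. Unlike the script above, it collapses runs of whitespace into a single space instead of replacing each character one for one.

```php
<?php
// Hypothetical helper (not from the thread): turn an HTML string into plain words.
function extract_words(string $html): string
{
    // Drop the tags, keeping the text between them.
    $text = strip_tags($html);

    // Decode entities such as &amp; into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Delete runs of punctuation.
    $text = preg_replace('/[[:punct:]]+/', '', $text);

    // Collapse any run of whitespace into one space, then trim the ends.
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}

// Made-up input for illustration.
echo extract_words('<p>Tom &amp; Jerry, again!</p>'); // prints: Tom Jerry again
```

To use it on a live page, the result of `file_get_contents()` could be passed in as the argument.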

:rolleyes: You should have seen the first incarnation of this script - not quite so neat.

I'd be interested if anyone can polish it up a little more tho!

AND... if someone could at least point me in the direction of instruction on how to write the business end of ereg_replace I will be eternally grateful!

@Jona, Mar 30, 2005:
Your code has [i]preg_replace[/i], but your RegEx syntax doesn't look Perl-compatible. I think you meant [i]ereg_replace[/i]. Am I missing something?

@Stephen_Philbin, Mar 31, 2005:
See, that's the problem with regexp. I swear they make a new type of it to celebrate the birth of every bunny in this world.

@solomon (author), Mar 31, 2005:
[QUOTE]Your code has [i]preg_replace[/i], but your RegEx syntax doesn't look Perl-compatible. I think you meant [i]ereg_replace[/i]. Am I missing something?[/QUOTE]

You aren't missing anything! Firstly, as I keep saying, I don't understand the first thing about that syntax - secondly, I just copied that bit from somebody else's script, warts and all! :o so I shall change it to ereg straight away.

Hmm.. just tried changing it and it stops doing quite what I need - it doesn't remove all punctuation and numbers. n/m - I shall just have to deal with having slightly clumsy code. Wouldn't be the first time!
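
On the punctuation-and-numbers point, a hedged sketch that stays with `preg_replace`: PCRE accepts POSIX classes such as `[[:punct:]]` inside a bracket expression, so there should be no need to switch to the ereg functions (which were deprecated in PHP 5.3 and removed in PHP 7.0). `[[:punct:]]` on its own does not match digits, so `[:digit:]` is added to the same bracket. The input string is made up for illustration.

```php
<?php
// Made-up input for illustration.
$words = "Route 66: it's 2,448 miles!";

// Delete runs of punctuation and digits in one pass.
$clean = preg_replace('/[[:punct:][:digit:]]+/', '', $words);

// Collapse the gaps left behind into single spaces and trim the ends.
$clean = trim(preg_replace('/[[:space:]]+/', ' ', $clean));

echo $clean; // prints: Route its miles
```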

I have just finished working on my little script - would you like to see it in operation?

http://www.thrutch.co.uk/code/passwords/

constructive criticism always welcome