
I need help with this preg_replace script.

I have a script that processes the output of TinyMCE. I added a custom <code></code> tag in TinyMCE. However, TinyMCE still uses <p> and <div> to create line breaks, which breaks validation because those elements are not allowed inside a <code> tag. I want to run strip_tags() on everything inside <code></code>. Here's what I have so far (it does not work):

[code=php]preg_match_all('/(<code>)(.*)(<\/code>)/',$content,$results,PREG_PATTERN_ORDER);

foreach($results[1] as $result){
$code=strip_tags($result);
$content = preg_replace('/<code>.*<\/code>/','<code>'.$code.'</code>',$content);
}[/code]
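One way to do what the question asks is to let preg_replace_callback() visit each <code>...</code> block and rebuild it in place, so markup outside the blocks is never touched. This is a hedged sketch rather than the thread's own solution: strip_tags_in_code() is a hypothetical name, '#' is used as the delimiter so "</code>" needs no escaping, and the pattern assumes TinyMCE emits a bare <code> tag with no attributes.

```php
<?php
// Hypothetical helper: strip every tag *inside* <code>...</code> blocks,
// leaving all markup outside those blocks unchanged.
function strip_tags_in_code(string $html): string
{
    // '.*?' is lazy, so each match stops at the nearest </code>.
    // The 's' modifier lets '.' match newlines inside a block.
    $result = preg_replace_callback(
        '#<code>(.*?)</code>#s',
        function (array $m): string {
            // $m[1] is the text between the tags; rebuild the block around it.
            return '<code>' . strip_tags($m[1]) . '</code>';
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo strip_tags_in_code('tes<strong>ting</strong><code><div>te<strong>stin</strong>g</div></code>');
// tes<strong>ting</strong><code>testing</code>
```

Because the match is lazy, two code blocks in the same post are handled separately; nested <code> elements would not be.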


@narutodude000 (author), Oct 23, 2010
Can someone give me some ideas or functions to work with?

@Chipzzz, Oct 23, 2010
Try this:

[code=php]
preg_match_all('/<code>(.*)<\/code>/',$content,$results,PREG_PATTERN_ORDER);
$content='<code>';
foreach($results[1] as $result){
$content.=preg_replace('/(<.+?>)/', '',$result);
}
$content .= '</code>';
[/code]


Have fun :)

@narutodude000 (author), Oct 23, 2010
Thanks, but there is other stuff in $content besides the <code>. After all the tags inside <code></code> are removed, I want to put the cleaned <code> back where it was (in $content). How can I do this?

@Chipzzz, Oct 23, 2010
Something like this, maybe?

[code=php]
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/',$content,$results,PREG_SET_ORDER);
$newContent='';
$trailer = FALSE;
foreach($results as $result){
$newContent.=$result[1];
$newContent.='<code>' . preg_replace('/(<.+?>)/', '',$result[2]) . '</code>';
if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
[/code]


Cheers :)

@zimonyi, Oct 25, 2010
Another solution could be the following:

1: Change all <code> to something else, like FOOBAR

2: Strip all tags

3: Change back FOOBAR to <code>

Code below:

[CODE]
$content = preg_replace('/<(\/?)code>/', '$1FOOBAR', $content);
$content = strip_tags($content);
$content = preg_replace('/(\/?)FOOBAR/', '<$1code>', $content);
[/CODE]


Hope it works,

Archie

@Chipzzz, Oct 25, 2010
Using BBCodes ([ code ] and [ /code ]) would also simplify the problem, with the added advantage of allowing tags to be embedded in the code, as happens occasionally.

@narutodude000 (author), Oct 26, 2010
@Chipzzz: I tried your script, but it only returned the content inside <code></code>, and I also have other stuff outside the <code></code>. Thanks anyway :)

@zimonyi: I only want to strip the tags inside <code></code>; I have tags outside <code></code> that I want to keep.

Basically, I want something similar to this forum: you can use tags outside [code][/code], but the tags inside get converted to plain text. However, in my case, my TinyMCE editor automatically adds <br>s, <div>s, and other tags inside the <code></code>, and I want to remove them. That way, tags work normally outside the <code>, but there are no tags inside it.

@eval_BadCode_, Oct 27, 2010
[code=php]

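// Collect every "$open ... $close" substring of $string into the array $repl.
function pop_source_between_tags($string,$open,$close,$repl) {
// No opening tag left: return what has been collected so far.
if (substr_count($string,$open) == 0) return $repl;
// If $string doesn't start with $open, drop the text before it and recurse.
if ( (substr($string,0,strlen($open)) != $open) && (substr_count($string,$open)) ) {
$e = substr($string,0,strpos($string,$open));
$string = substr($string,strlen($e));
return pop_source_between_tags($string,$open,$close,$repl); }
// $string now starts with $open: take everything up to and including the first $close.
$part = substr($string,strpos($string,$open), strpos($string,$close)+strlen($close));
$repl[] = $part;
// Continue with the text after that $close.
$new = substr($string,(strlen($close)+strpos($string,$close)));
if (substr_count($string,$close) >= 1) { return pop_source_between_tags($new,$open,$close,$repl); }
else { return $repl; }
}

// Replace each $tag...$endtag block in $block with a copy whose inner tags are stripped.
function yo_dawg_i_heard_you_didnt_like_tags($block,$tag) {
// Build the closing tag, e.g. "<code>" -> "</code>".
$endtag = "</" . substr($tag, 1);
$z = pop_source_between_tags($block,$tag,$endtag,array());
foreach($z as $y => $x) {
// strip_tags() also removes the outer tags, so they are added back here.
$block = str_replace($z[$y],$tag.strip_tags($z[$y]).$endtag,$block);
}
return $block;
}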

[/code]


This forum doesn't replace tags... it just replaces < and > with &lt; and &gt; (among some other things, I'm sure; they're HTML entities).

This would probably be... 5 lines in Python :/
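The escaping approach described above can be sketched the same way. This illustrates the idea under assumptions and is not how this forum is actually implemented: escape_code_blocks() is a hypothetical helper, and it uses htmlspecialchars() (which encodes only the HTML-special characters, unlike htmlentities()) on the text inside each <code> block instead of removing the tags. For TinyMCE output this would also display the editor's own <div> and <br> tags as literal text, which is why stripping fits the original question better.

```php
<?php
// Hypothetical helper: show the markup inside <code>...</code> as literal text
// by converting <, >, &, and quotes to HTML entities.
function escape_code_blocks(string $html): string
{
    $result = preg_replace_callback(
        '#<code>(.*?)</code>#s',
        function (array $m): string {
            return '<code>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</code>';
        },
        $html
    );
    // Fall back to the input if the regex fails.
    return $result ?? $html;
}

echo escape_code_blocks('<p>hi</p><code><b>x</b></code>');
// <p>hi</p><code>&lt;b&gt;x&lt;/b&gt;</code>
```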

@Chipzzz, Oct 27, 2010
Hi,

I thought that was what you wanted, and although I didn't test the code rigorously, I gave it this:
$content='<p>Here is some text </p> and some more text<code>and a little code snipped<p> with embedded tags</p> and another tag<h4>over here</h4> and this will be the end of the code section</code> but not the end of the whole message.<p>In fact, we can add another code section here if we like, like this: <code><h2>This is another code section...</h2></code>, and finally, some more plain text<p>with tags</p>.';


and it gave me back this:

<p>Here is some text </p> and some more text<code>and a little code snipped with embedded tags and another tagover here and this will be the end of the code section</code> but not the end of the whole message.<p>In fact, we can add another code section here if we like, like this: <code>This is another code section...</code>, and finally, some more plain text<p>with tags</p>.

which seemed like what you were looking for.

Not too long ago I had a problem manipulating the text in FCKeditor. It turned out to be caused by some characters above 128 (160 was one of them, if memory serves, but I can check if you need to know) that the editor embedded for its formatting, and they wreaked havoc with my code until I figured it out. Maybe TinyMCE is doing something similar.

I don't have TinyMCE set up on anything I can play with easily, so I just used plain PHP for this. All of which convinces me that something is going on with the editor that you haven't found yet. I'll keep it in mind, and if I think of anything I'll let you know.

Have a nice day :)

@criterion9, Oct 27, 2010
[quote]This forum doesn't replace tags... it just replaces < and > with &lt; and &gt; (among some other things, I'm sure; they're HTML entities)[/quote]

This forum also uses BBCode-style tags ([code=php]...[/code] and so on) because that has more or less become the standard for embedding a specific subset of tags in a forum.

[quote]This would probably be... 5 lines in Python :/[/quote]
[code=php]htmlentities($string);[/code]
See here.

@narutodude000 (author), Oct 28, 2010
@Chipzzz: I tried the script again and it works this time. Thanks!

Btw, I know how forums work. I'm using TinyMCE, which is very difficult to modify. Thanks anyway :)

@Chipzzz, Oct 28, 2010
Glad to hear you got it sorted out :)

@narutodude000 (author), Oct 29, 2010
Sorry to bother you again, but there still appear to be bugs in the script. Sometimes $content ends up completely empty. Also, when it does strip the tags in <code> properly, there are a lot of newline and space characters inside and outside the <code>. My WordPress CMS automatically converts them to <br>, which is a problem. Here's the code I'm currently using.

[code=php]//To reduce risk of errors
if(preg_match('/<code>/',$content)){
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/',$content,$results,PREG_SET_ORDER);
$newContent='';
$trailer = FALSE;
foreach($results as $result){
$newContent.=$result[1];
$newContent.='<code>' . preg_replace('/(<.+?>)/', '',$result[2]) . '</code>';
if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
}[/code]


The form this script processes is here: http://linksku.com/new-post

@Chipzzz, Oct 29, 2010
No bother... I enjoyed looking into this and gained an interest in TinyMCE in the process. Probably your best option is to use the BBCode plugin as shown here: http://tinymce.moxiecode.com/examples/example_09.php# . Doing it that way gives you the WYSIWYG code block while you're still in the editor, which seems far better than trying to modify the editor's output after the fact, even when it isn't problematic. There's a demo at that URL; take a look, I'm sure you'll be impressed.

P.S. If $content comes up empty, it may be due to embedded control characters like the ones I described earlier. If so, putting this in front of the previous code should take care of it:

[code=php]
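// Sample input for testing only; remove this line when processing real editor output.
$content='Here is a little content upon which to test this code';

$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
$ch = $tempContent[$i];
// Replace any byte outside 7-bit ASCII (128-255) with a space.
if (ord($ch) > 127) $ch = ' ';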
$content .= $ch;
$i++;
}
[/code]

@narutodude000 (author), Oct 29, 2010
I looked at and tried the BBCode link. However, tags can still be added between [code ][/code ]. Do you know of any way to strip all tags within a certain tag in TinyMCE? I created the "codebox" button in TinyMCE by duplicating the "blockquote" button and changing <blockquote></blockquote> to <code></code>. However, tags are still allowed inside it, just as they are in a blockquote.

My form does not use BBCodes. TinyMCE generates all of the HTML tags.

@Chipzzz, Oct 30, 2010
Well, it's a little long-winded, but if you combine the P.S. from #15 (slightly enhanced) with the code from #5 that sometimes works, you get this:

[code=php]
$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
$ch = $tempContent[$i];
if ((ord($ch) < 32) || (ord($ch) > 127)) $ch = ' ';
$content .= $ch;
$i++;
}
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/',$content,$results,PREG_SET_ORDER);
$newContent='';
$trailer = FALSE;
foreach($results as $result){
$newContent.=$result[1];
$newContent.='<code>' . preg_replace('/(<.+?>)/', '',$result[2]) . '</code>';
if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
[/code]


It first replaces every character outside the printable ASCII range (below 32 or above 127, which includes newlines and tabs) with a space, and then does the replacement. As long as you don't try to embed quoted tags in the code, it should cover just about any circumstance that may be causing you problems. If you want to be able to use quoted tags in your code blocks, just go with the BBCode plugin and that problem will be solved.
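The character-by-character loop above can also be written as a single preg_replace() call. This is a sketch, not code from the thread: keep_printable_ascii() is a hypothetical name, and the pattern works on bytes (no u modifier), so a UTF-8 character such as a non-breaking space (bytes 0xC2 0xA0) turns into two spaces, while newlines and tabs become single spaces.

```php
<?php
// Hypothetical helper: replace every byte outside printable ASCII
// (0x20 space through 0x7E tilde) with a space, like the loop above.
function keep_printable_ascii(string $s): string
{
    // Without the 'u' modifier the character class matches individual bytes.
    return preg_replace('/[^\x20-\x7E]/', ' ', $s);
}

var_dump(keep_printable_ascii("a\nb\tc\xC2\xA0d"));
// string(8) "a b c  d"
```

Because newlines become spaces, any later pattern using '.' no longer has line breaks to stop at.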

I'll be interested in hearing how it works :)

@narutodude000 (author), Oct 30, 2010
Hi, ASCII characters are not the problem: I tried submitting a random string of unusual ASCII characters, and nothing went wrong.

I tried using this script, which is a slightly modified version of your script:

[code=php]if(preg_match('/<code>/',$content)){
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/',$content,$results,PREG_SET_ORDER);
$newContent='';
$trailer = FALSE;
foreach($results as $result){
$newContent.=$result[1];
$newContent.='<code>' . strip_tags($result[2]) . '</code>';
if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
}[/code]


I tried submitting various posts with <code>, and I found the following problem: there must be a newline character before and after the <code></code>, or else everything outside the <code> will disappear. However, if there are newline characters before and after it, a few extra <br> will appear before the <code>, creating a large space.

I have no idea what's wrong, do you?

@Chipzzz, Oct 30, 2010
Sometimes when you stare at a piece of code for too long, it begins to behave strangely...

Try this:

[code=php]
if(preg_match('/<code>/',$content)){
$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
$ch = $tempContent[$i];
if ((ord($ch) < 32) || (ord($ch) > 127)) $ch = ' ';
$content .= $ch;
$i++;
}
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/',$content,$results,PREG_SET_ORDER);
$newContent='';
$trailer = FALSE;
foreach($results as $result){
$newContent.=$result[1];
$newContent.='<code>' . preg_replace('/(<.+?>)/', '',$result[2]) . '</code>';
if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
}
[/code]


Notice the '<' and '>' in the first line... they're important.

Have fun :)

@narutodude000 (author), Oct 30, 2010
I tried starting all over and wrote this script:
[code=php]// Code tags in the main content
if(preg_match('/(<code>)/',$content,$match)){
$i = $a = count($match)-1;

while($i>0) {
$content = preg_replace('/<code>(.*?)<\/code>/','<code'.$i.'>$1</code'.$i.'>',$content,1);
$i--;
}

while($a>0){
preg_match('/<code'.$a.'>(.*?)<\/code'.$a.'>/',$result);
$content = preg_replace('/<code'.$a.'>(.*?)<\/code'.$a.'>/','<code>'.preg_replace('/(<.+?>)/', '',$result[1]).'</code>');
$a--;
}
}[/code]

I don't think the first two lines are working properly because $content isn't affected at all.

This script replaces <code>hi</code><code>bye</code> with <code2>hi</code2><code1>bye</code1>. Then it strips the tags inside each numbered block and uses preg_replace to put the cleaned string back into $content. However, it isn't working properly.

@Chipzzz, Oct 30, 2010
That's an interesting way to go about it. I think you want preg_match_all() so that it doesn't stop after the first match.
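A small sketch of the difference pointed out above: preg_match() stops at the first match, while preg_match_all() returns the number of matches. The numbering idea itself can also be done in one pass with a counter captured by reference in a preg_replace_callback() closure. number_code_blocks() is a hypothetical helper, and it numbers blocks in ascending document order rather than the descending order used in the post.

```php
<?php
$html = '<code>hi</code><code>bye</code>';

// preg_match() returns 1 as soon as it finds the first match.
var_dump(preg_match('#<code>#', $html));      // int(1)
// preg_match_all() returns how many matches there are.
var_dump(preg_match_all('#<code>#', $html));  // int(2)

// Hypothetical helper: rename each block to <codeN>...</codeN>, counting from 1.
function number_code_blocks(string $html): string
{
    $n = 0;
    $result = preg_replace_callback(
        '#<code>(.*?)</code>#s',
        function (array $m) use (&$n): string {
            $n++;  // shared counter, incremented once per block
            return "<code$n>" . $m[1] . "</code$n>";
        },
        $html
    );
    return $result ?? $html;
}

echo number_code_blocks($html);  // <code1>hi</code1><code2>bye</code2>
```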

@narutodude000 (author), Oct 31, 2010
Edit: here's what I have so far:

[code=php]// Code tags in the main content
if(preg_match('/<code>/',$content)){
$i = $a = preg_match_all('/<code>/',$content,$match);
while($i>0) {
$content = preg_replace('/<code>/','<code'.$i.'>',$content,1);
$content = preg_replace('/<\/code>/','</code'.$i.'>',$content,1);
$i--;
}
while($a>0){
//debug
$content = $content.$a;
preg_match('/<code'.$a.'>(.*?)<\/code'.$a.'>/',$content,$result);
echo $result[1];die;
$content = preg_replace('/<code'.$a.'>(.*?)<\/code'.$a.'>/','<code>'.preg_replace('/(<.+?>)/', '',$result[1]).'</code>',$content);
$a--;
}
}[/code]

@narutodude000 (author), Oct 31, 2010
[code=php]if(preg_match('/<code>/',$content)){
$i = $a = preg_match_all('/<code>/',$content,$match);
while($i>0) {
$content = preg_replace('/<code>/','<code'.$i.'>',$content,1);
$content = preg_replace('/<\/code>/','</code'.$i.'>',$content,1);
$i--;
}
while($a>0){
//debug
$content = $content.$a;
//this line works
preg_match('/(<code'.$a.'>)(.*)(<\/code'.$a.'>)/',$content,$result);
//error probably lies in this line
$content = preg_replace('/(<code'.$a.'>)(.*?)(<\/code'.$a.'>)/','<code>'.strip_tags($result[2]).'</code>',$content);
$a--;
}
}[/code]

I still don't know what's wrong. Everything in $content disappears.

@Chipzzz, Oct 31, 2010
Strange... I gave it this:

This is some text <code> and an <i>italic</i> tag in the code section; </code> and some more text, followed by another <code> code section and an <p> paragraph tag </p> with a <h1> heading</h1> for good measure</code> and some text to w<i>rap</i> it all up.

and it gave me back this:

This is some text <code> and an italic tag in the code section; </code> and some more text, followed by another <code> code section and an paragraph tag with a heading for good measure</code> and some text to w<i>rap</i> it all up.21

which is what it should have, with the exception of the '21' at the end (it comes from the "$content = $content.$a;" debug line), which is probably something you'll figure out without much difficulty. I still think you'll have to prefix your code with this:

$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
$ch = $tempContent[$i];
if (ord($ch) > 127) $ch = ' ';
$content .= $ch;
$i++;
}

to get rid of anything the editor is adding to the text that you don't know about, though.

Good job so far... and Happy Halloween! :)

@eval_BadCode_, Oct 31, 2010
Have you tried my yo_dawg_i_heard_you_didnt_like_tags function?

That way you can parse while you parse.

@narutodude000 (author), Oct 31, 2010
@eval: I tried it, but it was too complicated.

@Chipzzz: It seems that my regex doesn't work with newline characters.

My editor generates this:
[code=html]tes<strong>ting</strong>
<code>
<div>te<strong>stin</strong>g</div>
</code>
<div><strong>tes</strong>ting</div>
<code>
<div>te<strong>stin</strong>g</div>
</code>[/code]

Which returns:
[code=html] tes<strong>ting</strong>
<br><code2>
<br><div>te<strong>stin</strong>g</div>
<br></code2>
<br><div><strong>tes</strong>ting</div>
<br><code1>[/code]


But if I enter:
[code=html]tes<strong>ting</strong><code><div>te<strong>stin</strong>g</div></code><div><strong>tes</strong>ting</div><code><div>te<strong>stin</strong>g</div></code>[/code]
It returns:
[code=html]tes<strong>ting</strong><code>testing</code><div><strong>tes</strong>ting</div><code>testing</code>[/code]
Which is what I want. I have to modify the regex so that it works with newline characters. I'm not very good with regex; can you help me?

@narutodude000 (author), Oct 31, 2010
I got it working by adding the "s" modifier at the end of the regex. Thanks, everyone, for all your help!
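The fix works because, by default, '.' in a PCRE pattern does not match a newline, so '(.*?)' can never reach a </code> on a later line. The s modifier (PCRE_DOTALL) removes that restriction. A minimal check, using '#' as the delimiter (an assumption for readability; the thread's '/' with an escaped slash behaves the same) and the editor output posted above:

```php
<?php
$html = "<code>\n<div>te<strong>stin</strong>g</div>\n</code>";

// Without 's': '.' stops at "\n", so there is no match.
var_dump(preg_match('#<code>(.*?)</code>#', $html));       // int(0)

// With 's': '.' also matches newlines, so the whole block is captured.
var_dump(preg_match('#<code>(.*?)</code>#s', $html, $m));  // int(1)
echo strip_tags($m[1]);  // "testing", surrounded by the original newlines
```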

@Charles, Oct 31, 2010
I might have written this once or twice before, but HTML is way too complicated for regular expressions. You need a parser, and DOMDocument will parse HTML.
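A minimal sketch of the parser-based approach, assuming PHP's DOM extension with libxml2. The fragment is wrapped in a <div> so it has a single root, each <code> element's children are replaced with one text node holding their combined text, and the wrapper's children are serialized back. strip_tags_in_code_dom() is a hypothetical name, and the <?xml encoding="utf-8"?> prefix is a common workaround to make loadHTML() read UTF-8 input, which not every libxml2 version may honor.

```php
<?php
// Hypothetical helper: replace the contents of every <code> element with its
// plain text, using libxml's HTML parser instead of regular expressions.
function strip_tags_in_code_dom(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings (e.g. about unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The wrapper <div> gives the fragment a single root we can find again.
    $doc->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array before changing the tree.
    $codes = iterator_to_array($doc->getElementsByTagName('code'));
    foreach ($codes as $code) {
        $text = $code->textContent;  // all descendant text, tags dropped
        while ($code->firstChild !== null) {
            $code->removeChild($code->firstChild);
        }
        $code->appendChild($doc->createTextNode($text));
    }

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return $html;
    }
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_tags_in_code_dom('a<em>b</em><code><b>x</b> y</code>');
```

Unlike the regex versions, this does not depend on newlines and also finds <code> elements that carry attributes. The markup is re-serialized by libxml, so the output may not be byte-for-byte identical to the input outside the code blocks.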