I need help with this preg_replace script.

@narutodude000 (author), Oct 22, 2010

I have a script that processes the output of TinyMCE. I added a custom <code></code> tag to TinyMCE. However, TinyMCE still uses <p> and <div> to create line breaks, and that breaks validation because those elements cannot be inside a <code> tag. I want to strip_tags() everything inside <code></code>. Here's what I have so far (it does not work):

[code=php]preg_match_all('/(<code>)(.*)(<\/code>)/', $content, $results, PREG_PATTERN_ORDER);

// Note: $results[1] holds the literal '<code>' group; the text between the tags is in $results[2]
foreach ($results[1] as $result) {
    $code = strip_tags($result);
    $content = preg_replace('/<code>.*<\/code>/', '<code>' . $code . '</code>', $content);
}[/code]


27 Comments

@narutodude000 (author), Oct 23, 2010

Can someone give me some ideas or functions to work with?

@Chipzzz, Oct 23, 2010

Try this:

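[code=php]
// Collect everything between the first <code> and the last </code> (greedy .*)
preg_match_all('/<code>(.*)<\/code>/', $content, $results, PREG_PATTERN_ORDER);
// Note: $content is rebuilt from the code contents only
$content = '<code>';
foreach ($results[1] as $result) {
    // Remove anything that looks like a tag
    $content .= preg_replace('/(<.+?>)/', '', $result);
}
$content .= '</code>';
[/code]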

Have fun.

@narutodude000 (author), Oct 23, 2010

Thanks, but there is other stuff in $content besides the <code> block. After all the tags inside <code></code> are removed, I want to put the cleaned <code> block back where it was (in $content). How can I do this?

@Chipzzz, Oct 23, 2010

Something like this, maybe?

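[code=php]
// Each match: text before a <code> block (1), the block's contents (2),
// and a trailing group (3) that only ever matches an empty string
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/', $content, $results, PREG_SET_ORDER);
$newContent = '';
$trailer = FALSE;
foreach ($results as $result) {
    $newContent .= $result[1];
    // Strip anything that looks like a tag inside the code block
    $newContent .= '<code>' . preg_replace('/(<.+?>)/', '', $result[2]) . '</code>';
    if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
    // Grab whatever follows the last </code>
    preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
[/code]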

Cheers.

@zimonyi, Oct 25, 2010

Another solution could be the following:

1: Change all <code> to something else, like FOOBAR

2: strip all tags

3: Change back FOOBAR to <code>

Code below:

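[CODE]
// 1: hide <code> and </code> behind a placeholder
$content = preg_replace('/<(\/?)code>/', '$1FOOBAR', $content);
// 2: strip every remaining tag
$content = strip_tags($content);
// 3: turn the placeholders back into <code> and </code>
$content = preg_replace('/(\/?)FOOBAR/', '<$1code>', $content);
[/CODE]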

Hope it works,

Archie

@Chipzzz, Oct 25, 2010

Using bbcodes ([ code ] and [ /code ]) would also simplify the problem, with the advantage of allowing tags to be embedded in the code, as happens occasionally.

@narutodude000 (author), Oct 26, 2010

@Chipzzz: I tried your script, but it only returned the content inside <code></code>. I also have other content outside the <code></code>. Thanks anyway.

@zimonyi: I only want to strip the tags inside <code></code>; I have tags outside <code></code> that I want to keep.

Basically, I want something similar to this forum: you can use tags outside [code][/code], but the tags inside get converted to plain text. In my case, however, my TinyMCE editor automatically adds <br>s, <div>s, and other tags inside <code></code>, and I want to remove them. That way, tags work normally outside <code>, and there are no tags inside <code>.
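For what it's worth, this exact requirement can be met in a few lines with preg_replace_callback(), which rewrites only the text each match covers and leaves the rest of the string alone. The sketch below is not from the thread: the helper name strip_tags_in_code() is hypothetical, and it assumes <code> tags without attributes that are never nested. The s modifier lets a block span several lines, and the lazy .*? keeps each match to a single block.

```php
<?php
// Hypothetical helper: strip every tag inside each <code>...</code> block
// and leave the markup outside the blocks untouched.
// Assumes <code> tags carry no attributes and are never nested.
function strip_tags_in_code(string $html): string
{
    // '#' as the delimiter means '/' in </code> needs no escaping;
    // 's' lets '.' match newlines; '.*?' stops at the first </code>
    // after each <code>, so separate blocks are handled one at a time.
    $result = preg_replace_callback(
        '#<code>(.*?)</code>#s',
        function (array $m): string {
            return '<code>' . strip_tags($m[1]) . '</code>';
        },
        $html
    );
    // preg_replace_callback() returns null on failure (e.g. backtrack limit).
    return $result ?? $html;
}

$input = "tes<strong>ting</strong>\n<code>\n<div>te<strong>stin</strong>g</div>\n</code>\n"
       . "<div><strong>tes</strong>ting</div>";
echo strip_tags_in_code($input), "\n";
```

Because only the matched blocks are rebuilt, everything outside <code> passes through unchanged.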

@eval_BadCode_, Oct 27, 2010

[code=php]
// Recursively collect every "$open ... $close" substring of $string into $repl.
function pop_source_between_tags($string, $open, $close, $repl) {
    if (substr_count($string, $open) == 0) return $repl;
    // Drop any text before the next opening tag and recurse.
    if ((substr($string, 0, strlen($open)) != $open) && (substr_count($string, $open))) {
        $e = substr($string, 0, strpos($string, $open));
        $string = substr($string, strlen($e));
        return pop_source_between_tags($string, $open, $close, $repl);
    }
    // $string now starts with $open: cut out everything up to and including $close.
    $part = substr($string, strpos($string, $open), strpos($string, $close) + strlen($close));
    $repl[] = $part;
    $new = substr($string, (strlen($close) + strpos($string, $close)));
    if (substr_count($string, $close) >= 1) { return pop_source_between_tags($new, $open, $close, $repl); }
    else { return $repl; }
}

// Replace each $tag ... closing-tag block in $block with a copy whose inner tags are stripped.
function yo_dawg_i_heard_you_didnt_like_tags($block, $tag) {
    $endtag = "</" . substr($tag, 1);   // '<code>' -> '</code>'
    $z = pop_source_between_tags($block, $tag, $endtag, array());
    foreach ($z as $y => $x) {
        $block = str_replace($z[$y], $tag . strip_tags($z[$y]) . $endtag, $block);
    }
    return $block;
}
[/code]

This forum doesn't replace tags... it just replaces < and > with &lt; and &gt; (among some other things, I'm sure; those are HTML entities).

This would probably be.... 5 lines in python :/

@Chipzzz, Oct 27, 2010

Hi,

I thought that was what you wanted, and although I didn't test the code rigorously, I gave it this:

$content = '<p>Here is some text </p> and some more text<code>and a little code snipped<p> with embedded tags</p> and another tag<h4>over here</h4> and this will be the end of the code section</code> but not the end of the whole message.<p>In fact, we can add another code section here if we like, like this: <code><h2>This is another code section...</h2></code>, and finally, some more plain text<p>with tags</p>.';

and it gave me back this:

<p>Here is some text </p> and some more text<code>and a little code snipped with embedded tags and another tagover here and this will be the end of the code section</code> but not the end of the whole message.<p>In fact, we can add another code section here if we like, like this: <code>This is another code section...</code>, and finally, some more plain text<p>with tags</p>.

which seemed like what you were looking for.

Not too long ago I had a problem manipulating the text in FCKeditor. It turned out to be caused by some characters above 128 (160 was one of them, if memory serves, but I can check if you need to know) that the editor had embedded for its formatting, and they wreaked havoc with my code until I figured it out. Maybe TinyMCE is doing something similar.
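If you suspect an editor of inserting such characters, one quick check is to list every byte outside printable ASCII along with its offset. This is a hypothetical diagnostic helper, not code from the thread. It works on raw bytes (no u modifier), so a UTF-8 non-breaking space (character 160) shows up as the two bytes 0xC2 0xA0.

```php
<?php
// Hypothetical diagnostic: list every byte outside printable ASCII (0x20-0x7E)
// with its byte offset, so invisible characters from an editor become visible.
function report_unusual_bytes(string $s): array
{
    $found = [];
    // PREG_OFFSET_CAPTURE stores each match as [matched text, byte offset].
    preg_match_all('/[^\x20-\x7E]/', $s, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[0] as [$byte, $offset]) {
        $found[] = sprintf('offset %d: 0x%02X', $offset, ord($byte));
    }
    return $found;
}

// "\xC2\xA0" is a UTF-8 non-breaking space; "\n" is a newline.
print_r(report_unusual_bytes("a\xC2\xA0b\n"));
```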

I don't have TinyMCE set up on anything that I can play with easily, so I just used plain PHP for this, and all of this convinces me that there's something going on with the editor that you haven't found yet. I'll keep it in mind, and if I think of anything I'll let you know.

Have a nice day.

@criterion9, Oct 27, 2010

[quote]This forum doesn't replace tags... it just replaces < and > with &lt; and &gt; (among some other things im sure, they're html entities)[/quote]

This forum also uses bbcode-style tags ([code=php], etc.) because that has more or less become the standard way to allow a specific subset of tags in a forum post.

[quote]This would probably be.... 5 lines in python :/[/quote]

[code=php]htmlentities($string);[/code]

See here.
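To illustrate the escaping approach on a bbcode-style tag, here is a minimal sketch (the function name is hypothetical): each [code]...[/code] span is passed through htmlspecialchars(), a close relative of htmlentities(), so any tags inside it are displayed as text instead of being rendered. It only handles the code spans; a real forum would escape the rest of the post as well.

```php
<?php
// Hypothetical bbcode-style converter: text inside [code]...[/code] is
// HTML-escaped so tags in it display literally instead of rendering.
function render_code_bbcode(string $text): string
{
    return preg_replace_callback(
        '#\[code\](.*?)\[/code\]#s',   // 's' so a code span may cover several lines
        function (array $m): string {
            // ENT_QUOTES also escapes ' and "; the input is assumed to be UTF-8.
            return '<code>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</code>';
        },
        $text
    ) ?? $text;   // fall back to the input if the regex fails
}

echo render_code_bbcode('Use [code]<b>bold</b>[/code] for <i>bold</i>.'), "\n";
```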

@narutodude000 (author), Oct 28, 2010

@Chipzzz: I tried the script again and it works this time. Thanks!

Btw, I know how forums work. I'm using TinyMCE, which is very difficult to modify. Thanks anyway.

@Chipzzz, Oct 28, 2010

Glad to hear you got it sorted out.

@narutodude000 (author), Oct 29, 2010

Sorry to bother you again, but there still appear to be bugs in the script. Sometimes all of $content becomes empty. Also, when it does strip the tags in <code> properly, there are a lot of newline and space characters inside and outside the <code>. My WordPress CMS automatically converts them to <br>, which is a problem. Here's the code I'm currently using:

[code=php]// To reduce the risk of errors
if (preg_match('/<code>/', $content)) {
    preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/', $content, $results, PREG_SET_ORDER);
    $newContent = '';
    $trailer = FALSE;
    foreach ($results as $result) {
        $newContent .= $result[1];
        $newContent .= '<code>' . preg_replace('/(<.+?>)/', '', $result[2]) . '</code>';
        if (isset($result[3])) $trailer = TRUE;
    }
    if ($trailer) {
        preg_match('/.*<\/code>(.*)/', $content, $tail);
    }
    $content = $newContent . $tail[1];
}[/code]

The form which this script processes is here: http://linksku.com/new-post

@Chipzzz, Oct 29, 2010

No bother... I enjoyed looking into this and gained an interest in TinyMCE in the process. Probably your best option is to use the bbcode plugin, as shown here: http://tinymce.moxiecode.com/examples/example_09.php# . Doing it that way gives you the WYSIWYG code block while you're still in the editor, which seems far better than trying to modify the editor's output after the fact, even when it isn't problematic. There's a demo at that URL... take a look, I'm sure you'll be impressed.

P.S. When $content comes up empty, it is likely due to embedded control characters like the ones I was describing earlier. If so, putting this in front of the previous code will take care of it:

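[code=php]
// Test input only; leave this commented out when running on real content
// $content = 'Here is a little content upon which to test this code';

// Copy $content byte by byte, replacing any byte above 127 (non-ASCII) with a space
$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
    $ch = $tempContent[$i];
    if (ord($ch) > 127) $ch = ' ';
    $content .= $ch;
    $i++;
}
[/code]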

@narutodude000 (author), Oct 29, 2010

I looked at and tried the BBCode link. However, tags can still be added between [code ][/code ]. Do you know of any way to strip all tags within a certain tag in TinyMCE? I created the "codebox" button in TinyMCE by duplicating the "blockquote" button and changing <blockquote></blockquote> to <code></code>. However, tags are still allowed inside blockquote.

My form does not use BBCodes. TinyMCE generates all of the HTML tags.

@Chipzzz, Oct 30, 2010

Well, it's a little long-winded, but if you combine the P.S. from #15 (slightly enhanced) with the code from #5 that sometimes works, you get this:

[code=php]
// Replace every byte below 32 or above 127 with a space
$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
    $ch = $tempContent[$i];
    if ((ord($ch) < 32) || (ord($ch) > 127)) $ch = ' ';
    $content .= $ch;
    $i++;
}
// Rebuild the content, stripping tags only inside each <code> block
preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/', $content, $results, PREG_SET_ORDER);
$newContent = '';
$trailer = FALSE;
foreach ($results as $result) {
    $newContent .= $result[1];
    $newContent .= '<code>' . preg_replace('/(<.+?>)/', '', $result[2]) . '</code>';
    if (isset($result[3])) $trailer = TRUE;
}
if ($trailer) {
    // Append whatever follows the last </code>
    preg_match('/.*<\/code>(.*)/', $content, $tail);
}
$content = $newContent . $tail[1];
[/code]

It first replaces every character that isn't printable ASCII with a space and then does the replacement. As long as you don't try to embed quoted tags in the code, it should cover just about any circumstance that may be causing you problems. If you want to be able to use quoted tags in your code blocks, just go with the bbcode and that problem will be solved.
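The loop can also be written as a single preg_replace() call that treats the string as raw bytes (no u modifier). The helper name is hypothetical; the character class [\x00-\x1F\x80-\xFF] covers exactly the bytes the loop replaces (below 32 or above 127). One side effect is worth noticing: newlines and tabs become spaces too, so afterwards a plain . in the later patterns can match across what used to be line breaks.

```php
<?php
// Hypothetical one-call equivalent of the byte loop above: every byte
// below 32 (control characters, including "\n" and "\t") or above 127
// (non-ASCII) is replaced with a space. Byte 127 itself is kept, as in the loop.
function printable_ascii_only(string $s): string
{
    return preg_replace('/[\x00-\x1F\x80-\xFF]/', ' ', $s);
}

echo printable_ascii_only("a\tb\nc\xC2\xA0d"), "\n";
```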

I'll be interested in hearing how it works ?

@narutodude000 (author), Oct 30, 2010

Hi, ASCII characters are not the problem. I tried submitting a random string of unusual ASCII characters, and there wasn't a problem.

I tried using this script, which is a slightly modified version of your script:

[code=php]if (preg_match('/<code>/', $content)) {
    preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/', $content, $results, PREG_SET_ORDER);
    $newContent = '';
    $trailer = FALSE;
    foreach ($results as $result) {
        $newContent .= $result[1];
        $newContent .= '<code>' . strip_tags($result[2]) . '</code>';
        if (isset($result[3])) $trailer = TRUE;
    }
    if ($trailer) {
        preg_match('/.*<\/code>(.*)/', $content, $tail);
    }
    $content = $newContent . $tail[1];
}[/code]

I tried submitting various posts with <code>, and I found the following problem: there must be a newline character before and after the <code></code>, or else everything outside the <code> disappears. However, if there are newline characters before and after it, a few extra <br>s appear before the <code>, creating a large space.

I have no idea what's wrong, do you?

@Chipzzz, Oct 30, 2010

Sometimes when you stare at a piece of code for too long, it begins to behave strangely...

Try this:

[code=php]
if (preg_match('/<code>/', $content)) {
    $tempContent = $content;
    $content = '';
    $j = strlen($tempContent);
    $i = 0;
    while ($i < $j) {
        $ch = $tempContent[$i];
        if ((ord($ch) < 32) || (ord($ch) > 127)) $ch = ' ';
        $content .= $ch;
        $i++;
    }
    preg_match_all('/(.*?)?<code>(.*?)?<\/code>(.*?)?/', $content, $results, PREG_SET_ORDER);
    $newContent = '';
    $trailer = FALSE;
    foreach ($results as $result) {
        $newContent .= $result[1];
        $newContent .= '<code>' . preg_replace('/(<.+?>)/', '', $result[2]) . '</code>';
        if (isset($result[3])) $trailer = TRUE;
    }
    if ($trailer) {
        preg_match('/.*<\/code>(.*)/', $content, $tail);
    }
    $content = $newContent . $tail[1];
}
[/code]

Notice the '<' and '>' in the first line... they're important.

Have fun.

@narutodude000 (author), Oct 30, 2010

I tried starting all over and wrote this script:

[code=php]// Code tags in the main content
if (preg_match('/(<code>)/', $content, $match)) {
    $i = $a = count($match) - 1;

    while ($i > 0) {
        $content = preg_replace('/<code>(.*?)<\/code>/', '<code' . $i . '>$1</code' . $i . '>', $content, 1);
        $i--;
    }

    while ($a > 0) {
        preg_match('/<code' . $a . '>(.*?)<\/code' . $a . '>/', $result);
        $content = preg_replace('/<code' . $a . '>(.*?)<\/code' . $a . '>/', '<code>' . preg_replace('/(<.+?>)/', '', $result[1]) . '</code>');
        $a--;
    }
}[/code]

I don't think the first two lines are working properly, because $content isn't affected at all.

This script replaces <code>hi</code><code>bye</code> with <code2>hi</code2><code1>bye</code1>. Then it strips the tags inside each <code> and preg_replaces the cleaned string back into $content. However, it isn't working properly.

@Chipzzz, Oct 30, 2010

That's an interesting way to go about it. I think you want preg_match_all so that it doesn't stop after the first match.
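To see why, compare the return values. preg_match() stops after the first match and returns 1 (or 0), while preg_match_all() returns the number of matches. With a single capture group, $match holds the full match plus that group, so count($match) - 1 is always 1, no matter how many <code> tags the content holds. The sample string below is just an illustration.

```php
<?php
// preg_match() stops at the first match and returns 1 (or 0);
// preg_match_all() returns the total number of matches.
$content = '<code>hi</code><code>bye</code>';

$one = preg_match('/(<code>)/', $content, $match);
$all = preg_match_all('/<code>/', $content, $matches);

// $match = [full match, group 1], so count($match) - 1 is 1
// however many <code> tags $content contains.
printf("preg_match: %d, count(\$match)-1: %d, preg_match_all: %d\n",
       $one, count($match) - 1, $all);
```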

@narutodude000 (author), Oct 31, 2010

Edit: here's what I've got so far:

[code=php]// Code tags in the main content
if (preg_match('/<code>/', $content)) {
    $i = $a = preg_match_all('/<code>/', $content, $match);
    while ($i > 0) {
        $content = preg_replace('/<code>/', '<code' . $i . '>', $content, 1);
        $content = preg_replace('/<\/code>/', '</code' . $i . '>', $content, 1);
        $i--;
    }
    while ($a > 0) {
        // debug
        $content = $content . $a;
        preg_match('/<code' . $a . '>(.*?)<\/code' . $a . '>/', $content, $result);
        echo $result[1]; die;
        $content = preg_replace('/<code' . $a . '>(.*?)<\/code' . $a . '>/', '<code>' . preg_replace('/(<.+?>)/', '', $result[1]) . '</code>', $content);
        $a--;
    }
}[/code]

@narutodude000 (author), Oct 31, 2010

[code=php]if (preg_match('/<code>/', $content)) {
    $i = $a = preg_match_all('/<code>/', $content, $match);
    while ($i > 0) {
        $content = preg_replace('/<code>/', '<code' . $i . '>', $content, 1);
        $content = preg_replace('/<\/code>/', '</code' . $i . '>', $content, 1);
        $i--;
    }
    while ($a > 0) {
        // debug
        $content = $content . $a;
        // this line works
        preg_match('/(<code' . $a . '>)(.*)(<\/code' . $a . '>)/', $content, $result);
        // error probably lies in this line
        $content = preg_replace('/(<code' . $a . '>)(.*?)(<\/code' . $a . '>)/', '<code>' . strip_tags($result[2]) . '</code>', $content);
        $a--;
    }
}[/code]

I still don't know what's wrong. Everything in $content disappears.

@Chipzzz, Oct 31, 2010

Strange... I gave it this:

This is some text <code> and an <i>italic</i> tag in the code section; </code> and some more text, followed by another <code> code section and an <p> paragraph tag </p> with a <h1> heading</h1> for good measure</code> and some text to w<i>rap</i> it all up.

and it gave me back this:

This is some text <code> and an italic tag in the code section; </code> and some more text, followed by another <code> code section and an paragraph tag with a heading for good measure</code> and some text to w<i>rap</i> it all up.21

which is what it should have produced, with the exception of the '21' at the end, which is probably something you'll figure out without much difficulty. I still think you'll have to prefix your code with this:

$tempContent = $content;
$content = '';
$j = strlen($tempContent);
$i = 0;
while ($i < $j) {
    $ch = $tempContent[$i];
    if (ord($ch) > 127) $ch = ' ';   // any non-ASCII byte becomes a space
    $content .= $ch;
    $i++;
}

to get rid of anything the editor is adding to the text that you don't know about, though.

Good job so far... and Happy Halloween!

@eval_BadCode_, Oct 31, 2010

Have you tried my yo_dawg_i_heard_you_didnt_like_tags function?

That way you can parse while you parse.

@narutodude000 (author), Oct 31, 2010

@eval: I tried it; it was too complicated.

@Chipzzz: It seems that my regex doesn't work with newline characters.

My editor generates this:

[code=html]tes<strong>ting</strong>
 <code>
 <div>te<strong>stin</strong>g</div>
 </code>
 <div><strong>tes</strong>ting</div>
 <code>
 <div>te<strong>stin</strong>g</div>
 </code>[/code]

Which returns:

[code=html]          tes<strong>ting</strong>
 <br><code2>
 <br><div>te<strong>stin</strong>g</div>
 <br></code2>
 <br><div><strong>tes</strong>ting</div>
 <br><code1>[/code]

But if I enter:

[code=html]tes<strong>ting</strong><code><div>te<strong>stin</strong>g</div></code><div><strong>tes</strong>ting</div><code><div>te<strong>stin</strong>g</div></code>[/code]

It returns:
[code=html]tes<strong>ting</strong><code>testing</code><div><strong>tes</strong>ting</div><code>testing</code>[/code]
Which is what I want. I have to modify the regex so that it works with newline characters. I'm not too good with regex; can you help me?

@narutodude000 (author), Oct 31, 2010

I got it working by adding the "s" modifier at the end of the regex. Thanks, everyone, for all your help!
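A minimal demonstration of what the s modifier changes: without it, . does not match a newline, so a <code> block that TinyMCE spreads over several lines never matches at all. That fits the symptom above where content vanished unless the markup happened to sit on one line. The sample markup is taken from the editor output quoted in this thread.

```php
<?php
// Without the s modifier, '.' does not match "\n",
// so a <code> block that spans several lines is never matched.
$html = "<code>\n<div>te<strong>stin</strong>g</div>\n</code>";

$withoutS = preg_match('/<code>(.*?)<\/code>/', $html, $m1);
$withS    = preg_match('/<code>(.*?)<\/code>/s', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo strip_tags($m2[1]), "\n";
```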

@Charles, Oct 31, 2010

I might have written this once or twice before, but HTML is way too complicated for regular expressions. You need a parser, and DOMDocument will parse HTML.
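A sketch of that suggestion, assuming PHP's DOM extension is available and the input is a UTF-8 HTML fragment like the one TinyMCE produces. The function name and the wrapper <div> are assumptions made for this example. Each <code> element's children are replaced with a single text node holding their combined text, so newlines and nested markup need no special handling.

```php
<?php
// Sketch: strip the markup inside every <code> element with DOMDocument
// instead of regular expressions. Assumes a UTF-8 HTML fragment as input;
// the function name and the wrapper <div> are hypothetical.
function strip_tags_in_code_dom(string $fragment): string
{
    $doc = new DOMDocument();
    // The meta tag declares the input as UTF-8; the wrapper <div> gives
    // the fragment a single parent whose children we can serialize.
    $html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          . '<div>' . $fragment . '</div>';
    $previous = libxml_use_internal_errors(true); // editor output may not validate
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live NodeList into an array before modifying the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('code')) as $code) {
        $text = $code->textContent;            // all descendant text, tags dropped
        while ($code->firstChild !== null) {
            $code->removeChild($code->firstChild);
        }
        $code->appendChild($doc->createTextNode($text));
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_tags_in_code_dom('tes<strong>ting</strong><code><div>te<strong>stin</strong>g</div></code>'), "\n";
```

saveHTML() re-escapes any < or & in the collected text, so the result stays valid markup even when the code itself contains those characters.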
