How to exclude parts when using regex.

@Dragonkai Dec 06.2007

I've looked at lookahead and lookaround, and this site: [URL="http://www.regular-expressions.info/lookaround2.html"]http://www.regular-expressions.info/lookaround2.html[/URL] makes some sense, but I don't see how it helps me with this problem.

I've got this string:

[code=php]$englishpart = "
===synonyms====
*blahblahblah [[THIS IS RIGHT]] ijtir.
*usdhty [[this is wee right]] ksdhfiudf.
==bkshbdf===";[/code]

and I'm using this regex:

[code=php]/^=+synonyms=+.**.*[[([^nr])*]].*=+[a-z]*=+$/imsu[/code]

but what I'm getting is the ===synonyms====, all of the *[[THIS IS RIGHT]] part, and also the bottom ==bkshbdf===.

All I want is the words inside the [[ ]], so I want THIS IS RIGHT and this is wee right, but not the text surrounding them and not the [[ and ]] themselves.

This has something to do with subpatterns, right? How do you use them?

PHP

@bokeh Dec 06.2007 — Using preg_match, the bit you want should be stored in [I]$matches[1][/I] (if [I]$matches[/I] is the third argument to preg_match).
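A minimal sketch of that idea, assuming a hypothetical one-line sample rather than the full string from the question. The pattern and the variable name $line are illustrative only; the point is that index 0 holds the whole match and index 1 holds the first subpattern.

```php
<?php
// Hypothetical one-line sample, not the full $englishpart string.
$line = "*blahblahblah [[THIS IS RIGHT]] ijtir.";

// \[\[ and \]\] match the literal brackets; (.*?) captures what sits between them.
if (preg_match('/\[\[(.*?)\]\]/', $line, $matches)) {
    echo $matches[0], "\n"; // whole match, brackets included
    echo $matches[1], "\n"; // first subpattern only
}
```

Here $matches[0] is [[THIS IS RIGHT]] and $matches[1] is THIS IS RIGHT, so reading index 1 is what leaves the brackets out.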

@andre4s_y Dec 06.2007 — Maybe like this:

[code=php]
// First check that the whole block has the expected structure.
if (preg_match("/\=+synonyms\=+[\s|\n]*(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+\={2}[a-z]+\={2}/mi", $englishpart, $result))
{
    // Then pull out every [[...]] from the matched block.
    preg_match_all("/\[{2}(.*)\]{2}/i", $result[0], $result);
    //print_r($result);
    for ($i = 0; $i < count($result[1]); $i++)
    {
        echo $result[1][$i];
        // prints THIS IS RIGHT and this is wee right
    }
}
[/code]

@Dragonkai (author) Dec 06.2007 — The [\s|\n]* part of your code, couldn't it be replaced with a . and the dotall flag?

Please can you take me through the steps in your regex?

Hmm, for some reason my original regex shows no backslashes escaping the [ and the ]. So if it doesn't work, you know why.

Thanks for the code anyhow, I'll try it.

@andre4s_y Dec 06.2007 — As you wish...
[CODE]preg_match("/\=+synonyms\=+.*(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+\={2}[a-z]+\={2}/smi",$englishpart,$result)[/CODE]
There are many alternative patterns. I can make it flexible for you...

That \[ is my style.

I always escape any non-[a-zA-Z0-9] characters.

Play it safe.

@Dragonkai (author) Dec 06.2007 — Just want to clarify your code; tell me if I'm going wrong anywhere.

/\=+synonyms\=+.*(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+\={2}[a-z]+\={2}/smi

I understand the \=+synonyms\=+.*

It's basically saying ====synonyms==== with anything after that, including newlines. But why do you need to escape the equal (=) signs? I did not have to before.

Wouldn't this be acceptable: ^=+synonyms=+.*\*.*

That is saying the synonyms part, followed by anything, then an asterisk (*), because all the words I want start with an asterisk (*). However, there may be things between the asterisk (*) and the square brackets ([[ ]]), which is why I put another dot (.).

But this part I don't quite understand.

(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

It starts off with a bracket, meaning it's a subpattern. Then comes \*: is that the asterisk (*) I had in my previous pattern? Then \w* meaning words, then \s* meaning whitespace. But as I said before, those things are meaningless, so can you just have a .* for all that? Then it would be

(\*.*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

then here: \[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

It's saying \[, which I understand to be the "[" bracket. The {2} is there to tell us that there are two square [[ brackets, right? But wouldn't it be easier to just say \[\[? That uses 4 characters while the version above uses 5.

Thus \[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

becomes: \[\[[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

Then you have [\w|\s]*, which is exactly the same as [^\n\r]*, correct?

That then ends with the ]] for the \]{2}, and then it has \s*\w*\.\n*\s*)+

which is saying more whitespace and words, which can be substituted with a (.*).

Then \.\n*\s*)+, which has a ".", a newline and whitespace. Is this part required? The previous (.*) would have handled it, right?

)+\={2}[a-z]+\={2}/smi

Then you have that, which ends the subpattern, with the escaped equal signs and so on...

So this code ultimately becomes

/^=+synonyms=+.*\*.*\[\[[^\n\r]*\]\].*=+[a-z]*=+$/imsu

Which is exactly what I said in the beginning, only that the backslashes were missing in my original code.

Right?

But the

preg_match_all("/\[{2}(.*)\]{2}/i",$result[0],$result);
//print_r($result);
for($i=0; $i<count($result[1]);$i++)
{
echo $result[1][$i];
//print THIS IS RIGHT and this is wee right
}

This is the part that actually does the trick, right?

preg_match_all("/\[{2}(.*)\]{2}/i",$result[0],$result);

This part tells us to directly get the text inside the [[ and the ]], right? From the previous result[0]. I don't quite understand why it should be result[0]. I mean, doesn't the previous preg_match only match once? So does that mean it would match only one of the [[THIS IS RIGHT]], or would it match all of it, meaning it would get [[THIS IS RIGHT]] and also [[this is wee right]]?

If it matches both, then you can use the result[0]. Sorry, I'm still trying to understand the difference between preg_match and preg_match_all.

OK, here we have

for($i=0; $i<count($result[1]);$i++)
{
echo $result[1][$i];
//print THIS IS RIGHT and this is wee right
}

The $i<count($result[1]); part: "count" counts the number of elements in an array, right? So $result[1] is an array? Would $result be the array? Or does $result[1] have more elements in it, making it a multidimensional array? Where is this result coming from? The preg_match_all before it? If so, why are you using result[1]?

I mean, result[1] would be the second element of the matches ($result) in the preg_match_all, right? And it wouldn't be from the first preg_match. So why are you using [1]? Wouldn't that skip the first match of the preg_match_all? Are you saying that if you use result[0] it would cause an error, because result[0] from the previous preg_match is already being used as the subject of the preg_match_all?

Therefore, should there be different names for the matches, like "outerresult" and "innerresult"?

echo $result[1][$i];
//print THIS IS RIGHT and this is wee right

Then here it's saying echo $result[1][$i], meaning that $result[1] is an array and you're cycling through the array to print out its elements, which prints THIS IS RIGHT and this is wee right.

Again, I still don't understand where result[1] came from and how it became an array.

Are you saying it's because of the subpattern? Because I don't quite understand how that works.

Sorry for the long post.

@andre4s_y Dec 07.2007 — Step by step...

First, I need to say that I understand regex, but not all of it. So if there are other techniques, please let me know. We both learn something new here.

And there are many alternative patterns; I cannot close my eyes to that.

[QUOTE]
Just want to clarify your code; tell me if I'm going wrong anywhere.

It's basically saying ====synonyms==== with anything after that, including newlines. But why do you need to escape the equal (=) signs? I did not have to before.
[/QUOTE]

I have already mentioned that I always escape non-[a-zA-Z0-9] characters. The reasons are:

1. No problem occurs if I escape them.

2. It avoids confusion with metacharacters. The backslashes make me sure, not confused, about which characters I mean literally in the pattern.

[QUOTE]
Wouldn't this be acceptable: ^=+synonyms=+.*\*.*
[/QUOTE]

Yes...

But:

1. Why do you use the metacharacter ^? Is that necessary?

2. You can have:

[CODE]
\=+synonyms\=+.*
[/CODE]

With the s flag, that pattern already covers all of $englishpart. So is it really necessary to add \*.*? The answer is: not really. The pattern is longer because we need to make clear that after ====synonyms==== the next thing is \*[a-z] and so on.

Right? The metacharacter dot covers everything, but we need to be sure.

[QUOTE]
But this part I don't quite understand.

(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

It starts off with a bracket, meaning it's a subpattern. Then comes \*: is that the asterisk (*) I had in my previous pattern? Then \w* meaning words, then \s* meaning whitespace. But as I said before, those things are meaningless, so can you just have a .* for all that? Then it would be

(\*.*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+

then here: \[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+
[/QUOTE]

The reason I use the subpattern is that I see:

*blahblahblah [[THIS IS RIGHT]] ijtir.

*usdhty [[this is wee right]] ksdhfiudf.

Those are 2 lines with the same pattern.

The pattern is:

it starts with *, ends with ., and has two [s and two ]s in the middle.

Again, we need to spell that subpattern out. But if you want a super simple subpattern, you can write it as:

[CODE]
(\*.*\[\[.*\]\].*\.)+
[/CODE]

That pattern is enough to get:

*blahblahblah [[THIS IS RIGHT]] ijtir.

*usdhty [[this is wee right]] ksdhfiudf.
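As a rough check of the simple version, here is a sketch that drops the outer ( )+ and lets preg_match_all do the repeating instead, which is a swapped-in approach rather than the exact pattern above. It assumes the two starred items sit on separate lines and that the s flag is off, so a dot never crosses a newline. The variable names $text and $lines are made up for the example.

```php
<?php
// Assumed layout: each starred item is on its own line.
$text = "===synonyms====\n"
      . "*blahblahblah [[THIS IS RIGHT]] ijtir.\n"
      . "*usdhty [[this is wee right]] ksdhfiudf.\n"
      . "==bkshbdf===";

// No s flag: "." stops at a newline, so every match stays on one line.
preg_match_all('/\*.*\[\[.*\]\].*\./', $text, $lines);

foreach ($lines[0] as $line) {
    echo $line, "\n"; // each starred line, from * to the final dot
}
```

The heading and the ==bkshbdf=== line are skipped because neither contains a literal asterisk.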

There is another use for the subpattern; I will mention it below.

[QUOTE]
It's saying \[, which I understand to be the "[" bracket. The {2} is there to tell us that there are two square [[ brackets, right? But wouldn't it be easier to just say \[\[? That uses 4 characters while the version above uses 5.
[/QUOTE]
Yes, you are right... my alternative is one character longer. :p

[QUOTE]
Then you have [\w|\s]*, which is exactly the same as [^\n\r]*, correct?
[/QUOTE]
In this section, I do not really understand the meaning of [^\n\r].

Does that pattern mean: all characters except newline and carriage return?
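For what it's worth, the two classes are not interchangeable. A small sketch with made-up test strings (the anchors ^ and $ are added only so the whole string has to fit the class):

```php
<?php
// Inside [...] the | is a literal pipe, so [\w|\s] means
// "word character, pipe, or whitespace".
$a = preg_match('/^[\w|\s]*$/', "it's right"); // apostrophe is not in the class
$b = preg_match('/^[^\n\r]*$/', "it's right"); // anything but newline / carriage return
$c = preg_match('/^[\w|\s]*$/', "two\nlines"); // \s includes the newline
$d = preg_match('/^[^\n\r]*$/', "two\nlines"); // the newline is excluded

echo "$a $b $c $d\n"; // 0 1 1 0
```

So [^\n\r]* accepts punctuation but stops at a line break, while [\w|\s]* rejects punctuation but happily runs across lines.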

[QUOTE]
Then \.\n*\s*)+, which has a ".", a newline and whitespace. Is this part required? The previous (.*) would have handled it, right?
[/QUOTE]
The question is: do you need to require the literal dot at the end of the pattern? If not, then using the metacharacter . will be all right.

[QUOTE]
Which is exactly what I said in the beginning, only that the backslashes were missing in my original code.

Right?
[/QUOTE]
The only doubt in my mind is [^\n\r].

[QUOTE]
But the

preg_match_all("/\[{2}(.*)\]{2}/i",$result[0],$result);
//print_r($result);
for($i=0; $i<count($result[1]);$i++)
{
echo $result[1][$i];
//print THIS IS RIGHT and this is wee right
}

This is the part that actually does the trick, right?
[/QUOTE]
Right...

The first if only makes sure that the whole block fits the overall pattern.

[QUOTE]
preg_match_all("/\[{2}(.*)\]{2}/i",$result[0],$result);

This part tells us to directly get the text inside the [[ and the ]], right?
[/QUOTE]
Yosh...

Again, you can use \[\[ if you want to. My pattern has one extra character.

[QUOTE]
From the previous result[0]. I don't quite understand why it should be result[0]. I mean, doesn't the previous preg_match only match once? So does that mean it would match only one of the [[THIS IS RIGHT]], or would it match all of it, meaning it would get [[THIS IS RIGHT]] and also [[this is wee right]]?
[/QUOTE]
First:

$result[0] holds the text that matches the full pattern of the previous preg_match.

Second:

Yes, the previous pattern only matches once.

So the text in $result[0] is this:

===synonyms====
*blahblahblah [[THIS IS RIGHT]] ijtir.
*usdhty [[this is wee right]] ksdhfiudf.
==bkshbdf===

Again, its job is to validate the overall structure.

Third:

Remember that $result[1] is the text matched by the first subpattern. Because I have only one subpattern, there is no $result[2]. And when a subpattern repeats, the text that is kept is the last repetition.

Because of that, $result[1] contains:

*usdhty [[this is wee right]] ksdhfiudf.

IF... if...

If you double the subpattern, you will get $result[1], which contains *blahblahblah [[THIS IS RIGHT]] ijtir., and $result[2], which contains *usdhty [[this is wee right]] ksdhfiudf.

That is the usefulness of the subpattern.
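The "last repetition wins" behaviour can be seen with a deliberately simplified subpattern. This is a sketch, not the pattern from the thread: (\*[^\n]*\n) is a stand-in that just grabs one starred line, and $one and $two are hypothetical names.

```php
<?php
$text = "*blahblahblah [[THIS IS RIGHT]] ijtir.\n"
      . "*usdhty [[this is wee right]] ksdhfiudf.\n";

// One repeated group: $one[1] keeps only the last repetition.
preg_match('/(\*[^\n]*\n)+/', $text, $one);

// The same group written twice: each copy gets its own slot.
preg_match('/(\*[^\n]*\n)(\*[^\n]*\n)/', $text, $two);

echo $one[1]; // the *usdhty line
echo $two[1]; // the *blahblahblah line
echo $two[2]; // the *usdhty line
```

Doubling the group only works when the number of lines is known in advance, which is one reason the second step uses preg_match_all instead.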

Fourth:

From the PHP manual:

[QUOTE]
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match. preg_match_all() on the contrary will continue until it reaches the end of subject.
[/QUOTE]
This is why we need preg_match_all. If we use preg_match, the result is stuck at the first match. Savvy?
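A short sketch of that difference, using a hypothetical non-greedy pattern and a two-line sample (the names $first, $all, $m and $mAll are made up for the example):

```php
<?php
$text = "*blahblahblah [[THIS IS RIGHT]] ijtir.\n"
      . "*usdhty [[this is wee right]] ksdhfiudf.";
$pattern = '/\[\[(.*?)\]\]/';

$first = preg_match($pattern, $text, $m);        // stops after the first match
$all   = preg_match_all($pattern, $text, $mAll); // keeps going to the end of the subject

echo $first, ": ", $m[1], "\n";                  // 1: THIS IS RIGHT
echo $all, ": ", implode(", ", $mAll[1]), "\n";  // 2: THIS IS RIGHT, this is wee right
```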

end of preg_match.....

[QUOTE]
The $i<count($result[1]); part: "count" counts the number of elements in an array, right? So $result[1] is an array? Would $result be the array? Or does $result[1] have more elements in it, making it a multidimensional array? Where is this result coming from? The preg_match_all before it? If so, why are you using result[1]?

I mean, result[1] would be the second element of the matches ($result) in the preg_match_all, right? And it wouldn't be from the first preg_match. So why are you using [1]? Wouldn't that skip the first match of the preg_match_all? Are you saying that if you use result[0] it would cause an error, because result[0] from the previous preg_match is already being used as the subject of the preg_match_all?

Therefore, should there be different names for the matches, like "outerresult" and "innerresult"?
[/QUOTE]
This is where the preg_match_all function joins the "celebration". :p

Let me make clear how the preg_match_all() function works here...

The code is:

preg_match_all("/\[{2}(.*)\]{2}/i",$result[0],$result);

Why $result[0] is the subject, I hope you have already understood.

The function puts the text that matches the pattern into $result.

$result will contain 2 elements, because we have the full pattern and one subpattern.

$result[0] is the text that matches the full pattern.

Because we use preg_match_all, the function does not stop when it finds one match. It goes on until the end of the subject. So the function finds 2 matches, which makes $result[0] an array with 2 elements.

Next:

$result[1] is the text that matches the subpattern.

Again, because we use preg_match_all, the function does not stop at one match but goes on until the end of the subject. So $result[1] is an array too. The first element, $result[1][0], contains THIS IS RIGHT and the second, $result[1][1], contains this is wee right.
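The layout described above can be checked with a small sketch. It assumes the two items are on separate lines, since without the s flag the greedy (.*) cannot run past the end of a line. $subject is a made-up stand-in for the text matched by the first preg_match.

```php
<?php
// Stand-in for the block matched by the first preg_match.
$subject = "*blahblahblah [[THIS IS RIGHT]] ijtir.\n"
         . "*usdhty [[this is wee right]] ksdhfiudf.";

preg_match_all('/\[{2}(.*)\]{2}/i', $subject, $result);

// $result[0]: every full match, brackets included.
// $result[1]: what the first subpattern captured in each of those matches.
print_r($result[0]);
print_r($result[1]);
```

So $result is an array of arrays: the first index picks "full match or subpattern", the second index picks which match.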

The big NOTE here is:

This holds as long as you do not change the flags option of preg_match_all. The default flag is PREG_PATTERN_ORDER. If you set PREG_SET_ORDER, the story will be different.
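A sketch of how the story differs, again with a made-up two-line $subject. With PREG_SET_ORDER the two indexes swap roles: the first index picks the match, the second picks full match or subpattern.

```php
<?php
$subject = "*blahblahblah [[THIS IS RIGHT]] ijtir.\n"
         . "*usdhty [[this is wee right]] ksdhfiudf.";
$pattern = '/\[{2}(.*)\]{2}/i';

// Default (PREG_PATTERN_ORDER): grouped by subpattern, then by match.
preg_match_all($pattern, $subject, $byPattern);

// PREG_SET_ORDER: grouped by match, then by subpattern.
preg_match_all($pattern, $subject, $bySet, PREG_SET_ORDER);

echo $byPattern[1][0], "\n"; // THIS IS RIGHT
echo $bySet[0][1], "\n";     // THIS IS RIGHT (same text, indexes swapped)

foreach ($bySet as $set) {
    echo $set[1], "\n";      // the captured text of each match in turn
}
```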

The last thing is just processing it with echo.
----------

End of story here...

Phew... hard work...

[QUOTE]
echo $result[1][$i];
//print THIS IS RIGHT and this is wee right

Then here it's saying echo $result[1][$i], meaning that $result[1] is an array and you're cycling through the array to print out its elements, which prints THIS IS RIGHT and this is wee right.

Again, I still don't understand where result[1] came from and how it became an array.

Are you saying it's because of the subpattern? Because I don't quite understand how that works.

Sorry for the long post.
[/QUOTE]

I hope you finally understand now...

Are there any doubts left?

Peace,

Andre

NB:

Sorry for my bad English, and I may not be consistent in replying to your posts.

@andre4s_y Dec 07.2007 —
[QUOTE]
As you wish...
[CODE]preg_match("/\=+synonyms\=+.*(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+\={2}[a-z]+\={2}/smi",$englishpart,$result)[/CODE]
There are many alternative patterns. I can make it flexible for you...

That \[ is my style.

I always escape any non-[a-zA-Z0-9] characters.

Play it safe.
[/QUOTE]
Sorry...

I want to correct my earlier post.

\= is my style. You can use just =.

The correction is:

\[ cannot be a style, because \[ and [ have different meanings.
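A tiny sketch of that difference, with made-up one-character tests: the backslash is optional in front of =, but in front of [ it changes what the pattern means.

```php
<?php
$lit = preg_match('/^\[a\]$/', '[a]'); // \[ and \] are literal brackets
$cls = preg_match('/^[a]$/', 'a');     // [a] is a character class holding "a"
$no  = preg_match('/^[a]$/', '[a]');   // the class never matches a bracket
$eq  = preg_match('/^\==$/', '==');    // \= and = both mean a literal equal sign

echo "$lit $cls $no $eq\n"; // 1 1 0 1
```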

@Dragonkai (author) Dec 08.2007 — Oh...

It's starting to sink in.

But to make it more understandable, why don't I change the names for the matches? The first one will be "result" and the second one will be "finalresult".

How about that?

Man, that is pretty complicated; I'll have to reread this tomorrow. Thanks for all your work, you're the kind of person we need more of in the world.

@andre4s_y Dec 09.2007 —
[QUOTE]
But to make it more understandable, why don't I change the names for the matches? The first one will be "result" and the second one will be "finalresult".

How about that?
[/QUOTE]
Yeah...

Those names (result and finalresult) may work for you, but maybe not for others.

But if they help you, then it's OK.
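With separate names, the two steps might look like the sketch below. It assumes the backslash-escaped form of the first pattern from earlier in the thread and a sample string with the two starred items on separate lines; $finalresult is simply the name proposed above.

```php
<?php
$englishpart = "\n===synonyms====\n"
             . "*blahblahblah [[THIS IS RIGHT]] ijtir.\n"
             . "*usdhty [[this is wee right]] ksdhfiudf.\n"
             . "==bkshbdf===";

$outer = '/\=+synonyms\=+[\s|\n]*(\*\w*\s*\[{2}[\w|\s]*\]{2}\s*\w*\.\n*\s*)+\={2}[a-z]+\={2}/mi';

// Step 1: $result[0] is the whole synonyms block.
if (preg_match($outer, $englishpart, $result)) {
    // Step 2: a separate array for the [[...]] contents, so $result is not overwritten.
    preg_match_all('/\[{2}(.*)\]{2}/i', $result[0], $finalresult);
    foreach ($finalresult[1] as $word) {
        echo $word, "\n";
    }
}
```

Keeping $result intact also means the matched block is still available after the second call.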

[QUOTE]
Man, that is pretty complicated; I'll have to reread this tomorrow. Thanks for all your work, you're the kind of person we need more of in the world.
[/QUOTE]
You are welcome... :p
