
Interesting RegExp dilemma

I have a string with a lot of parentheses, which may contain other nested parentheses, and so on:

[code]
var str='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';
[/code]

Now I want to use a regular expression to remove all the parentheses along with everything inside them, nested substrings included. The text above should become:

[code]
var finalstr='Four score our fathers brought this continent';
[/code]

The first thing that came to mind was a non-greedy RegExp to match everything between parentheses:

[code]
/\(.*?\)/g
[/code]

But that works only if there are no other parentheses [I]nested inside the parentheses[/I], so that:

[code]
str=str.replace(/\(.*?\)/g,'');
[/code]

returns
'Four score [COLOR="Red"]ago)[/COLOR] our fathers brought this continent'

which is logical, as the replaced substrings are:

'Four score [COLOR="Red"]([/COLOR][COLOR="Blue"]and seven (years[/COLOR][COLOR="Red"])[/COLOR] ago) our fathers brought[COLOR="Red"] ([/COLOR][COLOR="Blue"]forth onto[/COLOR][COLOR="Red"])[/COLOR] this continent'

but this is not what I want.

I cannot see how I could replace the innermost [COLOR="Blue"]\(.*?\)[/COLOR] first and then repeat the process, "climbing" to the next pair of parent delimiters.

Any ideas?
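
For reference, the behaviour described in the question can be reproduced with a short snippet. This is only an illustrative sketch: the names `sample`, `lazyMatches` and `lazyResult` are made up for the example, and it assumes the pattern meant in the question is the escaped form `/\(.*?\)/g`.

```javascript
// Illustrative sketch: what the lazy pattern actually matches.
const sample = 'Four score (and seven (years) ago) our fathers brought (forth onto) this continent';

// Each match starts at a "(" and stops at the FIRST ")" that follows it,
// so the outer group is cut short at the inner closing parenthesis.
const lazyMatches = sample.match(/\(.*?\)/g);
console.log(lazyMatches);

// Removing those matches leaves the tail of the outer group (" ago)") behind.
const lazyResult = sample.replace(/\(.*?\)/g, '');
console.log(lazyResult);
```

The leftover `ago)` is the part of the outer group that follows the first closing parenthesis.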

JavaScript

16 Comments

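@Kor (author) Jul 23.2010 — I think I found a way. I am not sure it is the best possible solution, but it works, as far as I have seen:
[code]
<script type="text/javascript">
var str='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';
// Repeat while any parenthesized group is left; each pass removes the innermost groups.
// Caution: the "." after "\(" requires at least one character inside the group,
// so an empty pair "()" may never be removed and the loop may not terminate on such input.
while(str.match(/\(.*?\)/g)){
str=str.replace(/\(.[^(]*?\)/g,'');
}
</script>
[/code]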


Does anyone see a better approach?
@Declan1991 Jul 23.2010 — I remember having this exact same problem with tags. If I nested tags with the same tag name (e.g. div), I couldn't tell one closing tag from another.

The other problem I had was that, whether you remove the outermost parentheses or the innermost, you have to do a separate replace for each level of parentheses. My solution was normally something like this:
[code]
str = str.replace(/\([^()]*\)/g,"");
[/code]
But as I say, you need to repeat that on the string for each layer of parentheses. So this (ignoring the fact that too many spaces are left behind) would be my solution:
[code]
var str='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';
var reg = /\([^()]*\)/g;
// Each pass removes the innermost groups; stop when none are left.
while (reg.test(str)) {
str = str.replace(reg,"");
}
[/code]
I feel there has to be a better, more elegant way, though. Perhaps not using regular expressions at all, but iterating over the string and deleting characters, would be better: an optimised version of this, wrapped in a function, for example.
[code]
var str='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';
var par = 0, retstring = "";
for (var i = 0; i < str.length; i++) {
// Entering a group: increase the depth before deciding whether to copy.
if (str.charAt(i) == '(') {
par++;
}
// Copy the character only when we are outside every group.
if (!par) {
retstring+=str.charAt(i);
}
// Leaving a group: decrease the depth after the copy check, so ")" is dropped too.
if (str.charAt(i) == ')') {
par--;
}
}
[/code]


EDIT: You came to the same conclusion while I was writing.
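
The "optimised version in a function" mentioned above could look roughly like the sketch below. The function name `stripParenthesized` is hypothetical, and two choices are assumptions that go beyond the post: an unmatched `)` is silently dropped, and runs of whitespace are collapsed to address the leftover spaces.

```javascript
// A minimal sketch of the depth-counter idea, wrapped in a function.
// Assumptions (not from the thread): unmatched ")" is dropped,
// and leftover runs of whitespace are collapsed into single spaces.
function stripParenthesized(text) {
  let depth = 0;   // how many "(" are currently open
  let out = '';
  for (const ch of text) {
    if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      if (depth > 0) depth--;   // ignore a stray ")" at depth 0
    } else if (depth === 0) {
      out += ch;                // keep only characters outside every group
    }
  }
  return out.replace(/\s+/g, ' ').trim();
}

console.log(stripParenthesized(
  'Four score (and seven (years) ago) our fathers brought (forth onto) this continent'
));
```

Because it tracks depth with a counter, a single pass handles any nesting depth, and sibling groups inside a parent group need no special treatment.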
@Kor (author) Jul 23.2010 — Yes, I came to the same conclusion. Thank you very much for your answer. I also think that classical JavaScript string methods can sometimes run faster than RegExp. But RegExps are soooo nice :)
@Declan1991 Jul 23.2010 — I'm a sucker too. I get so sick of fiddling with strings (or character arrays finishing in '\0' if you insist :p) in C sometimes, it's such a relief to come to JavaScript and be able to pass functions to String.replace!
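
Passing a function to `String.prototype.replace` also gives a way to handle nesting in a single call. The sketch below is one possible take, not something posted in the thread; the name `stripWithCallback` is hypothetical, and dropping an unmatched `)` is an assumption.

```javascript
// A sketch: one replace call, with a callback that tracks nesting depth.
// The pattern splits the input into single parentheses and runs of other text.
function stripWithCallback(text) {
  let depth = 0;
  return text.replace(/[()]|[^()]+/g, (token) => {
    if (token === '(') {
      depth++;
      return '';
    }
    if (token === ')') {
      if (depth > 0) depth--;   // assumption: a stray ")" is simply dropped
      return '';
    }
    // Ordinary text survives only when no group is open.
    return depth === 0 ? token : '';
  });
}

console.log(stripWithCallback(
  'Four score (and seven (years) ago) our fathers brought (forth onto) this continent'
));
```

Note that the spaces on both sides of a removed group are kept, so the result contains double spaces where a group used to be.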
@Ay__351_e Jul 23.2010 —
[code]
<script type="text/javascript">

var str='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';

var s = str.replace(/\(.*?(?:\(.*?\).*?)?\).*?/g,"");

alert(s); // Four score our fathers brought this continent

var m = str.match(/\(.*?(?:\(.*?\).*?)?\).*?/g);

alert(m[0]); // (and seven (years) ago)

alert(m[1]); // (forth onto)

</script>
[/code]
@rnd_me Jul 23.2010 — [CODE]
// Returns [start, end] for the first balanced A...B group in str
// (start also covers the character just before A), or 0 if none is found.
function findBetween(A, B, str, charLimit) {
var opens = 0, closes = 0;
charLimit=charLimit||6000;
for (var i = 0; i < charLimit; i++) {
var cc = str.charAt(i);
if (cc === A) {
opens++;
}
if (cc === B) {
closes++;
}
if (opens && closes && opens <= closes) {
// Math.max keeps start from going negative when A is the first character.
return [Math.max(str.indexOf(A) - 1, 0), i + 1];
}
if (!cc) {
return 0;
}
}
return 0;
}


function censor(str){
var r;
// Cut out one group per pass; stop as soon as findBetween reports none left.
// (Indexing into the 0 returned on "not found" would duplicate the string.)
while((r=findBetween("(", ")", str))){
str=str.slice(0,r[0])+str.slice(r[1]);
}
return str;
}



var str='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';
var finalstr='Four score our fathers brought this continent';

alert(censor(str) === finalstr) //==true[/CODE]
@Declan1991 Jul 23.2010 — Ayşe's code is limited to a fixed nesting depth. If you nest parentheses more than two levels deep, it won't work properly.
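
The depth limit can be checked with a quick experiment. This is a sketch under one assumption: that the pattern posted earlier reads `/\(.*?(?:\(.*?\).*?)?\).*?/g` once its escapes are in place. The names `fixedDepth`, `twoLevels` and `threeLevels` are made up for the example.

```javascript
// Sketch: a pattern with one optional inner group covers two levels only.
// The escaped form below is an assumed reading of the pattern in the thread.
const fixedDepth = /\(.*?(?:\(.*?\).*?)?\).*?/g;

const twoLevels = 'a (b (c) d) e';
const threeLevels = 'a (b (c (d) e) f) g';

const twoResult = twoLevels.replace(fixedDepth, '');
const threeResult = threeLevels.replace(fixedDepth, '');

console.log(twoResult);    // the whole two-level group is removed
console.log(threeResult);  // the tail of the outermost group is left behind
```

With three levels the match ends at the second closing parenthesis, so ` f)` survives in the output.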
@mrhoo Jul 24.2010 — You may have already thought of this. It's based on the 'inside-out' idea you suggested at the start of the thread.

[CODE]function stripParenths(s){
var pat=/[)(]/g, rx=/\s*\([^)(]*\)\s*/;
while(pat.test(s)){
s= s.split(rx).join(' ');
}
return s.replace(pat,'');
}[/CODE]

//test
[CODE]var str= 'Four score (and (seven (years)) ago) our fathers brought (forth unto) this continent';
stripParenths(str)[/CODE]

/* returned value: (String)
Four score our fathers brought this continent
*/
@Ay__351_e Jul 25.2010 —
[code]
<script type="text/javascript">

var s='Four score (and seven (years) ago) our fathers brought (forth onto) this continent';

var re = /(^[^)(]*(?=\()|\)[^)(]*\(|\)[^)(]*$)/g;

var m = s.match(re).join("").replace(/[)(]/g,"");

alert(m); // Four score our fathers brought this continent

</script>
[/code]

@Ay__351_e Jul 30.2010 — The code I wrote earlier was not very good, so I wrote new code. I think this version is not limited by nesting depth.

[code]
<script type="text/javascript">

var s='Four (two(score (and) seven (years) ago our) fathers) brought (forth onto) this continent';

s = s.replace(/\([^)(]*\)/g,"").replace(/[^)(]*\)/g,")").replace(/\([^)(]*/g,"(").replace(/[)(]*/g,"");

alert(s); // Four brought this continent

</script>
[/code]
@Ay__351_e Jul 31.2010 — The code I wrote without a loop did not give the desired result.

To get the desired result, a loop must be used. That resolved the problem for me.
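
The loop-based approach arrived at here can be sketched as a small function. The name `stripInnermostFirst` is hypothetical. Stopping when a pass changes nothing, rather than testing for remaining parentheses, is a design choice made for this sketch so that unbalanced input cannot cause an endless loop.

```javascript
// Sketch: remove the innermost groups repeatedly until nothing changes.
// One pass is needed per nesting level, plus a final pass that detects no change.
function stripInnermostFirst(text) {
  const innermost = /\([^()]*\)/g;   // a group that contains no other parentheses
  let previous;
  do {
    previous = text;
    text = text.replace(innermost, '');
  } while (text !== previous);
  return text;
}

console.log(stripInnermostFirst(
  'Four (two(score (and) seven (years) ago our) fathers) brought (forth onto) this continent'
));
```

As with the other regex versions in this thread, the spaces around each removed group remain in the result.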
@Jona Jul 31.2010 — I'm sure a RegExp can achieve literally anything. I only fiddled for a couple of minutes, but this is a starting place, I believe. Of course, it assumes that you're always dealing with words and parentheses; I haven't tested it with a string other than the one shown below.

One caveat: words that have trailing letters without a space or non-word character get truncated. For example, the string "o(rang)e" would turn into "o" instead of "oe." Adding a space so that it becomes "o(rang) e" produces the expected result, though.

[b]EDIT:[/b] Fixed the aforementioned caveat. The following RegExp should work. Let me know. :)

[code]
var s = 'Four score (and (seven (ye(a)rs)) ago) our fathers brought (forth unto) (this (is) (a (nother) )) continent (blah) ble(e) o(rang)e asdf asdf asdf asdf () asdf (hello) world (w)o((lds))a';

var x = /(\([^)]*\)[)]*)*(?:\w*\))*/g;

var r = s.replace(x,'');

console.log(r);
[/code]
@Ay__351_e Aug 01.2010 — Jona,

Thank you for your code.

I tried your code, and it works.
[CODE]
<script type="text/javascript">


var s = 'Four score (and (seven (ye(a)rs)) ago) our fathers brought (forth unto) (this (is) (a (nother) )) continent (blah) ble(e) o(rang)e asdf asdf asdf asdf () asdf (hello) world (w)o((lds))a';

var x = /(\([^)]*\)[)]*)*(?:\w*\))*/g;

var r = s.replace(x,'');

alert(r); // Four score our fathers brought continent ble oe asdf asdf asdf asdf asdf world oa


</script>
[/CODE]

I wrote some new code. It worked.
[CODE]
<script type="text/javascript">
// http://www.webdeveloper.com/forum/showthread.php?t=233252


var s = 'Four(score one (and (seven eaea (ye(a) rs aaabcd)) vvvvv ago) our fathers) brought (forth unto) (this (is) (a (nother) )) continent (blah) ble(e) o(rang)e asdf asdf asdf asdf () asdf (hello) world (w)o((lds))a';

var n =/([^(]*\))/g;

var b = s.match(n);
alert(b);

var c = /\([^)]*/g;

var d = s.match(c);

alert(d);

var f = /(\([^)]*)([^(]*\))/g;

var h = s.match(f);

alert(h);

var s = s.replace(f,"");

alert(s); // Four brought continent ble oe asdf asdf asdf asdf asdf world oa

</script>
[/CODE]

final code:
[CODE]
<script type="text/javascript">
// http://www.webdeveloper.com/forum/showthread.php?t=233252


var s = 'Four(score one (and (seven eaea (ye(a) rs aaabcd)) vvvvv ago) our fathers) brought (forth unto) (this (is) (a (nother) )) continent (blah) ble(e) o(rang)e asdf asdf asdf asdf () asdf (hello) world (w)o((lds))a';

var s = s.replace(/(\([^)]*)([^(]*\))/g,"");

alert(s); // Four brought continent ble oe asdf asdf asdf asdf asdf world oa

</script>
[/CODE]
@Kor (author) Aug 01.2010 — Thank you all for the various ideas. Thanks to a forum we can all keep learning new approaches. :)
@Declan1991 Aug 01.2010 — What's interesting too is the complexity of the final solution. The working RegExp solutions started off really complex, with lookahead and limitations, while the final solution is much simpler! Almost an example of Occam's Razor.
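
One caveat may be worth checking before relying on the shorter pattern. Assuming the final pattern in this thread reads `/(\([^)]*)([^(]*\))/g` once its escapes are in place (that reading is a reconstruction), it appears to leave text behind when a parent group contains two sibling groups with text between them. The names `finalPattern`, `siblings` and `siblingResult` are made up for this sketch.

```javascript
// Sketch: the single-pass pattern applied to a group with two sibling groups.
// The escaped form is an assumed reading of the final pattern in the thread.
const finalPattern = /(\([^)]*)([^(]*\))/g;

const siblings = 'x (a (b) c (d) e) y';
const siblingResult = siblings.replace(finalPattern, '');

// The first match stops after "(a (b)" and the second covers "(d) e)",
// so the " c " between the siblings is kept.
console.log(siblingResult);
```

A depth counter or the repeated innermost-first replace discussed earlier in the thread does not have this limitation.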
@Jona Aug 01.2010 — [QUOTE]Thank you all for the various ideas. Thanks to a forum we can all keep learning new approaches. :)[/QUOTE]

I like a Regexp challenge. ;-)

Besides, I don't think there's ever a time when a RegExp [i]and[/i] a loop should be used. It's generally faster to use other operations in a loop, or use a RegExp, but not both. It would truly be a shame for your application to have a performance bottleneck on a low-level operation such as a string replace.