When to use RegExp, and when not to

@scragar, Nov 05, 2007

[RANT]
OK, I think we are all guilty of abusing regular expressions from time to time, purely because they are quick to write. I think it's worth taking a few uses I have seen as examples and pointing out the correct way to achieve the same effect in each situation.
[B]/^[a-z]*$/[/B]
Yes, there is a faster way of checking that a string contains only the letters a-z, although it may take longer to type:

[code=php](strspn($STR, "abcdefghijklmnopqrstuvwxyz") == strlen($STR))[/code]

What's more, this function becomes faster relative to the regular expression as the list of valid characters grows:

[code=php](strspn($STR, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789") == strlen($STR))[/code]

This runs significantly quicker than the equivalent regular expression.
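As a rough sketch, the check above can be wrapped in a small helper. The name only_chars is made up for this example, and the type declarations assume PHP 7 or later. Like /^[a-z]*$/, it accepts an empty string.

[code=php]
<?php
// Hypothetical helper (not from any library): true when every byte of $str
// appears in $allowed. An empty $str passes, as it would with /^[a-z]*$/.
function only_chars(string $str, string $allowed): bool
{
    // strspn() returns the length of the leading run made up of $allowed bytes
    return strspn($str, $allowed) === strlen($str);
}

var_dump(only_chars('hello', 'abcdefghijklmnopqrstuvwxyz'));  // bool(true)
var_dump(only_chars('hel-lo', 'abcdefghijklmnopqrstuvwxyz')); // bool(false)
[/code]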
[b]/^[^@]{1,63}@[^@]{1,255}$/[/b]
Very nice: test the length of 2 strings using a regular expression. Never mind that the strlen function was created for this very thing; we can write a regular expression for it, so it will all be fine. Quite why the writer never thought of splitting the string on the @ symbol and then making 3 length checks (one on the resulting array, one on each half), I do not know. It sounds like a longer check, but it STILL works out to be more efficient.
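A minimal sketch of that split-and-measure approach, assuming PHP 7 or later. The function name address_lengths_ok is invented for the example, and the limits are simply copied from the pattern above. For ordinary input it should agree with the regular expression, though edge cases such as a trailing newline may differ.

[code=php]
<?php
// Hypothetical helper: applies the same length limits as the pattern above
// (1-63 bytes before the @, 1-255 bytes after it, exactly one @).
function address_lengths_ok(string $addr): bool
{
    $parts = explode('@', $addr);
    if (count($parts) !== 2) {
        return false; // no @ at all, or more than one
    }
    $local  = strlen($parts[0]);
    $domain = strlen($parts[1]);
    return $local >= 1 && $local <= 63 && $domain >= 1 && $domain <= 255;
}

var_dump(address_lengths_ok('user@example.com')); // bool(true)
var_dump(address_lengths_ok('a@b@c'));            // bool(false)
[/code]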
[b]preg_replace("/ \/>/", ">", $str)[/b]
Sorry, but this is just silly. To drop the slash, just use a plain str_replace call.
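For illustration, the same replacement with str_replace might look like this (the sample markup in $str is made up for the example):

[code=php]
<?php
$str = '<br /><img src="a.png" />';
// Plain substring replacement: no pattern to compile, no delimiter to escape
echo str_replace(' />', '>', $str); // <br><img src="a.png">
[/code]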
[/RANT]
There are tons more examples of this sort of abuse online (being taught as if this is how it should be done), but these are the worst offenders I could find (2 of which I only found this morning; congratulations, roScripts).


@MrCoder, Nov 05, 2007: Nice rant, but are they really faster, and if so by how much?

@scragar (author), Nov 05, 2007: My results for strspn + strlen vs preg_match, each doing 1000 tests on strings that increase in length every time:

STRSPN:
 time taken: 0.021502017974854
 
 PREG:
 time taken: 0.026623010635376

That's about 1/5th faster, give or take, with a larger difference on long strings. I can provide more info on the actual tests if needed (I have a long page of results listing the string checked, the time taken for each test and so on).
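The test script itself is not posted in this thread. A comparison along those lines could be set up roughly as follows; the helper name time_it, the growth of one byte per round and the use of closures are all assumptions of this sketch, and the closure call adds some overhead to both sides.

[code=php]
<?php
// Hypothetical harness: runs $check against strings of growing length
// and returns the total elapsed time in seconds.
function time_it(callable $check, int $rounds = 1000): float
{
    $start = microtime(true);
    for ($i = 1; $i <= $rounds; $i++) {
        $check(str_repeat('a', $i)); // the string grows by one byte per round
    }
    return microtime(true) - $start;
}

$letters = 'abcdefghijklmnopqrstuvwxyz';

$strspnTime = time_it(function ($s) use ($letters) {
    return strspn($s, $letters) === strlen($s);
});
$pregTime = time_it(function ($s) {
    return preg_match('/^[a-z]*$/', $s) === 1;
});

printf("STRSPN: %.6f s\nPREG:   %.6f s\n", $strspnTime, $pregTime);
[/code]

Timings will vary with PHP version and hardware, so treat any single run as indicative only.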

@TJ111, Nov 05, 2007: For kicks I profiled two scripts. Here are the results:

[code=php]
 <?php
 $STR = "test";
 print (strspn($STR,"abcdefghijklmnopqrstuvwxyz") == strlen($STR)) ? "success" : "fail";
 
 //this script took 0.029 ms to execute
 ?>[/code]

[code=php]
 <?php
 $str = "test";
 print (preg_match('/^[a-z]*$/', $str)) ? "success" : "fail";
 
 //this script took 0.368 ms to execute
 //that's more than a 10x decrease in performance
 ?>[/code]

I was rather surprised at how big a difference there was between such small scripts. I knew the regexp engine was slower, but not by that much. Personally, I only really use regular expressions for validation or for changing complicated strings; otherwise I just use string functions.

@scragar (author), Nov 05, 2007: Mind that as the string gets larger, preg really starts to show its stuff. However, it was never able to catch up completely in my tests; it was always slightly behind, even with incredibly large strings (in excess of 5,000 characters).

@NogDog, Nov 05, 2007:

[code=php]
 if(ctype_alpha($string)) {
 [/code]

...is functionally equivalent to...

[code=php]
 if(preg_match('/^[a-z]+$/i', $string)) {
 [/code]

...and is both shorter to type and faster to execute.
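One edge case worth knowing about, shown as a quick sketch: without the D modifier, $ in PCRE also matches just before a final newline, so the two checks disagree on input that ends in "\n". This assumes the ctype extension is available (it is enabled by default).

[code=php]
<?php
var_dump(ctype_alpha("abc"));                   // bool(true)
var_dump(preg_match('/^[a-z]+$/i', "abc"));     // int(1)

// Trailing newline: ctype_alpha rejects it, the plain pattern lets it through
var_dump(ctype_alpha("abc\n"));                 // bool(false)
var_dump(preg_match('/^[a-z]+$/i', "abc\n"));   // int(1)

// Adding D (PCRE_DOLLAR_ENDONLY) makes $ match only at the very end
var_dump(preg_match('/^[a-z]+$/iD', "abc\n"));  // int(0)
[/code]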

@bokeh, Nov 10, 2007: Regex is faster by miles.

[code=php]<?php
 
 microtime(true); # initialize
 
 $string='abcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyz';
 
 $start = microtime(true);
 $regex = '/^[a-z]+$/i';
 for($i=0; $i<1000; $i++)
 {
 (preg_match($regex, $string));
 }
 echo "Regex method: ".(microtime(true)-$start)."<br>\n";
 
 $start = microtime(true);
 for($i=0; $i<1000; $i++)
 {
 (strspn($string,"abcdefghijklmnopqrstuvwxyz") == strlen($string)) ;
 }
 echo "Scrager method: ".(microtime(true)-$start)."<br>\n";
 
 ?>[/code]

Result:

[CODE]Regex method:   0.003291130065918
 Scrager method: 0.012798070907593[/CODE]

But if you change the string so it has a bad character early on, Scrager's method doesn't seem quite so lazy:

[code=php]$string='a-bcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyzabcdefgijklmnopqrstuvwxyz';
 
 [/code]

Result:

[CODE]Regex method:   0.0022270679473877
 Scrager method: 0.0011720657348633[/CODE]
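A likely reason for the early-failure result, sketched here rather than measured: strspn only scans until it meets the first byte outside the allowed set, so a bad character near the start ends the scan almost immediately. The short strings below are stand-ins for the long test string.

[code=php]
<?php
$letters = 'abcdefghijklmnopqrstuvwxyz';

// The scan stops at the first byte that is not in $letters
var_dump(strspn('a-bcdef', $letters)); // int(1)
var_dump(strspn('abcdef-', $letters)); // int(6)
[/code]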
