
preg_match of string with embedded newlines

A fairly simple problem that I've been scratching my head over, trying regex patterns for too long.

Problem: Determine if a message is a forwarded message (Subject: FW.*). If so, strip off everything above (and including) that line and return the rest.

Constraints: Must work for both Windoz and Unix newlines ('\r\n' and '\n' respectively).

I have:

[CODE]
$fwdPat = '/.*subject:[ ]*fw.*[\\r\\n](.*)/msi';

if (preg_match($fwdPat, $msgBuf, $results)) {
    $msgBuf = $results[1];
}
return $msgBuf;
[/CODE]

I get the match, but everything is returned in the pattern match ($results[0]) and nothing in the substring match ($results[1]).

I’m sure it’s simple, I just can’t see it.

Thanks in advance for any pointers.

tony

PHP

6 Comments

@bokeh, Sep 29, 2006: This is caused by improper control over greediness. For more help, show me the text of the first few lines of the email.

[QUOTE]Constraints: Must work for both Windoz and Unix newlines ('\r\n' and '\n' respectively).[/QUOTE]
<CRLF> is the standard as per the RFC, irrespective of the platform.
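
To illustrate the greediness diagnosis, here is a minimal sketch that runs the pattern from the question against a made-up message. The sample text is an assumption, not the real email, and the pattern is written with single backslashes, which is equivalent inside single quotes. Because of the 's' modifier, the `.*` after `fw` runs to the end of the message and only backs off to the last newline, so the capture group is left with nothing.

```php
<?php
// Hypothetical sample message with Windows (CRLF) line endings.
$msgBuf = "To: Tony\r\nSubject: FW: Order 22619\r\n\r\nFrom: Patty\r\nBody text\r\n";

// The pattern from the question.
$fwdPat = '/.*subject:[ ]*fw.*[\r\n](.*)/msi';

if (preg_match($fwdPat, $msgBuf, $results)) {
    // "fw.*" swallows the rest of the message, gives back only the final
    // newline to satisfy [\r\n], and leaves an empty string for (.*).
    var_dump($results[0] === $msgBuf); // bool(true): the whole message matched
    var_dump($results[1]);             // string(0) ""
}
```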

@bokeh, Sep 29, 2006: Probably something like this:
[code=php]$fwdPat = '/^.*subject:\s*fw[^\r\n]*\r?\n(.*)$/msi';[/code]
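
As a rough check of the suggested pattern against both newline styles, the sketch below wraps it in a helper. The function name `stripForwardHeader` and the sample messages are assumptions made for illustration only. The negated class `[^\r\n]*` stops at the end of the Subject line, and `\r?\n` then consumes either a CRLF or a bare LF.

```php
<?php
// Hypothetical helper; only the pattern itself comes from the thread.
function stripForwardHeader(string $msgBuf): string
{
    $fwdPat = '/^.*subject:\s*fw[^\r\n]*\r?\n(.*)$/msi';
    if (preg_match($fwdPat, $msgBuf, $results)) {
        // Everything after the "Subject: FW..." line.
        return $results[1];
    }
    // Not a forwarded message: return it unchanged.
    return $msgBuf;
}

// Made-up sample messages.
$windows = "To: Tony\r\nSubject: FW: Order 22619\r\n\r\nFrom: Patty\r\n";
$unix    = "To: Tony\nSubject: FW: Order 22619\n\nFrom: Patty\n";
$plain   = "To: Tony\nSubject: Order 22619\n\nHello\n";

var_dump(stripForwardHeader($windows) === "\r\nFrom: Patty\r\n"); // bool(true)
var_dump(stripForwardHeader($unix) === "\nFrom: Patty\n");        // bool(true)
var_dump(stripForwardHeader($plain) === $plain);                  // bool(true)
```

Note that the leading `.*` is still greedy, so if the message contained several "Subject: FW" lines, this sketch would cut at the last one.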

@tbirnseth (author), Sep 29, 2006: I assume the greediness comes from using both the 'm' and 's' modifiers...

Here's part of the message. Note: I can't rely on the -----Original Message----- line. This particular message is from a Windoz environment, hence there are both '\r' and '\n' as EOL characters.

From: Orders [[email protected]]

Sent: Thursday, September 28, 2006 9:30 PM

To: Tony Birnseth

Subject: FW: Order 22619 from catalog smallelectrics



-----Original Message-----

From: Patty Person (through Yahoo! Store Order System) [mailto:[email protected]]

Sent: Thursday, September 28, 2006 2:30 PM

To: [email][email protected][/email]

Subject: Order 22619 from catalog smallelectrics

Date Thu Sep 28 14:29:31 PDT 2006

@NogDog, Sep 29, 2006: Also note that since you are single-quoting your regex, you don't need to escape the backslashes before your r and n escape sequences. (Bokeh correctly changed them; I just wanted to point it out explicitly in case you didn't catch that.)

@tbirnseth (author), Sep 29, 2006: The pattern works. I hadn't thought of negating the '\r' and '\n' characters and then explicitly matching them at the end.

Thanks for the help. Now my head will stop bleeding!

tony

@bokeh, Sep 29, 2006: The pattern in post #3 seems to work.

[QUOTE]Assume the greediness is the use of both 'm' and 's' modifiers.[/QUOTE]
"[I]*[/I]" is a greedy quantifier by default, whereas "[I]*?[/I]" is lazy. This means they match as much (greedy) or as little (lazy) as possible. If there is only one possible match, both will find that one match, but if there is more than one possible match, each will find a different one.
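
A small sketch of that difference, using a made-up two-line input (the sample text is an assumption): with two candidate "Subject:" positions, the greedy form stops at the last one and the lazy form at the first.

```php
<?php
// Hypothetical input with two "Subject:" headers.
$text = "Subject: first\nSubject: second\n";

// Greedy: ".*" takes as much as it can, so the match ends at the LAST "Subject:".
preg_match('/.*Subject:/s', $text, $greedy);

// Lazy: ".*?" takes as little as it can, so the match ends at the FIRST "Subject:".
preg_match('/.*?Subject:/s', $text, $lazy);

var_dump(strlen($greedy[0])); // int(23)
var_dump(strlen($lazy[0]));   // int(8)
```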

[QUOTE]you don't need to escape the back-slashes before your r and n escape sequences.[/QUOTE]
That's right, but in this instance the end result is the same:
[code=php]<?php

echo '\\r\\n'; # \r\n

echo '<br>';

echo '\r\n'; # \r\n

?>[/code]