
Regex help: match, ignore until, match

I’m pretty new to regex matching, but I’ve been fairly successful using it to identify and eliminate a lot of spam sent to an email address I’d like to keep, but which has unfortunately gotten out into the SPAM-A-LOT universe. For years I’d look at the headers, find the original “from” IP address, and figure out the range of IPs I needed to screen (for example, the whole range associated with some Vietnam server, if I have no friends there). I would then create a regex string to match. For example, if I found something like this in the header…

[CODE]from [171.232.66.71] blah blah blah…[/CODE]

and the range of IPs was

[CODE]171.224.0.0 – 171.255.255.255[/CODE]

then I might catch it with a regex string like…

[CODE]from \[171\.(?:22[4-9]|2[3-5][0-9])\.[/CODE]

Of course that’s not really complete, but it worked, and when a message fails I actually return a failure message offering a link to a mail form, in case some friendly email got mistakenly tagged.
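For anyone who wants to try the range match outside a mail filter, here is a minimal Python sketch. It assumes the pattern originally had backslashes before the literal "[" and the dots (the forum display appears to drop them), and the sample header lines and the name `in_blocked_range` are made up for illustration.

```python
import re

# Pattern from the post, with the literal "[" and the dots escaped.
# First octet must be 171, second octet must be 224-255.
RANGE_RE = re.compile(r"from \[171\.(?:22[4-9]|2[3-5][0-9])\.")

def in_blocked_range(header: str) -> bool:
    """Return True if the header has a bracketed IP in 171.224.0.0 - 171.255.255.255."""
    return RANGE_RE.search(header) is not None

print(in_blocked_range("from [171.232.66.71] blah blah"))  # True
print(in_blocked_range("from [171.223.66.71] blah blah"))  # False (223 is below the range)
print(in_blocked_range("from [171.255.1.2] blah blah"))    # True
```

Note that `2[3-5][0-9]` would also accept 256 to 259, which is harmless here because those are not valid octets.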

So this was working for years, but something has recently changed in the headers I see in much of my spam. Now when I look at the headers, I might see a similar IP address in one of them that looks something like this…

[CODE]from [email protected] ([171.232.66.71]) blah blah blah…[/CODE]

Apparently it’s some kind of authentication where the originating email address is included in the “from” string. Well, I can easily alter my regex to handle either the “([” or the “[” case. I guess there are many ways, but I could precede my IP address criteria with something like

[CODE](?:\[|\(\[)[/CODE]

That would handle either “[” or “([” before the IP address. BUT, what I’d rather do is start by matching the literal “from”, and then ignore any number of characters until either the “[” or the “([” is found. That way, any header with an email authentication in the from field could still be “caught” regardless of what the email address is.
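To make the gap concrete, here is a rough Python sketch of the alternation idea. It assumes escaped brackets and parentheses in the pattern, and the address `sender@example.com` is a hypothetical stand-in for whatever address appears in the real header. It suggests the alternation copes with both bracket styles, but a pattern that still expects the bracket right after the literal "from " misses the newer header.

```python
import re

IP_PART = r"171\.(?:22[4-9]|2[3-5][0-9])\."

# Either "[" or "([" immediately before the IP.
ALT_RE = re.compile(r"(?:\[|\(\[)" + IP_PART)
# Same alternation, but required directly after the literal "from ".
ANCHORED_RE = re.compile(r"from (?:\[|\(\[)" + IP_PART)

old_style = "from [171.232.66.71] blah"
new_style = "from sender@example.com ([171.232.66.71]) blah"  # hypothetical address

print(bool(ALT_RE.search(old_style)), bool(ALT_RE.search(new_style)))            # True True
print(bool(ANCHORED_RE.search(old_style)), bool(ANCHORED_RE.search(new_style)))  # True False
```

The alternation could also be written as `\(?\[`, an optional "(" followed by "[", which matches the same two cases.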

So the question is, how do you “IGNORE UNTIL” as my post title suggests…
MATCH (literal “from”)
IGNORE (any email address following) UNTIL (either “[” or “([”)

It’s the “ignore until” operation that’s tripping me up.


3 Comment(s)

@NogDog Sep 01.2016 — How about:

[CODE]from [^\[]*\[12\.34\.56\.78\][/CODE]

?

(I didn't do the IP-range stuff, just the idea that you could match on 0-n occurrences of anything that is not a left bracket before the IP.)
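Putting the "anything that is not a left bracket" idea together with the IP-range part from the question might look like the Python sketch below. The sample headers, the hypothetical address `sender@example.com`, and the hostnames are invented for illustration, and the pattern assumes escaped brackets and dots.

```python
import re

# "from ", then zero or more non-"[" characters, then "[" and the IP range.
HEADER_RE = re.compile(r"from [^\[]*\[171\.(?:22[4-9]|2[3-5][0-9])\.")

samples = [
    "from [171.232.66.71] blah",                                  # old style
    "from sender@example.com ([171.232.66.71]) blah",             # new style, hypothetical address
    "from mail.example.org ([10.0.0.1]) by host [171.232.66.71]", # first bracket is another IP
]

results = [HEADER_RE.search(s) is not None for s in samples]
print(results)  # [True, True, False]
```

Because `[^\[]*` cannot cross a "[", only the first bracketed value after "from " is tested, which is why the third sample is not matched.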
@PeterPan_321 (author) Sep 01.2016 — [QUOTE]How about:

[CODE]from [^\[]*\[12\.34\.56\.78\][/CODE]

?

(I didn't do the IP-range stuff, just the idea that you could match on 0-n occurrences of anything that is not a left bracket before the IP.)[/QUOTE]


Thanks! My regex tester proves it does work, but I'm really having trouble understanding HOW this works... I thought "^" was an anchor that matches the first position where the next character, in this case "[" (escaped), would be found? But in this case it seems like the "^" must mean "IS NOT", because the "*" means match the PREVIOUS element any number of times, right? Regex is really confusing!
@NogDog Sep 01.2016 — Within a character class, which is what unescaped square brackets define, a leading "^" is a negation. So in this case, it matches anything that is not a "[". As to why they also use "^" outside of a character class as the "beginning of the string" assertion, I have no idea.
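A quick way to see the two meanings of "^" side by side is a short Python session like the sketch below. The test strings are arbitrary examples, and other regex flavors may differ in details, but the anchor-versus-negation distinction is the same.

```python
import re

# Outside a character class, "^" anchors the match at the start of the string.
anchor_miss = re.findall(r"^a", "banana")   # [] since the string starts with "b"
anchor_hit = re.findall(r"^b", "banana")    # ['b']

# As the first character inside a class, "^" negates the class.
negated = re.findall(r"[^a]", "banana")     # ['b', 'n', 'n']

# Anywhere else inside a class, "^" is just a literal caret.
literal = re.findall(r"[a^]", "a^b")        # ['a', '^']

# "[^\[]*" therefore consumes everything up to (not including) the first "[".
prefix = re.match(r"[^\[]*", "abc [def").group()  # 'abc '

print(anchor_miss, anchor_hit, negated, literal, repr(prefix))
```

The "*" still applies to the previous element, as it always does. Here that element is the whole negated class `[^\[]`.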