
RegExp: Detect (=') or (=") within any HTML tag?

I need a regular expression that will perform the following:

If variable "x" contains HTML code and any given tag within the HTML code has an attribute with an = sign NOT followed by a double or single quote, return TRUE.

Basically, I need to trigger an error message for any HTML tags that are formatted like <a href=some_url.htm> rather than <a href="some_url.htm"> or <a href='some_url.htm'>.

For those who have Regular Expressions mastered, please let me know how this can be accomplished.

Thank you!!


@mjdamato, Mar 07, 2007
Try this:
[CODE]function checkText(text) {
    // Flags text where a "<" ... ">" span contains "=" not followed by a quote
    var regex = /.*([<]).*(=[^"|']).*([>]).*/g;
    if (regex.test(text)) {
        alert("there seems to be missing quote marks in the HTML code");
    }
}[/CODE]


EDIT: had a mistake, code has been updated
@mrhoo, Mar 07, 2007
One thing to watch out for:

This is a common attribute in a meta tag in an HTML document:

content="text/html; charset=utf-8"

Take care that you don't break this attribute up at the second '='.
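
To see this caveat concretely, here is a small sketch that runs the pattern from the first reply against a correctly quoted meta tag. It assumes the pattern uses literal `<` and `>` in its character classes (as it is quoted later in the thread). The variable names and sample strings are only for illustration, and the `g` flag is left out because it makes `test()` stateful through `lastIndex`.

```javascript
// Pattern from the first reply, with literal < and > (assumption) and without the g flag.
var pattern = /.*([<]).*(=[^"|']).*([>]).*/;

// Sample inputs chosen for illustration only.
var quotedLink = '<a href="some_url.htm">';
var quotedMeta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

// false: the only "=" is followed by a double quote
var linkFlagged = pattern.test(quotedLink);
// true: the "=u" in charset=utf-8 satisfies =[^"|']
var metaFlagged = pattern.test(quotedMeta);

console.log(linkFlagged, metaFlagged);
```

The meta tag is reported even though both of its attributes are quoted, because `charset=utf-8` sits inside the quoted `content` value.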
@gregw74 (author), Mar 07, 2007
mjdamato:

This works great! Thank you!

mrhoo:

I appreciate the heads-up regarding charset=utf-8. Given that this script will generally be run against HTML that excludes any meta and head tags, I'm thinking I should be OK.

Again, thank you both.
@forty2, Mar 08, 2007
/.*([<]).*(=[^"|']).*([>]).*/g
This regular expression will check only the last tag, because it is greedy.
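
One way to keep the check from running across tag boundaries is to test each tag on its own. The sketch below is not from the thread: `hasUnquotedAttribute` is a hypothetical helper name, and it assumes that no attribute value contains a `>` character. Quoted values are blanked out first so that an `=` inside them (such as `charset=utf-8`) is ignored.

```javascript
// Hypothetical helper: true if any opening tag has an attribute value without quotes.
// Assumption: attribute values never contain ">".
function hasUnquotedAttribute(html) {
  // [^>]* cannot cross a ">", so each match stays inside a single tag.
  var tags = html.match(/<[a-zA-Z][^>]*>/g) || [];
  for (var i = 0; i < tags.length; i++) {
    // Blank out quoted values so their contents cannot trigger a false alarm.
    var stripped = tags[i].replace(/"[^"]*"|'[^']*'/g, '""');
    // An "=" whose next non-space character is not a quote (or ">") is unquoted.
    if (/=\s*[^\s"'>]/.test(stripped)) {
      return true;
    }
  }
  return false;
}

console.log(hasUnquotedAttribute('<a href=some_url.htm>'));           // true
console.log(hasUnquotedAttribute('<a href="some_url.htm">'));         // false
console.log(hasUnquotedAttribute('<td colspan="2"><td rowspan=3>')); // true
```

Because every tag is examined, an unquoted value in an early tag is caught even when later tags are well formed.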
@gregw74 (author), Mar 09, 2007
Thank you, forty2. This script was mainly meant to catch all colspan and rowspan attributes that don't have their values surrounded by quotes. I made two instances of the function, one for each scenario. It works like a charm thanks to everyone's assistance!

I'm using an HTML clean-up script (http://ethilien.net/websoft/wordcleaner/cleaner.htm) that I've been customizing (interface and all). When it found any colspans or rowspans without quotes, the attributes would be removed entirely, drastically changing the appearance of the tables. Now I'm able to have these alerts prompt the user to fix these attributes.

I'm still working on getting a better understanding of RegExps; it's just taking some time. Hopefully soon I won't have to bug people as much on how to write them :p
@gregw74 (author), Mar 09, 2007
One last thing: this script I've been customizing is absolutely wonderful for removing excess code, MS Office HTML tags, classes, formatting, etc. The only drawback is that it's [B]not[/B] case insensitive when it comes to HTML tags and HTML attributes. If you test the script you'll see what I mean.

If you clean <A href="someurl"> the entire tag is removed in the output.

If you clean <a HREF="someurl"> the entire attribute is removed.

I'm not trying to be greedy here, but if anyone knows how to make this script http://ethilien.net/websoft/wordcleaner/cleaner.htm case insensitive I would greatly appreciate it!!

I had tried by adding /i and ".toLowerCase" in key areas of the script, but all my attempts failed.

Currently, I have a script that converts all HTML tags to lower case prior to the filter running. Now, if there was a script that ran converting all attributes to lowercase, I'd be cool with that too! Maybe the RegExps that have been suggested so far could be tweaked to accomplish this?

Thank you!
@mrhoo, Mar 09, 2007
This converter doesn't do any heavy lifting: there is no validation of elements or attributes. It takes the entire text of an HTML file as a string.

Its main job is formatting whitespace, which you probably don't need, but it does return the HTML with elements and attributes in lowercase, and with quotes around the attribute values.

I'm not going to comment it for you, but you should be able to use the while loop.


[CODE]function convertH(str){
    var s= str, tem1= '';
    // Leave everything before <head> untouched
    var ax= s.search(/<head/i);
    if(ax!= -1){
        tem1= s.substring(0,ax);
        s= s.substring(ax);
    }
    // One tag per line, then lowercase the tag names
    s= s.replace(/>\s*< */g,'>\n<');
    s= s.replace(/< *\/?(\w+)/g,function(w){return w.toLowerCase()});
    // Backslashes were lost from the posted text; "h\d" and "table|\/html" are assumed restorations
    s= s.replace(/\n< *(head|\/?body|div|h\d|form|ul|ol|dl|table|\/html)/g,'\n\n<$1');
    var rX= /<\w+[^>]+>/g;
    var pat;
    var tem= '', temp= '', x= 0;
    while((pat= rX.exec(s)) != null){
        tem+= s.substring(x,pat.index);
        temp= pat[0];
        // Quote attribute values, skipping tags that carry "; charset="
        if(/; *charset *=/i.test(temp)==false){
            temp= temp.replace(/= *(" *)?([^ ">]+)( *")?/g,'="$2" ');
        }
        // Lowercase attribute names
        temp= temp.replace(/ [\w-]+ *= */g,function(w){return w.toLowerCase()});
        tem+= temp;
        x= rX.lastIndex;
    }
    tem+= s.substring(x);
    return tem1+tem;
}[/CODE]
@gregw74 (author), Mar 09, 2007
Phenomenal!! This works very, very nicely! You genius! I cannot thank you enough for this!

This will replace several scripts that I've had to use in order to make up for this limitation, particularly, case insensitivity. All in all, this will make things much more efficient, flexible, and effective.

In addition, this eliminates the alerts I had needed for colspan and rowspan attributes not having their values within quotes, any attribute without quotes for that matter.

I'm just in awe right now....
@gregw74 (author), Mar 09, 2007
If there is any way to preserve the spaces within a set of quotes, please let me know. If not, I'll totally understand! What I've got here will still be a huge help! I just know there are some HTML attributes that sometimes contain them. Though I'm not so concerned about the face attribute in font tags, I would mainly like to maintain any paths that may be defined using HREF or SRC. Thanks again.

<a href="some url">

converts to

<a href="some">
@mrhoo, Mar 10, 2007
[CODE]temp= temp.replace(/= *(" *)?([^ ">]+)( *")?/g,'="$2" ');[/CODE]

This is the line you need to change to preserve your white space.
@gregw74 (author), Mar 10, 2007
Should it require some trial and error? I've been changing it in different ways and haven't figured it out yet. I won't hold you back if you have more info, LOL.

Nonetheless, I'll keep pluggin' away at it.
@gregw74 (author), Mar 10, 2007
I have disabled the if statement that contains this code. Doing so seems to preserve the white space as desired, cool. Was this what you were thinking, or were you suggesting that this line of code be kept but altered in some way?
@gregw74 (author), Mar 10, 2007
OK, take two:

I re-enabled the if statement and changed the line of code you mentioned to the following:

[CODE]temp= temp.replace(/= *(" *)?([^">]+)( *")?/g,'="$2" ');[/CODE]

Spaces are now retained and if quotes are missing, they are added. Now I think I finally understand what you were suggesting to begin with.
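
One case worth checking with this change is a tag that has more than one unquoted attribute. The sketch below wraps the modified `replace()` call in a hypothetical helper, `quoteValues`, and feeds it two sample tags chosen only for illustration.

```javascript
// Hypothetical wrapper around the modified replace() call quoted above.
function quoteValues(tag) {
  return tag.replace(/= *(" *)?([^">]+)( *")?/g, '="$2" ');
}

// A quoted value containing a space keeps its space.
var withSpace = quoteValues('<a href="some url">');
// Two unquoted values: [^">]+ now runs past the space up to the ">".
var twoUnquoted = quoteValues('<td colspan=2 rowspan=3>');

console.log(withSpace);   // <a href="some url" >
console.log(twoUnquoted); // <td colspan="2 rowspan=3" >
```

With the space removed from the character class, the value match no longer stops at the end of an unquoted value, so the second attribute is absorbed into the first.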

Yes, my skull is a little thicker than average. Learning Regular Expressions is like learning Chinese, even the simplest expressions. I am hoping the tutorial that follows can prove helpful in gaining a better understanding. If there is any book you'd suggest, please let me know.

http://www.regular-expressions.info/tutorial.html
@mrhoo, Mar 10, 2007
[CODE]temp= temp.replace(/= *([^"\s]+(?=[ >]))/g,'="$1" ');[/CODE]
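
As a rough illustration of how this last pattern behaves, the sketch below wraps it in a hypothetical helper, `quoteUnquotedValues`. It assumes the character class is `[^"\s]` (any character that is neither a double quote nor whitespace); the sample tags are made up for the example.

```javascript
// Hypothetical wrapper around the replace() call above.
// Assumption: the character class is [^"\s].
function quoteUnquotedValues(tag) {
  // Only a value that does not start with a double quote can match,
  // and the lookahead requires it to end right before a space or ">".
  return tag.replace(/= *([^"\s]+(?=[ >]))/g, '="$1" ');
}

// Already quoted, so the value (including its space) is left alone.
var quoted = quoteUnquotedValues('<a href="some url">');
// Each unquoted value is wrapped separately.
var unquoted = quoteUnquotedValues('<td colspan=2 rowspan=3>');

console.log(quoted);   // <a href="some url">
console.log(unquoted); // <td colspan="2"  rowspan="3" >
```

Because the pattern treats only double quotes as delimiters, a single-quoted value that contains a space would still be split, so this fits input that uses double quotes or no quotes at all.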