Simple regex needed to parse html elements

Hi, my grasp of regex is a little tenuous since I learnt it years back. Please help me with identifying HTML form input and textarea tags.

For instance, I need a PHP regex that can parse this HTML block:

[CODE]
<tr>
<td>
<input type="submit" name="submit" value="submit" />
</td>
<td>
<input type="hidden" name="action" value="process_step2">
</td>
<td width="306">
<p><input type="text" name="url" value="http://" size="25"></p>
</td>
</tr>
[/CODE]

and just return all 3 matches as:

[CODE]
<input type="submit" name="submit" value="submit" />
<input type="hidden" name="action" value="process_step2">
<input type="text" name="url" value="http://" size="25">
[/CODE]

In other words, I don't mind what the type= attribute of the input is. As long as it's an input tag, it has to be matched. Some tags might use the XHTML-style "[space]/>" instead of ">" to close the tag.

Please also help me with another regex for extracting from this HTML block:

[CODE]
<table><tr><td><p>
<textarea name="testing">
content
</textarea>
</p>
</td></tr></table></div>
[/CODE]

that just returns the following for every match:

[CODE]
<textarea name="testing">
content
</textarea>
[/CODE]

Thanks so much.


3 Comments

@ViseExcizer (author), Jan 02, 2010: I've pretty much figured it out.

To get input tags:

[CODE]
'/<input.+?type=".+?".+?[{\/?}|{\s?}]?>/i'
[/CODE]

And for textarea:
[CODE]
'/<textarea.+?[{\/?}|{\s?}]?>.*?<\/textarea>/i'
[/CODE]
@NogDog, Jan 02, 2010: For the input tags, it might be simpler to do:

[CODE]
'#<input[^>]*?>#i'
[/CODE]

Similarly, textarea could be:

[CODE]
'#<textarea[^>]*?>.*?</textarea>#i'
[/CODE]
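
As a rough sketch of how these two patterns could be used, the snippet below runs them through preg_match_all() against the sample markup from the question (retyped with straight quotes). One change is an assumption on my part and not part of the patterns as posted: the textarea pattern gets the "s" modifier, because without it "." does not match newlines and the multi-line textarea in the sample would not be found. The variable names are only illustrative.

```php
<?php
// Sample markup from the question, retyped with straight quotes.
$html = <<<'HTML'
<tr>
<td>
<input type="submit" name="submit" value="submit" />
</td>
<td>
<input type="hidden" name="action" value="process_step2">
</td>
<td width="306">
<p><input type="text" name="url" value="http://" size="25"></p>
</td>
</tr>
<table><tr><td><p>
<textarea name="testing">
content
</textarea>
</p>
</td></tr></table>
HTML;

// Match every <input ...> tag, whether it ends with ">" or " />".
preg_match_all('#<input[^>]*?>#i', $html, $inputs);

// The "s" modifier (added here, not in the posted pattern) lets "." match
// newlines, so a textarea whose content spans several lines is captured.
preg_match_all('#<textarea[^>]*?>.*?</textarea>#is', $html, $textareas);

// Index 0 of each result holds the full matches.
print_r($inputs[0]);
print_r($textareas[0]);
```

Note that `[^>]*?` stops at the first ">", so an attribute value containing a literal ">" would cut the match short. For markup you do not control, a parser is the safer choice.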

Another alternative would be to use the [url=http://php.net/DOM]DOM extension[/url], using the DOMDocument::getElementsByTagName() method.
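
A minimal sketch of that DOM approach, assuming the DOM extension is enabled and using a shortened version of the question's markup. The $found array is a hypothetical name for illustration. Be aware that saveHTML() re-serializes each element, so the output may differ slightly from the source text (for example, the XHTML-style " />" ending may not be preserved).

```php
<?php
// Shortened sample markup based on the question.
$html = <<<'HTML'
<table><tr>
<td><input type="submit" name="submit" value="submit" /></td>
<td><input type="hidden" name="action" value="process_step2"></td>
<td><p><textarea name="testing">
content
</textarea></p></td>
</tr></table>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$found = [];
foreach (['input', 'textarea'] as $tagName) {
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        // saveHTML($node) serializes just this element back to markup.
        $found[$tagName][] = $doc->saveHTML($node);
    }
}

print_r($found);
```

Attributes can also be read directly from each node with getAttribute(), which avoids parsing the tag text a second time.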
@Richard_William, Jan 11, 2010: Here is yet another approach.


[CODE]
var str html
cat "http://www.xxx.com/yyy.html" > $html
while ( { sen -r -c "^<input&/>^" $html } > 0 )
stex -r -c "^<input&/>^" $html
[/CODE]



This is in biterscripting. The regex is [B]"^<input&/>^"[/B]. To get the textarea in your second example, use the regular expression [B]"^<textarea&</textarea&>^"[/B]. The ampersand (&) means any number of any characters. The regex syntax is explained at http://www.biterscripting.com/helppages/RE.html.