
Filtering/stripping scraped content

I’m working with my affiliate to set up a page that scrapes his “new items” page and displays those items on my site.

I understand how to scrape the contents of that page, but I’m unsure how to filter the scraped HTML so that only certain items from the page are displayed.
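One way to do the filtering, sketched under a few assumptions: it uses PHP’s built-in DOM extension (`DOMDocument` and `DOMXPath`) rather than string matching, it treats every `<td>` that contains a hidden `itemnum` input as one product, and `selectProductCells()` and the wanted ID list are hypothetical names, not anything from the affiliate’s site. Note that in the sample below the third cell (rec 981) also carries `itemnum` 980, so check which field the affiliate actually keeps accurate before filtering on it.

```php
<?php
// Sketch only: keep the product cells whose hidden "itemnum" input matches
// one of the wanted item numbers. Function name and IDs are hypothetical.

function selectProductCells(string $html, array $wantedIds): array
{
    $wanted = array_map('strval', $wantedIds);   // compare IDs as strings

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // scraped markup is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $cells = [];

    // Assumes each product <td> holds <input type=hidden name=itemnum value=...>.
    foreach ($xpath->query('//td[.//input[@name="itemnum"]]') as $td) {
        $input = $xpath->query('.//input[@name="itemnum"]', $td)->item(0);
        if (in_array($input->getAttribute('value'), $wanted, true)) {
            $cells[] = $doc->saveHTML($td);      // markup of the whole cell
        }
    }

    return $cells;
}

// Usage ($scrapedHtml is whatever your scraper fetched):
// echo implode("\n", selectProductCells($scrapedHtml, ['992']));
```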

Also, his items page has purchase form fields for each item. I want to strip those out and display only the title, price and picture.
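For the form fields, a minimal sketch (again assuming the DOM extension; `stripForms()` is a hypothetical name) removes every `<form>` element, which takes the hidden inputs, the Quantity box and the “Add to Cart” button with it, and returns the rest of the markup:

```php
<?php
// Sketch only: drop every <form> (hidden fields, Quantity box, Add to Cart
// button) from scraped markup and return what is left of the <body>.

function stripForms(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy scraped markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so remove from the end.
    $forms = $doc->getElementsByTagName('form');
    for ($i = $forms->length - 1; $i >= 0; $i--) {
        $form = $forms->item($i);
        $form->parentNode->removeChild($form);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize whatever is left inside <body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

This only handles the forms; the leftover `<a name>` anchors and `<br>` tags stay, so it pairs with picking out the title, price and picture separately.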

I’m a bit rusty on PHP, so any help is appreciated.

Here is the content I’m scraping…

[code=php]<html>
<head>
</head>

<body>

<!--*************************-->
<!-- Begin Table of Products -->
<!--*************************-->
<table width="100%" align="center" border="0" cellpadding="3">
<TR><!-- Start of ROW -->
<td valign="top" align="left">&nbsp;<br><!-- rec 992 -->
<a name="992"></a>

<a href="http://www.mydomain.net/product992.html"><img src="http://www.mydomain.net/photos/acoustic/dg1klhbk2a.jpg" hspace=3 vspace=3 align=left></a>
<b><a href="http://www.mydomain.net/product992.html">SX DG1K Acoustic Guitar Pack Lefty B Stock</a></b>
<br><b>$39.95</b>
<br>
<form action="http://www.mydomain.net/cgi-sys/cgiwrap/kurt/sc/order.cgi" method=post>
<input type=hidden name=storeid value=*1404e6bd983741161cc21d2a>
<input type=hidden name=dbname value=products>
<input type=hidden name=function value=add>

<input type=hidden name=itemnum value=992>

Quantity <input type=text size=2 name="992:qnty" value="1" >&nbsp;&nbsp;

<input type=image src="http://www.mydomain.net/media/themesmedia/tab_blue_button_add.gif" hspace=3 vspace=3 border=0 align="absbottom" name="Add to Cart" alt="Add to Cart">
</form>

</td>

<td valign="top" align="left">&nbsp;<br><!-- rec 980 -->
<a name="980"></a>

<a href="http://www.mydomain.net/CG150K34.HTML"><img src="http://www.mydomain.net/photos/acoustic/cg150k342a.jpg" hspace=3 vspace=3 align=left></a>
<b><a href="http://www.mydomain.net/CG150K34.HTML">Valencia CG-150K 3/4 Short Scale Acoustic Pack</a></b>
<br><b>$39.95</b>
<br>
<form action="http://www.mydomain.net/cgi-sys/cgiwrap/kurt/sc/order.cgi" method=post>

<input type=hidden name=storeid value=*1404e6bd983741161cc21d2a>
<input type=hidden name=dbname value=products>
<input type=hidden name=function value=add>
<input type=hidden name=itemnum value=980>

Quantity <input type=text size=2 name="980:qnty" value="1" >&nbsp;&nbsp;

<input type=image src="http://www.mydomain.net/media/themesmedia/tab_blue_button_add.gif" hspace=3 vspace=3 border=0 align="absbottom" name="Add to Cart" alt="Add to Cart">
</form>

</td>
<br>
<td valign="top" align="left">&nbsp;<br><!-- rec 981 -->
<a name="981"></a>

<a href="http://www.mydomain.net/CG150K34.HTML"><img src="http://www.mydomain.net/photos/acoustic/cg150k342a.jpg" hspace=3 vspace=3 align=left></a>
<b><a href="http://www.mydomain.net/CG150K34.HTML">Valencia CG-150K 3/4 Short Scale Acoustic Pack</a></b>
<br><b>$39.95</b>
<br>
<form action="http://www.mydomain.net/cgi-sys/cgiwrap/kurt/sc/order.cgi" method=post>

<input type=hidden name=storeid value=*1404e6bd983741161cc21d2a>
<input type=hidden name=dbname value=products>
<input type=hidden name=function value=add>
<input type=hidden name=itemnum value=980>

Quantity <input type=text size=2 name="981:qnty" value="1" >&nbsp;&nbsp;

<input type=image src="http://www.mydomain.net/media/themesmedia/tab_blue_button_add.gif" hspace=3 vspace=3 border=0 align="absbottom" name="Add to Cart" alt="Add to Cart">
</form>

</td>
</TR> <!-- END OF ROW -->
</table>
</body>
</html>
[/code]

From each product, all I want are the following lines:

[code=php]
<a href="http://www.mydomain.net/product992.html"><img src="http://www.mydomain.net/photos/acoustic/dg1klhbk2a.jpg" hspace=3 vspace=3 align=left></a>
<b><a href="http://www.mydomain.net/product992.html">SX DG1K Acoustic Guitar Pack Lefty B Stock</a></b>
<br><b>$39.95</b>
[/code]
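Putting it together, here is a sketch that pulls just those three pieces out of each product cell and rebuilds the lines above. It assumes the layout in the sample (image link first, title inside `<b><a>`, price in a `<b>` with no link, order `<form>` in the same `<td>`); `extractProducts()`, `renderProduct()` and `esc()` are hypothetical names. Values are re-escaped with `htmlspecialchars()` on output, and the `hspace`/`vspace`/`align` attributes are left off so the images can be styled on your side.

```php
<?php
// Sketch only: extract title, link, price and picture from each product <td>.
// Assumes every product cell also contains the order <form>, as in the sample.

function extractProducts(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // scraped markup is rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $products = [];

    foreach ($xpath->query('//td[.//form]') as $cell) {
        $img   = $xpath->query('.//a/img', $cell)->item(0);      // picture
        $title = $xpath->query('.//b/a', $cell)->item(0);        // title link
        $price = $xpath->query('.//b[not(a)]', $cell)->item(0);  // price

        if ($img === null || $title === null || $price === null) {
            continue;   // skip cells that don't look like a product
        }

        $products[] = [
            'url'   => $title->getAttribute('href'),
            'title' => trim($title->textContent),
            'price' => trim($price->textContent),
            'image' => $img->getAttribute('src'),
        ];
    }

    return $products;
}

function esc(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES);
}

// Rebuild only the picture, title and price lines for one product.
function renderProduct(array $p): string
{
    return sprintf(
        "<a href=\"%s\"><img src=\"%s\" alt=\"%s\"></a>\n"
        . "<b><a href=\"%s\">%s</a></b>\n"
        . "<br><b>%s</b>\n",
        esc($p['url']), esc($p['image']), esc($p['title']),
        esc($p['url']), esc($p['title']), esc($p['price'])
    );
}

// Usage ($scrapedHtml is whatever your scraper fetched):
// foreach (extractProducts($scrapedHtml) as $p) { echo renderProduct($p); }
```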

Help @mikeyzc spread the word by sharing this article on Twitter...

Tweet This
Sign in
Forgot password?
Sign in with TwitchSign in with GithubCreate Account
about: ({
version: 0.1.9 BETA 5.18,
whats_new: community page,
up_next: more Davinci•003 tasks,
coming_soon: events calendar,
social: @webDeveloperHQ
});

legal: ({
terms: of use,
privacy: policy
});
changelog: (
version: 0.1.9,
notes: added community page

version: 0.1.8,
notes: added Davinci•003

version: 0.1.7,
notes: upvote answers to bounties

version: 0.1.6,
notes: article editor refresh
)...
recent_tips: (
tipper: @AriseFacilitySolutions09,
tipped: article
amount: 1000 SATS,

tipper: @Yussuf4331,
tipped: article
amount: 1000 SATS,

tipper: @darkwebsites540,
tipped: article
amount: 10 SATS,
)...