
parsing external html…

Hi,

I looked through the threads and didn't see anything quite like what I wanted.
From my PHP page, can I call another page on another server and strip out only three or four lines of text? The text is dynamic, I guess you could say, but it is marked with an <a name="text"> link tag. Is this possible, or do I need to post the page that I want to strip/parse from, or explain in greater detail?
Thanks...

PHP

@bokehman (Jun 01.2005): Hi! This will do it, but I am sure I have overcomplicated it.[code=php]<?php

$html = file_get_contents('http://domain.tld/filename.ext/'); // Put the URL here

preg_match_all('#<a name="text">(.+?)</a>#', $html, $matches);

foreach ($matches[1] as $match) {
print('<p>' . $match . '</p>');
}

?>[/code]


Obviously there should only be one occurrence of <a name="text"> in a page, but the same pattern could also be used to find everything between, for example, <b></b> tags.
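The idea of reusing the pattern for other tag pairs, such as <b></b>, can be wrapped in a small helper. This is a hedged sketch: the function name `extract_between_tags` is hypothetical, not something from the thread. It uses `preg_quote()` so the tag text is matched literally and the `s` modifier so a match may span line breaks, and it runs on an inline sample string instead of a live URL.

```php
<?php
// Hypothetical helper: return every chunk of text found between $open and $close.
function extract_between_tags(string $html, string $open, string $close): array
{
    // preg_quote() escapes regex metacharacters; '#' is the delimiter, so pass it too.
    // The 's' modifier lets '.' match newlines, so a match may span several lines.
    $pattern = '#' . preg_quote($open, '#') . '(.+?)' . preg_quote($close, '#') . '#s';
    preg_match_all($pattern, $html, $matches);
    // $matches[0] holds the full matches; $matches[1] holds the captured text.
    return $matches[1];
}

// Inline sample instead of file_get_contents() so the sketch runs offline.
$sample = '<p><b>first</b> and <b>second</b></p><a name="text">wanted text</a>';

foreach (extract_between_tags($sample, '<b>', '</b>') as $match) {
    print('<p>' . $match . '</p>' . "\n");
}
```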
@towerboy (author, Jun 01.2005): OK, I tried it with my info. I get this error, though:

Warning: file_get_contents(http://findu.com/cgi-bin/find.cgi?call=".$call."/): failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error in /var/www/html/plot2.php on line 112

This is the source that I am using:

[code=php]<?
$call = $_POST['input'];
$html = file_get_contents('http://www.findu.com/cgi-bin/find.cgi?call=".$call."/'); // Put the URL here

preg_match_all('#<a name="text">(.+?)</td>#', $html, $matches);

foreach ($matches[1] as $match) {
print('<p>' . $match . '</p>');
}

?> [/code]


I think what I changed is still OK, but I am not sure.
@bokehman (Jun 01.2005): You have got your " ' " ' quotes in a mess! And why have you changed the </a> to </td>? Try this: [code=php]<?
$call = $_POST['input'];
$html = file_get_contents('http://www.findu.com/cgi-bin/find.cgi?call=' . $call); // Put the URL here

preg_match_all('#<a name="text">(.+?)</a>#', $html, $matches);

foreach ($matches[1] as $match) {
print('<p>' . $match . '</p>');
}

?>[/code]
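The 500 error above most likely comes from the quoting: inside single quotes PHP neither interpolates variables nor evaluates the dot operator, so the literal text `".$call."/` was sent to findU. A minimal sketch of the difference, assuming a hypothetical callsign value in place of `$_POST['input']`; the `build_find_url()` name is illustrative only. It also passes the value through `urlencode()`, so characters such as spaces or `&` typed into the form cannot break the query string.

```php
<?php
$call = 'ns7r'; // hypothetical stand-in for $_POST['input']

// Broken: everything inside single quotes is literal text, so the
// double quotes, dots and "$call" are sent to the server as-is.
$broken = 'http://www.findu.com/cgi-bin/find.cgi?call=".$call."/';

// Fixed: close the single-quoted string and concatenate with the dot operator.
$fixed = 'http://www.findu.com/cgi-bin/find.cgi?call=' . $call;

// Illustrative helper: urlencode() escapes spaces, '&' and similar characters
// that a user could type into the form.
function build_find_url(string $call): string
{
    return 'http://www.findu.com/cgi-bin/find.cgi?call=' . urlencode($call);
}

echo $broken, "\n", $fixed, "\n", build_find_url('ns7r & more'), "\n";
```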
@towerboy (author, Jun 01.2005): I changed </a> to </td> because that is the next closing tag after the text I want; there is no </a> tag. I tried your new code. It doesn't error, but it also doesn't show anything. I am not sure this can be done if there is no </a> tag. Am I right?
@bokehman (Jun 01.2005): If there is an opening <a>, there should be a closing one; if there isn't, the HTML will not validate. Post the HTML and the URL here and I will have a look and make it work.
@towerboy (author, Jun 01.2005): OK, here is the URL:

[URL=http://www.findu.com/cgi-bin/find.cgi?call=ns7r]http://www.findu.com/cgi-bin/find.cgi?call=ns7r[/URL]

Here is the HTML returned by the CGI script called in the URL:

[code=html]<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd"><HTML>
<HEAD>
<meta http-equiv="expires" content="-1">
<meta http-equiv="pragma" content="no-cache">
<TITLE>NS7R Location</TITLE>
<META HTTP-EQUIV="Refresh" CONTENT="180">
</HEAD>
<BODY alink="#008000" bgcolor="#F5F5DC" link="#0000FF" vlink="#000080">
<center>
</center>
<center><table cellpadding="4" border="1" valign="top">
<tr><td colspan="2" align="center"><h1>findU: Position of NS7R</h1><a name="text">17.5 miles northeast of Central Point, OR
--- Report received 18 minutes 7 seconds ago<br>

Status: <font color="#900000">311840z. <?php echo ("[email protected]"); ?></font> &nbsp;&nbsp;&nbsp;&nbsp; <br>Raw packet: <font color="#900000">NS7R>APU25N,GRNBOX*,WIDE3-2,qAo,N7ZMR:=4237.29N/12249.92W-<?php echo ("towerboy.com"); ?> {UIV32N}</font><br>
</td></tr>
<tr><td valign="top"><!-- Search Google -->
<center>
<form method="get" action="http://www.google.com/custom" target="_top">
<table bgcolor="#ffffff">
<tr><td nowrap="nowrap" valign="top" align="left" height="32">
<a href="http://www.google.com/">

<img src="http://www.google.com/logos/Logo_25wht.gif" border="0" alt="Google" align="middle"></img></a>
<br/>
<input type="text" name="q" size="24" maxlength="255" value=""></input>
</td></tr>
<tr><td valign="top" align="left">
<input type="submit" name="sa" value="Search"></input>
<input type="hidden" name="client" value="pub-4245814686841137"></input>
<input type="hidden" name="forid" value="1"></input>
<input type="hidden" name="channel" value="3555275299"></input>
<input type="hidden" name="ie" value="ISO-8859-1"></input>
<input type="hidden" name="oe" value="ISO-8859-1"></input>
<input type="hidden" name="cof" value="GALT:#008000;GL:1;DIV:#336699;VLC:663399;AH:center;BGC:FFFFFF;LBGC:336699;ALC:0000FF;LC:0000FF;T:000000;GFNT:0000FF;GIMP:0000FF;FORID:1;"></input>
<input type="hidden" name="hl" value="en"></input>
</td></tr></table>
</form>
</center>
<!-- Search Google -->

<h3>findU links for NS7R</h3>
<a href="near.cgi?call=NS7R">- Nearby APRS activity</a>
<br><a href="raw.cgi?call=NS7R">- Raw APRS data</a>
<br><a href="msg.cgi?call=NS7R">- Messages</a>
<br><a href="find.cgi?call=ns7r&units=metric">- Metric units</a>
<br><a href="find.cgi?call=ns7r&units=nautical">- Nautical units</a>
<br><a href="track.cgi?call=NS7R">- Display track</a>
<br><a href="http://mm.aprs.net/cover.cgi?call=NS7R">- APRS Map Manager coverage</a>
<br><a href="find.cgi?call=ns7r&radar=***">- NexRAD Radar</a>

<br><a href="find.cgi?call=ns7r&topo=8">- Topographic map</a>
<br><a href="find.cgi?call=ns7r&terra=4">- Aerial Photo</a>
<br><a href="find.cgi?call=ns7r&relief=1">- photo-relief image</a>
<h3>External links for NS7R</h3>
<a href="http://www.qrz.com/detail/NS7R">- QRZ Lookup</a><br><a href="http://www.mapblast.com/map.aspx?L=USA0409&C=42.62150%2c-122.83200&A=7.16667&P=|42.62150%2c-122.83200|1|NS7R|L1|">- MSN map (North America)</a>
<br><a href="http://www.mapblast.com/map.aspx?L=EUR&C=42.62150%2c-122.83200&A=7.16667&P=|42.62150%2c-122.83200|1|NS7R|L1|">- MSN map (Europe)</a>
<br><a href="http://www.mapblast.com/map.aspx?L=WLD0409&C=42.62150%2c-122.83200&A=7.16667&P=|42.62150%2c-122.83200|1|NS7R|L1|">- MSN map (world)</a>
<br><a href="http://www.topozone.com/map.asp?lat=42.62150&lon=-122.83200&s=100&size=s">- TopoZone</a>

<br><a href="http://terraserver-usa.com/image.aspx?S=10&T=1&X=2568&Y=23593&Z=10&W=1">- TerraServer</a>
<br><a href="http://www.acme.com/mapper/?lat=42.62150&long=-122.83200&scale=10&theme=Image&width=3&height=2&dot=Yes">- ACME Mapper</a>
<br><a href="http://maps.google.com/maps?&ll=42.62150,-122.83200&spn=0.030,0.048">- Google Maps (beta, NA only)</a>
<h3>findU general links</h3>
<a href="http://www.findu.com/new.html">- Latest News</a><br><a href="http://www.findu.com/cgi.html">- Advanced cgi parameters</a><br><a href="emergency.cgi">- Emergency beacons</a><br><a href="errors.cgi">- Packet errors</a><br><a href="http://www.aprs.net/steve.html">- About K4HG</a><br><a href="mailto:[email protected]">- Email K4HG</a><p><center><script type="text/javascript"><!--
google_ad_client = "pub-4245814686841137";
google_ad_width = 160;
google_ad_height = 600;
google_ad_format = "160x600_as";
google_ad_channel ="";
google_ad_type = "text";
google_color_border = "2D5893";
google_color_bg = "99AACC";
google_color_link = "000000";
google_color_url = "000099";
google_color_text = "003366";
//--></script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">

</script><center>
</td><td valign="top"><center>
<table><tr><td>
<h3>Zoom</h3>
<a href="http://www.findu.com/cgi-bin/find.cgi?call=ns7r&degree=0.001">street</a><p><font color="#ff0000">neighborhood</font><p><a href="http://www.findu.com/cgi-bin/find.cgi?call=ns7r&degree=0.3">city</a><p><a href="http://www.findu.com/cgi-bin/find.cgi?call=ns7r&degree=2">state</a><p><a href="http://www.findu.com/cgi-bin/find.cgi?call=ns7r&degree=8">country</a><p><a href="http://www.findu.com/cgi-bin/find.cgi?call=ns7r&noaprsworld=1">(hide)</a><p></td><td><a href="http://mm.aprs.net/map.cgi?map=APRSworld&call=NS7R&range=0.1"><IMG SRC="plot.cgi?call=ns7r&degree=0.04&xsize=450&ysize=300"></a>
</td></tr></table>
<br>Click the map above for a zoomable map.
</td></tr></table>
<center><SMALL>(This page will refresh every three minutes)</SMALL></center>

</BODY>
</HTML>[/code]


I know the whole page might not be needed, but I figured I would post it all at once rather than be asked for it again.
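@bokehman (Jun 01.2005): OK! My script doesn't work for you because it relies on the tags being in pairs, i.e. opening and closing tags of the same type. Try this:[code=php]<?php

$input = $_POST['input'];

// Fetch the findU page for the submitted callsign (urlencode() escapes spaces, & etc.)
$html = file_get_contents('http://www.findu.com/cgi-bin/find.cgi?call=' . urlencode($input));

// explode() splits on a literal string (split() no longer exists in PHP 7+).
// First cut: $ii is everything after the <a name="text"> marker.
list($i, $ii) = explode('<a name="text">', $html, 2);
// Second cut: $i is everything up to the first </td> after the marker.
list($i, $ii) = explode('</td>', $ii, 2);

print $i;

?>[/code]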
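The two-step split above (cut at the marker, then cut at `</td>`) can be folded into one reusable function that also copes with a page where the marker is missing. This is a sketch under assumptions: the name `text_between()` is hypothetical, it uses `strpos()`/`substr()` rather than splitting, and it returns `null` instead of raising a notice when either marker is absent. The sample string is a trimmed copy of the findU markup posted earlier.

```php
<?php
// Hypothetical helper: return the text between the first $start marker and
// the next $end marker after it, or null if either marker is missing.
function text_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start); // skip past the marker itself

    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null; // no closing marker after the start marker
    }
    return substr($html, $from, $to - $from);
}

// Trimmed sample based on the findU page posted above.
$html = '<h1>findU: Position of NS7R</h1><a name="text">17.5 miles northeast of Central Point, OR<br></td></tr>';

print text_between($html, '<a name="text">', '</td>');
```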
@towerboy (author, Jun 01.2005): Cool! One more thing. Is there a way to change the text color? Red on blue is not very good.
@bokehman (Jun 02.2005): Yeah! [code=php]// Change
print $i;
// to
print str_replace('<font color="#900000">', '', str_replace('</font>', '', $i));[/code]
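An alternative to nesting two `str_replace()` calls is to pass an array of search strings in one call, or to use `strip_tags()`, whose second argument lists the tags to keep. A minimal sketch, assuming only `<br>` should survive; note that `strip_tags()` removes every other tag too (links included), which may or may not be wanted. The sample string mimics the Status line from the findU page.

```php
<?php
// Sample fragment modelled on the findU "Status" line.
$i = 'Status: <font color="#900000">311840z.</font> <br>Raw packet: <font color="#900000">NS7R</font><br>';

// Option 1: remove the exact opening and closing font tags in one call
// (str_replace() accepts an array of search strings).
$viaReplace = str_replace(array('<font color="#900000">', '</font>'), '', $i);

// Option 2: strip every tag except <br>.
$viaStrip = strip_tags($i, '<br>');

print $viaStrip;
```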
@towerboy (author, Jun 02.2005): Great! I am curious why I can't just insert the size attribute into the font tag, like this:

[code=php]print str_replace('<font size="-1" color="#900000">', '', str_replace('</font>', '', $i));[/code]
I also was wondering if the <br> tags can be stripped from the source and inserted in different places in the target page.
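On the size question: `str_replace()` looks for the exact search text, and the findU page only contains `<font color="#900000">`, so a search for `<font size="-1" color="#900000">` finds nothing and the text comes back unchanged. A sketch of the other direction, searching for the tag as it appears in the page and putting the new attributes in the replacement (the colour `#000080` is an arbitrary example):

```php
<?php
$i = 'Status: <font color="#900000">311840z.</font><br>';

// Searching for text that is not in the page changes nothing.
$unchanged = str_replace('<font size="-1" color="#900000">', '', $i);

// Instead, search for the tag exactly as it appears in the page and put
// the new attributes in the replacement string.
$restyled = str_replace(
    '<font color="#900000">',
    '<font size="-1" color="#000080">',
    $i
);

print $restyled;
```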
@bokehman (Jun 02.2005): [QUOTE]I also was wondering if the <br> tags can be stripped from the source and inserted in different places in the target page.[/QUOTE]

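[code=php]$string = 'source';          // the text to work on
$remove = 'item to remove';       // what to look for, e.g. '<br>'
$replacement = 'the replacement'; // what to put in its place

// str_replace() returns a new string; it does not modify $string in place
$string = str_replace($remove, $replacement, $string);[/code]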
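Applied to the `<br>` question, one approach (a sketch, not the only one) is to split the extracted text on `<br>` with `explode()`, trim each piece, and rebuild it with whatever markup the target page needs. The sample text and the `<li>` wrapping are assumptions for illustration.

```php
<?php
$i = "17.5 miles northeast of Central Point, OR<br>\nStatus: 311840z.<br>Raw packet: NS7R<br>";

// Split on <br>, trim whitespace, and drop the empty piece left by a trailing <br>.
$lines = array_values(array_filter(array_map('trim', explode('<br>', $i)), 'strlen'));

// Rebuild with different markup, e.g. one list item per line.
$list = "<ul>\n";
foreach ($lines as $line) {
    $list .= '<li>' . $line . "</li>\n";
}
$list .= "</ul>\n";

print $list;
```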
@towerboy (author, Jun 02.2005): Cool. Thanks for all your help. I would post the link to what I am working on here, but unless you are interested in amateur radio and automatic position reporting systems, it would be dull and pointless.

Thanks again.
@bokehman (Jun 02.2005): I've got an 'M' and an 'E' license.
@towerboy (author, Jun 02.2005): [QUOTE]I've got an 'M' and an 'E' license[/QUOTE]
Do you mean 'Extra' by 'E'?

What is 'M'?
@bokehman (Jun 02.2005): [QUOTE]Do you mean 'Extra' by 'E'?

What is 'M'?[/QUOTE]
I am talking about the first character of the callsign.
@SpectreReturns (Jun 03.2005): [QUOTE]You have got your " ' " ' quotes in a mess! And why have you changed the </a> to </td>? Try this: [code=php]<?
$call = $_POST['input'];
$html = file_get_contents('http://www.findu.com/cgi-bin/find.cgi?call=' . $call); // Put the URL here

preg_match_all('#<a name="text">(.+?)</a>#', $html, $matches);

foreach ($matches[1] as $match) {
print('<p>' . $match . '</p>');
}

?>[/code]
[/QUOTE]


You shouldn't use a lazy regex. It requires backtracking, and thus is not as efficient as it could be.

Try:

[code=php]
preg_match_all('#<a name="text">(.+?)</a>#', $html, $matches);

// replace with

preg_match_all('#<a name="text">([^>]+)</a>#', $html, $matches); // now, instead of being lazy, it doesn't match > characters.
[/code]
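What the second pattern does differently: `(.+?)` is a lazy quantifier that grows one character at a time, retrying `</a>` after each step, while a negated character class consumes a run of characters greedily and stops at the first one it excludes. The snippet's `[^>]+` still has to back up over `</a` before `</a>` can match; excluding `<` instead stops right at the closing tag. The sketch below compares the lazy pattern with a `[^<]+` variant (a substitution for illustration, not the poster's exact pattern) on sample strings. Whether either is measurably faster on a page this size is not something the thread establishes; the more practical difference is that a negated class cannot cross an inner tag such as `<br>`.

```php
<?php
$html = '<h1>Title</h1><a name="text">17.5 miles northeast of Central Point, OR</a><br>';

// Lazy: (.+?) grows one character at a time, retrying </a> after each step.
preg_match_all('#<a name="text">(.+?)</a>#', $html, $lazy);

// Negated class: [^<]+ consumes everything up to the first '<', which here
// is the start of </a>, so no backing up is needed.
preg_match_all('#<a name="text">([^<]+)</a>#', $html, $negated);

print_r($lazy[1]);
print_r($negated[1]);

// Caveat: if the wanted text contains a tag (e.g. <br>), [^<]+ cannot
// cross it and the match fails, while the lazy version still matches.
$withBr = '<a name="text">line one<br>line two</a>';
preg_match_all('#<a name="text">(.+?)</a>#', $withBr, $lazyBr);
preg_match_all('#<a name="text">([^<]+)</a>#', $withBr, $negatedBr);
```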
@towerboy (author, Jun 03.2005): SpectreReturns,

What does the second code do differently? I do have the parsing working great. Check out [URL=http://www.towerboy.com/find.php?call=ns7r]http://www.towerboy.com/find.php?call=ns7r[/URL].

The text above the maps is parsed from another web page and inserted there.

In case anyone is wondering what "ns7r" is, it is my amateur radio callsign. The position on the map is my home station. I'll shut up now. Very boring, I know...