
I am writing a script that grabs the HTML source code of a page when its address is passed to it.

I figured I should use the include function to grab the page, but I wasn't 100% sure how to pass it a variable, or how to get it to print the result as text or, even better, in a text box.

Any ideas or suggestions on what I could do and which functions to use would be more than welcome.

I thought htmlspecialchars() would be the right one to use, but other than that I'm not sure how to put it all together…


22 Comments(s)

@shimonJun 28.2004 — I think [URL=http://www.php.net/fopen]fopen()[/URL] is probably the one you want.

If your server is configured to allow it, you would be able to fopen() a remote file, fread() it into a variable and then fclose() it again.
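
Whether fopen() may open a URL depends on the allow_url_fopen setting in php.ini. A minimal sketch of checking it before trying a remote read; the function name remote_fopen_allowed() is made up for this example:

[code=php]
<?php
// Hypothetical helper: reports whether URL wrappers (http://, ftp://) are
// enabled for fopen(), file_get_contents() and friends.
function remote_fopen_allowed()
{
    // ini_get() returns the setting as a string such as "1", "0" or ""
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (!remote_fopen_allowed()) {
    echo "allow_url_fopen is off, so remote files cannot be opened here.\n";
}
[/code]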
@GavinPearceauthorJun 28.2004 — [code=php]
<?php
$handle = fopen("http://www.example.com/", "rb");
$contents = '';
while (!feof($handle)) {
$contents .= fread($handle, 8192);
}
fclose($handle);

$new = htmlspecialchars($handle);
echo $new;
?>
[/code]


Why doesn't that work?

[i]PHP Warning: htmlspecialchars() expects parameter 1 to be string, resource given in /home/gavin/public_html/nochance.php on line 9[/i]

Also, thinking about it, it would be nice to give it a colourised effect. I'm guessing the highlighter built into PHP won't colourise the HTML...

I'm getting way out of my depth here! Does anybody know of a script that does roughly what I'm after that I can modify a bit?
@ShrineDesignsJun 28.2004 — Because $handle is a resource, not a string. The text you read ends up in $contents, so that is what htmlspecialchars() needs. Try this:
[code=php]<?php
$handle = fopen("http://www.example.com/", "rb");
$contents = '';

while(!feof($handle))
{
    $contents .= fread($handle, 8192);
}
fclose($handle);

// Escape the accumulated string, not the file handle
echo htmlspecialchars($contents);
?>[/code]
@MstrBobJun 28.2004 — Ech, I hate the syntax highlighter in PHP 4. Apparently PHP 5 outputs valid <span> tags styled with CSS, but since I'm running PHP 4.x, I had to use some str_replace() calls to make the output valid XHTML. A little off topic, I know, but I felt it necessary. :p As for syntax highlighting HTML, that sounds like a good idea; I should try doing that...
@pyroJun 29.2004 — Take a look at [URL=http://www.php.net/manual/en/function.file-get-contents.php]file_get_contents()[/URL].
@MstrBobJun 29.2004 — Actually, wouldn't highlight_file() and show_source() do it all for you? They get the file contents, put them into a string, escape the HTML characters and highlight the PHP syntax. If you don't want the PHP itself to be revealed (sensitive information), you could call the page through http://, so that the server runs it and you only get its output:

show_source('http://www.yoursite.com/yourpage.php');
@GavinPearceauthorJun 29.2004 — Except I'm after highlighting the HTML, not the PHP, and the files are remote.

I've seen it done, but I'm guessing it's not a function built into PHP.

Ideas?

Cheers all so far!

[i]Edit : -[/i]

Also this script when run just returns a space.
[code=php]
<?php
$handle = fopen("http://www.example.com/", "rb");
$contents = '';

while(!feof($handle))
{
$contents .= fread($handle, 8192);
}
fclose($handle);

echo htmlspecialchars($content);
?>
[/code]
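
A likely cause of the blank output: the loop builds up $contents, but the last line echoes $content (no trailing s), which is an undefined and therefore empty variable. A minimal sketch with the names matched up; read_all() is a name invented here, and the in-memory stream only stands in for the remote page:

[code=php]
<?php
// Hypothetical helper: read an open stream to the end and return it as a string.
function read_all($handle)
{
    $contents = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break; // stop on a read error instead of looping forever
        }
        $contents .= $chunk;
    }
    return $contents;
}

// Stand-in for fopen("http://www.example.com/", "rb")
$handle = fopen('php://memory', 'r+');
fwrite($handle, "<h1>Hello</h1>");
rewind($handle);

$contents = read_all($handle);
fclose($handle);

echo htmlspecialchars($contents); // same variable name as in the loop
[/code]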
@pyroJun 29.2004 — [code=php]<?PHP
$source = file_get_contents('http://www.w3.org/');
echo $source;
?>[/code]
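
file_get_contents() returns false when the page cannot be read, so it can help to check for that before echoing. A rough sketch; fetch_source() is a name made up for this example, and the temporary file merely stands in for a URL:

[code=php]
<?php
// Hypothetical wrapper: returns the contents as a string, or null on failure.
function fetch_source($location)
{
    // @ hides the warning PHP raises for an unreadable location
    $source = @file_get_contents($location);
    return ($source === false) ? null : $source;
}

// A local temporary file stands in for 'http://www.w3.org/'
$tmp = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($tmp, "<p>test</p>");

$source = fetch_source($tmp);
echo ($source === null) ? "Could not read the page." : $source;
unlink($tmp);
[/code]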
@GavinPearceauthorJun 29.2004 — Ah yes, that works. Thing is though, is there also any way to get it to keep the same formatting as in the source? It just bungs it all together.

I changed it to this so that you can read the code, rather than having the browser render it.

[code=php]
<?PHP
$source = file_get_contents('http://www.w3.org');
echo htmlspecialchars($source);

?>
[/code]


Example of output at: http://gavinpearce.co.uk/nochance2.php

I'm guessing a new line will have to be converted to a <br />? Whereabouts would I fit that in?
@pyroJun 29.2004 — How about <pre> ?
@GavinPearceauthorJun 29.2004 — Lol yes, I guess, but it's not my favourite tag. I would rather the text be formatted properly. I don't know why, just me being fussy I guess.

Anyway, I managed to totally forget about nl2br(); :rolleyes: so I put that into the script and now have:
[code=php]
<?PHP
$source = file_get_contents('http://www.w3.org');
$readit = htmlspecialchars($source);

echo nl2br($readit);
?>
[/code]
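
The order of the two calls matters: escape first, then add the line breaks, otherwise the <br /> tags inserted by nl2br() would themselves be escaped and shown as text. A small sketch using a made-up two-line string in place of the fetched page:

[code=php]
<?php
$source = "<p>\n</p>";

// Right order: escape the markup, then turn newlines into <br />
$good = nl2br(htmlspecialchars($source));

// Wrong order: the inserted <br /> gets escaped too
$bad = htmlspecialchars(nl2br($source));

echo $good;
[/code]

Note that runs of spaces and tabs still collapse in the browser with this approach, which is the part <pre> would have preserved.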


Now to start working out how to colour code it all....
@shimonJun 29.2004 — [quote]Now to start working out how to colour code it all....[/quote]
Surely the various PHP syntax-highlighting functions highlight HTML regardless of whether there's any PHP in there?

Lemme see:

[code=php]<div><a href="http://www.google.com">Test Link</a></div>[/code]

*clicks 'preview reply'*


yup...indeedy. I believe that's done with highlight_string() and I imagine the engine for highlight_file() works just the same, but I could be wrong.
@GavinPearceauthorJun 29.2004 — It still doesn't work.

I'm using
[code=php]
<?PHP
$source = file_get_contents('http://www.w3.org');
echo highlight_string($source);
?>
[/code]


and it returns this: http://gavinpearce.co.uk/color.php

and it also seems to add a '1' at the bottom of the page?
@shimonJun 29.2004 — Hmm...darn...I just kinda assumed it would work.

The 1 is just the return value of highlight_string() - the function prints automatically so you can remove the echo() completely.
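
To get the highlighted markup as a string rather than having it printed straight away, highlight_string() accepts a second argument. A short sketch with a made-up input string:

[code=php]
<?php
$source = '<?php echo "hi"; ?>';

// With true as the second argument nothing is printed;
// the highlighted HTML is returned as a string instead.
$highlighted = highlight_string($source, true);

echo $highlighted; // no stray 1 this time
[/code]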

Now how the blazes do these forums do the highlighting then?
@MstrBobJun 29.2004 — If you place <?PHP at the beginning, the highlight functions will treat the HTML that follows as PHP and highlight it; it just so happens that it works out. However, I've no idea how to remove the leading <?PHP so that it doesn't show, but that's probably how this forum does it.
@zprogJun 29.2004 — I made a minor script that displays the source code of a file it is called from:

[code=php]
// Returns a string that can usually be used to display the source of the calling script.
// NOTE: Does not print the code, just returns it.
function source_code()
{
    $url = $_SERVER['PHP_SELF'];
    // The original post called times_string_appears(), a helper that was not shown;
    // the built-in substr_count() presumably does the same job of counting slashes.
    $loops = substr_count($_SERVER['PHP_SELF'], "/") - 1;
    for ($j = 0; $j < $loops; $j++)
        $url = "../" . $url;
    $contents = file_get_contents($url); // raw contents
    $contents = str_replace("<", "&lt;", $contents); // replace the <
    $contents = str_replace(">", "&gt;", $contents); // replace the >
    // replace the tab character (the replacement was lost in the post; four &nbsp; is a guess)
    $contents = str_replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;", $contents);
    $contents = nl2br($contents) . "<br />"; // make new lines <br>'s
    return $contents;
}
[/code]
@MstrBobJun 29.2004 — Why must we reinvent the wheel? What do you all think highlight_file(), htmlentities() and show_source() do? Like I said, adding <?PHP to the beginning of a string and running it through highlight_string() will highlight the HTML syntax like you want, but I don't know how one gets rid of the leading <?PHP. I remember reading a thread here once about something similar, but I can't find it.
@GavinPearceauthorJun 29.2004 — Well, if anyone knows... I guess you could write a script to str_replace the first instance with a space or something, but that all seems pointless to me. Can't you fool PHP into thinking that it is all PHP code somehow?

If it is a remote file (i.e. not on my server), how am I going to put <?PHP before it?

This site must take the HTML that was input, tell the server in the script to read it as PHP, and return it without any <?PHP to the page that shows the HTML code?

It doesn't automatically add <?PHP to all PHP entered though, because if you enter a PHP script enclosed in [ PHP ] & [/ PHP ] without the <? & ?>, it doesn't show you the colour highlighting, so it can't be telling it that it is always PHP?

..........???

Also, as a footnote, adding <? to the start of the HTML does make it read it and colour code it: http://gavinpearce.co.uk/color3.php

However, I have had to put the <? in the source code of the actual page it grabs.

Oh yeah, it isn't perfect either; it plays around a bit with the HTML and PHP comment tags. http://gavinpearce.co.uk/color3.php

Not to worry, but I think a proper HTML function would be useful in future versions...

Anyway, where do we go from here?
@MstrBobJun 29.2004 — How about this?

[code=php]
<?PHP
$file = file_get_contents("index.php");
$file = "<?" . $file;
$file_source = highlight_string($file, 1);
echo(substr_replace($file_source, '', 51, 5));
?>
[/code]
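
The fixed offsets 51 and 5 depend on the exact markup that one PHP version emits. A sketch of a less brittle variant, assuming the highlighter renders the injected tag as &lt;?php followed by a space or &nbsp;; the function name highlight_without_tag() is invented for this example:

[code=php]
<?php
// Hypothetical helper: highlight $html as if it were PHP, then drop the
// opening tag that was only added to switch the highlighter on.
function highlight_without_tag($html)
{
    $highlighted = highlight_string('<?php ' . $html, true);
    // Remove only the first escaped opening tag, plus the space after it
    return preg_replace('/&lt;\?php(&nbsp;|\s)?/', '', $highlighted, 1);
}

echo highlight_without_tag('<div><a href="http://www.google.com">Test Link</a></div>');
[/code]

Searching for the tag instead of counting characters means the surrounding wrapper markup can change between PHP versions without breaking the result.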
@GavinPearceauthorJun 30.2004 — Ok so now I got this script:
[code=php]
<?PHP
// Address of this script, escaped for use in the form's action attribute
$self = htmlspecialchars($_SERVER['PHP_SELF']);

if (isset($_POST['urlinput'])) {
    $source = file_get_contents($_POST['urlinput']);
    $output = htmlspecialchars($source);

    if (isset($_POST['linewrap']) && $_POST['linewrap'] == '1') {
        // linewrap and PHP colour code
        $source = "<?" . $source;
        $file_source = highlight_string($source, 1);
        echo(substr_replace($file_source, '', 51, 5));
    } else {
        // as it is in the text
        echo "<pre>\n $output \n</pre>";
    }
} else {
    // show form
    $html = "
<form action='$self' method='post'>
<input type='text' name='urlinput' />
<input type='radio' name='linewrap' value='1' checked /><br />
<input type='radio' name='linewrap' value='0' /><br /><br />
<input type='submit' name='Submit' value='Submit' />
</form>";
    echo $html;
}
?>
[/code]
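
Because the address comes straight from the form, file_get_contents() would also happily read local paths such as /etc/passwd. A minimal sketch of a check that only lets http and https addresses through; is_http_url() is a hypothetical name, and the check does not stop requests to hosts inside your own network:

[code=php]
<?php
// Hypothetical helper: true only for absolute http:// or https:// addresses.
function is_http_url($url)
{
    if (!is_string($url)) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = parse_url($url, PHP_URL_HOST);
    return in_array($scheme, array('http', 'https'), true) && !empty($host);
}

// Possible use in the script above: refuse anything that is not a web address
if (isset($_POST['urlinput']) && !is_http_url($_POST['urlinput'])) {
    echo "Please enter an address starting with http:// or https://";
}
[/code]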

which works fine on most pages. But if someone has a ?> in their source code for any reason, it throws the whole thing out.

Try http://gavinpearce.co.uk/bothandform.php?url=http://gavinpearce.co.uk/test.htm&v=1 and it works fine (just about: it messes up the comment tag and makes full stops and commas in normal text green), but try http://gavinpearce.co.uk/bothandform.php?url=http://w3.org&v=1 and it stops right at the top because of the ?> in the XML declaration on the first line.

It does it on my main website as well, and on any that follow W3 standards I guess...

It also seems to stop at </script> tags, as in http://gavinpearce.co.uk/?url=http://gavnet.com&v=1

I cannot believe that nobody has written a script to highlight HTML syntax online. It is one of the simplest languages. It wouldn't even have to support JavaScript if they couldn't be bothered. Just make it black. Hmmm...
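
For what it's worth, a very rough sketch of highlighting HTML without going through the PHP highlighter at all, which sidesteps the ?> and </script> problems. The function name highlight_html() and the two colours are assumptions made for this example; it only colours tags and comments and leaves everything else black:

[code=php]
<?php
// Hypothetical helper, not taken from the thread: colours tags and comments
// in an HTML string and returns markup that is safe to echo.
function highlight_html($source)
{
    // Escape everything first so the page is shown, not rendered
    $escaped = htmlspecialchars($source, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Match an escaped comment, or an escaped opening/closing tag
    $pattern = '/&lt;!--.*?--&gt;|&lt;\/?[a-zA-Z].*?&gt;/s';

    return preg_replace_callback($pattern, function ($match) {
        $isComment = strpos($match[0], '&lt;!--') === 0;
        $colour = $isComment ? '#808080' : '#0000BB';
        return '<span style="color: ' . $colour . '">' . $match[0] . '</span>';
    }, $escaped);
}

echo '<pre>' . highlight_html('<p class="x">Hi</p><!-- note -->') . '</pre>';
[/code]

This is deliberately naive: a > inside an attribute value ends the tag colouring early, and doctype declarations and script contents are left uncoloured.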
@MstrBobJul 01.2004 — I'm sure someone has, but it would be large, I guess. As for the ?> problem, what do you want? We're tricking PHP into highlighting HTML as if it were PHP syntax, which is not what it's designed to do. Personally, I still think that it's tres cool that it actually does any highlighting. Check out this link, though - [URL=http://www.beautifier.org/]Beautifier.org[/URL]
@GavinPearceauthorJul 01.2004 — Cheers for the link...

Ummm yea I guess.

Anyway I'll carry on, maybe I'll end up making my own. I'll post here if I do. HTML can't be that hard, there are only 4 really different types of objects in it. I'll see.

Thanks everyone for all your help.

Gav.