
Link checker

Hey, guys, I’ve got yet another question. I was thinking today, and wondering… How would I use PHP to go to a page and follow all of the links on that page and print their URIs? Any ideas? I haven’t come up with anything, but I am definitely ready and willing to learn this new field.

Thanks,

Jona

PHP

40 Comments(s)

@Nevermore, May 16, 2003: Is it possible to open pages outside your own server with PHP? I only know about fopen, and that is your-server only.
@Jona (author), May 16, 2003: So do I, but it must be possible. I mean, unless you have to use CGI, but PHP should be able to do it, too.

http://validator.w3.org/ searches through the source of a specific URI (or an uploaded temporary file, but I know how to do that).

There is another site that checks spelling, links, etc., etc. on a Web page you specify. How do I make PHP follow links from page to page and write out the results (example: visit somesite.com, find all of the links on the page, print them all out, go to each one of those pages individually and print them out, etc.).

That's what I meant.... lol :p
@Nevermore, May 16, 2003: I know how you can get it to find links, but if I can't make it open pages, it isn't much use, really. Where's Pyro...
@Jona (author), May 16, 2003: He's, "not at my desk" right now... Man, he would know... (lol)

While we wait, though, how's about you show me some PHP code about getting all the links on a page? (I could just use Javascript and frames.... lol, but I don't wanna do that..)
@Nevermore, May 16, 2003: validator.w3.org is using PHP. Now if they'd just written how they did it...
@Nevermore, May 16, 2003: How strange: if you try to go to it using index.php in their check directory, it finds it.

This PHP regular expression should find any and all hyperlinks and return them. You would just need to loop. (Soz, I haven't tested it. I'm not at home, I'm on a laptop with a dial up.)

[code=php]$look=ereg(<a [a-zA-Z0-9]* href="[a-zA-Z0-9]*" [a-zA-Z0-9]*>[a-zA-Z0-9]*</a>);[/code]
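For comparison, here is a minimal sketch of the same idea using preg_match_all() instead of ereg(), which was removed in PHP 7. The helper name extract_links() and the sample HTML are assumptions for illustration, not something from the thread, and a regular expression will still miss unusual markup (unquoted attributes, for instance).

```php
<?php
// Hypothetical helper (not from the thread): return every href value found in $html.
function extract_links(string $html): array
{
    // Match <a ... href="..."> or href='...' regardless of attribute order or letter case.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2]; // capture group 2 holds the URL text
}

$html = '<p><a href="http://example.com/">Home</a> <A class="x" HREF=\'about.html\'>About</A></p>';
print_r(extract_links($html));
```

Unlike the character classes in the post above, the lazy `(.*?)` group accepts the colons, slashes, and dots that real URLs contain.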
@Jona (author), May 16, 2003: Hmmm.... I see. That looks interesting. It looks like one of the most used things in PHP is RegExps... I'll have to study more on those. lol
@Nevermore, May 16, 2003: I've been using PHP for a while, and RegExps are quite hard to use. Most often they can be replaced by simpler code, so I don't use them enough to become good.
@Jona (author), May 16, 2003: I see. Well, I'll get good at 'em nonetheless! lol
@AdamGundry, May 16, 2003: You should be able to open the links with [URL=http://www.php.net/manual/en/ref.curl.php]CURL[/URL].

Adam
@Jona (author), May 16, 2003: Adam, I'm assuming I'll need to install the package on my server then, right? I don't think I can do that on a free server... Is there any other way possible? (I will check to see if it's already installed on my server, which hopefully it is.)
@pyro, May 17, 2003: Is this what you are looking for, Jona?

[code=php]<?php

$code = file('http://www.yahoo.com/'); // file to open (returns an array of lines)

foreach ($code as $line_num => $line) { // loop through lines
    // Echo each line to the screen. htmlspecialchars() converts special characters to their HTML entities.
    echo "<span style=\"font-weight:bold;\">Line #$line_num :</span> " . htmlspecialchars($line) . "<br>\n";
}
?>[/code]
@Jona (author), May 17, 2003: Cijori, two things. One: Your code doesn't work. Two, I changed it up to make it work, but when I do it prints nothing. How can I fix this?

Thanks.
@Nevermore, May 17, 2003: You might want to talk to Pyro about that - as I said, I'm not brilliant with RegExps. Are you looping through each line looking for things, then printing the contents of the variable? That's what I would try.
@Jona (author), May 17, 2003: Well, all I know is this ereg/preg_match/preg_match_all stuff gets quite confusing--more so when it doesn't work as expected. lol
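A small sketch of how the two preg functions differ may help here; the sample line and variable names are made up for illustration. ereg() is left out because it was deprecated in PHP 5.3 and removed in PHP 7.

```php
<?php
$line = '<a href="one.html">1</a> <a href="two.html">2</a>';

// preg_match() stops after the first match; $first[1] is the first capture group.
$count = preg_match('/href="([^"]*)"/', $line, $first);

// preg_match_all() keeps going; $all[1] lists the capture group from every match.
$total = preg_match_all('/href="([^"]*)"/', $line, $all);

echo "$count match: {$first[1]}\n";
echo "$total matches: " . implode(', ', $all[1]) . "\n";
```

So for collecting every link on a line, preg_match_all() is the one to reach for; preg_match() only answers "is there at least one?".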
@Nevermore, May 17, 2003: Can you use require() on a file that is on a different server?
@Jona (author), May 17, 2003: I've never tried (or used) it, how's the syntax? require("http://myotherothersite.com/myfile.php"); right?
@Nevermore, May 17, 2003: That's it; if that works then you could grab other files.
@Jona (author), May 17, 2003: Hmmm... It's an idea. Hold on let me test it.
@AdamGundry, May 17, 2003: Yes, and it should work as long as [URL=http://www.php.net/manual/en/ref.filesystem.php#ini.allow-url-fopen]allow-url-fopen[/URL] is set on the server. (I just checked the docs).

Adam
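As a rough sketch of that check: the helper below (fetch_page() is a hypothetical name, not a PHP built-in) looks at the allow_url_fopen setting before trying to read a page. It uses file_get_contents() rather than require(), on the assumption that a link checker only needs the page text; on PHP 5.2 and later, require() on a URL also needs allow_url_include and would execute whatever the remote server returns as PHP.

```php
<?php
// Hypothetical helper: read a page into a string, or return null on failure.
function fetch_page(string $url): ?string
{
    // URL wrappers for file(), fopen() and file_get_contents() depend on this ini setting.
    if (!ini_get('allow_url_fopen')) {
        return null;
    }
    $body = @file_get_contents($url); // @ suppresses the warning on failure
    return $body === false ? null : $body;
}

$page = fetch_page('http://www.example.com/');
echo $page === null ? "Could not open the page.\n" : strlen($page) . " bytes read.\n";
```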
@Jona (author), May 17, 2003: Yes, it does work. That's neat.. Now about getting all of the URLs in it... :rolleyes:
@Nevermore, May 17, 2003: I'm starting to think you may have to go via CGI...
@Jona (author), May 17, 2003: That's a possibility. Man, I sure wanted to use PHP, though. It's so much easier to find a free server that way. *Sigh*

CGI is more powerful than PHP, though, isn't it? It's also quite a bit harder... Well, not too hard, but the syntax is a bit different.... And you have to learn how to, "move around" in it.
@Nevermore, May 17, 2003: If you want to find a free web host, try [URL]http://www.clickherefree.com[/URL]. It's a free database of free web hosts. freewebspace.net is another (I think).
@Jona (author), May 17, 2003: Don't worry about me, I can find one for free. lol I've been practicing that for over a year now. lol
@Nevermore, May 17, 2003: Yeah, I've been through GeoCities, IT3, Brinkster, Tripod and only recently have moved on to paying for hosting - at the moment I'm hosting my own, but I'm probably going to move to colocation soon.
@Jona (author), May 17, 2003: That's cool.
@Nevermore, May 17, 2003: If you don't need anything that Bravenet isn't already giving you, [URL=http://www.brinkster.com]Brinkster[/URL] might be better for you. They offer MySQL and ASP, put no ads on your pages, and are free.
@Jona (author), May 17, 2003: Well, I don't know ASP and I don't want to learn it yet.. It's too hard! lol I don't know why Microsoft makes everything in caps and stuff... Also, Brinkster doesn't have much bandwidth at all--something that I need.

There is also http://freewebs.com/ which offers no ads, PHP support, and is free...
@Nevermore, May 17, 2003: Freewebs also have CGI, so you could use them for your link checker. What do you want to check the links of, by the way?
@Jona (author), May 17, 2003: I basically just want to learn all of the practical (or impractical, lol) uses of PHP. I want to learn all that "extra" stuff that no one bothers with. I want to just learn all I can! I don't have a book or anything to learn from... All I have is http://php.net/, a useful resource, but not a tutorial area.

Programming and HTML each have their advantages: HTML is easy to learn, yet you have to have [url=http://validator.w3.org/]valid[/url] HTML; in programming there is no "valid" or "invalid" unless it's a syntax error or something... Which is its advantage over HTML. Programming is also more powerful (duh, how do you think they came up with HTML? lol).

BTW, don't say to go to Webmonkey.lycos.com or whatever it is for PHP tutorials because none of the ones there are any good... At least, not to me. :rolleyes:
@AdamGundry, May 17, 2003: [QUOTE]HTML is easy to learn[/QUOTE] I suppose it depends on how you code - I found programming fairly easy to pick up, but learning to hard-code HTML took longer. HTML does have editors though, which makes it a lot easier.

Of course, you then get on to whether a RAD tool like Delphi or another IDE is an editor, and to "hard-code" you should be doing everything manually.

Good luck with learning PHP - it's a great language. I'm gradually learning, and I agree with you - the best resource is the website.

Adam

P.S. A good way I found to learn was (i) to make something I enjoy (internet games), and (ii) set up a webserver on my computer so I can test much more easily.
@Jona (author), May 17, 2003: Adam, that's exactly what I do. I just don't enjoy making games, I enjoy making more complex things... I satisfy myself more often when I accomplish something and don't get frustrated. lol

Also, I have the http://aprelium.com/ Web server installed on my system so I can run PHP (and I can also download CGI) scripts on my local machine. The one thing is I can't CHMOD folders...
@Nevermore, May 17, 2003: Thanks for the link to the server - it's the only one I've seen that I can run. Now I can test PHP more easily. I was running a link through my server - they aren't exactly miles from one another...
@Jona (author), May 17, 2003: No problem.
@pyro, May 19, 2003: I think this is what you are looking for, Jona:

[code=php]<html>
<head>
<title>Link Validator</title>
<style type="text/css">
a {
color:darkblue;
}
</style>

</head>
<body>

Input a full url (ie. http://www.infinitypages.com/index.php).

<form action="checklinks.php" method="post">
<input type="text" name="url" size="50">
<input type="submit" name="submit" value="Check links">
</form>

<?php

#######################################################
# This script is Copyright 2003, Infinity Web Design #
# Written by Ryan Brill - [email protected] #
# All Rights Reserved - Do not remove this notice #
#######################################################

if (!empty($_POST["url"])) {

    $file = $_POST["url"];
    $safefile = htmlspecialchars($file); // escape user input before echoing it

    echo "Links in file <a href=\"$safefile\">$safefile</a>:<br/><br/>\n";

    $x = 1;
    $valid = 0;
    $invalid = 0;

    // Base path: everything up to and including the last "/" (split() was removed in PHP 7)
    $path = substr($file, 0, strrpos($file, "/") + 1);

    $contents = @file($file) or die ("Failed to open <a href=\"$safefile\">$safefile</a> to check links. Please be sure it is an absolute URL.");

    foreach ($contents as $line_num => $line) {
        // preg_match_all() so that several links on one line are all checked
        if (preg_match_all('/<a href=.*?>/', $line, $a)) {
            foreach ($a[0] as $tag) {
                $url = preg_split("/href=['\"]/", $tag);
                if (!isset($url[1])) {
                    continue; // href value was not quoted; skip it
                }
                $url2 = preg_split("/['\"]/", $url[1]);
                $spliturl = parse_url($url2[0]);
                $scheme = isset($spliturl['scheme']) ? strtolower($spliturl['scheme']) : "";

                if ($scheme == "") {
                    $finalurl = $path.$url2[0]; // relative link: prepend the base path
                }
                else {
                    $finalurl = $url2[0];
                }

                if ($scheme != "mailto") {
                    $code = @file($finalurl); // file to open
                    if (!$code) {
                        echo "<span style=\"color:darkred;\">Invalid:</span> <a href=\"$finalurl\">$finalurl</a><br/>\n";
                        $invalid++;
                    }
                    else {
                        echo "<span style=\"color:green;\">Valid:</span> <a href=\"$finalurl\">$finalurl</a><br/>\n";
                        $valid++;
                    }
                }
            }
        }
    }
}

if (isset($x) && $x == 1) {
    echo "<br/>\n";
    echo ($valid + $invalid) . " links checked.<br/>\n";
    if ($valid > 0) {
        echo "<span style=\"color:green;\">$valid valid links found.</span><br/>\n";
    }
    if ($invalid > 0) {
        echo "<span style=\"color:darkred;\">$invalid invalid links found.</span><br/>\n";
    }
}
?>

</body>
</html>[/code]
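The script above checks the links on one page. The original question also asked about going from page to page, which might be sketched roughly as below. Everything here is an assumption for illustration: crawl() is a hypothetical function, the page fetcher is passed in as a callable so the example runs offline against a fake in-memory "web", and only absolute http(s) links are followed.

```php
<?php
// Hypothetical sketch: breadth-first link follower with a depth limit.
// $fetch is any callable that takes a URL and returns the page HTML, or null on failure.
function crawl(string $start, callable $fetch, int $maxDepth = 1): array
{
    $seen = [$start => true];   // URLs already queued, to avoid going in circles
    $queue = [[$start, 0]];     // pairs of [url, depth]
    $report = [];               // url => true (reachable) or false (broken)

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        $html = $fetch($url);
        $report[$url] = ($html !== null);
        if ($html === null || $depth >= $maxDepth) {
            continue;
        }
        // Only absolute http(s) links are followed here; relative links are skipped.
        preg_match_all('/href\s*=\s*["\'](https?:\/\/[^"\']+)["\']/i', $html, $m);
        foreach ($m[1] as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = [$link, $depth + 1];
            }
        }
    }
    return $report;
}

// Usage with a fake in-memory "web" so the example needs no network access.
$pages = [
    'http://a.test/'  => '<a href="http://a.test/b">b</a> <a href="http://a.test/missing">x</a>',
    'http://a.test/b' => '<a href="http://a.test/">back</a>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? null;
};
print_r(crawl('http://a.test/', $fetch, 2));
```

The depth limit and the $seen list matter on a real site: without them, two pages that link to each other would keep the loop running forever.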
@Nevermore, May 19, 2003: Woah...
@pyro, May 19, 2003: Like it?
@Jona (author), May 19, 2003: Yup, Cijori, Pyro knows his stuff and he knows it well!