I work for a company that insists on using A LOT of static pages throughout its site. One of the problems we often see because of this is 404 errors, literally hundreds of them a day. Tracking all of these is quite a pain, mostly because some of them are genuine broken links and others are just curious or ignorant users playing with link structures.
I was assigned the task of figuring out how to make our 404 errors self-reporting to the web design department. We have a few different domains and subdomains, which added a touch of complication to the process.
I came up with the following script, which I placed in the header of our custom 404 page.
[code=php]
// Hosts whose pages we maintain; only broken links coming from these get reported.
$refererChecks = array("bar.com", "foo.bar.com", "myfoo.bar.com");

// The Referer header is optional, so default to an empty string when it is absent.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : "";

// Host part of the referring URL (null or false when it cannot be parsed).
$refererDomain = parse_url($referer, PHP_URL_HOST);

// The URL that produced the 404.
$myURI = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];

if (in_array($refererDomain, $refererChecks, true)) {
    $to = "[email protected]";
    $from = "[email protected]";
    $title = "Self Reporting Broken Link";
    $bodyText = "++++++++++++++ SELF REPORTING BROKEN LINK ++++++++++++++++\n\n\n\n"
        . "Link That Is Broken:\n" . $myURI . "\n\n"
        . "Linked From:\n" . $referer . "\n\n\n"
        . "---- REPLYING TO THIS E-MAIL WILL SEND A RESPONSE TO ALL PARTIES ON THE MAILING LIST ----\n\n";
    // @ suppresses mail() warnings so a mail failure never shows on the 404 page.
    @mail($to, $title, $bodyText, "From: $from\r\n");
}
[/code]
Since the server fires off the 404 error for everything, be it an image, a physical page, or even an improperly typed URL with a / after the filename, it's a very thorough system.
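To make it easier to see which referers actually trigger a report, the host comparison could be pulled out into a small function and tried against sample values. This is only a sketch: `isInternalReferer` is a name made up for illustration, and it assumes the referer should match one of our hosts exactly, ignoring case.

```php
<?php
// Hypothetical helper: true when $referer points at one of our own hosts.
function isInternalReferer($referer, array $ourHosts)
{
    // parse_url() returns null when the URL has no host component.
    $host = parse_url((string) $referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // Host names are case-insensitive, so compare in lower case.
    return in_array(strtolower($host), $ourHosts, true);
}

$ourHosts = array("bar.com", "foo.bar.com", "myfoo.bar.com");

var_dump(isInternalReferer("http://foo.bar.com/products/list.php", $ourHosts)); // bool(true)
var_dump(isInternalReferer("https://BAR.com/", $ourHosts));                     // bool(true)
var_dump(isInternalReferer("http://example.org/links.html", $ourHosts));        // bool(false)
var_dump(isInternalReferer("", $ourHosts));                                     // bool(false)
```

A request with no referer at all never matches, so any report that does get mailed should have come from a request that named one of our own hosts as its referer.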
BUT now we have a weird recurring bug. If a directory does not have an index file (index.php, index.html, index.shtml, etc.) to serve in place of a directory listing, Apache returns a 403 Forbidden and shows our 404 page.
Now, when users request a page inside a directory that has no index file (e.g. whatever/thisthing.php), it will occasionally fire off a 404 email to the web design department, even though there are no links to that directory and not even any references to it.
I’m stumped and can’t figure out what the heck is going on.
Any input is appreciated.
Thanks,
Chad
P.S. I also have a feeling it could have something to do with the way Apache is handling the files, but I figured I would try the PHP section first because I think I could possibly bypass or kill whatever Apache is doing.
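One thing I might try on the PHP side is checking which status actually sent the visitor to the error page before mailing anything. The sketch below assumes PHP runs as an Apache module, where Apache exposes the original status as `REDIRECT_STATUS` on the internal redirect to an ErrorDocument; under CGI setups that value can differ, so I have not confirmed this is the cause. The function names `originalErrorStatus` and `shouldReport` are made up for illustration.

```php
<?php
// Hypothetical helper: the HTTP status that made Apache serve the error page.
// $server is passed in (normally $_SERVER) so the function can be tried offline.
function originalErrorStatus(array $server)
{
    // Apache sets REDIRECT_STATUS on the internal redirect to an ErrorDocument.
    return isset($server['REDIRECT_STATUS']) ? (int) $server['REDIRECT_STATUS'] : 0;
}

// Only a genuine 404 should produce a report; a 403 from a directory
// without an index file would be skipped.
function shouldReport(array $server)
{
    return originalErrorStatus($server) === 404;
}

if (shouldReport($_SERVER)) {
    // ... run the referer check and mail() call from the script above here ...
}
```

If the 403 cases turn out to carry `REDIRECT_STATUS` 403, wrapping the existing script in this check should stop those emails without touching the Apache configuration.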