Hey guys,
I need a bit of help.
Without changing php.ini to increase execution time, is there any way I can get around it? I don’t mind if I have to click next each time to do it…
Basically, I’m building a new search engine for our company’s website. I’ve written a crawler script that goes through each page and logs every word and its number of occurrences. Given the number of pages we have, this takes a lot longer than 30 seconds.
The pages are read from our page cache, since the cached pages are static HTML.
Here’s a small example of the setup I use to read the files
[code=php]
if (is_dir($directory)) {
    if ($handle = opendir($directory)) {
        while (false !== ($file = readdir($handle))) {
            if ($file === "." || $file === "..") {
                continue; // skip the directory self-references
            }
            ############## Here is where I open the file and do all the stuff with the text line by line.
        }
        closedir($handle);
    }
}
[/code]
Can anyone think of a method of interrupting this, say, every 5 pages, so that I have to click “Next” to proceed? Unless someone can think of another method. I don’t really want to change php.ini, as a long-running script could bring the server to a halt.
It’s our own server, but I don’t want to take the risk really.
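Since the thread doesn’t show the text-processing step itself, here’s a minimal hypothetical sketch of logging word occurrences for one cached page — the inline `$html` string just stands in for `file_get_contents($directory . "/" . $file)`:

```php
<?php
// Hypothetical sketch of the per-page word logging.
$html = "<p>Search engine search</p>"; // stand-in for one cached page's contents
$text = strtolower(strip_tags($html));      // drop the HTML tags, normalise case
$words = str_word_count($text, 1);          // 1 => return an array of the words
$occurrences = array_count_values($words);  // word => number of occurrences
print_r($occurrences);
```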
[code=php]set_time_limit(60); // Set the maximum execution time allowed for this script to 60 seconds (the default is usually 30).[/code]
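Worth noting: set_time_limit() restarts the timer each time it is called, so another option is to call it inside the loop and cap each page individually rather than raising the limit for the whole run. A sketch (the `$pages` array and the log line are just illustrative):

```php
<?php
// Sketch: reset the limit per page instead of raising it once for the whole run.
$pages = array("home.html", "about.html"); // stand-in for the real file list
$log = array();
foreach ($pages as $file) {
    set_time_limit(30); // restarts the timer: each page now gets up to 30 seconds
    $log[] = "processing " . $file;
    // ... open and index $file here ...
}
echo implode("\n", $log);
```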
[code=php]
$pages = array();
if (is_dir($directory)) {
    if ($handle = opendir($directory)) {
        while (false !== ($file = readdir($handle))) {
            if ($file === "." || $file === "..") {
                continue; // skip "." and ".." so they don't count as pages
            }
            array_push($pages, $file);
        }
        closedir($handle);
    }
}

foreach ($pages as $p => $f) {
    echo $p . " : " . $f . "<br />";
}
[/code]
[code=php]
$counter = $page_id + $page_limit - 1; // index of the last entry in this batch
foreach ($pages as $p => $file) {
    if ($p >= $page_id && $p <= $counter) {
        ############### INSERT words, etc... ###########
    }
    // only show the link when there are more pages after this batch
    if ($p == $counter && $counter < count($pages) - 1) {
        echo "<a href='index.php?option=search&task=crawl&page=" . ($counter + 1) . "'>Next</a>";
    }
}
[/code]
[code=php]
// Count-based bounds avoid the undefined-index notices that using
// $words[$i] itself as the loop condition would produce.
for ($i = 0, $n = count($words); $i < $n; $i++) {
    for ($j = 0, $m = count($words[$i]); $j < $m; $j++) {
        ######### INSERT WORD, INSERT OCCURRENCE
    }
}
[/code]
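A side note on those INSERTs: one query per word adds up fast over a whole site. The thread doesn’t show the database code, so this is a hypothetical sketch (made-up table and column names) of batching one page’s words into a single multi-row INSERT; it only builds the SQL string:

```php
<?php
// word => occurrence count for each page, keyed by page index (hypothetical shape)
$words = array(0 => array("search" => 2, "engine" => 1));
$rows = array();
foreach ($words as $page_id => $counts) {
    foreach ($counts as $word => $count) {
        // Real code must escape $word first (e.g. mysqli_real_escape_string).
        $rows[] = "(" . $page_id . ", '" . $word . "', " . $count . ")";
    }
}
$sql = "INSERT INTO word_index (page_id, word, occurrences) VALUES "
     . implode(", ", $rows);
echo $sql;
```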
[code=php]$start_page = $page_id * $page_limit;
$end_page = ($page_id + 1) * $page_limit - 1;
$next_page = $page_id + 1;

echo $start_page . " START PAGES . " . $end_page . " END PAGES <br /><br />";

foreach ($pages as $p => $file) {
    if ($p >= $start_page && $p <= $end_page) {
        echo $p . " : " . $file . "<br />";
    }
    // count($pages) - 1 is the last valid index, so only link when more remain
    if ($p == $end_page && $p < count($pages) - 1) {
        echo "<a href='index.php?option=search&task=crawl&page=" . $next_page . "'>Next</a>";
    }
}
[/code]
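The same windowing can also be done with array_slice(), which avoids scanning the whole array on every request. A sketch using the same variable names; the range() call just stands in for the real file list:

```php
<?php
$page_limit = 5;
$page_id = 1;
$pages = range(100, 112); // stand-in for a 13-entry file list
$start_page = $page_id * $page_limit;
// true preserves the original numeric keys, matching the $p => $file loop
$batch = array_slice($pages, $start_page, $page_limit, true);
foreach ($batch as $p => $file) {
    echo $p . " : " . $file . "<br />";
}
if ($start_page + $page_limit < count($pages)) {
    echo "<a href='index.php?option=search&task=crawl&page=" . ($page_id + 1) . "'>Next</a>";
}
```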
If you’re on a machine on the same LAN or network, the net admin or head admin should have no objection to adding you to a trust group or as a trusted user with access to those resources. By the sound of it, it’s your task to deliver a project and it’s the net admin who is hindering your progress.
A little word in the right ear will soon have the net admin cooperating.
Network admins have this god complex and see themselves as above everyone in the company, yet they forget who pays their wages. Put your argument, and why you need the access, in the right person’s ear and they will have a word with the net admin.
Your alternative would be to let your deadline pass and push all the blame onto the net admin, who will be roasted at the next company BBQ and have to answer questions about why he ignored your requests. Sort of puts them on the spot.
Unfortunately that's not the case.
Our web server is hosted externally, and our network administrator is a third-party company we pay to look after our systems (why, I don't know — they barely do a thing).
I even asked them for a copy of our SLA not too long ago, and they still haven't provided one.
[code=php]
// isset() checks array keys, not values, so $common must have the words as
// keys. With "<", words shorter than the shortest common word skip the
// lookup entirely, since they can't possibly be in the list.
if (strlen($cur_word) < $smallest || !isset($common[$cur_word])) {
[/code]
[code=php]
## in a different file
## Note: define() only accepts an array from PHP 7 onwards, and the crawler's
## isset() check looks at keys, so flip the list to make the words the keys.
$common = array_flip(array("but", "and", "maver"));

## in crawler: pull the list in with include instead of reading a constant
include "common_words.php"; // hypothetical filename for the file above
[/code]
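For what it’s worth, isset() checks array keys rather than values, so the lookup only works once the common-word list is flipped so the words become the keys. A hedged sketch, assuming $smallest holds the length of the shortest common word (so anything shorter can skip the lookup):

```php
<?php
$common = array_flip(array("but", "and", "maver")); // words become the keys
$smallest = 3; // length of the shortest common words ("but", "and")
$indexed = array();
foreach (array("and", "we", "crawler", "maver") as $cur_word) {
    if (strlen($cur_word) < $smallest || !isset($common[$cur_word])) {
        $indexed[] = $cur_word; // too short to be common, or just not common
    }
}
print_r($indexed);
```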
Our servers (except the web server) are through them, our internet is through them, and they look after all of our software licenses and have remote access to every computer on the network except mine (I disabled it).
Wow!
The Bobs will only understand numbers; that's why it was farmed out in the first place. Before that pay-grade meeting, I would suggest calculating the cost of setting up an internal server room and employing someone to admin it or assist you. Balance that against the current cost of the hosting, which will be a small drop, and that's all the bean counters see. Then factor in the man-hour rates to fix and chase every issue and, for a little FUD, disaster recovery. Put a number on the risk of the current admin stealing and selling data, lost customer confidence, the marketability of the increased security of an in-house solution, etc.
If you get creative, you should be able to make a five-year plan for an internal server room look like the best option from a business point of view, and financially too. I'm sure we could all come up with a few more ideas if you wanted to make a post about it. If nothing else, you'll look like a serious, security-minded, forward-thinking, valuable member of staff, which can only help with the pay-grade talks.