Folks,
I can use a PHP crawler to crawl sites. The crawler can be installed on a paid VPS or on my localhost (XAMPP).
Now, if I run the PHP crawler on my localhost, do I have to keep the Admin Section (on the VPS host) open in my browser (client side) for the crawler to keep crawling? If I close my web browser and go to sleep, will the PHP crawler stop crawling?
The same question applies if I run the PHP crawler on a paid VPS.
I do not want to keep my PC on 24/7, regardless of whether the crawler runs on my end or on the paid host's end. Understand?
You will probably ask which PHP code I will use. Well, I am searching right now; I will copy it from tutorials. You probably have a fair idea of the kind of crawlers they build with PHP using DOM.
To begin with, let’s say I will use the crawler found in the following tutorial:
NOTE: It only extracts the starting page’s title, description, h1 and download time, not those of the crawled pages. I can amend the script so it extracts the same fields for the crawled links.
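For reference, the kind of DOM-based extraction described in the note can be sketched roughly like this. It is a minimal sketch, not the tutorial’s code: `extractPageInfo()` and `fetchPage()` are hypothetical names, and it assumes PHP 7.4+ with the DOM extension available.

```php
<?php
// Hypothetical helper: pull the fields the tutorial collects (title,
// meta description, first <h1>) plus every link href from raw HTML.
function extractPageInfo(string $html): array
{
    $info = ['title' => '', 'description' => '', 'h1' => '', 'links' => []];
    if (trim($html) === '') {
        return $info; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $title = $xpath->query('//title')->item(0);
    $desc  = $xpath->query('//meta[@name="description"]/@content')->item(0);
    $h1    = $xpath->query('//h1')->item(0);

    $info['title']       = $title ? trim($title->textContent) : '';
    $info['description'] = $desc ? trim($desc->nodeValue) : '';
    $info['h1']          = $h1 ? trim($h1->textContent) : '';

    foreach ($xpath->query('//a[@href]') as $a) {
        $info['links'][] = $a->getAttribute('href');
    }
    return $info;
}

// Hypothetical helper: download a page and measure the download time.
function fetchPage(string $url): array
{
    $start   = microtime(true);
    $html    = @file_get_contents($url); // false on failure
    $seconds = microtime(true) - $start;
    return ['html' => $html === false ? '' : $html, 'seconds' => $seconds];
}
```

Calling `extractPageInfo()` on each URL found in `links` is one way to amend the script so the same fields are collected for the crawled pages too.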
The above link was just an example to show you the kind of PHP scripts I will use, found in tutorials or in open-source marketplaces. So, to run such crawlers, do I have to keep my PC on with my web browser open?
If so, what’s the solution? I have no experience with cron jobs.
Any workarounds?
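One common approach is to start the crawler from the command line (the PHP CLI) instead of through a web page, so no browser connection is involved, and let cron launch it on the VPS. Below is a minimal sketch under assumptions: `crawl.php` and `crawl()` are hypothetical names, the page limit of 50 is arbitrary, and the sketch only follows absolute links on the start URL’s host (relative links are not resolved).

```php
<?php
// Hypothetical entry point meant to run as `php crawl.php <start-url>`.
if (PHP_SAPI !== 'cli') {
    exit("Run this script from the command line or cron.\n");
}
set_time_limit(0); // CLI already defaults to no limit; stated for clarity

/**
 * Breadth-first crawl of at most $maxPages pages on the start URL's host.
 * $fetch takes a URL and returns its HTML ('' on failure), which keeps
 * the network access swappable.
 */
function crawl(string $startUrl, int $maxPages, callable $fetch): array
{
    $host    = parse_url($startUrl, PHP_URL_HOST);
    $queue   = [$startUrl];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if ($html === '') {
            continue;
        }

        $doc = new DOMDocument();
        libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();

        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            // Queue only absolute links on the same host not yet visited.
            if (parse_url($href, PHP_URL_HOST) === $host && !isset($visited[$href])) {
                $queue[] = $href;
            }
        }
    }
    return array_keys($visited);
}

// Crawl only when a start URL is passed on the command line.
if (isset($argv[1])) {
    $fetch = fn(string $url): string => (string) @file_get_contents($url);
    foreach (crawl($argv[1], 50, $fetch) as $url) {
        echo $url, PHP_EOL;
    }
}
```

Started as `php crawl.php https://example.com/`, the script runs without any browser open. On a VPS, a crontab entry such as `0 3 * * * /usr/bin/php /path/to/crawl.php https://example.com/ >> /path/to/crawl.log 2>&1` (paths are placeholders) would run it every night at 03:00 while the home PC is switched off. On localhost, the PC itself still has to be on while the script runs.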