
Do you know any existing indexing script?

I’d like to make an online index of existing web pages.
The website to index is not mine, but it doesn’t have a search tool and won’t have one anytime soon.

I can download all the pages to my local computer and turn them into WordPress pages (I’m good at that, but not at SQL), but I think my missing link is how to correlate the content with the real online page. An existing tool or system for indexing pages would probably fill that gap, because I only need the content in order to build the index. After that, the content is useless.

So the search results should link to the original website, not to the site I’ll put online, which will only be a search form.

Any idea?
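
For readers who want to see the idea in miniature: the "missing link" described above is just a mapping from each word to the original URL it was found on, so the downloaded copies can be thrown away once the index exists. Below is a minimal sketch in Python, not the method used by any tool mentioned in this thread. The function names (`extract_words`, `build_index`, `search`) and the example URLs are made up for illustration, and it assumes the pages are already downloaded and keyed by their original address.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_words(html):
    """Return the set of lowercase alphanumeric words in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower()))


def build_index(pages):
    """pages: dict mapping ORIGINAL url -> downloaded HTML string.

    Returns a dict mapping each word to the set of original URLs
    containing it. The HTML itself is not stored.
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for word in extract_words(html):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted original URLs that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

A search form would then only need to call `search(index, query)` and print each returned URL as a link to the original site. A real indexer would also handle ranking, stemming, and persistent storage, which this sketch leaves out.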

PHP

3 Comments

@chrisranjana (Mar 12, 2012): Here is an indexing script written in PHP:

http://www.sphider.eu/docs.php#options
@sergiozambrano (author, Mar 12, 2012): Thanks! It's exactly what I was looking for.
@sergiozambrano (author, Mar 15, 2012): OK, I've tried the script:

The problem is that I can't get it to identify itself with a browser's user agent. It keeps connecting as a "robot" and obeying the robots.txt file, so it fails to index the pages marked as disallowed. At least, that's what the error message seems to say: "File checking forbidden by required/disallowed string rule".

I tried changing some if conditions so that it would not find the robots file, or would ignore it, but that didn't work. I also tried a mod I found online to "ignore robots", but the result was the same, except that there was no error; it just ended. Sphider-plus (1.6) did the same.

If anyone knows how to hack it, I'd appreciate the tip.

Thanks.
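
As background on the robots.txt behaviour described above, here is a small sketch of how a crawler that honours robots.txt decides whether a URL may be fetched. It uses Python's standard `urllib.robotparser` and does not reproduce Sphider's internals. The robots.txt content, the URLs, and the helper name `allowed` are hypothetical. It illustrates one thing only: when the rule is written for `User-agent: *`, switching to a browser-like user agent string does not change the verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access happens.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Sphider", "Mozilla/5.0"):
        for path in ("/public/page.html", "/private/page.html"):
            url = "http://example.com" + path
            print(agent, path, allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

In this example both agents are allowed on `/public/` and refused on `/private/`, because the wildcard group applies to every agent that has no group of its own.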