
Making Sphider ignore disallowed pages?

To be clear: I want to ignore the disallow instruction and go ahead and retrieve the page anyway.

I tried Sphider, Sphider-plus, and some mods that claim to "ignore robots", but none of them seem to be enough.
I'm trying to index a third-party website to help users find other people's posts, since the owner seems too busy with the "sales, sales, sales" side of things.
The problem is it seems they deliberately don't want us to find help, because they also added a "disallow" rule.

I can browse the pages in a normal browser, and I even changed Sphider's user agent to Firefox's, with no success.

Is it even possible to crawl a website as if it were a browser, beyond faking the user agent? In other words: how many ways does a server have to figure out whether it's a robot or a human reading the pages?
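On that last question: the user agent is only one signal. A server can also look at the other headers a real browser always sends (Accept, Accept-Language, cookies, Referer), at request timing, and at whether JavaScript runs. A minimal sketch of sending browser-like headers with Python's stdlib (the URL is a placeholder, not the real site):

```python
import urllib.request

# Hypothetical URL, for illustration only.
URL = "http://example.com/forum/viewtopic.php?id=1"

# A spoofed User-Agent alone still stands out if the other headers a
# real browser sends are missing; supply a plausible set of them.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:10.0) "
                  "Gecko/20100101 Firefox/10.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

# Build the request object; no network traffic happens until it is opened.
req = urllib.request.Request(URL, headers=BROWSER_HEADERS)

# urllib normalizes header names to "Xxxx-yyyy" capitalization.
print(req.get_header("User-agent"))
```

Even with all of these set, a server that requires JavaScript (or checks behavioral signals like request rate) can still tell a simple crawler apart from a browser.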

I could be wrong about all this, and there could be other instructions/rules in robots.txt or somewhere else, but bear with me.

Thanks.

PHP

3 Comment(s)

@Charles (Mar 22, 2012) — The internet is designed to spread information, not keep it safe. As with life itself, the best that you can do is ask politely for the spiders to leave you alone.
@sergiozambrano (author) (Mar 22, 2012) — [QUOTE]The internet is designed to spread information, not keep it safe. As with life itself, the best that you can do is ask politely for the spiders to leave you alone.[/QUOTE]
Ahem… amen?

What?

Did you read my description or just the title?

Is that an answer? Or just your signature on an empty post?
@sergiozambrano (author) (Mar 27, 2012) — Stupidly, I didn't check HOW the links appear, just where they pointed.

It seems the links open the pages I want with JavaScript, which Sphider can't process.

At least now I know how the pages are addressed, and I can increment the query string while downloading. That won't index the original pages, but I'll be able to build a DB I can work with.

Is there any PHP script or Mac software (or a Firefox/Chrome extension?) that can download webpages from a URL range?

Any ideas?
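Incrementing a query-string id and saving each page is a few lines in any scripting language. A minimal sketch with Python's stdlib, where the base URL pattern is a hypothetical stand-in for the real site's pattern:

```python
import urllib.error
import urllib.request

# Hypothetical URL pattern; adjust the base and the id range to the real site.
BASE = "http://example.com/post.php?id={}"

def build_urls(start, end):
    """Return the page URLs for the inclusive id range start..end."""
    return [BASE.format(i) for i in range(start, end + 1)]

def download_range(start, end, ua="Mozilla/5.0"):
    """Fetch every page in the range and save each one as page_<id>.html."""
    for i in range(start, end + 1):
        url = BASE.format(i)
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                with open("page_{}.html".format(i), "wb") as out:
                    out.write(resp.read())
        except urllib.error.URLError as err:
            # Skip ids that 404 or time out instead of aborting the run.
            print("skipping", url, "-", err)

if __name__ == "__main__":
    download_range(1, 10)
```

If you'd rather not write a script at all, curl has built-in URL globbing that does the same thing from the command line, e.g. `curl -o "page_#1.html" "http://example.com/post.php?id=[1-100]"`.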