Scraping with Google’s I’m Feeling Lucky

@fcbastiat Aug 01.2008

I originally posted this in client-side general because it isn’t really language specific, but I think server-siders might have a better idea. I apologize if you feel this is inappropriate.

I’ve been building a PHP app for several weeks now. I’ve finally finished it and am in the testing phase. My app does a lot of scraping to pull media info from websites with no API, such as IMDb, animenewsnetwork, Internet Broadway Database, etc. So far I’ve just been using Google’s “I’m Feeling Lucky” feature to find the appropriate pages on IMDb for a user’s search. It took only a few minutes of testing before I stopped getting any search results. After some troubleshooting, I found out that Google doesn’t allow non-human searching.

I’m really having a hard time thinking of a workaround. My app is useless if my users have to get the relevant URLs themselves. I was hoping some of you might have some suggestions. Thanks.

PHP

@MrCoder Aug 01.2008 — Google did this to stop people from doing what you are trying to do.

A while ago Google issued some Search API keys, but I don't think they offer them any longer because they were being abused?

@SyCo Aug 01.2008 — You could look into cURL. It can post to a URL and mimic a search done on a website, and it returns the source of the page, which you can parse for the information you require. I'd say it's still a pretty sketchy way of doing things, but it might work.
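
For illustration, the "fetch a search page, then parse the source" idea might look roughly like the sketch below. It is written in Python using only the standard library, not PHP, purely to show the flow. The `example.com` URLs, the `/title/` path prefix, the User-Agent string, and the helper names `fetch` and `first_matching_link` are all assumptions made up for this example, and whether a given site permits automated requests is something to check in its terms of service first.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def first_matching_link(html, base_url, path_prefix):
    """Return the first link whose path starts with path_prefix, as an
    absolute URL, or None if the page has no such link."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        if href.startswith(path_prefix):
            return urljoin(base_url, href)
    return None


def fetch(url, params):
    """Request a site's search page and return its source as text.
    (Hypothetical helper; it is defined here but not called.)"""
    request = Request(
        url + "?" + urlencode(params),
        headers={"User-Agent": "example-media-app/0.1"},  # assumed value
    )
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demonstration with a made-up results page:
sample = '<ul><li><a href="/about">About</a></li>' \
         '<li><a href="/title/tt0000001/">Some Film</a></li></ul>'
best = first_matching_link(sample, "https://example.com/find", "/title/")
```

Taking the first link that matches a known path pattern plays the same role as "I'm Feeling Lucky": the user never sees the results page, only the page the app picked.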

@NewsGrail Aug 01.2008 — It will run into the same issue, though. It's against their terms of service, and they'll just go further out of their way to stop you if you really won't give up. Even if you can get around it for now, it won't be a long-term solution, and you'll eventually find it stops working without warning. And there are ethical issues, of course: they don't want you doing it.

@SyCo Aug 01.2008 — I was thinking of going direct to the IMDb site and using their search.

@infinivert Aug 02.2008 — Another option would be to install your own search engine that will spider and index a list of sites you add (I know Sphider will do this), and then direct your searches to that search engine.

The beautiful thing about this solution is that you can set up a cron job to index your sites at a regular rate rather than hoping Google has indexed those pages recently.

--Josh
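
As a rough picture of what a self-hosted index does, here is a minimal sketch in Python (not PHP, and not how Sphider itself works). It builds an inverted index from page text that a crawler has already collected, and returns the single best-matching URL for a query. The page URLs, page text, and the function names `tokenize`, `build_index`, and `lucky` are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it.
    `pages` is a dict of {url: page_text} produced by a crawler."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lucky(index, query):
    """Return the URL matching the most query words, or None.
    Ties go to the alphabetically first URL."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url in index.get(word, ()):
            scores[url] += 1
    if not scores:
        return None
    return max(sorted(scores), key=lambda url: scores[url])


# Made-up crawl results standing in for pages fetched on a schedule:
pages = {
    "https://example.com/a": "Spirited Away (2001) anime film",
    "https://example.com/b": "Away We Go (2009) film",
}
index = build_index(pages)
result = lucky(index, "spirited away")
```

Rebuilding the index from a scheduled job keeps it as fresh as you choose, at the cost of crawling the sites yourself.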

@fcbastiat (author) Aug 02.2008 — Thanks, infinivert. That's a solution I hadn't considered, and it seems the best. Right now I'm managing with each site's own search engine, but it's much slower because it has to load two full pages instead of one page and a Google redirect. Also, the search results are far worse than Google's. I'll look into installing my own search engine. Thanks.

@infinivert Aug 02.2008 — Cool! Let me know how that works out for you, and whether you go with Sphider or something else.

I'd love to see your site when you get it running!
