
What To Look Out For When Building A Bot?

Folks,

To build a web crawler like Googlebot, I made the following notes:

  • Teach the bot not to crawl links it has already crawled (a duplicate-link filter).

  • Teach the bot to identify file types so it does not crawl download links.

  • Teach the bot to obey the instructions in robots.txt, which means teaching it the file's instruction format.

  • Teach the bot to avoid loop traps.

  • Teach the bot to avoid large files that would overload it or drain its resources.

  • Teach the bot to stay on the domain if it is only crawling the initial website.

  • Teach the bot word synonyms so it can figure out what a crawled page is about.

  • Teach the bot not to visit links longer than 255 characters.

  • Teach the bot not to lose control of itself (I don't want crooks using the bot in any way to spread spam, malware or viruses. But how do I do this?).

  • Teach the bot to …….
    What else should be on my list?
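
Several of the URL-level checks in the list (duplicate filter, stay on domain, 255-character limit, download links, loop traps) can be sketched as one filter function. This is only a rough sketch, written in Python for brevity rather than PHP; the names `normalize` and `should_crawl`, the extension list, the depth cap and the "segment repeated three times" trap heuristic are all assumptions for illustration, not an established rule set.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

MAX_URL_LENGTH = 255   # limit taken from the list above
MAX_DEPTH = 8          # assumed cap on path depth, a crude loop-trap guard
SKIP_EXTENSIONS = (".zip", ".exe", ".pdf", ".iso", ".mp4")  # assumed, extend as needed


def normalize(url, base=None):
    """Resolve a link against its page and drop the #fragment so duplicates compare equal."""
    if base:
        url = urljoin(base, url)
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def should_crawl(url, start_host, seen):
    """Return True only if the URL passes every check; remember it in `seen` when it does."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False                      # skip mailto:, javascript:, ftp: ...
    if len(url) > MAX_URL_LENGTH:
        return False                      # over-long link
    if parts.hostname != start_host:
        return False                      # stay on the starting domain
    if parts.path.lower().endswith(SKIP_EXTENSIONS):
        return False                      # looks like a download link
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return False                      # suspiciously deep path
    if any(segments.count(s) >= 3 for s in set(segments)):
        return False                      # e.g. /a/b/a/b/a/b, a typical loop-trap shape
    if url in seen:
        return False                      # already crawled or queued
    seen.add(url)
    return True
```

Typical use would be `should_crawl(normalize(href, page_url), "example.com", seen)` for every link found on a page, with `seen` being one shared `set` for the whole crawl.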
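
For the robots.txt item, the file format does not have to be parsed by hand in every language. As an illustration only (Python again, not PHP), the standard library's `urllib.robotparser` can read the rules and answer "may I fetch this URL?". The bot name `MyCrawler` and the sample rules are made up for the example.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyCrawler"  # hypothetical bot name


def build_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


# Made-up rules for illustration
sample = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rules = build_rules(sample)
allowed_home = rules.can_fetch(USER_AGENT, "https://example.com/index.html")
allowed_private = rules.can_fetch(USER_AGENT, "https://example.com/private/a.html")
delay = rules.crawl_delay(USER_AGENT)  # seconds to wait between requests, or None
print(allowed_home, allowed_private, delay)
```

In a real crawl the rules would be fetched once per host (from `/robots.txt`) and cached, and `can_fetch` would be checked before every request to that host.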
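
For the file-type and large-file items, one possible approach is to look at the response headers (for example from a HEAD request) before downloading the body. The sketch below assumes the headers are already available as a plain dict; the function name `worth_downloading`, the 2 MiB cap and the allowed-type list are arbitrary choices for illustration.

```python
MAX_BYTES = 2 * 1024 * 1024  # assumed 2 MiB cap on page size
ALLOWED_TYPES = ("text/html", "application/xhtml+xml")  # assumed: only crawl HTML pages


def worth_downloading(headers):
    """Decide from response headers whether the body should be fetched at all."""
    # Header names are case-insensitive, so compare them in lower case
    lowered = {name.lower(): value for name, value in headers.items()}

    # "text/html; charset=utf-8" -> "text/html"
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in ALLOWED_TYPES:
        return False

    # Servers mark forced downloads with Content-Disposition: attachment
    if "attachment" in lowered.get("content-disposition", "").lower():
        return False

    length = lowered.get("content-length", "")
    if length.isdigit() and int(length) > MAX_BYTES:
        return False
    return True
```

Content-Length is optional and can be wrong, so a crawler would probably also want to stop reading once it has received `MAX_BYTES` of the body.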
    PHP

    9 Comments

    @viakgroup, Feb 16, 2021: Here are the basic steps to build a crawler:

    Step 1: Add one or several URLs to be visited.

    Step 2: Pop a link from the URLs to be visited and add it to the visited URLs list.

    Step 3: Use the ScrapingBot API to extract the content of the page and grab the data you are interested in.
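
    The three steps above can be sketched as a small loop. This is a rough illustration in Python rather than PHP, and it does not use the ScrapingBot API: page retrieval is replaced by a caller-supplied `fetch(url)` function (an assumption for the example) that returns the HTML text or `None`. The names `crawl` and `LinkExtractor` are likewise made up here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` is supplied by the caller (hypothetical helper)."""
    to_visit = deque(seed_urls)      # Step 1: URLs to be visited
    visited = set()
    pages = {}
    while to_visit and len(pages) < max_pages:
        url = to_visit.popleft()     # Step 2: pop a link ...
        if url in visited:
            continue
        visited.add(url)             # ... and record it as visited
        html = fetch(url)            # Step 3: get the page content
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Make the link absolute and drop any #fragment
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in visited:
                to_visit.append(absolute)
    return pages
```

    Passing the fetcher in as a parameter keeps the loop independent of how pages are downloaded, so the same loop would work with a plain HTTP client or a scraping service.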

    @VITSUSA, Feb 16, 2021: @developer_web#1628015 See the link below to learn how to build a web crawler:

    https://www.scraping-bot.io/how-to-build-a-web-crawler/
    @developer_web (author), Feb 19, 2021: @viakgroup#1628030

    Cheers.

    ScrapingBot API?
    @developer_web (author), Feb 19, 2021: @VITSUSA#1628049

    Cheers. Doing that now!
    @developer_web (author), Feb 19, 2021: Drat! The ScrapingBot tutorial is not in PHP but in Node.js, which I know nothing about. If only some programmer would convert it to equivalent PHP code:

    https://www.scraping-bot.io/how-to-build-a-web-crawler/
    @developer_web (author), Feb 19, 2021: @viakgroup#1628030

    Look at the crawler I was building nearly a week ago:

    https://www.webdeveloper.com/d/392763-folks
    @marksmith121, Feb 23, 2021: A bot (short for "robot") is an automated program that runs over the Internet. Some bots run automatically, while others only execute commands when they receive specific input. There are many different types of bots, but some common examples include web crawlers, chat room bots, and malicious bots.
    @NogDog, Feb 23, 2021: @marksmith121#1628411

    Continuing to post replies like this, which have nothing to do with answering the original question (apparently just copied and pasted from a Google search result), will result in a ban. No further warning may be given.
    @developer_web (author), Feb 25, 2021: @marksmith121#1628411

    I know what a bot is. I build .exe ones. Now I am trying to learn to build PHP or web ones.