What To Look Out For When Building A Bot?

@developer_web, Feb 15, 2021

Folks,

To build a web crawler like Googlebot, I made the following notes:

* Teach the bot not to crawl links it has already crawled (a duplicate-link filter).

* Teach the bot to identify file types so it does not crawl download links.

* Teach the bot to obey the rules listed in the robots.txt file. That means teaching it the file's instruction format so it can make sense of the rules.

* Teach the bot to avoid loop traps.

* Teach the bot to avoid crawling large files that would overload it or drain its resources.

* Teach the bot to stay on the domain when only the initial website is being crawled.

* Teach the bot word synonyms so it can figure out what a crawled page is about.

* Teach the bot not to visit links longer than 255 characters.

* Teach the bot not to lose control of itself (I don't want crooks using the bot in any way to spread spam, malware or viruses. But how to do this?).

* Teach bot to …….
What else should be on my list?
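
Several of the checks in the list above (duplicate filter, file-type filter, staying on the domain, the 255-character limit) could be combined into one gatekeeper function. What follows is only a minimal sketch under stated assumptions: the names `shouldCrawl` and `normalizeUrl`, the extension list, and the `example.com` host in the usage note are all made up for illustration, and robots.txt handling, loop traps and large-file checks are not covered.

```php
<?php
// Hypothetical helper: decides whether a discovered link is worth crawling.
// The extension list and the 255-character cap are illustrative choices.

const MAX_URL_LENGTH = 255;
const SKIPPED_EXTENSIONS = ['zip', 'exe', 'pdf', 'iso', 'mp4', 'mp3', 'jpg', 'png', 'gif'];

function normalizeUrl(string $url): string
{
    // Drop the fragment so page.html#a and page.html#b count as one link.
    $withoutFragment = explode('#', $url, 2)[0];
    return rtrim($withoutFragment, '/');
}

function shouldCrawl(string $url, string $startHost, array &$seen): bool
{
    if (strlen($url) > MAX_URL_LENGTH) {
        return false; // link is over the length limit
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // malformed or relative URL
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false; // only crawl web pages
    }
    if (strcasecmp($parts['host'], $startHost) !== 0) {
        return false; // stay on the starting domain
    }

    $extension = strtolower(pathinfo($parts['path'] ?? '', PATHINFO_EXTENSION));
    if (in_array($extension, SKIPPED_EXTENSIONS, true)) {
        return false; // likely a download link
    }

    $key = normalizeUrl($url);
    if (isset($seen[$key])) {
        return false; // duplicate-link filter
    }
    $seen[$key] = true; // remember this link for later calls
    return true;
}
```

A call might look like `shouldCrawl('https://example.com/page', 'example.com', $seen)`, with `$seen` starting as an empty array that is passed to every call so the duplicate filter can remember earlier links.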

PHP

@viakgroup, Feb 16, 2021

Here are the basic steps to build a crawler:

Step 1: Add one or several URLs to be visited.

Step 2: Pop a link from the list of URLs to be visited and add it to the list of visited URLs.

Step 3: Use the ScrapingBot API to extract the content of the page and grab the data you are interested in.
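
The three steps above can be sketched in PHP roughly as follows. This is a sketch, not the ScrapingBot tutorial's code: `crawl` and `$fetchLinks` are hypothetical names, and `$fetchLinks` is a stand-in callable that takes a URL and returns the links found on that page, so whatever fetching or extraction service is actually used would be plugged in there. The `$maxPages` cap is an added assumption that also gives crude protection against loop traps.

```php
<?php
// Minimal sketch: a queue of URLs to visit, a visited set, and a fetch step.
// $fetchLinks is a hypothetical callable: string $url -> array of link strings.

function crawl(array $seedUrls, callable $fetchLinks, int $maxPages = 100): array
{
    // Step 1: add one or several URLs to be visited.
    $toVisit = new SplQueue();
    foreach ($seedUrls as $url) {
        $toVisit->enqueue($url);
    }
    $visited = []; // URL => true

    while (!$toVisit->isEmpty() && count($visited) < $maxPages) {
        // Step 2: pop a link and record it as visited.
        $url = $toVisit->dequeue();
        if (isset($visited[$url])) {
            continue; // already crawled, skip the duplicate
        }
        $visited[$url] = true;

        // Step 3: fetch the page and queue the links it contains.
        foreach ($fetchLinks($url) as $link) {
            if (!isset($visited[$link])) {
                $toVisit->enqueue($link);
            }
        }
    }

    // URLs in the order they were visited.
    return array_keys($visited);
}
```

For a dry run without any network access, `$fetchLinks` can be a closure that looks links up in a small hard-coded array of pages.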

@VITSUSA, Feb 16, 2021, in reply to @developer_web #1628015

Follow this link to learn how to build a web crawler:

https://www.scraping-bot.io/how-to-build-a-web-crawler/

@developer_web (author), Feb 19, 2021, in reply to @viakgroup #1628030

Cheers.

ScrapingBot API?

@developer_web (author), Feb 19, 2021, in reply to @VITSUSA #1628049

Cheers. Doing that now!

@developer_web (author), Feb 19, 2021

Drat! The ScrapingBot tutorial is not in PHP but in Node.js, which I know nothing about. If only some programmer would convert it to equivalent PHP code:

https://www.scraping-bot.io/how-to-build-a-web-crawler/

@developer_web (author), Feb 19, 2021, in reply to @viakgroup #1628030

Look at the crawler I was building nearly a week ago:

https://www.webdeveloper.com/d/392763-folks

@marksmith121, Feb 23, 2021

A bot (short for "robot") is an automated program that runs over the Internet. Some bots run automatically, while others only execute commands when they receive specific input. There are many different types of bots, but some common examples include web crawlers, chat room bots, and malicious bots.

@NogDog, Feb 23, 2021, in reply to @marksmith121 #1628411

Continuing to post replies like this, which do nothing to answer the original question (apparently just copy/pasting some Google search result), will result in a ban. No further warning may be given.

@developer_web (author), Feb 25, 2021, in reply to @marksmith121 #1628411

I know what a bot is. I build .exe ones. Now I am trying to learn to build PHP or web ones.
