
Is a Python-related User Agent ever a legit traffic source?

I’m noticing some odd-looking user agent strings in my server logs that differ from the typical Mozilla, Chrome, and other expected browsers.

Is there any normal reason for a user agent to be python-requests or python-urllib?

Are these types of hits usually bad scrapers, or can they be “good” crawlers that shouldn’t be blocked?

Server Management

3 Replies

@samontab Oct 03.2022 — Since it's just the generic name of the Python library, in some cases it can be a "good bot", and in others it can be a "bad bot".

The best way to know would be to use a database of known bots.

Here are a couple of projects that might be useful:

https://github.com/omrilotan/isbot

https://github.com/monperrus/crawler-user-agents

Alternatively, you might want to simply block anything that is not well known to be a "good bot".
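
For what it's worth, here is a minimal sketch of that lookup in Python, assuming you have downloaded crawler-user-agents.json from the second repo (its entries carry a regex under a "pattern" key):

```python
import json
import re

# Load the crawler-user-agents dataset (downloaded from the repo linked above).
with open("crawler-user-agents.json") as f:
    patterns = [re.compile(entry["pattern"]) for entry in json.load(f)]

def is_known_crawler(user_agent: str) -> bool:
    """Return True if the user agent matches any known crawler pattern."""
    return any(p.search(user_agent) for p in patterns)

# python-requests appears in the dataset, so this should report True.
print(is_known_crawler("python-requests/2.28.1"))
```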
@samontab Oct 03.2022 — Those are just generic names used by the libraries, so they can be anything, really. Ultimately it's up to you to decide what you want to block.

Here's a useful repository that lists known crawler user agents: https://github.com/monperrus/crawler-user-agents

Here's a direct link to the data itself: https://github.com/monperrus/crawler-user-agents/blob/master/crawler-user-agents.json

I would suggest having a look at that list, allowing only the bots that you are happy with, and blocking everything else.
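
As a rough sketch of that allow-then-block policy (the bot patterns below are placeholders; pick the real ones from the JSON linked above):

```python
import re

# Placeholder allowlist: patterns for the bots you decide to keep,
# copied from crawler-user-agents.json.
ALLOWED_BOTS = [re.compile(p) for p in (r"Googlebot", r"bingbot", r"DuckDuckBot")]
# Crude browser heuristic: nearly all real browsers send "Mozilla/" first.
LOOKS_LIKE_BROWSER = re.compile(r"^Mozilla/")

def should_block(user_agent: str) -> bool:
    """Block anything that is neither an allowed bot nor a plausible browser."""
    if any(p.search(user_agent) for p in ALLOWED_BOTS):
        return False
    return LOOKS_LIKE_BROWSER.match(user_agent) is None

print(should_block("python-requests/2.28.1"))  # True: not on the allowlist
print(should_block("Mozilla/5.0 (X11; Linux x86_64) Firefox/105.0"))  # False
```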
@adam248 Oct 08.2022 — First, how you manage such traffic is purely your decision.

However, blocking such traffic by user agent alone is most likely a waste of time, because User-Agent metadata can easily be spoofed in code.

For example:

https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit-a-k-a-and-generate-user-agent
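
For illustration, a minimal python-requests snippet along the lines of the linked answer (the URL is a stand-in):

```python
import requests

# Overriding the default "python-requests/x.y.z" User-Agent takes one line,
# which is why filtering on that header alone is unreliable.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
    )
}
response = requests.get("https://example.com/", headers=headers)
print(response.request.headers["User-Agent"])  # the server saw the spoofed value
```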

Two possible solutions:

1. If bad scrapers are a real problem, then focus on the IP addresses of the main offenders that are causing abnormal server load (see the sketch after this list).

2. If you wish to make the website content harder to scrape, moving the front end to React or Vue means viewing the content requires a JavaScript rendering engine, which most basic low-level scrapers don't have.
This still isn't a complete solution, as it is possible to build a scraper that uses a rendering engine as well.
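
For option 1, a minimal sketch of finding the main offenders, assuming a combined-format access log (the path and format will differ per server):

```python
from collections import Counter

# Tally request counts per client IP; in combined log format the IP is
# the first space-separated field on each line.
counts = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        counts[line.split(" ", 1)[0]] += 1

# The heaviest hitters are candidates for rate limiting or a firewall rule.
for ip, hits in counts.most_common(10):
    print(f"{hits:7d}  {ip}")
```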

It is a fight best avoided unless it is a serious problem to begin with.