
Is a Python-related User Agent ever a legit traffic source?

I’m noticing some odd-looking user agent strings in my server logs that differ from the typical Mozilla, Chrome, and other expected browsers.

Is there any normal reason for a user agent to be python-requests or python-urllib?

Are these types of hits usually bad scrapers, or can they be “good” crawlers that shouldn’t be blocked?

Server Management

3 Replies

@samontab Oct 03.2022 — Since it's just the generic name of the Python library, in some cases it can be a "good bot", and in others it can be a "bad bot".

The best way to know would be to use a database of known bots.

Here are a couple of projects that might be useful:

https://github.com/omrilotan/isbot

https://github.com/monperrus/crawler-user-agents

Alternatively, you might want to simply block anything that is not well known to be a "good bot".
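
For what it's worth, here is a minimal sketch of that lookup in Python, assuming you have downloaded crawler-user-agents.json from the second repo (its entries carry a regex under a "pattern" key):

```python
import json
import re

# Load the crawler-user-agents dataset (downloaded from the repo linked above).
with open("crawler-user-agents.json") as f:
    patterns = [re.compile(entry["pattern"]) for entry in json.load(f)]

def is_known_crawler(user_agent: str) -> bool:
    """Return True if the user agent matches any known crawler pattern."""
    return any(p.search(user_agent) for p in patterns)

# python-requests appears in the dataset, so this should report True.
print(is_known_crawler("python-requests/2.28.1"))
```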
@samontab Oct 03.2022 — Those are just generic names used by the libraries, so they can be anything, really. Ultimately it's up to you to decide what you want to block.

Here's a useful repository that lists known crawler user agents: https://github.com/monperrus/crawler-user-agents

Here's a direct link to the data itself: https://github.com/monperrus/crawler-user-agents/blob/master/crawler-user-agents.json

I would suggest having a look at that list, allowing only the bots that you are happy with, and blocking everything else.
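
As a rough sketch of that allow-then-block policy (the bot patterns below are placeholders; pick the real ones from the JSON linked above):

```python
import re

# Placeholder allowlist: patterns for the bots you decide to keep,
# copied from crawler-user-agents.json.
ALLOWED_BOTS = [re.compile(p) for p in (r"Googlebot", r"bingbot", r"DuckDuckBot")]
# Crude browser heuristic: nearly all real browsers send "Mozilla/" first.
LOOKS_LIKE_BROWSER = re.compile(r"^Mozilla/")

def should_block(user_agent: str) -> bool:
    """Block anything that is neither an allowed bot nor a plausible browser."""
    if any(p.search(user_agent) for p in ALLOWED_BOTS):
        return False
    return LOOKS_LIKE_BROWSER.match(user_agent) is None

print(should_block("python-requests/2.28.1"))  # True: not on the allowlist
print(should_block("Mozilla/5.0 (X11; Linux x86_64) Firefox/105.0"))  # False
```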
@adam248 Oct 08.2022 — First, how you manage such traffic is purely your decision.

However, blocking such traffic by user agent alone is most likely a waste of time, because User-Agent metadata can easily be spoofed in code.

For example:

https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit-a-k-a-and-generate-user-agent
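
For illustration, a minimal python-requests snippet along the lines of the linked answer (the URL is a stand-in):

```python
import requests

# Overriding the default "python-requests/x.y.z" User-Agent takes one line,
# which is why filtering on that header alone is unreliable.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
    )
}
response = requests.get("https://example.com/", headers=headers)
print(response.request.headers["User-Agent"])  # the server saw the spoofed value
```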

Two possible solutions:

1. If bad scrapers are a real problem, then focus on the IP addresses of the main offenders that are causing abnormal server load (see the sketch after this list).

2. If you wish to make the website content harder to scrape, moving the front end to React or Vue means viewing the content requires a JavaScript rendering engine, which most basic low-level scrapers don't have.
This still isn't a complete solution, as it is possible to build a scraper that uses a rendering engine as well.
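
For option 1, a minimal sketch of finding the main offenders, assuming a combined-format access log (the path and format will differ per server):

```python
from collections import Counter

# Tally request counts per client IP; in combined log format the IP is
# the first space-separated field on each line.
counts = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        counts[line.split(" ", 1)[0]] += 1

# The heaviest hitters are candidates for rate limiting or a firewall rule.
for ip, hits in counts.most_common(10):
    print(f"{hits:7d}  {ip}")
```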

It is a fight best avoided unless it is a serious problem to begin with.