Block Networks That Host Bad Bots With Cloudflare Firewall

You might have seen my post on how to block bad bots that steal your content and bandwidth. However, sometimes blocking bots by looking at the user agent might not be enough. This is because the user agent can be easily spoofed, and many bots do not honestly declare themselves.

A more effective way of stopping bad bots is by blocking or challenging the networks where the bots are hosted. This will target all of them, regardless of user agent. Setting up a block or challenge is easily done using the Cloudflare WAF (Web Application Firewall).

How to look for bot traffic in a log

Bot traffic is usually easy to spot in your access logs. In the excerpt below, you can see the bot making requests about one second apart, something a human visitor would never do. You will also notice that the files being requested are old backup and configuration files that the sysadmin may have left on the server.

Access log view
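
If you would rather not eyeball the log, the two signals described above (requests arriving about a second apart, and requests for leftover backup or configuration files) can be checked with a short script. This is only a rough sketch: it assumes the common/combined access log format, and the suffix list and thresholds are made-up starting points, not values from this post.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the start of a common/combined log format line:
# IP ident user [timestamp] "METHOD path PROTOCOL" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)

# Hypothetical list of suffixes that suggest probing for leftover files.
SUSPICIOUS_SUFFIXES = (".bak", ".old", ".sql", ".zip", ".tar.gz", ".env")


def suspicious_ips(lines, min_requests=5, max_avg_gap=2.0):
    """Return {ip: reason} for clients that look automated."""
    times = defaultdict(list)
    probes = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in an unexpected format
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        times[m["ip"]].append(ts)
        path = m["path"].split("?", 1)[0].lower()
        if path.endswith(SUSPICIOUS_SUFFIXES):
            probes[m["ip"]] += 1

    flagged = {}
    for ip, stamps in times.items():
        if probes[ip]:
            flagged[ip] = f"{probes[ip]} requests for backup/config files"
            continue
        if len(stamps) >= min_requests:
            stamps.sort()
            span = (stamps[-1] - stamps[0]).total_seconds()
            avg_gap = span / (len(stamps) - 1)
            if avg_gap <= max_avg_gap:
                flagged[ip] = f"{len(stamps)} requests, {avg_gap:.1f}s apart on average"
    return flagged
```

You could feed it the lines of your access log, for example `suspicious_ips(open("access.log"))`, where the file name is just a placeholder. Treat the result as a list of candidates to look at, not as proof that an address is a bot.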

How to find the networks the bots are hosted on

Copy the source IP address, go to Cloudflare Radar, and paste the address into the search field. The result shows the ASN (autonomous system number) of the network the address belongs to, along with some information about that network.

Cloudflare Radar
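
Looking up addresses one at a time gets tedious when many IPs show up in the log. The sketch below shows the idea of grouping source IPs by network. It assumes you already have a table that maps IP prefixes to AS numbers from some source; the table here is purely hypothetical and uses reserved documentation ranges and documentation AS numbers, not real networks.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-ASN table. These are documentation address ranges
# and documentation AS numbers; replace them with real data.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def asn_for(ip):
    """Return the ASN of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [
        net for net in PREFIX_TO_ASN
        if addr.version == net.version and addr in net
    ]
    if not matches:
        return None
    return PREFIX_TO_ASN[max(matches, key=lambda net: net.prefixlen)]


def requests_per_asn(ips):
    """Count requests per ASN for an iterable of source IP strings."""
    return Counter(asn_for(ip) for ip in ips)
```

Networks that account for a large share of suspicious requests are the ones worth checking in Radar before you add them to a rule.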

How to create a firewall rule that blocks or challenges the bots

Next, go to WAF and create a new rule. It can be named anything you like. Choose “AS Num” in Field, “equals” in Operator, and enter the AS number in the Value field (digits only, without the “AS” prefix).

Notice that I also include a condition that exempts bots known to Cloudflare from the rule, so that search engine crawlers, for example, are not blocked. Whether you allow these is entirely up to you. Be aware that this list also includes AI web scrapers.

Cloudflare WAF rules
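
The same rule can also be typed into the rule's expression editor instead of being clicked together. The helper below is a small sketch that builds such an expression for one or more AS numbers. The field names `ip.src.asnum` and `cf.client.bot` reflect my understanding of Cloudflare's Rules language and should be checked against the current documentation; the function itself is hypothetical and not part of any Cloudflare tool.

```python
def asn_rule_expression(asns, skip_known_bots=True):
    """Build a rule expression that matches traffic from the given ASNs."""
    numbers = set()
    for asn in asns:
        text = str(asn).strip().upper()
        if text.startswith("AS"):
            text = text[2:]  # accept "AS14618" as well as 14618
        numbers.add(int(text))
    if not numbers:
        raise ValueError("at least one ASN is required")

    ordered = sorted(numbers)
    if len(ordered) == 1:
        expr = f"ip.src.asnum eq {ordered[0]}"
    else:
        # Assumed syntax: a set of values is written space-separated in braces.
        expr = "ip.src.asnum in {" + " ".join(map(str, ordered)) + "}"

    if skip_known_bots:
        # Assumed field: true for bots that Cloudflare recognizes.
        expr += " and not cf.client.bot"
    return f"({expr})"


print(asn_rule_expression(["AS14618", "AS16509"]))
```

Passing `skip_known_bots=False` drops the exemption for known bots if you would rather challenge them too.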

For the action to take, I prefer “Managed Challenge”, but you can choose “Block”, “JS Challenge”, or “Interactive Challenge” as well. You can read about the different challenge types in the Cloudflare Docs.

I challenge rather than block as a precaution, for the very rare case that a human is visiting from that ASN. These networks are usually used for hosting servers rather than for providing internet connections to people, but I prefer to be more democratic and give humans a chance to visit.

Some common networks that often host bad bots

The networks listed below are the ones I have personally seen traffic from in my logs, but there are surely many more — these are just some of the worst offenders.

  • AS14618 AMAZON-AES: currently 94% bot traffic
  • AS136907 HWCLOUDS-AS-AP: currently 73% bot traffic
  • AS16509 AMAZON-02: currently 86% bot traffic
  • AS210743 BABBAR-AS: currently 100% bot traffic

For the full list I currently challenge, take a look at this post on ASNs.

Why not just challenge all traffic besides good bots?

You can easily set up Cloudflare to challenge all incoming traffic, and some sites choose to do that. The upside is that it will stop 99% of all bots. The downside is that human visitors have to wait for the Cloudflare check to finish, and in some cases have to click a checkbox. This worsens the user experience.

Another option is to segment your site and challenge traffic to sensitive areas only (such as logins or admin areas). This increases security in those areas while not slowing down page loads for normal users.
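
As a sketch of what such a segmented rule could look like, the helper below builds an expression that matches only requests whose path starts with one of a few prefixes. The paths `/login` and `/admin` are placeholders for your own sensitive areas, and the `starts_with` function and `http.request.uri.path` field are assumptions based on my reading of Cloudflare's Rules language, so verify them before relying on this.

```python
def sensitive_paths_expression(prefixes):
    """Build an expression matching paths that start with any given prefix."""
    if not prefixes:
        raise ValueError("at least one path prefix is required")
    clauses = []
    for prefix in prefixes:
        if '"' in prefix or "\\" in prefix:
            # Keep the sketch simple: no escaping, so reject these characters.
            raise ValueError(f"unsupported character in prefix: {prefix!r}")
        clauses.append(f'starts_with(http.request.uri.path, "{prefix}")')
    return "(" + " or ".join(clauses) + ")"


print(sensitive_paths_expression(["/login", "/admin"]))
```

A rule with this expression and a challenge action would leave the rest of the site unchallenged.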

Do you need help blocking scrapers and hacking bots from your websites?

If you have no time or interest in doing this work by yourself, I completely understand. If you choose to get my website protection service, all of this and much more is already included.

Do you know of any other networks with lots of bad bot traffic? Let me know!
