Block Networks That Host Bad Bots With Cloudflare Firewall
You might have seen my post on how to block bad bots that steal your content and bandwidth. However, sometimes blocking bots by looking at the user agent might not be enough. This is because the user agent can be easily spoofed, and many bots do not honestly declare themselves.
A more effective way of stopping bad bots is by blocking or challenging the networks where the bots are hosted. This will target all of them, regardless of user agent. Setting up a block or challenge is easily done using the Cloudflare WAF (Web Application Firewall).
How to look for bot traffic in a log
Bot traffic is usually easy to spot in your access logs. In the excerpt below, you can see the bot making requests about a second apart, something a human visitor would never do. You will also notice that the files being requested are old backup and configuration files that a sysadmin may have left on the server.
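If you would rather not scan the log by eye, the two signals described above (rapid requests and probes for leftover files) can be checked with a short script. This is only a minimal sketch: it assumes your server writes Common or Combined Log Format, and the suffix list, the thresholds, and the function name `find_suspect_ips` are hypothetical choices you should adapt to your own site.

```python
import re
from collections import defaultdict
from datetime import datetime

# Start of a Common/Combined Log Format line: IP, timestamp, method, path.
# The log format is an assumption; adjust the pattern to your server.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)

# Hypothetical list of suffixes that probes for leftover files tend to request.
SUSPICIOUS_SUFFIXES = (".bak", ".old", ".sql", ".zip", ".tar.gz", ".env")


def find_suspect_ips(lines, min_requests=3, max_avg_gap_seconds=2.0):
    """Return IPs that request quickly AND ask for backup/config-style files."""
    min_requests = max(min_requests, 2)  # a gap needs at least two requests
    times = defaultdict(list)
    probes = defaultdict(int)

    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip = match["ip"]
        times[ip].append(datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z"))
        path = match["path"].split("?", 1)[0].lower()  # ignore the query string
        if path.endswith(SUSPICIOUS_SUFFIXES):
            probes[ip] += 1

    suspects = {}
    for ip, stamps in times.items():
        if len(stamps) < min_requests:
            continue
        stamps.sort()
        span = (stamps[-1] - stamps[0]).total_seconds()
        avg_gap = span / (len(stamps) - 1)  # average seconds between requests
        if avg_gap <= max_avg_gap_seconds and probes[ip] > 0:
            suspects[ip] = {
                "requests": len(stamps),
                "avg_gap": avg_gap,
                "probes": probes[ip],
            }
    return suspects
```

Call it with the lines of your log, for example `find_suspect_ips(open("access.log"))`, where the file name is a placeholder. Treat the result as a shortlist to review by hand, not as a verdict.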
How to find the networks the bots are hosted on
Copy the source IP address, go to Cloudflare Radar, and paste the address into the search field. After you search for the IP, Cloudflare will show you some information about the ASN, the autonomous system number of the network that announces that address.
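If you have many addresses to check, looking them up one by one gets tedious. The sketch below does the same IP-to-ASN lookup from a script. Note that it does not use Cloudflare Radar: it swaps in Team Cymru's public whois service (`whois.cymru.com`, port 43) as an alternative data source, and it assumes that service's pipe-separated verbose output. The function names are hypothetical.

```python
import socket


def parse_cymru_response(text):
    """Parse pipe-separated whois output into (asn, as_name) tuples."""
    results = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header row and any non-data lines
        results.append((int(fields[0]), fields[-1]))
    return results


def lookup_asn(ip, host="whois.cymru.com", port=43, timeout=10):
    """Query the whois service for one IP (requires network access)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # " -v" asks for the verbose format that includes the AS name.
        sock.sendall(f" -v {ip}\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # the server closes the connection when it is done
            chunks.append(data)
    return parse_cymru_response(b"".join(chunks).decode("utf-8", "replace"))
```

`lookup_asn("203.0.113.7")` would return a list of `(asn, name)` pairs, or an empty list when the address is not announced by any network. Be gentle with the service and avoid querying it in a tight loop.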
How to create a firewall rule that blocks or challenges the bots
Next, go to WAF and create a new rule. It can be named anything you like. Choose “AS Num” in Field, “equals” in Operator, and enter the AS number in the Value field.
Notice I’m including a condition that skips the rule for bots known by Cloudflare, in order to avoid blocking search engine bots, for example. Whether you want to allow these is entirely up to you. Be aware that this list also includes AI web scrapers.
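The rule builder produces a text expression that you can also paste into the expression editor directly. Here is a minimal sketch that generates such an expression for one or more ASNs. It assumes the Cloudflare Rules language field names `ip.src.asnum` (the "AS Num" field) and `cf.client.bot` (known bots); the function names are hypothetical, and you should compare the output with the expression preview in your own dashboard before relying on it.

```python
def _asn_to_int(value):
    """Accept 14618, "14618" or "AS14618" and return the number."""
    text = str(value).strip().upper()
    if text.startswith("AS"):
        text = text[2:]
    return int(text)


def build_asn_expression(asns, skip_known_bots=True):
    """Build a rule expression that matches traffic from the given ASNs."""
    numbers = sorted({_asn_to_int(asn) for asn in asns})
    if not numbers:
        raise ValueError("at least one ASN is required")

    if len(numbers) == 1:
        clause = f"ip.src.asnum eq {numbers[0]}"
    else:
        # Sets in the rules language are space-separated values in braces.
        clause = "ip.src.asnum in {" + " ".join(map(str, numbers)) + "}"

    if skip_known_bots:
        # Leave bots that Cloudflare recognizes out of the rule.
        clause += " and not cf.client.bot"
    return f"({clause})"
```

For example, `build_asn_expression(["AS14618"])` gives `(ip.src.asnum eq 14618 and not cf.client.bot)`. Passing several ASNs produces a single rule with an `in {...}` set, which is easier to maintain than one rule per network.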
For the action to take, I prefer “Managed Challenge”, but you can choose “Block”, “JS Challenge”, or “Interactive Challenge” as well. You can read about the different challenge types in the Cloudflare Docs.
I challenge rather than block as a precaution, for the very rare case in which a human is visiting from that ASN. Usually, these networks are used for hosting servers and not for providing internet connections to people, but I prefer to be more democratic and allow humans a chance to visit.
Some common networks that often host bad bots
The networks listed below are the ones I have personally seen traffic from in my logs, but there are surely many more — these are just some of the worst offenders.
- AS14618 AMAZON-AES: currently 94% bot traffic
- AS136907 HWCLOUDS-AS-AP: currently 73% bot traffic
- AS16509 AMAZON-02: currently 86% bot traffic
- AS210743 BABBAR-AS: currently 100% bot traffic
For the full list I currently challenge, take a look at this post on ASNs.
Why not just challenge all traffic besides good bots?
You can easily set up Cloudflare to challenge all incoming traffic, and some sites choose to do that. The upside of this is that it will stop 99% of all bots. The downside is that human visitors will have to wait for the Cloudflare check to finish, and in some cases have to click a checkbox. This worsens the user experience.
Another option is to segment your site and challenge traffic to sensitive areas only (such as logins or admin areas). This increases security in those areas without slowing down page loads for normal users.
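Segmenting by path can be expressed as a rule as well. The sketch below builds an expression that matches only requests whose path begins with one of the prefixes you pass in. It assumes the Cloudflare Rules language field `http.request.uri.path` and the `starts_with()` function; the function name and the example paths `/wp-login.php` and `/admin` are hypothetical placeholders, not paths taken from any particular site.

```python
def build_path_expression(prefixes):
    """Build a rule expression matching requests under the given path prefixes."""
    clauses = []
    for prefix in prefixes:
        if not prefix.startswith("/"):
            raise ValueError(f"path prefix must start with '/': {prefix!r}")
        if '"' in prefix or "\\" in prefix:
            # Keep the sketch simple: refuse values that would need escaping.
            raise ValueError(f"unsupported character in prefix: {prefix!r}")
        clauses.append(f'starts_with(http.request.uri.path, "{prefix}")')
    if not clauses:
        raise ValueError("at least one path prefix is required")
    return "(" + " or ".join(clauses) + ")"
```

For example, `build_path_expression(["/wp-login.php", "/admin"])` produces one expression covering both areas. Pair it with the “Managed Challenge” action so that only visitors to those paths see the check.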
Do you need help blocking scrapers and hacking bots from your websites?
If you have no time or interest in doing this work by yourself, I completely understand. If you choose to get my website protection service, all of this and much more is already included.
Do you know of any other networks with lots of bad bot traffic? Let me know!