Back in March 2020 I wrote about MJ12bot and its propensity to steal your bandwidth by hammering your site with tens of thousands of unwanted HTTP requests. This month it's the turn of PetalBot, aka the Aspiegel bot, to do the hammering.
As with MJ12bot, PetalBot's website suggests benign intent. This month (Nov 2020) it has stolen about 12GB of my client's bandwidth. Others have traced the bot to Huawei's cloud, suggesting it belongs to Huawei's new search engine. Either way, if you aren't interested in marketing your site to a Chinese audience, it seems reasonable to take countermeasures.
I've recently started using Cloudflare and have been impressed by the range of security tools it provides. Previously, I would just put a rule in my .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
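# Both bots put their name mid-string, e.g. "Mozilla/5.0 (compatible; MJ12bot/...)",
# so match anywhere in the User-Agent ([NC] = case-insensitive) instead of anchoring with ^
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PetalBot [NC]
# Answer every matching request with 403 Forbidden
RewriteRule ^.* - [F,L]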
</IfModule>
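In mod_rewrite, a RewriteCond pattern is a regular expression that can match anywhere in the tested string unless it is anchored with ^ or $, and the [NC] flag makes the match case-insensitive. That matters here because both bots put their name after a "Mozilla/5.0 (compatible;" prefix. The Python sketch below approximates the condition test with re.search; the sample User-Agent strings are assumed examples in the format these crawlers announce themselves with, and condition_matches is a hypothetical helper, not part of Apache.

```python
import re

# Assumed example User-Agent strings in the format these crawlers send
# (exact version numbers vary over time).
SAMPLE_AGENTS = {
    "MJ12bot": "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
    "PetalBot": "Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)",
}


def condition_matches(pattern: str, user_agent: str, nocase: bool = False) -> bool:
    """Approximate a RewriteCond regex test: the pattern may match anywhere
    in the string unless anchored with ^ or $; nocase mimics the [NC] flag."""
    flags = re.IGNORECASE if nocase else 0
    return re.search(pattern, user_agent, flags) is not None


if __name__ == "__main__":
    for name, agent in SAMPLE_AGENTS.items():
        anchored = condition_matches("^" + name, agent)
        unanchored = condition_matches(name, agent, nocase=True)
        print(f"{name}: '^{name}' -> {anchored}, '{name}' [NC] -> {unanchored}")
```

Running it shows the anchored patterns failing on both sample strings, which is why an unanchored, case-insensitive condition is the safer choice for bot names.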
Nothing wrong with this, but for more granular control, check out Cloudflare's Firewall Rules and Tools. You can create a firewall rule that matches the User-Agent string against a value you specify, using operators such as 'contains' or a regular-expression match. I've set up a 'BadBots' rule that blocks any request whose User-Agent contains 'MJ12bot' or 'PetalBot'. So far, so good: my site traffic has dropped precipitously.
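To gauge how much traffic a 'contains' rule like BadBots would catch, you can run the same substring test over your server's access log. This Python sketch assumes the Apache/Nginx 'combined' log format and sums the logged response bytes per bot; BAD_BOT_TOKENS and bad_bot_bytes are hypothetical names, and the regex is a simplification that silently skips lines it can't parse.

```python
import re
import sys
from collections import Counter

# Substrings to look for, mirroring a "User-Agent contains ..." firewall rule
BAD_BOT_TOKENS = ("MJ12bot", "PetalBot")

# "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"'
)


def bad_bot_bytes(log_lines):
    """Sum response bytes per bad-bot token; unparseable lines are skipped."""
    totals = Counter()
    for line in log_lines:
        match = COMBINED_LOG_RE.match(line)
        if not match:
            continue
        size, user_agent = match.groups()
        for token in BAD_BOT_TOKENS:
            if token in user_agent:
                # "-" means no body was sent (e.g. a 304 response)
                totals[token] += 0 if size == "-" else int(size)
                break  # count each request against one token only
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python bad_bot_bytes.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for token, total in bad_bot_bytes(log).most_common():
            print(f"{token}: {total / 1e9:.2f} GB")
```

The byte column in the combined format excludes response headers, so treat the totals as a lower bound on the bandwidth each bot consumed.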
Another irritation is bots pretending to be Googlebot or Bingbot. Cloudflare's firewall has a tool called 'User Agent Blocking' that lets you challenge a bot, the assumption being that genuine bots will pass the challenge and malicious ones will fail. I'm still experimenting with this one, but Bingbot appears to be passing the challenge, so here's another option for filtering bots, especially if you're under attack.
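Cloudflare's challenge is one way to weed out impostors. A different technique, which Google and Microsoft both describe for their own crawlers, is a DNS round trip: reverse-resolve the requesting IP, check that the hostname sits under the search engine's domain, then forward-resolve that hostname and confirm it maps back to the same IP. The Python sketch below illustrates that check; the CRAWLER_DOMAINS table is an assumption to verify against each vendor's current guidance, and is_genuine_crawler is a hypothetical helper whose lookup functions can be swapped out for testing.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Hostname suffixes each search engine documents for its crawler
# (assumption: confirm against the vendor's current verification guide).
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_dns(ip: str) -> str:
    """PTR lookup: IP address -> hostname (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(host: str) -> Iterable[str]:
    """A/AAAA lookup: hostname -> every address it resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_genuine_crawler(
    ip: str,
    crawler: str,
    reverse_lookup: Callable[[str], str] = reverse_dns,
    forward_lookup: Callable[[str], Iterable[str]] = forward_dns,
) -> bool:
    """True if ip reverse-resolves into the crawler's domain and that
    hostname forward-resolves back to the same ip."""
    suffixes = CRAWLER_DOMAINS[crawler]  # KeyError for an unknown crawler name
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False  # PTR record points outside the expected domain
        # Forward-confirm: anyone can publish a PTR record claiming googlebot.com,
        # but only the domain owner controls what that hostname resolves to
        target = ipaddress.ip_address(ip)
        return any(ipaddress.ip_address(addr) == target for addr in forward_lookup(host))
    except (OSError, ValueError):
        # socket.herror and socket.gaierror subclass OSError;
        # ValueError covers a malformed address string
        return False
```

You would only call this for requests whose User-Agent claims to be Googlebot or Bingbot, and since each call makes network round trips, caching the result per IP is worth considering.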
All in all, Cloudflare feels to me like a nicer, more powerful toolkit than .htaccess rules, so I'm running with it for now.