The Majestic MJ12 bot claims to be a "UK based specialist search engine used by hundreds of thousands of businesses in 13 languages and over 60 countries to paint a map of the Internet independent of the consumer based search engines". Sounds reasonable. What I can't figure out is why it has hit my site 43,000 times and churned through more than 7GB of my bandwidth, from a variety of IP addresses, in one month.

This isn't its first offense either. The last time it happened I emailed the bot's operators and reported the excessive activity - you can see from the screenshot that it's an order of magnitude greater than its nearest rival, the Googlebot. I received a reply that advised me to add this to my robots.txt file:

User-agent: MJ12bot
Disallow: /

You're welcome to try this and it might work for you, but it didn't for me. I continued to get a daily bombardment from a variety of IPs, which ended in the usual whack-a-mole game of trying to block large CIDR ranges.

In the end, I resorted to a mod_rewrite rule in my .htaccess file:

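<IfModule mod_rewrite.c>
RewriteEngine On
# Match "MJ12bot" anywhere in the User-Agent; the string the bot sends starts with "Mozilla/5.0"
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
# Answer every matching request with 403 Forbidden
RewriteRule ^ - [F]
</IfModule>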
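
Whether a rule like this fires depends on where the pattern is allowed to match in the User-Agent header. Here is a minimal sketch of how to check that offline. It uses Python's re module as a stand-in for Apache's regex engine, which is an assumption, although patterns this simple should behave the same way in both. The helper name `matches` is made up for illustration, and the User-Agent string is the one from my access log.

```python
import re

# User-Agent string as it appears in the access log entry quoted in this post.
UA = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"


def matches(pattern: str, user_agent: str) -> bool:
    """Case-insensitive search, roughly what RewriteCond does with [NC]."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Pattern tied to the start of the header with "^".
anchored = matches(r"^MJ12", UA)
# Pattern allowed to match anywhere in the header.
unanchored = matches(r"MJ12bot", UA)

print(anchored, unanchored)  # prints: False True
```

Because the header starts with "Mozilla/5.0", a pattern anchored with ^ does not match this particular string, while an unanchored one does.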

So far this seems to be having the desired effect, but it does raise the question of whether every request that claims to be the MJ12bot actually is, or whether the name is being used as cover by malicious bots. Looking at the URLs being crawled, I suspect the latter.

An example of a typical request:

37.57.218.243 - - [31/Mar/2020:14:18:01 +0100] "GET /category/events/?mode=grid/page/4//page/2//page/3//page/5//page/3//page/3//page/4//page/6//page/2//page/2//page/5//page/4//page/3//page/3//page/2//page/6//page/3//page/3//page/3//page/4//page/6//page/3//page/3/ HTTP/1.1" 200 177038 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
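
If you want to put numbers on this for your own site, log lines in this format are easy to tally. The following is a rough sketch, assuming Apache's "combined" log format as in the line above. The function name `tally_bot_traffic` and the sample lines are made up for illustration: the first is a shortened version of my log entry and the second is an invented Googlebot request.

```python
import re
from collections import Counter

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def tally_bot_traffic(lines, marker="MJ12bot"):
    """Return (hits per IP, total bytes) for requests whose agent contains marker."""
    hits = Counter()
    total_bytes = 0
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Skip lines that do not parse or that belong to other clients.
        if m is None or marker.lower() not in m.group("agent").lower():
            continue
        hits[m.group("ip")] += 1
        size = m.group("bytes")
        total_bytes += 0 if size == "-" else int(size)
    return hits, total_bytes


# Illustrative sample data only (see the note above).
SAMPLE = [
    '37.57.218.243 - - [31/Mar/2020:14:18:01 +0100] '
    '"GET /category/events/?mode=grid/page/4/ HTTP/1.1" 200 177038 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"',
    '192.0.2.10 - - [31/Mar/2020:14:18:05 +0100] '
    '"GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits, total_bytes = tally_bot_traffic(SAMPLE)
print(hits.most_common(10), total_bytes)
```

To run it against a real log, pass an open file instead of SAMPLE, for example `tally_bot_traffic(open("access.log"))`, where the path is whatever your server uses. The regex is deliberately simple and will skip lines whose request or agent contains escaped quotes.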

You can also try to mitigate this by using the Wordfence plugin to rate-limit non-human visitors, but I didn't find this helped much.