Limiting Bot Index Speed

This article describes how to configure your robots.txt file so that bots cannot hammer your site mercilessly in an attempt to fully index it.

While it is certainly advantageous (assuming you want people to find your website) to have bots index your pages for SEO purposes, there is nothing official that says it must be done in under five seconds. In fact, if multiple crawlers index your site at once, they can cause serious resource problems for your server, depending on the content being indexed.

Let’s begin…

First, we will need to create a file called “robots.txt” (without quotes) inside our document root. For example, the document root for the primary domain of your cPanel account will be /home/YourUsernameHere/public_html, so we need to create the file /home/YourUsernameHere/public_html/robots.txt. You can create this file through your cPanel File Manager, over SSH, or even via FTP. It is also worth noting that the file can be created locally (in a Notepad document, for example) and then uploaded to the document root (/home/YourUsernameHere/public_html) of your website.
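Once the file exists, you can confirm that the web server is actually serving it by requesting it at /robots.txt off your site root, since that is the single fixed location crawlers look for it. The short Python sketch below is one way to do that from any machine; example.com is just a placeholder for your own domain.

# Quick check that robots.txt is being served from the document root.
# "example.com" is a placeholder; substitute your own domain name.
from urllib.request import urlopen

with urlopen("https://example.com/robots.txt", timeout=10) as response:
    print(response.status)           # 200 means crawlers can find the file
    print(response.read().decode())  # shows whatever the file currently contains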

After the robots.txt file has been created in the document root of the site, we will need to add the following content to it:

#ROBOTS START
User-agent: *
Disallow:
Crawl-delay: 30
#ROBOTS END

The above entry tells all bots that honor the directives found inside robots.txt files to fetch no more than one page every 30 seconds, as opposed to crawling as quickly as they possibly can. The empty Disallow line means no content is blocked from the bots. If you want to block content from being indexed in addition to limiting crawl speed, simply add Disallow statements as needed. An example of this:

#ROBOTS START
User-agent: *
Disallow: /cgi-bin/
Disallow: /administration/
Crawl-delay: 30
#ROBOTS END
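If you want to double-check how a well-behaved crawler would interpret rules like the ones above, Python's built-in robots.txt parser makes a reasonable sanity check. This is only an illustration (the rules are pasted in as a string, and "SomeBot" stands in for any crawler name); real bots fetch the file from your site themselves.

from urllib.robotparser import RobotFileParser

# The same rules as the example above, supplied as a string for testing.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /administration/
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("SomeBot", "/index.html"))         # True  - still crawlable
print(parser.can_fetch("SomeBot", "/cgi-bin/script.pl"))  # False - blocked
print(parser.crawl_delay("SomeBot"))                       # 30    - seconds between fetches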

It does not matter which order you put the Crawl-delay or Disallow statements in. If you want to configure individual bots, here are the main ones from Google, Microsoft, and Yahoo!:

# GOOGLE BOTS
User-agent: Mediapartners-Google
Disallow:
Crawl-delay: 30

User-agent: Googlebot
Disallow:
Crawl-delay: 30

User-agent: AdsBot-Google
Disallow:
Crawl-delay: 30

User-agent: Googlebot-Image
Disallow:
Crawl-delay: 30

User-agent: Googlebot-Mobile
Disallow:
Crawl-delay: 30

#MICROSOFT BOTS

User-agent: msnbot
Disallow:
Crawl-delay: 30

User-agent: bingbot
Disallow:
Crawl-delay: 30

#YAHOO BOTS

User-agent: Slurp
Disallow:
Crawl-delay: 30

User-agent: Yahoo! Slurp
Disallow:
Crawl-delay: 30
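With per-bot sections like the ones above, it can be useful to confirm which rules a given crawler will actually match. The same built-in Python parser can be used for that; the snippet below reproduces only the Googlebot and bingbot sections for brevity.

from urllib.robotparser import RobotFileParser

# Two of the per-bot sections from above, supplied as a string for testing.
rules = """\
User-agent: Googlebot
Disallow:
Crawl-delay: 30

User-agent: bingbot
Disallow:
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for agent in ("Googlebot", "bingbot"):
    print(agent, parser.crawl_delay(agent))            # 30 for each named bot
print("UnknownBot", parser.crawl_delay("UnknownBot"))  # None - no matching section and no * fallback here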

In conclusion, it is a good idea to limit crawlers, especially as your site expands. If your site is built on a language such as PHP, this becomes particularly important, as each page fetch can spin up the PHP engine and consume server resources.

NOTE: It can take 48 hours or more for crawlers to begin abiding by the directives in your robots.txt file after it has been put in place.

 

Updated on April 28, 2021
