Robots.txt Crawl Delay – Why We Use Crawl Delay & Getting Started

A robots.txt file is one of the primary ways of telling search engines where on your website they can and cannot go. It is a plain text file, written in a strict syntax, containing instructions that search engine spiders, also called robots or bots, read and follow. These instructions are called directives, and together they spell out your website's rules of engagement for crawlers. Search engines check a website's robots.txt file regularly before crawling the site; if no robots.txt file is present, they will crawl the entire website. The file plays an important role in a site's SEO because it tells search engines how best to crawl the site: you can use it to keep duplicate content out of the index, block search engines from parts of your website, and guide them toward crawling the site more efficiently. In this post, we discuss the Crawl-delay directive in the robots.txt file and how to use it.
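
As an illustration, a minimal robots.txt file might look like the sketch below; the /admin/ path and the sitemap URL are placeholders rather than rules your site necessarily needs.

  # Rules for all crawlers
  User-agent: *
  # Placeholder: block a hypothetical admin area from being crawled
  Disallow: /admin/
  # Optional: point crawlers to the sitemap (placeholder URL)
  Sitemap: https://www.example.com/sitemap.xml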

Robots.txt Crawl Delay – What Is It?

Crawl-delay is an unofficial robots.txt directive that can be used to keep a server from being overloaded with too many requests. Search engines such as Bing, Yahoo and Yandex can be crawl-hungry at times, and because they respond to this directive, it can be used to slow them down. Although each search engine interprets the directive in its own way, the end result is broadly the same.

The crawl rate is the speed at which a bot requests pages from your website, determined by the time that passes between any two consecutive requests. A crawl-delay setting tells the bot to wait a specific number of seconds between requests, which makes it an effective way to stop bots from consuming excessive hosting resources. However, the directive should be used with care: a delay of 10 seconds limits a search engine to at most 8,640 pages per day (86,400 seconds in a day divided by 10). That may seem like a big number for a small site, but it is not much for a large one. If you get little or no traffic from those search engines, the setting is a good way to save bandwidth.
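
For example, the 10-second delay from the calculation above could be written as in the sketch below, assuming you want to throttle every crawler that honors the directive.

  # Ask compliant crawlers to wait 10 seconds between requests
  # 86,400 seconds in a day / 10 = at most 8,640 requests per day
  User-agent: *
  Crawl-delay: 10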

Google Crawl Delay – Getting Started

Google does not consider the crawl-delay setting, so using the directive has no effect on your standing with Google, and you can safely use it to rein in other, more aggressive bots. Although problems with Googlebot's crawling are unlikely, you can still lower the crawl rate for Google through Google Search Console. Here are the simple steps to set the crawl rate for Googlebot.

  1. Log in to Google Search Console.
  2. Select the website you want to set the crawl rate for.
  3. Click the gear icon in the top right corner and choose 'Site Settings'.
  4. Find the 'Crawl Rate' option, which provides a slider for setting your preferred crawl rate. By default, the rate is set to a recommended value.

Why Do We Use Crawl Delay?

If your website has a large number of pages and many of them are linked from the index, a bot that starts crawling the site can generate a large number of requests in a short period of time. Such a traffic spike can deplete hosting resources, which are often metered on an hourly basis. If your website faces this problem, one way to deal with it is to set a crawl delay of 1 to 2 seconds so that search bots crawl the site at a moderate rate and avoid causing traffic peaks. Search engines such as Yahoo, Yandex and Bing respond to the crawl-delay directive, so it can be used to keep them at bay. A crawl delay of 10 seconds, for example, means those search engines wait ten seconds before requesting the next page after crawling one. Each time a search bot crawls the site, it consumes bandwidth and other server resources, so sites with many pages and heavy content, such as e-commerce stores, can see crawlers drain resources quickly. The robots.txt file can also be used to keep bots away from images and scripts, preserving those resources for visitors.
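
A sketch of this approach might look like the following, with a moderate delay for crawl-hungry bots that honor the directive and placeholder /images/ and /scripts/ paths blocked to preserve resources; the user-agent tokens and paths are examples, not rules every site needs.

  # Moderate 2-second delay for bots that honor Crawl-delay
  User-agent: Bingbot
  Crawl-delay: 2

  User-agent: Yandex
  Crawl-delay: 2

  # Placeholder paths: keep all bots away from static assets
  User-agent: *
  Disallow: /images/
  Disallow: /scripts/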

Crawl-Delay Rule Ignored By Googlebot

The Crawl-delay directive was introduced by search engines such as Bing, Yahoo and Yandex, and they still respond to it. Its purpose was to let webmasters specify how many seconds a crawler should wait between individual requests in order to limit the load on a server. Although the idea is not a bad one, Google does not support the crawl-delay rule: its crawling is dynamic, so a fixed interval between requests makes little sense for it, and a per-request delay measured in seconds is less useful now that most servers can handle many requests per second. Instead of following a crawl-delay rule, Googlebot automatically adjusts its crawling based on how the server responds; if it sees server errors or slowdowns, it slows the crawling. Webmasters can still use the robots.txt file to specify the parts of their websites they don't want crawled.
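
Because Googlebot ignores Crawl-delay, the main robots.txt lever left for it is which paths it may crawl. A sketch with placeholder paths:

  # Googlebot ignores Crawl-delay but obeys Disallow rules
  User-agent: Googlebot
  # Placeholder paths for sections you don't want crawled
  Disallow: /internal-search/
  Disallow: /checkout/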

Conclusion

The robots.txt file provides an effective way to control how crawlers access your website. Creating it correctly can improve both the experience of your visitors and your site's SEO. When bots spend their time crawling the most relevant content, they can organize and display it the way you want it to appear in the SERPs. The crawl-delay directive is a useful tool for controlling aggressive search engine bots, and it can save server resources to the benefit of your site and its visitors.