What does Crawl Budget mean in SEO?

Search engines like Google and Bing send crawlers to read the contents of websites, but with billions of pages out there on the Internet they simply don't have the resources to crawl everything every day.

Instead, they estimate how busy each website is from its rankings and click-through rates in the search results, and then allocate a certain amount of crawling resources to each site. This is known as the website's Crawl Budget.

Crawl Budget is a general, conceptual term. There is no accepted unit of measurement, so I can't say "This website has a Crawl Budget of 67 and yours has a Crawl Budget of 22".

However, you can get an idea of the crawl budget Google gives your website from Google Search Console, which reports how many pages Google crawls and how many kilobytes of data it downloads every day.

On average Google crawls 100 pages of this website each day, but on any given day the number can vary dramatically. In the last 3 months, for this site, the pages crawled on any one day were as low as 14 and as high as 317.
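If you want a second opinion on those figures, one alternative to Search Console is to count Googlebot requests in your own server access log. The sketch below is only an illustration of that idea, not how Search Console calculates its numbers. It assumes the log uses the common "combined" format, the sample lines are invented, and it trusts the user-agent string, which can be faked.

```python
import re
from collections import Counter
from statistics import mean

# Matches the date part of a combined-log timestamp, e.g. [12/Mar/2024:06:25:24 +0000]
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")

# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [12/Mar/2024:06:25:24 +0000] "GET /blog/post-1 HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [12/Mar/2024:09:10:02 +0000] "GET /blog/post-2 HTTP/1.1" 200 4810 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [12/Mar/2024:09:11:45 +0000] "GET / HTTP/1.1" 200 7001 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [13/Mar/2024:02:44:18 +0000] "GET / HTTP/1.1" 200 7001 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]


def googlebot_hits_per_day(log_lines):
    """Count, per day, the log lines whose text mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue  # skip ordinary visitors and other bots
        match = DATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(counts):
    """Return the average, lowest and highest daily crawl counts."""
    values = list(counts.values())
    return {
        "days": len(values),
        "average": mean(values),
        "low": min(values),
        "high": max(values),
    }


if __name__ == "__main__":
    print(summarize(googlebot_hits_per_day(SAMPLE_LOG)))
```

To run it on a real site you would pass the lines of your own log file instead of SAMPLE_LOG, for example by opening the file and handing the file object to googlebot_hits_per_day.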

Obviously very busy sites such as CNN are going to have extremely high crawl budgets, and their content is likely to be crawled several times per day. Key content such as the home page may even be crawled several times an hour.

But if you are not CNN, you need to consider how to use your Crawl Budget to maximum effect, which is really another way of saying you need to focus Google's limited time on your core content.

A first step is to register a sitemap.xml file with Bing Webmaster Tools and Google Search Console. This is a list of the pages you want search engines to crawl. You can state in a sitemap.xml how important you consider each of these URLs, but I've seen no evidence that Google or Bing pays any attention to this.
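As a rough illustration of what such a file contains, here is a minimal sketch that builds one with Python's standard library. The example.com URLs and the priority values are made up, and build_sitemap is just a name chosen for this example. The priority element is the optional importance hint from the sitemaps.org format mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs.

    priority is a number from 0.0 to 1.0; it is only a hint to crawlers.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, for illustration only.
EXAMPLE_PAGES = [
    ("https://example.com/", 1.0),
    ("https://example.com/blog/latest-post", 0.8),
    ("https://example.com/contact", 0.3),
]

if __name__ == "__main__":
    print(build_sitemap(EXAMPLE_PAGES))
```

The output would be saved as sitemap.xml at the root of the site and then submitted in each search engine's webmaster tools.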

Second, you can signal to search engines what you feel is important via your internal linking structure. If you place a link to a certain URL fairly near the top of every page on your website, you're telling Google you see that content as core to your site. Such content is sometimes referred to as 'Cornerstone Content'.

These repeated links are also useful for making sure crawlers find new content before running out of crawl budget - which is why so many blogs have a 'recent posts' area.
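One way to check what your own linking structure is signalling is to count how many of your pages link to each internal URL. The following is a small sketch of that check using only the standard library. The page paths and HTML snippets are invented, and it only looks at links that start with "/", which is an assumption about how the site writes its internal links.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the internal link targets found in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: internal links are written as site-relative paths.
        if href and href.startswith("/"):
            self.hrefs.add(href)


def pages_linking_to(pages):
    """Map each internal URL to the number of pages that link to it.

    pages is a dict of page path -> HTML source.
    """
    counts = Counter()
    for html in pages.values():
        collector = LinkCollector()
        collector.feed(html)
        counts.update(collector.hrefs)  # each page counts once per target
    return counts


# Invented pages, for illustration only.
EXAMPLE_SITE = {
    "/": '<a href="/seo-guide">SEO guide</a> <a href="/blog/post-2">Recent post</a>',
    "/blog/post-1": '<a href="/seo-guide">SEO guide</a> <a href="/blog/post-2">Recent post</a>',
    "/blog/post-2": '<a href="/seo-guide">SEO guide</a> <a href="https://other.example/">Elsewhere</a>',
}

if __name__ == "__main__":
    for url, count in pages_linking_to(EXAMPLE_SITE).most_common():
        print(url, count)
```

URLs at the top of that list are the ones your site is presenting as core content, and a new post that is missing from the list is one crawlers may struggle to find.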

Third, and most effective, you can tell search engines what to ignore, although this can require some technical knowledge.

This page is fundamentally made up of three parts - a header (where my logo and menu are), a footer (that bit at the bottom that is the same on every page) and the main content (this bit you are reading right now).

This website is built with PHP, which allows me to create a page from three files. The header and footer are two files which the code calls on every page, but I don't want Google to crawl those files as they are only parts of a page and make no sense on their own.

If I had a low crawl budget, I could use my robots.txt file to block Google from looking at those files so that it would spend its time crawling what I wanted it to crawl.

You can use your robots.txt file in a similar way to block crawlers from pages that have no value to your objectives, such as your terms and conditions or privacy policy pages. It doesn't help you if Google spends too much of its limited time crawling and indexing these when you really want it to be aware of your latest content that could generate leads, customers, clients or whatever else your goals are.
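To make that concrete, here is a hedged sketch of what such robots.txt rules might look like, together with a quick check of them using Python's built-in robots.txt parser. The paths (/includes/, /terms-and-conditions, /privacy-policy) and the example.com domain are assumptions for illustration, not the real layout of this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the partial template files and the
# low-value legal pages, and leave everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
Disallow: /terms-and-conditions
Disallow: /privacy-policy
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser, path, user_agent="Googlebot"):
    """Return True if the rules let user_agent fetch the given path."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in (
        "/includes/header.php",
        "/includes/footer.php",
        "/privacy-policy",
        "/blog/latest-post",
    ):
        print(path, is_crawlable(parser, path))
```

Testing rules like this before publishing them is worthwhile, because a mistyped Disallow line can just as easily block the content you most want crawled.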