One SEO professional was concerned about their crawl rate and asked John Mueller about it during a hangout.
Over the last few weeks, they had noticed a large drop in their crawl stats, from around 700 crawl requests per day to approximately 50.
Is there a way to understand from the Search Console report what could be the cause of this drop?
Could it be source page load? How can they correctly read the crawl request breakdown?
John explained that, on Google's side, a few things go into how much crawling Google does.
On one hand, Google tries to figure out how much it needs to crawl from a website to keep things fresh and useful in its search results. That relies on understanding the quality of the site and how things change on it, which Google calls crawl demand.
On the other hand, there are limitations Google sees from the server, the site, and the network infrastructure, which cap how much it can crawl on a site.
Google tries to strike a balance between those two.
The restrictions tend to be tied to two main things: the overall response time to requests to the website, and the number of errors, specifically server errors, that Google sees during crawling.
So if Google sees a lot of server errors, it will slow down the crawl, because it doesn't want to cause any more problems.
Likewise, if Google sees that the server is getting slower, it will slow down crawling for the same reason.
So these are the two main things that come into play.
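To make those two signals concrete, here is a minimal sketch (not anything Google provides) that tallies Googlebot requests, 5xx errors, and average response time from a web server access log. It assumes an nginx-style combined log with the request time in seconds appended as the final field; the log format, file name, and regex are assumptions you would need to adapt to your own setup.

```python
#!/usr/bin/env python3
"""Rough check of the two signals John mentions: server errors and response
time, as seen by Googlebot in a web server access log.

Assumption: an nginx-style combined log where the request time in seconds has
been appended as the final field, e.g. via a custom log_format ending in
'"$http_user_agent" $request_time'. Adjust the regex for your own format.
"""
import re
import sys

LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)" (?P<rtime>[\d.]+)$'
)

def summarize(path: str) -> None:
    total = errors = 0
    times = []
    with open(path, encoding="utf-8", errors="replace") as log:
        for raw in log:
            match = LINE.match(raw.strip())
            if not match or "Googlebot" not in match.group("agent"):
                continue  # skip lines that are not Googlebot requests
            total += 1
            if match.group("status").startswith("5"):
                errors += 1  # server errors are what slow crawling down
            times.append(float(match.group("rtime")))
    if total == 0:
        print("No Googlebot requests matched - check the log format assumption.")
        return
    print(f"Googlebot requests: {total}")
    print(f"5xx responses:      {errors} ({100 * errors / total:.1f}%)")
    print(f"Avg response time:  {sum(times) / len(times):.3f}s")

if __name__ == "__main__":
    # Placeholder default path - pass your own access log as the first argument.
    summarize(sys.argv[1] if len(sys.argv) > 1 else "access.log")
```

If the error rate or average response time has climbed around the same period as the crawl drop, that is a reasonable first place to look.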
The difficulty with the speed aspect, John believes, is that Google has two different ways of looking at speed, which can be confusing when you look at crawl rate.
For the crawl rate, Google simply looks at how quickly it can request a URL from the server.
The other kind of speed is everything around Core Web Vitals and how quickly the page loads in the browser.
But the speed at which a browser loads a URL is not directly related to how long it takes Google to fetch an individual URL from the site.
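So when diagnosing a crawl rate change, the useful measurement is the raw fetch time of a URL, not how long a browser takes to render it. Below is a minimal sketch using Python's third-party requests library; the URL, user agent string, and sample count are placeholders, and this is only a rough approximation of the server response time Googlebot sees.

```python
"""Minimal sketch: time raw URL fetches, the kind of speed that matters for
crawl rate, as opposed to how long a browser needs to render the page.
Requires the third-party 'requests' library (pip install requests).
"""
import statistics
import time

import requests

URL = "https://www.example.com/"  # placeholder - use a URL on your own site
SAMPLES = 5

timings = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    response = requests.get(
        URL,
        headers={"User-Agent": "crawl-timing-check"},  # placeholder UA
        timeout=30,
    )
    timings.append(time.perf_counter() - start)
    print(f"{response.status_code} in {timings[-1]:.3f}s")

print(f"Median fetch time over {SAMPLES} requests: {statistics.median(timings):.3f}s")
```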
John also discussed several other interesting points, and explained further that any time you make a bigger change to your website's hosting, you should expect the crawl rate to drop and then recover over the following weeks.
This happens at approximately the 35:09 mark in the video.
John Mueller Hangout Transcript
John (Question) 35:09
During the last few weeks, I’ve noticed a huge drop in crawl stats from 700 to 50 per day. Is there a way to understand from the Search Console report what could be the cause of this drop? Could it be source page load? How can I correct–correctly read the crawl request breakdown?
John (Answer) 35:27
So on our side, there are a few things that go into the amount of crawling that we do. On the one hand, we try to figure out how much we need to crawl from a website to keep things fresh and useful in our search results. And that kind of relies on understanding the quality of your website, how things change on your website, we call that the crawl demand.
And on the other hand, there’s kind of the limitations that we see from your server, from your website, from your network infrastructure. With regards to how much we can crawl on a website. And we try to balance those two. And the restrictions tend to be tied to two main things. On the one hand, the overall response time to requests to the website. And on the other hand, the number of errors, specifically server errors, that we see during crawling.
So if we see a lot of server errors, we will slow down crawl, because we don’t want to cause more problems. If we see that your server is getting slower, then we will also slow down crawling, because again, we don’t want to cause any problems with the crawling. So those are kind of the two main things that come into play there. The difficulty, I think, with the speed aspect is that we have two essentially different ways of looking at speed.
And sometimes that gets confusing when you look at the crawl rate. So specifically for the crawl rate, we just look at how quickly can we request a URL from your server. And the other aspect, that of speed that you probably run into is everything around Core Web Vitals, and how quickly a page loads in a browser. And the speed that it takes in a browser tends not to be related directly to the speed that it takes for us to fetch an individual URL on a website.
Because in the browser, you have to process the JavaScript, pull in all of these external files, render the content, and recalculate the positions of all of the elements on the page. And that takes a different amount of time than just fetching that URL. So that’s kind of one thing to watch out for. If you’re trying to diagnose a change in crawl rate, then don’t look at how long it takes for a page to render. And instead look at just purely how long it takes to fetch that URL from the server.
The other thing that comes in here, as well, is that from time to time, we are well, depending on what you do, we try to understand where the website is actually hosted. In the sense that if we recognize that a website is changing hosting from one server to a different server–that could be to a different hosting provider, that could be moving to a CDN or changing CDNs or anything like that–then our systems will automatically go back to some safe rate where we know that we’re not going to cause any problems.
And then step by step increase again. So anytime you make a bigger change on your website’s hosting, then I would assume that the crawl rate will drop. And then over the next couple of weeks, it’ll go back up to whatever we think we can safely crawl on your website. And that might be something that you’re seeing here. The other thing is that from time to time our algorithms that determine how we kind of classify websites and servers, they can update as well.
So it can certainly happen that at some point, even if you don’t change anything with your hosting infrastructure, that our algorithms will try to figure out well, oh, actually, this website is hosted on this server. And this server is one that is frequently overloaded. So we should be more cautious with crawling this website, so that we don’t cause any problems. And that’s something that also settles down automatically over time, usually over a couple of weeks. And if that were the case, then probably things will settle down and kind of get back into a reasonable state again.
The other thing that you can do in Search Console, you can specify a crawl rate. I believe it’s in the setting per site. And that helps us to understand that you have specific settings, specifically for your website, and we’ll try to take that into account. The difficulty with the crawl rate setting is that it’s a maximum setting. It’s not a sign that we should crawl as much as that, but rather that we should crawl at most what you specify there.
So usually, that setting is more useful for times when you need to reduce the amount of crawling, not when you want to increase the amount of crawling. And finally, it’s like so many things that come into this. Finally, one thing that you can also do is, in the Help Center for Search Console, we have a link to reporting problems with Googlebot.
And if you notice that the crawling of your website is way out of range for what you would expect it to be, then you can report problems with Googlebot through that link. What you need to do there is specify some of the IP addresses of Googlebot when it tries to crawl your page and give some information on the type of issue that you’re seeing.
And that could be that Googlebot is crawling a lot less than it could. Or it could be that Googlebot is crawling way too much, or Googlebot is crawling all of these totally irrelevant URLs and they make totally no sense. And all of these reports go to the engineering team that works on Googlebot. And they tend to go through those and try to figure out what they need to tweak on their side to improve the crawl.
For the most part, you won’t get a reply to the request that you send there. But they all get read by the Googlebot team. And they do try to either figure out, do we need to do something specific for this one site? Or is this perhaps something that we can improve for these systems overall?
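As a practical note on the reporting step John mentions, it can help to confirm that the IP addresses you pull from your logs really belong to Googlebot before including them in a report. Google has documented a reverse DNS lookup followed by a forward confirmation for this; the sketch below illustrates that check, with placeholder IPs to replace with addresses from your own access logs.

```python
"""Verify that suspected Googlebot IPs resolve to Google hostnames via a
reverse DNS lookup, then forward-confirm the hostname back to the same IP.
The sample IPs below are placeholders - use addresses from your own logs.
"""
import socket

def is_googlebot(ip: str) -> bool:
    try:
        host = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward confirmation
    except OSError:
        return False
    return ip in forward_ips

if __name__ == "__main__":
    for candidate in ["66.249.66.1", "203.0.113.5"]:  # placeholder IPs
        verdict = "verifies as Googlebot" if is_googlebot(candidate) else "does not verify"
        print(f"{candidate}: {verdict}")
```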