In the John Mueller hangout on 08/13/2021, a webmaster was concerned about a number of 500 errors showing up in Google Search Console.
John explained that Google doesn’t have any hard thresholds here. Googlebot will retry URLs that return 5xx errors, and occasional errors on individual pages are no big deal; those pages remain indexed.
If a large part of the site keeps returning 5xx errors, however, Google will slow down crawling, and after enough failed retries it will decide that those pages really are gone and drop them from the index.
Those are the effects you will see from 500 errors. As a rough rule of thumb, if more than about 1 percent of the pages on your site are returning these errors, you will want to examine your site for major issues in case something is broken.
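If you want to sanity-check that figure against your own traffic, a minimal sketch like the one below tallies Googlebot’s 5xx rate from a standard access log. The log path, the combined log format, and the 1 percent cutoff are all assumptions to adjust for your own server; matching on the User-Agent string alone is a crude check, since anyone can claim to be Googlebot.

```python
import re

# Hypothetical path; point this at your own server's access log.
LOG_PATH = "/var/log/nginx/access.log"

# Matches the status code field in a common/combined log format line, e.g.:
# 66.249.66.1 - - [13/Aug/2021:10:00:00 +0000] "GET /page HTTP/1.1" 500 1234 "-" "Googlebot/2.1"
LINE_RE = re.compile(r'" (\d{3}) ')

googlebot_total = 0
googlebot_5xx = 0

with open(LOG_PATH) as log:
    for line in log:
        # Crude UA check; verifying real Googlebot requires a reverse DNS lookup.
        if "Googlebot" not in line:
            continue
        match = LINE_RE.search(line)
        if not match:
            continue
        googlebot_total += 1
        if match.group(1).startswith("5"):
            googlebot_5xx += 1

if googlebot_total:
    rate = 100 * googlebot_5xx / googlebot_total
    print(f"Googlebot requests: {googlebot_total}, 5xx: {googlebot_5xx} ({rate:.2f}%)")
    # John's rough rule of thumb: anything over ~1% suggests something is broken.
    if rate > 1.0:
        print("5xx rate is above 1% -- worth investigating.")
```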
The webmaster was also concerned about 504 errors, and whether it’s a problem that their server logs showed a 200 response at exactly the same timestamp as the 504 reported for that same URL in Search Console.
The webmaster asks this question at approximately the 13:46 mark in the hangout.
John Mueller Hangout 08/13/2021 Transcript
Webmaster 4 13:46

Cool. Thanks, John. So some background: our systems and content delivery network are designed to let all real users view our content and filter out some bots while letting others, like Googlebot, through. And just for some additional context, earlier this year we changed our server monitoring suite, and we thought we’d carried over all of the reporting we needed to ensure that, you know, Googlebot could access our content.
Unfortunately, it seems we’d missed some, and noticed in Search Console some 500-series errors cropping up starting late last month. The question came up from our technology team whether this represented real user impact, and why we would look specifically at Googlebot rather than real user metrics to prove that there’s an issue here. So given that context, I have a few questions. The first is just to get the technology concern out of the way. From your perspective, how does Googlebot view 500-series errors? And could you give any clarity on, you know, established thresholds at which point Googlebot will crawl our content less based on those errors?
John 14:58
We don’t have any strong thresholds on that. But essentially, what happens with 500 errors is we’ll try to retry them. And if we continue to see that they’re 500 errors, then we will kind of slow down crawling. And if we continue to see that they’re 500 errors, then we will drop those URLs from the index. So that’s something where if every now and then individual pages have a 500 error, it’s like, no big deal; we will retry them, they’ll remain indexed, and we’ll check again the next time we retry them.
But if a large part of a site consistently has 500 errors, then we might assume that maybe we’re causing the problem, and we’ll slow down crawling of the whole site, and at some point we’ll say, “Well, it looks like these pages are really gone, we’re going to drop them.” So that’s essentially the effects that you would see there.
And if you’re talking about a large site and wondering, like, what percentage of 500 errors is okay? I don’t know, my feeling is, if you’re seeing something more than 1 percent, then that sounds like something is kind of broken, and probably would be something where we would start to slow down. But I don’t think we have any hard thresholds where we’d say, like, this many requests and this many errors means this much slowing down.
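John’s answer describes a simple escalation policy: retry, then slow down, then drop. The sketch below is only an illustration of that described behavior, not Google’s actual implementation; the retry counts and the backoff multiplier are invented for the example, since John stresses there are no hard thresholds.

```python
from dataclasses import dataclass

@dataclass
class CrawlState:
    crawl_delay: float = 1.0   # seconds between fetches (illustrative)
    consecutive_5xx: int = 0
    indexed: bool = True

# Invented thresholds -- John stressed Google has no hard numbers here.
SLOW_DOWN_AFTER = 3   # consecutive 5xx before backing off
DROP_AFTER = 10       # consecutive 5xx before dropping the URL

def record_fetch(state: CrawlState, status: int) -> CrawlState:
    """Update crawl state after a fetch, mimicking the
    retry -> slow down -> drop escalation John describes."""
    if 500 <= status < 600:
        state.consecutive_5xx += 1
        if state.consecutive_5xx >= DROP_AFTER:
            state.indexed = False      # "these pages are really gone"
        elif state.consecutive_5xx >= SLOW_DOWN_AFTER:
            state.crawl_delay *= 2     # back off; maybe we're causing the problem
    else:
        # A successful fetch resets the escalation; the page stays indexed.
        state.consecutive_5xx = 0
        state.crawl_delay = 1.0
    return state
```

Feeding a long run of 500s into record_fetch walks through the same stages John outlines: early errors change nothing, sustained errors double the crawl delay, and a long enough streak flips the URL out of the index.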
Webmaster 4 16:28
Okay, cool. Thank you. Also, our server logs show a 200 given to Googlebot at the same timestamp as the 504 in Search Console, and our content delivery network is telling us that if Googlebot gets a 504 from a CDN, then they’ll automatically try to fetch from the origin. Could you confirm, deny or possibly confuse that for me?
John 16:54
I don’t think we do anything special with regards to 504, but I’d have to double-check. So in the search developer documentation, we just put up a page with all of the HTTP status codes and how we react to them. I don’t think we have anything where we would say we would defer to the origin rather than the CDN, because we don’t actually see that difference. Because from our point of view, we access the domain name, and if the domain name resolves to the CDN, then that’s what we get. We’re not going to say, “Well, we will take a different IP address and then try again.”
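John’s point is that Googlebot simply resolves your hostname and fetches from whatever IP address comes back, so if DNS points at your CDN, the CDN’s responses are all Googlebot ever sees. You can inspect that resolution yourself with a few lines of standard-library Python (the hostname is a placeholder):

```python
import socket

# Placeholder hostname; substitute your own domain.
HOSTNAME = "www.example.com"

# This is the same lookup any client, Googlebot included, starts from.
for info in socket.getaddrinfo(HOSTNAME, 443, proto=socket.IPPROTO_TCP):
    family, _, _, _, sockaddr = info
    print(family.name, sockaddr[0])
```

If the addresses printed belong to your CDN, then whatever the edge returns, including a 504, is what Googlebot records, regardless of what the origin served.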
Webmaster 4 17:39
Okay, cool. That actually will help further that discussion with our CDN, so thank you. Final question: if I manage a website for a tea shop, will setting all of my pages to a 418, you know, help me rank better? That’s probably a no, right?
John 17:56
I actually don’t know what we would do with that. We’d probably drop them like 400 errors. So that seems like a bad idea. Sorry, I just love the 418 status code. It’s just so fun.
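As for the earlier 200-versus-504 discrepancy, one rough check is to fetch the affected URL through the public hostname, i.e. through the CDN, and see what status code actually comes back at the edge. This sketch uses the requests library and a placeholder URL; spoofing Googlebot’s User-Agent only approximates what the real crawler receives, since some CDNs verify Googlebot by IP address.

```python
import requests

URL = "https://www.example.com/some-page"  # placeholder

# Some CDNs treat the Googlebot UA specially (and some verify it by IP,
# so this only approximates what real Googlebot receives).
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                  "+http://www.google.com/bot.html)"
}

resp = requests.get(URL, headers=headers, timeout=30)
print(resp.status_code)

# Cache/edge headers often reveal whether the CDN or the origin answered.
for header in ("Server", "Via", "X-Cache", "CF-Cache-Status", "Age"):
    if header in resp.headers:
        print(f"{header}: {resp.headers[header]}")
```

If the edge reports a 504 here while your origin logs a 200 for the same request, that matches the webmaster’s situation: the origin answered, but too late for the CDN, and the 504 is what Googlebot saw.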