During a hangout, an SEO professional asked John Mueller about a crawling and indexing issue they were experiencing.
Their main question: why might certain pages not get indexed, even though they have been crawled more than once?
John explained that it can happen, though he would assume it doesn’t happen that frequently because, generally, if Google can crawl a page, they could potentially index it. But it can happen that Google crawls a page and, in the end, decides: “Oh, actually we don’t need to index it.”
One common situation where this can happen is when there’s an error code on the page: Google has to crawl the page first, and only then can they see the error code.
If there’s a noindex tag on the page, Google also has to crawl it first. And then they see the noindex. If the page is a complete duplicate of something else Google has already seen, then they crawl it, they see it’s a duplicate, and then they focus on the primary page again.
So these are the situations where Google would crawl something but perhaps not index it. By the time they get to indexing, Google may decide they would rather get something else from the website instead.
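Two of the situations John lists (an error status code and a noindex directive) can only be detected after a crawl, which is exactly why a page can be crawled yet never indexed. A minimal sketch of those checks, using a hypothetical `is_indexable` helper (not anything Google publishes), might look like this:

```python
# Minimal sketch (hypothetical helper names): reproduce the two checks John
# describes that can only be seen after a crawl -- an error status code and
# a noindex directive (robots meta tag or X-Robots-Tag header).
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in a.get("content", "").split(","))


def is_indexable(status_code, html, x_robots_header=""):
    """Return False if the page would be crawled but left unindexed."""
    if status_code >= 400:  # error code: visible only after the crawl
        return False
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives + [
        d.strip().lower() for d in x_robots_header.split(",") if d]
    return "noindex" not in directives


print(is_indexable(200, '<meta name="robots" content="noindex,follow">'))
# → False: crawlable, but the noindex keeps it out of the index
```

The duplicate-content case John mentions is harder to sketch, since it depends on comparing the page against everything else Google has already seen.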
The SEO pro also asked John about site quality and the role this would play in that decision.
John said that usually, if Google is not convinced about the site’s quality, then they would likely also not crawl the page in the first place.
John continued, pointing out that in Search Console, pretty much every site will show pages in the “Discovered – currently not indexed” grouping as well as “Crawled – currently not indexed.” He believes this is pretty common across websites in general.
This happens at approximately the 14:10 mark in the video.
John Mueller Hangout Transcript
Okay, okay. I see. So, one more question is about crawling. What’s a possible reason that certain pages didn’t get indexed, even though they were, like, crawled multiple times?
It can happen. I would assume it’s not that frequent, because usually when we decide to crawl something, we’re also pretty happy to go off and index it. But it can happen that we crawl a page and then in the end, decide, oh, actually, we don’t need to index it.
Some common situations where that can happen, which perhaps don’t apply in your case: if there’s an error code on the page, we have to crawl it first. And then we see the error code.
If there’s a noindex on the page, we also have to crawl it first. And then we see the noindex. If the page is a complete duplicate of something else that we’ve already seen, then we crawl it, we see it’s a duplicate, and we focus on the primary page again.
So those are kind of the normal situations where we would crawl something and not index it. But it can also happen that we crawl something, and then by the time we get to indexing, we decide, oh, well, actually, we want to get something else from the website instead.
SEO Professional 3 15:40
So for this “get something else” case, are there some example factors, like, what other factors may cause Googlebot to decide, oh, we don’t want to index it?
I don’t know offhand. I think the overall website quality definitely plays a role there. But usually, if we’re not convinced about the website quality, then we would probably also not crawl the page in the first place. So that’s, I think, kind of a tricky situation.
And if you look in Search Console, I think pretty much for every site, you will have the grouping of discovered but not indexed and also crawled but not indexed. That’s, I think, just pretty common across sites.
SEO Professional 3 16:41
So the reason I ask this is, we want to know: aside from page quality and the technical SEO factors you just mentioned, like the noindex meta tag, are there any other factors the algorithm may use to determine whether these pages are worth indexing or not?
I don’t think we have anything specific otherwise documented. And I also think it’s important to not over-focus on that specific page. So if you’re sure that, from a technical point of view, everything was okay, I wouldn’t assume that the quality of that specific page is a problem, but rather kind of the perceived quality of that part of the website, or the whole website itself.
That’s kind of the place where I would try to see what you can do to improve: not just that individual page that didn’t get indexed, but kind of, what is the bigger picture around that page?
SEO Professional 3 17:57
Okay, so like the overall quality, overall site quality?