One SEO professional asked John Mueller during a hangout about the crawling and indexing issue they were experiencing.
Their main question was: why are certain pages not indexed, even though they may be crawled more than once?
John explained that it can happen, though he would assume it doesn’t happen that frequently because, generally, if Google can crawl a page, it could potentially index it. Still, Google sometimes crawls a page and in the end decides, “Oh, actually we don’t need to index it.”
One common situation is when the page returns an error code: Google has to crawl the page first before it can see the error code.
If there’s a noindex tag on the page, Google also has to crawl it first before it can see the noindex. If the page is a complete duplicate of something Google has already seen, then Google crawls it, sees that it’s a duplicate, and focuses on the primary page instead.
So these are the situations where Google would crawl something but perhaps not index it. By the time it gets to indexing, Google may decide, “Oh well, actually we want to get something else from the website instead.”
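For readers who want to check their own pages against the three situations John lists, here is a minimal Python sketch. It is only an illustration and not how Google actually decides: the function name `crawl_outcome`, the returned labels, and the use of an exact content hash to stand in for "complete duplicate" are all assumptions made for this example.

```python
import hashlib
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def crawl_outcome(status_code, html, seen_hashes):
    """Return a hypothetical label for why a crawled page might be skipped."""
    # 1. An error code is only visible after the page has been fetched.
    if status_code >= 400:
        return "error"

    # 2. A noindex directive is also only visible after fetching the HTML.
    parser = RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in d for d in parser.directives):
        return "noindex"

    # 3. Assumption: treat byte-identical HTML as a "complete duplicate".
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return "duplicate"

    seen_hashes.add(digest)
    return "indexable"
```

In use, you would pass each fetched page's status code and HTML along with a shared `set()` of hashes, then review any page labeled something other than "indexable". In all three cases the page has to be fetched before the reason becomes visible, which mirrors John's point that crawling comes first.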
The SEO pro also asked John about site quality and the role this would play in that decision.
John said that usually, if Google is not convinced of the site’s quality, it would likely not crawl the page in the first place.
John continued, pointing out that in Search Console, nearly every site is likely to show pages grouped as discovered but not indexed, as well as crawled but not indexed. He believes this is pretty common across websites in general.
This happens at approximately the 14:10 mark in the video.