During the 10/1/2021 hangout, a webmaster asked John Mueller whether Google recommends a way to clean up indexable internal search pages.
John explained that the indexation of internal search pages makes crawling a lot harder.
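He did not say that every internal search page must be removed or noindexed; some of them may be perfectly fine to show in search. His suggestion was to keep a reviewed list of search queries that return reasonable content, let those pages be indexed, and add a noindex to the rest while keeping them accessible to users.
He also advised avoiding the situation where anyone can create a million new pages on your website by linking to random URLs or words, so that you control which pages get crawled and indexed.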
This conversation happens at approximately the 52:28 mark in the video.
John Mueller Hangout Transcript
Webmaster 10 52:28
Hey, John, good morning. Yes, I had a question in regards to internal search pages.
So we’re allowing indexation of onsite searches. So sometimes someone does a search on our site, we create a page for that. And now that’s gotten a bit out of control.
So we have hundreds of millions of these pages. So how would you recommend we sort that out and if there is actually any benefits to cleaning that up, or if we shouldn’t worry about it?
John 52:57
I think, for the most part, it does make sense to clean that up because it makes crawling a lot harder. So that’s kind of the, I don’t know, the direction I would look at there is to think about which pages you actually do want to have crawled and indexed and to help our systems to focus on that.
Not so much that, like, you should get rid of all internal search pages; some of these might be perfectly fine to show in search. But really try to avoid the situation where anyone can just go off and create a million new pages on your website by linking to random URLs or words that you might have on your pages.
So to kind of take it and say, well, you have control of which pages you want to have crawled and indexed, rather than, like whatever randomly happens on the internet.
Webmaster 10 53:54
And those pages then that, say we, so we control this, and the pages then aren’t any longer links, they’re kind of these orphan URLs, but they’re still going to be crawled and probably indexed. What do we do with them, do we [inaudible], do we noindex?
John 54:14
I mean, if they’re still usable by users, I would keep them accessible, but I might add a noindex to them.
So one thing you could do is create a list of search queries where you think we have reasonable content. And just double-check that list. And if it’s on the list, then like, let it be indexed, if it’s not on the list, then add a noindex to those pages, something along those lines.
And finding out which of these pages are actually reasonable is sometimes a bit tricky because it’s easy to take simple metrics and just say oh, I will just take the top ten percent pages from traffic.
But I don’t know if those are actually the good pages. Like if you would create category pages for those, are those the ones that you would pick or not. So…
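[Editor’s note: The allowlist approach John describes above could be sketched roughly as follows. This is a minimal illustration, not something shown in the hangout. The allowlist contents, the `q` parameter name, the example URLs, and the function names are all assumptions made for the example; the only real-world piece is the standard robots meta tag with a `noindex` value.]

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical, manually reviewed list of queries that return reasonable content.
INDEXABLE_QUERIES = {"running shoes", "trail running shoes"}


def normalize_query(raw: str) -> str:
    """Lowercase the query and collapse whitespace so variants match one entry."""
    return " ".join(raw.lower().split())


def robots_meta_for_search_url(url: str, param: str = "q") -> str:
    """Return the robots meta tag to emit on an internal search results page.

    Pages whose query is on the allowlist stay indexable; everything else
    remains accessible to users but carries a noindex.
    """
    values = parse_qs(urlsplit(url).query).get(param, [])
    query = normalize_query(values[0]) if values else ""
    if query in INDEXABLE_QUERIES:
        return '<meta name="robots" content="index, follow">'
    return '<meta name="robots" content="noindex">'


if __name__ == "__main__":
    print(robots_meta_for_search_url("https://example.com/search?q=Running+Shoes"))
    print(robots_meta_for_search_url("https://example.com/search?q=asdf+random+words"))
```

[Because the default is noindex, a search URL that someone invents and links to from elsewhere does not become indexable unless its query has been reviewed and added to the list.]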
Webmaster 10 55:06
And it’s continually changing too. You might get pages that suddenly have demand and then don’t have demand.
So it’s a tricky thing to manage.
Are there any kind of benefits, then, do you think, longer-term? So if we do this cleanup, obviously the business would argue, you know, if we’re going to do this work, it’s an investment from our side. What is the return on this? What are the SEO benefits of that?
John 55:32
So on the one hand, you have the crawling side, which will be affected. So if we go off and don’t have to crawl all over those pages, we can focus more on the actual content you do care about.
And especially with an e-commerce site, you have things like price changes all the time. And if we can’t pick up the price changes quickly enough, then that’s kind of a bad thing. Whereas if we can concentrate our crawling more on the content that you do want us to care about, then that helps to kind of like improve those pages.
So it’s not so much that there’s an immediate ranking boost for your website overall if you take care of that. It’s more, well, you’re focusing the energy a little bit better. The other aspect that sometimes plays a role is that these pages might be seen as being low-quality pages, or almost like doorway pages, in that you just have different variations of the same keywords over and over again.
And that is something that could pull down the overall quality of a website. But it really depends on the specific situation. And it’s not the case that you can say, “Well, I clean up my internal search results, and then Google will think my website is suddenly significantly higher quality.”
It can be like a small change. And sometimes a small change is relatively large for a website. But I wouldn’t rely on that aspect. I would really focus more on the crawling aspect.