During a hangout, an SEO professional asked John Mueller in the submitted Question and Answer segment why Google indexes parameter URLs.
The question was: why do parameter URLs end up in Google's index even though they have been excluded from crawling with the robots.txt file and with the parameter settings in Google Search Console?
How do they get parameter URLs out of the index again, without endangering the canonical URLs?
John explained that there’s likely a general assumption that parameter URLs are bad for a website.
This is not the case, so it's definitely not true that you need to clean up your site's indexed URLs to get rid of all parameter URLs.
From this perspective, John would see it as something where you're basically polishing the site a bit to make it better, but it's not critical.
Regarding the robots.txt file and the parameter handling tool: usually, the parameter handling tool is the place where you can do these things.
John thinks the parameter handling tool is a bit hard to find and hard for most people to use.
So personally, he would try to avoid it and instead use the more scalable approach of the robots.txt file. But you're welcome to use the parameter tool in Google Search Console.
With the robots.txt file, you're preventing crawling of these URLs, not indexing of them. This means that if you do something like a site: query for those URLs, it's highly likely you will still find them in the index, even without the content itself being indexed.
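As a minimal sketch of what such a crawl block might look like (the "sessionid" parameter name is a hypothetical example, not something mentioned in the hangout), a robots.txt rule like this stops Googlebot from fetching the parameter URLs but does not remove them from the index:

```
# Hypothetical example: block crawling of URLs containing a "sessionid" parameter.
# Blocking crawling does NOT prevent these URLs from appearing in the index
# (e.g. via a site: query), because Google can index the URL without its content.
User-agent: *
Disallow: /*?*sessionid=
```

If the goal is to keep such URLs out of the index entirely, the crawl block would generally need to be lifted so Google can see a noindex directive or the canonical signal on the page itself.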
This happens at approximately the 30:31 mark in the video.