During the submitted Question and Answer segment of a hangout, an SEO professional asked John Mueller about robots.txt files.
Their question was: To what degree does Google honor the robots.txt file? They are working on a new version of their site that’s currently blocked via robots.txt.
They intend to use robots.txt to block indexing of some URLs that are important for usability, but not for search engines.
They want to confirm that this approach is okay.
John replied that this is perfectly fine.
When Google recognizes disallow entries in a robots.txt file, it will absolutely follow them. The only situation John has seen where this did not work is when Google was not able to process the robots.txt file properly.
But if Google can process the robots.txt file, meaning it's properly formatted, then Google will absolutely stick to it when it comes to crawling.
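For illustration, a disallow entry of the kind the question describes might look like the sketch below. The paths are hypothetical examples, not taken from the hangout:

```
# Hypothetical example: block crawling of usability-only URLs for all crawlers.
User-agent: *
Disallow: /cart/
Disallow: /internal-search/
```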
Another caveat is that Google usually updates its cached copy of a robots.txt file about once a day, depending on the website.
So if a site owner changes their robots.txt file now, it may take around a day before the change takes effect on what gets crawled or blocked.
John noted that the SEO pro mentioned blocking indexing, but the robots.txt file essentially blocks crawling. So if you block the crawling of pages that are important for usability but not for search engines, that is usually just fine.
What could happen is that Google indexes the URL without the content. So if you do a site: query for those specific URLs, you would still see them.
If, however, the content is on your crawlable pages, then for any normal query people do when they search for a term, Google will be able to focus on the pages that are actually indexed and crawled, and show those in the search results.
From that perspective, that’s all just fine.
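To make the crawl-versus-index distinction concrete, here is a minimal sketch of how a compliant crawler consults robots.txt before fetching a URL, using Python's standard-library urllib.robotparser. The rules and URLs are hypothetical examples, not from the hangout:

```python
from urllib import robotparser

# Hypothetical robots.txt rules mirroring the scenario in the question:
# usability-only paths are disallowed, everything else stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /internal-search/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL against the rules before fetching.
# Disallowed URLs are never crawled, though the bare URL can still end up
# indexed without content if it is linked from elsewhere.
for url in ("https://example.com/cart/checkout",
            "https://example.com/products/blue-widget"):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawl' if allowed else 'skip (disallowed)'}")
```

Because the rules only gate crawling, a disallowed URL can still surface in a site: query with no snippet, exactly as John describes.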
This happens at approximately the 42:34 mark in the video.
John Mueller Hangout Transcript
John (Submitted Question) 42:34
To what degree does Google honor the robots.txt? I’m working on a new version of my website that’s currently blocked with a robots.txt file. And I intend to use robots.txt to block indexing of some URLs that are important for usability, but not for search engines. So I want to understand that this is okay.
John (Answer) 42:52
That’s perfectly fine. So when we recognize disallow entries in a robots.txt file, we will absolutely follow them. The only kind of situation I’ve seen where that did not work is where we were not able to process the robots.txt file properly. But if we can process the robots.txt file properly, if it’s properly formatted, then we will absolutely stick to that when it comes to crawling. Another caveat there is usually we update the robots.txt files, maybe once a day, depending on the website.
So if you change your robots.txt file now, it might take a day until it takes effect with regards to blocking crawling. So you mentioned blocking, indexing, but essentially, the robots.txt file would block crawling. So if you blocked crawling of pages that are important for usability, but not for search engines, usually, that's fine. What would happen or could happen is that we would index the URL without the content.
So if you do a site query for those specific URLs, you would still see it. But if the content is on your crawlable pages, then for any normal query that people do, when they search for a specific term on your pages, we will be able to focus on the pages that are actually indexed and crawled, and show those in the search results. So from that point of view, that’s all fine.