During the submitted Question and Answer segment of a hangout, an SEO professional asked John Mueller about robots.txt files.
Their question was: To what degree does Google honor the robots.txt file? They are working on a new version of their site that’s currently blocked via robots.txt.
They intend to use robots.txt to block indexing of some URLs that are important for usability, but not for search engines.
So they wanted to confirm that doing this is okay.
John replied that this is perfectly fine.
When Google recognizes disallow entries in a robots.txt file, it will absolutely follow them. The only situation John has seen where this did not work is when Google was unable to process the robots.txt file properly.
But if Google can process the robots.txt file, meaning it is properly formatted, then Google will absolutely stick to it when it comes to crawling.
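A properly formatted robots.txt file with disallow entries is simple; a minimal sketch might look like the following (the paths here are hypothetical examples, not from the question):

```
User-agent: *
Disallow: /internal/
Disallow: /cart/
```

Each `Disallow` line blocks crawling of any URL whose path begins with that prefix, for the user agents matched by the `User-agent` line.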
Another caveat is that Google usually refreshes its cached copy of a robots.txt file perhaps once a day, depending on the website.
So if they change their robots.txt file now, it may take around a day or so before the change takes effect on crawling.
He noted that the SEO pro mentioned blocking indexing, but the robots.txt file actually blocks crawling. So if you block the crawling of pages that are important for usability, but not for search engines, usually this is just fine.
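The crawl-blocking behavior John describes can be illustrated with Python's standard-library `urllib.robotparser`, which implements the same disallow matching that well-behaved crawlers apply. The rules, domain, and URLs below are hypothetical, chosen only to show the mechanics:

```python
from urllib import robotparser

# Parse a small set of robots.txt rules directly (no network fetch),
# so the example is self-contained.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /internal/",
])

# A URL under a disallowed path prefix is blocked from crawling...
print(rp.can_fetch("Googlebot", "https://example.com/internal/tool"))  # False

# ...while other URLs remain crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/products"))  # True
```

Note that `can_fetch` only answers whether a URL may be *crawled*; as John explains next, a blocked URL can still end up indexed without its content.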
What could happen is that Google indexes the URL without its content. So if you do a site: query for those specific URLs, you would still see them.
If, however, the content is on your crawlable pages, then for any normal query that people do when they search for a term, Google will be able to focus on the pages that are actually crawled and indexed, and it will show those in the search results.
From that perspective, that’s all just fine.
This happens at approximately the 42:34 mark in the video.