One SEO professional asked John Mueller during a hangout whether they should be blocking crawling with a robots.txt file or using the meta robots tag on the page.
How can they best prevent crawling?
John explained that this comes up from time to time, and they did do a podcast episode a while back on this topic.
In practice, there is a subtle difference here, and if you’re an SEO who has worked with search engines, you probably understand it already.
But, for people who are new, it’s sometimes unclear exactly where all the lines fall.
With robots.txt, which is the first one that they mentioned in the question, you can block crawling, so you can prevent Googlebot from even looking at your pages.
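For illustration, a minimal robots.txt that blocks Googlebot from crawling part of a site might look like this (the path is a made-up example):

```text
# robots.txt at https://example.com/robots.txt
User-agent: Googlebot
Disallow: /private/
```

With this in place, Googlebot won’t fetch anything under /private/, so it never sees the content of those pages.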
With the robots meta tag, when Googlebot fetches your pages and sees the tag, you can do things like block indexing.
In practice, both of these actions can result in your pages not appearing in the search results.
But, they are subtly different.
So if Google can’t crawl a page, then they don’t know what they are missing. They might see that there are a lot of references to this page and decide it’s probably useful for something, without knowing what’s on it.
And then this URL could appear in the search results without any of its content, because Google can’t look at it.
Whereas with the robots meta tag, if Google can look at the page, then they can check the meta tag and see whether there’s a noindex there, for example; in that case they stop indexing the page and drop it completely from the search results.
If you’re trying to block crawling, then robots.txt is the way to go. If you don’t want the page to appear in the search results, then John would pick whatever one is easier for you to implement.
On some sites, it’s easier to set a checkbox saying you don’t want this page in search, which then adds the noindex meta tag. On others, editing the robots.txt file might be easier. It depends on what you have.
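If you go the robots.txt route, you can sanity-check your rules before deploying them. A small sketch using Python’s standard-library `urllib.robotparser` (the rules and URLs here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content to test before publishing it.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A URL under /private/ is blocked for any crawler, including Googlebot.
print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False

# Everything else remains crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/public/page.html"))  # True
```

This only checks crawl rules; it can’t tell you whether a crawlable page carries a noindex meta tag, which is a separate check against the page’s HTML.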
This happens at approximately the 14:20 mark in the video.