One SEO professional asked John Mueller during a hangout whether they should be blocking with a robots.txt file or using the robots meta tag on the page.
How can they best prevent crawling?
John explained that this comes up from time to time, and that Google recently did a podcast episode on this topic, which is also available on the YouTube channel.
In practice, there is a subtle difference here. If you’re an SEO who has worked with search engines, you probably understand it already.
But, for people who are new, it’s sometimes unclear exactly where all the lines fall.
With robots.txt, which is the first one that they mentioned in the question, you can block crawling, so you can prevent Googlebot from even looking at your pages.
With the robots meta tag, when Googlebot looks at your pages and sees the robots meta tag, you can do things like blocking indexing.
In practice, both of these actions can result in your pages not appearing in the search results.
But, they are subtly different.
If Google can’t crawl the page, then it doesn’t know what it is missing. It could decide that, because there are a lot of references to this page, the page may be useful for something, even though it has no way to check.
And then this URL could appear in the search results without any of its content, because Google can’t look at it.
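As a rough illustration of what "blocking crawling" means, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `/private/` path, the example.com URLs, and the `can_crawl` helper are all assumptions made up for this example. Python's parser is also not Googlebot's own parser, so treat this as an approximation of how a well-behaved crawler decides whether it may fetch a URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption
# chosen only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under the disallowed path: a compliant crawler never requests it,
# so it never sees the page content or any meta tags on it.
blocked = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html")

# A URL outside the disallowed path: the crawler may fetch and read it.
allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post.html")

print(blocked, allowed)
```

Note that the check only answers "may this URL be fetched?". It says nothing about whether the URL itself can still be listed in search results, which is the distinction John draws.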
With the robots meta tag, by contrast, Google can look at the page and see the meta tag. If there’s a noindex there, for example, Google stops indexing that page and drops it completely from the search results.
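To make the noindex case concrete, the sketch below uses Python's standard-library `html.parser` to look for a robots meta tag in a page that has already been fetched. The sample HTML, the `RobotsMetaParser` class, and the `has_noindex` helper are hypothetical names invented for this example, and the logic is a simplification, not a description of how Google actually processes pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute can hold several comma-separated directives.
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page carries a noindex robots meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup used only for illustration.
PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Hello</body></html>'

print(has_noindex(PAGE))
```

The key point is that this check can only run on a page that was fetched in the first place. The directive lives inside the page, so a crawler has to be allowed to read the page before it can see it.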
If you’re trying to block crawling, then robots.txt is the way to go. If you just don’t want the page to appear in the search results, then John would pick whichever one is easier for you to implement.
On some sites, it’s easier to set a checkbox saying that you don’t want this page found in search, and then the site adds the noindex meta tag.
On others, maybe editing the robots.txt file is easier. It depends on what you have.
This happens at approximately the 14:20 mark in the video.
John Mueller Hangout Transcript
John (Question)
Okay, let’s see. Simplifying a question a little bit: which is better, blocking with robots.txt, or using the robots meta tag on the page? How do we best prevent crawling?
John (Answer)
So this also comes up from time to time. We actually did a podcast episode recently about this as well. So I would check that out. The podcasts are also on the YouTube channel. So you can click around a little bit and you’ll probably find that quickly. In practice, there’s a subtle difference here, where if you’re an SEO and you’ve worked with search engines and probably understand that already, but for people who are kind of new to the area, it’s sometimes unclear exactly where all of these lines are.
And with robots.txt, which is the first one that you mentioned in the question, essentially, you can block crawling, so you can prevent Googlebot from even looking at your pages. And with the robots meta tag, when Googlebot looks at your pages and sees that robots meta tag, you can do things like blocking indexing. In practice both of these kind of results in your pages not appearing in the search results. But they’re subtly different. So if we can’t crawl, then we don’t know what we’re missing.
And it might be that we say, well, actually, there’s a lot of references to this page, maybe it is useful for something, we just don’t know. And then that URL could appear in the search results without any of its content, because we can’t look at it. Whereas with the robots meta tag, if we can look at the page, then we can look at the meta tag and see if there’s like noindex there, for example, then we stop indexing that page, and then we drop it completely from the search results. So if you’re trying to block crawling, then definitely robots.txt is the way to go. If you just don’t want the page to appear in the search results, then I would like pick whichever one is easier for you to implement. On some sites, it’s easier to kind of set a checkbox saying that I don’t want this page found in search. And then it adds the noindex meta tag. On others, maybe editing the robots.txt file is easier, kind of depends on what you have there. All right.