An SEO professional asked John Mueller during a hangout about their robots.txt file. They were concerned about disallowed pages getting organic traffic.
They explained that they had disallowed some of the pages, but that Google may have indexed those pages before the disallow rule was added. Even though crawling is now blocked, to this day they still see these pages getting organic sessions.
Why is this happening? And how can they fix that?
They asked about the noindex directive. But is this the right way to go about it?
John explained that if these are pages you don’t want indexed, then using noindex is better than using a disallow in robots.txt.
The noindex would be a robots meta tag on the page itself. And you would need to allow crawling in robots.txt for the noindex to take effect, because Google can only see the tag if it can fetch the page.
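As a rough sketch of what that setup looks like (the URL path below is hypothetical), the page you want removed from the index must remain crawlable while carrying the robots meta tag:

```
# robots.txt — do NOT disallow the page you want deindexed;
# Googlebot must be able to fetch it to see the noindex tag
User-agent: *
Allow: /old-product-page.html
```

```html
<!-- In the <head> of the page you want removed from the index -->
<meta name="robots" content="noindex">
```

Once Google recrawls the page and sees the tag, it can drop the URL from the index. Only after that would re-adding a disallow rule (if desired) be safe.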
This happens at approximately the 16:41 mark in the video.
John Mueller Hangout Transcript
SEO Professional 4 16:41
Got it. And what you’re saying is the probable–like, the reason for this is that your data is getting limited for the product description pages, pages. So the limits are present. Okay. And the next question is, in my robots.txt file, what I’ve done is I have disallowed some of the pages, certain pages that have been disallowed, but it is quite possible that Google had probably in the past indexed those pages. And when I have blocked them, disallowed crawling, and today to this date, I see them like getting organic sessions. Why is that happening and how can I fix that? I read there is something called noindex directive. But is that the right way to go about it? Or should I pursue with this…?
If these are pages that you don’t want to have indexed, then using noindex would be better than using the disallow in robots.txt. The noindex would be a meta tag on the page, though. So it’s a robots meta tag with noindex. And you would need to allow crawling for that to happen.
SEO Professional 4 17:49
Got it, good. So I have to allow that in the robots.txt, but I would have to add the noindex meta tag on these URLs.