During the question-and-answer segment of a hangout, an SEO professional asked John Mueller about adding all the meta tags to pages that are blocked by robots.txt.
Their main question was: should they add a noindex tag to a page that is blocked by robots.txt and also has a canonical?
John’s answer: probably not. He explained that if a URL is blocked by robots.txt, Google will not see any of the meta tags on the page.
Google won’t see the rel=canonical tag on the page either, because a page that is blocked by robots.txt is not crawled at all.
If you want Google to take into consideration the rel=canonical or noindex that you add to a page, you have to make sure that Google can crawl the page itself.
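One way to check this before adding tags is to test the URL against your robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt rules and the example.com URLs are made up for illustration, and Python's parser is not guaranteed to match Google's own robots.txt handling in every edge case, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_googlebot_crawl(robots_txt: str, url: str) -> bool:
    """Return True if these rules allow the Googlebot user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical URLs: one under the disallowed path, one outside it.
blocked_url = "https://example.com/private/page.html"
open_url = "https://example.com/products/widget.html"

for url in (blocked_url, open_url):
    if can_googlebot_crawl(ROBOTS_TXT, url):
        print(url, "-> crawlable; a noindex or rel=canonical on the page can be read")
    else:
        print(url, "-> blocked; a noindex or rel=canonical on the page will not be seen")
```

If a URL comes back as blocked, any noindex or rel=canonical you add to that page has no effect until the robots.txt rule is lifted.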
The other aspect is that pages blocked by robots.txt may still get indexed, but they’re indexed without any of their content, because Google can’t crawl them.
Usually this means that these pages don’t show up in the search results anyway. If someone searches for a product that you sell on your site, Google is not going to dig to see whether a page blocked by robots.txt would also be relevant, because it already has really good pages from the website that it can index normally and show.
On the other hand, if you do a site: query for that specific URL, you may still see the URL in the search results, but without any content.
This happens at approximately the 44:23 mark in the video.
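To make the distinction concrete, here is a simplified model of the behavior John describes. It is not Google's actual logic. The function name `directives_seen`, the result keys, and the sample page are all hypothetical. The idea is simply that on-page directives count only when the page can be crawled, while a blocked URL may still be listed without content.

```python
from html.parser import HTMLParser

# Hypothetical page carrying both a noindex and a canonical (an assumption).
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/products/widget.html">
</head><body>Hidden page</body></html>
"""


class DirectiveParser(HTMLParser):
    """Collects the robots meta tag and the rel=canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for bare attributes, so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.canonical = attr.get("href") or None


def directives_seen(crawlable: bool, html: str) -> dict:
    """Simplified model: report which on-page directives a crawler could read."""
    if not crawlable:
        # Blocked by robots.txt: the HTML is never fetched, so nothing is read,
        # but the bare URL may still be indexed without content.
        return {"noindex": False, "canonical": None,
                "may_be_indexed_without_content": True}
    parser = DirectiveParser()
    parser.feed(html)
    return {"noindex": parser.noindex, "canonical": parser.canonical,
            "may_be_indexed_without_content": False}


print(directives_seen(crawlable=False, html=PAGE_HTML))
print(directives_seen(crawlable=True, html=PAGE_HTML))
```

In this model the same HTML gives two different results. When the page is blocked, neither tag is read. When it is crawlable, both are.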