One SEO professional asked John Mueller during a hangout about indexing images, and robots.txt requirements for doing so.
This SEO pro runs a recipe website with hundreds of thousands of indexed recipes that appear as rich recipe results.
The site received a lot of traffic from the recipe gallery until, over a period of time, that traffic stopped.
All of the metadata checked out. Google Search Console also said that, “Yep. This is all rich recipe content. It’s all good. It can be shown.”
They finally noticed, when previewing a result, that the image was missing. And it appears to be due to a change at Google.
The change appeared to be that a robots.txt file was now effectively required for images to be retrieved, yet none of Google's tools reported anything as invalid.
And so, it’s a bit awkward, right?
When you ask the tool, "Is this a valid recipe result?" it answers "Yeah," and everything looks absolutely fine.
However, it turns out that behind the scenes, there was a new requirement that you have to have a robots.txt file.
John asked the SEO pro what they meant by having to have a robots.txt file.
The SEO pro explained that requesting robots.txt from their CDN returned a 500 error.
As soon as they placed a robots.txt file on the CDN, the previews started appearing correctly. Operationally, adding the robots.txt file is what repaired the indexing issue.
John explained that from Google’s perspective, it’s not that a robots.txt file is required, but that the robots.txt URL has to return the proper result code.
If you don’t have one, it should return a 404 error.
If you do have one, then Google can obviously read it.
But if you return a server error for the robots.txt file, then Google's systems assume there may be an issue with the server, and they won’t crawl.
That behavior has been in place since the beginning.
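The status-code rules John describes can be sketched as a simple check. This is a hedged illustration, not Google's actual logic; the helper names and the mapping of status codes to crawl behavior simply restate the rules above.

```python
import urllib.request
import urllib.error


def robots_status(host):
    """Return the HTTP status code for https://<host>/robots.txt.

    Hypothetical helper for auditing a site and its CDN hostname.
    """
    url = f"https://{host}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code


def crawl_verdict(status):
    # Restates the behavior described above:
    # 200 -> Google reads the file and follows its rules
    # 404 -> treated as "no robots.txt"; crawling proceeds
    # 5xx -> Google assumes a server problem and holds off crawling
    if status == 200:
        return "robots.txt read; crawling per its rules"
    if status == 404:
        return "no robots.txt; crawling allowed"
    if 500 <= status < 600:
        return "server error; Google stops crawling this host"
    return "other status; behavior may vary"
```

Running `crawl_verdict(robots_status("cdn.example.com"))` against each hostname that serves your images would have surfaced the 500 error the SEO pro eventually found by hand.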
However, these types of issues are really hard to spot, especially when the images are served from a CDN on a separate hostname.
And John imagines the rich results test, as far as he knows, focuses on the content that’s on the HTML page.
So while it validates the JSON-LD markup, it probably doesn’t check whether the images are actually fetchable from the server. If they can’t be fetched, then of course Google cannot use them in the carousel.
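To make the gap concrete, here is a minimal sketch of the step a validator could add: pull the image URLs out of a Recipe's JSON-LD and flag them for a fetchability check. The markup below is an invented example, not the site's actual data.

```python
import json

# Hypothetical Recipe JSON-LD, following the schema.org "image" field,
# which may be a single URL string or a list of URLs.
jsonld = """
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Example Soup",
  "image": ["https://cdn.example.com/images/soup.jpg"]
}
"""


def image_urls(markup):
    """Extract the image URL(s) from a Recipe JSON-LD block."""
    data = json.loads(markup)
    images = data.get("image", [])
    if isinstance(images, str):
        images = [images]
    return images


for url in image_urls(jsonld):
    # A validator could now issue a request for each URL (and for
    # robots.txt on that host) to confirm the image is fetchable --
    # the step the rich results test apparently skips.
    print(url)
```

The rich results test validates that the markup itself is well-formed; actually fetching each image URL, including robots.txt on the image host, is the missing check that would have caught this CDN issue.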
So this is something Google may need to handle better in its tooling.
This happens at approximately the 51:45 mark in the video.