One SEO professional asked John Mueller in a recent hangout about translated content on a large international site because some of their content is not indexed.
In April last year, all of their translated content moved at once from “Valid” to “Excluded,” under the “Crawled - currently not indexed” status.
Because it happened all at once, they thought it might have been due to a systemic change on their side, because they just did a massive change to their hosting platform, CMS (content management system), etc.
They reviewed their code and their content extensively, and they can’t find anything that would be responsible for this.
They also went through the Google Search release notes for changes that might be affecting them and, as near as they can tell, found none.
They have also been going through and doing practice searches in Google Search Console. They’ve cleaned up their hreflang, their canonicals, and their URL parameters, and checked manual actions and every other tool that’s listed on developers.google.com/search.
They are just about out of ideas. They don’t know what’s happened, or what to do next to fix the issue.
John explained that one thing he thinks is sometimes tricky is that the site’s URLs have a parameter at the end with the language code – HLA= …?
From Google’s perspective, if its systems recognize that a lot of these parameter values lead to the same content, they can get stuck in a situation where they say “Well, this parameter is not very useful. Maybe we should just ignore it.”
To John, it seems as if something like this is actually what happened and why this site is experiencing such crawl issues.
You can actually use the URL parameter tool in Google Search Console to help with this. You want to make sure that the parameter there is actually set to index everything.
Additionally, you could crawl a portion of your site with a local crawler to see exactly which parameter URLs actually get picked up.
And then double-check that these pages actually have useful content for these languages.
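One way to run that double-check on the output of a local crawl is sketched below. It assumes you already have the HTML for each crawled URL in memory, and that the language lives in a query parameter named `lang`. That name is a placeholder, since the actual parameter is not spelled out here. The function groups pages by path and flags the language codes whose text is identical to another language’s text, which is the pattern that can teach Google the parameter is not useful.

```python
import hashlib
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

LANG_PARAM = "lang"  # hypothetical parameter name, adjust to your site


def fingerprint(html: str) -> str:
    """Hash the visible text, ignoring tags, case, and whitespace differences."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_language_variants(pages: dict) -> dict:
    """Map each path to the language codes that share a body with another language.

    `pages` maps a crawled URL to the HTML returned for it.
    """
    by_path = defaultdict(lambda: defaultdict(list))
    for url, html in pages.items():
        parts = urlsplit(url)
        lang = parse_qs(parts.query).get(LANG_PARAM, [""])[0]
        by_path[parts.path][fingerprint(html)].append(lang)

    result = {}
    for path, groups in by_path.items():
        dupes = sorted(
            lang for langs in groups.values() if len(langs) > 1 for lang in langs
        )
        if dupes:
            result[path] = dupes
    return result
```

This is a crude exact-match comparison. Pages that differ only by a translated header or footer would not be caught, so treat an empty result as a starting point rather than proof that every translation is real.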
In particular, a common issue John has seen is that every language is linked up, whether or not a translation actually exists.
The Japanese version then says “Oh, we don’t have a Japanese version. Here’s the English one instead.” And Google’s systems say “Well, the Japanese version is the same as the English version, so maybe other languages are the same as the English version too. We should just ignore that.”
And sometimes this is from links within the website. Sometimes it’s also external links, people who are linking to your site.
If the parameter is at the end of the URL, then it’s very common that there is some kind of garbage attached to the parameter also.
And if Google crawls all of these parameters with this garbage, and the site responds “Oh, well, this is not a valid language. Here’s the English version,” then this reinforces the crawl loop that Google is in.
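To get a sense of how much of this garbage is out there, you could tally the language parameter values that show up in a list of URLs, for example from a local crawl or from server logs. The sketch below is only an illustration: the parameter name `lang` and the set of supported codes are made up and need to be replaced with your own.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

LANG_PARAM = "lang"               # hypothetical parameter name
SUPPORTED = {"en", "ja", "de"}    # hypothetical list of real translations


def unexpected_language_values(urls):
    """Count language parameter values that are not supported language codes."""
    counts = Counter()
    for url in urls:
        query = parse_qs(urlsplit(url).query, keep_blank_values=True)
        for value in query.get(LANG_PARAM, []):
            if value not in SUPPORTED:
                counts[value] += 1
    return counts
```

Values that appear many times usually point to a broken internal link template, while one-off values are more likely to come from sloppy external links.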
The cleaner approach is to have these garbage URLs redirect to the clean ones in your structure.
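A minimal sketch of that redirect logic might look like the following. It is not tied to any particular server or CMS, and it makes several assumptions: the parameter is called `lang`, valid codes are two letters, and a request for an unknown language should be redirected to the URL without the parameter. The function returns the clean URL to redirect to, or `None` when the requested URL is already clean.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

LANG_PARAM = "lang"               # hypothetical parameter name
SUPPORTED = {"en", "ja", "de"}    # hypothetical list of real translations


def clean_redirect_target(url: str) -> Optional[str]:
    """Return the clean URL to redirect to, or None if `url` is already clean."""
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == LANG_PARAM:
            # Accept a two-letter code followed by non-letter garbage, e.g. 'ja").'
            match = re.match(r"([A-Za-z]{2})(?![A-Za-z])", value)
            code = match.group(1).lower() if match else ""
            if code not in SUPPORTED:
                continue  # assumption: unknown languages fall back to the bare URL
            value = code
        cleaned.append((key, value))
    target = urlunsplit(parts._replace(query=urlencode(cleaned)))
    return None if target == url else target
```

For example, a request for `https://example.com/docs?lang=ja%22%29.` would be redirected to `https://example.com/docs?lang=ja`, so Google only ever sees one URL per real language instead of an English fallback page under many garbage URLs.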
This happens at approximately the 53:16 mark in the video.