One SEO professional asked John Mueller in a recent hangout about translated content on a large international site because some of their content is not indexed.
In April last year, all in one go, all of their translated content moved from Valid to Excluded with the status “Crawled - currently not indexed,” and it has stayed there since.
Because it happened all at once, they suspected a systemic change on their side: they had just made a massive change to their hosting platform, CMS (content management system), and so on.
They went through their code and their content extensively, and they can’t find anything that would be responsible for this.
They also checked the Google Search release notes and, as far as they can tell, found nothing that looks as if it would be affecting them.
They have also been going through best practice searches with Google Search Console. They’ve cleaned up their hreflang, their canonicals, and their URL parameters, checked for manual actions, and gone through every other tool that’s listed on developers.google.com/search.
They are just about out of ideas. They don’t know what’s happened, or what to do next to fix the issue.
John explained that one of the things that he thinks is sometimes tricky is that the site has a parameter at the end of the URL with the language code, which he believes is hla=.
From Google’s perspective, what can happen with this is that if they recognize that there are a lot of these parameters there which lead to the same content, then their systems can get stuck in a situation where they say “Well, this parameter is not very useful. Maybe we should just ignore it.”
To John, it sounds as if something along these lines is what happened, and why this site is experiencing such indexing issues.
You can actually use the URL parameter tool in Google Search Console to help with this. You want to make sure that the parameter there is actually set to index everything.
What you could also do is crawl a portion of your site with a local crawler, just to see exactly what kind of parameter URLs actually get picked up.
And then double-check that these pages actually have useful content for these languages.
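As a rough illustration of that double-check, here is a minimal Python sketch. It assumes you already have a list of URLs exported from a local crawler and the extracted main text of each language version; the function names, the example.com URLs, and the choice of an exact text fingerprint are all assumptions made for this example, not anything John described.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs


def language_values(urls, param="hla"):
    """Group crawled URLs by the (decoded) value of the language parameter."""
    groups = defaultdict(list)
    for url in urls:
        values = parse_qs(urlsplit(url).query, keep_blank_values=True).get(param)
        if values:
            # If the parameter is repeated, keep the last value seen.
            groups[values[-1]].append(url)
    return dict(groups)


def fingerprint(text):
    """Hash of the page text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicates_of_default(pages, default_lang="en"):
    """Return language codes whose text is identical to the default language.

    `pages` maps a language code to the extracted main text of that page.
    """
    default_fp = fingerprint(pages[default_lang])
    return sorted(
        lang
        for lang, text in pages.items()
        if lang != default_lang and fingerprint(text) == default_fp
    )
```

Unexpected keys in the result of `language_values` point to garbage parameter values that crawlers are picking up, and any language returned by `duplicates_of_default` is a version that is really just serving the default text. An exact hash only catches identical text; near-duplicates would need a fuzzier comparison.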
In particular, one common issue John has seen is a site that has all languages linked up.
And the Japanese version says “Oh, we don’t have a Japanese version. Here’s the English one instead.” And then Google’s systems say “Well, the Japanese version is the same as the English version, so maybe there are other languages that are the same as the English version. We should just ignore that.”
And sometimes this is from links within the website. Sometimes it’s also external links, people who are linking to your site.
If the parameter is at the end of the URL, then it’s very common that there is some kind of garbage attached to the parameter also.
And if Google crawls all of these URLs with this garbage, and the response is “Oh, well, this is not a valid language. Here’s the English version,” then this reinforces the loop in which Google’s systems conclude that the parameter is not very useful.
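The cleaner approach is to have these garbage URLs redirect to the clean ones in your structure, or even to return a 404 page, so that whichever URLs Google finds lead to useful content that is not the same as content it has already seen.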
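One way to picture that cleanup is the Python sketch below, which also covers the 404 option John mentions in the transcript. It is only an illustration under stated assumptions: the `hla` parameter name comes from the hangout, while the set of supported languages, the function name, and the rule that only case and surrounding whitespace are recoverable are hypothetical choices for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SUPPORTED_LANGUAGES = {"en", "fr", "ja"}  # hypothetical list for this example
LANG_PARAM = "hla"  # parameter name mentioned in the hangout


def resolve_language_url(url):
    """Return (status, location) for a requested URL.

    200: the URL is already clean and can be served as-is.
    301: the language value is recoverable, so redirect to the clean URL.
    404: the language value is not a supported language.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    raw_values = [value for key, value in pairs if key == LANG_PARAM]
    if not raw_values:
        return 200, url  # no language parameter: default version

    # Assumption: only stray whitespace and letter case are worth repairing.
    cleaned = raw_values[0].strip().lower()
    if cleaned not in SUPPORTED_LANGUAGES:
        return 404, None

    # Rebuild the query with one cleaned language value, keeping parameter order.
    new_pairs, seen = [], False
    for key, value in pairs:
        if key == LANG_PARAM:
            if not seen:
                new_pairs.append((key, cleaned))
                seen = True
        else:
            new_pairs.append((key, value))
    clean_url = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(new_pairs), parts.fragment)
    )
    return (200, url) if clean_url == url else (301, clean_url)
```

The point of the design is that no garbage URL ever answers with the English page under a different address: it either redirects to one clean URL or says plainly that the page does not exist.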
This happens at approximately the 53:16 mark in the video.
John Mueller Hangout Transcript
SEO Professional 9 53:16
I work in a fairly large, multilingual site. And in April last year, just all in one go, all of our translation content, or translated content moved from valid to excluded, crawled currently not indexed. And there it has stayed since April. We’ve gone through, you know, because it happened all at once, we thought maybe there was some systemic change on our side.
We did a massive change to our hosting platform, Content Management System, etc. We’ve gone through the code extensively, we can’t find anything. We can’t find any change to content, we don’t see any notes in the Google Search release notes that look like they would be affecting us, as far as we can tell.
We’ve also been pretty thorough going through and just doing best practice searches with Search Console, we’ve cleaned up our hreflang, canonicals, URL params, manual actions, and every other tool that’s listed on developers.google.com/search. I’m just about out of ideas. I don’t know what’s happened, or what to do next, to try to fix the issue. But I’d really like to get our translated content back in the index.
John 54:17
I think you posted a question as well with a link to a forum thread, was that you?
SEO Professional 9 54:22
I did, with a little bit more detail as well.
John 54:25
Okay. Yeah, I took a look at that briefly before and passed some of that on to the team here as well. One of the things that I think is sometimes tricky is you have the parameter at the end with the language code, I think hla=, right?
From our point of view, what can happen is that when we recognize that there are a lot of these parameters there that lead to the same content, then our systems can kind of get stuck into a situation where we may say “Well, maybe this parameter is not very useful, and we should just ignore it.” And to me, it sounds a lot like something around that line happened. And partially, you can help this with the URL parameter tool in Search Console to make sure that that parameter is actually set to “I do want to have everything indexed.”
Partially what you could also do is maybe to crawl a portion of your website with, I don’t know, a local crawler to see what kind of parameter URLs actually get picked up. And then double check that those pages actually have useful content for those languages. In particular, things like a common one that I’ve seen on sites is, maybe you have, I don’t know, all languages linked up.
And the Japanese version says, Oh, we don’t have a Japanese version, here’s our English one instead. Then our systems could say, well, the Japanese version is the same as the English version, maybe there’s some other languages the same as the English version, we should just ignore that. And sometimes this is from links within the website.
Sometimes it’s also external links, people who are linking to your site. If the parameter is at the end of your URL, then it’s very common that there’s some kind of garbage attached to the parameter as well. And if we crawl all of those URLs with that garbage, and we say, Oh, well, this is not a valid language, here’s the English version, then it again kind of reinforces that loop, where our systems say, well, maybe this parameter is not so useful.
So the cleaner approach there would be if you have kind of garbage parameters to redirect to the cleaner ones, or to maybe even show a 404 page and say, well, we don’t know what you’re talking about, with this URL. To really cleanly make sure that whichever URLs we find, we actually get some useful content that is not the same as other content, which we’ve already seen.
SEO Professional 9 56:53
Okay. I think the URL parameter thing we’d had it set to ‘this parameter translates’ and ‘Let Googlebot decide’ for a very long time. And we recently changed it to always crawl. And we can see now that it is always crawling, but it’s still not indexing. You also mentioned something earlier on in today’s session about not relying on sitemaps for content, for discovery.
I don’t know that we have a lot of great links from say, the English version of a site to another version, another language version of the same site on the page, but we use hreflang attributes in the sitemap to try to indicate that. Is that sufficient for that use case? Or do I also need to provide very easily discoverable links to translated versions?
John 57:39
Ideally, you would have a link to the translated versions. And usually, what happens with sites is you don’t need a link from every page to every page. I mean, that’s kind of the best practice anyway, that you can switch between languages directly on the page. But for crawling, you don’t necessarily need that.
But rather, as soon as we find one page that is in French, from there, we can usually crawl out and see, well, here’s the rest of the site in French and pick all of that up. So that’s something where sometimes it’s enough to have, I don’t know, in the footer of the home page, like this site is also available in these different languages. And you just directly link to those individual languages, kind of the homepage, and based on that we can actually discover the rest of the site.