One SEO professional was worried that they were on the wrong track. They had launched an ecommerce website with more than 800,000 products and hreflang versions for over 70 countries, and were worried about how Google was crawling and indexing it.
With a website this large, they ran into issues. Google indexed almost 350,000 URLs at first, and then indexing stalled.
After they resubmitted the site under a different URL structure, previously indexed URLs started moving to the excluded list in Google Search Console, leaving only around 160,000 URLs indexed.
They were not 100 percent sure why this was happening. So they wanted to ask John: could this be because of the language being in English, or could it be because of the different URL structures?
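John explained that the problem is not that Google technically can’t handle that volume of URLs. Rather, with that much content submitted at once, there is almost no chance for Google to get through everything.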
With 70 different country and language versions on top of each other, combined with the existing large quantities of URLs, this is shooting yourself in the foot from a technical perspective.
Submitting so much content, multiplied by 70, means that Google has to start somewhere: it indexes some things, but is unlikely to ever reach the rest.
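To put the "multiplied by 70" point in numbers, here is a rough back-of-the-envelope sketch. It uses the figures quoted in the transcript (more than 800,000 products, 70-plus country versions) and assumes, purely for illustration, exactly one URL per product per version. The real URL count for the site is not known.

```python
# Figures quoted in the transcript, rounded down.
PRODUCT_PAGES = 800_000   # "more than 800,000 products"
LOCALE_VERSIONS = 70      # "70 plus countries"


def total_urls(pages: int, versions: int) -> int:
    """Assumption: every product page exists once in every version."""
    return pages * versions


full_rollout = total_urls(PRODUCT_PAGES, LOCALE_VERSIONS)
small_start = total_urls(PRODUCT_PAGES, 2)  # e.g. one English + one other language

print(f"All 70 versions: {full_rollout:,} URLs")  # All 70 versions: 56,000,000 URLs
print(f"Two versions:    {small_start:,} URLs")   # Two versions:    1,600,000 URLs
```

Under that assumption, starting with two versions asks Google to process roughly 1.6 million URLs instead of 56 million.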
He recommends, in these situations, starting with a very small number of country and language versions, and in particular a single English version rather than all of the near-identical English country versions.
Then, once everything is confirmed to be correct, you can move forward with the international implementation in small sections at a time.
If you go too large, Google may only get through a fraction of the site, and the near-identical English versions will mostly be treated as duplicates, so it becomes hit and miss which version actually ends up indexed.
And this just makes the entire process of crawling, indexing, and ranking all that much harder.
If you’re not already a well-known site with those large page counts, then doing something like this is going to lead to a very different result where you may not get everything crawled at all.
This happens at approximately the 15:49 mark in the video.
John Mueller Hangout Transcript
SEO Professional 5 15:49
Hi, John. This is the first time actually I’m attending this session, it’s regarding our ecommerce website. This year only we have launched one ecommerce website and for 70 plus countries, we have created hreflang, and our URL structure for different countries we have created like we have established UAE, so in UAE we have like en_a. And when it comes to the U.S., our website structure is xyz.com/en_us, like that we have created for 70 plus countries, and the language is in English.
And for all the regions our content is same except for some regions we have mentioned a few cities, like UAE we have mentioned cities like Arab Emirates, like Dubai Sharjah, and when it comes to US, but overall the content is the same. And we have more than 800,000 products on our website. And this year only we have launched. And for other countries also even though language is not in English.
In English, we have created locally like bg_bg, like for Bulgaria and all. And initially we have submitted all the sitemaps for all these regions as well. And first, we have experienced that Google started crawling your website, and almost 350,000 URLs got indexed. Then we have started experiencing like the indexing of the URLs.
So we first thought it can be the speed of their website, and then we have increased the speed. And then again, we started experiencing “Okay, the URL got indexed. Latest is that again, it stopped.” Then what we did was we removed all the submitted URLs. And then we have the URL structure like xyz.com/en. Then we submitted the URL.
Now we are experiencing the indexing of the URL, and when we submitted the website structure like /en, we have experienced that all the indexed URLs are going to the excluded list. And we don’t know what is the exact reason behind that? But what can be the reason? Is it because the language is in English, and we have created different URL structure? We don’t know exactly. Now the indexed URL is showing only 160,000.
John 18:11
I don’t know your website. So it’s really hard to say but I can say from what you’ve mentioned, it sounds like you have a lot of URLs and you have those 70 different country and language versions on top of that. So everything is multiplied by like 70. From my kind of offhand assumption is that this is just too much. So it’s not that from a technical point of view that we can’t handle that.
But essentially, you’re submitting so much content, and you have everything multiplied by 70, essentially. And that means for us that we like we start somewhere and we start indexing some things, but there’s almost no chance for us to get through to everything. So usually, what I recommend in these kinds of situations is to start off with a very small number of different country and language versions, and make sure that they’re working well and then expand incrementally from there.
And in particular, if you have a lot of English variations, then try to make sure that you just have maybe one English landing…or English site, rather than like all of the different English country versions, which are essentially the same thing. Because if you have fewer versions of the English content, it’s a lot easier for us to focus on that for indexing and to kind of treat that a little bit better when it comes to ranking overall.
So that’s kind of my general rule. So that’s kind of the thing. With regards to international versions, it’s very easy to take a website and say, I will just make all English Language Countries and create, like 100 different versions of a website. But it just causes so many problems and makes everything and the whole crawling and indexing and ranking cycle makes everything so much harder. So that’s, that would be kind of the direction I would head is just start off a lot smaller and then build out from there.
SEO Professional 5 20:36
So that means Google is considering us a duplicate?
John 20:40
Sure, I mean, if these English versions are all essentially the same thing, then for the most part, we will treat it as duplicates. And then it’s kind of hit and miss, which version actually ends up being indexed. So that’s, I think, a large part of the issue there is that we just see all of these different copies of the same thing, and we don’t know what you really want us to do, we have to focus on a small part of it, we can’t do everything at once. So it ends up causing almost more problems for your website, by having all of these different versions.
SEO Professional 5 21:15
That means we are going in the wrong direction.
John 21:19
Probably. Or moving too fast. I don’t know. It’s something where it’s very easy to take a really large website that’s doing well and say, Oh, we will just copy their system and take all of the different English versions and language versions and do the same thing. But if you’re not in the situation of that large, well known website already, then it’s very different for you. And then it’s a lot harder to actually get to, to kind of expanding. So I would try to pick one English version and maybe one other language version to start off with and then expand from there.
SEO Professional 5 21:56
Yeah, because since we want to target worldwide, because it’s an e-commerce website, and we want to target worldwide, and we want to improve our SEO, we just targeted in this way, creating different URLs and locating them in different countries like that. So I think we can focus only on the main domain instead of creating locally. Right?
John 22:18
Yeah, I mean, that’s, that’s what I would do. Yeah.
SEO Professional 5 22:20
And no need of going for the en–I mean, the URL structure should not–it should be like www.xyz.com instead of the locality, right?
John 22:30
You can have the locale in there, it doesn’t really matter. But again, I would make it so that you primarily just have one English version, rather than all of the other variations of English as well.
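As an illustration of that closing advice (one English version, plus perhaps one other language to start with), here is a minimal sketch that generates hreflang link tags for a single page. The domain is the placeholder used in the hangout. The `hreflang_links` helper, the `product-123` path, the choice of Bulgarian as the second version, and the `x-default` entry are all assumptions made for this example, not something John prescribed.

```python
BASE_URL = "https://www.xyz.com"  # placeholder domain from the hangout

# Hypothetical starting set: URL path segment -> hreflang value.
VERSIONS = {
    "en": "en",        # a single English version for every English-speaking country
    "bg_bg": "bg-BG",  # one other language version (Bulgarian for Bulgaria)
}


def hreflang_links(page_path: str,
                   versions: dict[str, str] = VERSIONS,
                   default_segment: str = "en") -> list[str]:
    """Build the <link rel="alternate"> tags for one page across all versions."""
    links = [
        f'<link rel="alternate" hreflang="{code}" '
        f'href="{BASE_URL}/{segment}/{page_path}" />'
        for segment, code in versions.items()
    ]
    # Assumption: fall back to the English version for all other visitors.
    links.append(
        f'<link rel="alternate" hreflang="x-default" '
        f'href="{BASE_URL}/{default_segment}/{page_path}" />'
    )
    return links


links = hreflang_links("product-123")
for link in links:
    print(link)
```

Expanding later means adding one entry to `VERSIONS` at a time, once the existing versions are confirmed to be crawled and indexed properly.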