One SEO professional asked John Mueller during a hangout about canonical URLs.
Their question was: they have set canonical URLs on five pages, but Google is also showing a third page.
Why is Google not showing only the URLs on which the SEO professional has set a canonical?
John explained that the rel=canonical is a way for you to specify which of the pages within a set of duplicate pages you want to have indexed – which address you want to have used.
In particular, if you have one page, perhaps with the file name in uppercase, and one page with the file name in lowercase, then in some situations, your server might show exactly the same content.
Technically speaking, they are different addresses – uppercase and lowercase are slightly different.
From a practical point of view, your server is showing exactly the same thing.
And Google, when it looks at that, says, “Well, it’s not worthwhile to index two addresses with the same content. Instead, I’ll pick one of these addresses and use it to index that piece of content.”
With the rel=canonical, you give Google a signal and tell it – “Hey, Google, I really want you to use the lowercase version of the address when you’re indexing this content. You may have seen the uppercase version, but I really want you to use the lowercase version.”
That’s essentially what a rel=canonical does. It’s not a guarantee that Google will use the version specified there, but it is a signal that helps Google figure out that, all things being equal, the SEO professional really prefers that address.
This happens at approximately the 17:12 mark in the video.
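To make the uppercase/lowercase scenario above more concrete, here is a minimal Python sketch of what the signal looks like in a page's markup and how a crawler-like script might read it. The example.com URLs, the page markup, and the names CanonicalFinder and find_canonical are all hypothetical and used only for illustration. This shows only how the hint is declared and read. It says nothing about how Google actually chooses a canonical, which, as John notes, may differ from what you specify.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # rel can hold several space-separated values; compare case-insensitively.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup that the server returns for both addresses below.
PAGE_HTML = """
<html>
  <head>
    <title>Shoes</title>
    <link rel="canonical" href="https://example.com/shoes.html">
  </head>
  <body>Same content at both addresses.</body>
</html>
"""

# Two technically different addresses that serve the same content.
duplicate_urls = [
    "https://example.com/Shoes.html",
    "https://example.com/shoes.html",
]

preferred = find_canonical(PAGE_HTML)
for url in duplicate_urls:
    print(url, "-> declared preference:", preferred)
```

Both addresses declare the lowercase URL as the preferred one. A page with no canonical link at all would return None here, which corresponds to the case where the search engine simply picks an address itself.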
John Mueller Hangout Transcript
John (Submitted Question) 17:12
Let’s see. I have a set of canonical–or I have set canonical URLs on five pages. But Google is showing it at third page as well. Why is it not only showing the URLs where I’ve set a canonical on it for?
John (Answer) 17:30
So I’m not 100% sure I understand this question correctly. But kind of paraphrasing, it sounds like on five pages of your website, you’ve set a rel=canonical. And there are other pages on your website where you haven’t set a rel=canonical, and Google is showing all of these pages kind of indexed in–essentially in various ways. And I think the thing to keep in mind is the rel=canonical is a way of you specifying which of the pages within a set of duplicate pages you want to have indexed like that.
Or essentially, which address you want to have used. So in particular, if you have one page, maybe with the file name in uppercase, and one page with the file name in lowercase, then in some situations, your server might show exactly the same content. Technically, they’re different addresses, uppercase and lowercase are slightly different. But from a practical point of view, your server is showing exactly the same thing. And Google, when it looks at that says, Well, it’s not worthwhile to index two addresses with the same content.
Instead, I will pick one of these addresses and use it kind of to index that piece of content. And with the rel=canonical, you give Google a signal and tell it, Hey Google, I really want you to use maybe the lowercase version of the address when you’re indexing this content. You might have seen the uppercase version, but I really want you to use the lowercase version. And that’s essentially what the rel=canonical does. It’s not a guarantee that we would use the version that you specify there, but it’s a signal for us; it helps us to figure out that, all things else being kind of equal, you really prefer this address. So we will try to use that.
And that’s kind of the preference part that comes into play here. And it comes into play when we’ve recognized there are multiple copies of the same piece of content on your website. And for everything else, we will just try to index it to the best of our abilities. And that also means that for the pages where you have a rel=canonical on it, sometimes we will follow that advice that you give us. Sometimes our systems might say well, actually, I think maybe you have it wrong, you should have used the other address as the canonical.
That can happen. It doesn’t mean it will rank differently or it will be worse off in search. It’s just, well, Google systems are choosing a different one. And for other pages on your website, you might not have a rel=canonical set at all. And for those, we will just try to pick one ourselves. And that’s also perfectly fine. And in all of these cases, the ranking will be fine, the kind of the indexing will be fine.
It’s really just the address that is shown in the search results that varies. So if you have the canonical set on some pages, but not on others, we will still try to index those pages and find the right address to use for those pages when we show them in search. So that’s kind of like it’s a good practice to have the rel=canonical on your pages, because you’re trying to take control over this vague possibility that maybe a different address will show. But it’s not an absolute necessity to have a rel=canonical.