Have you ever wondered what happens when search engines see a piece of content that’s similar or duplicate? Have you ever thought that it may present search engines with some confusion when indexing and ranking your website? If so, you are correct.
Canonical tags are a method of solving this particular issue.
You might already know that canonical tags are important for SEO. But did you know they also play a role in user experience? Let’s take a look at why these tags are useful and how to implement them correctly.
Canonical tags help you control how your content appears to search engines, and whether or not this content creates confusion, in terms of both indexing and crawling.
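Search engines use canonical tags to identify which URL is the “master copy” among a set of duplicate or near-duplicate pages. They then treat the other versions as duplicates of that page and give them lower priority in indexing and ranking. Keep in mind that Google treats the tag as a strong hint, not a binding directive.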
Canonical Tags vs. Canonical URLs: What’s The Difference?
Beginners often confuse the two, but canonical tags and canonical URLs are not the same thing.
Canonical Tags
The canonical tag is a way to mark the original, preferred version of a page, so that search engines know which version to show in search results. It isn’t used for anything else.
It doesn’t mean “this is the best” or “this is the most correct,” but more so that “this is where search engines should look if they want to find the preferred version of that page.”
The term also refers to the actual code you add to your page’s source in order to declare the canonical URL.
So what are canonical tags, more specifically, and why do they matter?
The term “canonical tag” describes the HTML markup that tells search engines which version of a page should be indexed and shown in their results. Its most important job is to tell Google which URL, out of a group of duplicate or near-duplicate pages, you consider the primary one. This matters because if you don’t have a canonical tag, Google will choose the version it considers best, without your input. This tag is your opportunity to provide that input.
Canonical URLs
While the canonical tag is the markup itself (rel=canonical), the canonical URL is just that: the URL you choose as the official version of that page.
Here is an example of the differences between the two:
The Canonical URL:
- https://www.thisisatesturl.com/something-or-other
- https://www.thisisatesturl.com/SoMething-Or-Other
- https://www.thisisatesturl.com/SOMETHING-OR-OTHER
- https://www.thisisatesturl.com/Something-or-otheR
- https://www.thisisatesturl.com/something-or-other/
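Because URL paths are case-sensitive, and a trailing slash also makes a URL different, search engines can treat each of these as a separate URL even though they all show the same page. In this scenario, you might choose https://www.thisisatesturl.com/something-or-other/ as the canonical URL, and search engines would then treat all of the other URLs as duplicates of it.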
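To make the idea concrete, here is a minimal Python sketch that maps each URL variant to the single version chosen above. The normalization rules (force https://, the www host, a lowercase path, a trailing slash, and dropping any query string) are assumptions for this example site, not something search engines apply on your behalf.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed house rules for this example site (hypothetical, not a search engine rule).
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.thisisatesturl.com"


def preferred_url(url: str) -> str:
    """Map a URL variant to the one version we have chosen as canonical."""
    parts = urlsplit(url)
    path = parts.path.lower()  # treat case variants as the same page
    if not path.endswith("/"):
        path += "/"  # always use the trailing-slash form
    # Query string and fragment are dropped to keep the sketch simple.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, "", ""))


variants = [
    "https://www.thisisatesturl.com/something-or-other",
    "https://www.thisisatesturl.com/SoMething-Or-Other",
    "https://www.thisisatesturl.com/SOMETHING-OR-OTHER",
    "https://www.thisisatesturl.com/Something-or-otheR",
    "https://www.thisisatesturl.com/something-or-other/",
]

# All five variants collapse to a single canonical URL.
print({preferred_url(u) for u in variants})
# {'https://www.thisisatesturl.com/something-or-other/'}
```

In practice you would apply rules like these consistently when generating internal links and canonical tags, so that every version of the page points at the same URL.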
The Canonical Tag
The canonical tag would look like this:
<link rel="canonical" href="https://www.thisisatesturl.com/something-or-other/">
Placing this tag on every version of the page tells Google which URL is the priority in that group of URLs.
This helps you avoid duplicate content issues, because you are explicitly marking this URL as taking priority over all the other versions.
If you have relatively careless internal linking, resulting in all of those versions of URLs that we mentioned, you could cause significant confusion for Google because of this canonicalization issue.
How Else Can Canonical URLs Be Specified?
You can specify canonical URLs in several ways, and you don’t always have to use a rel=canonical tag in your HTML. Other methods include:
- Sending rel=canonical in an HTTP Link response header
- Using 301 redirects
Google supports these methods for setting canonicals, but they don’t fit every situation. There are specific use cases where you would choose them over an HTML rel=canonical tag. For example, the HTTP header works for non-HTML files such as PDFs, and a 301 redirect is the better choice when visitors no longer need to reach the duplicate URL at all.
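As a rough illustration, the Python sketch below builds both alternatives. The header follows the `Link: <URL>; rel="canonical"` syntax that Google documents for canonical headers; the PDF URL, the placeholder file bytes, and the helper names are hypothetical.

```python
# Hypothetical canonical location for a downloadable PDF.
PDF_CANONICAL = "https://www.thisisatesturl.com/downloads/guide.pdf"


def canonical_link_header(canonical_url: str) -> tuple:
    """Return an HTTP (name, value) header pair declaring the canonical URL."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def permanent_redirect(target_url: str) -> tuple:
    """Return a WSGI-style status line and headers for a 301 redirect."""
    return ("301 Moved Permanently", [("Location", target_url)])


def pdf_app(environ, start_response):
    """Minimal WSGI app: serve placeholder bytes with a canonical Link header."""
    body = b"%PDF-1.4 placeholder"  # stand-in content, not a real PDF
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header(PDF_CANONICAL),
    ])
    return [body]


print(canonical_link_header(PDF_CANONICAL)[1])
# <https://www.thisisatesturl.com/downloads/guide.pdf>; rel="canonical"
```

You could try `pdf_app` locally with `wsgiref.simple_server.make_server`. In production, this header is more often set in the web server or CDN configuration than in application code.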
Why Does Duplicate Content Happen?
Before going into everything else, it’s useful to examine why duplicate content occurs, and how you should utilize canonical URLs in order to mitigate these types of issues on your site.
It’s worth noting at this point that, contrary to popular opinion, duplicate content is rarely created on purpose.
Instead, duplicate content is an issue that tends to build until it impacts your site negatively in the SERPs (search engine results pages).
One of the most common causes of these issues is a CMS (content management system) that creates more than one URL when you publish a page. Another is carelessly leaving several indexable versions of your site live. Maybe you set up alternate versions for different types of device, or maybe you use dynamic URLs. Perhaps you created a separate desktop version or mobile version without following the best practices for doing so.
Whatever the cause, duplicate content is rarely created intentionally. Most often, it’s generated automatically by your CMS or by some other configuration error.
Building on our prior example, let’s examine the following URLs. Assume that they are all visible and indexable to Google, and that they all display the same content.
- https://www.thisisatesturl.com/something-or-other
- https://www.thisisatesturl.com/SoMething-Or-Other
- https://www.thisisatesturl.com/SOMETHING-OR-OTHER
- https://www.thisisatesturl.com/Something-or-otheR
- https://www.thisisatesturl.com/something-or-other/
- https://thisisatesturl.com/something-or-other
- https://thisisatesturl.com/SoMething-Or-Other
- http://thisisatesturl.com/SOMETHING-OR-OTHER
- http://thisisatesturl.com/Something-or-otheR
- http://thisisatesturl.com/something-or-other/
- https://www.thisisatesturl.com/something-or-other/category/something/
- https://thisisatesturl.com/something-or-other/services/name-of-service/
- https://thisisatesturl.com/SoMething-Or-Other/service-2/service.html
When a search engine crawls these pages, it doesn’t see one piece of content. It sees each of these URLs as a separate page, each carrying its own copy of the same content.
The last three of these URLs can appear when a CMS saves the same page under more than one path, for example once with a category or service name in the URL and once without.
The eighth through tenth URLs (the http:// versions) could exist because both the http:// and https:// versions of the site are live and have not been properly redirected.
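Fixing this usually means sending a 301 redirect from every http:// and non-www address to one preferred version. The Python sketch below shows that decision logic, assuming https://www. is the preferred form; it only normalizes the protocol and host, and in practice this rule usually lives in your web server or CDN configuration rather than in application code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred host for the example site.
PREFERRED_HOST = "www.thisisatesturl.com"


def redirect_target(url: str) -> Optional[str]:
    """Return where a request should be 301-redirected, or None if already preferred.

    Only the scheme and host are normalized; the path and query are kept as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.netloc == PREFERRED_HOST:
        return None  # already the preferred protocol and host
    # Fragments are never sent to the server, so none is carried over.
    return urlunsplit(("https", PREFERRED_HOST, parts.path, parts.query, ""))


for url in [
    "http://thisisatesturl.com/something-or-other/",
    "https://thisisatesturl.com/something-or-other",
    "https://www.thisisatesturl.com/something-or-other/",
]:
    print(url, "->", redirect_target(url))
```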
It’s easy to create these kinds of issues by accident. What’s hard is finding them all and fixing them properly.
Google’s Webmaster Guidelines also specify that sections of a site that are appreciably similar to other sections can count as duplicate content. This is why an SEO strategy that focuses on original content will beat one that doesn’t, especially given the SEO issues the latter strategy can cause.
How Internal Links Can Cause Duplicate Content Problems
Creating internal links carelessly can cause duplicate content problems, because you end up referencing more than one version of the same URL.
You don’t even have to create a second page for this to happen.
All it takes is carelessness when you type out the URL. Maybe you use the https:// non-www version of the URL because it means less typing.
Or perhaps you use the longer www version because it’s easier to read.
Either way, inconsistent internal links can lead to these types of issues unless you redirect the variants to your preferred version.
And even if you do redirect them, the redirects introduce issues of their own.
For example, you never want URLs in your sitemap to point to 301 redirects. Likewise, if your canonical tag points to a URL that 301 redirects somewhere else, you create another problem.
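One way to catch both problems is to check your sitemap entries and canonical targets against a list of redirects, such as one exported from a site crawl. The Python sketch below assumes that mapping already exists (the `REDIRECTS` data is hypothetical) and reports each listed URL that redirects, along with the final URL you should list instead. The same check works for the URLs in your canonical tags.

```python
# Hypothetical crawl export: each URL that answers with a 301, mapped to its Location.
REDIRECTS = {
    "http://thisisatesturl.com/something-or-other/": "https://thisisatesturl.com/something-or-other/",
    "https://thisisatesturl.com/something-or-other/": "https://www.thisisatesturl.com/something-or-other/",
}


def final_destination(url: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow the redirect map until reaching a URL that does not redirect."""
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise ValueError(f"more than {max_hops} redirects (possible loop)")
        url = redirects[url]
        hops += 1
    return url


def redirecting_urls(urls: list, redirects: dict) -> list:
    """Return (listed URL, final destination) for every listed URL that redirects."""
    return [(u, final_destination(u, redirects)) for u in urls if u in redirects]


sitemap_urls = [
    "https://www.thisisatesturl.com/something-or-other/",
    "http://thisisatesturl.com/something-or-other/",
]
for listed, final in redirecting_urls(sitemap_urls, REDIRECTS):
    print(f"Replace {listed} with {final}")
```

Following the whole chain matters here: listing an intermediate URL that itself redirects would just move the problem one hop along.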
The SEO Benefit of Rel=canonical
The biggest benefit of rel=canonical is that it helps you resolve duplicate content problems. Duplicate versions of content on your site are not a good thing, and they can land you in trouble with Google’s duplicate content filter.
But it’s a mistake to think of this as a penalty, because it isn’t. It’s a filter that shows only one page from a group of duplicate or very similar pages.
The more duplicate pages you have, the more you confuse this filter, making it harder for Google to understand exactly which page you want crawled and indexed.
The other major issue with many duplicate pages is that they split ranking signals such as PageRank. Links that point to different versions of the same page are divided among those versions instead of being consolidated on one URL.
This can cause significant ranking issues, because no single version receives the full value of the links pointing to that content, which can lower your rankings considerably.
A site that otherwise is stronger in unique content will not have these issues, and should ultimately have no major problems with ranking, once it builds up enough authority.
This is why the canonical tag exists, to help you alleviate duplicate content issues as a result of careless content management.
How to Correctly Implement the Rel=canonical Tag
The most common way to implement the rel=canonical tag is to add a rel=canonical link element to the <head> section of your page’s HTML.
It looks like this:
<head>
<link rel="canonical" href="https://www.thisisatesturl.com/something-or-other/">
</head>
Implementing it is not that hard.
But, what can be downright challenging is keeping up with all the changes on your site in such a way that you know which pages have been marked up, which pages have not, and which pages need to ignore the canonical entirely.
For example, you should not combine noindex with a canonical tag, because they are contradictory directives. Use one or the other. If you use both, Google will generally favor the canonical over the noindex directive, which may not be what you intended.
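If you want to audit pages for this conflict, a small check with Python’s built-in `html.parser` module covers simple cases. This sketch only looks at the robots meta tag (not an `X-Robots-Tag` HTTP header), and the class and function names are our own.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect canonical link targets and check for a robots noindex directive."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; valueless attributes are None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonicals.append(attrs.get("href", ""))
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            directives = [d.strip() for d in attrs.get("content", "").lower().split(",")]
            if "noindex" in directives:
                self.noindex = True


def noindex_canonical_conflict(html: str) -> bool:
    """True if the page declares both noindex and a canonical URL."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return parser.noindex and bool(parser.canonicals)


page = """<head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://www.thisisatesturl.com/something-or-other/">
</head>"""
print(noindex_canonical_conflict(page))  # True
```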
Common Mistakes with Rel=Canonical
There are common mistakes that one can make with rel=canonical. It is not an easy SEO directive, and things can get super confusing real fast.
And there are many moving parts that need to coalesce in order to create a proper rel=canonical implementation.
We wish it were as simple as implementing one tag on your page and calling it a day, but it’s really not. You really do have to think through your strategy, figure out exactly which pages need that canonical and which don’t, and make sure that you do not accidentally introduce confusion to Google as a result.
This is why it’s not a one and done deal, and why you need to pay attention to all these little details.
Mistake 1: Setting a Canonical That Points to a 4xx Page
This is a pretty big one. If your canonical tag points to a URL that returns a 404 (or any other 4xx error), you’re telling Google that the preferred version of the page is one that doesn’t exist. At best, Google ignores the tag. At worst, you’re effectively telling Google not to index the page that carries it.
Because the target is broken, the canonical can’t do its job, and any ranking value you meant to consolidate on the canonical version ends up pointing at a dead URL.
If you have a lot of pages like this, and only a few indexable pages, you could inadvertently tell Google to deindex most of your site.
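A crawl tool will usually flag this for you, but the logic is simple. The Python sketch below assumes you already have two mappings from a crawl (page to canonical URL, and URL to HTTP status code); both the data and the function name are hypothetical.

```python
def canonicals_to_4xx(canonical_of: dict, status_of: dict) -> list:
    """Return pages whose canonical target responds with a 4xx status.

    canonical_of maps page URL -> canonical URL (from your crawl);
    status_of maps URL -> HTTP status code. Targets with no known status
    are not flagged here, so check those separately.
    """
    return sorted(
        page
        for page, target in canonical_of.items()
        if 400 <= status_of.get(target, 0) < 500
    )


# Hypothetical crawl results.
canonical_of = {
    "https://www.thisisatesturl.com/a/": "https://www.thisisatesturl.com/a/",
    "https://www.thisisatesturl.com/b/": "https://www.thisisatesturl.com/old-b/",
}
status_of = {
    "https://www.thisisatesturl.com/a/": 200,
    "https://www.thisisatesturl.com/old-b/": 404,
}
print(canonicals_to_4xx(canonical_of, status_of))
# ['https://www.thisisatesturl.com/b/']
```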
Mistake 2: Having More Than One Canonical Tag On the Page
If a page contains more than one canonical tag pointing to different URLs, Google is likely to ignore all of them. The effort you spent creating those tags simply cancels itself out.
You might as well have done nothing.
This is why it’s important to make sure each page has only one canonical tag. Otherwise, you may not get any benefit from that SEO work.
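Duplicate canonical tags can come from, for example, a theme and an SEO plugin both adding one. A quick way to spot them is to collect the rel="canonical" links in each page’s HTML; this Python sketch uses the standard `html.parser` module, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Record the href of every <link rel="canonical"> element on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.hrefs.append(attrs.get("href") or "")


def canonical_hrefs(html: str) -> list:
    """Return the href of every canonical link, in document order."""
    collector = CanonicalCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


page = """<head>
<link rel="canonical" href="https://www.thisisatesturl.com/something-or-other/">
<link rel="canonical" href="https://www.thisisatesturl.com/something-else/">
</head>"""
hrefs = canonical_hrefs(page)
if len(hrefs) > 1:
    print(f"{len(hrefs)} canonical tags found: {hrefs}")
```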
Mistake 3: Choosing Only One Page As the Canonicalized Destination for Hundreds of Pages
This is a very bad move, mostly because it creates major confusion for Google. If you choose one page as the canonical destination for hundreds of pages, you’re telling the search engine that this one page should take priority over all of the hundreds of pages pointing to it, and that those pages are merely duplicates of it.
For example, the pages in a paginated series should not all be canonicalized to the first page of that series. Instead, canonicalize each page to itself, or to the nearest page that genuinely duplicates its content.
Canonicalizing a true duplicate, such as a copy of a page that lives in a subdirectory, back to the original is fine. Just don’t do the same thing with hundreds of distinct pages.
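To find pages where this has happened, you can count how many pages point their canonical at each target. The Python sketch below does that from a page-to-canonical mapping (assumed to come from a crawl); the threshold of 100 is an arbitrary example value, not a Google limit.

```python
from collections import Counter


def overloaded_targets(canonical_of: dict, threshold: int = 100) -> dict:
    """Return canonical targets that at least `threshold` other pages point to.

    Self-referencing canonicals (page == target) are not counted.
    """
    counts = Counter(
        target for page, target in canonical_of.items() if target != page
    )
    return {target: n for target, n in counts.items() if n >= threshold}


# Hypothetical paginated archive where every page canonicalizes to page 1.
blog = "https://www.thisisatesturl.com/blog/"
bad = {f"{blog}page/{i}/": blog for i in range(2, 152)}
good = {f"{blog}page/{i}/": f"{blog}page/{i}/" for i in range(2, 152)}

print(overloaded_targets(bad))   # {'https://www.thisisatesturl.com/blog/': 150}
print(overloaded_targets(good))  # {}
```

A high count is not automatically wrong, since genuine duplicates should consolidate, but it is a good list of pages to review by hand.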
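Mistake 4: Using a Relative URL in the Canonical Tag When You Mean the Absolute URL
Say that you have a page named delorean.html, and its full URL is https://welovedeloreancarsandtheyarethemostawesomethingintheworld.com/delorean.html.
If you use a relative URL in the canonical tag, it is resolved against the address the page was fetched from. If Google crawls the http:// version of the page, for example, the canonical resolves to the http:// URL too, not the https:// one you intended. Writing the domain without the protocol is even worse: it is treated as a relative path and points to a URL that doesn’t exist.
In these situations, the algorithm will likely ignore the specified rel=canonical or apply it to the wrong URL.
Either way, your canonical tag won’t do what you intended. Use the full absolute URL instead, as Google recommends.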
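You can see how relative values resolve using Python’s `urllib.parse.urljoin`, which applies the standard URL resolution rules. In this sketch, the page carrying the tag is assumed to have been fetched from a hypothetical http:// address under /cars/; only the absolute URL comes out as intended.

```python
from urllib.parse import urljoin

DOMAIN = "welovedeloreancarsandtheyarethemostawesomethingintheworld.com"
# Hypothetical address the crawler used to fetch the page carrying the tag.
fetched_from = f"http://{DOMAIN}/cars/review.html"

candidates = [
    "delorean.html",                    # relative path: resolves under /cars/
    "/delorean.html",                   # root-relative: keeps http://
    f"{DOMAIN}/delorean.html",          # no protocol: treated as a relative path
    f"https://{DOMAIN}/delorean.html",  # absolute URL: used exactly as written
]
for href in candidates:
    print(href, "->", urljoin(fetched_from, href))
```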
Mistake 5: Adding Canonical Tags Within the Body Text Instead of the Head Tag
As a standard practice, always add your canonical tag within the page’s <head> section rather than within the body. A canonical tag in the body can cause problems when Google parses the page, and Google may not honor it at all.
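If you want to check this across many pages, a small parser can report where each canonical tag sits. The Python sketch below uses the standard `html.parser` module; the class and function names are hypothetical, and it is a simplification of how browsers and Google actually build the page structure.

```python
from html.parser import HTMLParser


class CanonicalPlacement(HTMLParser):
    """Note whether each canonical link appears inside <head> or outside it.

    Simplified: it tracks explicit <head>/<body> tags only and does not
    reproduce the full HTML5 rules that can close <head> implicitly.
    """

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.locations = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False
        elif tag == "link":
            rel = (dict(attrs).get("rel") or "").lower().split()
            if "canonical" in rel:
                self.locations.append("head" if self.in_head else "outside head")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def canonical_locations(html: str) -> list:
    """Return 'head' or 'outside head' for each canonical link, in order."""
    parser = CanonicalPlacement()
    parser.feed(html)
    parser.close()
    return parser.locations


good = '<html><head><link rel="canonical" href="https://www.thisisatesturl.com/a/"></head><body></body></html>'
bad = '<html><head></head><body><link rel="canonical" href="https://www.thisisatesturl.com/a/"></body></html>'
print(canonical_locations(good))  # ['head']
print(canonical_locations(bad))   # ['outside head']
```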
Always make sure that you avoid including canonical tags within the body text, and always include them within the <head> section of the page.
Why Using Rel=Canonical Is an SEO Professional’s Best Friend
Using rel=canonical effectively can really boost your rankings and help you achieve your Google SEO dreams. Using them ineffectively can lead to your worst SEO nightmares.
But make sure that you use canonical tags wisely.
If you don’t, then you might cause serious issues when it comes to Google crawling, indexing, and understanding where to rank your website.
Always be sure to dot the i’s and cross those t’s when you work with canonical tags.
How do you plan on using canonical tags next?