Whether you’ve realized it or not, you’ve come across plenty of duplicate content just by browsing the web. Sometimes it’s obvious—ever seen the same news article published on two different sites? That’s duplicate content. Other times, it’s so subtle it’s barely noticeable. If you type a URL into your browser bar without the www prefix but are automatically sent to a URL that does have it, you’ve just been moved from one duplicate page to another.
Duplicate content like this can be managed with the canonical tag, a powerful tool that tells search engines which duplicate page is the original (canonical) version and which is the secondary (non-canonical) version. By using this tag effectively, you can avoid duplicate content issues, boost search result rankings and authority, and improve a site’s user experience.
How the Canonical Tag Helps You Control Duplicate Content
Duplicate content is to search engines what a wrench is to a spinning gear. When a site has two or more identical versions of the same content, Google doesn’t know which version to index, which should rank for search results or what to do with link metrics. Worse still, its bots may spend valuable time crawling multiple copies of the same page rather than crawling the site’s new or updated content.
However, it’s important to note that in general, duplicate content is not inherently deceptive. A site might have duplicate content for a variety of valid reasons, from inadvertent URL variations to printer-friendly page versions. Accordingly, Google doesn’t typically penalize any duplicate content it perceives as innocuous.
Even so, the less duplicate content a site has, the more likely it is to be efficiently crawled and achieve greater prominence in search results.
This raises the question: how can duplicate content problems be avoided in the first place? The answer lies in the canonical tag, an HTML element that tells search engines which of several duplicate pages to treat as the original. The address a canonical tag points to is known as the canonical URL.
By learning how to use this tag properly, you can boost a site’s visibility, performance and user experience in one fell swoop.
Choose a Canonical Page
First, specify which version of a page you want Google to view as canonical—i.e., choose which page version you want people to see in search results. This should be the version with the best performance. If all versions perform equally well, pick whichever you prefer.
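The simplest way to indicate which page is canonical is to use the canonical link tag. In the <head> (not the <body>) of a non-canonical site page, insert the tag to direct search engines to the canonical one. The tag itself is both straightforward and brief: <link rel="canonical" href="https://www.example.com/" />, where the href value is the canonical page’s URL.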
To be clear, the canonical tag is technically an HTML element, not a tag; that designation belongs to the <link> portion of the element. Nevertheless, it’s almost always colloquially referred to as such, so to avoid confusion we’ll call it the rel=canonical tag or canonical tag for short.
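As a minimal sketch of what that element looks like in practice, the Python helper below renders a canonical link element for a given URL. The function name and the example.com address are illustrative assumptions, not part of any particular CMS or library.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Render a rel=canonical link element for the <head> of a duplicate page."""
    # Escape the URL so characters such as & and " stay valid inside the attribute.
    href = escape(canonical_url, quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    # Hypothetical canonical URL used purely for illustration.
    print(canonical_link_tag("https://www.example.com/"))
```

A template or CMS hook could call a helper like this once per page, so every duplicate carries exactly one tag.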
You can also use the canonical HTTP header instead of the canonical link tag. Google added this option to give webmasters the ability to canonicalize non-HTML pages such as PDF files without increasing page size.
By adding the canonical HTTP header to a non-HTML page, you can choose the HTML page you want to direct search engine crawlers to.
Please note that using the canonical HTTP header is considered to be an advanced technique. Google has marked it as such because the headers can be difficult to keep up with on large sites or sites with frequently changing URLs.
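If you feel up to the challenge, you can add a canonical HTTP header to the server’s response for the non-HTML file. It takes the form Link: <https://www.example.com/page>; rel="canonical", with the canonical URL (a placeholder here) inside the angle brackets.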
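How the header gets attached depends on the web server or framework in use, so the following is only a sketch of building the header's name and value in Python. The function name and the URL are hypothetical.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Build the (name, value) pair for a rel=canonical HTTP response header."""
    # The target URL goes inside angle brackets, followed by the rel parameter.
    return ("Link", f'<{canonical_url}>; rel="canonical"')


if __name__ == "__main__":
    # Hypothetical HTML page that a PDF's response should point crawlers to.
    name, value = canonical_link_header("https://www.example.com/page")
    print(f"{name}: {value}")
```

The returned pair can be passed to whatever mechanism the server offers for setting response headers on the non-HTML file.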
Throughout the canonicalization process, remember self-referential canonical tags are OK. For example, the homepage www.example.com can point to the same URL, www.example.com. This may seem unnecessary, but it can help further clarify to search engines which page you want to be indexed.
Identify Canonicalization Issues
If the canonical tag is already being used on a site, it can be difficult to find pre-existing issues.
To avoid a tedious manual search and replace process, try using a free tool like the Screaming Frog SEO Spider.
Once you’ve downloaded the SEO Spider, enter the URL of the site you want to analyze.
Then, click on the “canonicals” tab. This will bring up a complete list of the site’s canonical URLs and show you which pages are indexable and which aren’t.
When reviewing this list, keep an eye out for canonical tags that:
- Point to the wrong page. For instance, one non-canonical page might point to another non-canonical page instead of the canonical one.
- Use the wrong URL. For example, a tag might point to a URL that doesn’t include a trailing slash (www.example.com/page) when it should be pointing to a URL that does (www.example.com/page/).
- Send mixed messages. This can occur when page X points to page Y, and page Y points to page X. In that scenario, search engines won’t know which page is canonical.
- Point to the first page of a paginated series. If the second page of a series points to the first page, then the second page will not be indexed.
- Contain relative rather than absolute URLs. Relative URLs leave out the protocol and domain (/page), while absolute URLs include them (https://example.com/page). If a tag contains a relative URL, search engines may interpret it incorrectly.
- Appear multiple times on the same page. If a single page has more than one canonical tag, search engines will ignore all of them.
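Several of the checks in the list above can be approximated with a short script. The sketch below uses only Python's standard library and assumes you already have each page's HTML in a dictionary keyed by URL. The class name, function name and issue labels are illustrative and do not reproduce what the SEO Spider does.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.hrefs.append(attributes.get("href") or "")


def audit_canonicals(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page URL to the canonical issues found in its HTML."""
    targets: dict[str, str] = {}
    issues: dict[str, list[str]] = {url: [] for url in pages}

    for url, html in pages.items():
        collector = CanonicalCollector()
        collector.feed(html)
        if len(collector.hrefs) > 1:
            issues[url].append("multiple canonical tags")
        for href in collector.hrefs:
            parsed = urlparse(href)
            # An absolute URL has both a protocol and a host.
            if not (parsed.scheme and parsed.netloc):
                issues[url].append("relative canonical URL")
        if len(collector.hrefs) == 1:
            targets[url] = collector.hrefs[0]

    for url, target in targets.items():
        if target == url:
            continue  # Self-referential tags are fine.
        next_target = targets.get(target)
        if next_target == url:
            issues[url].append("canonical loop")
        elif next_target is not None and next_target != target:
            issues[url].append("points to a non-canonical page")
    return issues
```

The function compares URLs as exact strings, so it will also surface inconsistencies such as a missing trailing slash as unresolved targets rather than silently matching them.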
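When fixing any canonicalization issues, always keep in mind that even slight canonical URL differences can matter in the eyes of search engines.
Site crawlers view URLs as distinct even when they differ only in protocol, www prefix or trailing slash. For example, each of the following is a separate URL:
- http://example.com/page
- https://example.com/page
- https://www.example.com/page
- https://www.example.com/page/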
So, be sure to keep things consistent while addressing existing canonicalization issues or adding new canonical tags.
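To spot this kind of inconsistency, it can help to compare two addresses component by component. The sketch below does that with Python's urllib.parse. The function name and the labels it returns are illustrative assumptions.

```python
from urllib.parse import urlsplit


def url_differences(first: str, second: str) -> list[str]:
    """Name the URL components in which two addresses differ."""
    a, b = urlsplit(first), urlsplit(second)
    differences = []
    if a.scheme != b.scheme:
        differences.append("protocol")
    # Host names are case-insensitive, so compare them in lowercase.
    if a.netloc.lower() != b.netloc.lower():
        differences.append("host")
    # Paths are compared exactly: /page and /page/ count as different.
    if a.path != b.path:
        differences.append("path")
    if a.query != b.query:
        differences.append("query")
    return differences


if __name__ == "__main__":
    print(url_differences("http://example.com/page", "https://www.example.com/page/"))
```

An empty list means the two addresses match on every component the function checks.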
Give Precedence to HTTPS Pages
Google specifies in its canonicalization guidelines that it prefers HTTPS pages over HTTP pages by default.
What’s the difference between the two? HTTP, or hypertext transfer protocol, is a communications protocol used to transfer information via the internet. HTTPS, or hypertext transfer protocol secure, is the same type of protocol, except it’s encrypted.
With HTTPS, data is transferred using the Transport Layer Security (TLS) protocol. TLS offers three key security benefits: encryption, data integrity and authentication.
In an effort to protect user data and promote widespread encryption, Google has expressly encouraged the adoption of HTTPS over HTTP. Given that 94 percent of Google traffic was encrypted as of March 14, 2020, that effort has proven to be largely successful.
With that in mind, it’s prudent to specify HTTPS pages as being canonical while specifying any duplicate HTTP pages as non-canonical.
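When generating canonical URLs programmatically, one way to apply this preference is to rewrite any HTTP address to its HTTPS counterpart before emitting the tag. The sketch below assumes the HTTPS version of the page exists and serves the same content. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def https_canonical(url: str) -> str:
    """Return the HTTPS form of a URL, leaving every other component unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Assumption: the same page is reachable over HTTPS.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(https_canonical("http://www.example.com/page"))
```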
Allow Indexing on Canonicalized Pages
You can use the noindex directive to stop Google from indexing pages you don’t want to be included in search results, such as login or thank you pages.
At first glance, it seems logical to include noindex on non-canonical pages, too. If you’re going to point search engines toward one main canonical page anyway, why not block indexing on the pages you don’t want to rank?
The answer has to do with link equity, the authority and ranking power that links pass from one page to another.
By adding the noindex directive to a non-canonical page, you’ll be losing any link equity that page may have, which can lower the ranking power of the canonical page.
To avoid losing valuable link equity, allow indexing on canonicalized pages and save the noindex directives for pages that truly shouldn’t be indexed.
It’s also worth noting Google no longer supports noindex in robots.txt, so be sure to use noindex in either the HTTP response headers or page HTML instead. To check if Google’s bots can access a page you don’t want to be indexed, try using the URL Inspection Tool.
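A quick way to audit this is to check whether a page carries noindex in either place. The following sketch assumes you already have a page's response headers and HTML in hand. It looks for the X-Robots-Tag response header and the robots meta tag. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page's robots meta tag contains noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if tag == "meta" and name == "robots" and "noindex" in content:
            self.noindex = True


def is_noindexed(headers: dict[str, str], html: str) -> bool:
    """Return True if the headers or the HTML ask search engines not to index."""
    header_value = ""
    for key, value in headers.items():
        # Header names are case-insensitive.
        if key.lower() == "x-robots-tag":
            header_value = value.lower()
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_value or parser.noindex
```

Running a check like this over non-canonical pages flags any that would forfeit link equity because of a stray noindex.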
Take Advantage of Cross-Domain Canonicalization
By canonicalizing pages across multiple sites, you can tell search engines you’d like them to index a page’s content on a single domain rather than each one individually.
This is referred to as cross-domain canonicalization, a strategy often used to generate traffic from syndicated content, or content that’s re-published on sites other than the original.
For instance, one news website (site A) may publish an original article that’s then re-published on another news site (site B). Site A gets exposure and increased organic traffic, and site B gets fresh and relevant content.
Site A can benefit from that scenario even further, though, by canonicalizing the article. Even though the article is re-published on site B, the canonical tag will tell search engines the canonical version of the article is on site A.
As a result, users are more likely to see the original article as it appears on site A in search results.
Cross-domain canonicalization isn’t just useful for canonicalizing content on third-party sites, either. If multiple domains belong to the same owner and the same article is published on several of them, the site owner can use the canonical tag to specify which domain they want to show up in search results.
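In markup terms, cross-domain canonicalization is the same link element, except the href points to a different domain. The sketch below renders the tag a syndicated copy would carry and refuses URLs that share a host. The function name and the site-a.example and site-b.example hosts are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit


def syndicated_canonical_tag(copy_url: str, original_url: str) -> str:
    """Render the canonical element a syndicated copy would carry."""
    copy_host = urlsplit(copy_url).hostname
    original_host = urlsplit(original_url).hostname
    if copy_host == original_host:
        raise ValueError("both URLs are on the same domain; this is not cross-domain")
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}" />'


if __name__ == "__main__":
    # Site B republishes an article that originally appeared on site A.
    print(syndicated_canonical_tag(
        "https://site-b.example/article",
        "https://site-a.example/article",
    ))
```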
Prioritize Responsive Design
For sites with separate mobile URLs, you can set the desktop version as the canonical URL to tell search engines to index it instead of the mobile version.
However, Google explicitly recommends eschewing separate URLs in favor of responsive web design, which automatically adjusts page layout to suit the device being used to view it.
With responsive design, if a user visits a page on a mobile device, they won’t be redirected to a separate mobile-specific URL (for instance, www.m.example.com).
Instead, the website will identify the type of device they’re using and alter its layout accordingly while using the same URL (www.example.com). This provides a better user experience, eliminates the need to manually create multiple layouts of the same site, and streamlines analytics and performance tracking.
As such, when you want to ensure a positive user experience across platforms and devices, it’s best to prioritize responsive web design over a separate canonical URL when possible.
Know When to Use 301 Redirects
There are times when the canonical tag isn’t the best way to specify which page is canonical. For example, when deprecating a duplicate page, Google recommends using a 301 redirect instead.
In users’ eyes, the difference between the two options is that a 301 redirect means they never see the page they were trying to reach in the first place. With the canonical tag, they’re still taken to the URL they entered or clicked on, such as a page for a product in a specific color, even if it’s non-canonical.
What makes 301 redirects different from other types of redirects? Unlike 302 and 307 redirects, 301 redirects tell search engines the page in question has been permanently moved to a new location. By comparison, 302 and 307 redirects indicate a page has been temporarily moved to another location.
The result is that pages with 301 redirects immediately transfer about 95 percent of their link equity to the new destination page.
Preserving link equity this way can make a significant difference to a site’s authority and ranking performance. Although 302, 307 and other types of redirects are no longer directly penalized, it takes some time for Google to realize the redirect is no longer temporary and start passing on link equity accordingly.
You can also replace a site’s 404 pages with 301 redirects when appropriate. For instance, you could use a 301 for a URL leading to a non-existent (but previously well-trafficked) page about custom shoes.
The redirect sends visitors to a current page about custom clothing and ensures Google indexes the correct page. Remember, though, to always guide visitors to the most relevant alternative page possible.
If you do decide a 404 page would be more appropriate (for instance, for a URL that received minimal traffic or was never functional to begin with), consider using custom 404 pages for a better user experience.
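The decision between a 301, a normal response and a 404 can be sketched as a simple lookup. The example below is framework-agnostic and only returns a status code and headers. The redirect mapping, the paths and the function name are hypothetical, loosely following the custom shoes example above.

```python
from http import HTTPStatus

# Hypothetical mapping of retired paths to their most relevant replacements.
PERMANENT_REDIRECTS = {"/custom-shoes": "/custom-clothing"}


def respond(path: str, known_paths: set[str]) -> tuple[int, dict[str, str]]:
    """Pick a status code and headers for a requested path."""
    if path in PERMANENT_REDIRECTS:
        # 301 signals that the page has moved permanently.
        return HTTPStatus.MOVED_PERMANENTLY.value, {"Location": PERMANENT_REDIRECTS[path]}
    if path in known_paths:
        return HTTPStatus.OK.value, {}
    # Anything else falls through to a 404, ideally served with a custom page.
    return HTTPStatus.NOT_FOUND.value, {}


if __name__ == "__main__":
    print(respond("/custom-shoes", {"/custom-clothing"}))
```

Keeping the mapping explicit makes it easier to review that each retired URL points to the most relevant alternative page.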
The Rel=Canonical Tag? You Can Rel=Conquer It
Proper use of canonical URLs isn’t as intimidating as it may initially seem. From implementation to troubleshooting to fine-tuning, anyone can master the rel=canonical tag with the right strategies.
Once you’ve canonicalized a site’s pages with the canonical tag, you’ll be able to unify pesky duplicate content, direct search engine traffic where you want it to go while boosting your authority, and create a streamlined and intuitive user experience.
Screenshot by author / March 2020