Whether you’ve realized it or not, you’ve come across plenty of duplicate content just by browsing the web. Sometimes it’s obvious—ever seen the same news article published on two different sites? That’s duplicate content. Other times, it’s so subtle it’s barely noticeable. If you type a URL into your browser bar without the www prefix but are automatically sent to a URL that does have it, you’ve just been moved from one duplicate page to another.
This can be achieved with the canonical tag, a powerful tool that tells search engines which duplicate page is the original version (i.e. the canonical version) and which is the secondary version (i.e. the non-canonical version). By using this tag effectively and sidestepping common mistakes, you can avoid duplicate content, boost search result rankings and authority, and improve a site’s user experience.
How the Canonical Tag Helps You Control Duplicate Content
Duplicate content is to search engines what a wrench is to a spinning gear. When a site has two or more identical versions of the same piece of content, Google doesn’t know which version to index, which should rank for search results or what to do with link metrics. Worse still, its bots may spend valuable time crawling multiple copies of the same page rather than crawling the site’s new or updated content.
However, it’s important to note that in general, duplicate content is not inherently deceptive. A site might have duplicate content for a variety of valid technical reasons, including inadvertent URL variations, improperly configured content management systems (CMS), different language variants and printer-friendly page versions. Accordingly, Google doesn’t typically penalize any duplicate content it perceives as innocuous.
Even so, the less duplicate content a site has, the more likely it is to be efficiently crawled and achieve greater prominence in search results.
This begs the question, how can duplicate content be avoided in the first place? The answer lies in the canonical tag, an HTML element that prevents and eliminates duplicate content issues. When you tag a page in this way, its address becomes a canonical URL.
By learning how to use this tag properly, you can boost a site’s visibility, performance and user experience in one fell swoop.
Choose a Canonical Page
First, specify which version of a page you want Google to view as canonical—i.e., choose which page version you want people to see in search results. Your preferred version should be the one with the best performance. If all versions perform equally well, pick your favorite one.
The simplest way to indicate which page is canonical is to use the canonical link element. In the(not) of a non-canonical site page, insert the tag to direct search engines to the canonical one. The tag itself is both straightforward and brief:
To be clear, the canonical tag is technically an HTML element, not a tag—that designation belongs to the portion of the element. Nevertheless, it’s almost always colloquially referred to as such, so to avoid confusion we’ll call it the rel=canonical tag or canonical tag for short.
You can also use the canonical HTTP header instead of the canonical link element. Google added this option to give webmasters the ability to canonicalize non-HTML documents such as PDF versions without increasing page size.
By adding the canonical HTTP header to a non-HTML page, you can choose the HTML page you want to direct search engine crawlers to.
Please note that using the canonical HTTP header is considered to be an advanced technique. Google has marked it as such because the headers can be difficult to keep up with on large sites or sites with frequently changing URLs and URL parameters .
If you feel up to the challenge, you can insert a canonical HTTP header using the following snippet of code in your page’s source code:
Link: <http://www.example.com/resources/ebook.pdf>; rel=”canonical”
Throughout the canonicalization process, remember self-referencing canonical tags are OK. For example, the homepage www.example.com can point to the same URL, www.example.com. This may seem unnecessary, but it can help further clarify to search engines which page you want to be indexed.
Identify Canonicalization Issues
If the canonical tag is already being used on a site, it can be difficult to find pre-existing issues.
To avoid tediously combing through your XML sitemap, try using a free tool like the Screaming Frog SEO Spider (the Yoast SEO WordPress plugin is a stellar option too).
Once you’ve downloaded the SEO Spider, enter the URL of the site you want to analyze.
Then, click on the “canonicals” tab. This will bring up a complete list of the site’s canonical URLs and show you which pages are indexable and which aren’t.
When reviewing this list, keep an eye out for canonical tags that:
- Point to the wrong page. For instance, one non-canonical page might point to another non-canonical page instead of the canonical one.
- Use the wrong URL. For example, a tag might point to a URL that doesn’t include a trailing slash (www.example.com) when it should be pointing to a URL that does (www.example.com/).
- Send mixed signals to search engines. This can occur when page X points to page Y, and page Y points to page X. In that scenario, search engines won’t know which page is canonical.
- Point to the first page of a paginated series. If the second page of a series points to the first page, then the second page will not be indexed.
- Contain relative rather than absolute URLs. Relative URLs don’t specify the protocol (example.com), while absolute URLs do (https://example.com). If a tag contains a relative URL, search engines will likely interpret it incorrectly.
- Appear multiple times on the same page. If a single page has more than one canonical tag, search engines will ignore all of them.
When fixing any canonicalization issues, always keep in mind even slight canonical URL differences can matter in the eyes of search engines.
The following URLs are all viewed as distinct by search crawlers:
- http://www.example.com
- https://www.example.com
- http://www.example.com/
- https://www.example.com/
- http://example.com
- https://example.com
- http://example.com/
- https://example.com/
- www.example.com
- example.com
So, be sure to keep things consistent while addressing existing canonicalization issues or adding new canonical tags, whether you’re using Yoast SEO, the Screaming Frog SEO Spider or another tool altogether.
Give Precedence to HTTPS Pages
Google specifies in its canonicalization guidelines that it prefers HTTPS pages over HTTP pages by default.
What’s the difference between the two? HTTP, or hypertext transfer protocol, is a communications protocol used to transfer information via the internet. HTTPS, or hypertext transfer protocol secure, is the same type of protocol, except it’s encrypted.
With HTTPS, data is transferred using transport layer security (TSL) protocol. TLS offers three key security benefits: encryption, data integrity and authentication.
In an effort to protect user data and promote widespread encryption, Google has expressly encouraged the adoption of HTTPS over HTTP. Given that 95 percent of Google traffic was encrypted as of September 17, 2022, that effort has proven to be largely successful.
With that in mind, it’s prudent to specify HTTPS pages as being canonical while specifying any duplicate HTTP pages as non-canonical.
Allow Indexing on Canonicalized Pages
You can use the noindex directive to stop Google from indexing pages you don’t want to be included in search results, such as login or thank you pages.
At first glance, it seems logical to include noindex on non-canonical pages, too. If you’re going to point search engines toward one main canonical page anyway, why not block indexing on the pages you don’t want to rank?
The answer has to do with link equity, once known as link juice, a process in which external or internal linkspass authority and ranking power to other links.
By adding the noindex directive to a non-canonical page, you’ll be losing any link equity that page may have, which can lower the ranking power of the canonical page. But since canonical tags pass link equity, they don’t create the same problem.
To avoid any negative impact on valuable internal link equity, allow indexing on canonicalized pages and save the noindex directives for pages that truly shouldn’t be indexed.
It’s also worth noting Google no longer supports noindex in robots.txt, so be sure to use noindex in either the HTTP response headers or page HTML instead. To check if search crawlers can access a page you don’t want to be indexed, try using the Google Search Console URL Inspection Tool.
Take Advantage of Cross-Domain Canonicalization
By canonicalizing pages across multiple sites, you can tell search engines you’d like them to index a page’s content on a single domain rather than each one individually.
This is referred to as cross-domain canonicalization, a strategy often used to generate traffic from content syndication, i.e. content that’s re-published on sites other than the original.
For instance, one news website (site A) may publish original content that’s then re-published on another news site (site B). Site A gets exposure and increased organic traffic, and site B gets fresh and relevant content.
Site A can benefit from that scenario even further, though, by canonicalizing the article. Even though the article is re-published on site B, the canonical tag will tell search engines the definitive version of the article is on site A.
As a result, users making relevant search queries are more likely to see the original article as it appears on site A in search results.
Cross-domain canonicalization isn’t just useful for canonicalizing content on third-party sites, either. If multiple domains belong to the same owner and the same article is published on several of them, the site owner can use the canonical tag to specify which domain they want to show up in search results.
Prioritize Responsive Design
For sites with separate mobile URLs, you can set the desktop version as the canonical URL to tell search engines to index it instead of the mobile version.
However, Google explicitly recommends eschewing separate URLs in favor of responsive web design, which automatically adjusts page layout to suit the device type being used to view it.
With responsive design, if a user visits a page on a mobile device, they won’t be redirected to a separate mobile-specific URL (for instance, www.m.example.com).
Instead, the website will identify the type of device they’re using and alter its layout accordingly while using the same URL (www.example.com). This provides a better user experience, eliminates the need to manually create multiple layouts of the same site, and streamlines analytics and performance tracking.
As such, when you want to ensure a positive user experience across platforms and devices, it’s best to prioritize responsive web design over a separate canonical URL when possible.
Know When to Use 301 Redirects
There are times when the canonical tag isn’t the best way to specify which page is canonical. For example, when deprecating a duplicate page, Google recommends using a 301 redirect (also known as a 301 status code) instead.
In users’ eyes, the difference between the two options is a 301 redirect means they never see the page they were trying to reach in the first place. With the canonical tag, they’re still taken to the URL they entered or clicked on, such as a URL for products in a specific color, even if it’s non-canonical.
What makes 301 redirects different from other types of redirects? Unlike 302 and 307 redirects, 301 redirects tell search engines the page in question has been permanently moved to a new location. By comparison, 302 and 307 redirects indicate a page has been temporarily moved to another location.
The result is that pages with 301 redirects immediately transfer about 95 percent of their link equity to the new destination page.
Preserving link equity this way can significantly impact a site’s authority and search rankings. Although 302, 307, and other types of redirects are no longer directly penalized, it takes some time for Google to realize the redirect is no longer temporary and start passing on link equity accordingly.
You can also replace a site’s 404 pages with 301 redirects when appropriate. For instance, you could use a 301 for a URL leading to a non-existent (but previously well-trafficked) page about custom shoes.
The redirect sends visitors to a current page about custom clothing and ensures Google indexes the correct page. Remember, though, to always guide visitors to the most relevant alternative page possible.
If you do decide a 404 page would be more appropriate (for instance, for a URL that received minimal traffic or was never functional to begin with), consider using custom 404 pages for a better user experience.
The Rel=Canonical Tag? You Can Rel=Conquer It
Proper use of canonical URLs isn’t as much of a complicated subject as it may initially seem. From implementation to troubleshooting to fine-tuning, anyone can master the rel=canonical tag with the right strategies (and reap SEO benefits in the process).
Once you’ve canonicalized a site’s pages and blog posts with the canonical tag, you’ll be able to unify pesky duplicate versions of identical content, direct search engine traffic where you want it to go while boosting your authority, improve overall search engine optimization and create a streamlined and intuitive user experience.
Image Credits
Screenshot by author / March 2020