If you’ve read my article explaining what keyword cannibalization is, you’ve probably come across the idea of canonicalization, which I touched upon. In this post I’ll explain in more depth what canonical URLs are and I’ll explore the following:
- What they do.
- Why they are important.
- How you can implement them.
But before I delve into this, it’s worth providing a little history.
The Need for Canonical URLs?
Going back prior to 2009, duplicated content was a major problem for search engines.
In the black hat marketing world, it was very simple to create spam sites using content copied from across the Web. These so-called “scraper” sites could publish copied content without any modification.
Scraper sites would contain thousands of copied pages that could rank highly for long-tail keyword searches.
Even white hat site owners unknowingly published pages that were duplicates of others within their sites. They still do if they’re not careful.
What do I mean by this? Consider the following two URLs:
Without intervention in some way, these two URLs represent two different pages to search engines. However, the content of both will be the same.
Why? Because the second example has a query string (highlighted) attached to the end of the URL… it is effectively the same page as the first example.
To a search engine, the URLs are different but the content is precisely the same. You’ll have a duplicate content issue wherever the same (or very similar) content is available at multiple URLs.
Is Duplicate Content Limited to Your Domain?
There’s a clear answer to this question and Google tells us precisely:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Source: Google
Bing too tells the same story:
Duplicating content across multiple URLs can lead to Bing losing trust in some of those URLs over time. Source: Bing
The short answer is no: if your content is similar to other content on your site, or on someone else’s, it may be seen as duplicate.
Duplicated Content, Copied Content & Penalties
There is a good deal of confusion about duplicate content and whether you can receive a penalty for it. Although Bing’s wording in the above quote seems to suggest you may be slapped, the reality is that duplicate content is generally not penalized. However, it can impact how your pages appear in results.
Search engines won’t see such duplication as high-quality content. Quite the opposite, because it appears to fall within Google’s definition of low-quality content:
We will consider content to be low quality if it is created without adequate time, effort, expertise, or talent/skill. Source: Google Search Quality Evaluator Guidelines
Copied content is something entirely different. If large amounts of text are a direct copy of text at another URL, you may be subject to a manual penalty or an algorithmic slap:
We do have some things around duplicate content … that are penalty worthy. Source: John Mueller, Webmaster Trends Analyst at Google
So there’s a difference between duplicating and copying:
- Duplicating is mostly “not deceptive in origin” and therefore not likely to result in a penalty, but pages might be rated as low-value.
- Copying with “minimal alteration”, especially if it is domain-wide, is characteristically spammy and might be penalized.
As long as you are not trying to deceive search engines to manipulate search results, duplicated content is unlikely to earn your site a penalty. However, search engines will probably see that content as low quality and you will not be rewarded for it.
What Do Canonical URLs Look Like & Where Do You Put Them?
As I touched upon earlier, duplicate content is actually very common. Any site that uses query strings on its URLs (page filters, pagination, etc.) has the capacity to create multiple URLs with the same content.
This is where canonicalization is your friend.
Canonical URLs are a way to tell search engines that the content on one URL has a canonical (or master) version elsewhere.
You tell them this by adding a canonical link element to the HTML of the master page, pointing to the master page’s own URL. Any other URL that serves the same HTML, such as a version with a query string attached, then carries the same element and refers search engines back to the master URL.
The canonical link element looks like this:
<link rel="canonical" href="http://example.com/yourmasterpage">
The link should be placed in between the <head></head> tags in the HTML on any page URL where there is likely to be another version using the same content. This might include:
- Your homepage.
- Any page that might add query strings to the URL of that page.
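To make the placement concrete, here’s a minimal sketch of a page with the canonical link element in its head. The title and body are placeholders I’ve made up for illustration; only the link element itself is taken from the example above.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Placeholder title, for illustration only -->
  <title>Your master page</title>
  <!-- The canonical link element sits inside <head>, not <body> -->
  <link rel="canonical" href="http://example.com/yourmasterpage">
</head>
<body>
  <!-- Page content goes here -->
</body>
</html>
```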
Canonicalizing Your Home Page
Why would you want to have your homepage referencing itself with a canonical URL?
This may seem counter-intuitive, but duplicate versions of your homepage URL are extremely common. Consider these URLs:
Although the http / https elements are not pages in themselves, they form part of a page’s URL structure. In my example, these four URLs are all different, but they each take you to the same content.
Anyone can link to you from their site using any one of these URLs and of course you can’t control this. However, you can specify to search engines which one is the master version by adding the canonical link element to the code on your homepage.
In my case I use this:
<link rel="canonical" href="https://sidegains.com" />
This tells search engines that, of the four URLs, https://sidegains.com is the one I want them to use in search results.
Canonicalizing Potentially Duplicated Pages
You should canonicalize any page in your site that might pull the same content as another through query strings that build onto the master URL. This might include pages that use product filters or pagination.
The process for canonicalizing them is the same as the example for the homepage I gave above. You just add the canonical link element between the <head></head> tags in your HTML. Any dynamic page using the canonical page to generate its content will link back to the master version.
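As a rough illustration, suppose a shop has a master page at https://example.com/shoes and a sorted version of it at https://example.com/shoes?sort=price. Both URLs are hypothetical ones I’ve invented for this sketch. Because the sorted version is built from the same page, its HTML carries the same canonical link element:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Hypothetical page as served at https://example.com/shoes?sort=price -->
  <title>Shoes</title>
  <!-- Points at the master URL, without the query string -->
  <link rel="canonical" href="https://example.com/shoes">
</head>
<body>
  <!-- The same product listing, sorted by price -->
</body>
</html>
```

Note that the href is the clean master URL, so it stays the same no matter which query string the visitor arrived with.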
A final place you might want to consider adding canonicalization is on other domains you own, if you share content between them. Many businesses share content in this way on other portals in their online estate.
Best practice is to add the canonical link element to the posts or pages on the other domains, pointing back to the URL on the domain where the content originates.
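For example, if a guide first published at https://example.com/guide is republished on a second domain you own, the copy might carry an element like the one below. Both domains are placeholders for this sketch, not real sites.

```html
<!-- In the <head> of the republished copy at https://example.org/guide (hypothetical) -->
<!-- The href names the URL on the domain where the content originates -->
<link rel="canonical" href="https://example.com/guide">
```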
Adding Canonical Link Elements to WordPress Posts & Pages
I won’t cover how to add the rel="canonical" link element on every blog platform available, but I will explain how you can do it in WordPress, the platform I use.
Firstly, you should install the Yoast SEO plugin. It’s a free plugin that adds so many useful SEO enhancements to your WordPress blog that it warrants a separate post of its own!
Once you’ve installed and activated it you’ll automatically see an area like this at the bottom of your posts and pages when you are in editing mode.
You just need to add the URL of the page you are editing and Yoast will populate the tags into your HTML for you!
If you’re not using WordPress, check the documentation for the platform you are using, and follow the steps to add canonical URLs from there.
- Search engines can tell if your content is duplicated or copied.
- Duplicate content is generally not penalized unless its intention is to manipulate results. However, it can decrease your page quality rating.
- Copied content may be subject to a manual or an algorithmic penalty. Learn more about Google SEO penalties.
- Duplicate content can confuse search engines about which page to show in search results.
- Add canonical link elements to any child version of a master page.
- If you share content across multiple domains, you should use canonical URLs between them to point to the master content.
Finally… a word of warning. Be careful about using canonical URLs on page content that closely resembles that of other pages in your site. It’s okay to use canonical URLs on pages that use filters to present similar content, such as product pages if the content is built from the master URL.
However, if you add canonicalization on pages that do not share the same content, you may do them an injustice. They may not show in the search results.
That’s it for today.
If you have any questions about canonical URLs, drop me a question in the comments below.