English majors are familiar with the term “literary canon,” which describes a group of works accepted as particularly influential within a particular genre or theme. Search engineers, similarly devoted to elegant syntax in their own discipline, use “canon” to describe the “main” or “correct” URL to use when faced with multiple variants of that URL.
The big news coming out of SMX West last week was that the three major search engines (Google, Yahoo, and MSN/Live) announced joint support for a new link element that enables engines to reduce their guesswork when it comes to identifying your site’s canonical URLs. Today’s column runs through the basics of the element: what it is, and when, why, and how you should use it.
What Is the Canonical Element?
In short, the canonical element is a line of code that you add to pages that may be duplicates. In this code, you designate the “canonical,” or “proper,” URL. Engines, in turn, note this URL and apply link popularity and authority to the canonical version instead of applying them to duplicate URLs. In theory, this will consolidate the authority and link popularity into a single URL, as opposed to splintering them among several similar URLs.
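To illustrate what "consolidating rather than splintering" means, here is a toy Python model. It adds up hypothetical inbound-link counts under whichever canonical URL each variant declares. Real engines weigh links in far more complex and unpublished ways, so treat this only as a sketch of the bookkeeping idea. The function name and the link counts are made up for illustration.

```python
from collections import Counter

def consolidate(inbound_links: dict[str, int], canonical_of: dict[str, str]) -> Counter:
    """Toy model: sum inbound-link counts under each URL's declared canonical.

    URLs that declare no canonical are treated as their own canonical.
    """
    totals: Counter = Counter()
    for url, count in inbound_links.items():
        totals[canonical_of.get(url, url)] += count
    return totals

# Hypothetical link counts for three variants of one page.
links = {
    "http://www.yoursite.com/products/": 40,
    "http://yoursite.com/products/": 25,
    "http://www.yoursite.com/products/?from=sidenav": 10,
}

# Without canonical hints, the counts stay split across three URLs.
print(consolidate(links, {}))

# With every variant pointing at one canonical URL, they collapse into one total.
canon = {url: "http://www.yoursite.com/products/" for url in links}
print(consolidate(links, canon))
```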
The basic message you’re sending to engines, as put by Google in its support documents, is that “of all these pages with identical content, this page is the most useful. Please prioritize it in search results.”
When Should You Use the Canonical Element?
Over the last couple of years, I’ve discussed several types of duplicate content. Here is a sample list. Each of these (with the possible exception of “pagination”) is a suitable candidate for the new rel="canonical" element:
www vs. non-www. Go to any URL on your site. Does it load both with and without the “www” subdomain? If so, this applies to you.
Secure vs. unsecure. Similarly, when a URL works with both http and https in the address field, it’s a duplicate.
Affiliate/vendor tracking. Some sites have affiliate relationships that use tracking codes in URLs. For example, you might want your affiliates to drive traffic to your /products/ page. However, if your site assigns a unique code to each affiliate, you may end up receiving traffic at a separate URL for each affiliate code, all of which show the same content.
Load balancing. Some high-traffic sites balance server load by diverting visitors to servers such as www2, www3, and so on. This often leads engines to index those subdomain variants alongside the www version.
/ vs. /default.aspx. This is similar to www vs. non-www. Many platforms serve a page both at the root (or folder) level and at an associated filename, such as /products/ and /products/default.aspx.
Navigation-based tracking. The default setup of some platforms can show the same URL in several different formats, based upon the internal route used to get to the page. For example, you might have a page called http://www.yoursite.com/products/, but if you navigate to that page from the side navigation bar, your content management system (CMS) might produce a URL such as http://www.yoursite.com/products/?from=sidenav.
Pagination. Usage of the canonical element on paginated articles (those broken into multiple chunks, such as /article.aspx, /article.aspx?p=2, /article.aspx?p=3, and so on) isn’t for everyone and should be carefully considered case by case. But if you prefer that users enter your articles only at the first page and you don’t particularly mind if the second or third pages of your articles don’t rank at engines, consider the canonical element for this purpose.
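Each duplicate type above follows a mechanical rule that maps a variant back to a single preferred URL. The minimal Python sketch below shows one such mapping, which you could use to audit your own logs or crawl data. It assumes the placeholder host www.yoursite.com, http as the preferred scheme (to match the examples in this column), default.aspx as the default document, and two hypothetical tracking parameters, from and affiliate. Substitute your own values. The sketch deliberately leaves pagination parameters such as p alone, because that case calls for judgment. It only helps you decide which URL to declare. The canonical element itself is what tells the engines your choice.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

CANONICAL_SCHEME = "http"             # assumption: matches the examples in this column
CANONICAL_HOST = "www.yoursite.com"   # placeholder host from the examples
SITE_DOMAIN = "yoursite.com"
TRACKING_PARAMS = {"from", "affiliate"}  # hypothetical tracking parameter names
DEFAULT_DOCUMENTS = ("default.aspx",)    # assumption: ASP.NET-style default document

def canonicalize(url: str) -> str:
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is lowercased and drops any port
    # www vs. non-www and load-balanced www2/www3 all collapse to one host.
    if host == SITE_DOMAIN or host.endswith("." + SITE_DOMAIN):
        host = CANONICAL_HOST
    path = parts.path or "/"
    # / vs. /default.aspx: drop the default document so the folder URL wins.
    for doc in DEFAULT_DOCUMENTS:
        if path.lower().endswith("/" + doc):
            path = path[: -len(doc)]
    # Affiliate and navigation tracking: remove known tracking parameters, keep the rest.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    # Secure vs. unsecure: force the single preferred scheme; drop any fragment.
    return urlunsplit((CANONICAL_SCHEME, host, path, urlencode(kept), ""))

variants = [
    "http://www.yoursite.com/products/",
    "http://yoursite.com/products/",
    "https://www.yoursite.com/products/",
    "http://www2.yoursite.com/products/",
    "http://www.yoursite.com/products/default.aspx",
    "http://www.yoursite.com/products/?from=sidenav",
    "http://www.yoursite.com/products/?affiliate=123",
]
for v in variants:
    print(canonicalize(v))  # each line prints http://www.yoursite.com/products/
```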
How Do You Use the Canonical Tag?
The syntax of the new canonical link element is quite simple. Between the <head> and </head> tags on your page (the same place you put your title element and meta tags), place this line:
<link rel="canonical" href="http://www.yoursite.com/correct-url/" />
The URL in the line above is, of course, a placeholder for the canonical URL that you’ll put in your file.
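If a template or script generates this line, use plain straight quotes and escape the URL so that characters such as & remain valid inside the attribute value. Here is a minimal Python sketch. The helper name canonical_link_tag is hypothetical. It relies only on the standard library's html.escape.

```python
from html import escape

def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link rel="canonical"> line for a page's <head> section."""
    # quote=True also escapes double quotes, so the href value cannot break out of its attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'

print(canonical_link_tag("http://www.yoursite.com/correct-url/"))
# → <link rel="canonical" href="http://www.yoursite.com/correct-url/" />
```

A URL containing a query string such as ?x=1&y=2 comes out as ?x=1&amp;y=2 inside the attribute. That is the correct HTML form.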
As an example, let’s assume duplicate content issues cause several variants of the same page to resolve correctly (say, the non-www, https, and ?from=sidenav versions), and that http://www.yoursite.com/correct-url/ is the “correct” one.
To apply the canonical element, just add the following line to the <head> section of the page served at all of these URLs:
<link rel="canonical" href="http://www.yoursite.com/correct-url/" />
You may have these related questions:
How Do I Isolate All the Duplicate Variants of My Canonical URL in the First Place?
In theory, you shouldn’t need to know how many duplicate variants you have before implementing this element. If you automate the process, make sure (and test to confirm) that your CMS writes the canonical URL into the <head> section as a fixed value, rather than building it from whatever URL was requested. That way, every duplicate variant of the page still carries the correct canonical URL.
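To "test and confirm," one approach is to fetch each known variant and check that its <head> declares the same canonical URL. The Python sketch below does the checking step with the standard library's html.parser. The class and function names are hypothetical. The page HTML is inlined as sample data rather than fetched, so you would feed in your own pages. As a design choice, it flags a page that declares zero or more than one canonical URL, because conflicting hints leave the engine guessing again.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also reaches this method via handle_startendtag.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            a = dict(attrs)
            # rel may hold several space-separated tokens, so check each one.
            if "canonical" in (a.get("rel") or "").lower().split():
                self.canonicals.append(a.get("href") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_canonicals(html_text: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonicals

def check_variants(pages: dict[str, str], expected: str) -> list[str]:
    """Return variant URLs whose HTML does not declare exactly the expected canonical."""
    return [url for url, html_text in pages.items() if find_canonicals(html_text) != [expected]]

# Sample data standing in for fetched pages.
HEAD = ('<html><head><title>Products</title>'
        '<link rel="canonical" href="http://www.yoursite.com/products/" />'
        '</head><body></body></html>')
pages = {
    "http://yoursite.com/products/": HEAD,
    "http://www.yoursite.com/products/?from=sidenav": HEAD,
    "https://www.yoursite.com/products/": "<html><head><title>Products</title></head></html>",
}
print(check_variants(pages, "http://www.yoursite.com/products/"))
# → ['https://www.yoursite.com/products/']
```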
How Will This Change the User Experience?
It won’t affect users at all. This element is completely invisible to the user, regardless of browser, script-enabled status, or anything else. As far as the user is concerned, there is no redirection. Any comparisons to a 301 redirect that you read about in relation to this element are only analogies.
In addition, this element will not affect analytics. All page views, page landings, and user actions will still be counted as they normally have been.
Historically, when the big three engines gang up and announce a coordinated effort, it’s been wise to pay attention, because the results can benefit sites of all sizes. The Sitemaps.org project is one example. The agreement on advanced Robots Exclusion Protocol is another. Plenty of people will be watching this development very closely to see how it acts in the real world (and building tools to help implement it), so stay tuned for further information.