Canonicalization Made Simple

The road to short and sweet search URLs.

Technically speaking, canonicalization is “the process of converting data that has more than one possible representation into a ‘standardized’ canonical representation.”

Search engine algorithms include a mathematical equation that compares different representations for similarity, counting the number of distinct data structures, to impose a meaningful, canonical sorting order.

That makes sense… right? Maybe for software engineers, computer programmers, math majors, and the like. But let’s make this a bit simpler.

Plainly speaking, search engines like Google use a canonicalization process to present users with short and sweet URLs. Think about this for a moment and consider which URL the average user would most likely click on when presented with these choices:

  • www.yourdomain.com
  • yourdomain.com/nasapp/index.jsp?
  • http://www.yourdomain.com/home.jsp;jsessionid=UJ2LLSBRQH4VMCWQNWRSCOYK0BW0IIWE?_requestid=55555

If you believe Google’s canonical preference would be www.yourdomain.com, even when all three URLs arrive at the same destination, you can proudly say you understand the fundamentals of canonicalization.
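To make the idea concrete, here is a minimal Python sketch of rule-based URL canonicalization applied to the three example URLs. The rules (prefer the www host, drop session and request identifiers, treat a default page as its directory) and the names `SESSION_PARAMS`, `DEFAULT_PAGES`, and `canonicalize` are assumptions chosen for illustration; search engines do not publish their actual rules.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical rules for this sketch; real engines do not publish theirs.
SESSION_PARAMS = {"jsessionid", "_requestid"}
DEFAULT_PAGES = {"index.jsp", "home.jsp", "index.html"}


def canonicalize(url: str) -> str:
    """Collapse simple URL variants of one page into a single short form."""
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # assumption: the www host is the preferred one

    # Drop path parameters such as ";jsessionid=..." from every segment.
    segments = [seg.split(";", 1)[0] for seg in parts.path.split("/")]
    # Treat a default page as the directory that contains it.
    if segments[-1].lower() in DEFAULT_PAGES:
        segments[-1] = ""
    path = "/".join(segments) or "/"

    # Keep only query parameters that are not session or request identifiers.
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


for candidate in (
    "www.yourdomain.com",
    "yourdomain.com/nasapp/index.jsp?",
    "http://www.yourdomain.com/home.jsp;"
    "jsessionid=UJ2LLSBRQH4VMCWQNWRSCOYK0BW0IIWE?_requestid=55555",
):
    print(canonicalize(candidate))
```

Under these assumed rules, the first and third URLs both reduce to http://www.yourdomain.com/, while the second only shortens to http://www.yourdomain.com/nasapp/. String rules alone cannot tell that /nasapp/ serves the same page; that takes redirect or content comparison.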

Let’s look more closely at the major search engines’ canonical preferences to see what other factors determine which URL appears in search results.

For the sake of discussion, let’s complete a search for “milwaukee brewers” in Google, Yahoo, and MSN to compare the results.

Google offers the following top results:

The Official Site of The Milwaukee Brewers: Homepage
Features scores, game schedules, roster, news, history and forums.
brewers.mlb.com/ – 78k – Cached – Similar pages
Schedule : 2007 Brewers Schedule – milwaukee.brewers.mlb.com/NASApp/mlb/s…
Active Roster – milwaukee.brewers.mlb.com/…/roster_active.jsp?c_id=mil
Ticket Center – milwaukee.brewers.mlb.com/…/ticketing/index.jsp?c_id=mil
Help : Job Opportunities – mlb.mlb.com/NASApp/mlb/mlb/help/jobs.jsp?c_id=mil
More results from brewers.mlb.com »

Yahoo offers the following top result:

Milwaukee Brewers
Official site of the Milwaukee Brewers. Features up-to-date stats and results, player bios, minor league information, ticket and merchandise ordering info, player …
Category: Major League Baseball > Milwaukee Brewers
www.milwaukeebrewers.com – 79k – Cached – More from this site

And MSN Live Search offers the following top results:

Milwaukee Brewers : The Official Site
MLB Sites MLB.com Angels Astros Athletics Blue Jays Braves Brewers Cardinals Cubs Devil Rays Diamondbacks Dodgers Giants Indians Mariners Marlins Mets Nationals Orioles Padres Phillies Pirates Rangers …
www.brewers.mlb.com

Note that no one top result is more relevant than the other. All indexed listings resolve to http://milwaukee.brewers.mlb.com/index.jsp?c_id=mil by way of a temporary redirect (302).
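As a rough illustration of how a crawler might discover that several listings share one destination, the sketch below follows redirects through a small in-memory table instead of making live HTTP requests. The starting URLs, the `REDIRECTS` table, and the `resolve` helper are assumptions made for this example; only the destination URL and the 302 status come from the observation above.

```python
# Hypothetical redirect table standing in for live HTTP responses:
# source URL -> (status code, Location header).
DESTINATION = "http://milwaukee.brewers.mlb.com/index.jsp?c_id=mil"
REDIRECTS = {
    "http://www.milwaukeebrewers.com/": (302, DESTINATION),
    "http://brewers.mlb.com/": (302, DESTINATION),
}


def resolve(url, table=REDIRECTS, max_hops=5):
    """Follow redirects and return (final_url, status codes seen on the way)."""
    statuses = []
    while url in table:
        if len(statuses) >= max_hops:
            raise RuntimeError("too many redirects")
        status, url = table[url]
        statuses.append(status)
    return url, statuses


for start in REDIRECTS:
    final_url, statuses = resolve(start)
    # 302 is a temporary redirect; 301 would mark a permanent move.
    temporary = all(code == 302 for code in statuses)
    print(start, "->", final_url, "| temporary:", temporary)
```

Because a 302 marks the move as temporary, it does not by itself tell an engine which address to treat as the permanent one, so each engine is left to make its own choice.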

Why, then, is one domain displayed in Google and MSN and another in Yahoo for the same result? Are the Milwaukee Brewers spoofing the search engines using temporary redirects and multiple domains?

Not exactly. Canonicalization processes simply level the playing field. These algorithmic elements vary from search engine to search engine.

Google knows the two domains are exactly the same and treats them as such when it comes to inbound links. Using the link: query operator, Google reports the same 2,200 inbound links for both link:brewers.mlb.com and link:www.milwaukeebrewers.com.

A lot of SEO folks have talked about Google’s preference for subdomains. This is proof of that preference because that’s how the site’s actually crawled and indexed. Do a query for “site:brewers.mlb.com” and you’ll get some 7,880 pages. Do the same for “site:www.milwaukeebrewers.com” and you’ll get “did not match any documents.”

To provide users with its preferred results, Google relegates www.milwaukeebrewers.com to its no man’s land of non-indexation. Google canonically prefers to display the pretty little subdomain, brewers.mlb.com, as its most relevant result for a “milwaukee brewers” search query.

MSN Live Search just isn’t as bright when it comes to algorithmic adjustments. It indexes nearly 1,300 pages for “site:brewers.mlb.com” and six pages for “site:www.milwaukeebrewers.com.” Its algorithms credit “link:www.milwaukeebrewers.com” with nearly 14,000 inbound links and “link:brewers.mlb.com” with over 14,000. MSN Live Search duplicates its own results by including the non-canonical URL in the results.

Getting any bright ideas about MSN Live Search, subdomains, and temporary redirects? Small wonder MSN Live Search has its filters set to “high” to stop spamming itself and present any semblance of canonicalization.

The question that remains is Yahoo’s preference for www.milwaukeebrewers.com over its subdomain counterpart, brewers.mlb.com. Based on information from Yahoo Site Explorer, brewers.mlb.com has 735 pages indexed and 228 inbound links. Meanwhile, www.milwaukeebrewers.com has 45 pages indexed and 6,331 inbound links.
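One way to read those Yahoo Site Explorer figures is to ask which host "wins" under different signals. The sketch below is only an illustration of that comparison: the `pick_display_host` helper and the idea of choosing a display URL by a single signal are assumptions, not documented Yahoo behavior.

```python
# Figures quoted above from Yahoo Site Explorer.
HOSTS = {
    "brewers.mlb.com": {"pages": 735, "links": 228},
    "www.milwaukeebrewers.com": {"pages": 45, "links": 6331},
}


def pick_display_host(hosts, signal):
    """Return the host with the highest value for one signal (hypothetical rule)."""
    return max(hosts, key=lambda host: hosts[host][signal])


for signal in ("pages", "links"):
    print(signal, "->", pick_display_host(HOSTS, signal))
```

Ranking by indexed pages favors brewers.mlb.com, while ranking by inbound links favors www.milwaukeebrewers.com, the host Yahoo displays. The figures are consistent with a link-weighted choice, but they do not establish it.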

Should Webmasters redesign their sites to include subdomains if they want to make headway in Google and MSN Live Search? Absolutely not. Subdomains are not a secret weapon for improved indexation.

Subdomains do make sense, however, when each subsection of a main domain contains completely unique content addressing different topics, such as the collection of baseball teams at mlb.com.

It would be interesting to test the best way to shift canonicalization processes in the major search engines. Would submitting the standalone domain, www.milwaukeebrewers.com, as the preferred result influence Google and MSN Live Search indexation? Could XML sitemap feeds encourage Yahoo to present the subdomain in natural search results? These are questions for another day while we see if mlb.com will play ball.
