Canonicalization Made Simple

February 14, 2007

The road to short and sweet search URLs.

Technically speaking, canonicalization is "the process of converting data that has more than one possible representation into a 'standardized' canonical representation."

Search engine algorithms include a mathematical equation that compares different representations for similarity, counting the number of distinct data structures, to impose a meaningful, canonical sorting order.

That makes sense... right? Maybe for software engineers, computer programmers, math majors, and the like. But let's make this a bit simpler.

Plainly speaking, search engines like Google use a canonicalization process to present users with short and sweet URLs. Think about this for a moment and consider which URL the average user would most likely click on when presented with these choices:

  • www.yourdomain.com

  • yourdomain.com/nasapp/index.jsp?

  • http://www.yourdomain.com/home.jsp;jsessionid=UJ2LLSBRQH4VMCWQNWRSCOYK0BW0IIWE?_requestid=55555

If you believe Google's canonical preference would be www.yourdomain.com, even when all three URLs arrive at the same destination, you can proudly say you understand the fundamentals of canonicalization.
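For readers who do like to see the mechanics, here is a minimal Python sketch of the idea, applied to the three example URLs above. It is not how Google or any other engine actually works. The rules it applies (forcing a "www." host, treating a few page names as the home page, dropping the session ID and the `_requestid` parameter) are assumptions made purely for illustration, as are the names `canonicalize`, `SESSION_PARAMS`, and `DEFAULT_PAGES`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rules, chosen only to illustrate the idea.
SESSION_PARAMS = {"_requestid"}  # query parameters that only track a visit
DEFAULT_PAGES = {"", "/", "/index.jsp", "/home.jsp", "/nasapp/index.jsp"}


def canonicalize(url: str) -> str:
    """Collapse several spellings of a home page URL into one short form."""
    if "://" not in url:
        url = "http://" + url  # assume HTTP when no scheme is given
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # assumption: the www host is the preferred one

    # Drop path parameters such as ";jsessionid=..."
    path = parts.path.split(";", 1)[0]
    if path.lower() in DEFAULT_PAGES:
        path = "/"

    # Keep only query parameters that are not session or request trackers.
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]

    return urlunsplit(("http", host, path, urlencode(kept), ""))


if __name__ == "__main__":
    for candidate in (
        "www.yourdomain.com",
        "yourdomain.com/nasapp/index.jsp?",
        "http://www.yourdomain.com/home.jsp;"
        "jsessionid=UJ2LLSBRQH4VMCWQNWRSCOYK0BW0IIWE?_requestid=55555",
    ):
        print(canonicalize(candidate))
```

Under these assumed rules, all three inputs collapse to the same short URL, which is the whole point of the exercise.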

Let's look more closely at the major search engines' canonical preferences to see what other factors determine which URL appears in search results.

For the sake of discussion, let's complete a search for "milwaukee brewers" in Google, Yahoo, and MSN to compare the results.

Google offers the following top results:

The Official Site of The Milwaukee Brewers: Homepage
Features scores, game schedules, roster, news, history and forums.
brewers.mlb.com/ - 78k - Cached - Similar pages
Schedule : 2007 Brewers Schedule - milwaukee.brewers.mlb.com/NASApp/mlb/s...
Active Roster - milwaukee.brewers.mlb.com/.../roster_active.jsp?c_id=mil
Ticket Center - milwaukee.brewers.mlb.com/.../ticketing/index.jsp?c_id=mil
Help : Job Opportunities - mlb.mlb.com/NASApp/mlb/mlb/help/jobs.jsp?c_id=mil
More results from brewers.mlb.com »

Yahoo offers the following top result:

Milwaukee Brewers
Official site of the Milwaukee Brewers. Features up-to-date stats and results, player bios, minor league information, ticket and merchandise ordering info, player ...
Category: Major League Baseball > Milwaukee Brewers
www.milwaukeebrewers.com - 79k - Cached - More from this site

And MSN Live Search offers the following top results:

Milwaukee Brewers : The Official Site
MLB Sites MLB.com Angels Astros Athletics Blue Jays Braves Brewers Cardinals Cubs Devil Rays Diamondbacks Dodgers Giants Indians Mariners Marlins Mets Nationals Orioles Padres Phillies Pirates Rangers ...
www.brewers.mlb.com

Note that no one top result is more relevant than the other. All indexed listings resolve to http://milwaukee.brewers.mlb.com/index.jsp?c_id=mil by way of a temporary redirect (302).
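To make the redirect observation concrete, the following sketch groups listed URLs by the destination they finally resolve to. It works from a small in-memory table rather than live HTTP requests. The table simply restates the 302 behavior described above, and the exact URL spellings (the `http://` scheme and the trailing slash) are assumptions. The function names `resolve` and `group_by_destination` are hypothetical.

```python
# Redirect table restating the behavior described in the article.
# The scheme and trailing slashes are assumed; a real check would
# issue HTTP requests instead of reading a dictionary.
DESTINATION = "http://milwaukee.brewers.mlb.com/index.jsp?c_id=mil"
REDIRECTS = {
    "http://brewers.mlb.com/": (302, DESTINATION),
    "http://www.milwaukeebrewers.com/": (302, DESTINATION),
}


def resolve(url, redirects, max_hops=5):
    """Follow redirects in the table; return (final_url, status_codes_seen)."""
    statuses = []
    for _ in range(max_hops + 1):
        if url not in redirects:
            return url, statuses
        status, url = redirects[url]
        statuses.append(status)
    raise RuntimeError("more than %d redirects" % max_hops)


def group_by_destination(urls, redirects):
    """Map each final destination to the listed URLs that lead to it."""
    groups = {}
    for url in urls:
        final, _ = resolve(url, redirects)
        groups.setdefault(final, []).append(url)
    return groups


if __name__ == "__main__":
    for final, sources in group_by_destination(list(REDIRECTS), REDIRECTS).items():
        print(final, "<-", sources)
```

Both listed domains land in a single group, so whichever domain an engine displays is a matter of its canonical preference and not a difference in content.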

Why, then, is one domain displayed in Google and MSN and another in Yahoo for the same result? Are the Milwaukee Brewers spoofing the search engines using temporary redirects and multiple domains?

Not exactly. Canonicalization processes simply level the playing field. These algorithmic elements vary from search engine to search engine.

Google knows the two domains are exactly the same and treats them as such when it comes to inbound links. Using its link: query operator, Google reports 2,200 links for both link:brewers.mlb.com and link:www.milwaukeebrewers.com.

A lot of SEO folks have talked about Google's preference for subdomains. This is proof of that preference because that's how the site's actually crawled and indexed. Do a query for "site:brewers.mlb.com" and you'll get some 7,880 pages. Do the same for "site:www.milwaukeebrewers.com," and you'll get "did not match any documents."

To provide users with its preferred results, Google relegates www.milwaukeebrewers.com to its no man's land of non-indexation. Google canonically prefers to display the pretty little subdomain, brewers.mlb.com, as its most relevant result for a "milwaukee brewers" search query.

MSN Live Search just isn't as bright when it comes to algorithmic adjustments. It indexes nearly 1,300 pages of "site:brewers.mlb.com" and six pages of "site:www.milwaukeebrewers.com". Its algorithms credit "link:www.milwaukeebrewers.com" with nearly 14,000 inbound links and "link:brewers.mlb.com" with over 14,000. MSN Live Search duplicates its own results by including the non-canonical URL in the results.

Getting any bright ideas about MSN Live Search, subdomains, and temporary redirects? Small wonder MSN Live Search has its filters set to "high" to stop spamming itself and present any semblance of canonicalization.

The question that remains is Yahoo's preference for www.milwaukeebrewers.com over its subdomain counterpart, brewers.mlb.com. Based on information from Yahoo Site Explorer, brewers.mlb.com has 735 pages indexed and 228 inbound links. Meanwhile, www.milwaukeebrewers.com has 45 pages indexed and 6,331 inbound links.

Should Webmasters redesign their sites to include subdomains if they want to make headway in Google and MSN Live Search? Absolutely not. Subdomains are not a secret weapon for improved indexation.

Subdomains do make sense, however, when each subsection of a top-level domain contains completely unique content addressing different topics, such as the collection of baseball teams at mlb.com.

It would be interesting to test the best way to shift canonicalization processes in the major search engines. Would submitting the top-level domain as the preferred result influence Google and MSN Live Search indexation? Could XML sitemap feeds encourage Yahoo to present the subdomain in natural search results? These are questions for another day while we see if mlb.com will play ball.


ABOUT THE AUTHOR

P.J. Fusco

P.J. Fusco has been working in the Internet industry since 1996 when she developed her first SEM service while acting as general manager for a regional ISP. She was the SEO manager for Jupitermedia and has performed as the SEM manager for an international health and beauty dot-com corporation generating more than $1 billion a year in e-commerce sales. Today, she is director for natural search for Netconcepts, a cutting-edge SEO firm with offices in Madison, WI, and Auckland, New Zealand.
