Canonicalization Made Simple

The road to short and sweet search URLs.

Technically speaking, canonicalization is “the process of converting data that has more than one possible representation into a ‘standardized’ canonical representation.”

Search engine algorithms include a mathematical equation that compares different representations for similarity, counting the number of distinct data structures, to impose a meaningful, canonical sorting order.

That makes sense… right? Maybe for software engineers, computer programmers, math majors, and the like. But let’s make this a bit simpler.

Plainly speaking, search engines like Google use a canonicalization process to present users with short and sweet URLs. Think about this for a moment and consider which URL the average user would most likely click on when presented with these choices:

  • www.yourdomain.com
  • yourdomain.com/nasapp/index.jsp?
  • http://www.yourdomain.com/home.jsp;jsessionid=UJ2LLSBRQH4VMCWQNWRSCOYK0BW0IIWE?_requestid=55555

If you believe Google’s canonical preference would be www.yourdomain.com, even when all three URLs arrive at the same destination, you can proudly say you understand the fundamentals of canonicalization.
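To make the idea concrete, here is a minimal Python sketch of how one might pick the "short and sweet" URL from a group of addresses already known to reach the same page. The scoring rules (penalize session IDs, query strings, deep paths, and length) and the names url_cost and pick_canonical are illustrative assumptions, not a description of Google's actual algorithm.

```python
from urllib.parse import urlsplit

def url_cost(url):
    """Lower is 'shorter and sweeter'. The scoring rules are illustrative assumptions."""
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # urlsplit leaves ";jsessionid=..." attached to the path, so look for it there.
    has_session_id = ";jsessionid=" in parts.path.lower()
    has_query = bool(parts.query)
    path_depth = len([seg for seg in parts.path.split("/") if seg])
    # Tuples compare left to right, so session IDs weigh most and length least.
    return (has_session_id, has_query, path_depth, len(url))

def pick_canonical(duplicates):
    """Return the preferred URL among pages already known to share one destination."""
    return min(duplicates, key=url_cost)

candidates = [
    "www.yourdomain.com",
    "yourdomain.com/nasapp/index.jsp?",
    "http://www.yourdomain.com/home.jsp;jsessionid=UJ2LLSBRQH4VMCWQNWRSCOYK0BW0IIWE?_requestid=55555",
]
print(pick_canonical(candidates))  # www.yourdomain.com
```

The sketch only ranks URLs that are already grouped as duplicates. Working out that they lead to the same destination in the first place is a separate problem that it does not attempt.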

Let’s look more closely at the major search engines’ canonical preferences to see what other factors determine which URL is presented in search results.

For the sake of discussion, let’s complete a search for “milwaukee brewers” in Google, Yahoo, and MSN to compare the results.

Google offers the following top results:

The Official Site of The Milwaukee Brewers: Homepage
Features scores, game schedules, roster, news, history and forums.
brewers.mlb.com/ – 78k – Cached – Similar pages
Schedule : 2007 Brewers Schedule – milwaukee.brewers.mlb.com/NASApp/mlb/s…
Active Roster – milwaukee.brewers.mlb.com/…/roster_active.jsp?c_id=mil
Ticket Center – milwaukee.brewers.mlb.com/…/ticketing/index.jsp?c_id=mil
Help : Job Opportunities – mlb.mlb.com/NASApp/mlb/mlb/help/jobs.jsp?c_id=mil
More results from brewers.mlb.com »

Yahoo offers the following top result:

Milwaukee Brewers
Official site of the Milwaukee Brewers. Features up-to-date stats and results, player bios, minor league information, ticket and merchandise ordering info, player …
Category: Major League Baseball > Milwaukee Brewers
www.milwaukeebrewers.com – 79k – Cached – More from this site

And MSN Live Search offers the following top results:

Milwaukee Brewers : The Official Site
MLB Sites MLB.com Angels Astros Athletics Blue Jays Braves Brewers Cardinals Cubs Devil Rays Diamondbacks Dodgers Giants Indians Mariners Marlins Mets Nationals Orioles Padres Phillies Pirates Rangers …
www.brewers.mlb.com

Note that none of these top results is more relevant than the others. All indexed listings resolve to http://milwaukee.brewers.mlb.com/index.jsp?c_id=mil by way of a temporary redirect (302).
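For readers who want to see the redirect behavior spelled out, the following Python sketch simulates it. The lookup table is a hard-coded stand-in for real HTTP requests, and the assumption that each listed domain reaches the destination in a single 302 hop is ours. The function name resolve is hypothetical.

```python
from http import HTTPStatus

# Destination named in the article.
TARGET = "http://milwaukee.brewers.mlb.com/index.jsp?c_id=mil"

# Simulated responses: URL -> (status code, Location header).
# Assumption: each domain redirects in one hop; no network traffic is involved.
REDIRECTS = {
    "http://brewers.mlb.com/": (HTTPStatus.FOUND, TARGET),
    "http://www.milwaukeebrewers.com/": (HTTPStatus.FOUND, TARGET),
}

def resolve(url, max_hops=5):
    """Follow simulated redirects; return (final URL, list of status codes seen)."""
    statuses = []
    for _ in range(max_hops):
        if url not in REDIRECTS:
            break  # no further redirect recorded, so this is the final URL
        status, url = REDIRECTS[url]
        statuses.append(int(status))
    return url, statuses

for start in REDIRECTS:
    final, statuses = resolve(start)
    print(start, "->", final, statuses)
```

HTTP status 302 ("Found") signals a temporary move, while 301 ("Moved Permanently") signals a permanent one. With a temporary redirect, each engine is left to decide for itself which of the addresses to display.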

Why, then, is one domain displayed in Google and MSN and another in Yahoo for the same result? Are the Milwaukee Brewers spoofing the search engines using temporary redirects and multiple domains?

Not exactly. Canonicalization processes simply level the playing field. These algorithmic elements vary from search engine to search engine.

Google knows the two domains are exactly the same and treats them as such when it comes to inbound links. Queried with the link: operator, Google acknowledges 2,200 links for both link:brewers.mlb.com and link:www.milwaukeebrewers.com.

A lot of SEO folks have talked about Google’s preference for subdomains. This is proof of that preference because that’s how the site’s actually crawled and indexed. Do a query for “site:brewers.mlb.com” and you’ll get some 7,880 pages. Do the same for “site:www.milwaukeebrewers.com”, and you’ll get “did not match any documents.”

To provide users with its preferred results, Google relegates www.milwaukeebrewers.com to its no man’s land of non-indexation. Google canonically prefers to display the pretty little subdomain, brewers.mlb.com, as its most relevant result for a “milwaukee brewers” search query.

MSN Live Search just isn’t as bright when it comes to algorithmic adjustments. It indexes nearly 1,300 pages of “site:brewers.mlb.com” and six pages of “site:www.milwaukeebrewers.com”. Its algorithms credit “link:www.milwaukeebrewers.com” with nearly 14,000 inbound links and “link:brewers.mlb.com” with over 14,000. MSN Live Search duplicates its own results by including the non-canonical URL in the results.

Getting any bright ideas about MSN Live Search, subdomains, and temporary redirects? Small wonder MSN Live Search has its filters set to “high” to stop spamming itself and present any semblance of canonicalization.

The question that remains is Yahoo’s preference for www.milwaukeebrewers.com over its subdomain counterpart, brewers.mlb.com. Based on information from Yahoo Site Explorer, brewers.mlb.com has 735 pages indexed and 228 inbound links. Meanwhile, www.milwaukeebrewers.com has 45 pages indexed and 6,331 inbound links.

Should Webmasters redesign their sites to include subdomains if they want to make headway in Google and MSN Live Search? Absolutely not. Subdomains are not a secret weapon for improved indexation.

Subdomains do make sense, however, when each subsection of a top-level domain contains completely unique content addressing different topics, such as the collection of baseball teams at mlb.com.

It would be interesting to test the best way to shift canonicalization processes in the major search engines. Would submitting the top-level domain as the preferred result influence Google and MSN Live Search indexation? Could XML sitemap feeds encourage Yahoo to present the subdomain in natural search results? These are questions for another day while we see if mlb.com will play ball.
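For anyone who wants to run the sitemap experiment, a minimal sitemap file can be generated with a few lines of Python. The sketch below uses the sitemaps.org 0.9 namespace. Listing brewers.mlb.com is an illustrative choice, build_sitemap is a hypothetical helper, and whether such a feed changes which URL any engine displays is exactly the open question.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal sitemap document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Assumption: the subdomain is the URL the site owner wants displayed.
print(build_sitemap(["http://brewers.mlb.com/"]))
```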
