In an effort to discover Google's methods for determining what appears in Sitelinks, here is an in-depth look at the Google Analytics data on the Sitelink pages for 10 websites.
Google Sitelinks are something of a mystery. Google does not publish the algorithm (or give good hints) for how it generates Sitelinks. However, when we look at the Sitelinks for our own sites, we immediately see that Google has cleverly found some of the most popular pages on a site and surfaced them in the search results.
Google's documentation on Sitelinks does not elaborate on the selection algorithm, but it does give the following clue:
"Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they're looking for."
Sitelinks often appear on branded searches, so the search term does not always give Google evidence of the visitor's intent. We therefore have to guess how Google would "save users time" without knowing exactly what the user is looking for.
To understand how Google chooses which pages appear as Sitelinks, I decided to look at the Google Analytics data on the Sitelink pages for 10 websites. Examining Web analytics data on site content to try to "save users time" is something we do on a daily basis. My initial hunch was that Google has a clever way of estimating the popularity of site pages without access to the site's analytics. To better understand Google's algorithm and test my own assumptions, I compared the pages that appear in Sitelinks with their popularity in Google Analytics.
First I built a small sample of websites across multiple industries (e.g., finance, high technology, enterprise software, health care, professional services, and e-commerce). Then, for each page that appears in a site's Sitelinks, I examined data points that help determine popularity, such as pageviews, landing pageviews, repeat visits, time on page, and assisted conversions.
For the first data points, I looked at site pages that were top ranked at acquiring organic traffic and referral traffic. Below are three examples of the sites and how the pages ranked.
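A minimal sketch of this kind of comparison, assuming a hypothetical analytics export held in a plain dictionary. The page paths, metric names, and numbers are all made up for illustration, and this is not the tooling used for the analysis in this article. It ranks every page on a site by a metric and then looks up where each Sitelink page falls.

```python
# Hypothetical analytics export: page path -> metrics. All numbers are made up.
pages = {
    "/": {"pageviews": 12000, "landing_pageviews": 9000},
    "/products": {"pageviews": 8000, "landing_pageviews": 1500},
    "/pricing": {"pageviews": 6500, "landing_pageviews": 2200},
    "/blog": {"pageviews": 5000, "landing_pageviews": 3100},
    "/careers": {"pageviews": 900, "landing_pageviews": 400},
    "/contact": {"pageviews": 700, "landing_pageviews": 100},
}

# Hypothetical Sitelinks, in the order they appear in the search result.
sitelinks = ["/products", "/careers", "/pricing"]


def rank_by(metric, pages):
    """Return {path: rank}, where rank 1 is the page with the highest value."""
    ordered = sorted(pages, key=lambda path: pages[path][metric], reverse=True)
    return {path: position for position, path in enumerate(ordered, start=1)}


def sitelink_ranks(sitelinks, pages, metrics):
    """For each Sitelink page, look up its site-wide rank in every metric."""
    rankings = {metric: rank_by(metric, pages) for metric in metrics}
    return {
        path: {metric: rankings[metric][path] for metric in metrics}
        for path in sitelinks
    }


report = sitelink_ranks(sitelinks, pages, ["pageviews", "landing_pageviews"])
for path, ranks in report.items():
    print(path, ranks)
```

In this toy data, "/products" is the second most viewed page but only the fourth most popular landing page, which is the kind of mismatch the comparison is meant to surface.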
|Company 1||Ranking in Landing Pages||Ranking in SEO Landing Pages||Ranking in Referral Landing Pages|
|Company 2||Ranking in Landing Pages||Ranking in SEO Landing Pages||Ranking in Referral Landing Pages|
|Company 3||Ranking in Landing Pages||Ranking in SEO Landing Pages||Ranking in Referral Landing Pages|
Looking at this data, it quickly becomes apparent that a page's status as a Sitelink is not directly correlated with its ability to acquire traffic through any organic or referral channel. Some pages appear to be closely correlated, and others do not seem to be correlated with traffic acquisition at all. For example, the page that appears as the first Sitelink on Company 3 is the 39th most popular SEO landing page on the site and the 49th most popular referral landing page on the site.
Next I looked at engagement data such as pageviews, a page's ability to drive repeat traffic, time on page, bounce rates, and exit rates. The following are examples of some of the engagement data of the Sitelink pages.
|Company 1||Ranking in Pageviews||Time on Page||Repeat Visits||Exit Rate|
|Company 2||Ranking in Pageviews||Time on Page||Repeat Visits||Exit Rate|
|Company 3||Ranking in Pageviews||Time on Page||Repeat Visits||Exit Rate|
As with the traffic acquisition data, we see a loose association between the engagement data points and a page's status as a Sitelink, as well as some outliers in the data. After looking at traffic acquisition data and engagement data for each of the Sitelink pages across 10 sites, I also looked at how well the Sitelink pages assisted in goal conversions, as well as auxiliary data points such as page depth and backlinks. For every data point I came to the same conclusion: there appears to be a loose association (not a direct one), and there are always outlier records.
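One simple way to make "outlier" concrete is to flag any Sitelink page that fails to rank near the top in every metric examined. The sketch below assumes a hypothetical table of site-wide ranks and an arbitrary cutoff of 10. Both are illustrative choices, not values from the analysis.

```python
# Hypothetical site-wide ranks (1 = best) for each Sitelink page.
# The values are illustrative only.
ranks_by_page = {
    "/products": {"pageviews": 2, "seo_landing": 4, "referral_landing": 3},
    "/pricing": {"pageviews": 3, "seo_landing": 6, "referral_landing": 12},
    "/careers": {"pageviews": 41, "seo_landing": 39, "referral_landing": 49},
}


def find_outliers(ranks_by_page, cutoff=10):
    """Return Sitelink pages whose best rank in any metric is worse than cutoff."""
    return [
        path
        for path, ranks in ranks_by_page.items()
        if min(ranks.values()) > cutoff
    ]


print(find_outliers(ranks_by_page))
```

With these numbers only "/careers" is flagged, because it never ranks inside the top 10, while "/pricing" is kept since it ranks well in at least one metric.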
What could this mean?
Google advertises that, in determining relevance in search results, it analyzes more than 100 different website and Web page attributes, including everything from page load times to content freshness to social backlinks. It's very possible that, in selecting Sitelinks, Google analyzes a combination of data points that together determine the overall desirability of a page to the user. This would explain why many of the Sitelink pages ranked well in pageviews, landing pageviews, and time on page, and had low exit rates. That still leaves two unanswered questions. First, why did Google leave out of the Sitelinks some pages that ranked very favorably in the same data points? Second, why did every site have outliers among its Sitelinks, pages that did not seem to rank well in important data points such as landing pageviews, SEO ranking, bounce rates, or pageviews?
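To illustrate how a combination of data points could behave differently from any single one, here is a hedged sketch of a weighted composite score. The metrics, the equal weights, and the scoring formula are all assumptions made up for this example. Nothing here is known to reflect how Google actually selects Sitelinks.

```python
def composite_score(ranks, weights, total_pages):
    """Weighted average of rank-based scores in [0, 1].

    A page ranked first in every metric scores 1.0 and a page ranked
    last in every metric scores 0.0.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for metric, weight in weights.items():
        # Convert a rank (1 = best) so that rank 1 -> 1.0 and last rank -> 0.0.
        normalized = (total_pages - ranks[metric]) / (total_pages - 1)
        score += weight * normalized
    return score / total_weight


# Hypothetical weights and ranks on a site with 11 pages.
weights = {"pageviews": 1.0, "time_on_page": 1.0, "exit_rate": 1.0}
spiky_page = {"pageviews": 1, "time_on_page": 11, "exit_rate": 11}
steady_page = {"pageviews": 3, "time_on_page": 3, "exit_rate": 3}

spiky = composite_score(spiky_page, weights, total_pages=11)
steady = composite_score(steady_page, weights, total_pages=11)
print(round(spiky, 3), round(steady, 3))
```

Under these assumptions, the page that is consistently good across all metrics outscores the page that tops pageviews but ranks last elsewhere, which is one way a page could look unremarkable in a single report yet still be favored overall.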
But What About the Outliers?
With each site and each set of Sitelinks we see outliers. That is, Google is including pages in the Sitelinks that don't appear to be the healthiest pages in traffic acquisition, engagement, or any other supporting data point. Why is Google selecting these pages? Below are three possible explanations:
The method that Google uses to determine Sitelinks is still something of a mystery. But we have correlational data suggesting that Google anticipates the popularity and health of pages within a site, while in many cases also including one Sitelink that is not typically very popular on the site. Google will likely continue to tweak the Sitelinks algorithm over time, and as it does, it will expose more of the attributes the algorithm uses. For now, it seems that the best method we have for urging Google to use specific Sitelinks is to position those pages within the site with the right links and content to give them healthy traffic acquisition and healthy content engagement attributes.
Mark leads the analyst team to develop ROI goals, data strategies, and digital channel reporting, and to establish processes for data analysis for EXTRACTABLE clients. Since joining EXTRACTABLE 14 years ago, he has worked on numerous high-profile websites including Yahoo, DirecTV, Visa, FedEx, and HTC. The most trafficked web page that he's ever worked on received 15 million unique visitors in one day. He has run analytics analysis on over 150 sites, and the biggest ROI he's ever seen on a corporate website redesign was more than 800 percent. He is an active member of the Digital Analytics Association and has contributed to the DAA Education Committee for over five years.