Comparing Google Sitelinks With Google Analytics Data

Google Sitelinks are something of a mystery. Google does not publish the algorithm (or give good hints) for how it generates Sitelinks. However, when we look at the Sitelinks for our own sites, we immediately see that Google has cleverly found some of the most popular pages on a site and surfaced them in the search results.


Google’s documentation on Sitelinks does not elaborate on how they are selected, but it does give the following clue.

“Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they’re looking for.”

Sitelinks often appear on branded searches, so the search term does not always give Google evidence of the visitor’s intent. We therefore have to guess how Google decides what will “save users time” without knowing exactly what the user is looking for.

To better understand how Google chooses which pages appear as Sitelinks, I decided to compare the Sitelink pages for 10 websites with their Google Analytics data. Examining Web analytics data on site content to try to “save users time” is something we do on a daily basis. My initial hunch was that Google has a clever way of determining the popularity of site pages without access to the site’s analytics.

To test that assumption, I first assembled a small index of websites across multiple industries (e.g., finance, high technology, enterprise software, health care, professional services, and e-commerce). Then, for each page that appears in a site’s Sitelinks, I examined data points that help indicate popularity, such as pageviews, landing pageviews, repeat visits, time on page, and assisted conversions.
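The ranking step in this kind of comparison can be sketched in a few lines of Python. This is only an illustration, assuming one metric (such as landing pageviews) has already been exported from the analytics tool into a dictionary keyed by page path. The page paths, the numbers, and the function names are hypothetical and are not taken from any of the 10 sites.

```python
def metric_ranks(metric_by_page):
    """Rank pages by a metric, highest value first (1 = most popular).

    Ties keep the order in which pages appear in the dictionary.
    """
    ordered = sorted(metric_by_page, key=metric_by_page.get, reverse=True)
    return {page: position for position, page in enumerate(ordered, start=1)}


def sitelink_ranks(metric_by_page, sitelinks):
    """Return (page, rank) pairs for each Sitelink, in Sitelink order.

    A Sitelink that is missing from the export gets a rank of None.
    """
    ranks = metric_ranks(metric_by_page)
    return [(page, ranks.get(page)) for page in sitelinks]


# Hypothetical export: landing pageviews per page path.
landing_pageviews = {
    "/": 5000,
    "/pricing": 1200,
    "/blog": 3100,
    "/about": 400,
    "/contact": 250,
}

# Hypothetical Sitelinks, in the order they appear in the search result.
sitelinks = ["/", "/pricing", "/contact"]

print(sitelink_ranks(landing_pageviews, sitelinks))
# [('/', 1), ('/pricing', 3), ('/contact', 5)]
```

Running the same function once per metric (pageviews, landing pageviews, and so on) produces one column of rankings per metric for each Sitelink.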

For the first data points, I looked at how the Sitelink pages ranked as landing pages overall, as landing pages for organic search (SEO) traffic, and as landing pages for referral traffic. Below are three examples of the sites and how the pages ranked.

Company 1  Ranking in Landing Pages  Ranking in SEO Landing Pages  Ranking in Referral Landing Pages 
 1st Sitelink 1st 1st 1st
 2nd Sitelink 7th 7th  5th 
 3rd Sitelink 5th  5th  3rd 
 4th Sitelink 30th  27th  47th 
 5th Sitelink 8th  30th  21st 
 6th Sitelink 46th  29th 



Company 2  Ranking in Landing Pages  Ranking in SEO Landing Pages  Ranking in Referral Landing Pages 
 1st Sitelink  3rd 1st  4th
 2nd Sitelink  19th 4th  43rd 
 3rd Sitelink  2nd 2nd  3rd 
 4th Sitelink 27th  6th  45th 
 5th Sitelink 45th  13th  121st 
 6th Sitelink 11th  3rd  8th


Company 3  Ranking in Landing Pages  Ranking in SEO Landing Pages  Ranking in Referral Landing Pages 
 1st Sitelink 51st 39th 49th
 2nd Sitelink 3rd 7th 5th
 3rd Sitelink 7th 25th 7th
 4th Sitelink 35th 57th 70th

Looking at this data, it quickly becomes apparent that a page’s status as a Sitelink is not directly correlated with its ability to acquire traffic through any organic or referral channel. Some pages appear to be closely correlated, and others do not seem to be correlated with traffic acquisition at all. For example, the page that appears as the first Sitelink for Company 3 is the 39th most popular SEO landing page on the site and the 49th most popular referral landing page on the site.
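One way to put a number on “loosely correlated” is a Spearman rank correlation between the Sitelink position and the page’s landing page ranking. The sketch below is my own illustration rather than part of the original analysis. It uses the overall landing page rankings from the Company 2 and Company 3 tables, and assumes there are no tied values. With only four to six Sitelinks per site, the result is a rough indicator and not a statistically meaningful test.

```python
def ranks(values):
    """Return 1-based ranks (smallest value = 1). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists without ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Overall landing page rankings, listed in Sitelink order (from the tables).
company_2 = [3, 19, 2, 27, 45, 11]
company_3 = [51, 3, 7, 35]

rho_2 = spearman(list(range(1, 7)), company_2)  # roughly 0.43
rho_3 = spearman(list(range(1, 5)), company_3)  # roughly -0.20
print(round(rho_2, 2), round(rho_3, 2))
```

A value near 1 would mean the Sitelink order closely follows the landing page ranking. The moderate value for Company 2 and the slightly negative value for Company 3 are consistent with a loose relationship at best.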

Next I looked at engagement data such as pageviews, a page’s ability to drive repeat traffic, time on page, bounce rates, and exit rates. The following are examples of some of the engagement data of the Sitelink pages.

Company 1  Ranking in Pageviews  Time on Page  Repeat Visits  Exit Rate 
 1st Sitelink 4th 0:53  23%  1.1% 
2nd Sitelink 8th 1:57  38%  1.5% 
 3rd Sitelink 6th  1:22  19%  2.5% 
 4th Sitelink 11th  1:44  19%  8.3% 
 5th Sitelink 19th  3:24  34%  40% 
 6th Sitelink 27th  1:46  41%  25%


Company 2  Ranking in Pageviews  Time on Page  Repeat Visits  Exit Rate 
 1st Sitelink 2nd 2:49 32% 44%
 2nd Sitelink 10th 3:28 29% 53%
 3rd Sitelink 1st 3:05 41% 53%
 4th Sitelink 17th 0:37 21% 51%
 5th Sitelink 7th 1:24 57% 21%
 6th Sitelink 5th 3:00 67% 59%


Company 3  Ranking in Pageviews  Time on Page  Repeat Visits  Exit Rate 
 1st Sitelink 17th 4:53 17% 32%
 2nd Sitelink 1st 6:21 20% 48% 
 3rd Sitelink 3rd 3:47 21% 33%
 4th Sitelink 7th 5:17 19% 39%

As with the traffic acquisition data, we see a loose association between the engagement data points and a page’s status as a Sitelink, along with some outliers. After looking at traffic acquisition data and engagement data for each of the Sitelink pages across the 10 sites, I also looked at how well the Sitelink pages assisted in goal conversions, as well as auxiliary data points such as page depth and backlinks. For every data point I came to the same conclusion: there appears to be a loose association (not a direct one), and there are always outliers.

What could this mean?

Google says that, in determining relevance in search results, it analyzes more than 100 different website and Web page attributes, including everything from page load times to content freshness to social backlinks. It’s very possible that, in choosing Sitelinks, Google analyzes a combination of data points that together determine the overall desirability of a page to the user. This would explain why many of the Sitelink pages correlated closely with healthy rankings in pageviews, landing pageviews, and time on page, and with low exit rates. It leaves unanswered why, on each site, Google left out of the Sitelinks some pages that ranked very favorably on the same data points. It also leaves unanswered why each site had outliers among its Sitelinks. These outliers did not seem to rank well on important data points such as landing pageviews, SEO ranking, bounce rates, or pageviews.
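To make the idea of an outlier concrete, the sketch below flags a Sitelink when none of its rankings falls inside the top N pages for the site. The cutoff of 10 is an arbitrary assumption of mine, not a threshold used in the analysis, and the function name is hypothetical. The sample rankings are the landing page, SEO landing page, and referral landing page rankings from the Company 3 traffic table.

```python
def find_outliers(ranks_by_sitelink, top_n=10):
    """Return Sitelinks whose rank is worse than top_n on every metric.

    top_n is an assumed cutoff; adjust it to the size of the site.
    """
    return [
        name
        for name, metric_ranks in ranks_by_sitelink.items()
        if all(rank > top_n for rank in metric_ranks)
    ]


# (landing, SEO landing, referral landing) rankings for Company 3.
company_3 = {
    "1st Sitelink": (51, 39, 49),
    "2nd Sitelink": (3, 7, 5),
    "3rd Sitelink": (7, 25, 7),
    "4th Sitelink": (35, 57, 70),
}

print(find_outliers(company_3))
# ['1st Sitelink', '4th Sitelink']
```

Under this assumed cutoff, the first and fourth Sitelinks for Company 3 are the ones that traffic acquisition data alone does not explain.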

But What About the Outliers?

With each site and each set of Sitelinks we see outliers. That is, Google is including pages in the Sitelinks that don’t appear to be the healthiest pages in traffic acquisition, engagement, or any other supporting data point. Why is Google selecting these pages? Below are three possible explanations:

  1. Something for Everyone: Google is trying to provide convenient links deeper into a site for different audiences, so the outliers are chosen for the variety that they provide.
  2. Search Trends: With each group of Sitelinks, Google includes one page that may not have healthy usage data but that correlates well with new search trends and search phrase popularity.
  3. Testing: Google is selecting pages on sites somewhat randomly to test the click-through rates of those pages within the Sitelinks. This might be Google’s way of recognizing that the corporate site navigation isn’t always the most effective at surfacing great content. If Google can A/B/n test Sitelinks to this deeper content, it might be able to find great content deep within a site without estimating usage data.

The method that Google uses to determine Sitelinks is still something of a mystery. But we have correlational data suggesting that Google is anticipating the popularity and health of pages within a site and, in many cases, including one Sitelink that is not typically very popular on the site. Google will likely continue to tweak the Sitelinks algorithm over time, and as it does, it will expose more of the attributes the algorithm relies on. For now, it seems that the best method we have for urging Google to use specific Sitelinks is to position the pages within the site with the right links and content to give them healthy traffic acquisition and healthy content engagement attributes.
