Comparing Google Sitelinks With Google Analytics Data
In an effort to discover Google's methods for determining what appears in Sitelinks, here is an in-depth look at the Google Analytics data on the Sitelink pages for 10 websites.
Google Sitelinks are something of a mystery. Google does not publish the algorithm it uses to generate Sitelinks, and it gives few useful hints. However, when we look at the Sitelinks for our own sites, we quickly notice that Google has found some of the most popular pages on the site and surfaced them in the search results.
In Google’s documentation on Sitelinks, they do not elaborate on their algorithm for selection of Sitelinks, but they do give the following clue.
“Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they’re looking for.”
Sitelinks often appear on branded searches, so the search term does not always give Google evidence of the visitor’s intent. We therefore have to guess how Google decides what will “save users time” without knowing exactly what the user is looking for.
To understand how Google chooses which pages appear as Sitelinks, I decided to look at the Google Analytics data on the Sitelink pages for 10 websites. Examining Web analytics data on site content to try to “save users time” is something we do on a daily basis. My initial hunch is that Google has a clever way of estimating the popularity of site pages without access to the website’s analytics, so comparing the pages that appear in Sitelinks with their Google Analytics data seemed a good way to test that assumption.
First I assembled a small set of websites across multiple industries (e.g. finance, high technology, enterprise software, health care, professional services, and e-commerce). Then, for each page that appears in the Sitelinks for a site, I examined data points that indicate popularity, such as pageviews, landing pageviews, repeat visits, time on page, and assisted conversions.
For the first data points, I looked at how each Sitelink page ranked among the site’s pages at acquiring traffic overall, organic traffic, and referral traffic. Below are three examples of the sites and how the pages ranked.
Company 1 | Ranking in Landing Pages | Ranking in SEO Landing Pages | Ranking in Referral Landing Pages |
1st Sitelink | 1st | 1st | 1st |
2nd Sitelink | 7th | 7th | 5th |
3rd Sitelink | 5th | 5th | 3rd |
4th Sitelink | 30th | 27th | 47th |
5th Sitelink | 8th | 30th | 21st |
6th Sitelink | 46th | 29th | 13th |
Company 2 | Ranking in Landing Pages | Ranking in SEO Landing Pages | Ranking in Referral Landing Pages |
1st Sitelink | 3rd | 1st | 4th |
2nd Sitelink | 19th | 4th | 43rd |
3rd Sitelink | 2nd | 2nd | 3rd |
4th Sitelink | 27th | 6th | 45th |
5th Sitelink | 45th | 13th | 121st |
6th Sitelink | 11th | 3rd | 8th |
Company 3 | Ranking in Landing Pages | Ranking in SEO Landing Pages | Ranking in Referral Landing Pages |
1st Sitelink | 51st | 39th | 49th |
2nd Sitelink | 3rd | 7th | 5th |
3rd Sitelink | 7th | 25th | 7th |
4th Sitelink | 35th | 57th | 70th |
Looking at this data, it quickly becomes apparent that Sitelinks are not directly correlated with a page’s ability to acquire traffic through any organic or referral channel. Some pages appear to be closely correlated, and others do not seem to be correlated with traffic acquisition at all. For example, the page that appears as the first Sitelink for Company 3 is only the 39th most popular SEO landing page on the site and the 49th most popular referral landing page.
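One way to put a number on "loosely correlated" is a Spearman rank correlation between a page's Sitelink position and its rank as a landing page. The sketch below is only an illustration, not the method used for this analysis: it uses the "Ranking in Landing Pages" column from the three tables above, the function names are my own, and with only four to six Sitelinks per site the result is indicative at best.

```python
# Spearman rank correlation between Sitelink position and landing page rank.
# Data: "Ranking in Landing Pages" column from the three example tables.
# Function names are illustrative; ties are not handled in this sketch.

def to_ranks(values):
    """Rank values from 1 (smallest) upward; assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("ties are not handled in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rho for two equal-length sequences without ties."""
    n = len(xs)
    rx, ry = to_ranks(xs), to_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Landing page rank of the 1st, 2nd, 3rd, ... Sitelink for each company.
landing_rank = {
    "Company 1": [1, 7, 5, 30, 8, 46],
    "Company 2": [3, 19, 2, 27, 45, 11],
    "Company 3": [51, 3, 7, 35],
}

rho = {}
for company, ranks in landing_rank.items():
    positions = list(range(1, len(ranks) + 1))
    rho[company] = spearman(positions, ranks)
    print(f"{company}: rho = {rho[company]:.2f}")
```

On this data the coefficient is roughly 0.89 for Company 1, 0.43 for Company 2, and -0.20 for Company 3, which matches the impression from the tables: a strong relationship on one site, a weak one on another, and none on the third. Note that this only measures whether Sitelink order follows landing page rank among the Sitelink pages themselves; it says nothing about high-ranking pages that Google left out.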
Next I looked at engagement data such as pageviews, a page’s ability to drive repeat traffic, time on page, bounce rates, and exit rates. The following are examples of some of the engagement data of the Sitelink pages.
Company 1 | Ranking in Pageviews | Time on Page | Repeat Visits | Exit Rate |
1st Sitelink | 4th | 0:53 | 23% | 1.1% |
2nd Sitelink | 8th | 1:57 | 38% | 1.5% |
3rd Sitelink | 6th | 1:22 | 19% | 2.5% |
4th Sitelink | 11th | 1:44 | 19% | 8.3% |
5th Sitelink | 19th | 3:24 | 34% | 40% |
6th Sitelink | 27th | 1:46 | 41% | 25% |
Company 2 | Ranking in Pageviews | Time on Page | Repeat Visits | Exit Rate |
1st Sitelink | 2nd | 2:49 | 32% | 44% |
2nd Sitelink | 10th | 3:28 | 29% | 53% |
3rd Sitelink | 1st | 3:05 | 41% | 53% |
4th Sitelink | 17th | 0:37 | 21% | 51% |
5th Sitelink | 7th | 1:24 | 57% | 21% |
6th Sitelink | 5th | 3:00 | 67% | 59% |
Company 3 | Ranking in Pageviews | Time on Page | Repeat Visits | Exit Rate |
1st Sitelink | 17th | 4:53 | 17% | 32% |
2nd Sitelink | 1st | 6:21 | 20% | 48% |
3rd Sitelink | 3rd | 3:47 | 21% | 33% |
4th Sitelink | 7th | 5:17 | 19% | 39% |
As with the traffic acquisition data, we see a loose relationship between the engagement data points and a page’s status as a Sitelink, along with some outliers. After looking at traffic acquisition data and engagement data for each of the Sitelink pages across 10 sites, I also looked at how well the Sitelink pages assisted in goal conversions, as well as auxiliary data points such as page depth and backlinks. For every data point I came to the same conclusion: there appears to be a loose relationship (not a direct one), and there are always outlier records.
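A simpler check than a correlation is to ask what share of a site's Sitelinks fall within its top pages for a given metric. The sketch below does this for the "Ranking in Pageviews" column of the three tables above. The cutoff of 10 is an arbitrary assumption chosen for illustration, and the function name is hypothetical.

```python
# Share of Sitelink pages that rank within the top N pages by pageviews.
# Data: "Ranking in Pageviews" column from the three example tables.
# The top-10 cutoff is an arbitrary choice made for this illustration.

pageview_rank = {
    "Company 1": [4, 8, 6, 11, 19, 27],
    "Company 2": [2, 10, 1, 17, 7, 5],
    "Company 3": [17, 1, 3, 7],
}

def share_in_top(ranks, n=10):
    """Fraction of the given ranks that are n or better."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

shares = {company: share_in_top(ranks) for company, ranks in pageview_rank.items()}

for company, share in shares.items():
    print(f"{company}: {share:.0%} of Sitelinks are top-10 pages by pageviews")
```

With this cutoff, 3 of 6 Sitelinks for Company 1, 5 of 6 for Company 2, and 3 of 4 for Company 3 are top-10 pages by pageviews. Most Sitelinks are popular pages, but each site has at least one that is not.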
What could this mean?
Google advertises that, in determining relevance in search results, it analyzes more than 100 different website and Web page attributes, including everything from page load times to content freshness to social backlinks. It’s very possible that, in selecting Sitelinks, Google analyzes a combination of data points that together indicate the overall desirability of the page to the user. This would explain why many of the Sitelink pages correlated closely with healthy rankings in pageviews, landing pageviews, time on page, and low exit rates. It leaves two questions unanswered. First, why did Google leave out of the Sitelinks some pages on each site that ranked very favorably on the same data points? Second, why did we see outliers in the Sitelinks for each site? These outliers did not seem to rank well on important data points such as landing pageviews, SEO ranking, bounce rates, or pageviews.
But What About the Outliers?
With each site and each set of Sitelinks we see outliers. That is, Google is including pages in the Sitelinks that don’t appear to be the healthiest pages in traffic acquisition, engagement, or any other supporting data point. Why is Google selecting these pages? Below are three possible explanations:
The method that Google uses to determine Sitelinks is still something of a mystery. But we have correlational data suggesting that Google is anticipating the popularity and health of pages within a site, and in many cases including one Sitelink that is not typically very popular on the site. Google will likely continue to tweak the Sitelinks algorithm over time, and as it does, more of the algorithm’s attributes may become visible. For now, it seems that the best method we have for urging Google to use specific Sitelinks is to position the pages within the site with the right links and content to give them healthy traffic acquisition and healthy content engagement attributes.