Site Extension: Big Data in Practice

“Big data” promises a leap forward in online advertising sophistication, but many media buyers are struggling to bridge the gap from nice-to-know analytics to actionable insights. The claim is that massive volumes of semi-structured data coupled with highly scalable computing platforms enable advertisers to answer questions that were previously prohibitively complex, and then leverage those insights to refine media buying strategies. At a macro level, this line of logic appears sound, but putting theory into practice has many advertisers scratching their heads. What specific questions can big data answer, and how can those insights be applied to digital media buying?

One powerful application of big data analytics is the development of custom content channels to meet specific advertising goals. Imagine an advertiser who has found that impressions served on a particular motorcycle enthusiast site are very effective in meeting campaign objectives (high response rate, high brand recall). The advertiser wants to allocate additional budget to that site and would even be willing to pay a premium rate for incremental impressions. But at a certain point, the site won’t be able to offer any additional scale. Where else should this advertiser buy ad space, and what inventory is most likely to perform well?

The traditional approach is to manually construct a custom channel of sites similar to the original site by scouring the comScore 500 for sites with motorcycle-related content. Media buyers may also use tools like Nielsen’s @Plan to identify sites in related verticals such as action sports and autos. These manual efforts are time-intensive and imprecise, and they inevitably miss some high-quality sites. Additionally, manually constructed custom channels typically require multiple rounds of testing and refinement to achieve acceptable performance. Big data analytics can provide a much more robust solution.

“Site extension” is the big data approach to this problem: a method of precisely and efficiently reaching target audiences across online display. At the most basic level, site extension identifies a custom network of sites whose visitor populations are very similar. If a campaign is successful on one site in the network, that performance can then be extended across the other sites. The solution requires two key ingredients: a massive data warehouse containing information on billions of monthly impressions, and a flexible analytics platform that enables event-level reporting.
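To make “very similar visitor populations” concrete, here is a minimal Python sketch that collapses event-level impression rows into per-site visitor sets and measures the overlap between two sites. The (user_id, site) record layout, the site names, and the choice of Jaccard similarity as the overlap measure are illustrative assumptions, not the method any particular analytics platform uses.

```python
from collections import defaultdict

# Hypothetical event-level impression records: (user_id, site).
# A real warehouse holds billions of these per month; a few rows suffice here.
impressions = [
    ("u1", "siteA"), ("u1", "siteB"),
    ("u2", "siteA"), ("u2", "siteB"),
    ("u3", "siteA"), ("u3", "siteC"),
    ("u4", "siteB"), ("u5", "siteC"),
]


def visitors_by_site(records):
    """Collapse impression rows into the set of distinct users seen on each site."""
    sites = defaultdict(set)
    for user_id, site in records:
        sites[site].add(user_id)
    return sites


def audience_overlap(a, b):
    """Jaccard similarity (an assumed measure): shared visitors / combined visitors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


sites = visitors_by_site(impressions)
print(audience_overlap(sites["siteA"], sites["siteB"]))  # 0.5
print(audience_overlap(sites["siteA"], sites["siteC"]))  # 0.25
```

Jaccard similarity is symmetric, so it folds both directions of overlap into one number; the seed-group process separates them, which is what produces the three categories below.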

Here’s how this works for the motorcycle site. The site extension process first defines a “seed group” of users who frequently visit that site. The process then iterates through every other site that the seed group visits (hundreds of thousands of them) and assigns each a quality score based on how effectively it attracts the seed group. The results fall into three categories:

  • Some sites very rarely attract users from the seed group.
  • Other sites attract a large portion of the seed group, but also attract many users who aren’t in the seed group.
  • Some sites attract an audience that heavily overlaps with the seed group: they attract a large portion of the seed group, and those users make up a large percentage of the site’s total visitor population. These sites belong to the network.
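The seed-group process and its three outcomes can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the “moto” stand-in for the original site, the sample log, the rule that a user joins the seed group after three impressions on that site, the 0.5 cut-offs, and the quality score itself, taken here as the harmonic mean of coverage (share of the seed group a site reaches) and concentration (share of the site’s visitors who are in the seed group).

```python
from collections import Counter, defaultdict

MIN_SEED_VISITS = 3       # assumed definition of "frequently visits"
COVERAGE_MIN = 0.5        # assumed cut-off for "a large portion of the seed group"
CONCENTRATION_MIN = 0.5   # assumed cut-off for "a large percentage of visitors"

# Hypothetical impression log: (user_id, site). "moto" is the original site.
impressions = (
    [(u, "moto") for u in ("s1", "s2", "s3") for _ in range(3)]  # frequent visitors
    + [("x1", "moto")]                                             # one-off visitor
    + [(u, "niche") for u in ("s1", "s2", "s3", "y1")]
    + [(u, "portal") for u in ("s1", "s2", "s3")]
    + [(f"p{i}", "portal") for i in range(20)]
    + [("s1", "recipes")] + [(f"r{i}", "recipes") for i in range(5)]
)


def seed_group(records, seed_site, min_visits=MIN_SEED_VISITS):
    """Users with at least min_visits impressions on the seed site."""
    counts = Counter(user for user, site in records if site == seed_site)
    return {user for user, n in counts.items() if n >= min_visits}


def score_sites(records, seed_site):
    """Score and categorize every other site the seed group visits."""
    seed = seed_group(records, seed_site)
    visitors = defaultdict(set)
    for user, site in records:
        if site != seed_site:
            visitors[site].add(user)

    results = {}
    for site, users in visitors.items():
        overlap = len(users & seed)
        if overlap == 0:
            continue  # only sites the seed group actually visits are scored
        coverage = overlap / len(seed)         # share of the seed group reached
        concentration = overlap / len(users)   # share of the site's audience in the seed group
        score = 2 * coverage * concentration / (coverage + concentration)
        if coverage < COVERAGE_MIN:
            category = "rarely attracts seed group"
        elif concentration < CONCENTRATION_MIN:
            category = "broad audience"
        else:
            category = "network"
        results[site] = (round(score, 3), category)
    return results


for site, (score, category) in sorted(score_sites(impressions, "moto").items()):
    print(site, score, category)
# niche 0.857 network
# portal 0.231 broad audience
# recipes 0.222 rarely attracts seed group
```

The harmonic mean is used because it stays low unless both ratios are high, which mirrors the third category above; a real scoring model could weight the two ratios differently.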


In some cases, these overlapping sites are obvious and would likely have been identified by a traditional manual approach. However, a manual approach would likely miss many sites identified by the site extension process, either because the site is small or because its content is not obviously correlated with motorcycle enthusiasts. In addition to improved completeness, the site extension approach has a precision advantage. Media buyers who build content channels manually often struggle to understand which sites in the network perform best and which perform worst. Because the site extension process assigns each site a quality score based on how effectively it attracts the seed group, the advertiser can make informed upfront decisions about which sites to include in the network and select only those with a high quality score.
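Turning quality scores into a channel is then a filter-and-rank step. A minimal Python sketch, where the site names, scores, the 0.6 cut-off, and the optional size cap are all hypothetical values chosen purely for illustration:

```python
def build_channel(scores, min_score=0.6, max_sites=None):
    """Return site names whose quality score clears min_score, best first.

    min_score and max_sites are illustrative knobs, not standard settings.
    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(
        (site for site, score in scores.items() if score >= min_score),
        key=lambda site: (-scores[site], site),
    )
    return ranked if max_sites is None else ranked[:max_sites]


# Hypothetical quality scores for illustration only.
scores = {"niche": 0.86, "trackdays": 0.74, "gearshop": 0.61, "portal": 0.23, "recipes": 0.22}
print(build_channel(scores))               # ['niche', 'trackdays', 'gearshop']
print(build_channel(scores, max_sites=2))  # ['niche', 'trackdays']
```

Keeping the full ranked list, rather than a bare yes/no membership, is what lets a buyer see which sites are expected to perform best and trim the channel from the bottom.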

Despite the massive computational complexity of the site extension approach, sophisticated analytics platforms can complete the entire job in minutes. The output is a rigorously defined custom channel that can be applied immediately to exchange-based campaigns. Site extension insights can also inform broader media buying decisions by prioritizing potential upfront deals and assessing appropriate pricing.

Site extension is just one of many applications of big data analytics currently being developed in the advertising space. Expect many more similar tools to emerge in the coming months. Ad tech companies are constantly fielding questions from advertisers that would have been unanswerable 12 months ago. Keep the questions coming, and we’ll keep finding new ways to leverage big data to answer them.
