Recently I attended the Association for Computing Machinery’s (ACM) annual Knowledge Discovery and Data Mining (KDD’11) conference in San Diego to present a research paper on data-driven attribution modeling. The idea of data-driven attribution started exactly one year ago when I attended KDD’10. I noticed that at the largest data-mining conference there was a lot of interest in predictive modeling but much less in interpretive modeling. To put it more simply, people spend a lot of effort (R&D cycles, budgets, etc.) on “what” can happen and not much on explaining “why” something happens. In digital advertising, explaining “why” a user takes an action falls within the area of attribution.
In previous columns, I’ve addressed why the correct attribution model is important (“New Year, New Attribution Model,” “Attribution in Real-Time Audience Targeting”). This month, I’d like to talk about the math behind the attribution model.
Before we dive into the math, let’s explore why interpretive models (which are used for attribution) are so different from predictive models. To illustrate the point, let’s use a pancake and butter example. Let’s say we want to build a predictive model to predict whether someone will gain weight based on their love for pancakes and their love for butter. After some number crunching, the model finds that the love for butter is a much better indicator of weight gain, and that there’s no need to consider the love for pancakes. Now, if we need to explain “why” someone gained weight, we need to take into account the contribution of not only the butter but also the pancakes. Furthermore, the pancakes consumed may contribute even more to the weight gain than the butter, even though the butter is the better predictor variable.
The KDD paper that I presented proposed a framework for comparing different attribution models, along with two specific modeling methods we came up with. In this column, let’s look at one of the two models: the simple probabilistic model, which is based on the concept of conditional probabilities. For example, if 100 people have received a green envelope and 30 of those people accept the invitation inside the envelope, then the conditional probability of acceptance given the green envelope is 30 percent.
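The envelope arithmetic can be written out as a small sketch. The function name `conditional_probability` is an illustrative choice, not something taken from the paper.

```python
def conditional_probability(accepted: int, received: int) -> float:
    """Estimate P(accept | received) as a simple ratio of counts."""
    if received <= 0:
        raise ValueError("received must be a positive count")
    return accepted / received


# 30 of the 100 people who received a green envelope accepted.
p_accept = conditional_probability(accepted=30, received=100)
print(f"{p_accept:.0%}")  # prints 30%
```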
If we encode every attribute of the ad placement, such as layout, creative design, site, time, user demographics, and user behavior, then every ad touch point (online display ad, video ad, search click, or site visit) can be represented by a combination of these attributes. We can then calculate the conditional probability of conversion for each type of touch point. Knowing that some of these touch points interact with each other, we further calculate the joint conditional probabilities for all the pair-wise combinations of these touch points. Once we have obtained all the probability measures from historical campaign data, we are ready to compute the actual attribution assignment for each incoming user.
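As a rough sketch of that counting step, the snippet below estimates the single and pair-wise conditional probabilities from a handful of user journeys. It assumes each touch point has already been reduced to a channel label; the journey data, the `estimate_probabilities` name, and the data layout are all hypothetical and only meant to illustrate the idea.

```python
from collections import Counter
from itertools import combinations


def estimate_probabilities(journeys):
    """journeys: iterable of (touch_points, converted) pairs.

    Returns two dicts: P(convert | channel) and P(convert | channel pair).
    """
    seen_single, conv_single = Counter(), Counter()
    seen_pair, conv_pair = Counter(), Counter()
    for touch_points, converted in journeys:
        # Count each channel once per user; sort so pair keys are stable.
        channels = sorted(set(touch_points))
        for channel in channels:
            seen_single[channel] += 1
            conv_single[channel] += int(converted)
        for pair in combinations(channels, 2):
            seen_pair[pair] += 1
            conv_pair[pair] += int(converted)
    p_single = {c: conv_single[c] / n for c, n in seen_single.items()}
    p_pair = {p: conv_pair[p] / n for p, n in seen_pair.items()}
    return p_single, p_pair


# Hypothetical historical data: (channels seen by a user, did they convert?)
journeys = [
    (["email", "display"], True),
    (["email"], False),
    (["display", "video"], True),
    (["email", "display"], False),
]
p_single, p_pair = estimate_probabilities(journeys)
```

Here `p_pair` is keyed by alphabetically sorted channel pairs, so `("display", "email")` covers every user who saw both channels regardless of order.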
While serious readers can read the actual formula in our paper, let me summarize: for each converted user, every touch point leading up to the conversion is considered. The contribution of each touch point is calculated as the sum of its individual contribution, represented by the probability of this touch point, and its contribution to other touch points via the joint probabilities. Let’s say that for one converted user, email contributes 15 percent alone and makes other channels such as display and video 20 percent better. The overall contribution of email to this conversion is then 35 percent.
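One way to sketch "individual contribution plus contribution via joint probabilities" is shown below. The interaction term here is an assumption: it takes the lift of each pair over the two channels' individual probabilities, splits it evenly between the two channels, and averages over the other channels in the journey. The paper should be consulted for the exact formula; the probabilities and the `raw_contribution` name are hypothetical.

```python
def raw_contribution(channel, journey, p_single, p_pair):
    """Unnormalized contribution of one channel for one converted user."""
    others = [c for c in journey if c != channel]
    own = p_single[channel]
    if not others:
        return own
    lift = 0.0
    for other in others:
        pair = tuple(sorted((channel, other)))
        # Assumed interaction term: what the pair achieves beyond
        # the two channels considered separately.
        lift += p_pair[pair] - p_single[channel] - p_single[other]
    # Half of each pair's lift goes to this channel, averaged over partners.
    return own + lift / (2 * len(others))


# Hypothetical probabilities for illustration only.
p_single_demo = {"email": 0.10, "display": 0.05, "video": 0.05}
p_pair_demo = {
    ("display", "email"): 0.25,
    ("email", "video"): 0.20,
    ("display", "video"): 0.12,
}
journey_demo = ["email", "display", "video"]
email_raw = raw_contribution("email", journey_demo, p_single_demo, p_pair_demo)
```

With these made-up numbers, email's own probability is 0.10 and its averaged share of the pair-wise lift is 0.0375, for a raw contribution of about 0.1375. Under this assumed form a channel's lift can be negative when a pair performs worse than its parts.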
The contributions of all the touch points are then normalized to ensure they always add up to 100 percent. In other words, the credit for the conversion is proportionately assigned to each touch point based on its relative contribution to increasing the user’s probability of converting. This model is entirely data-driven and based on deterministic logic, meaning that the same data set will yield the same results every time.
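The normalization step can be sketched as follows. Email's raw contribution of 0.35 comes from the example above; the display and video values are hypothetical, chosen only so that the raw numbers do not already sum to 100 percent.

```python
def normalize(raw):
    """Scale raw contributions so the credit shares sum to 1 (100 percent)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {channel: value / total for channel, value in raw.items()}


# Email's 0.35 is from the column's example; the other two are made up.
raw = {"email": 0.35, "display": 0.35, "video": 0.10}
credit = normalize(raw)
for channel, share in credit.items():
    print(f"{channel}: {share:.1%}")
```

Because the function contains no randomness, running it twice on the same input gives the same credit split, which is the deterministic property described above.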
The correct attribution model is essential for creating a healthy digital advertising ecosystem. That is why my company has decided to make our research results entirely public at an academic conference for every company or individual to access, understand, and improve upon. For too long our industry has been caught up with “gaming” the traditional last-touch attribution model, and this has been one of the barriers preventing marketers from understanding the full value of digital advertising. Solving the attribution problem will make online advertising dramatically more appealing to brands, and with the additional advertising spend that will shift from offline to online, I think we’ll all be able to celebrate with a big stack of pancakes.