De-Mystifying Models

Many ad networks and ad technology providers claim to build “predictive models” that are used in the course of buying ad space for a campaign and/or figuring out which ad to show for a given impression. Models are relatively new to display advertising, but they’re a common and fairly standard practice in other forms of direct marketing.

In the mid-1990s, “data mining” and statistical modeling became quite fashionable in the direct marketing/database marketing industry. IBM entered the fray with a product called “Intelligent Miner,” and after likely having a few too many refreshments with their ad agency team, the IBM execs agreed to run a TV spot to promote the service.

Somewhere along the way, the idea of explaining “models” to the TV-viewing audience turned into a commercial spot set on the runway of a fashion show: beautiful Swedish models found time mid-catwalk to tell each other about how they were optimizing sales of their own fashion merchandise using data mining from IBM.

It was a little confusing to have fashion models talking about statistical models, but the point was that both are models in the sense that they’re representations of something in the real world. A fashion model is supposed to show you what a particular jacket would look like if you wore it, so you can decide if you want to buy the jacket. A predictive model for real-time display ad media buying is supposed to tell you what would happen if you delivered your ad in a particular impression, so you can decide if you want to buy the impression.

One big difference between fashion models and predictive models for media buying is the desired bias in the models. Marketers in the fashion industry realize that it’s in their interest to make their models a tad optimistic: you won’t actually become tall and thin or get high cheekbones when you buy the jacket. But predictive models for display ad media buying are intended to have no bias, so that they accurately predict the likelihood of response.

Predictive models are different from what you might call “plain old bidding rules.” While models are usually more granular and incorporate many features of the impression blended together with math to arrive at some kind of score or estimated chance of response, rules typically include just a few features.

For example, a bidding rule might be “Bid $2.24 on all impressions for users who’ve seen fewer than three impressions of my ad in the last day.”
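To make the contrast with a model concrete, here is a minimal Python sketch of that rule. The function and constant names are made up for illustration, and returning 0.0 to mean “don’t bid” is an assumption; the rule as stated only says when to bid $2.24.

```python
# Hypothetical names; the values come from the example rule in the text.
FLAT_BID_DOLLARS = 2.24   # the fixed bid named in the rule
FREQUENCY_LIMIT = 3       # bid only while the user has seen fewer than three impressions


def rule_bid(impressions_seen_last_day: int) -> float:
    """Return the flat bid if the user is under the frequency limit, else 0.0 (no bid)."""
    if impressions_seen_last_day < FREQUENCY_LIMIT:
        return FLAT_BID_DOLLARS
    return 0.0  # assumption: 0.0 stands for "do not bid"
```

Notice that the rule looks at a single feature and produces the same bid for every impression that passes the test.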

A predictive model might be: “Take 0.92 and multiply it by itself as many times as the user has seen this ad in the last day, then multiply that number by .013 if the current page is on Yahoo, or else .006 if it’s on another site. Then multiply by the historical post-click conversion rate for this ad, and this is the estimated chance that the user will convert if we buy this impression for our ad.” In this case, we’re assuming the goal is predicting conversions, but you can model anything you can measure, including ad engagement, clicks, or other kinds of goals.
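Written as code, the example model is just a product of three factors. The Python below is a sketch of that description only; the function name, the parameter names, and the way the site is identified (a plain string compared to "yahoo") are assumptions made for illustration.

```python
FREQUENCY_DECAY = 0.92      # multiplied by itself once per prior view of the ad
YAHOO_FACTOR = 0.013        # site factor when the page is on Yahoo
OTHER_SITE_FACTOR = 0.006   # site factor for any other site


def estimated_conversion_chance(frequency: int,
                                site: str,
                                post_click_conversion_rate: float) -> float:
    """Estimate the chance that buying this impression leads to a conversion."""
    decay = FREQUENCY_DECAY ** frequency
    # Assumption: the site arrives as a simple label such as "yahoo".
    site_factor = YAHOO_FACTOR if site.lower() == "yahoo" else OTHER_SITE_FACTOR
    return decay * site_factor * post_click_conversion_rate
```

For a user who has seen the ad three times on a non-Yahoo page, with a one-in-100 post-click conversion rate, `estimated_conversion_chance(3, "other", 0.01)` comes out to roughly 0.0000467.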

The way a predictive model works with real-time bidding on exchanges is that your bidding server software basically gets a poke many times per second, where the exchange says, “OK, I’ve got browser #AH842DEH19 on right now and I need a 300×250 ad…what do you bid?” The bid server would look up the user’s frequency (say it’s three) and the ad’s historical post-click conversion rate (let’s say it’s one in 100), then, assuming the page isn’t on Yahoo, multiply (.92 by .92 by .92) by .006 by .01 to get 0.0000467, or roughly .005 percent, as the chance this particular impression will yield a conversion. If the target CPA (cost per acquisition) is $40, then we can afford to bid $40 times .0000467, or $.00187, for this impression, which equates to a $1.87 CPM (cost per thousand impressions) rate.
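The step from a predicted conversion chance to a bid can be checked with a few lines of Python. This is a sketch of the arithmetic in the example only, not of a real bid server: the function names are hypothetical, and it assumes the most we can afford per impression is simply the conversion chance times the target cost per conversion.

```python
def max_bid_per_impression(conversion_chance: float, target_cpa: float) -> float:
    """Most we can pay for one impression and still hit the target cost per conversion."""
    return conversion_chance * target_cpa


def to_cpm(bid_per_impression: float) -> float:
    """Convert a per-impression price to a price per thousand impressions."""
    return bid_per_impression * 1000


# Worked example: frequency of three, non-Yahoo page, 1-in-100 conversion rate.
chance = 0.92 ** 3 * 0.006 * 0.01
bid = max_bid_per_impression(chance, target_cpa=40.0)
cpm = to_cpm(bid)

print(f"chance={chance:.7f} bid=${bid:.5f} cpm=${cpm:.2f}")
# prints: chance=0.0000467 bid=$0.00187 cpm=$1.87
```

Exchanges generally quote prices per thousand impressions, which is why the final step multiplies by 1,000.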

There’s more math to it that deals with how we figure out which features of the impression we should include in the model in the first place, what the various coefficients should be (for example, why .92 is a good decay factor for frequency), how to deal with scenarios when we don’t yet have enough data to estimate things like the historical post-click conversion rate, what to do when we’re bidding on behalf of not just one ad but many ads, and how to do “portfolio optimization” across multiple exchanges.

Good models always win versus guesswork because there are just too many factors for a person to pay attention to. Also, intuition about the kinds of users that will respond to an ad, or the kinds of Web sites they can be found on, is often good but incomplete. Models can almost always find subsegments within an intuited audience that are inefficient or, conversely, new segments of inventory to buy that outperform.
