De-Mystifying Models
Many ad networks and ad technology providers claim to build "predictive models" that are used in the course of buying ad space for a campaign and/or figuring out which ad to show to a given impression. Models are relatively new to display advertising, but they're standard practice in other forms of direct marketing.
In the mid-1990s, "data mining" and statistical modeling became quite fashionable in the direct marketing/database marketing industry. IBM entered the fray with a product called "Intelligent Miner," and after likely having a few too many refreshments with their ad agency team, the IBM execs agreed to run a TV spot to promote the service.
Somewhere along the way, the idea of explaining "models" to the TV-viewing audience turned into a commercial spot set on the runway of a fashion show: beautiful Swedish models found time mid-catwalk to tell each other about how they were optimizing sales of their own fashion merchandise using data mining from IBM.
It was a little confusing to have fashion models talking about statistical models, but the point was that both are models in the sense that they're representations of something in the real world. A fashion model is supposed to show you what a particular jacket would look like if you wore it, so you can decide if you want to buy the jacket. A predictive model for real-time display ad media buying is supposed to tell you what would happen if you delivered your ad in a particular impression, so you can decide if you want to buy the impression. One big difference between fashion models and predictive models for media buying is the desired bias in the models. Marketers in the fashion industry realize that it's in their interest to make their models a tad optimistic -- you won't actually become tall and thin or get high cheekbones when you buy the jacket. But predictive models for display ad media buying are intended to have no bias, so that they accurately predict the likelihood of response.
Predictive models are different from what you might call "plain old bidding rules." Rules typically include just a few features, while models are usually more granular: they blend many features of the impression together with math to arrive at some kind of score or estimated chance of response.
For example, a bidding rule might be "Bid $2.24 on all impressions on People.com for users who've seen fewer than three impressions of my ad in the last day."
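As a rough sketch, a rule like this boils down to a couple of if-checks. The function and parameter names here are made up for illustration, and the rule doesn't say what unit the $2.24 is in, so the code simply passes the number through.

```python
def rule_bid(site: str, impressions_last_day: int) -> float:
    """Hypothetical bidding rule: a fixed bid when both conditions hold, else no bid.

    The 2.24 figure comes from the example rule; its unit (per impression
    or per thousand) is not stated there, so none is assumed here.
    """
    if site == "People.com" and impressions_last_day < 3:
        return 2.24
    return 0.0  # 0.0 is used here to mean "don't bid" (an assumption)
```

Note how little the rule looks at: one site name and one counter. Everything else about the impression is ignored.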
A predictive model might be: "Start with 1 and multiply it by 0.92 once for each time the user has seen this ad in the last day, then multiply that number by .013 if the current page is on Yahoo, or else .006 if it's on another site. Then multiply by the historical post-click conversion rate for this ad, and this is the estimated chance that the user will convert if we buy this impression for our ad." In this case, we're assuming the goal is predicting conversions, but you can model anything you can measure, including ad engagement, clicks, or other kinds of goals.
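Written as code, the toy model above might look something like this. It's only a sketch: the function and parameter names are invented for illustration, and it assumes the frequency term means 0.92 raised to the number of prior impressions.

```python
def estimated_conversion_chance(
    frequency: int, on_yahoo: bool, post_click_conversion_rate: float
) -> float:
    """Toy model from the text: frequency decay x site factor x historical rate."""
    # One factor of 0.92 per impression the user has already seen today
    # (assumed reading of "multiply it by itself").
    frequency_decay = 0.92 ** frequency
    # Site factor: .013 on Yahoo, .006 on any other site.
    site_factor = 0.013 if on_yahoo else 0.006
    return frequency_decay * site_factor * post_click_conversion_rate
```

Unlike a rule, the output isn't a yes/no decision or a fixed price. It's a probability that changes smoothly as each input changes.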
The way a predictive model works with real-time bidding on exchanges is that your bidding server software basically gets a poke many times per second, where the exchange says, "OK, I've got browser #AH842DEH19 on myyearbook.com right now and I need a 300x250 ad...what do you bid?" The bid server would look up the user's frequency (say it's three) and the ad's historical post-click conversion rate (let's say it's one in 100), then multiply (.92 by .92 by .92) by .006 by .01 to get 0.0000467, or roughly .005 percent, as the chance this particular impression will yield a conversion. If the target CPA (cost per acquisition) is $40, then we can afford to bid $40 times .0000467, or $.00187, for this impression, which equates to a $1.87 CPM (cost per thousand impressions) rate.
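To check the arithmetic, here's a minimal sketch of the last step, turning an estimated conversion chance into a bid. The helper name `affordable_bid` is hypothetical, and a real bid server would do a lot more than this.

```python
from typing import Tuple


def affordable_bid(target_cpa: float, conversion_chance: float) -> Tuple[float, float]:
    """Return (bid per impression, equivalent CPM), both in dollars."""
    per_impression = target_cpa * conversion_chance
    cpm = per_impression * 1000  # CPM is the price per thousand impressions
    return per_impression, cpm


# Numbers from the example: frequency of three, a non-Yahoo site,
# and a one-in-100 historical post-click conversion rate.
chance = (0.92 ** 3) * 0.006 * 0.01
per_impression, cpm = affordable_bid(40.0, chance)

print(f"chance of conversion: {chance:.7f}")          # 0.0000467
print(f"bid per impression:   ${per_impression:.5f}")  # $0.00187
print(f"equivalent CPM:       ${cpm:.2f}")             # $1.87
```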
There's more math to it that deals with how we figure out which features of the impression we should include in the model in the first place, what the various coefficients should be (for example, why .92 is a good decay factor for frequency), how to deal with scenarios when we don't yet have enough data to estimate things like the historical post-click conversion rate, what to do when we're bidding on behalf of not just one ad but many ads, and how to do "portfolio optimization" across multiple exchanges.
Good models always win versus guesswork because there are just too many factors for a person to pay attention to. Also, intuition about the kinds of users that will respond to an ad, or the kinds of Web sites they can be found on, is often good but incomplete, and models can almost always find subsegments within an intuited audience that are inefficient, or conversely new segments of inventory to buy that outperform.

George H. John is CEO of Rocket Fuel Inc., a computational advertising company whose premium ad network allows agencies and advertisers to run successful online campaigns and whose technology platform offers optimized ad delivery. For over 20 years, George has helped companies boost revenues via more efficient data-driven sales and marketing, working with such companies as Amazon, Kraft, McDonald's, and Wells Fargo. Prior to Rocket Fuel, George led groups at IBM, E.piphany, salesforce.com, and Yahoo, where he led teams responsible for delivering behavioral targeting, recommendations, optimization, and click fraud products.
As a kid, George spent too much time watching "Star Trek," which led to a short-lived interest in model rocketry (his eyebrows grew back) and a lifelong interest in technology. George earned BS, MS, and PhD degrees in computer science from Stanford, specializing in artificial intelligence and advanced statistics. During his graduate studies, he won a National Science Foundation fellowship and worked with NASA in the summers, earning his "rocket scientist" credentials.









