Test Versus Control, Part 3
Ready for some advanced number-crunching? Meet logistic regression. Mark walks you through it step by step.
Just when you thought it was safe to go back to your inbox, “Test Versus Control” rears its ugly head again. Last month, I wrote a series on how customer data can be analyzed to optimize email campaigns (“Test Versus Control, Part 1” and “Test Versus Control, Part 2”). Thanks for all your positive comments.
One email caught my attention. Michael Wexler, director of research at e-Dialog, suggested the use of logistic regression on customer data as a method to optimize email campaigns. I generally think it a good idea to listen to someone who counts the National Football League (NFL), consumer packaged goods company Welch’s, and Harvard Business School Publishing as clients. In addition, I thought you would be interested in learning a little more about email testing, specifically about logistic regression.
First, Ordinary Regression
To understand logistic regression, we must first understand ordinary regression. At the risk of oversimplification (and with apologies to Dr. Bill Stewart, my stats professor at William and Mary), ordinary regression is a method used to determine the impact of independent variables on a dependent variable. As an example, ordinary regression could be used to predict sales of a product (the dependent variable). The independent variables that impact sales are price, advertising dollars invested, and channel incentive dollars provided. Historical data for the dependent variable (sales) and each of the independent variables (price, advertising dollars invested, channel incentive dollars provided) are used to develop a mathematical equation that can be used to predict future sales. The mathematical equation, or regression equation, quantifies the impact each of the independent variables has on sales.
Using the regression equation, a marketing manager can adjust the price, advertising dollars invested, and channel incentive dollars provided to maximize sales. She can run what-if scenarios. For example, the marketing manager could use the equation to predict sales if the price is $10.00, advertising dollars invested is $100,000, and channel incentive dollars provided is $250,000. She could then predict sales if the price is $9.00, advertising dollars invested is $50,000, and channel incentive dollars provided is $150,000. Comparing scenarios this way, she can identify the combination of the three independent variables that the equation predicts will maximize sales. Quite a nice tool.
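To make the what-if idea concrete, here is a minimal sketch in Python. It fits an ordinary regression to a tiny sales history and then scores the two scenarios described above. The history is entirely made up (I generated it from a known equation so the fit can be checked), and the function names `fit_ols` and `predict` are my own, not part of any statistics package.

```python
# Ordinary least squares on a tiny synthetic sales history.
# Every number here is invented for illustration.

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize the squared error."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * target for d, target in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Apply the regression equation to one what-if scenario."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Columns: price ($), advertising ($ thousands), channel incentives ($ thousands).
history = [
    (10.0, 110, 240),
    (9.0, 70, 160),
    (11.0, 120, 200),
    (9.5, 80, 300),
    (10.5, 60, 100),
    (8.5, 90, 180),
]
# Synthetic unit sales, generated without noise from
# units = 5000 - 300*price + 8*advertising + 4*incentives
units_sold = [3840, 3500, 3460, 3990, 2730, 3890]

coefs = fit_ols(history, units_sold)

# The two what-if scenarios described in the text.
scenario_a = predict(coefs, (10.0, 100, 250))
scenario_b = predict(coefs, (9.0, 50, 150))
print([round(c, 2) for c in coefs])
print(round(scenario_a), round(scenario_b))
```

Real sales data is noisy, so a real fit would not recover the coefficients this cleanly, and a statistics package would also report how much confidence to place in each one.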
Understanding Logistic Regression
Logistic regression is a variation of ordinary regression. It’s used to determine how multiple independent variables, presented simultaneously, affect the likelihood of a desired result. The main difference between the two is that in ordinary regression the dependent variable is an amount (e.g., sales) or a score, while in logistic regression the dependent variable is the occurrence of an event (e.g., whether a recipient opens an email). The nice aspect of logistic regression is that it allows the analyst to measure the impact of multiple variables at the same time, something that is not possible in the testing methodology presented in part two of this series.
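For the curious, here is a rough sketch of how a logistic regression equation turns independent variables into the probability of an event. The weighted sum is the same as in ordinary regression; the logistic function then squeezes it into a value between 0 and 1. The intercept and coefficients below are hypothetical, chosen only to show the mechanics.

```python
import math


def logistic(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def event_probability(intercept, coefs, values):
    """Probability that the event occurs, given the independent variables."""
    z = intercept + sum(c * v for c, v in zip(coefs, values))
    return logistic(z)


# Hypothetical equation with two yes/no (1/0) independent variables.
intercept = -1.4
coefs = (0.2, 0.1)

baseline = event_probability(intercept, coefs, (0, 0))
both_on = event_probability(intercept, coefs, (1, 1))
print(f"baseline: {baseline:.3f}, both variables on: {both_on:.3f}")
```

A positive coefficient raises the probability of the event and a negative one lowers it, but the result can never leave the 0-to-1 range, which is why this form suits yes/no outcomes.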
That’s where Michael Wexler comes into the story. “Independent variables work in concert to create a desired effect. That effect can be clicking, buying, churning. Rarely does a single variable drive the desired effect. Typically the desired effect is driven by the impact of multiple independent variables working together,” Wexler said.
Illustrating Logistic Regression
Wexler knows what he’s talking about. He’s been at e-Dialog for almost three years and previously was part of the Microsoft team that recommended MSN make Carpoint and MoneyCentral the focal point of its Web site. He uses logistic regression at e-Dialog to measure the impact from testing multiple variables at once. He provided the following example to illustrate.
The dependent variable we are attempting to predict is open rate. In this example, there are two independent variables:
There are four possible combinations, or four cells, for the subject line:
Each cell represents an email sent to a random sample of 5,000 recipients to ensure a low margin of error and high level of confidence (sample size varies depending on the complexity of the test). After a minimum of eight hours, the results, specifically the open rate for each cell, are fed into a statistical software package such as SPSS, and a logistic regression equation is generated. The equation predicts the impact of each of the independent variables on encouraging the consumer to take the desired action: opening the email.
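Here is a minimal sketch of what the software does with a four-cell test like this one. The open counts are invented, and the variable names `var_a` and `var_b` are placeholders for the two independent variables. I've used the simplest case, a model with both variables plus their interaction, because with four cells its coefficients can be read directly from the log odds of each cell. A package such as SPSS arrives at them by iterative maximum-likelihood fitting, which is also what any larger design requires.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of logit: log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


SENT_PER_CELL = 5000

# Hypothetical opens per cell; keys are (var_a, var_b) with 0/1 levels.
opens = {(0, 0): 1000, (1, 0): 1150, (0, 1): 1100, (1, 1): 1400}

rate = {cell: n / SENT_PER_CELL for cell, n in opens.items()}
log_odds = {cell: logit(p) for cell, p in rate.items()}

# Coefficients of: log odds = intercept + a*var_a + b*var_b + ab*var_a*var_b
intercept = log_odds[(0, 0)]
effect_a = log_odds[(1, 0)] - log_odds[(0, 0)]
effect_b = log_odds[(0, 1)] - log_odds[(0, 0)]
interaction = (log_odds[(1, 1)] - log_odds[(1, 0)]
               - log_odds[(0, 1)] + log_odds[(0, 0)])


def predicted_open_rate(var_a, var_b):
    """Open rate the fitted equation predicts for one cell."""
    z = (intercept + effect_a * var_a + effect_b * var_b
         + interaction * var_a * var_b)
    return logistic(z)


for cell in sorted(opens):
    print(cell, f"observed {rate[cell]:.3f}",
          f"predicted {predicted_open_rate(*cell):.3f}")
print(f"odds multiplier for var_a alone: {math.exp(effect_a):.2f}")
print(f"odds multiplier for var_b alone: {math.exp(effect_b):.2f}")
```

A positive interaction term means the two variables together lift opens by more than their separate effects would suggest, which is the "working in concert" effect Wexler describes.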
Wexler continued, “Logistic regression is particularly powerful when it is used to evaluate the interrelationship between four or more independent variables. You are able to determine which combination of the independent variables leads consumers to take the desired action most frequently. Although extremely useful, that type of analysis is also more difficult.”
He also points out the importance of running tests against individual customer segments. Some customer segments may respond to a combination of independent variables to which other customer segments are not responsive. “We ran a test recently for a consumer packaged goods company and determined that the independent variables only had a positive impact on recipients who were 34 and younger. The independent variables had no impact on recipients 35 and older. You can bet we’ll continue to use those variables in combination when sending emails to recipients 34 and younger, but will adjust those variables for recipients 35 and older. That is valuable learning we will use time and time again,” said Wexler.
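A segment-by-segment check along these lines might be sketched as follows. The counts are hypothetical and are not e-Dialog's results. For each segment, the code compares a test cell with a control cell and computes the log odds ratio, which is what a logistic regression coefficient equals when there is a single yes/no independent variable, along with an approximate 95 percent confidence interval. If the interval includes zero, the test cannot distinguish the effect from no effect in that segment.

```python
import math


def log_odds_ratio_ci(test_opens, test_sent, control_opens, control_sent, z=1.96):
    """Log odds ratio of test versus control with an approximate 95% interval."""
    a, b = test_opens, test_sent - test_opens
    c, d = control_opens, control_sent - control_opens
    estimate = math.log((a / b) / (c / d))
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, estimate - z * std_error, estimate + z * std_error


# Hypothetical counts: (test opens, test sent, control opens, control sent).
segments = {
    "34 and younger": (1200, 5000, 1000, 5000),
    "35 and older": (1010, 5000, 1000, 5000),
}

results = {name: log_odds_ratio_ci(*counts) for name, counts in segments.items()}

for name, (estimate, low, high) in results.items():
    verdict = "effect detected" if low > 0 or high < 0 else "no clear effect"
    print(f"{name}: log odds ratio {estimate:.3f} "
          f"[{low:.3f}, {high:.3f}] -> {verdict}")
```

With these made-up numbers, the younger segment shows a clear lift while the older segment's interval straddles zero, the same pattern as in Wexler's example.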
If your head’s spinning, don’t worry. To quote “National Lampoon’s Animal House,” “This is heavy stuff.” Michael Wexler and the team at e-Dialog are ahead of the curve when it comes to analyzing customer data to optimize the performance of email campaigns.