"All models are wrong, but some are useful."
So said the statistician George Box. Clarifying what he meant, Box went on to say, "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful?" I think he had a point that's worth thinking about a bit.
The increased use of data mining and predictive analytical techniques within organizations to reduce risk and improve decision-making means that managers will be exposed more and more to the results of these approaches. They will be increasingly using them to make recommendations or to decide on courses of action. So, how do you know how wrong the model is and whether or not it can be useful?
All Models Are Wrong
This is a statement of fact, rather than a controversial opinion. After all, the best model of a house is the house itself. A scale model of the house is one representation of the real thing and will give you a 3D perspective but possibly not some of the detail that you're looking for. The set of the architect's drawings will potentially have the detail you're looking for but it may be difficult to visualize what the finished house might look like. A painting of the house set in its landscape will give you a different context. If you're building a house you may end up using all three approaches to make decisions about how the building should go.
It's the same with analytical models. They are all representations of the real thing, simplified to a greater or lesser degree. All of them are "wrong" to a greater or lesser degree. So, how can you tell how wrong they are?
Most models have measures of fit or error of one type or another. How fit and error are measured depends on the modelling technique being used. For example, in simple linear regression, which most people are probably familiar with, R squared (the coefficient of determination, which in this simple case is the square of the correlation coefficient) is a basic measure of the quality of the fit. It broadly tells you what proportion of the variation in the data is explained by the model. But it's only one measurement of how good the model is, and modellers will be balancing that measurement with others to come up with the best model for the purpose for which it's intended. That's the art in the science of modelling.
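To make the idea of "measures of fit" a little more concrete, here is a minimal sketch in Python of how R squared and one other error measure might be calculated for a simple linear regression. The data (ad spend against sales) and the function names are made up for illustration; a real project would normally lean on a statistics package rather than hand-rolled code.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line explains."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(xs, ys, slope, intercept):
    """Root mean squared error: the typical size of a miss, in units of y."""
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(mean(squared_errors))


# Hypothetical data: ad spend and sales, invented for this example.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(ad_spend, sales)
fit = r_squared(ad_spend, sales, slope, intercept)
error = rmse(ad_spend, sales, slope, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"R squared={fit:.4f}, RMSE={error:.3f}")
```

R squared says how much of the variation is explained; the RMSE says how far off a typical prediction is in the units of the thing being predicted. A model can look strong on one and still be too imprecise on the other for the decision at hand, which is why a single number rarely settles the question.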
But Some Are Useful
We can construct some notion of "wrong" from metrics and statistics, but how do we develop our notion of "useful"? Whereas "wrong" in this case is essentially an analytical concept, the notion of "useful" is really a commercial or business concept. It's useful if it helps me make better decisions and reduce risks. But the best models are not necessarily the most useful. Here are a couple of examples.
Cluster analysis is often used as a technique for creating customer segments. These segments may be required to drive some type of target marketing activity. Cluster analysis is what is known as an unsupervised learning technique, which broadly means you give it some data, it does its own thing, and then gives an answer. You then have to figure out what the answer is telling you. The technique will give the best model it can from an algorithmic point of view but it may not be that useful. For example, the segments may not add to your existing body of understanding or they may not be that actionable. That could mean that a slightly poorer model may be more useful because you can translate the segmentation into a marketing program you can execute on.
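As an illustration of "you give it some data, it does its own thing, and then gives an answer", here is a minimal sketch of k-means, one common clustering technique, in Python. The source does not say which clustering method is meant, so k-means is an assumption, as are the customer data, the function names, and the choice to start from the first k customers so that the result is repeatable. Real work would typically scale the variables first and use an established library.

```python
from statistics import mean


def squared_distance(a, b):
    """Squared straight-line distance between two customers."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centre, then move the centres."""
    # Assumption for repeatability: start from the first k points.
    centres = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Update step: each centre moves to the average of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(mean(dim) for dim in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # nothing moved, so the segments are stable
        centres = new_centres
    return labels, centres


# Hypothetical customers: (orders per year, average order value).
customers = [(2, 20), (12, 25), (3, 22), (11, 27), (2, 18), (13, 24)]

labels, centres = k_means(customers, k=2)

for segment, centre in enumerate(centres):
    size = labels.count(segment)
    print(f"Segment {segment}: {size} customers, "
          f"avg orders={centre[0]:.1f}, avg order value={centre[1]:.1f}")
```

The algorithm returns segment numbers and averages, nothing more. Deciding whether "Segment 0" is a group you can actually build a marketing program around is still a human judgement, which is the gap between the best model and the most useful one.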
Another example is in econometric modelling. This technique is often used for marketing mix performance analysis where you're looking to understand the impact of various elements of the marketing mix on something like product sales. It's possible to build quite elaborate models that explain a great deal about what drives sales, from marketing factors to competitive factors to macro-economic factors. However, such a model can be difficult to use: if you want to look at different scenarios or forecast the impact of a change, so much data has to be fed into it that it becomes a time-consuming and laborious process. In this case a simpler model may actually be more effective because it's easier to use.
So, if you're reviewing some outputs from a piece of modelling work that's been done, it's always useful to keep George Box in mind and ask yourself (or the modeller) a couple of questions:
In fact, thinking about it, that probably applies to any piece of analysis.
Neil Mason is SVP, Customer Engagement at iJento. He is responsible for providing iJento clients with the most valuable customer insights and business benefits from iJento's digital and multichannel customer intelligence solutions.
Neil has been at the forefront of marketing analytics for over 25 years. Prior to joining iJento, Neil was Consultancy Director at Foviance, the UK's leading user experience and analytics consultancy, heading up the user experience design, research, and digital analytics practices. For the last 12 years Neil has worked predominantly in digital channels both as a marketer and as a consultant, combining a strong blend of commercial and technical understanding in the application of consumer insight to help major brands improve digital marketing performance. During this time he also served as a Director of the Web Analytics Association (now the Digital Analytics Association, or DAA) for two years and currently serves as a Director Emeritus of the DAA. Neil is also a frequent speaker at conferences and events.
Neil's expertise ranges from advanced analytical techniques such as segmentation, predictive analytics, and modelling through to quantitative and qualitative customer research. Neil has a BA in Engineering from Cambridge University and an MBA and a postgraduate diploma in business and economic forecasting.