
'All Models Are Wrong, but Some Are Useful'

August 8, 2013

The increased use of data mining and predictive analytical techniques within organizations to reduce risk and improve decision-making means that managers will be exposed to the results of these approaches.

"All models are wrong, but some are useful."

So said the statistician George Box. Clarifying what he meant, Box went on to say, "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful?" I think he had a point that's worth thinking about a bit.

The increased use of data mining and predictive analytical techniques within organizations to reduce risk and improve decision-making means that managers will be exposed more and more to the results of these approaches. They will be increasingly using them to make recommendations or to decide on courses of action. So, how do you know how wrong the model is and whether or not it can be useful?

All Models Are Wrong

This is a statement of fact, rather than a controversial opinion. After all, the best model of a house is the house itself. A scale model of the house is one representation of the real thing and will give you a 3D perspective, but possibly not some of the detail that you're looking for. A set of architect's drawings will probably have that detail, but it may be difficult to visualize from them what the finished house will look like. A painting of the house set in its landscape will give you a different context again. If you're building a house, you may end up using all three approaches to make decisions about how the building should go.

It's the same with analytical models. They are all representations of the real thing, simplified to a greater or lesser degree. All of them are "wrong" to a greater or lesser degree. So, how can you tell how wrong they are?

Most models come with measures of fit or error of one type or another, and how these are measured depends on the modelling technique being used. For example, in simple linear regression, which most people are probably familiar with, R squared (the coefficient of determination, which in this simple case is the square of the correlation coefficient) is a basic measure of the quality of the fit. Broadly, it tells you how much of the variation in the data is explained by the model. But it's only one measure of how good the model is, and modellers will balance it against others to come up with the best model for the purpose for which it's intended. That's the art in the science of modelling.
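To make the R squared idea concrete, here is a minimal sketch in Python of fitting a straight line by ordinary least squares and working out how much of the variation it explains. The function names and the ad spend and sales figures are made up for illustration; they are not taken from any real analysis.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Variation left over after the model has made its predictions.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation around the average.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


if __name__ == "__main__":
    # Hypothetical data: weekly ad spend and the sales that followed.
    ad_spend = [1, 2, 3, 4, 5]
    sales = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("slope, intercept:", fit_line(ad_spend, sales))
    print("R squared:", r_squared(ad_spend, sales))
```

A value close to 1 says the line accounts for most of the variation in this particular data set. It says nothing on its own about whether the model will hold up on new data or whether it helps with a decision.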

But Some Are Useful

We can construct some notion of "wrong" from metrics and statistics, but how do we develop our notion of "useful"? Whereas "wrong" in this case is essentially an analytical concept, the notion of "useful" is really a commercial or business concept. It's useful if it helps me make better decisions and reduce risks. But the best models are not necessarily the most useful. Here are a couple of examples.

Cluster analysis is often used as a technique for creating customer segments. These segments may be required to drive some type of target marketing activity. Cluster analysis is what is known as an unsupervised learning technique, which broadly means you give it some data, it does its own thing, and then gives an answer. You then have to figure out what the answer is telling you. The technique will give the best model it can from an algorithmic point of view but it may not be that useful. For example, the segments may not add to your existing body of understanding or they may not be that actionable. That could mean that a slightly poorer model may be more useful because you can translate the segmentation into a marketing program you can execute on.
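The tension between the "best" segmentation and a usable one can be sketched with a toy example. The Python below is not a production clustering routine such as k-means. It simply tries every way of cutting a sorted list of one-dimensional values into contiguous groups and keeps the split with the lowest within-group squared error, which is practical only for tiny data sets. The spend figures and function names are hypothetical.

```python
from itertools import combinations


def within_group_error(group):
    """Sum of squared distances from the group's own average."""
    mean = sum(group) / len(group)
    return sum((value - mean) ** 2 for value in group)


def best_segments(values, k):
    """Exhaustively search contiguous splits of the sorted values into k groups.

    Returns (total_error, groups) for the split with the lowest total error.
    """
    ordered = sorted(values)
    n = len(ordered)
    best = None
    # Choose k - 1 cut points between neighbouring values.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        total = sum(within_group_error(g) for g in groups)
        if best is None or total < best[0]:
            best = (total, groups)
    return best


if __name__ == "__main__":
    # Hypothetical annual spend per customer.
    spend = [10, 12, 11, 50, 52, 55, 200, 210]
    for k in (2, 3, 5):
        total, groups = best_segments(spend, k)
        print(k, "segments, error", round(total, 1), groups)
```

Adding segments can only hold the error steady or push it down, so by this measure the five-segment answer always looks at least as good as the three-segment one. Whether a marketing team can run five distinct programs is a separate question, and that is the one that decides usefulness.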

Another example is in econometric modelling. This technique is often used for marketing mix performance analysis, where you're looking to understand the impact of various elements of the marketing mix on something like product sales. It's possible to build quite elaborate models that explain a great deal about what drives sales, from marketing factors to competitive factors to macro-economic factors. However, such a model can be difficult to use: if you want to look at different scenarios or forecast the impact of a change, so much data has to be fed into it that it becomes a time-consuming and laborious process. In this case a simpler model may actually be more effective because it's easier to use.

So, if you're reviewing some outputs from a piece of modelling work that's been done, it's always useful to keep George Box in mind and ask yourself (or the modeller) a couple of questions:

  • "How wrong is it?" (i.e., is it robust enough?)
  • "What can I do with it?" (i.e., is it useful?)

In fact, thinking about it, that probably applies to any piece of analysis.


ABOUT THE AUTHOR

Neil Mason

Neil Mason is SVP, Customer Engagement at iJento. He is responsible for providing iJento clients with the most valuable customer insights and business benefits from iJento's digital and multichannel customer intelligence solutions.

Neil has been at the forefront of marketing analytics for over 25 years. Prior to joining iJento, Neil was Consultancy Director at Foviance, the UK's leading user experience and analytics consultancy, heading up the user experience design, research, and digital analytics practices. For the last 12 years Neil has worked predominantly in digital channels both as a marketer and as a consultant, combining a strong blend of commercial and technical understanding in the application of consumer insight to help major brands improve digital marketing performance. During this time he also served as a Director of the Web Analytics Association (now the Digital Analytics Association, or DAA) for two years and currently serves as a Director Emeritus of the DAA. Neil is also a frequent speaker at conferences and events.

Neil's expertise ranges from advanced analytical techniques such as segmentation, predictive analytics, and modelling through to quantitative and qualitative customer research. Neil has a BA in Engineering from Cambridge University and an MBA and a postgraduate diploma in business and economic forecasting.
