'All Models Are Wrong, but Some Are Useful'

August 8, 2013


"All models are wrong, but some are useful."

So said the statistician George Box. Clarifying what he meant, Box went on to say, "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful?" I think he had a point that's worth thinking about a bit.

The increased use of data mining and predictive analytical techniques within organizations to reduce risk and improve decision-making means that managers will be exposed more and more to the results of these approaches. They will increasingly use them to make recommendations or to decide on courses of action. So, how do you know how wrong a model is and whether or not it can be useful?

All Models Are Wrong

This is a statement of fact, rather than a controversial opinion. After all, the best model of a house is the house itself. A scale model of the house is one representation of the real thing and will give you a 3D perspective, but possibly not all the detail you're looking for. A set of architect's drawings will probably have that detail, but it may be difficult to visualize what the finished house will look like. A painting of the house set in its landscape will give you yet another context. If you're building a house, you may end up using all three to make decisions about how the building should proceed.

It's the same with analytical models. They are all representations of the real thing, simplified to a greater or lesser degree, and so all of them are "wrong" to some extent. So, how can you tell how wrong they are?

Most models have measures of fit or error of one type or another, and the way fit and error are measured depends on the modelling technique being used. For example, in simple linear regression, which most people are probably familiar with, R squared (the coefficient of determination) is a basic measure of how well the model fits. It gives the proportion of the variation in the data that the model explains; in simple linear regression it equals the square of the correlation coefficient. But it's only one measure of how good the model is, and modellers will balance it against others to arrive at the best model for the purpose for which it's intended. That's the art in the science of modelling.
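To make the distinction between R squared and the correlation coefficient concrete, here is a minimal sketch in Python with NumPy. The ad-spend and sales figures are invented for illustration, and `r_squared` is a hypothetical helper rather than a library function. It computes R squared as one minus the ratio of the residual sum of squares to the total sum of squares, then compares it with the square of the Pearson correlation coefficient, which should match for a simple linear regression fitted with an intercept.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: share of the variance in y explained by y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical data: weekly ad spend (x) and sales (y), invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Fit y = a + b*x by ordinary least squares (polyfit returns [slope, intercept]).
b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

r2 = r_squared(y, y_hat)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient

print(f"R^2 = {r2:.4f}, r = {r:.4f}, r^2 = {r**2:.4f}")
```

With more than one predictor the equivalence no longer holds in this simple form, which is one reason modellers look at several measures rather than R squared alone.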

But Some Are Useful

We can construct some notion of "wrong" from metrics and statistics, but how do we develop our notion of "useful"? Whereas "wrong" in this case is essentially an analytical concept, the notion of "useful" is really a commercial or business concept. It's useful if it helps me make better decisions and reduce risks. But the best models are not necessarily the most useful. Here are a couple of examples.

Cluster analysis is often used as a technique for creating customer segments. These segments may be required to drive some type of target marketing activity. Cluster analysis is what is known as an unsupervised learning technique, which broadly means you give it some data without telling it what to look for, it finds its own structure, and it gives you an answer. You then have to figure out what that answer is telling you. The technique will give the best model it can from an algorithmic point of view, but that model may not be very useful. For example, the segments may not add to your existing body of understanding, or they may not be actionable. That could mean that a slightly poorer model is more useful because you can translate the segmentation into a marketing program you can actually execute.
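As a rough illustration of "best from an algorithmic point of view", the sketch below implements a minimal k-means (one common clustering algorithm, chosen here as an assumption; the article does not name a specific method) and reports inertia, the within-cluster sum of squared distances, for several numbers of segments. The customer data, the two features, and the `kmeans` helper are all hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm k-means. Returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment and inertia (within-cluster sum of squared distances).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return labels, centroids, inertia

# Synthetic "customers" (annual spend, visits per month) drawn around three invented centres.
rng = np.random.default_rng(42)
centres = np.array([[200.0, 2.0], [800.0, 6.0], [1500.0, 3.0]])
X = np.vstack([c + rng.normal(scale=[50.0, 0.5], size=(30, 2)) for c in centres])
# Standardise so spend (in currency units) does not swamp visit counts in the distance.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

for k in range(2, 7):
    _, _, inertia = kmeans(Xs, k)
    print(f"k={k}: inertia {inertia:.1f}")
```

Inertia tends to fall as k rises, so on that measure alone more segments always look "better". Whether four, five, or six segments can be turned into distinct marketing programs is a business judgment the algorithm cannot make.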

Another example is econometric modelling. This technique is often used for marketing mix analysis, where you're looking to understand the impact of the various elements of the marketing mix on something like product sales. It's possible to build quite elaborate models that explain a great deal about what drives sales, from marketing factors to competitive factors to macroeconomic factors. However, such a model can be difficult to use: if you want to look at different scenarios or forecast the impact of a change, so much data has to be entered that it becomes a time-consuming and laborious process. In this case a simpler model may actually be more effective because it's easier to use.
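To show why input requirements matter for scenario work, here is a deliberately simple sketch: a linear regression of weekly sales on two hypothetical marketing channels, fitted by ordinary least squares with NumPy, followed by a "what if" forecast. The channels, spend ranges, and the synthetic relationship used to generate the data are all assumptions for illustration; a real marketing mix model would typically add further terms such as price, seasonality, and competitor activity.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 52  # hypothetical: one year of weekly observations

# Hypothetical weekly spend (in thousands) on two channels.
tv = rng.uniform(10, 50, n)
digital = rng.uniform(5, 30, n)
# Synthetic "true" relationship plus noise, invented purely for illustration.
sales = 100 + 2.0 * tv + 3.5 * digital + rng.normal(0, 5, n)

# Design matrix with an intercept column; fit by ordinary least squares.
X = np.column_stack([np.ones(n), tv, digital])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

def forecast(tv_spend, digital_spend):
    """Predicted weekly sales for a spend scenario under the fitted linear model."""
    return coef @ np.array([1.0, tv_spend, digital_spend])

base = forecast(30, 15)
scenario = forecast(30, 20)  # what if digital spend rises by 5k a week?
print(f"coefficients: {np.round(coef, 2)}")
print(f"baseline {base:.1f}, scenario {scenario:.1f}, uplift {scenario - base:.1f}")
```

Each scenario here needs only two numbers. Every extra driver added to explain more of the variation becomes another input someone has to supply, with a plausible value, for every scenario they want to test.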

So, if you're reviewing some outputs from a piece of modelling work that's been done, it's always useful to keep George Box in mind and ask yourself (or the modeller) a couple of questions:

  • "How wrong is it?" (i.e., is it robust enough?)
  • "What can I do with it?" (i.e., is it useful?)

In fact, thinking about it, that probably applies to any piece of analysis.
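One common way to get at "How wrong is it?" (a standard check, not one the author prescribes) is to hold back some data, fit the model on the rest, and compare the error on data the model has seen with the error on data it hasn't. The sketch below does this with NumPy for a straight-line fit and a deliberately over-flexible polynomial; the data are synthetic and the `rmse` helper is hypothetical.

```python
import numpy as np

# Synthetic, roughly linear data invented for illustration.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, x.size)

# Hold out every fourth point as a test set that the fit never sees.
test = np.arange(x.size) % 4 == 0
train = ~test

def rmse(pred, actual):
    """Root-mean-square error between predictions and observed values."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

results = {}
for degree in (1, 8):  # a simple model versus a deliberately over-flexible one
    coeffs = np.polyfit(x[train], y[train], degree)
    train_err = rmse(np.polyval(coeffs, x[train]), y[train])
    test_err = rmse(np.polyval(coeffs, x[test]), y[test])
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train RMSE {train_err:.2f}, test RMSE {test_err:.2f}")
```

The more flexible model always fits the training data at least as well, but a gap between its training and test error is one sign that it is more "wrong" than its fit statistic suggests.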




Neil Mason

Neil Mason is SVP, Customer Engagement at iJento. He is responsible for providing iJento clients with the most valuable customer insights and business benefits from iJento's digital and multichannel customer intelligence solutions.

Neil has been at the forefront of marketing analytics for over 25 years. Prior to joining iJento, Neil was Consultancy Director at Foviance, the UK's leading user experience and analytics consultancy, heading up the user experience design, research, and digital analytics practices. For the last 12 years Neil has worked predominantly in digital channels, both as a marketer and as a consultant, combining commercial and technical understanding in the application of consumer insight to help major brands improve digital marketing performance. During this time he also served as a Director of the Web Analytics Association (now the Digital Analytics Association, or DAA) for two years and currently serves as a Director Emeritus of the DAA. Neil is also a frequent speaker at conferences and events.

Neil's expertise ranges from advanced analytical techniques such as segmentation, predictive analytics, and modelling through to quantitative and qualitative customer research. Neil has a BA in Engineering from Cambridge University and an MBA and a postgraduate diploma in business and economic forecasting.
