In previous columns, I’ve talked about the initial steps of conducting research on the Web: selecting the sample, then designing and deploying surveys. The final step in the research process is to analyze and interpret your results. Sometimes simple summaries and charts will tell you all you need to know, but more often than not you will want to probe a little deeper below the surface. That means you will run into statistical analysis.
Today’s column will cover some of the main ideas of statistical analysis. Don’t worry; this will not be a nightmare flashback to your college stats course — no formulas or details about specific statistical methods. Instead, I’ll present some key concepts that will make you a more intelligent user of statistics in your day-to-day business life.
The Basic Purpose Behind Statistics: Accuracy
Customer research tries to learn something about your customers (the “population”) by looking at feedback from just a few of them (the “sample”). In other words, you’re trying to project your findings from the sample onto the whole population. Every sample taken from a population will be slightly different from the population, so your findings will never be exactly right. This is called “sampling error.”
What if you had the time and resources to survey every single person in the population of interest? Then you would have no sampling error and no need for any statistical methods. Statistical analysis is all about measuring sampling error.
Using statistics, you can make a pretty good guess about the accuracy of your findings from the sample itself. A simple summary might tell you that the average age in your sample of customers is 36. Statistical analysis takes things a bit further by telling you that the true average age of all customers very likely lies between 35.3 and 36.7. The more data you collect, the narrower this range gets.
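For readers who like to see the arithmetic, here is a minimal Python sketch of how such a range can be computed and how it narrows with more data. It is only an illustration: the ages are simulated from made-up numbers (an average of 36 and a spread of 12 years), the function name is my own, and the calculation uses the common normal approximation rather than any particular research tool.

```python
import math
import random
import statistics


def mean_confidence_interval(values, confidence=0.90):
    """Return a normal-approximation confidence interval for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z is about 1.645 for 90 percent confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Assumption: customer ages are roughly bell-shaped around 36
# with a standard deviation of 12 years (made-up numbers).
rng = random.Random(42)
small_sample = [rng.gauss(36, 12) for _ in range(100)]
large_sample = [rng.gauss(36, 12) for _ in range(10_000)]

small_low, small_high = mean_confidence_interval(small_sample)
large_low, large_high = mean_confidence_interval(large_sample)
small_width = small_high - small_low
large_width = large_high - large_low

print(f"100 respondents:    {small_low:.1f} to {small_high:.1f}")
print(f"10,000 respondents: {large_low:.1f} to {large_high:.1f}")
```

The range from 10,000 simulated respondents should come out far narrower than the range from 100, which is the whole point of collecting more data.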
The Mysterious Confidence Interval Demystified
If you’re around a statistician or market researcher for more than five minutes, you’re likely to hear the term “confidence interval.” For example, in the average age example above, I could have more accurately said that the 90 percent confidence interval for the average age of your customers is between 35.3 and 36.7. Now what does that statement mean?
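It definitely does not mean that 90 percent of the data falls within that range. Nor does it mean that 90 percent of future sample averages would land between 35.3 and 36.7. The real answer is fairly simple: If I were able to keep pulling samples of the same size from the exact same population (putting each sample back after each pull) and calculated a confidence interval from each one, about 90 percent of those intervals would contain the true average age of all your customers. The interval from 35.3 to 36.7 is simply the one produced by the sample you happened to draw.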
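If you would rather see this than take my word for it, a small simulation can act out the "keep pulling samples" idea. The sketch below is built on assumptions of my own: a pretend population with a true average age of 36 and a spread of 12 years, samples of 200 people, and a normal-approximation interval. It counts how often a 90 percent interval computed from a fresh sample actually contains the true average.

```python
import math
import random
import statistics

TRUE_MEAN = 36.0   # assumed true average age of the whole population
TRUE_SD = 12.0     # assumed spread of ages (made-up)
SAMPLE_SIZE = 200
TRIALS = 2000

rng = random.Random(7)
z = statistics.NormalDist().inv_cdf(0.95)  # two-sided 90 percent interval

hits = 0
for _ in range(TRIALS):
    # Pull a fresh sample and build an interval from that sample alone.
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    if mean - z * std_err <= TRUE_MEAN <= mean + z * std_err:
        hits += 1

coverage = hits / TRIALS
print(f"Intervals containing the true average: {coverage:.1%}")
```

The share printed at the end should land close to 90 percent. Each individual interval is different, but the method is right about nine times out of ten.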
If I were taking the exact same size sample from the population but I wanted to be 95 percent certain of the average age, would the interval get smaller or larger? Larger. The more confident you want to be, the larger the interval will need to be. You want to be more certain that your range captures the true average, so you have to give yourself more room to be right.
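The trade-off between confidence and width is easy to check with a few lines of Python. This is a sketch under assumed inputs, not figures from a real study: 800 respondents and a standard deviation of 12 years, chosen only because they give a range similar to the one in the age example.

```python
import math
from statistics import NormalDist


def half_width(std_dev, sample_size, confidence):
    """Half the width of a normal-approximation interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * std_dev / math.sqrt(sample_size)


# Assumed inputs: 800 respondents, ages with a standard deviation of 12.
half_90 = half_width(12, 800, 0.90)
half_95 = half_width(12, 800, 0.95)

print(f"90 percent: plus or minus {half_90:.2f} years")
print(f"95 percent: plus or minus {half_95:.2f} years")
```

With these assumed inputs the 90 percent interval is roughly plus or minus 0.7 years, and moving to 95 percent stretches it to roughly plus or minus 0.8 years. Same sample, more confidence, wider range.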
When Insignificant Data Is Really Significant
What about that savvy market researcher who is always asking “Was that result statistically significant?” A statistically significant difference is a difference in your sample that is so large that you can be pretty sure the difference in your population is not zero (i.e., there is a real difference). That’s it — nothing more, nothing less.
The biggest misconception some researchers have is that “statistically significant” is the same thing as “significant to your business.” It is not the same thing at all. It’s easy for technically astute research managers to get caught up in the details of running statistical tests and miss this key point. With a large enough sample, even a difference too small to matter to your business will almost always come out statistically significant, because large samples have very little sampling error.
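To make the sample-size effect concrete, here is a rough Python sketch. The scenario is hypothetical: two groups of customers whose average satisfaction scores differ by a trivial 0.05 points on a 10-point scale, with an assumed standard deviation of 2. The function name is mine, and the test is a plain two-sample z-test, which is one common choice among several.

```python
import math
from statistics import NormalDist


def two_sided_p_value(mean_a, mean_b, std_dev, n_per_group):
    """Approximate p-value for a difference between two group averages."""
    std_err = std_dev * math.sqrt(2 / n_per_group)
    z = abs(mean_a - mean_b) / std_err
    return 2 * (1 - NormalDist().cdf(z))


# Assumed satisfaction scores on a 10-point scale: 7.50 versus 7.55.
p_small = two_sided_p_value(7.50, 7.55, std_dev=2.0, n_per_group=100)
p_large = two_sided_p_value(7.50, 7.55, std_dev=2.0, n_per_group=100_000)

print(f"100 people per group:     p = {p_small:.3f}")
print(f"100,000 people per group: p = {p_large:.2g}")
```

The same tiny difference is nowhere near significant with 100 people per group and overwhelmingly significant with 100,000. Whether a 0.05-point difference is worth acting on is a business question that the test cannot answer.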
On the other hand, a small study might turn up something extremely real and important to your business, even though the results are not statistically significant. Let’s say you surveyed 10 people who had just finished using your online customer-service chat feature. Reading over the results, you see that one of those respondents said that the customer service rep she was chatting with told her to go jump in a lake.
Now, due to the small sample size, we can’t say with any statistical confidence that 10 percent of your total customer-support chat experiences are this bad. But that doesn’t make the data any less relevant in terms of its ability to show you that there is some level of terrible customer support taking place from at least one rep!
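For the curious, here is how little one bad chat out of 10 pins down the overall rate. The sketch uses the Wilson score interval, a standard method for proportions from small samples. That choice is mine, as is the 95 percent confidence level.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion (works for small samples)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


# One terrible chat out of 10 surveyed.
low, high = wilson_interval(1, 10)
print(f"Plausible share of bad chats: {low:.1%} to {high:.1%}")
```

The range runs from about 2 percent to about 40 percent, which is far too wide to serve as an estimate. It does not change the fact that the one bad chat really happened.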
My last few columns have been focused on general market research principles that are applicable to marketing on the Web. Next time, I’ll return to Web specifics and deliver the article I promised on traffic measurement.