Primer: Deterministic vs. Discovery Segmentation

In my most recent columns, I’ve examined the different types of segmentation strategies and the role each can play in building a core understanding of your customers or prospective customers. Once you’ve decided what to segment on (demographic, behavioral, or attitudinal), the next question is: how do you create the segments? Remember, the goal of segmentation is to create groups of people who have something in common.

When it comes to creating segmentations, there are two main alternative approaches:

  • Deterministic segmentation strategies
  • Discovery segmentation strategies

Deterministic Segmentation Strategies

With deterministic segmentation, you start with a hypothesis about which user or customer segments matter, then analyze the data to see whether those segments are interesting and useful. Demographic segmentations, for example, tend to be “deterministic”: you might segment your customers by gender and age in the belief that those criteria are useful and interesting. Most segmentation of Web data is currently done this way. Most Web analytics tools have some segmentation capability built in, allowing you to form hypotheses about which segments might be worth analyzing, understanding, and tracking. For example, you might look at how behavior differs by the number of times people visited the site, the channel they came in on, or the search terms they used.
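As a rough illustration of the deterministic approach, the sketch below segments visitors by visit count. The visitor records, the thresholds, and the segment names are all hypothetical, chosen only to show the pattern of stating a hypothesis as a rule and then checking whether the resulting groups behave differently.

```python
from collections import defaultdict

# Hypothetical visitor records: (visitor_id, visits, channel, pages_per_visit)
VISITORS = [
    ("v1", 1, "search", 1.0),
    ("v2", 1, "email", 2.0),
    ("v3", 3, "search", 4.0),
    ("v4", 7, "direct", 6.0),
    ("v5", 12, "direct", 8.0),
]


def visit_segment(visits):
    """Assign a segment from hand-picked thresholds (the hypothesis)."""
    if visits == 1:
        return "one-time"
    if visits <= 5:
        return "occasional"
    return "frequent"


def segment_summary(visitors):
    """Group visitors by segment; report size and mean pages per visit."""
    groups = defaultdict(list)
    for _visitor_id, visits, _channel, pages in visitors:
        groups[visit_segment(visits)].append(pages)
    return {
        name: {"size": len(pages), "avg_pages": sum(pages) / len(pages)}
        for name, pages in groups.items()
    }


summary = segment_summary(VISITORS)
for name, stats in summary.items():
    print(name, stats)
```

If the groups turn out not to differ in any useful way, the thresholds (or the segmentation variable itself) are revised and the analysis is run again, which is where the iteration cost comes from.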

Deterministic approaches can be successful, but they can also demand a lot of analysis time, particularly with large and complex data sets. Many iterations might be required to identify segments that are meaningful, interesting, and useful, so the power and functionality of your analytics tools become vitally important at this point. If it takes ages to create a segment and see the results, you will get through fewer iterations and are less likely to arrive at an optimal solution.

Discovery Segmentation Strategies

Discovery-based segmentation approaches use statistical and data mining algorithms to look for differences in user behavior. Typical methodologies in segmentation studies include cluster analysis, neural networks, and decision trees. Methods such as cluster analysis look for statistically meaningful differences between user groups based on the data that is fed into the analysis. This is a massive area, with many different segmentation techniques. Even cluster analysis alone has many variants, such as k-means, hierarchical, two-step, and so on.
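To make the idea of cluster analysis a little more concrete, here is a minimal k-means sketch in plain Python. The data points (visits, pages per visit) are invented, and the initialization (taking the first k points as starting centroids) is a simplifying assumption. A real study would normally scale the variables first and use an established statistics package rather than hand-rolled code.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Naive initialization (assumption): first k points are the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
            )
        if new_centroids == centroids:
            break  # Centroids stopped moving, so assignments are stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical visitors described by (visits, pages per visit).
POINTS = [(1, 1), (1, 2), (2, 1), (10, 8), (12, 9), (11, 7)]
labels, centroids = kmeans(POINTS, k=2)
print(labels)
print(centroids)
```

Note that nobody told the algorithm where the boundary between light and heavy users lies. The groups emerge from the data, which is the essential difference from the deterministic approach.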

Each approach has its strengths and weaknesses, and even within a single variant of cluster analysis, there are many different ways that the analysis can be run. Although these solutions can be viewed as very technical, there is as much analytical “art” behind a successful outcome as there is “science.” Once those groups have been determined, further analysis is done to profile the groups to understand what those differences are and whether they are meaningful or not. Just because something is statistically significant, it doesn’t mean that it is necessarily commercially significant!
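The profiling step can be sketched as follows. The segment labels and revenue figures are made up for illustration, and comparing each segment's share of visitors with its share of revenue is just one possible way to judge commercial significance, not a standard prescribed method.

```python
def profile_segments(records):
    """Profile segments from (segment_label, revenue) pairs, one per visitor."""
    total_visitors = len(records)
    total_revenue = sum(revenue for _label, revenue in records)
    profile = {}
    for label, revenue in records:
        entry = profile.setdefault(label, {"visitors": 0, "revenue": 0.0})
        entry["visitors"] += 1
        entry["revenue"] += revenue
    for entry in profile.values():
        entry["visitor_share"] = entry["visitors"] / total_visitors
        # Guard against a data set with no revenue at all.
        entry["revenue_share"] = (
            entry["revenue"] / total_revenue if total_revenue else 0.0
        )
    return profile


# Hypothetical output of a segmentation: segment label and revenue per visitor.
RECORDS = [("A", 0.0), ("A", 0.0), ("A", 5.0), ("A", 0.0), ("B", 45.0)]
profile = profile_segments(RECORDS)
for label, stats in profile.items():
    print(label, stats)
```

In this invented data, segment B holds a small share of visitors but most of the revenue, which is the kind of difference that matters commercially as well as statistically.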

Discovery-based methods can yield user segments that may not be immediately obvious from the data, which is one of the benefits of this type of approach. In our own work applying these techniques to Web data, we often find that some of the more interesting and valuable segments are quite small. This is because Web analytics data typically contains a lot of noise from people who only ever visit the site once or twice and do nothing of value. However, discovery-based approaches require specialist skills and are highly iterative, so they are likely to be more costly in both time and money.
