Data Mining and Predictive Analytics, Part 2

In part one of this series, I examined visitor segmentation, a data-mining technique. Now, let’s look at how data mining can be used to understand important visitor behavior over time.

Quite often when we use Web analytics systems, we focus on what visitors do during a particular visit. The classic conversion funnel is a good example of this trend.

Most Web analytics systems look at the conversion funnel in the context of a single visit; that is, they report on how people got to page A, then B, then C, and so on within that visit. This information is useful because it helps identify process areas that need improvement. But when a visitor makes multiple visits to a site before converting, the classic conversion funnel might not give you a true perspective on what’s happening.

Take the example of buying car insurance online. In the U.K., it’s a very competitive business. Consumers typically shop around for quotes and go for the best value proposition. As a result, it’s very unlikely people will arrive on a site and buy car insurance on their first visit. Maybe they’ll arrive from a search engine, check out the proposition, and bookmark the site for future reference. Maybe later they’ll come back, get a quote, and leave to compare it to other quotes. Hopefully they’ll return to complete the policy application process, and a sale is made.

A generic conversion funnel analysis will contain an amalgam of all three types of behavior: research, quote, purchase. As a result, you’re not seeing a true reflection of your ability to convert opportunity into value unless you analyze visitor behavior over sequences of visits, rather than just within a single visit.

If you work with Web analytics data, you know it’s hard enough to understand what’s going on when examining a person’s behavior in a single visit. Analyzing behavior over multiple visits adds complexity. Here, data mining and predictive analytical techniques come into play.

If we accept (as in the car insurance example) that conversion is often a multivisit process, we must understand the process’s key drivers over time if we are to influence that visitor’s behavior. We must find out what behaviors over multiple visits are most likely to lead to a successful outcome.

A decision-tree technique like CHAID (chi-squared automatic interaction detection) can help you understand how different visitor behaviors over multiple visits may increase or decrease the likelihood of converting a browser into a buyer. CHAID output is highly visual: it shows the factors that influence conversion as a tree diagram, ordered by how strongly each one influences the outcome.
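To make the idea concrete, here is a minimal Python sketch of the first step in a CHAID-style analysis: score each candidate predictor against the conversion outcome with a Pearson chi-squared statistic and split on the strongest one. The visitor records and the predictor names (`returned_within_week`, `source`) are hypothetical, invented for illustration. A full CHAID implementation also merges similar categories and compares adjusted p-values rather than raw statistics, so treat this as a simplification that is only reasonable when all predictors have the same number of levels.

```python
from collections import Counter

# Hypothetical per-visitor records: (returned_within_week, source, converted).
data = [
    ("yes", "search", True), ("yes", "other", True),
    ("yes", "search", True), ("yes", "other", False),
    ("no", "search", False), ("no", "other", False),
    ("no", "search", False), ("no", "other", True),
]
rows = [dict(returned_within_week=w, source=s, converted=c) for w, s, c in data]


def chi_squared(rows, predictor, target="converted"):
    """Pearson chi-squared statistic for a categorical predictor vs. the target."""
    observed = Counter((r[predictor], r[target]) for r in rows)
    level_totals = Counter(r[predictor] for r in rows)
    outcome_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for level in level_totals:
        for outcome in outcome_totals:
            expected = level_totals[level] * outcome_totals[outcome] / n
            stat += (observed[(level, outcome)] - expected) ** 2 / expected
    return stat


def best_split(rows, predictors):
    """Pick the predictor most strongly associated with conversion.

    Simplification: compares raw statistics, which is only fair when every
    predictor has the same number of levels (both are binary here).
    """
    return max(predictors, key=lambda p: chi_squared(rows, p))


top_predictor = best_split(rows, ["returned_within_week", "source"])
```

In a real tree, the same scoring would be repeated within each resulting branch until no predictor shows a significant association.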

As with the segmentation approach described in part one, data must be in the right shape before an analysis is started. That requires extracting the data and summarizing it into the key activities and events in each visit of the visitor lifecycle. I often think that data mining and predictive analytics are part art, part science. The art lies in getting the right data into the right format for algorithms to provide meaningful and useful results. In these days of automated analytics, anyone can produce a model. The question is whether the model is any good.
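As a rough illustration of what "the right shape" can mean, the Python sketch below collapses a raw event log into one summary record per visitor. The log layout, the event names (`landing`, `quote`, `purchase`), and the summary fields are all assumptions made for this example, and it treats each calendar day as one visit, which a real sessionization step would refine.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw log rows: (visitor_id, visit_date, event).
raw_events = [
    ("v1", date(2007, 9, 1), "landing"),
    ("v1", date(2007, 9, 3), "quote"),
    ("v1", date(2007, 9, 4), "quote"),
    ("v1", date(2007, 9, 4), "purchase"),
    ("v2", date(2007, 9, 2), "landing"),
    ("v2", date(2007, 9, 9), "quote"),
]


def summarize_visitors(events):
    """Collapse event rows into one summary record per visitor.

    Assumption: one visit per visitor per calendar day.
    """
    visits = defaultdict(lambda: defaultdict(set))  # visitor -> day -> events
    for visitor, day, event in events:
        visits[visitor][day].add(event)

    summary = {}
    for visitor, by_day in visits.items():
        days = sorted(by_day)
        summary[visitor] = {
            "n_visits": len(days),
            # Ordinal of the first visit that included a quote, or None.
            "first_quote_visit": next(
                (i for i, d in enumerate(days, start=1) if "quote" in by_day[d]),
                None,
            ),
            "converted": any("purchase" in by_day[d] for d in days),
        }
    return summary


visitor_rows = summarize_visitors(raw_events)
```

Each record is now a single row of candidate predictors plus an outcome, which is the layout most tree and regression tools expect.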

In working with these techniques, we commonly find a small number of highly influential conversion drivers over multiple visits. Naturally, those drivers vary from site to site, but the importance of time is usually one thing they have in common. The time between the first and second visits, between the second and third, and so on, is quite often a good predictor of the subsequent outcome.
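One simple way to check whether time matters on your own site is to compute the gaps between consecutive visits and compare them by outcome. The Python sketch below does this for the gap between the first and second visits. The visitor IDs, dates, and outcomes are made up for illustration, and a median comparison is only a first look, not a substitute for a proper model.

```python
from datetime import date
from statistics import median

# Hypothetical visit dates and outcomes per visitor.
visit_dates = {
    "v1": [date(2007, 9, 1), date(2007, 9, 3), date(2007, 9, 4)],
    "v2": [date(2007, 9, 2), date(2007, 9, 20)],
    "v3": [date(2007, 9, 5), date(2007, 9, 6)],
    "v4": [date(2007, 9, 1)],  # single visit, so no gap to measure
}
converted = {"v1": True, "v2": False, "v3": True, "v4": False}


def visit_gaps(dates):
    """Days between each pair of consecutive visits, in visit order."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def median_first_gap_by_outcome(visit_dates, converted):
    """Median days between first and second visit, split by conversion outcome."""
    gaps = {True: [], False: []}
    for visitor, dates in visit_dates.items():
        visitor_gaps = visit_gaps(dates)
        if visitor_gaps:  # skip visitors who never returned
            gaps[converted[visitor]].append(visitor_gaps[0])
    return {
        outcome: (median(values) if values else None)
        for outcome, values in gaps.items()
    }


first_gap_medians = median_first_gap_by_outcome(visit_dates, converted)
```

If converters tend to return much sooner than non-converters, the gap is worth including as a predictor in the tree analysis.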

As the need to tune the online marketing processes continues, organizations must add capabilities to their analytics tool kit. Data-mining and predictive analytical techniques are firmly established within other marketing disciplines. Perhaps their time is now coming in the online world.

I’ll be at Emetrics in Washington, D.C., on Wednesday, October 17. Come hear me speak on “Cutting Through the Noise: Applications of Data Mining and Predictive Analytics.”
