Multivariate Testing: Parametric Data Analysis

The purpose of multivariate testing is to simultaneously gather information about multiple variables, and then conduct an analysis of the data to determine which recipe results in the best performance.

Multivariate data can be analyzed by using either parametric or non-parametric analysis methods. This post will dive into some of the nuances of parametric data analysis.

A Parametric Primer

Parametric data analysis in landing page optimization builds a model of how the variables tested (the “independent variables”) impact the conversion rate (the “dependent variable”). For each recipe in your search space, the model will produce a prediction of the expected conversion rate (or other optimization criterion of interest).

Unless you happen to have sampled data on the exact recipe that the model predicts to be the best, you don’t really know whether the prediction will hold up. That is why, for all parametric data analysis methods, it is critical to run a follow-up A/B split test between the predicted best challenger recipe and the original baseline recipe.

Building a Model

After you collect your data, you can build a model that expresses how your dependent variable (e.g., the conversion rate) varies based on the settings of your independent variables (e.g., your tuning elements and their specific values).

The models are made up of two types of components.

  • Main effects describe the impact of an individual variable value on the results. In other words, they look at each variable in isolation, and see how changing its value affects the results.
  • Interaction effects consider combinations of variable values, and how they influence each other when presented together. Interaction effects are possible among two or more variable values. For example, if you had five variables in your test, you could have interactions involving any subset of two, three, four, or five variable values. Interactions involving smaller numbers of variables are called lower order, while those involving many variables are called higher order.
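To make the five-variable example concrete, here is a small sketch that counts how many subsets of variables can interact at each order. The function name `interaction_counts` is made up for illustration. Note that it counts subsets of variables, and each subset contributes one interaction term per combination of its variable values.

```python
from math import comb


def interaction_counts(num_variables):
    """Map interaction order -> number of variable subsets of that size."""
    return {
        order: comb(num_variables, order)
        for order in range(2, num_variables + 1)
    }


counts = interaction_counts(5)
for order, count in counts.items():
    print(f"{order}-variable interactions: {count} subsets of variables")
```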

Variables vs. Factors

Variables are commonly referred to as factors in parametric multivariate testing terminology. Likewise, variable values are often referred to as levels. If a variable has a branching factor of two, the levels are often referred to as “high” and “low” (or are denoted by “+1” and “-1”). Similarly, three levels are often denoted by “+1”, “0”, and “-1”.

Usually parametric multivariate testing uses the general mathematical class of linear models based on the analysis of variance (ANOVA). In other words, you are trying to predict the output variable by adding up the contributions of all of the possible main effects and interaction effects of the input variables. You start with the average value of your output variable in the test, and then add in the positive or negative impact of your input variables and their interactions.
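As a rough illustration of "start with the average, then add the contributions," the sketch below uses the “+1”/“-1” level coding on a two-variable, two-level test. The conversion rates are invented for the example, and the names `observed`, `b1`, `b2`, `b12`, and `predict` are hypothetical. It assumes a balanced design in which every recipe was sampled.

```python
# Hypothetical observed conversion rates, keyed by coded levels (x1, x2),
# where -1 is the "low" level and +1 is the "high" level of each factor.
observed = {
    (-1, -1): 0.040,
    (-1, +1): 0.044,
    (+1, -1): 0.050,
    (+1, +1): 0.062,
}

n = len(observed)

# Average value of the output variable across the test.
mean = sum(observed.values()) / n

# In a balanced two-level design, each coefficient is the average of the
# output weighted by the matching column of +1/-1 codes.
b1 = sum(x1 * y for (x1, x2), y in observed.items()) / n        # factor 1
b2 = sum(x2 * y for (x1, x2), y in observed.items()) / n        # factor 2
b12 = sum(x1 * x2 * y for (x1, x2), y in observed.items()) / n  # interaction


def predict(x1, x2):
    """Add the main-effect and interaction contributions to the average."""
    return mean + b1 * x1 + b2 * x2 + b12 * x1 * x2


for levels in observed:
    print(levels, round(predict(*levels), 4))
```

Because this model includes every possible effect, its predictions reproduce the observed rates exactly. Simpler models that drop terms will not.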

A Simple Parametric Example

Let’s consider the simplest possible multivariate example. Assume that you are testing a new call-to-action button and are considering two colors (blue, green) and two font styles (Arial, Times Roman) for the text:

  • V1a = blue
  • V1b = green
  • V2a = Arial
  • V2b = Times Roman

You can create a model of the conversion rate that “fits” your data as well as possible and uses the average value, main effects, and all interactions. The coefficients (denoted by c’s in front of each effect) indicate the magnitude of the contribution of each effect and can be either positive or negative:

CR = c1 + c2·V1a + c3·V1b + c4·V2a + c5·V2b

+ c6·V1a:V2a + c7·V1a:V2b + c8·V1b:V2a + c9·V1b:V2b

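Here c1 represents the average value, c2 to c5 are the main effects, and c6 to c9 are the two-variable interaction effects (covering all four possible combinations of the two variables). Each V term equals 1 when the recipe uses that value (or combination of values) and 0 otherwise.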
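One way to see where the nine coefficients come from is to decompose a set of observed conversion rates by hand. The sketch below does that with invented rates. It assumes the usual convention that the effects within each group sum to zero, which the text does not state but which makes the coefficients unique. All variable names are hypothetical.

```python
# Hypothetical observed conversion rates for the four recipes.
rates = {
    ("V1a", "V2a"): 0.040,  # blue, Arial
    ("V1a", "V2b"): 0.044,  # blue, Times Roman
    ("V1b", "V2a"): 0.050,  # green, Arial
    ("V1b", "V2b"): 0.062,  # green, Times Roman
}
v1_levels = ["V1a", "V1b"]
v2_levels = ["V2a", "V2b"]

# c1: the average value across all recipes.
grand_mean = sum(rates.values()) / len(rates)

# c2..c5: main effect = average rate at that value minus the grand mean.
main_v1 = {
    a: sum(rates[(a, b)] for b in v2_levels) / len(v2_levels) - grand_mean
    for a in v1_levels
}
main_v2 = {
    b: sum(rates[(a, b)] for a in v1_levels) / len(v1_levels) - grand_mean
    for b in v2_levels
}

# c6..c9: interaction = what remains of each recipe's rate after removing
# the average and both main effects.
interaction = {
    (a, b): rates[(a, b)] - grand_mean - main_v1[a] - main_v2[b]
    for a in v1_levels
    for b in v2_levels
}


def predicted_rate(a, b):
    """Only the terms matching the recipe's values contribute."""
    return grand_mean + main_v1[a] + main_v2[b] + interaction[(a, b)]


num_coefficients = 1 + len(main_v1) + len(main_v2) + len(interaction)
print("coefficients:", num_coefficients)
print("green + Times Roman:", round(predicted_rate("V1b", "V2b"), 4))
```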

Here’s Where It Gets Complicated

Let’s assume that your experiment is slightly larger. You now add a third variable with two values to the test (designated by V3a and V3b). The full model with all interactions is shown below:

CR = c1 + c2·V1a + c3·V1b + c4·V2a + c5·V2b

+ c6·V3a + c7·V3b

+ c8·V1a:V2a + c9·V1a:V2b + c10·V1b:V2a + c11·V1b:V2b

+ c12·V1a:V3a + c13·V1a:V3b + c14·V1b:V3a + c15·V1b:V3b

+ c16·V2a:V3a + c17·V2a:V3b + c18·V2b:V3a + c19·V2b:V3b

+ c20·V1a:V2a:V3a + c21·V1a:V2a:V3b

+ c22·V1a:V2b:V3a + c23·V1a:V2b:V3b

+ c24·V1b:V2a:V3a + c25·V1b:V2a:V3b

+ c26·V1b:V2b:V3a + c27·V1b:V2b:V3b

As you can see, the number of coefficients that you must now estimate in the model has mushroomed from 9 to 27: one average, six main effects, twelve two-variable interactions, and eight three-variable interactions. For the first time, you see the presence of three-variable interaction effects.

If you have a higher branching factor for each variable, or a larger number of variables, the number of coefficient terms in the model grows very quickly.
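A quick way to check how fast the model grows is to count its terms directly. In the full model written out above, each variable is either absent from a term or present at one of its values, so the number of terms is the product of (1 + branching factor) over all variables. The helper name `full_model_terms` is made up for this sketch.

```python
from math import prod


def full_model_terms(levels_per_variable):
    """Count the average, all main effects, and all interaction terms.

    Each variable is either left out of a term (1 way) or appears at one
    of its values, so the total is the product of (1 + number of values).
    """
    return prod(1 + levels for levels in levels_per_variable)


for design in ([2, 2], [2, 2, 2], [3, 3, 3], [2] * 5, [3] * 5):
    print(design, "->", full_model_terms(design), "terms")
```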

With a large number of coefficient terms in a parametric model, it becomes impractical to estimate each one accurately from a limited amount of data.

Fractional factorial parametric approaches force you to choose the complexity of your model ahead of time. This means you must somehow decide in advance which main effects are important, and also which interactions will be included in the model.

The simpler your model is, the fewer recipes will need to be sampled during data collection.
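To put rough numbers on this trade-off, the sketch below counts the independent parameters in a model that keeps interactions only up to a chosen order. It assumes the common convention that the effects within each group sum to zero, so a variable with k values contributes k - 1 free parameters. You need at least as many distinct recipes as free parameters, so treat the result as a lower bound on recipes rather than a sample-size recommendation. The function name `independent_parameters` is hypothetical.

```python
from itertools import combinations
from math import prod


def independent_parameters(levels_per_variable, max_order):
    """Count free parameters when interactions above max_order are dropped.

    Order 0 is the average, order 1 the main effects, order 2 the
    two-variable interactions, and so on.
    """
    total = 0
    for order in range(max_order + 1):
        for subset in combinations(levels_per_variable, order):
            # A subset of variables contributes the product of (values - 1).
            total += prod(levels - 1 for levels in subset)
    return total


design = [2, 2, 2]  # three variables with two values each
for max_order in range(1, len(design) + 1):
    count = independent_parameters(design, max_order)
    print(f"interactions up to order {max_order}: {count} free parameters")
```

With every interaction included, the count equals the total number of recipes in the search space, which is why a full model leaves no room for sampling only a fraction of it.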

Parametric Analysis Resolution

Based on the complexity of your parametric model, you can determine its resolution. The resolution is a scale that describes your ability to separate out the main effects and lower-order interactions with a particular data collection experimental design. In statistical parlance, effects are “confounded” when the design cannot distinguish among them, so their contributions are mixed together.

Resolution II

  • Main effects are confounded with other main effects.

Resolution III

  • Can estimate main effects, but they may be confounded by two-variable interactions.

Resolution IV

  • Can estimate main effects unconfounded by two-variable interactions.
  • Can estimate two-variable interactions, but they may be confounded by other two-variable interactions.

Resolution V

  • Can estimate main effects unconfounded by three-variable (or lower) interactions.
  • Can estimate two-variable interactions unconfounded by other two-variable interactions.
  • Can estimate three-variable interactions, but they may be confounded by two-variable interactions.

Resolution VI

  • Can estimate main effects unconfounded by four-variable (or lower) interactions.
  • Can estimate two-variable interactions unconfounded by three-variable (or lower) interactions.
  • Can estimate three-variable interactions, but they may be confounded by other three-variable interactions.

The most common types of designs are Resolution III to V. Resolution II designs are not useful because you cannot even estimate the main effects properly. Resolution VI and above are too complex and assume that high-order interactions are common. Higher resolution designs sample across a larger fraction of the whole search space. Simpler Resolution III designs are sparse and sample only a small proportion of the search space.
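For two-level fractional factorial designs, the resolution is conventionally read off the design's defining relation as the length of its shortest "word". This post does not cover generators or defining relations, so treat the following as a side sketch under that assumption. The function name `design_resolution` and the example generator words are illustrative only.

```python
from itertools import combinations


def design_resolution(generator_words):
    """Return the length of the shortest word in the defining relation.

    Each generator word is a string of factor letters, e.g. "ABC" for the
    generator C = AB. The defining relation contains every product of the
    generator words. In a two-level design a squared letter cancels, so a
    product of words is the symmetric difference of their letters.
    """
    words = [frozenset(word) for word in generator_words]
    shortest = None
    for size in range(1, len(words) + 1):
        for subset in combinations(words, size):
            product = frozenset()
            for word in subset:
                product = product ^ word  # letters appearing twice cancel
            if shortest is None or len(product) < shortest:
                shortest = len(product)
    return shortest


print(design_resolution(["ABC"]))         # half fraction of 3 factors
print(design_resolution(["ABCD"]))        # half fraction of 4 factors
print(design_resolution(["ABCDE"]))       # half fraction of 5 factors
print(design_resolution(["ABD", "ACE"]))  # quarter fraction of 5 factors
```

A shortest word of length three means a main effect shares its estimate with a two-variable interaction, which matches the Resolution III description above.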

In my next post, we’ll take a similar deep-dive into full factorial and fractional factorial data collection methods.
