Probability theory is a branch of mathematics that deals with the description and analysis of random events. The key building blocks of this framework are as follows:
- A random variable is a quantity whose value is random or unpredictable, and to which we can assign a probability distribution function.
- The probability distribution function determines the set of possible values that the random variable can take, along with the likelihood of each. By definition, the likelihoods of all possible outcomes must sum to one (i.e., one of the possible outcomes must happen, and its value will be assigned to the random variable).
Let’s use a fair six-sided die as an example. The top face of the die can take on one of six possible outcomes (i.e., one, two, three, four, five, six). The probability distribution function is uniform (i.e., there is an equal one-in-six chance of any value between one and six coming up). When you add up the probabilities of all six outcomes, they total exactly one.
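The die's distribution can be written down directly and checked to sum to one. This is a minimal sketch; it uses Python's `fractions.Fraction` (an illustrative choice, not something from the article) so that the one-sixth probabilities add up exactly, without floating-point rounding.

```python
from fractions import Fraction

# Uniform distribution for a fair six-sided die: each face has probability 1/6.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible outcomes must sum to exactly one.
total = sum(die_distribution.values())
print(total)  # 1
```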
It is useful to distinguish between two kinds of processes: deterministic and stochastic.
- A deterministic process will go along a set path depending on its starting conditions. In other words, if you know where it starts, you can exactly compute where it will end up at some point in the future.
- A stochastic process (also called a random process) is more difficult to understand. You cannot tell exactly where it will end up, but you know (based on its probability distribution function) that certain outcomes are more likely. In the simplest case, a stochastic process can be described as a sequence of samples from random variables. If these samples can be associated with particular points in time, it is a time series (a series of data points that were measured at successive times).
In our die example, the stochastic process is the repeated roll of the die. Each roll is a random variable that takes one of the six possible values, and successive rolls are independent of each other (what was rolled on the previous attempt has no influence on the likelihood of any particular number coming up on the next roll).
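Independence can be illustrated empirically with a simulated die. In this sketch, Python's pseudo-random generator stands in for a physical die (an assumption for illustration), and the seed is fixed so the run is repeatable. If rolls are independent, the share of sixes that immediately follow a six should be close to the overall share of sixes, about one-sixth. A simulation like this illustrates the idea; it does not prove independence.

```python
import random

def roll_die(rng: random.Random) -> int:
    """Return one roll of a simulated fair six-sided die."""
    return rng.randint(1, 6)

rng = random.Random(42)  # fixed seed so the run is repeatable
rolls = [roll_die(rng) for _ in range(300_000)]

# Overall share of sixes across all rolls.
share_six = rolls.count(6) / len(rolls)

# Share of sixes among rolls that immediately follow a six.
after_six = [nxt for prev, nxt in zip(rolls, rolls[1:]) if prev == 6]
share_six_after_six = after_six.count(6) / len(after_six)

print(f"P(six)            ~ {share_six:.4f}")
print(f"P(six | prev six) ~ {share_six_after_six:.4f}")
```

Both printed values should land near 0.1667; a large gap between them would suggest that one roll influences the next.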
The set of all possible outcomes is called the sample space. An event in probability theory is a set of outcomes (a subset of the sample space) to which a probability is assigned. In the simplest case, the set of possible outcomes is finite. Each of the basic possible outcomes is called an elementary event, but more complex events can be constructed by selecting larger groupings of elementary events.
In our die example, the elementary events are the individual possible values of a die roll. But we can also construct other events and assign the corresponding probabilities to them (e.g., an even roll of two, four, or six, with a probability of one-half, or a roll greater than four, meaning five or six, with a probability of one-third).
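Events can be modeled as subsets of the sample space. The following is a minimal sketch; the helper name `probability` is hypothetical, and its counting rule (favorable outcomes divided by total outcomes) is valid only because every face of a fair die is equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event: set[int]) -> Fraction:
    """P(event) for a fair die: favorable outcomes / total outcomes."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

even_roll = {face for face in sample_space if face % 2 == 0}     # {2, 4, 6}
greater_than_four = {face for face in sample_space if face > 4}  # {5, 6}

print(probability(even_roll))          # 1/2
print(probability(greater_than_four))  # 1/3
print(probability({6}))                # 1/6 (an elementary event)
```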
Probability Applied to Landing Page Testing
So how does all of this apply to landing page optimization?
The random variables are the outcomes of individual visits to your site from the traffic sources that you have selected for the test. As I have already mentioned, the audience itself may be subject to sampling bias. The probability distribution function is pretty simple in most cases. You are counting whether or not a conversion happened as a result of the visit. You are assuming that there is some underlying and fixed probability of the conversion happening, and that the only other possible outcome is that the conversion does not happen (that is, a visit is a Bernoulli random variable that can result in a conversion, or not).
As an example, let’s assume that the actual conversion rate for a landing page is 2 percent. So there is a small chance that the conversion will happen (2 percent), and a much larger chance that it will not (98 percent) for any particular visitor. As you can see, the sum of the two possible outcome probabilities exactly equals one (2 percent + 98 percent = 100 percent) as required.
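A single visit can be sketched as a Bernoulli trial. This example assumes the 2 percent rate from the paragraph above and uses a seeded pseudo-random generator to stand in for real visitors; the function name `simulate_visit` is hypothetical.

```python
import random

TRUE_CONVERSION_RATE = 0.02  # assumed true rate from the 2 percent example

# A visit is a Bernoulli trial: 1 = converted, 0 = did not convert.
outcome_probabilities = {1: TRUE_CONVERSION_RATE, 0: 1 - TRUE_CONVERSION_RATE}
assert abs(sum(outcome_probabilities.values()) - 1.0) < 1e-12  # 2% + 98% = 100%

def simulate_visit(rng: random.Random, p: float = TRUE_CONVERSION_RATE) -> int:
    """Return 1 if this simulated visitor converts, else 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(7)  # fixed seed so the run is repeatable
visits = [simulate_visit(rng) for _ in range(1_000)]
print(sum(visits), "conversions out of", len(visits), "visits")
```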
The stochastic process is the flow of visitors from the traffic sources used for the test. Key assumptions about the process are that the behavior of the visitors does not change over time, and that the population from which visitors are drawn remains the same. Unfortunately, both assumptions are routinely violated to a greater or lesser extent in the real world. The behavior of visitors changes with seasonal factors and with their growing sophistication and knowledge about your products or industry. The population itself changes based on your current marketing mix. Most businesses are constantly adjusting and tweaking their traffic sources (e.g., by changing PPC bid prices and the resulting mix of keywords through which their audience arrives). The result is that your time series, which is supposed to return a steady stream of yes-or-no answers (based on a fixed probability of conversion), actually has a changing probability of conversion. In mathematical terms, your time series is non-stationary: its behavior changes over time.
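A non-stationary series can be sketched by letting the conversion probability shift partway through a test. The numbers here are hypothetical: the rate is assumed to be 2 percent for the first 50,000 visits and 3 percent afterward, as if a change in the traffic mix had occurred. The pooled rate that results describes neither period.

```python
import random

def conversion_probability(visit_index: int) -> float:
    """Hypothetical non-stationary rate: 2% for the first 50,000 visits,
    then 3% after an assumed change in the traffic mix."""
    return 0.02 if visit_index < 50_000 else 0.03

rng = random.Random(11)  # fixed seed so the run is repeatable
outcomes = [1 if rng.random() < conversion_probability(i) else 0
            for i in range(100_000)]

first_half = sum(outcomes[:50_000]) / 50_000
second_half = sum(outcomes[50_000:]) / 50_000
pooled = sum(outcomes) / len(outcomes)  # a single "rate" that fits neither half
print(f"first half {first_half:.4f}, second half {second_half:.4f}, pooled {pooled:.4f}")
```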
The independence of the random variables in the stochastic process is also a critical theoretical requirement. However, in practice the outcomes of individual visits are not necessarily independent. A person may come back to your landing page a number of times, and their current behavior will obviously be influenced by their previous visits. You might also have a bug or an overload condition where the actions of some users influence the actions that other users can take. For this reason, it is best to use a fresh stream of visitors (with as small a percentage of repeat visitors as possible) for your landing page test audience. Repeat visitors are by definition biased because they have voluntarily chosen to return to your site and are not encountering it for the first time at random.
This is also a reason to avoid using landing page testing with an audience consisting of your in-house e-mail list. The people on the list are biased because they have self-selected to receive ongoing messages from you, and because they have already been exposed to previous communications.
The event itself can also be more complicated than the simple did-the-visitor-convert determination. In an e-commerce catalog, it is important to know not only whether a sale happened, but also its value. If you were to tune only for higher conversion rate, you could achieve that by pushing low-margin and low-cost products that people are more likely to buy. But this would not necessarily result in the highest profits.
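A small numeric sketch makes the trade-off concrete. The two result sets below are invented for illustration only: page A pushes low-cost products and wins on conversion rate, while page B converts less often but earns more revenue per visitor. Profit per visitor would additionally subtract the variable costs of each order, which this sketch leaves out.

```python
from dataclasses import dataclass

@dataclass
class PageResult:
    name: str
    visitors: int
    orders: int
    revenue: float  # total order value in dollars

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.visitors

    @property
    def revenue_per_visitor(self) -> float:
        return self.revenue / self.visitors

# Hypothetical test results, invented for illustration only.
page_a = PageResult("A (pushes low-cost items)", visitors=10_000, orders=300, revenue=6_000.0)
page_b = PageResult("B (original mix)", visitors=10_000, orders=200, revenue=9_000.0)

for page in (page_a, page_b):
    print(f"{page.name}: conversion {page.conversion_rate:.1%}, "
          f"revenue/visitor ${page.revenue_per_visitor:.2f}")
```

With these made-up numbers, A converts at 3.0 percent versus 2.0 percent for B, yet earns $0.60 per visitor against B's $0.90.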
Many of the engagements at my company involve tuning for the highest possible revenue per visitor (or profit per visitor after considering the variable costs of the conversion action). For these kinds of situations, you need to consider real-valued random variables and their cumulative distribution functions. That discussion is more involved and is beyond the scope of this column.
Law of Large Numbers
The law of large numbers states that if an event with an underlying probability p is observed repeatedly during independent experiments, the ratio of the number of times the event occurs to the total number of experiments will converge to p.
Let’s continue with our die-rolling example. The law of large numbers guarantees that if we roll the die enough times, the proportion of sixes among all rolls will approach one-sixth (i.e., its probability in the probability distribution function). An intuitive way of understanding this is that over the long run, any streak of non-sixes is diluted by an ever-growing number of rolls. The die does not "compensate" with extra sixes, because each roll is independent of the previous ones.
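The convergence can be watched in a quick simulation. This sketch uses a seeded pseudo-random generator as a stand-in for a real die and reports the running share of sixes at a few hypothetical checkpoints; early values typically wander, while later ones settle near 1/6 (about 0.1667).

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_share = {}
sixes = 0

for n in range(1, 100_001):
    if rng.randint(1, 6) == 6:
        sixes += 1
    if n in checkpoints:
        # Share of sixes so far: observed count divided by rolls made.
        running_share[n] = sixes / n
        print(f"after {n:>7,} rolls: share of sixes = {sixes / n:.4f} (target {1 / 6:.4f})")
```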
The exciting thing about this law is that it ties something that you can observe (the conversion percentage measured in your test) to the unknown underlying conversion rate of your landing page. It guarantees that the results of random visitor events stabilize over the long term.
However, before you start celebrating, it is important to realize that this law concerns the behavior of a very large number of samples, and only guarantees that over the long term you will eventually come close to the actual conversion rate. In reality, your knowledge of the actual conversion rate will accumulate slowly.
Moreover, the law of large numbers does not guarantee that you will converge to the correct answer with a small amount of data. In fact, over a short period of time, your estimate of the conversion rate is almost certain to be off. Short-term streaks can and do cause observed conversion rates to deviate significantly from the true value.
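A rough simulation shows how much short tests scatter. This sketch assumes the true conversion rate is 2 percent (from the earlier example) and repeats a hypothetical short test of 500 visitors 2,000 times, with a seeded pseudo-random generator standing in for real visitors. It counts how often a test lands exactly on the true rate and how often it misses by half a percentage point or more.

```python
import random

TRUE_RATE = 0.02         # assumed true conversion rate (the 2 percent example)
VISITORS_PER_TEST = 500  # hypothetical, deliberately short test
NUM_TESTS = 2_000        # number of simulated repeats of that short test

rng = random.Random(99)  # fixed seed so the run is repeatable
conversion_counts = []
for _ in range(NUM_TESTS):
    # Each visitor converts independently with probability TRUE_RATE.
    conversions = sum(1 for _ in range(VISITORS_PER_TEST) if rng.random() < TRUE_RATE)
    conversion_counts.append(conversions)

estimates = [c / VISITORS_PER_TEST for c in conversion_counts]
expected_count = round(TRUE_RATE * VISITORS_PER_TEST)  # 10 conversions

share_exact = sum(1 for c in conversion_counts if c == expected_count) / NUM_TESTS
share_far_off = sum(1 for e in estimates if abs(e - TRUE_RATE) >= 0.005) / NUM_TESTS

print(f"estimates ranged from {min(estimates):.1%} to {max(estimates):.1%}")
print(f"tests that hit exactly {TRUE_RATE:.0%}: {share_exact:.0%}")
print(f"tests off by 0.5 percentage points or more: {share_far_off:.0%}")
```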
The best way to look at this situation is to keep in mind that collecting more data allows you to make increasingly accurate estimates of the true underlying conversion rate. However, your estimate will always be subject to some error; moreover, you can only know approximate bounds on the size of this error.
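One common way to put approximate bounds on that error is a confidence interval. The column does not name a method; this sketch swaps in the normal-approximation (Wald) interval as one standard choice, which is known to be unreliable when conversion counts are very small. The visitor counts are hypothetical, each with an observed rate of 2 percent, and the helper name `wald_interval` is likewise hypothetical. The interval narrows only with the square root of the sample size, so ten times the traffic shrinks it by a factor of about 3.16.

```python
import math

def wald_interval(conversions: int, visitors: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a conversion rate using the
    normal (Wald) approximation; unreliable for very small counts."""
    p_hat = conversions / visitors
    standard_error = math.sqrt(p_hat * (1 - p_hat) / visitors)
    return p_hat - z * standard_error, p_hat + z * standard_error

# Hypothetical observations, each at an observed rate of 2 percent.
for visitors in (1_000, 10_000, 100_000):
    low, high = wald_interval(conversions=visitors // 50, visitors=visitors)
    print(f"{visitors:>7,} visitors: roughly {low:.2%} to {high:.2%}")
```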