Probability theory is a branch of mathematics that deals with the description and analysis of random events. The key building blocks of this framework are random variables, stochastic processes, and events.
Let's use a fair gaming die as an example. The top face of the die can take on one of six possible outcomes (i.e., one, two, three, four, five, six). The probability distribution function is uniform (i.e., there is an equal one-in-six chance of any value between one and six coming up). When you sum up all of the possible probabilities, they add up to exactly one.
There are two kinds of processes considered in probability theory: deterministic and stochastic.
In our die example, the stochastic process is the repeated roll of the die. Each roll will produce a random variable outcome (one of the six possible values), and successive rolls are independent of each other (what was rolled on the previous attempt has no influence on the likelihood of any particular number coming up on the next roll).
An event in probability theory is a set of outcomes to which a probability is assigned. It is drawn from the set of all possible outcomes (called the sample space). In the simplest case, the sample space is finite. Each of the basic possible outcomes is called an elementary event, but more complex events can be constructed by grouping several elementary events together (that is, by taking a larger subset of the sample space).
In our die example, the elementary events are the individual possible values of a die roll. But we can also construct other events and assign the proper probabilities to them (e.g., an even roll of the die, with a probability of one-half, or a roll with a value greater than four, with a probability of one-third).
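The die example is small enough to check by hand, but a short sketch makes the bookkeeping explicit. The names below (`sample_space`, `pmf`, `probability`) are my own illustrative choices, and exact fractions are used so the sums come out cleanly.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]

# Uniform distribution: every elementary event has probability 1/6.
pmf = {face: Fraction(1, len(sample_space)) for face in sample_space}


def probability(event):
    """Probability of an event, given as a set of die faces."""
    return sum((pmf[face] for face in event), Fraction(0))


even_roll = {face for face in sample_space if face % 2 == 0}
greater_than_four = {face for face in sample_space if face > 4}

print(probability(set(sample_space)))  # 1
print(probability(even_roll))          # 1/2
print(probability(greater_than_four))  # 1/3
```

An event here is just a set of faces, so building a more complex event is nothing more than choosing which elementary events to include.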
Probability Applied to Landing Page Testing
So how does all of this apply to landing page optimization?
The random variables are the visits to your site from the traffic sources that you have selected for the test. As I have already mentioned, the audience itself may be subject to sampling bias. The probability distribution function is pretty simple in most cases. You are counting whether or not the conversion happened as a result of the visit. You are assuming that there is some underlying and fixed probability of the conversion happening, and that the only other possible outcome is that the conversion does not happen (that is, a visit is a Bernoulli random variable that can result in conversion, or not).
As an example, let's assume that the actual conversion rate for a landing page is 2 percent. So there is a small chance that the conversion will happen (2 percent), and a much larger chance that it will not (98 percent) for any particular visitor. As you can see, the sum of the two possible outcome probabilities exactly equals one (2 percent + 98 percent = 100 percent) as required.
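To make the Bernoulli model concrete, here is a minimal simulation sketch. It assumes the idealized conditions described above (a fixed 2 percent conversion probability and independent visits); the function name `simulate_visits` and the seed are hypothetical choices for illustration, not part of any testing tool.

```python
import random


def simulate_visits(n_visits, conversion_rate, seed=0):
    """Return a list of 1 (converted) or 0 (did not convert), one per visit.

    Each visit is treated as an independent Bernoulli trial with the
    same fixed probability of conversion.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [1 if rng.random() < conversion_rate else 0 for _ in range(n_visits)]


visits = simulate_visits(10_000, 0.02)
observed_rate = sum(visits) / len(visits)
print(f"conversions: {sum(visits)} of {len(visits)} ({observed_rate:.2%})")
```

The observed rate will usually land near 2 percent without matching it exactly, which is the gap between what you can measure and the underlying rate.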
The stochastic process is the flow of visitors from the traffic sources used for the test. Key assumptions about the process are that the behavior of the visitors does not change over time, and that the population from which visitors are drawn remains the same. Unfortunately, both of these are routinely violated to a greater or lesser extent in the real world. The behavior of visitors changes due to seasonal factors, or with changing sophistication and knowledge levels about your products or industry. The population itself changes based on your current marketing mix. Most businesses are constantly adjusting and tweaking their traffic sources (e.g., by changing PPC bid prices and the resulting keyword mix that their audience arrives from). The result is that your time series, which is supposed to return a steady stream of yes-or-no answers (based on a fixed probability of a conversion), actually has a changing probability of conversion. In mathematical terms, your time series is non-stationary and changes its behavior over time.
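A small worked example shows how a change in traffic mix alone can move the overall conversion rate. The two traffic segments, their conversion rates, and the weekly mixes below are entirely made-up numbers for illustration.

```python
def blended_rate(mix, rates):
    """Overall conversion rate for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * rates[source] for source, share in mix.items())


# Hypothetical per-segment conversion rates; the landing page never changes.
rates = {"brand_keywords": 0.04, "generic_keywords": 0.01}

# Hypothetical traffic mixes before and after a PPC bid change.
week_1_mix = {"brand_keywords": 0.2, "generic_keywords": 0.8}
week_2_mix = {"brand_keywords": 0.5, "generic_keywords": 0.5}

print(f"week 1: {blended_rate(week_1_mix, rates):.2%}")  # 1.60%
print(f"week 2: {blended_rate(week_2_mix, rates):.2%}")  # 2.50%
```

Nothing about the page or the visitors within each segment changed, yet the measured rate moved from 1.6 percent to 2.5 percent because the population did.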
The independence of the random variables in the stochastic process is also a critical theoretical requirement. However, the behavior on each visit is not necessarily independent. A person may come back to your landing page a number of times, and their current behavior would obviously be influenced by their previous visits. You might also have a bug or an overload condition where the actions of some users influence the actions that other users can take. For this reason it is best to use a fresh stream of visitors (with a minimal percentage of repeat visitors if possible) for your landing page test audience. Repeat visitors are by definition biased because they have voluntarily chosen to return to your site, and are not seeing it for the first time at random.
This is also a reason to avoid using landing page testing with an audience consisting of your in-house e-mail list. The people on the list are biased because they have self-selected to receive ongoing messages from you, and because they have already been exposed to previous communications.
The event itself can also be more complicated than the simple did-the-visitor-convert determination. In an e-commerce catalog, it is important to know not only whether a sale happened, but also its value. If you were to tune only for higher conversion rate, you could achieve that by pushing low-margin and low-cost products that people are more likely to buy. But this would not necessarily result in the highest profits.
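The difference between the two goals is easy to see with a little arithmetic. The sketch below uses invented numbers for two hypothetical page variants; `conversion_rate` and `revenue_per_visitor` are illustrative helper names.

```python
def conversion_rate(order_values, visitors):
    """Share of visitors who placed an order."""
    return len(order_values) / visitors


def revenue_per_visitor(order_values, visitors):
    """Total order value spread across all visitors, buyers or not."""
    return sum(order_values) / visitors


visitors = 1000  # hypothetical traffic sent to each variant

# Variant A pushes a low-cost product: more orders, smaller tickets.
variant_a = [20.0] * 30
# Variant B features a higher-priced product: fewer orders, larger tickets.
variant_b = [50.0] * 20

for name, orders in (("A", variant_a), ("B", variant_b)):
    cr = conversion_rate(orders, visitors)
    rpv = revenue_per_visitor(orders, visitors)
    print(f"variant {name}: conversion {cr:.1%}, revenue per visitor ${rpv:.2f}")
```

With these made-up figures, variant A wins on conversion rate (3 percent versus 2 percent) while variant B wins on revenue per visitor ($1.00 versus $0.60).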
Many of the engagements at my company involve tuning for the highest possible revenue per visitor (or profit per visitor after considering the variable costs of the conversion action). For these kinds of situations, you need to consider real-valued random variables and their cumulative distribution functions. That discussion is more involved and is beyond the scope of this column.
Law of Large Numbers
The law of large numbers states that if an event with an underlying probability (p) is observed over repeated independent experiments, the ratio of the number of times the event occurs to the total number of experiments will converge to p as the number of experiments grows.
Let's continue with our die rolling example. The law of large numbers guarantees that if we roll the die enough times, the percentage of sixes rolled will approach one-sixth of the total number of rolls (i.e., its expected percentage in the probability distribution function). An intuitive way of understanding this is that over the long run, any streak of sixes or non-sixes becomes a vanishingly small share of the total number of rolls. The die has no memory, so streaks are not actively counteracted by opposite streaks; they are simply diluted as more rolls accumulate.
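You can watch this convergence happen with a quick simulation. This is only a sketch using Python's pseudo-random generator with an arbitrary seed; `fraction_of_sixes` is an illustrative name.

```python
import random


def fraction_of_sixes(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times; return the share of sixes."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


# Larger samples should tend to sit closer to the expected value of 1/6.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} rolls: {fraction_of_sixes(n):.4f} (expected {1 / 6:.4f})")
```

The small samples can be far off, while the largest one should sit close to one-sixth.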
The exciting thing about this law is that it ties something that you can observe (the actual conversion percentage in your test) to the unknown underlying actual conversion rate of your landing page. It guarantees the stable long-term results of the random visitor events.
However, before you start celebrating, it is important to realize that this law is based on a very large number of samples, and only guarantees that you will eventually come close to the actual conversion rate. In reality, your knowledge of the actual conversion rate will accumulate slowly.
Moreover, the law of large numbers does not guarantee that you will converge to the correct answer with a small amount of data. In fact, it almost guarantees that over a short period of time, your estimate of conversion rate will be incorrect. Short-term streaks can and do cause conversion rates to significantly deviate from the true value.
The best way to look at this situation is to keep in mind that collecting more data allows you to make increasingly accurate estimates of the true underlying conversion rate. However, your estimate will always be subject to some error; moreover, you can only know approximate bounds on the size of this error.
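One common way to put approximate bounds on that error is a normal-approximation confidence interval. This column does not prescribe a particular method, so treat the sketch below as one illustrative choice; the function name `approx_interval` and the 1.96 multiplier (roughly 95 percent confidence) are my assumptions, and the approximation gets unreliable when the number of conversions is small.

```python
import math


def approx_interval(conversions, visitors, z=1.96):
    """Normal-approximation interval for a conversion rate.

    Returns (low, high), clipped to the range [0, 1].
    """
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    p_hat = conversions / visitors
    std_error = math.sqrt(p_hat * (1 - p_hat) / visitors)
    return max(0.0, p_hat - z * std_error), min(1.0, p_hat + z * std_error)


# Same observed 2 percent rate at three hypothetical sample sizes.
for visitors in (1_000, 10_000, 100_000):
    conversions = round(0.02 * visitors)
    low, high = approx_interval(conversions, visitors)
    print(f"{visitors:>7} visitors: {low:.2%} to {high:.2%}")
```

The interval narrows as the sample grows, but only in proportion to the square root of the number of visitors, which is why knowledge of the true rate accumulates slowly.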
Tim Ash is CEO of SiteTuners.com, a landing page optimization firm that offers conversion consulting, full-service guaranteed-improvement tests, and software tools to improve conversion rates. SiteTuners' AttentionWizard.com visual attention prediction tool can be used on a landing page screenshot or mock-up to quickly identify major conversion issues. He has worked with Google, Facebook, American Express, CBS, Sony Music, Universal Studios, Verizon Wireless, Texas Instruments, and Coach.
Tim is a highly-regarded presenter at SES, eMetrics, PPC Summit, Affiliate Summit, PubCon, Affiliate Conference, and LeadsCon. He is the chairperson of ConversionConference.com, the first conference focused on improving online conversions. A columnist for several publications including ClickZ, he's host of the weekly Landing Page Optimization show and podcast on WebmasterRadio.fm. His columns can be found in the Search Engine Watch archive.
He received his B.S. and M.S. during his Ph.D. studies at UC San Diego. Tim is the author of the bestselling book, "Landing Page Optimization."
March 19, 2014