Probability Theory for Landing Page Testing

February 28, 2011

A tutorial on what probability theory is, the two kinds of processes, and how it all applies to landing page testing.

Probability theory is a branch of mathematics that deals with the description and analysis of random events. The key building blocks of this framework are as follows:

  • A random variable is a quantity whose value is random or unpredictable, and to which we can assign a probability distribution function.
  • The probability distribution function determines the set of possible values that can be assigned to the random variable, along with their likelihood. The probabilities of all possible outcomes must, by definition, sum to one (i.e., one of the possible outcomes must happen, and its value will be assigned to the random variable).

Let's use a fair gaming die as an example. The top face of the die can show one of six possible outcomes (i.e., one, two, three, four, five, or six). The probability distribution function is uniform (i.e., there is an equal one-in-six chance of each value from one to six coming up). When you add up the probabilities of all six outcomes, they sum to exactly one.
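As a quick check of that arithmetic, here is a minimal Python sketch that writes the fair-die distribution as a dictionary (the name `die_pmf` is just an illustrative choice) and sums its probabilities using exact fractions.

```python
from fractions import Fraction

# Probability distribution function of a fair six-sided die:
# each face value 1..6 maps to an equal probability of 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Exact rational arithmetic avoids floating-point rounding,
# so the total comes out as exactly 1.
total = sum(die_pmf.values())
print(total)  # prints 1
```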

Stochastic Processes

There are two kinds of processes considered in probability theory: deterministic and stochastic.

  • A deterministic process will go along a set path depending on its starting conditions. In other words, if you know where it starts, you can exactly compute where it will end up at some point in the future.
  • A stochastic process (also called a random process) is more difficult to understand. You cannot tell exactly where it will end up, but you know (based on its probability distribution function) that certain outcomes are more likely than others. In the simplest case, a stochastic process can be described as a sequence of random variables. If each element of the sequence is associated with a particular point in time, the process is a time series (a series of data points measured at successive times).

In our die example, the stochastic process is the repeated roll of the die. Each roll produces an outcome (one of the six possible values), and successive rolls are independent of each other (what was rolled on the previous attempt has no influence on the likelihood of any particular number coming up on the next roll).
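The repeated-roll process can be sketched in a few lines of Python. This assumes Python's `random` module (a pseudo-random generator) as a stand-in for a physical die; the function names are hypothetical, and the seed is there only to make the run reproducible.

```python
import random


def roll_die(rng: random.Random) -> int:
    """One sample from the uniform distribution over the faces 1..6."""
    return rng.randint(1, 6)


def roll_sequence(n: int, seed: int = 0) -> list[int]:
    """A simple stochastic process: n independent rolls of a fair die.

    Each roll is drawn fresh from the same distribution and never looks
    at earlier results, which is what makes the rolls independent.
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return [roll_die(rng) for _ in range(n)]


rolls = roll_sequence(10)
print(rolls)
```

Each element of `rolls` is one sample; read in order, the list is a short time series of die outcomes.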

Events

An event in probability theory is a set of outcomes to which a probability is assigned. The set of all possible outcomes is called the sample space, and in the simplest case it is finite. Each single possible outcome is called an elementary event, but more complex events can be constructed by grouping several elementary events together (i.e., by taking a subset of the sample space).

In our die example, the elementary events are the individual possible values of a single roll. But we can also construct other events and assign the proper probabilities to them: for example, an even roll (two, four, or six), with a probability of one-half, or a roll greater than four (five or six), with a probability of one-third.
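Because every face of a fair die is equally likely, the probability of any event can be found by counting outcomes. A minimal sketch, assuming equally likely outcomes (the helper `probability` is hypothetical):

```python
from fractions import Fraction

# Sample space of one roll of a fair die: every elementary outcome.
SAMPLE_SPACE = frozenset(range(1, 7))


def probability(event: set[int]) -> Fraction:
    """Probability of an event (a subset of the sample space) for a fair die.

    With equally likely outcomes this is simply
    (outcomes in the event) / (outcomes in the sample space).
    """
    if not event <= SAMPLE_SPACE:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))


even_roll = {face for face in SAMPLE_SPACE if face % 2 == 0}     # {2, 4, 6}
greater_than_four = {face for face in SAMPLE_SPACE if face > 4}  # {5, 6}

print(probability(even_roll))          # 1/2
print(probability(greater_than_four))  # 1/3
print(probability({6}))                # 1/6 (an elementary event)
```

The counting shortcut only works because the die is fair; for a loaded die you would sum the individual outcome probabilities instead.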

Probability Applied to Landing Page Testing

So how does all of this apply to landing page optimization?

The random variables are the outcomes of individual visits to your site from the traffic sources that you have selected for the test. As I have already mentioned, the audience itself may be subject to sampling bias. The probability distribution function is pretty simple in most cases: you are counting whether or not a conversion happened as a result of the visit. You are assuming that there is some underlying, fixed probability of the conversion happening, and that the only other possible outcome is that the conversion does not happen (that is, each visit is a Bernoulli random variable that results in a conversion or not).

As an example, let's assume that the actual conversion rate for a landing page is 2 percent. So there is a small chance that the conversion will happen (2 percent), and a much larger chance that it will not (98 percent) for any particular visitor. As you can see, the sum of the two possible outcome probabilities exactly equals one (2 percent + 98 percent = 100 percent) as required.
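A single visit under this model can be sketched as a Bernoulli draw. The 2 percent rate comes from the example above; in a real test it is the unknown quantity you are trying to estimate, and the function name `visit` is hypothetical.

```python
import random

TRUE_CONVERSION_RATE = 0.02  # the 2 percent example rate; unknown in a real test


def visit(rng: random.Random, p: float = TRUE_CONVERSION_RATE) -> int:
    """One visit modeled as a Bernoulli random variable.

    Returns 1 (conversion) with probability p and
    0 (no conversion) with probability 1 - p.
    """
    return 1 if rng.random() < p else 0


rng = random.Random(42)  # seeded only for reproducibility
outcomes = [visit(rng) for _ in range(1000)]
print(sum(outcomes), "conversions out of", len(outcomes), "visits")
```

Since `rng.random()` is uniform on [0, 1), the comparison `< p` is true with probability p, which is exactly the Bernoulli rule.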

The stochastic process is the flow of visitors from the traffic sources used for the test. Key assumptions about the process are that the behavior of the visitors does not change over time, and that the population from which visitors are drawn remains the same. Unfortunately, both of these assumptions are routinely violated to a greater or lesser extent in the real world. The behavior of visitors changes due to seasonal factors, or with their changing sophistication and knowledge about your products or industry. The population itself changes based on your current marketing mix. Most businesses are constantly adjusting and tweaking their traffic sources (e.g., by changing PPC bid prices and thereby the mix of keywords their audience arrives from). The result is that your time series, which is supposed to return a steady stream of yes-or-no answers (based on a fixed probability of conversion), actually has a changing probability of conversion. In mathematical terms, your time series is non-stationary: its behavior changes over time.
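To see what non-stationarity does to a test, here is a small simulation under a purely hypothetical assumption: the true conversion rate is 2 percent for the first half of the test and 3 percent for the second half (say, after a change in traffic mix). The function names and numbers are illustrative only.

```python
import random


def conversion_rate_at(visit_index: int, n_visits: int) -> float:
    """Hypothetical non-stationary conversion probability.

    Assumption for illustration only: 2% for the first half of the
    test, 3% for the second half.
    """
    return 0.02 if visit_index < n_visits // 2 else 0.03


def simulate(n_visits: int, seed: int = 1) -> list[int]:
    """Simulate a stream of yes/no (1/0) conversion outcomes whose
    underlying probability changes partway through the test."""
    rng = random.Random(seed)
    return [1 if rng.random() < conversion_rate_at(i, n_visits) else 0
            for i in range(n_visits)]


n = 200_000
outcomes = simulate(n)
first_half, second_half = outcomes[: n // 2], outcomes[n // 2:]

print(f"first half:  {sum(first_half) / len(first_half):.4f}")
print(f"second half: {sum(second_half) / len(second_half):.4f}")
print(f"whole test:  {sum(outcomes) / n:.4f}")  # a blend of the two regimes
```

The whole-test rate lands between the two regimes and describes neither of them, which is why a single "fixed" conversion rate can be misleading when the process drifts.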

The independence of the random variables in the stochastic process is also a critical theoretical requirement. However, the behavior on each visit is not necessarily independent. A person may come back to your landing page a number of times, and their current behavior would obviously be influenced by their previous visits. You might also have a bug or an overload condition where the actions of some users influence the actions that other users can take. For this reason it is best to use a fresh stream of visitors (with a minimal percentage of repeat visitors if possible) for your landing page test audience. Repeat visitors are by definition biased because they have voluntarily chosen to return to your site, and are not seeing it for the first time at random.
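One simple way to act on this advice is to count only each visitor's first visit during the test. The sketch below assumes a hypothetical log of `(visitor_id, converted)` pairs in arrival order; how a visitor ID is obtained (for example, from a first-party cookie) is outside its scope.

```python
def first_visits_only(visits: list[tuple[str, bool]]) -> list[tuple[str, bool]]:
    """Keep only each visitor's first visit, in arrival order.

    `visits` is a hypothetical time-ordered log of (visitor_id, converted)
    pairs. Later visits by the same person are dropped because they are
    influenced by the earlier ones and so are not independent samples.
    """
    seen: set[str] = set()
    fresh = []
    for visitor_id, converted in visits:
        if visitor_id not in seen:
            seen.add(visitor_id)
            fresh.append((visitor_id, converted))
    return fresh


log = [("a", False), ("b", False), ("a", True), ("c", True), ("b", True)]
fresh = first_visits_only(log)
print(fresh)  # [('a', False), ('b', False), ('c', True)]
```

Note that this only removes repeats within the test window; people who had visited your site before the test started still slip through unless your tracking can identify them.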

This is also a reason to avoid using landing page testing with an audience consisting of your in-house e-mail list. The people on the list are biased because they have self-selected to receive ongoing messages from you, and because they have already been exposed to previous communications.

The event itself can also be more complicated than the simple did-the-visitor-convert determination. In an e-commerce catalog, it is important to know not only whether a sale happened, but also its value. If you were to tune only for higher conversion rate, you could achieve that by pushing low-margin and low-cost products that people are more likely to buy. But this would not necessarily result in the highest profits.
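The trade-off between conversion rate and order value is easy to see with a couple of made-up variants. The numbers below are purely illustrative (not results from any real test), and `VariantResult` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    """Hypothetical summary of one landing page variant in a test."""
    name: str
    visitors: int
    orders: int
    revenue: float  # total order value for the variant, in dollars

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.visitors

    @property
    def revenue_per_visitor(self) -> float:
        return self.revenue / self.visitors


# Illustrative numbers only: variant B pushes cheaper products.
a = VariantResult("A", visitors=10_000, orders=200, revenue=20_000.0)
b = VariantResult("B", visitors=10_000, orders=260, revenue=13_000.0)

for v in (a, b):
    print(f"{v.name}: conversion {v.conversion_rate:.1%}, "
          f"revenue/visitor ${v.revenue_per_visitor:.2f}")
# A: conversion 2.0%, revenue/visitor $2.00
# B: conversion 2.6%, revenue/visitor $1.30
```

Judged on conversion rate alone, B wins; judged on revenue per visitor, A does.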

Many of the engagements at my company involve tuning for the highest possible revenue per visitor (or profit per visitor after considering the variable costs of the conversion action). For these kinds of situations, you need to consider real-valued random variables and their cumulative distribution functions. That discussion is more involved and is beyond the scope of this column.

Law of Large Numbers

The law of large numbers states that if an event with an underlying probability p is observed over repeated independent experiments, the ratio of the number of times the event occurs to the total number of experiments will converge to p.
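For the yes-or-no case used throughout this column, the statement can be written compactly. Let X_i be 1 if the event occurs on the i-th independent experiment and 0 otherwise, so that each X_i equals 1 with probability p; the observed frequency after n experiments then converges to p (with probability one, in the strong form of the law):

```latex
\hat{p}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; p
\quad \text{as } n \to \infty,
\qquad X_i \in \{0, 1\},\; \Pr(X_i = 1) = p
```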

Let's continue with our die rolling example. The law of large numbers guarantees that if we roll the die enough times, the fraction of rolls that come up six will approach one-sixth (i.e., its probability in the probability distribution function). This does not happen because early streaks of non-sixes are later counteracted by extra sixes: each roll is independent, so the die has no memory of past streaks. Instead, the effect of any early streak is diluted as more and more rolls are added to the total.
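A short simulation shows the share of sixes settling toward one-sixth as the number of rolls grows. It assumes Python's pseudo-random generator as the die; the roll counts and seed are arbitrary choices.

```python
import random


def fraction_of_sixes(n_rolls: int, seed: int = 7) -> float:
    """Roll a simulated fair die n_rolls times and return the share of sixes."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


for n in (60, 600, 6_000, 60_000, 600_000):
    share = fraction_of_sixes(n)
    print(f"{n:>7} rolls: share of sixes = {share:.4f} "
          f"(gap from 1/6 = {abs(share - 1/6):.4f})")
```

The gap tends to shrink as n grows, but not in a perfectly steady way; any single run can wander, which is the same caution that applies to conversion data.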

The exciting thing about this law is that it ties something that you can observe (the actual conversion percentage in your test) to the unknown underlying conversion rate of your landing page. It guarantees that, over the long run, the observed conversion rate of the random visitor events settles toward the true rate.

However, before you start celebrating, it is important to realize that this law describes behavior over a very large number of samples, and only guarantees that over the long term you will eventually come close to the actual conversion rate. In practice, your knowledge of the actual conversion rate will accumulate slowly.

Moreover, the law of large numbers does not guarantee that you will converge to the correct answer with a small amount of data. In fact, over a short period of time your estimate of the conversion rate will almost certainly be off. Short-term streaks can and do cause observed conversion rates to deviate significantly from the true value.
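To make the point concrete, this sketch runs many simulated tests at an assumed true conversion rate of 2 percent and compares the spread of the estimates for short and long tests. The test sizes and number of repetitions are arbitrary illustrative choices.

```python
import random
import statistics

TRUE_RATE = 0.02  # assumed true conversion rate, as in the 2 percent example


def estimated_rate(n_visitors: int, rng: random.Random) -> float:
    """Run one simulated test and return the observed conversion rate."""
    conversions = sum(1 for _ in range(n_visitors) if rng.random() < TRUE_RATE)
    return conversions / n_visitors


rng = random.Random(3)
for n_visitors in (500, 50_000):
    estimates = [estimated_rate(n_visitors, rng) for _ in range(100)]
    print(f"{n_visitors:>6} visitors per test: estimates ranged "
          f"{min(estimates):.4f} to {max(estimates):.4f} "
          f"(std dev {statistics.stdev(estimates):.4f})")
```

With only 500 visitors, a 2 percent rate means roughly 10 expected conversions, so a few lucky or unlucky visits move the estimate a long way.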

The best way to look at this situation is to keep in mind that collecting more data allows you to make increasingly accurate estimates of the true underlying conversion rate. However, your estimate will always be subject to some error, and you can only know approximate bounds on the size of this error.
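One common way to put approximate bounds on that error is a normal-approximation (Wald) confidence interval for a proportion. This is a swapped-in standard technique, not necessarily the method used at the author's company, and the helper name is hypothetical.

```python
import math


def approx_confidence_interval(conversions: int, visitors: int,
                               z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a conversion rate.

    Uses the normal approximation (Wald interval):
        p_hat +/- z * sqrt(p_hat * (1 - p_hat) / visitors)
    where z = 1.96 corresponds to roughly 95% coverage. This approximation
    is unreliable when conversions are very few; alternatives such as the
    Wilson score interval behave better there.
    """
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    p_hat = conversions / visitors
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / visitors)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# The same 2% observed rate, measured on ten times more traffic each step.
for visitors in (1_000, 10_000, 100_000):
    low, high = approx_confidence_interval(visitors // 50, visitors)
    print(f"{visitors:>7} visitors: roughly {low:.2%} to {high:.2%}")
```

Because the half-width scales with 1/sqrt(visitors), each tenfold increase in traffic narrows the interval only by a factor of about 3.2, which is why knowledge of the true rate accumulates slowly.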

ABOUT THE AUTHOR

Tim Ash

Tim Ash is CEO of SiteTuners.com, a landing page optimization firm that offers conversion consulting, full-service guaranteed-improvement tests, and software tools to improve conversion rates. SiteTuners' AttentionWizard.com visual attention prediction tool can be used on a landing page screenshot or mock-up to quickly identify major conversion issues. He has worked with Google, Facebook, American Express, CBS, Sony Music, Universal Studios, Verizon Wireless, Texas Instruments, and Coach.

Tim is a highly regarded presenter at SES, eMetrics, PPC Summit, Affiliate Summit, PubCon, Affiliate Conference, and LeadsCon. He is the chairperson of ConversionConference.com, the first conference focused on improving online conversions. A columnist for several publications including ClickZ, he's host of the weekly Landing Page Optimization show and podcast on WebmasterRadio.fm. His columns can be found in the Search Engine Watch archive.

He received his B.S. and M.S. during his Ph.D. studies at UC San Diego. Tim is the author of the bestselling book, "Landing Page Optimization."
