5 Strategies for Improved A/B Testing

In a metrics-driven culture, product developers and marketers often become too reliant on A/B testing. This clogs the test pipeline with initiatives that don't move the needle, unnecessary validations of well-known design and product principles, and isolation (ISO) tests of features that could never launch without the other features they are bundled with. In essence, companies over-test, test incorrectly, and miss opportunities to streamline the testing process.

During my 10+ years working at some of the world's top search companies, I conducted many A/B tests on both user interface (UI) and back-end features. Given the usage and revenue riding on a single search page, we were careful to test almost everything. We recognized that many small, counter-intuitive changes could significantly alter user behavior or revenue. One surprising example was changing the bullets in front of the sponsored links from square to round. This minor change significantly impacted the click-through rate on the search ads, leading to a multi-million-dollar change in search revenue. Over the years, I've learned several strategies that make the testing process faster, more streamlined, and better optimized to increase revenue.

1. Be aggressive with the first test to gauge the level of potential lift.

With the first test, ensure users will see the change by using an aggressive version or call-out of the feature. This quickly determines whether the change can significantly move the metric you are aiming to impact. If an extremely large or highlighted version does not move the needle, a tamer implementation more than likely will not either. For instance, strip a lead form nearly bare rather than testing whether a single form field makes a difference.

2. Don’t use A/B testing in place of design best practices or product sense.

Those in charge of product development can become too reliant on A/B tests: they let their product sensibility take a back seat and allow test results to make every decision for them. The result is product people testing every possible variation and clogging up the test pipeline. One example arose when we were integrating image results on the search page. A product manager wanted to test four or five variations of the same change, using up many of our test slots for that cycle. Some of the proposed variations were three large images versus five smaller images versus four medium images. General design principles tell us that people like to view items in groupings of three or five, and product intuition tells us that people prefer larger images to smaller ones; neither question needed a test slot.

3. Test your competitors’ user interface changes to understand their impact, especially if they have a metrics or big A/B testing culture.

Your competitors are testing new features and UI changes all the time. From both a competitive-intelligence and a product-understanding perspective, it is a best practice to test these changes and the differences between your site's UI and the competition's. This uncovers metrics changes that may be caused by these differing implementations. Of course, this assumes your competitors are competent and are using a reliable A/B testing framework.

4. Use isolation testing on individual features but also try some big jumps and combinations.

It is a best practice to try isolation (ISO) testing; that is, testing individual features in isolation. But oftentimes when three features are bundled, each feature is first ISO tested and then the bundle is combo tested. If the features are meant to ship together, try the combo first, or run the ISO and combo tests at the same time. This saves a lot of time and test slots.

5. Test identical buckets against each other to gauge significance and see if your systems are working correctly.

Oftentimes, as product developers, we wonder whether our bucket results are moving the product in the right direction, and whether the signal is strong enough to validate that a product change is positive. How do we know if something is wrong with our test system? It is a best practice to occasionally run identical buckets against each other (an A/A test) and examine the variance between them, both to catch issues in the test system and to see how much variance arises between identical buckets purely from sample size or other factors.
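As an illustrative sketch (not any particular testing framework), the idea can be expressed with a simple two-proportion z-test on two simulated identical buckets; the bucket size and 4% conversion rate here are hypothetical:

```python
import math
import random

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Simulate two *identical* buckets: same 4% conversion rate, 50,000 users each.
random.seed(42)
n = 50_000
bucket_a = sum(random.random() < 0.04 for _ in range(n))
bucket_b = sum(random.random() < 0.04 for _ in range(n))

z = two_proportion_z(bucket_a, n, bucket_b, n)
# By chance alone, |z| > 1.96 should occur in roughly 5% of A/A runs.
# Seeing "significant" differences much more often than that is a warning
# sign about the test system, or about reading noise as signal.
print(f"A/A z-statistic: {z:.2f}")
```

Running this periodically against real traffic, rather than a simulation, gives a baseline for how much two identical buckets naturally diverge.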

While A/B testing and data analysis should be a fundamental part of your product development process for Internet products and services, we all need to step back and think about how to speed up product development and reduce the bottlenecks in launching new products and feature sets. The number of A/B tests that can run concurrently on a website is constrained by:

  • The number of statistically significant tests that can be run with a particular site’s audience size.
  • The data collection and analysis work involved.
  • Potential disruption in user experience.

It's essential that these limited test slots are used in the most efficient manner and that these common pitfalls are avoided.
