Two Simple Tests to Try With Google AdWords Campaign Experiments

Issues of test design, keyword auction dynamics, and statistical significance are complex. To help marketers more reliably test key variables in their AdWords accounts, Google has created a killer little feature called AdWords Campaign Experiments.

You’ve no doubt heard of the concept of A/B testing landing pages by evenly rotating them to website visitors. The same concept, applied to ad copy, is behind the ad rotation feature in major paid search platforms, introduced by Google in February 2002. (Note the wording of the documentation for today’s version of the feature: ever sticklers for accuracy, Google now promises to rotate ads “more evenly,” not entirely “evenly.”) These tests must follow a strict form because of the scientific principle that a control group and an experimental group are what lend statistical validity to any experiment. As much as possible, all other factors must be held constant to gain insight into the independent impact of changes in a single variable. Since those changes can themselves touch off a dynamic response involving other variables, the two traffic streams being compared must run concurrently, with no bias in conditions.

The same principle can now be applied to other elements of your account, such as keyword bids, matching options, ad group structure, and the presence or absence of certain keywords. That, in a nutshell, is what the AdWords Campaign Experiments feature does.

Although it’s bound to be developed further, the Experiments setup already offers some flexibility. For example, you can decide whether to split the traffic flowing to the experiment and control groups 50-50 or in some other proportion.
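To picture what a split proportion means, here is a minimal Python sketch. It is hypothetical and does not describe how Google actually assigns traffic: it simply sends each eligible auction to the experiment arm with probability `split`, so 0.5 gives a 50-50 test and 0.2 sends roughly one auction in five to the experiment. The function names `assign_arm` and `simulate_split` are made up for illustration.

```python
import random


def assign_arm(rng: random.Random, split: float) -> str:
    """Hypothetical assignment: 'experiment' with probability `split`, else 'control'."""
    if not 0.0 <= split <= 1.0:
        raise ValueError("split must be between 0 and 1")
    # rng.random() returns a float in [0, 1), so split=0 never picks the
    # experiment arm and split=1 always does.
    return "experiment" if rng.random() < split else "control"


def simulate_split(n_auctions: int, split: float, seed: int = 42) -> dict:
    """Count how many of n_auctions land in each arm for a given split."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    counts = {"control": 0, "experiment": 0}
    for _ in range(n_auctions):
        counts[assign_arm(rng, split)] += 1
    return counts
```

Calling `simulate_split(10_000, 0.5)` puts roughly 5,000 auctions in each arm; the counts wobble a little from run to run with different seeds, which is one reason small tests need a significance check before you trust them.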

Make no mistake: a lot of complexity lurks behind a simple test. So it doesn’t pay to get too fancy at first, either in testing or in explanation.

So here, I’ll start the conversation with two simple uses of the Experiments feature: (1) testing how significant bid increases on a few high-value keywords affect ROI and volume; (2) adding a new match type for a keyword to an ad group, where it overlaps with the same keyword under a different match type.

  1. Bid test experiment. In principle, this test sounds simple. Assume you have a strong interest in the keyword “wedding dresses.” In a typical week, you see about 500 clicks, a CTR of 3 percent, an 8 percent conversion rate, 40 orders, and a cost per order of $46, from an average ad position of 4.3. Because you’re reasonably happy, you could nibble around the edges and bump your bid up or down by 10 percent. But what you really want to find out is the real-world difference between your results and those of the advertisers who typically sit way up in positions one and two. To get there, you believe (based on the bid simulator) that you’d have to increase your bid by a whopping 60 percent. This change creates a potential cascade of responses. Just by moving up in ad position, and possibly appearing in the top (“premium”) positions more often, you speculate that you’re likely to see drastically different conversion behavior, to say nothing of increases in CTR, impressions, clicks, sales, and so on. The impact on sales volume and ROI is unknown. And as we saw from a previous bid test that did not use the Experiments setting, it’s more than a simple matter of being “slotted” in a higher ad position because of your higher bid. Due to the complex way that Google’s Quality Score works, you may also become “eligible” to show up more often. So many moving parts! Gaaah!

    Experiments, then, are going to come in very handy. You don’t really want to run a serial test as we did previously, helpful as it was: because the before and after periods can differ in seasonality, competitor activity, and other conditions, you’d be left with results that feel informative but aren’t statistically valid. Experiments let the dynamics of each bid amount play out on their own, splitting traffic between the two bid settings evenly and concurrently.

    And what statistics will you be interested in? With a 60 percent higher bid running against your original bid, and assuming you have AdWords conversion tracking set up to your liking, you will quickly see:

    • Average ad positions associated with the two bids during the test period
    • CTRs for each
    • CPCs for each
    • Total spend for each
    • Transactions or conversions for each
    • Cost per transaction

    How does the interface handle statistical significance in results? It uses a notation system: one arrow means the difference in outcome is significant at the 95 percent confidence level (that is, unlikely to be random), two arrows mean 99 percent, and three arrows mean 99.9 percent. This way, you can decide for yourself whether the change was worth it, even when the differences in outcome are small.

  2. Introducing a new match type to an ad group. Adding a keyword sounds simple, doesn’t it? But anyone experienced in paid search knows that adding another instance of the same keyword with a different match type creates a world of overlap. The newly added match type may show ads in different positions, to new or different users typing different search queries. It may also reduce the impressions and clicks associated with the existing match type, since the new one now picks up some of the former query flow, depending on bids and Quality Scores. Yep, paid search really can turn something seemingly simple into something that complicated.
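The bid test in item 1 boils down to computing the metrics from its bullet list for each arm and then asking whether the difference is real. Here is a minimal Python sketch of that comparison, under stated assumptions. The article doesn’t say which statistical test sits behind Google’s arrows, so the two-sided two-proportion z-test on conversion rate used here is an assumption, and the names `ArmStats`, `two_proportion_p_value`, and `arrows` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Totals for one arm (control or experiment) over the test period."""
    impressions: int
    clicks: int
    cost: float        # total spend in dollars
    conversions: int   # transactions or orders

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def cpc(self) -> float:
        return self.cost / self.clicks if self.clicks else 0.0

    @property
    def conversion_rate(self) -> float:
        return self.conversions / self.clicks if self.clicks else 0.0

    @property
    def cost_per_conversion(self) -> float:
        return self.cost / self.conversions if self.conversions else float("inf")


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for the null hypothesis that both rates are equal.

    Assumed test: pooled two-proportion z-test (not confirmed as Google's method).
    """
    if n1 == 0 or n2 == 0:
        return 1.0  # no data in one arm: nothing to conclude
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both arms at 0% or 100%: no measurable difference
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided tail probability of the standard normal: erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def arrows(p_value: float) -> int:
    """Map a p-value to 0-3 arrows at the 95 / 99 / 99.9 percent levels."""
    if p_value < 0.001:
        return 3
    if p_value < 0.01:
        return 2
    if p_value < 0.05:
        return 1
    return 0


def compare_conversion_rates(control: ArmStats, experiment: ArmStats) -> int:
    """Arrows for the difference in conversion rate between the two arms."""
    p = two_proportion_p_value(control.conversions, control.clicks,
                               experiment.conversions, experiment.clicks)
    return arrows(p)
```

Fed the baseline week from the bid test (500 clicks and 40 orders at $46 each, so $1,840 in spend), `ArmStats` reproduces the 8 percent conversion rate and gives an implied CPC of $3.68; a 3 percent CTR implies roughly 16,700 impressions. With real control and experiment totals, `compare_conversion_rates` returns 0 arrows when the gap could easily be chance.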

Prima facie, I tend to believe that the new modified broad match option is more cost-effective than the ordinary broad match setting that allows for “expanded match.” But wouldn’t it be nice to see valid statistics for an ad group, from a concurrent A/B test rather than a before-and-after comparison, showing how the ad group performs with and without the addition of, say, the keyword +wedding +dresses? I’d like to add that keyword in modified broad match, bid it higher, and then see how the two versions of the ad group do on all the core metrics, especially transactions and cost per order.
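Part of what the two versions of the ad group differ on is which queries the new keyword can catch. As a rough, hypothetical sketch in Python: a modified broad match keyword like +wedding +dresses only matches queries that contain every plus-marked term, in any order. Real AdWords matching also accepts close variants such as plurals and misspellings, which this exact-word check deliberately ignores, and ordinary broad match’s expanded matching isn’t modeled at all. The function names are made up for illustration.

```python
def required_terms(keyword: str) -> list[str]:
    """Terms marked with '+' in a modified broad match keyword (lowercased).

    Simplification: unmarked terms in the keyword are ignored here.
    """
    return [tok[1:].lower() for tok in keyword.split()
            if tok.startswith("+") and len(tok) > 1]


def modified_broad_match(keyword: str, query: str) -> bool:
    """Simplified check: every '+' term must appear as a whole word in the query.

    Word order doesn't matter; close variants (plurals, misspellings) are not handled.
    Assumes the keyword contains at least one '+' term.
    """
    query_terms = set(query.lower().split())
    return all(term in query_terms for term in required_terms(keyword))


queries = ["cheap wedding dresses", "dresses for a wedding",
           "wedding gowns", "bridal dresses"]
matched = [q for q in queries if modified_broad_match("+wedding +dresses", q)]
print(matched)  # ['cheap wedding dresses', 'dresses for a wedding']
```

Queries like “wedding gowns” or “bridal dresses” fall outside this sketch’s reach, which is exactly the kind of traffic an ordinary broad match keyword might still pick up, and why running both versions concurrently is the cleaner way to compare them.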

Now, with Experiments, we can do just this.

To me, the above tests are in themselves quite complex. Before rushing into esoteric experiments, try simple tests, building on the solid fundamentals of what you know about the workings of your campaigns. This setting isn’t for amateurs. Even pros should take care not to overdo it. Remember, you’re a marketer first, and a mad scientist in your spare time.
