Why Can’t Johnny Count Ad Impressions?

“We need standards.”

How many times have you heard that sentence uttered at conferences, in the press, and in casual conversation over the last few years?

This industry has come a long way, and various segments have indeed begun to adhere to standards. Isn’t it funny, though, how we used to have hundreds of different ad sizes online, then the Coalition for Advertising Supported Information and Entertainment (CASIE)/Interactive Advertising Bureau (IAB) standards came out and our choices narrowed to around 10? Then the market went to hell, and everyone started introducing all kinds of crazy sizes again. We had standards. Now that’s no longer the case.

The lack of standards in the ad-serving arena has been particularly frustrating. This is one of the most exasperating parts of my job. It blows my mind that after all this time and all the evolution of Web technology, we still can’t get third-party ad-serving numbers to match publisher numbers reliably. Especially in recent months, when advertising budgets have been tight and clients have been demanding massive efficiency gains and lower management fees, it’s driven me crazy that we waste so much time reconciling ad-serving numbers. Some industry folks I spoke to while researching this piece suggested the industry average for ad-serving discrepancies could be 25 percent or higher.

We’re not going to be able to solve this issue here, but we can have a look at some of the most common causes of discrepancies. If you’re currently using a third-party ad server, odds are the software maker has also published documentation about discrepancies specific to its system — this is definitely worth a look.

Technical Quirks

There are, unfortunately, many different reasons for ad-serving discrepancies. Perhaps the most wearisome is differing definitions of the various metrics. Publishers and third-party servers often record the impression at different points in the serving process, creating a lot of room for error.

For example, the user could move away from the page before the ad is completely downloaded. The publisher’s ad system serves the page and ad tag, records an impression, and calls to the third-party ad server. While the ad is loading, the user clicks on another link on the page, and the third-party system doesn’t record an impression.
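
To make that timing gap concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the PageView fields, the two counting rules, and the discrepancy formula (the third-party shortfall as a percentage of the publisher count) are hypothetical and not any vendor's actual method.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One page view; both fields are hypothetical, for illustration only."""
    tag_served: bool     # publisher served the page and the ad tag
    ad_downloaded: bool  # creative finished loading from the third-party server


def count_impressions(views):
    """Return (publisher_count, third_party_count) under the assumed rules."""
    # Assumed publisher rule: count as soon as the tag is served.
    publisher = sum(1 for v in views if v.tag_served)
    # Assumed third-party rule: count only once the ad has fully downloaded.
    third_party = sum(1 for v in views if v.tag_served and v.ad_downloaded)
    return publisher, third_party


def discrepancy_pct(publisher, third_party):
    """Third-party shortfall as a percentage of the publisher count."""
    if publisher == 0:
        return 0.0
    return 100.0 * (publisher - third_party) / publisher


# Three users wait for the ad; one clicks away while it is still loading.
views = [PageView(True, True)] * 3 + [PageView(True, False)]
pub, tp = count_impressions(views)
print(pub, tp, discrepancy_pct(pub, tp))  # 4 3 25.0
```

One impatient user in four is enough, under these assumed rules, to produce a 25 percent gap with nothing actually broken on either side.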

Another big issue is filtration of robots and other types of nonhuman traffic. The IAB publishes a list of known spiders, but even this list has some optional bots on it. So, not everyone is filtering out the same traffic. As if we don’t have enough other problems with counting methodologies and whatnot, we’re not even looking at the same data set all of the time.

For example, one of the optional filters is the known user-agent string for offline browsers. Some folks believe this traffic should not be counted and, therefore, filter it out. But others leave it in, counting impressions just as if the user were actually online.

Some systems also filter by IP address. Most big networks, for example, filter out known bad IPs, but most third-party ad servers do not.
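
Here is a rough sketch of how two systems looking at the same requests can end up with different counts simply because they apply different filters. The bot names, the offline-browser token, and the blocked IP range are all made up for illustration; the real IAB list is not reproduced here.

```python
import ipaddress

# Hypothetical filter lists, invented for this example.
REQUIRED_BOT_TOKENS = ("examplebot", "examplespider")
OPTIONAL_BOT_TOKENS = ("exampleofflinebrowser",)
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)  # documentation range


def is_counted(user_agent, ip, use_optional=False, use_ip_filter=False):
    """Return True if this request would be counted as an impression."""
    ua = user_agent.lower()
    tokens = REQUIRED_BOT_TOKENS + (OPTIONAL_BOT_TOKENS if use_optional else ())
    if any(token in ua for token in tokens):
        return False
    if use_ip_filter:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in BLOCKED_NETWORKS):
            return False
    return True


requests = [
    ("Mozilla/4.0 (compatible)", "198.51.100.7"),   # ordinary browser
    ("ExampleBot/1.0", "198.51.100.8"),             # bot on the required list
    ("ExampleOfflineBrowser/2.1", "198.51.100.9"),  # bot on the optional list
    ("Mozilla/4.0 (compatible)", "192.0.2.15"),     # browser on a blocked IP
]

# One system applies only the required user-agent filter...
lenient = sum(is_counted(ua, ip) for ua, ip in requests)
# ...while another also applies the optional filter and the IP filter.
strict = sum(
    is_counted(ua, ip, use_optional=True, use_ip_filter=True)
    for ua, ip in requests
)
print(lenient, strict)  # 3 1
```

Same four requests, two defensible sets of filters, and one system reports three impressions where the other reports one.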

Human Error

So far, we’ve looked at technological quirks that are causing discrepancies, but the other thing we have to consider is good old human error. This actually might be the most aggravating cause. I’ve seen it happen repeatedly. We traffic our third-party tags to the publisher, and the tags somehow are implemented incorrectly. Our staff spends a lot of time doing banner checks to make sure that ads are showing up where they’re supposed to and that they click through to the right page. Unfortunately, they can only check the visible stuff; there are many things that could go wrong under the hood, so to speak, that could throw off impression or click counts.

Take random-number generation, for example. Some ad servers use a random number dynamically inserted into the image URL to force a browser to call the image from the server, not from the cache. Other ad servers use the random number not as a cache buster but rather to correlate clicks and impressions. This works but is not a perfect solution (see my column, “Cache Busting: Busted?”). In addition to that menacing problem, random number scripts are often buggy or set up improperly, adding more confusion and misinformation to an already complex issue.
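
For readers who haven’t seen one, a random-number tag might look something like the sketch below. The host name, the paths, and the "pid" and "ord" parameter names are hypothetical; actual ad servers use their own URL formats.

```python
import random
from urllib.parse import parse_qs, urlencode, urlsplit

AD_HOST = "https://ads.example.com"  # hypothetical ad server


def build_ad_urls(placement_id, rng=random):
    """Return (image_url, click_url) that share one random token."""
    token = rng.randrange(10**9)
    query = urlencode({"pid": placement_id, "ord": token})
    # A fresh token makes the image URL unique, so a cached copy saved
    # under an earlier URL does not match and the browser asks the server.
    image_url = f"{AD_HOST}/img?{query}"
    # Reusing the same token on the click URL lets the server tie a
    # click back to the impression that produced it.
    click_url = f"{AD_HOST}/click?{query}"
    return image_url, click_url


def token_of(url):
    """Extract the random token from a generated URL."""
    return parse_qs(urlsplit(url).query)["ord"][0]


image_url, click_url = build_ad_urls("homepage-top")
print(token_of(image_url) == token_of(click_url))  # True
```

If a script like this were set up so that the token never changed, every request would share a single URL and a cached copy could be reused, which is the kind of setup mistake that is hard to spot from a banner check.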

Truth be told, ad servers shouldn’t have to use random numbers for cache busting. Why not use HTTP’s own no-cache directives, the response headers that tell browsers and proxies not to reuse a stored copy of the ad? Rumor has it that both DoubleClick’s and Bluestreak’s Ion ad servers are using these no-cache directives, but not all ad servers have followed their lead.
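
As a sketch of what the no-cache approach can look like, here is a toy handler built on Python’s standard http.server module. It is not how DoubleClick, Bluestreak, or any other vendor does it; the handler name and the exact header set are my own assumptions, chosen from standard HTTP caching headers.

```python
import base64
from http.server import BaseHTTPRequestHandler

# A tiny GIF standing in for the ad creative.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Assumed header set: standard HTTP headers that tell browsers and
# proxies not to reuse a stored copy of the response.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",  # honored by older HTTP/1.0 caches
    "Expires": "0",        # an invalid date, treated as already expired
}


class AdHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves the creative with no-cache headers."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        for name, value in NO_CACHE_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PIXEL)
```

To try it locally, pass the handler to http.server.HTTPServer with an address such as ("127.0.0.1", 8000) and call serve_forever(). Whether every browser and proxy in the wild honors these headers is a separate question, which may be part of why random numbers persist.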

The odds are squarely stacked against any kind of quick resolution to the ongoing discrepancies issue. Perhaps one day we as an industry will come to a solution. Until then, however, we’re stuck with reconciliation.
