Cache Busting: Busted?

Ad-serving discrepancies are perhaps the biggest thorn in the side of online media buyers and analytics teams everywhere. This ubiquitous issue creates massive inefficiencies in agency operations and processes; it launches huge battles between buyers and publishers; it is one of a few reasons usually cited by traditional advertisers for not currently using the Web; and it is generally a sticky wicket.

White paper upon white paper outlines potential causes of these differences, which can range anywhere from 5 percent to well over 150 percent in extreme cases. And many of these papers make valid points — varying definitions of “impression,” proxy server and caching issues, and so on.

Technical folks who work on ad-serving systems have fought hard to resolve these issues, but it’s not an easy task. They apply patch after patch in an effort to get numbers to match, with minimal success. In fact, the grand irony is that one such patch fixes one problem but spawns another, far more insidious glitch. And I, in my great wisdom, have uncovered this bug.

And in saying “my great wisdom,” I mean that someone on the inside blabbed it to me. I did a little bit of fact-checking and corroborating and quickly discovered that this insider was right on the money.

One of the most significant causes of counting discrepancies has always been caching. Individual user browsers have their own caches of Web code and images, and large Internet service providers (ISPs) often cache frequently accessed content on their own proxy servers. The basic issue is that if an advertising image is called from a cache — either a browser or a proxy cache — the ad-serving software is not going to record it as an impression, even though by most definitions it should be counted.

Let’s walk through a user scenario to fully understand this issue. Let’s say I visit my personalized page on a major portal — myportal.com. My browser will request the HTML code and images from the portal’s servers. Some of the image calls will be redirected out to third-party servers to retrieve ad images.

Myportal.com’s ad-serving software will record an impression as it sends the request to the third-party server. Even though it might take less than a second for the ad to be delivered, lots of things could happen along the way: I might click “stop,” I might close the browser window, or perhaps I quickly click a link on the page before the ad is finished loading. In each case, myportal.com will count an impression but most third-party servers will not.

But let’s assume that everything goes perfectly and both servers record an impression. I continue about my business, reading a few articles on the site. Perhaps three or four pages later, the same ad from the home page might be called for again. My browser will notice that the image has exactly the same URL, and it will simply load the image from the local cache. Myportal.com will likely record this as an impression, but since the third-party ad server was never contacted, that server will not.
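To make the scenario above concrete, here is a minimal Python sketch of how the two counts drift apart. Everything in it is a hypothetical stand-in (the class names, the `ads.example.com` URL, the idea that the publisher counts at the moment it issues the ad call); it is not modeled on any real ad server, only on the walk-through above.

```python
class ThirdPartyAdServer:
    """Hypothetical third-party server: counts only requests that reach it."""

    def __init__(self):
        self.impressions = 0

    def serve(self, url):
        self.impressions += 1
        return f"<image bytes for {url}>"


class Browser:
    """Hypothetical browser with a simple cache keyed by the exact URL."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def load_image(self, url):
        # A cached URL never reaches the server, so no impression is logged there.
        if url not in self.cache:
            self.cache[url] = self.server.serve(url)
        return self.cache[url]


def simulate_page_views(page_views, ad_url="http://ads.example.com/banner1.gif"):
    """Return (publisher_count, third_party_count) after repeated page views."""
    server = ThirdPartyAdServer()
    browser = Browser(server)
    publisher_impressions = 0
    for _ in range(page_views):
        publisher_impressions += 1  # assumed: publisher counts when it issues the ad call
        browser.load_image(ad_url)
    return publisher_impressions, server.impressions
```

Under these assumptions, `simulate_page_views(4)` returns `(4, 1)`: the publisher logs four impressions, the third party logs one.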

Unless a cache-busting system is being used. The technical wizards who develop ad-serving software came up with a fantastic idea: If you insert a random number into the image’s URL, then the cache will not be able to recognize it as the same image. It will have a different name, and the browser will be forced to retrieve the image from the server rather than the cache.

Brilliant.
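The random-number trick is simple enough to sketch in a few lines of Python. Treat this as an illustration only: the function name, the `ord` parameter name, and the example URL are my own assumptions, and real ad tags may well be generated in the browser rather than on a server.

```python
import secrets
from urllib.parse import urlencode


def cache_busted_url(base_url, param="ord"):
    """Append a random query parameter so each request uses a unique URL.

    `param` is a hypothetical parameter name chosen for this sketch.
    """
    token = secrets.randbelow(10**12)  # random integer in [0, 10**12)
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode({param: token})}"
```

Calling `cache_busted_url("http://ads.example.com/banner1.gif")` twice should yield two different URLs, so neither a browser nor a proxy cache can match the second request to the first.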

But here comes the problem. Many ad-serving systems — from those that were homegrown at most of the bigger sites to commercially available, “shrink-wrapped” packages to third-party systems — began to use a time stamp as the random number.

That would work… except that some of them made the time stamp accurate down to only the second. And think about it — how many ad requests does a major portal (or worse yet, one of the big third-party ad servers) receive in one second? Think hundreds, maybe thousands.
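A quick Python sketch shows why a one-second time stamp makes a poor "random" number. The function names and the sample request times are invented for illustration; the only thing taken from the description above is the truncation to whole seconds.

```python
def second_resolution_token(request_time):
    """Cache-busting token built from a time stamp truncated to whole seconds."""
    return int(request_time)


def count_distinct_tokens(request_times):
    """How many different tokens a batch of requests actually receives."""
    return len({second_resolution_token(t) for t in request_times})


# Five hypothetical requests arriving a tenth of a second apart,
# all inside the same clock second.
same_second_requests = [1_000_000_000.0 + i * 0.1 for i in range(5)]
```

Here `count_distinct_tokens(same_second_requests)` is 1: five requests, one token. Scale that up to the hundreds or thousands of requests a busy server sees per second and the "random" number is shared by all of them.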

This is where it gets tricky. Let’s say that five users request an ad from a third-party server in the same second. The cache-busting system will assign each one of those users the same “random” number. Let us further suppose that the advertiser has six banners in an equal rotation. User 1 gets banner 1, user 2 gets banner 2, and so on. Each impression is likely to get recorded, but if all five users click (or even if two of them click) we have a problem.

Since they all have the same “random” number, the third-party server is likely to correlate all five clicks with the first banner it served — banner 1.
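Here is a rough Python sketch of how that misattribution could play out. The big assumption, which I am labeling as such, is that the server looks up a click's banner by the cache-busting number alone and keeps the first banner it logged for that number; actual ad servers may tie clicks to impressions differently.

```python
def attribute_clicks(impressions, clicks):
    """Credit each click to a banner, using the token as the only lookup key.

    impressions: list of (token, banner) pairs in the order they were served.
    clicks: list of tokens carried by the incoming clicks.
    Assumption for this sketch: the first banner logged for a token wins.
    """
    first_banner_for_token = {}
    for token, banner in impressions:
        first_banner_for_token.setdefault(token, banner)

    credited = {}
    for token in clicks:
        banner = first_banner_for_token[token]
        credited[banner] = credited.get(banner, 0) + 1
    return credited


# Five users in the same second: one shared token, five different banners.
shared_token = 1_000_000_000
impressions = [(shared_token, f"banner {i}") for i in range(1, 6)]
clicks = [shared_token] * 5
```

With these inputs, `attribute_clicks(impressions, clicks)` returns `{"banner 1": 5}`: all five clicks land on banner 1, and banners 2 through 5 look as if nobody clicked them. Give each impression its own token and the same function credits one click to each banner.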

So what we wind up with is a flawed cache-busting system that is, at the very best, only slightly more accurate than an ad-serving system with no cache buster. And it completely skews the per-banner performance numbers, which agencies use religiously to optimize campaigns and make assumptions about the “right” mix of creative and sites for future campaigns.

Many sites and third parties are well aware of this issue and are working toward resolutions. Some have already rolled out new patches. Agencies continue to use these tools because the benefits have continued to outweigh the shortcomings. Hopefully, continued evolution will eliminate issues such as these.

But I’d suggest having a second look at your ad-serving system to see if it is affected by this issue. Keep it in the back of your mind, and push your vendor to correct the problem. Otherwise, you’re optimizing on numbers that might not be as accurate as you’d like.
