My mission for this week’s column: to accurately portray the shoddy industry-standard methods of collecting performance data without making you all throw your hands up in either confusion or disgust. Or both. This will be tricky.
Server Chains and Caching and Logs, Oh My
Let’s start by getting just a little techy, so everyone understands the reasons behind some of our uncertainty. We’ll start by chronicling the wild adventures of a single impression.
Our impression starts off innocently enough. Somewhere, a user clicks a link to a page on a site. That page pulls together content from two servers: a content server and an ad server. Most of the time, both return their HTML, which gets assembled in the user’s browser into one page. Both servers log the request and the act of serving the content.
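To make the mechanics concrete, here is a toy model of that page assembly: the page pulls content from one server and a banner from another, and each server writes its own log entry. Everything here (function names, the log structure) is an illustrative assumption, not anyone's real serving code.

```python
# Toy model: two servers, two independent logs, one assembled page.
content_log, ad_log = [], []

def content_server(page):
    """Return the editorial HTML and record the event in the content server's log."""
    content_log.append(("served", page))
    return f"<html>content for {page}"

def ad_server(slot):
    """Return the banner tag and record the event in the ad server's log."""
    ad_log.append(("served", slot))
    return f"<img src='banner-{slot}.gif'>"

def render_page(page):
    # The browser assembles output from both servers into a single page,
    # but each server only knows about (and logs) its own half.
    return content_server(page) + ad_server("top")

html = render_page("/index.html")
# One page view leaves one entry in EACH log -- two separate records
# of what a user experienced as a single event.
```

The point of the sketch: neither log is "the" record of the page view; they are two partial records that someone later has to reconcile.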
Already, some uncertainty rears its head. At the end of a campaign, a site might base its performance numbers on the banner server’s logs or on the content server’s logs. It might count the requests that came in for banners, or the number of banners it actually sent out. Believe it or not, these numbers differ significantly. Log files were never designed to measure real-world performance; they were intended to measure server loads and efficiency in serving requests: computer stuff, not human stuff.
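The requests-versus-served gap is easy to see in miniature. The sketch below counts both from the same hypothetical log; the log format, URLs, and status codes are assumptions for illustration (real ad-server logs vary widely).

```python
# Why "banner requests" and "banners served" are different numbers,
# even when read from the same log file.

def count_impressions(log_lines):
    """Return (requests for banners, banners actually delivered with HTTP 200)."""
    requested = 0
    served = 0
    for line in log_lines:
        fields = line.split()
        path, status = fields[-2], fields[-1]
        if "/banner" in path:
            requested += 1
            if status == "200":  # only a completed response counts as "served"
                served += 1
    return requested, served

logs = [
    "GET /banner/123.gif 200",
    "GET /banner/123.gif 200",
    "GET /banner/123.gif 499",  # user left before the banner finished loading
    "GET /index.html 200",
]
print(count_impressions(logs))  # -> (3, 2)
```

Three requests came in; only two banners went out complete. A site reporting the first number and an advertiser expecting the second will disagree before anyone has even done anything wrong.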
Alas, when the advertising industry “standardized” the definition of an impression a couple of years ago, the process was run mostly by the sites. Partially as a result, the current definition is so broad that a site can use just about any methodology it chooses and still come up with “standard” impressions. When your grandmother sneezes, it probably costs some online advertiser somewhere around 10 cents. But before we start talking about how to rein in this disorder, we have to follow our impression further along its adventure down the wild side.
Caching (the act of saving a page in a temporary file so your computer can quickly re-render it without pulling it down from the Internet again) adds another set of problems. Some sites employ one of a variety of cache-busting techniques that force your computer back to the Internet, so that a page viewed again gets counted as another impression. That different servers do this in different ways only adds to the confusion when comparing numbers from one source to another. Worse still, many layers of caching exist: your browser has a cache, your ISP might employ one, and perhaps your employer runs yet another for its network. Any given cache-busting technique may or may not get around each of these.
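One common cache-busting technique is to append a random value to the ad URL, so every page view produces a URL the cache has never seen and therefore cannot answer from its copy. A minimal sketch, assuming a hypothetical ad URL (the `ord` parameter name is a widespread convention, but any unused parameter works):

```python
import random

def cache_busted_url(base_url):
    """Append a random 'ord' value so each request looks unique to every cache."""
    sep = "&" if "?" in base_url else "?"
    return f"{base_url}{sep}ord={random.randint(0, 10**9)}"

# Two page views of the "same" banner now generate two distinct requests,
# so browser, ISP, and corporate caches all have to pass them through.
url = cache_busted_url("http://ads.example.com/banner.gif")
```

Note that this only defeats caches that key on the full URL; it is exactly the kind of technique that works against some layers of caching and not others, which is why sites using different tricks produce incomparable counts.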
Several additional servers may also be inserted along the way, further removing the reported numbers from reality. Sometimes ad agencies want the site’s banner server to call their own server so that they can measure the event firsthand. If the creative employs a richer form of media, it might require yet another specialized server for that purpose. Each link added to this chain can introduce significant discrepancies.
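The chain effect can be simulated directly. In the sketch below (all names and the failure rate are assumptions, not measured figures), each hop logs the impression independently, and a failure at any hop means every server downstream of it never sees the event at all:

```python
import random

def serve_through_chain(chain_logs, failure_rate=0.02):
    """Pass one impression through a chain of servers, each keeping its own log.

    If any hop fails (timeout, dropped redirect, etc.), the loop stops and
    every later server logs nothing for this impression.
    """
    for log in chain_logs:
        if random.random() < failure_rate:
            return  # this hop failed; the rest of the chain never fires
        log.append("impression")

# site ad server -> agency tracking server -> rich-media server
site, agency, rich = [], [], []
for _ in range(10_000):
    serve_through_chain([site, agency, rich])

# Structurally, counts can only shrink as you move down the chain:
# len(site) >= len(agency) >= len(rich)
```

Every party in the chain is measuring honestly, yet each one reports a different total, and the gaps grow with every link added.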
Our industry definition of an impression doesn’t help us much here; it remains silent on all of these issues. Some sites still count the impressions served to the “bots” that search engines use to scour the Internet and index content. When a banner uses a feature called an image map, these bots may even “click” on every single pixel of the banner (that’s about 30,000 clicks, for those of you paying by the click). And many users unnecessarily double-click on links, which actually sends two requests to the servers, sometimes doubling reported performance.
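Both problems are at least partially fixable in post-processing. The sketch below drops requests from known crawler user-agents and collapses double-clicks, defined here (an assumption for illustration) as two requests from the same user within two seconds; the field layout and bot list are likewise illustrative.

```python
BOT_SIGNATURES = ("googlebot", "crawler", "spider")  # illustrative, not exhaustive
DOUBLE_CLICK_WINDOW = 2.0  # seconds; an assumed threshold

def clean_clicks(clicks):
    """clicks: list of (user_id, timestamp_seconds, user_agent) tuples.

    Returns the list with bot traffic removed and double-clicks counted once.
    """
    cleaned = []
    last_seen = {}
    for user, ts, agent in sorted(clicks, key=lambda c: c[1]):
        if any(sig in agent.lower() for sig in BOT_SIGNATURES):
            continue  # search-engine bot, not a human click
        if user in last_seen and ts - last_seen[user] < DOUBLE_CLICK_WINDOW:
            last_seen[user] = ts
            continue  # second half of a double-click
        last_seen[user] = ts
        cleaned.append((user, ts, agent))
    return cleaned

raw = [
    ("u1", 0.0, "Mozilla/4.0"),
    ("u1", 0.4, "Mozilla/4.0"),    # double-click: should count once
    ("u2", 1.0, "Googlebot/2.1"),  # bot: should be dropped
    ("u3", 5.0, "Mozilla/4.0"),
]
cleaned = clean_clicks(raw)  # two human clicks survive
```

Four raw requests become two real clicks, which is exactly the kind of gap between raw logs and reality this column has been describing.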
So what’s a poor buyer to do? Relax; it gets better from here. Next week’s column will show how we can derive more realistic numbers by collecting as much information as possible, whether that means reconciling the sites’ various methodologies or layering an entirely different method of measurement on top of everything else.