Synchronize Your Serving Data

Two weeks ago, I wrote a piece on the Interactive Advertising Bureau’s (IAB’s) effort to address the issue of spiders’ and robots’ effects on reported traffic. Working with ABC Interactive (ABCi), the IAB is making a list of spiders and robots to make it easier to filter this activity from traffic reports. At the time, I made a bit of a gaffe when I explained that this list was sorted by IP address. Thankfully, ABCi was quick to call me on it, and I had a more in-depth conversation with the organization about how this database will help the industry.

The database utilizes user-agent strings, and not IP addresses as I had originally claimed. User-agent strings are what identify browser clients to Web servers. For example, a user surfing the Web with Netscape 4.0 might return a user-agent string containing “Mozilla/4.02.” Spiders, particularly the ones originating from the search engines, usually identify themselves through the user-agent string. A spider from AltaVista might contain “Altavista” within its user-agent string, making it easily identifiable. This makes the IP address matching issues I mentioned in the prior article somewhat moot.

In speaking with a representative from ABCi last week, I also got some insight into how the organizations envision the database being used. Their tool is meant to pick up on what they call commercial spiders — the ones that are in regular use by search engines and other online information services and that make a material impact on traffic reports. The database is not meant to be a catchall for all types of spiders and robots. ABCi was careful to make a distinction between commercial spiders and personal ones. An example of a personal spider might be the bot written by a computer science student, as in the example I gave in my article.

The database wasn’t designed to be used as a real-time filtration system. Subscribers can use the data to filter traffic reports based on the information contained in it, but it may not be suitable for real-time filtration of things such as advertising activity reports. This raises the question: Ad servers have been filtering spiders by observing their behavior for years. Would it not be better to filter spiders at the server level by observing their behavior?

Perhaps getting rid of these spurious numbers might best be achieved by a combination of solutions. Behavioral filtration can take care of the “little guys,” while the ABCi database filters the ones that are known for causing the big problems.

What we know for certain is that cached ad activity, diverse counting methodologies, and the presence of spiders and robots are the leading causes of discrepancies between advertiser-side and publisher-side ad activity reports. Many agencies are able to synch their activity reports with those of publishers to within 10 percent by addressing the first two issues. Based on my past experience with spiders and robots, I can say that this mechanical activity may be responsible for a good chunk of the remaining 10 percent.

If advertisers and publishers can get their reports synched up fairly well, we will have made significant progress toward addressing the counting methodology problem that has plagued our industry since ad serving first appeared on the scene.

Related reading

Man fly fishing
Retro Robot wound up with key. Isolated with clipping path