Spiders and Robots and Crawlers, Oh My!

The Web is full of creepy-crawlies: spiders, bots, and other slithery things that skew your measurements and statistics. The IAB has a plan to play exterminator -- but is it a sound one?

Author

Idris Nagri

Date published: October 25, 2001

On Monday, the Interactive Advertising Bureau (IAB) made an interesting move intended to inch the industry closer to a standard for online campaign measurement. Recognizing the problems spiders and robots pose in measurement, the IAB announced that it will work with ABC Interactive (ABCi), a leader in the site-auditing space, to provide a master list of spiders and robots for the benefit of the industry.

Spiders and robots are applications that crawl the Web, indexing and retrieving content, usually for the benefit of search engines, information resources, and news organizations. As they travel, they generate quite a bit of traffic that gets counted in traffic statistics and ad campaign reports. A master list of these spiders would be useful to the industry for filtering purposes. According to an IAB press release, ABCi will create and maintain the master list for the benefit of IAB members and ABCi customers. The list will be updated monthly.

As with many of its initiatives, the IAB gets an A+ for intent here, but a D at best for execution. The idea is sound, but it needs to be refined.

Let’s talk a bit about spiders and robots and how they affect reporting on traffic and campaign stats. The terms “spider,” “robot,” and “crawler” have been used interchangeably for years to describe applications that gather information from the Internet. These applications can surf the Web, much like you and I do. In their search for information, spiders can artificially inflate traffic statistics. A Web server typically cannot distinguish between information requested by a spider and information requested by a person. Sometimes spiders request ads from a server. Sometimes they’ll even follow links from ads, whether they’re text links or banners, thus registering ad views and clicks. Obviously, if you’re an advertiser, this isn’t desirable.

Just how widespread is spider activity? Consider that a spider can be any application that searches or indexes the Web, from the crawler that indexes pages for search engines like Google to the bot written by a computer science student in a sophomore Perl class. People write and use these applications for a wide range of purposes. Their use is more widespread than most nonprogrammers might think.

You might think that a master list of spiders to assist in filtering their activity is a good idea. It is. But it’s more complicated than putting together a list of IP addresses and updating it monthly. Why? The population of spiders on the Web at any given time isn’t fixed, and a given spider often isn’t tied to a specific IP address.

Let’s use our sophomore computer science student as an example. Say he’s working on a spider to retrieve information from a variety of online news sites. He tests it on one computer lab PC on Wednesday and on another on Thursday. Both machines may end up on the master list. If their activity were filtered from traffic statistics using a database updated monthly, no activity would be registered from either of those two machines for the better part of a month (even if other students used the machines to surf the Web at other times).
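To make the lab PC problem concrete, here is a minimal sketch of what filtering against a static IP list does. Everything in it is hypothetical: the IP addresses come from reserved documentation ranges, and the `client` labels exist only so we can see what was lost. A real server log would not tell you who was behind each request.

```python
# Hypothetical master list: the two lab PCs where the spider was tested.
MASTER_LIST = {"192.0.2.10", "192.0.2.11"}


def filter_by_master_list(requests, blocked_ips):
    """Drop every request from a listed IP, no matter who sent it."""
    return [r for r in requests if r["ip"] not in blocked_ips]


# Invented sample traffic. The "client" field is for illustration only.
requests = [
    {"ip": "192.0.2.10", "day": "day 1", "client": "student spider"},
    {"ip": "192.0.2.11", "day": "day 2", "client": "student spider"},
    {"ip": "192.0.2.10", "day": "day 6", "client": "human browser"},
    {"ip": "192.0.2.11", "day": "day 9", "client": "human browser"},
    {"ip": "198.51.100.7", "day": "day 9", "client": "human browser"},
]

counted = filter_by_master_list(requests, MASTER_LIST)

# Human visits that vanished from the stats because of where they came from.
lost_humans = [
    r for r in requests
    if r not in counted and r["client"] == "human browser"
]

print(f"counted: {len(counted)}, human visits lost: {len(lost_humans)}")
```

The list does remove the spider, but it also removes every later visit from the same two machines until the list is next revised.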

It may seem like an obscure case, but when you consider how widespread spiders are, we could be eliminating plenty of legitimate traffic for no good reason. Forget the geeky programmers for a second, and consider that some of the applications used by many recreational Web surfers rely on spidering technology. Ever bookmark a page in Internet Explorer and check the box that says “Make available offline”? Guess what. When you do that, your computer runs a little application that spiders that page and pulls the content onto your hard drive. Spider use might be a bit more widespread than many people think.

Any database that is expected to track known spiders and crawlers must be updated much more frequently than once a month to be useful. The best way to handle spiders, though, is to observe behavior and filter spider activity at the server level. It’s relatively easy to write an application that would notice a burst of page requests from the same IP address within a short period of time (e.g., 100 requests for different Web pages within a second) and recognize it as a spider. A human can’t read 100 pages of content in that amount of time. Should that spider’s IP address then be added to a master list and be filtered out of every server log in the future? Probably not. Who knows whether that IP address also hosts a Web browser used by a human being? The same spider might show up again in the future with an entirely different IP address. Best to filter activity that is clearly mechanical in nature and leave it at that.
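As a rough illustration of that behavior-based approach, the sketch below flags only the requests that fall inside a burst and leaves the rest of that IP address's traffic alone. It is one possible reading of the idea, not a production filter: the function name, the `(timestamp, ip, path)` tuple layout, and the sample addresses are assumptions made for this example, and the default threshold of 100 requests in one second simply mirrors the figure above.

```python
from collections import defaultdict, deque


def filter_mechanical(requests, threshold=100, window=1.0):
    """Split (timestamp, ip, path) tuples into (kept, filtered).

    A request is filtered only when it sits inside a burst of at least
    `threshold` requests from one IP within `window` seconds. The IP
    itself is never blacklisted.
    """
    by_ip = defaultdict(list)
    for index, (timestamp, ip, _path) in enumerate(requests):
        by_ip[ip].append((timestamp, index))

    flagged = set()
    for hits in by_ip.values():
        hits.sort()
        recent = deque()  # requests from this IP inside the current window
        for timestamp, index in hits:
            recent.append((timestamp, index))
            while timestamp - recent[0][0] > window:
                recent.popleft()
            if len(recent) >= threshold:
                flagged.update(i for _, i in recent)

    kept = [r for i, r in enumerate(requests) if i not in flagged]
    filtered = [r for i, r in enumerate(requests) if i in flagged]
    return kept, filtered


# Invented sample: a 100-request burst, then a person on the same machine
# an hour later, plus one visitor from elsewhere.
sample = [(i * 0.005, "192.0.2.10", f"/news/{i}") for i in range(100)]
sample.append((3600.0, "192.0.2.10", "/index.html"))
sample.append((12.0, "198.51.100.7", "/index.html"))

kept, filtered = filter_mechanical(sample)
print(f"kept: {len(kept)}, filtered: {len(filtered)}")
```

Because the decision is made per burst rather than per address, the later visit from the same machine still counts, which is the outcome the master list cannot deliver.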

The IAB’s idea may seem noble in concept, but it doesn’t make sense in practice. Still, I’m glad that it thought to address the issue. Spider and robot activity is not something the average online media planner gives much thought to, but it deserves attention. The IAB deserves thanks for putting it on the agenda and reminding us all that it’s a big reason behind inaccurate measurement statistics.
