Robots Exclusion Confusion, Part 2

How to use the wildcard character, avoid sending mixed signals to robots, and more. Second in a two-part series.

Author

Erik Dafforn

Date published March 18, 2009

In my last column, I discussed several techniques you can use in your robots.txt file, including broadcasting your site maps’ locations, testing your robots file to make sure it will work the way you need it to, and allowing URLs that look like directories while disallowing the content within the directory. Today, I finish up the topic by discussing how to use the wildcard character, how to direct the actions of multiple robots, and how to avoid sending mixed signals to robots.

The Asterisk Wildcard

You’ve seen the asterisk in the User-Agent: line and know it means “all robots.” In Allow and Disallow lines it works similarly, standing in for any group of contiguous characters.

Combining the $ and * symbols can be particularly powerful. Suppose you recently migrated from .php to .aspx, and you find yourself with some stray .php files clogging the indices. Here’s a sample directive followed by an explanation of its components:

Disallow: /*php$

/ means that the string we’re disallowing starts at the first slash following the top-level domain (.com, .net, and so on). Begin all your allow and disallow lines this way.
* stands for any number of letters, numbers, or other characters (including slashes, which indicate additional directories).
php means that, following whatever character or characters the asterisk stands for, the URL string will contain the characters “php.”
$ further narrows the preceding step, dictating that only URLs ending in “php” will be affected by this particular disallow line.

A URL like /tags/php/oct-2008/, then, isn’t affected by the disallow line, because it doesn’t end with “php”.
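The matching described above can be sketched in Python by translating the directive into a regular expression. This is only an illustration of the wildcard semantics as explained here, not any search engine's actual implementation; the function names and the sample paths other than /tags/php/oct-2008/ are assumptions.

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters (including slashes); a trailing
    '$' anchors the pattern to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' where each '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_matched(pattern: str, path: str) -> bool:
    """True if the URL path falls under the given Allow/Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None

rule = "/*php$"
# Hypothetical paths, apart from the one quoted in the text.
for path in ["/old/index.php", "/tags/php/oct-2008/", "/index.php?id=3"]:
    print(path, "->", "blocked" if is_matched(rule, path) else "not affected")
```

Only the first path ends in “php,” so it is the only one the rule blocks; the other two contain “php” but end with other characters.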

But can the asterisk mean no character? Consider this line:

Disallow: /*products/default.aspx

We know it will exclude a URL such as /2004-products/default.aspx. But what will happen to the URL /products/default.aspx? The URL will also be excluded, because in addition to the asterisk symbolizing any character or characters, it may also symbolize no characters at all.
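A quick way to convince yourself of the zero-character case is to rewrite the directive by hand as a regular expression, with the asterisk becoming `.*`. The sketch below assumes that translation is a fair model of the wildcard; the third path is a made-up counterexample.

```python
import re

# "Disallow: /*products/default.aspx" with '*' rewritten as the regex '.*'.
# The rule has no trailing '$', so the regex is anchored only at the start.
rule = re.compile(r"^/.*products/default\.aspx")

def is_blocked(path: str) -> bool:
    """True if the path is covered by the disallow rule above."""
    return rule.match(path) is not None

print(is_blocked("/2004-products/default.aspx"))  # '*' stands for "2004-"
print(is_blocked("/products/default.aspx"))       # '*' stands for nothing
print(is_blocked("/products/index.aspx"))         # different file name
```

The first two calls report a match and the third does not, which mirrors the behavior described above.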

Allow Trumps Disallow

If you send mixed messages to robots within the same robots.txt file, which message do they honor? In other words, suppose your robots.txt file lists the following code:

The second and fourth lines contradict one another. Line 2 disallows a file that line 4 allows. So which line will engines respond to? Google will allow the file, based on both real-world experience and the Check Robots.txt tool within Webmaster Tools. MSN/Live and Yahoo should respond similarly because they both adhere to the advanced Robots Exclusion Protocol, although I recommend you verify this. It makes no difference whether the allow or disallow line comes first in the file: allow trumps disallow.
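To make the scenario concrete, here is a minimal Python sketch of the outcome described above. The robots.txt content is hypothetical (the paths are assumptions, chosen so that line 2 disallows the file line 4 allows), and the matcher uses plain prefix matching with no wildcards. Google's current documentation phrases the rule as "the most specific rule wins, and the less restrictive rule wins ties"; for identical paths like these, the result is the same.

```python
# Hypothetical robots.txt: the paths are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reports/annual.html
Disallow: /tmp/
Allow: /reports/annual.html
"""

def parse_rules(text: str) -> list:
    """Return (directive, path) pairs for the Allow/Disallow lines."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules

def is_allowed(path: str, rules: list) -> bool:
    """Model 'allow trumps disallow': a matching Allow wins wherever it
    appears in the file; otherwise a matching Disallow blocks the path."""
    matches = [directive for directive, prefix in rules if path.startswith(prefix)]
    if "allow" in matches:
        return True
    return "disallow" not in matches

rules = parse_rules(ROBOTS_TXT)
print(is_allowed("/reports/annual.html", rules))                  # both rules match
print(is_allowed("/reports/annual.html", list(reversed(rules))))  # order swapped
print(is_allowed("/tmp/cache.html", rules))                       # only Disallow matches
```

Because the function looks at the set of matching directives rather than their order, reversing the rule list does not change the answer for the contested file.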

Rock, Paper, Scissors

Who wins when robots meta tags, robots.txt files, and XML site maps contradict each other about inclusion? Here are some guidelines:

If a URL is disallowed by your robots.txt file but it’s allowed by a robots meta tag or included in an XML site map, the robots.txt file will take precedence.
If your robots.txt file allows a URL but it has a robots “noindex” meta tag, the meta tag will take precedence.
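The two guidelines above can be written as a small decision function. This is a simplified sketch of the precedence as stated here, not a model of any engine's full indexing logic; the function name, parameters, and return strings are assumptions.

```python
def effective_treatment(robots_txt_allows: bool,
                        meta_noindex: bool,
                        in_xml_sitemap: bool) -> str:
    """Resolve conflicting inclusion signals for a single URL."""
    if not robots_txt_allows:
        # A blocked page is not fetched, so a permissive meta tag on it is
        # never read, and a site map entry does not override the block.
        return "excluded by robots.txt"
    if meta_noindex:
        # The page may be crawled, but the meta tag keeps it out of the index.
        return "crawled but not indexed"
    # in_xml_sitemap is accepted only to show it never changes the outcome.
    return "eligible for indexing"

print(effective_treatment(robots_txt_allows=False, meta_noindex=False, in_xml_sitemap=True))
print(effective_treatment(robots_txt_allows=True, meta_noindex=True, in_xml_sitemap=True))
print(effective_treatment(robots_txt_allows=True, meta_noindex=False, in_xml_sitemap=False))
```

In both conflict cases the more restrictive signal wins, which is an easy way to remember the guidelines.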

Directing Specific Robots

It’s possible to give specific allow and disallow instructions to specific robots. Remember, once you’ve addressed a specific robot, that robot is no longer bound to global directives. For example, suppose your robots.txt file has the following code:

I’ve seen the same mistake many times: people think that the preceding code lines tell Google to disallow the /webmail/, /pdf/, and /files/printer-friendly/ directories. This isn’t the case. Because the code has a section dedicated to Googlebot, the bot will adhere only to the specific directions given to it within its specific section. Consequently, Google will crawl /webmail/ and /pdf/ since it hasn’t been specifically instructed not to. To get Google to exclude all three directories, you would need the following code:
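The listings this section refers to are not reproduced in this copy of the column. The Python sketch below reconstructs both files from the description above (the exact layout is an assumption) and shows which Disallow lines apply to a given robot under the rule that a robot with its own section ignores the global one. The `disallows_for` helper and the second robot name are hypothetical.

```python
# Reconstructed from the description above; the layout is an assumption.
MISTAKEN = """\
User-agent: *
Disallow: /webmail/
Disallow: /pdf/

User-agent: Googlebot
Disallow: /files/printer-friendly/
"""

CORRECTED = """\
User-agent: *
Disallow: /webmail/
Disallow: /pdf/

User-agent: Googlebot
Disallow: /webmail/
Disallow: /pdf/
Disallow: /files/printer-friendly/
"""

def disallows_for(robot: str, text: str) -> list:
    """Return the Disallow paths in the section that applies to `robot`.

    A robot named in its own section uses only that section; any other
    robot falls back to the '*' section.
    """
    groups = {}        # lowercase user-agent -> list of disallowed paths
    current = []       # agents named by the section being read
    in_rules = False   # True once the current section has a rule line
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                 # a rule was seen, so a new section starts
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow" and value:
            in_rules = True
            for agent in current:
                groups[agent].append(value)
    return groups.get(robot.lower(), groups.get("*", []))

print(disallows_for("Googlebot", MISTAKEN))    # only its own section applies
print(disallows_for("OtherBot", MISTAKEN))     # falls back to the '*' section
print(disallows_for("Googlebot", CORRECTED))   # all three directories
```

With the first file, Googlebot is bound only by the /files/printer-friendly/ line; repeating the global lines inside the Googlebot section, as in the second file, is what makes all three directories off-limits to it.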

Conclusion

The findings in this column are based on a combination of real-world observation and testing and a lot of time experimenting with Google’s robots.txt tools in Webmaster Tools. I hope it’s a helpful resource in your quest to deal with duplication, data privacy, and overall site maintenance.

