Getting Out of Search

How to hide and remove Web pages from search engine spiders and avoid a meta misadventure.

Most Web site marketers are very interested in getting into the major search engines, namely Google, Yahoo, and MSN, currently in that order. But what about when bizarre circumstances collide and a page or two (or 2,000) accidentally slips into the search engine indices? Pages you didn’t necessarily want found, not by search engine spiders and most certainly not by the general viewing public.

How do you eradicate something from search engines fast and forever? Do you fax Google, call Yahoo, or e-mail Steve Ballmer? You could try all the above, and I sincerely wish you the best of luck in each fruitless endeavor.

Most Web marketers run screaming to the Web dev or IT team offices and have the pages deleted from the site. The logic is simple: out of the site means you’re no longer going out of your mind. Then you triumphantly high-five the team, e-mail your boss, and say, “Whew! Mischief managed!”

Only it’s not.

More and more Web marketers need to know how to get the heck out of search. Deleting the source code simply doesn’t work anymore; Google’s and MSN’s spiders are much too fast and too greedy to miss a Web page faux pas. And then there’s the cache. Factor in removing links to the offending pages, and getting pages out of search engine results can be a Herculean challenge.

But there are steps you can take today to avoid a meta misadventure tomorrow.

Steps to Take

First, make certain your Web site is designed to return a 404 error, along with an appropriate error message for users, when a page no longer exists. If your site is designed to default to the home page when the user enters a URL that’s long gone, the search engines actually think the woebegone page still exists. Ergo, there’s no reason for the search engines to naturally allow the page to fall out of their indices. That dead page looks like it’s alive.
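For example, on an Apache server (an assumption; IIS and other servers have equivalent settings), you can serve a friendly custom error page without losing the 404 status code. The file path here is hypothetical:

    # Serve a custom error page while still returning a 404 status
    ErrorDocument 404 /errors/not-found.html

Avoid pointing ErrorDocument at a full URL such as your home page; Apache then issues a redirect instead of a 404, which is exactly the dead-page-looks-alive trap described above.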

Don’t let zombie pages ruin a perfectly good Web site. Get your 404 errors in order, then take the next logical step: prove to the search engine spiders that you own and manage the site. Validate your existence by authenticating your Web site with Google and Yahoo (and soon MSN Search) Webmaster tools. Doing so can help facilitate the ready removal of URLs gone wild.

If you haven’t authenticated your site yet, do so for speedier removal of rogue pages. If you’ve already authenticated your Web site at Google Webmaster Central or Yahoo Site Explorer, you’re one step closer to being able to readily remove undesirable content from indices forever.
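As a rough illustration, Google’s verification process involves either uploading an empty HTML file with a name Webmaster Central assigns or pasting a meta tag it generates into your home page’s head section. The tag looks roughly like this; the content value is a placeholder, not a real token:

    <meta name="verify-v1" content="YOUR-GOOGLE-SUPPLIED-TOKEN" />

Yahoo Site Explorer offers a similar choice between an uploaded verification file and a meta tag; in both cases, use the exact string the tool gives you.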

In Yahoo, for example, sign in to Site Explorer, enter the URL/path in the “Explore URL” box, and hit the “Delete” button next to each URL you want removed. Be warned: when a URL is removed in such a manner, Yahoo deletes the specific URL as well as all the subpaths listed under that URL. Delete with caution.

Yahoo does help you, however, because it shows all the subpath URLs to be deleted during the confirmation process. After that, you’ll see a “Pending Delete” status in the “Actions” information page so you know when the URL removal goes into effect. Usually, Yahoo takes care of a request within 48 hours; you can set your Site Explorer preferences to receive an e-mail notification when the deed is done (just in case you need to prove it to the boss).

Remove Web pages from Google in a similar manner after authenticating your site at Webmaster Central.

Preventive Maintenance

Of course, you can keep your content out of the search engines in the first place by using the robots.txt protocol. This method will keep new or undesirable content out of the indices and help remove old, stale content already there. It takes some time, though, for the spiders to refresh their content and reflect your content’s removal. How long it takes for content to fall out of the search indices is a direct reflection of your site’s crawl frequency.
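As a quick sketch, a robots.txt file sits at the root of your domain; the directory paths below are hypothetical. Compliant spiders will skip anything under the disallowed paths:

    # Applies to all compliant spiders
    User-agent: *
    Disallow: /drafts/
    Disallow: /internal-reports/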

Remember, using the robots.txt protocol to disallow the spiders from accessing certain content within your site ensures legitimate spiders don’t crawl your excluded URLs, but it doesn’t keep the URLs themselves out of the indices. That’s because the search engine spiders tend to discover references to excluded URLs from other Web sources, such as inbound links.

Even though it’s not a particularly speedy way to get content out of search indices, right now using the robots.txt file is just about the only way to get unwanted URLs out of MSN Search. Unfortunately, it may take several weeks for the engine to complete an indexing update that reflects your changes.

MSN also recommends adding a noindex meta tag to any page you want kept out of its index, or contacting MSN Search Site Owner Support directly. However, it could take several weeks to get a response.
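The standard form of that tag goes in the head section of the page itself; the bot-specific variant shown second is an assumption based on MSN’s crawler being named msnbot:

    <!-- Keep this page out of all compliant engines' indices -->
    <meta name="robots" content="noindex">

    <!-- Or target MSN's crawler only (assumes the msnbot user-agent name) -->
    <meta name="msnbot" content="noindex">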

Wrap Up

If you take a little time now, you can avoid a potential data disaster in the future. Check out your 404 error process, get to know your robots.txt file, become familiar with the use of bot messages in your meta tags, and get your site authenticated in Google and Yahoo — and maybe someday soon MSN Search. That way when the inexplicable happens and you need to make private information truly private again, you can act quickly and efficiently without panicking. In this case, an ounce of prevention is truly worth a pound of cure — especially if it’s your site’s or brand’s online reputation that gets pounded by the blogosphere for a little lapse in contextual judgment.

