Linking Mistakes to Avoid (Part 2): Removing Orphaned URLs

October 26, 2001

You've got an active Web site. Files go up, come down, get archived, and are renamed. Meanwhile, all those search engines think your old links are current. Here's how to solve the problem, boost your traffic, and stay on the search engine radar.

Even now as you read this, you probably have orphaned URLs you don't know about, collecting dust in a forgotten pile at the bottom of the search engine indices. It happens to the best of us. Even I, self-proclaimed Link Mensch, was humbled recently to discover several old URLs in AltaVista's database that no longer physically exist on my Web server. Some expert I am.

During the life span of any Web site, you create, update, rename, and remove URLs on a regular or semiregular basis. New files go up and old ones come down; some get renamed or archived. Sometimes, entire Web sites with thousands of pages get rehosted on new servers using new content management tools (as ClickZ did recently). I've even seen cases in which every URL on a site changed at once.

While you have been diligently running your Web site, adding, deleting, moving, and archiving files and URLs, search engine crawlers have been carousing through the Web. They have been visiting your server, on a hit-and-run basis, since the moment your site went live. Maybe a crawler came across one of your URLs as it scanned a newsgroup post at Deja News a couple of years ago. Maybe a newsletter wrote about your site and archived that edition, just as a crawler wandered by and stumbled onto your URL. There are countless ways a crawler could have found your URLs without ever going near your server. In fact, most URLs in any search engine's database were found and followed from sources other than your own site.

What Matters Most

Of all the URLs your site has ever had, how many of them are still in the database of any given search engine?

Search engines have no idea if the URLs they have recorded and indexed are still in existence at any given moment. You may have updated your site and removed links and URLs, but search engines still think they exist. A search result is nothing but a placeholder for the actual page on its server. Search results are a list of links.

Every URL from your site that no longer exists but a search engine thinks does exist is like a lump of coal waiting to be turned into a diamond. With search engines charging for indexing of URLs, it's even more important to revive dead links before the engines find out they are dead and purge them. A purged URL is lost... forever.

Nearly every marketer tries to get its site fully indexed by the search engines. Most site owners wish they could get more of their sites' pages indexed. If you have old links showing up in search results, count yourself lucky. And get busy making those dead links live again.

Finding and Fixing Them

Here's one way to find out how many URLs from your site a search engine has indexed. Go to AltaVista and in the search box type "host:your domain" (replacing your domain with whatever your domain is, such as "host:pbs.org").

Look at the results. What you see is every file that AltaVista has in its index and thinks is active. Peruse the list. Put your cursor over the clickable link -- but don't click. Look at the bottom of your browser to see the filename of the URL. Are the file names you see still in existence? Probably not. If those names no longer exist on your site, create a new page with exactly the same filename as the one AltaVista thinks is still around, and get it on your server ASAP.

Let's say you once had a site-map page named site-map.html, and you see that file among the search results. Six months ago, you renamed that file map.html and removed site-map.html from your server. The search engine doesn't know you removed the URL and still has a record of the old page and what was on it. Put a page back at site-map.html, and anyone who clicks that search result lands on a live page instead of an error.
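If the list of results is long, checking filenames by eye gets tedious. Here is a minimal Python sketch of the same comparison. It assumes you have already copied the indexed URLs into a list by hand and that your URLs map directly onto files under a document root; the function name, the example.com URLs, and the index.html default are my own illustrative assumptions, not part of any search engine's tools.

```python
from pathlib import Path
from urllib.parse import urlparse


def find_orphaned_urls(indexed_urls, document_root):
    """Return the indexed URLs whose path no longer maps to a file on disk."""
    root = Path(document_root)
    orphaned = []
    for url in indexed_urls:
        path = urlparse(url).path.lstrip("/")
        # Assumption: a bare directory URL is served from index.html.
        if path == "" or path.endswith("/"):
            path += "index.html"
        # A URL is orphaned when no file exists at the mapped location.
        if not (root / path).is_file():
            orphaned.append(url)
    return orphaned
```

Called with something like find_orphaned_urls(["http://example.com/site-map.html"], "/var/www/html") (a hypothetical path), it returns the URLs that need a page put back under the old filename. Sites that build pages dynamically would need a different check, such as requesting each URL and looking at the response status.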

You can also examine your own server logs to find all page requests that result in a 404 (file not found) server response. This works even if you use custom 404 pages. This is how I discovered a file on my site that had been returning 404 error messages about 30 times a day, or almost 1,000 times a month. I created a file that had the same name and content as the one that no longer existed. Bingo. I recaptured every bit of that lost traffic. You can do the same. Start with your server logs and then try some test searches.
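The log check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it expects the access log in Common Log Format (the layout many Web servers write by default), and the function name and the sample line in the comment are hypothetical. If your server logs in a different format, the pattern will need adjusting.

```python
import re
from collections import Counter

# Assumption: Common Log Format, for example (hypothetical line):
# 1.2.3.4 - - [26/Oct/2001:10:00:00 -0500] "GET /site-map.html HTTP/1.0" 404 512
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3})\b')


def count_404s(log_lines):
    """Count 404 responses per requested path, most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        # Keep only requests the server answered with "file not found".
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common()
```

Pass it the lines of your log, for instance count_404s(open("access.log")) with whatever your log file is actually called. The paths at the top of the result are the missing files costing you the most visitors, so those are the ones to restore first.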

If you want to find out what URLs the engines have indexed from your site, Danny Sullivan's Search Engine Watch site has a section just for this.

Until next time, I remain

Eric Ward,
Link Mensch


ABOUT THE AUTHOR

Eric Ward

Eric Ward founded the Web's first link building and content publicity service, called NetPOST. Today, Eric provides strategic linking consulting, link building services, training, and consulting via EricWard.com. The publisher of the strategic linking advice newsletter LinkMoses Private, Eric is a co-developer of AdGooroo's Link Insight.

Eric uses his experience and unique understanding of the web's vast linking patterns to teach companies his link building techniques. He has developed content linking strategies for PBS.org, WarnerBros, The Discovery Channel, National Geographic, About.com, TVGuide.com, and Weather.com. Eric won the 1995 Tenagra Award for Internet Marketing Excellence, and in 2007 was profiled in the book Online Marketing Heroes.
