Using Yahoo Site Explorer for Site Intelligence

December 26, 2007

Google Webmaster Tools gets most of the glory, but Yahoo gives webmasters some great tools too.

Recently, I've discussed how Google Webmaster Tools can help improve a site's performance, providing an assortment of reports, including one that examines external links and another that analyzes robots.txt files.

Google isn't the only game in town, however, for diagnosing crawling and indexing issues and helping you improve search engine performance. The world's second most popular search engine offers Yahoo Site Explorer (YSE), which includes several tools that are every bit as innovative as Google's.

Authentication and Sitemap Submission

Most features discussed here require your site to be "authenticated," a process nearly identical to verifying your site in Google Webmaster Tools: you either upload a uniquely named file to your site's root folder or add a special meta tag to your home page's code.
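If you choose the meta tag route, it can be worth confirming that the tag actually appears in the HTML your server delivers before you ask Yahoo to check it. Below is a minimal sketch using Python's standard html.parser. The tag name "example-verify-key" and the value "abc123" are hypothetical placeholders; copy the exact tag YSE generates for your site.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collects name/content pairs from every <meta> tag in a document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name is not None:
                self.meta[name.lower()] = attrs.get("content")


def has_verification_tag(html, name, content):
    """Return True if the page has <meta name=NAME content=CONTENT>."""
    finder = MetaTagFinder()
    finder.feed(html)
    finder.close()
    return finder.meta.get(name.lower()) == content


# Hypothetical tag name and key; use the tag YSE gives you.
home_page = (
    '<html><head><meta name="example-verify-key" content="abc123">'
    "</head><body></body></html>"
)
print(has_verification_tag(home_page, "example-verify-key", "abc123"))  # True
```

In practice you would feed it the HTML of your live home page rather than a hard-coded string.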

While Yahoo adheres to the accepted protocol for XML sitemap feeds, including autodiscovery, you can also create a very simple text file for Yahoo that achieves the same result. In a plain text file called urllist.txt, simply list all your site's URLs, one per line.
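Because urllist.txt is nothing more than one URL per line, it is easy to generate from whatever list of pages you already have (a CMS export, a crawl, a database query). A minimal sketch, assuming your URLs are already in a Python list; the example.com URLs are placeholders. It drops blank entries and exact repeats so the file stays clean.

```python
from pathlib import Path


def build_urllist(urls):
    """Return urllist.txt content: one URL per line, blanks and repeats dropped."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "\n".join(lines) + "\n" if lines else ""


# Placeholder URLs for illustration only.
urls = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
    "",
    "http://www.example.com/about.html",
    "http://www.example.com/products/widgets.html",
]
text = build_urllist(urls)
Path("urllist.txt").write_text(text, encoding="utf-8")
print(text, end="")
```

The resulting file is what you would then upload and submit through YSE.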

An important consideration: Yahoo doesn't often come back to re-fetch a sitemap feed unless you tell it to. Make sure you click the "Resubmit" button each time you update your feed or text file.
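Since the Resubmit button is manual, one way to avoid forgetting it is to keep a local record of what the feed looked like the last time you resubmitted, and check that record whenever the feed is regenerated. The sketch below is a hypothetical helper, not part of YSE: it never contacts Yahoo, it only compares a SHA-256 hash of the feed file with the hash stored in a local JSON file (the file names are assumptions).

```python
import hashlib
import json
from pathlib import Path

# Hypothetical local record of what was last resubmitted; YSE is never queried.
STATE_FILE = Path("yse_resubmit_state.json")


def file_digest(path):
    """SHA-256 of the file's bytes, used to detect any change to the feed."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_state(state_file=STATE_FILE):
    return json.loads(state_file.read_text()) if state_file.exists() else {}


def needs_resubmit(feed_path, state_file=STATE_FILE):
    """True if the feed differs from the version recorded at the last resubmit."""
    return load_state(state_file).get(str(feed_path)) != file_digest(feed_path)


def mark_resubmitted(feed_path, state_file=STATE_FILE):
    """Record the current feed version; run this after clicking Resubmit in YSE."""
    state = load_state(state_file)
    state[str(feed_path)] = file_digest(feed_path)
    state_file.write_text(json.dumps(state))


# Example run with a throwaway feed file.
feed = Path("urllist_demo.txt")
STATE_FILE.unlink(missing_ok=True)
feed.write_text("http://www.example.com/\n")
print(needs_resubmit(feed))  # True: never recorded
mark_resubmitted(feed)
print(needs_resubmit(feed))  # False: unchanged since last resubmit
feed.write_text("http://www.example.com/\nhttp://www.example.com/new.html\n")
print(needs_resubmit(feed))  # True: changed, so click Resubmit
```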

Duplicate Content Identification

One of YSE's best features is how it breaks down your site, URL by URL, and shows back-link data, sortable by whether the link comes from your own domain or an external site. But did you know that you can use YSE to detect possible duplication/fragmentation issues that Yahoo sees?

Consider the home page for the MTV series "Pimp My Ride." If you enter the URL into the YSE search box and click the "Explore URL" button, you'll see, right below the "Results" heading, that Yahoo counts a total of five pages under this URL. In addition to the main URL, YSE lists the following variants (I've cut the path out of each URL below to save space):

  • /series.jhtml?extcmp=SEO_SSP_Y

  • /series.jhtml?

  • /series.jhtml?popThis=launchVideo("vid=44478")

  • /series.jhtml?popThis=launchVideo("vid=37370")

It's tricky to detect this issue with YSE because it tends to show every page under a certain level when you enter a URL. For example, if you ask it to explore a folder-level URL, YSE might show hundreds of URLs, not because they're duplicates but because it wants to show you every URL that falls within that folder. Be patient while sorting through the URLs, and learn to spot the difference between duplicate pages and child pages.
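One rough rule of thumb for that sorting: a URL with the same host and path as the page you explored, differing only in its query string, is a likely duplicate; a URL with a different path is a child or sibling page. A minimal sketch of that rule, assuming you've copied the listed URLs into a Python list. The host and folder are hypothetical, since the article omits the real MTV path, and the rule is a heuristic of mine, not how YSE itself classifies pages.

```python
from urllib.parse import urlsplit


def classify_urls(explored_url, listed_urls):
    """Split the URLs listed for an explored URL into two groups.

    likely_duplicates: same host and path as the explored URL, differing only
    in the query string (tracking codes, video pop-up parameters, and so on).
    other_pages: a different path, such as child pages in the same folder.
    """
    base = urlsplit(explored_url)
    likely_duplicates, other_pages = [], []
    for url in listed_urls:
        if url == explored_url:
            continue  # the page you asked about
        parts = urlsplit(url)
        if (parts.netloc, parts.path) == (base.netloc, base.path):
            likely_duplicates.append(url)
        else:
            other_pages.append(url)
    return likely_duplicates, other_pages


# Hypothetical host and folder; the article omits the real MTV path.
base = "http://www.example.com/shows/series.jhtml"
listing = [
    base,
    base + "?extcmp=SEO_SSP_Y",
    base + "?",
    base + '?popThis=launchVideo("vid=44478")',
    base + '?popThis=launchVideo("vid=37370")',
    "http://www.example.com/shows/episodes.jhtml",
    "http://www.example.com/shows/cast/index.jhtml",
]
dupes, others = classify_urls(base, listing)
print(len(dupes), len(others))  # 4 2
```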

Standardization and Removal of URL Parameters

Suppose your pages have dozens of versions, each with different referral parameters. Or, even worse, suppose your site has session IDs that are crippling your indexing efforts. And to top it all off, your IT department can't even promise it'll look into the issue within the next six months.

Unlike Google's URL removal tool, the use of which is about as calming as defusing a bomb on a runaway subway train ("Which wire do I cut?"), Yahoo's dynamic URL tool is simple and benign.

With this tool, Yahoo has reversed years of search engines telling you how to fix dynamic URL issues (using steps that are frequently prohibitive, given staffing and CMS issues) and has instead said, "Don't worry; we'll handle it on this end." Google changed this old paradigm when it began letting you choose whether your URLs would be displayed with or without the "www" prefix. But Yahoo has one-upped Google here.

Suppose your site appends a session ID parameter and value to each URL, such as jsessionid=rx232d945kjhu8d734fl409gtdfoijr3t4t8430s. This causes problems, because each time Yahoo visits that URL, the jsessionid value is different, yet the page's content is the same. The result? Duplicate content, diluting your page's ability to rank for its intended terms.

If your site requires session IDs due to your development software but you can't easily configure the site to avoid delivering them to engines, go to YSE's "Dynamic URLs Beta" section, enter the specific parameter name of your session ID (in this case, jsessionid), and select "Remove from URLs" from the "Action or Parameter" dropdown list.
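We can't see how Yahoo implements "Remove from URLs," but the intended outcome is easy to model: strip the named parameter, and every visit's URL collapses to one address. A minimal sketch, assuming the session ID appears as an ordinary query-string parameter as in the example above; it deliberately leaves other parameters untouched and does not handle ";jsessionid=" appended to the path. The catalog.jsp URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_params(url, names):
    """Drop the named query parameters, keeping the rest of the URL as-is.

    Only the query string (after "?") is examined; path segments such as
    ";jsessionid=..." are left alone.
    """
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0] not in names
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


# Two visits to the same hypothetical page, each with a different session ID.
visit1 = "http://www.example.com/catalog.jsp?item=42&jsessionid=rx232d945kjhu8d734fl409gtdfoijr3t4t8430s"
visit2 = "http://www.example.com/catalog.jsp?item=42&jsessionid=a81kq0zz7"
print(remove_params(visit1, {"jsessionid"}))  # http://www.example.com/catalog.jsp?item=42
print(remove_params(visit1, {"jsessionid"}) == remove_params(visit2, {"jsessionid"}))  # True
```

Splitting the query by hand, rather than re-encoding it, keeps any remaining parameters byte-for-byte identical to what your site serves.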

In the MTV example, the solution would be to enter "extcmp," "source," and "popThis" into YSE's "Parameter Names" fields and select "Remove from URLs" from the "Action or Parameter" dropdown list. Presto, those duplicate URLs will drop from the index. (Unfortunately, Yahoo currently allows you to address only three parameters, so my example works here, but just barely.)
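To see why three rules are enough here, you can apply them to the variant URLs listed earlier and check that everything collapses to the main URL. A minimal sketch that models the removal rules and enforces the three-parameter cap the article describes; the host and path are hypothetical stand-ins for the MTV URL, which the article trims.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_RULES = 3  # the article reports YSE accepted only three parameter names


def strip_params(url, names):
    """Remove the named query parameters; refuse more names than YSE allowed."""
    if len(names) > MAX_RULES:
        raise ValueError(f"YSE handled at most {MAX_RULES} parameters, got {len(names)}")
    parts = urlsplit(url)
    kept = [p for p in parts.query.split("&") if p and p.split("=", 1)[0] not in names]
    return urlunsplit(parts._replace(query="&".join(kept)))


# Hypothetical host and path; the article cut the real MTV path.
base = "http://www.example.com/shows/series.jhtml"
variants = [
    base,
    base + "?extcmp=SEO_SSP_Y",
    base + "?",
    base + '?popThis=launchVideo("vid=44478")',
    base + '?popThis=launchVideo("vid=37370")',
]
rules = ["extcmp", "source", "popThis"]
collapsed = {strip_params(u, rules) for u in variants}
print(collapsed)  # {'http://www.example.com/shows/series.jhtml'}
```

The bare "?" variant collapses as well, because an empty query string is dropped when the URL is rebuilt.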

If your site absolutely must show session ID values, Yahoo has a solution for this, too. Instead of choosing "Remove from URLs" from the "Action or Parameter" dropdown list, select "Use Default Value" and type something like "yahoo" (no quotes) in the "Action or Parameter" box. This ensures that the dynamic portion of your URL isn't deleted but always carries the same value, which is effectively the same thing when it comes to avoiding duplicate content issues.
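The "Use Default Value" option can be modeled the same way: keep the parameter, but overwrite its value with a constant so every visit produces an identical URL. A minimal sketch of that outcome, not Yahoo's implementation; the catalog.jsp URLs are hypothetical, and "yahoo" is the placeholder value suggested above.

```python
from urllib.parse import urlsplit, urlunsplit


def set_default_value(url, name, default="yahoo"):
    """Keep the named query parameter but replace its value with a fixed default."""
    parts = urlsplit(url)
    pairs = []
    for pair in parts.query.split("&"):
        if not pair:
            continue
        key = pair.split("=", 1)[0]
        pairs.append(f"{key}={default}" if key == name else pair)
    return urlunsplit(parts._replace(query="&".join(pairs)))


# Two visits to the same hypothetical page, each with a different session ID.
visit1 = "http://www.example.com/catalog.jsp?item=42&jsessionid=rx232d945kjhu8d734fl409gtdfoijr3t4t8430s"
visit2 = "http://www.example.com/catalog.jsp?item=42&jsessionid=a81kq0zz7"
print(set_default_value(visit1, "jsessionid"))  # http://www.example.com/catalog.jsp?item=42&jsessionid=yahoo
print(set_default_value(visit1, "jsessionid") == set_default_value(visit2, "jsessionid"))  # True
```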


Spend some time getting to know Yahoo's tools. Regardless of current search engine market share numbers, you do your site a disservice if you don't spend at least a little time seeing what Yahoo says about your site. If you could tweak, enhance, and further optimize for traffic equal to a quarter to a half of your Google traffic, you would, right? You can. It's called "Yahoo traffic."



Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College.
