Using Yahoo Site Explorer for Site Intelligence

Recently, I’ve discussed how Google Webmaster Tools can help improve a site’s performance, providing an assortment of reports, including one that examines external links and another that analyzes robots.txt files.

Google isn’t the only game in town, however, for diagnosing crawling and indexing issues and helping you improve search engine performance. The world’s second most popular engine offers Yahoo Site Explorer (YSE), which includes several tools that are every bit as innovative as Google’s.

Authentication and Sitemap Submission

Most features discussed here rely on your site being “authenticated.” The process is nearly identical to verifying your site in Google Webmaster Tools: you either upload a uniquely named file to your site’s root folder or add a special meta tag to the home page code.

While Yahoo adheres to the accepted protocol for XML sitemap feeds, as well as autodiscovery, you can also create a very simple text file for Yahoo that achieves the same result. In a plain text file called urllist.txt, simply list all your site’s URLs, one per line.
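
As a rough illustration, a file like this could be generated with a few lines of Python. This is a minimal sketch, not an official Yahoo tool: the function names and the example.com URLs are hypothetical, and it assumes only what is described above, namely a plain text file with one URL per line.

```python
from pathlib import Path


def build_urllist(urls):
    """Return urllist.txt content: unique URLs, one per line, in first-seen order."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and repeats so each URL appears exactly once.
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "".join(line + "\n" for line in lines)


def write_urllist(urls, path="urllist.txt"):
    """Write the URL list to disk as plain UTF-8 text."""
    Path(path).write_text(build_urllist(urls), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    write_urllist([
        "http://www.example.com/",
        "http://www.example.com/about.html",
        "http://www.example.com/products.html",
    ])
```

Upload the resulting file to your site and submit it through YSE as you would an XML feed.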

An important consideration: Yahoo doesn’t often come back to re-fetch a sitemap feed unless you tell it to. Make sure you click the “Resubmit” button each time you update your feed or text file.

Duplicate Content Identification

One of YSE’s best features is how it breaks down your site, URL by URL, and shows back-link data, sortable by whether the link comes from your own domain or an external site. But did you know that you can use YSE to detect possible duplication/fragmentation issues that Yahoo sees?

Consider the home page for the MTV series “Pimp My Ride.” If you enter the URL into the YSE search box and click the “Explore URL” button, you’ll see, right below the “Results” head, that Yahoo counts a total of five pages under this URL. In addition to the main URL already shown, YSE lists the following variants (I’ve cut the path out of each URL below to save space):

  • /series.jhtml?extcmp=SEO_SSP_Y
  • /series.jhtml?
  • /series.jhtml?popThis=launchVideo("vid=44478")
  • /series.jhtml?popThis=launchVideo("vid=37370")

It’s tricky to detect this issue with YSE because of its tendency to show every page under a certain level when you input the URL. For example, if you tell it to explore a folder-level URL, YSE might show hundreds of URLs, not because they’re duplicates, but because the program wants to show you every URL that falls within that folder. Be patient while sorting through the URLs, and learn to spot the difference between duplicate pages and child pages.
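
If you copy the URLs YSE lists into a script, one way to separate likely duplicates from child pages is to group them by everything before the query string. The sketch below assumes that URLs sharing a host and path but differing only in parameters are variants of one page; that is a heuristic of mine, not something YSE reports. The function names and example.com URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_page(urls):
    """Group URLs that share scheme, host, and path, ignoring the query string."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc.lower(), parts.path)
        groups[key].append(url)
    return groups


def likely_duplicates(urls):
    """Return only the pages that appear under more than one URL variant."""
    return {
        f"{scheme}://{host}{path}": variants
        for (scheme, host, path), variants in group_by_page(urls).items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    # Hypothetical URLs, loosely modeled on the variants listed above.
    listed = [
        "http://www.example.com/show/series.jhtml",
        "http://www.example.com/show/series.jhtml?extcmp=SEO_SSP_Y",
        "http://www.example.com/show/series.jhtml?",
        "http://www.example.com/show/episodes.jhtml",
    ]
    for page, variants in likely_duplicates(listed).items():
        print(page, "has", len(variants), "variants")
```

URLs with a different path, such as the episodes page in this example, fall into their own group and are treated as child pages rather than duplicates.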

Standardization and Removal of URL Parameters

Suppose your pages have dozens of versions, each with different referral parameters. Or, even worse, suppose your site has session IDs that are crippling your indexing efforts. And to top it all off, your IT department can’t even promise it’ll look into the issue within the next six months.

Unlike Google’s URL removal tool, the use of which is about as calming as defusing a bomb on a runaway subway train (“Which wire do I cut?”), Yahoo’s dynamic URL tool is simple and benign.

With this tool, Yahoo has reversed years of search engines telling you how to fix dynamic URL issues (using steps that are frequently prohibitive, given staffing and CMS issues) and instead said, “Don’t worry; we’ll handle it on this end.” Google changed this old paradigm when it began allowing you to choose how your URLs would be displayed, either with or without the “www” prefix. But Yahoo has one-upped Google here.

Suppose your site appends a session ID parameter and value to each URL, such as: jsessionid=rx232d945kjhu8d734fl409gtdfoijr3t4t8430s. This causes problems, because the next time Yahoo visits that URL, the jsessionid value is different, yet the page’s content is the same. The result? Duplicate content, which dilutes your page’s ability to rank for its intended terms.

If your site requires session IDs due to your development software and you can’t easily configure the site to avoid delivering them to engines, go to YSE’s “Dynamic URLs Beta” section, enter the specific parameter name of your session ID (in this case, jsessionid), and select “Remove from URLs” from the “Action or Parameter” dropdown list.

In the MTV example, the solution would be to enter “extcmp,” “source,” and “popThis” into YSE’s “Parameter Names” fields and select “Remove from URLs” from the “Action or Parameter” dropdown list. Presto, those duplicate URLs will drop from the index. (Unfortunately, Yahoo currently allows you to address only three parameters, so my example works here, but just barely.)
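
To make the effect of “Remove from URLs” concrete, here is a small Python sketch of the same idea applied to a list of URLs. It only illustrates the concept; how Yahoo implements the rewrite internally isn’t documented here. The function name and example.com URLs are hypothetical, and the sketch assumes the parameters appear in the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_params(url, names):
    """Return url with the named query parameters dropped.

    Remaining parameters keep their order; values are re-encoded by
    urlencode, so unusual characters may come back percent-encoded.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # The three parameter names from the MTV example; the URLs are hypothetical.
    tracked = {"extcmp", "source", "popThis"}
    for url in [
        "http://www.example.com/series.jhtml?extcmp=SEO_SSP_Y",
        "http://www.example.com/series.jhtml?popThis=launchVideo&id=7",
    ]:
        print(remove_params(url, tracked))
```

Once the tracked parameters are stripped, every variant collapses to the same address, which is why the duplicates disappear.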

If your site absolutely must show session ID values, Yahoo has a solution for this, too. Instead of choosing “Remove from URLs” from the “Action or Parameter” dropdown list, select “Use Default Value” and type something like “yahoo” (no quotes) in the “Action or Parameter” box. This will ensure that while the dynamic portion of your URL isn’t deleted, it will always remain the same value, which is more or less the same thing when it comes to avoiding duplicated content issues.
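
The “Use Default Value” option can be pictured the same way. In this sketch, which is my own illustration rather than Yahoo’s code, the parameter stays in the URL but its value is replaced with a constant, so two visits with different session IDs produce one address. The function name and example.com URLs are hypothetical, and the parameter is assumed to sit in the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_default_value(url, name, default):
    """Return url with every occurrence of the named parameter set to default."""
    parts = urlsplit(url)
    pairs = [
        (key, default if key == name else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    # Two hypothetical visits to the same page with different session IDs.
    first = "http://www.example.com/page.jsp?jsessionid=rx232d945&item=5"
    second = "http://www.example.com/page.jsp?jsessionid=k8d734fl4&item=5"
    print(set_default_value(first, "jsessionid", "yahoo"))
    print(set_default_value(second, "jsessionid", "yahoo"))
```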


Spend some time getting to know Yahoo’s tools. Regardless of current search engine market share numbers, you do your site a disservice if you don’t spend at least a little time seeing what Yahoo says about it. If you could tweak, enhance, and further optimize for a quarter to a half of all your Google traffic, you would, right? You can. It’s called “Yahoo traffic.”
