Diagnosing Search Issues From the Query Box

Webmaster tools and third-party applications are generally a great way to get accurate search engine diagnostics, but sometimes simpler is better. Today, I offer some quick diagnostic advice to help you efficiently pinpoint issues that may affect your organic search performance. You don’t need any expensive tools, and you don’t need to download any software. All you need is an Internet connection and a browser.

The Versatile “Site:” Operator

Indexing is one of the first big search engine obstacles to overcome. After all, your indexed pages may or may not rank and drive search traffic, but your unindexed pages absolutely won’t rank. Most people know that to see how many of a site’s pages the engines have indexed, you enter a query like this in the search box: site:www.yourdomain.com (for example, site:www.clickz.com).

But here’s an important caveat about this powerful operator. If your site contains 5,500 pages and a site: query shows that Google or Yahoo has indexed 5,500 pages, you can be happy, but only for a few seconds. The 5,500 pages on your site and the 5,500 pages indexed might not be the same 5,500 pages.
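To make the caveat concrete, here is a minimal Python sketch that compares two URL lists of equal length. It assumes you already have both lists in hand (for example, one from your own sitemap and one copied from a site: results page); the function name compare_page_sets and all the example.com URLs are hypothetical and used only for illustration.

```python
def compare_page_sets(site_urls, indexed_urls):
    """Compare the pages a site actually has with the pages an engine lists."""
    site = set(site_urls)
    indexed = set(indexed_urls)
    return {
        "on_site_and_indexed": site & indexed,
        "on_site_not_indexed": site - indexed,
        "indexed_not_on_site": indexed - site,
    }


# Hypothetical data: both lists contain three URLs, so the counts match.
site_urls = [
    "http://www.example.com/",
    "http://www.example.com/services/",
    "http://www.example.com/contact/",
]
indexed_urls = [
    "http://www.example.com/",
    "http://example.com/",
    "http://www.example.com/index.html",
]

report = compare_page_sets(site_urls, indexed_urls)
for label, urls in report.items():
    print(label, sorted(urls))
```

In this made-up case the totals agree, yet two real pages are missing from the index and two duplicate home page URLs have taken their place.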

We recently helped a site clean up some canonical issues in which four different versions of the home page had been crawled and indexed. This won’t break a site, but it’s a housekeeping issue that tops my list of items that can nickel-and-dime your site’s authority away.

It’s a good idea to use the site: operator both with and without the “www” prefix in the query string. Typically, this will tell you the extent to which engines have indexed non-www pages, which can tip you off to canonical issues on your site. If site:www.yourdomain.com and site:yourdomain.com show significantly different numbers, you know that some non-www links are probably leaking out and resulting in several non-www pages being crawled.

The query site:yourdomain.com -inurl:www will show you the subset of indexed pages on your site that don’t have “www” in their URLs. If you have multiple subdomains on your site, this becomes slightly trickier to diagnose. For example, if you have subdomains called “www,” “blog,” and “clients,” you’ll need to exclude each of those subdomains in the preceding query to find canonical issues: site:yourdomain.com -inurl:www -inurl:blog -inurl:clients.
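If you check several sites or subdomains regularly, assembling these query strings by hand gets tedious. The following is a small Python sketch that only builds the text to paste into the search box; it does not contact any search engine. The helper name site_query is hypothetical, and yourdomain.com is the same placeholder used above.

```python
def site_query(domain, exclude_subdomains=()):
    """Build a site: query, adding one -inurl: term per subdomain to exclude."""
    parts = ["site:" + domain]
    parts.extend("-inurl:" + name for name in exclude_subdomains)
    return " ".join(parts)


# The with-www and without-www checks.
print(site_query("www.yourdomain.com"))
print(site_query("yourdomain.com"))

# Indexed pages without "www" in the URL.
print(site_query("yourdomain.com", ["www"]))

# Indexed pages outside all three known subdomains.
print(site_query("yourdomain.com", ["www", "blog", "clients"]))
```

Paste each printed line into the search box and compare the reported counts.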

Currently, both the site: and inurl: operators work in Google, Yahoo, and Live Search. Depending on your exact query at Yahoo, the engine might redirect you to Yahoo Site Explorer, but it will still show you the answer to your query.

Narrowing the Scope of the Site: Operator

If the site: operator shows that 9 million of your site’s pages are indexed, Google will give you the 9 million figure but will show you only 1,000 specific URLs. To see deeper into specific parts of your site, you need to tell engines exactly which part of the site you’re trying to examine. To do this, you have a few options.

If the pages are all within a specific folder on the site, you can simply add the folder name to the site: operator. For example, if you have a section on your site called /services/ with a series of pages within that folder, this query will show how many of those pages are indexed: site:www.yourdomain.com/services/.

This method is more reliable than a query like site:www.yourdomain.com inurl:services, because the latter matches any URL in which “services” appears anywhere, not just as the folder name. Contrast the results of these two queries to see the major difference: site:www.amazon.com/review and site:www.amazon.com inurl:review.
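The difference between the two forms can be approximated in a few lines of Python. This is only a rough analogy under stated assumptions: it treats the folder form as "the path starts with the folder" and the inurl: form as "the word appears anywhere in the URL," which is how the text above describes them, not a documented account of how any engine matches. The function names and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def in_folder(url, host, folder):
    """Rough analogue of site:host/folder/ (same host, path begins with folder)."""
    parts = urlsplit(url)
    return parts.netloc == host and parts.path.startswith(folder)


def word_in_url(url, host, word):
    """Rough analogue of site:host inurl:word (same host, word anywhere in URL)."""
    return urlsplit(url).netloc == host and word in url


# Hypothetical URLs on the placeholder domain.
urls = [
    "http://www.yourdomain.com/services/design.html",
    "http://www.yourdomain.com/about/our-services.html",
    "http://www.yourdomain.com/blog/?tag=services",
    "http://www.yourdomain.com/contact.html",
]

host = "www.yourdomain.com"
folder_matches = [u for u in urls if in_folder(u, host, "/services/")]
word_matches = [u for u in urls if word_in_url(u, host, "services")]

print("folder form:", folder_matches)
print("inurl form: ", word_matches)
```

Only the first URL sits inside the /services/ folder, while the word "services" appears in three of the four, which is why the inurl: form tends to overstate what is in a given section.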


A reader recently contacted me to say that her attempts to use the cache: operator at Google were frequently blocked by a 403 error, which stated that her query looked either automated or related to spyware or malware.

If this happens to you, the block should clear itself within a few hours, typically no more than 12. But if it persists, someone on your IP address (e.g., at your company) might be running a tool that sends automated queries to Google. If this is the case, get them to stop (or outsource the reports), since they’re hindering your ability to get actual, helpful data.

The site: operator is one I use dozens of times each day. In part two, I explain how to use additional powerful operators to diagnose search engine issues, including variations of the cache: operator, which can show you your pages exactly the way engines view them.

Today’s column originally ran on July 23, 2008.

