Ways to find out whether engines see the same site you do.
Webmaster tools and third-party applications are generally a great way to get accurate search engine diagnostics, but sometimes simpler is better. Today's column offers some quick diagnostic advice to help you efficiently pinpoint issues that may affect your organic search performance. You don't need any expensive tools, and you don't need to download any software. All you need is an Internet connection and a browser.
The Versatile "Site:" Operator
Indexing is one of the first big search engine obstacles to overcome. After all, your indexed pages may or may not rank and drive search traffic, but your unindexed pages absolutely won't rank. Most people know that to view the number of site pages the engines have indexed, you enter this query in the search box: site:www.yourdomain.com (for example, site:www.clickz.com).
But here's an important caveat about this powerful operator. If your site contains 5,500 pages and a site: query shows that Google or Yahoo has indexed 5,500 pages, you can be happy, but only for a few seconds. The 5,500 pages on your site and the 5,500 pages indexed might not be the same 5,500 pages.
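To make the caveat concrete, here is a minimal Python sketch of the comparison. It assumes you already have two lists of URLs: one of the pages you know exist on your site (from a sitemap, say) and one you collected from site: results. Both lists and the function name compare_url_sets are hypothetical, for illustration only.

```python
def compare_url_sets(site_urls, indexed_urls):
    """Return (missing, unexpected) for two collections of URL strings."""
    site = set(site_urls)
    indexed = set(indexed_urls)
    missing = sorted(site - indexed)      # on your site, but not in the index
    unexpected = sorted(indexed - site)   # in the index, but not on your list
    return missing, unexpected


# Hypothetical sample data: both lists contain three URLs.
site_urls = [
    "http://www.yourdomain.com/",
    "http://www.yourdomain.com/services/",
    "http://www.yourdomain.com/about/",
]
indexed_urls = [
    "http://www.yourdomain.com/",
    "http://yourdomain.com/",
    "http://www.yourdomain.com/services/",
]

missing, unexpected = compare_url_sets(site_urls, indexed_urls)
print("Same count:", len(site_urls) == len(indexed_urls))
print("Not indexed:", missing)
print("Indexed but not on your list:", unexpected)
```

The counts match, yet one real page is missing from the index and a non-www duplicate of the home page has taken its place.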
We recently helped a site clean up some canonical issues in which four different versions of the home page had been crawled and indexed. This won't break a site, but it's a housekeeping issue that tops my list of items that can nickel-and-dime your site's authority away.
It's a good idea to use the site: operator both with and without the "www" prefix in the query string. Typically, this will tell you the extent to which engines have indexed non-www pages, which can tip you off to canonical issues on your site. If site:www.yourdomain.com and site:yourdomain.com show significantly different numbers, you know that some non-www links are probably leaking out and resulting in several non-www pages being crawled.
The query site:yourdomain.com -inurl:www will show you the subset of indexed pages on your site that don't have "www" in their URLs. If you have multiple subdomains on your site, this becomes slightly trickier to diagnose. For example, if you have subdomains called "www," "blog," and "clients," you'll need to exclude each of those subdomains in the query to find canonical issues: site:yourdomain.com -inurl:www -inurl:blog -inurl:clients.
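If you maintain several subdomains, typing the exclusions by hand gets tedious. The following Python sketch simply assembles the query string described above so you can paste it into a search box. The function name build_canonical_query is hypothetical, and the code does not contact any search engine.

```python
def build_canonical_query(domain, subdomains):
    """Build a site: query that excludes each known subdomain via -inurl:."""
    parts = ["site:" + domain]
    parts.extend("-inurl:" + name for name in subdomains)
    return " ".join(parts)


query = build_canonical_query("yourdomain.com", ["www", "blog", "clients"])
print(query)
# site:yourdomain.com -inurl:www -inurl:blog -inurl:clients
```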
Currently, both the site: and inurl: operators work in Google, Yahoo, and Live Search. Depending on your exact query at Yahoo, the engine might redirect you to Yahoo Site Explorer, but it will still show you the answer to your query.
Narrowing the Scope of the Site: Operator
If the site: operator shows that 9 million of your site's pages are indexed, Google will give you the 9 million figure but will show you only 1,000 specific URLs. To see deeper into specific parts of your site, you need to tell engines exactly which part of the site you're trying to examine. To do this, you have a few options.
If the pages are all within a specific folder on the site, you can simply add the folder name to the site: operator. For example, if you have a section on your site called /services/ with a series of pages within that folder, this query will show how many of those pages are indexed: site:www.yourdomain.com/services/.
Using this method is more reliable than a query like site:www.yourdomain.com inurl:services, because the latter will show URLs in which "services" appears anywhere in the URL. Contrast the results of these two queries to see the major difference: site:www.amazon.com/review and site:www.amazon.com inurl:review.
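The difference between the two approaches can be approximated locally with a short Python sketch. This is only a rough illustration of the matching behavior described above, not how any engine actually implements its operators. The sample URLs and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def matches_site_path(url, host, path_prefix):
    """Approximate site:host/path/: the URL path must start with the folder."""
    parts = urlsplit(url)
    return parts.netloc == host and parts.path.startswith(path_prefix)


def matches_inurl(url, host, word):
    """Approximate site:host inurl:word: the word may appear anywhere in the URL."""
    parts = urlsplit(url)
    return parts.netloc == host and word in url


# Hypothetical sample URLs.
urls = [
    "http://www.yourdomain.com/services/seo.html",
    "http://www.yourdomain.com/blog/our-services-explained.html",
    "http://www.yourdomain.com/about/",
]

host = "www.yourdomain.com"
by_path = [u for u in urls if matches_site_path(u, host, "/services/")]
by_inurl = [u for u in urls if matches_inurl(u, host, "services")]

print(by_path)   # only the page inside the /services/ folder
print(by_inurl)  # also the blog post with "services" in its file name
```

The folder-based filter returns one URL, while the substring filter also picks up the blog post, which is why the folder form gives a cleaner count for a single section.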
A reader recently contacted me, saying that her ability to use the cache: operator at Google was frequently thwarted by a 403 error, which stated that her query looked either automated or spy/malware-related.
Know that if this happens to you, it should reset itself within a few hours -- typically no more than 12. But if it remains constant, someone on your IP address (i.e., at your company) might be running a tool that sends automated queries to Google. If this is the case, get them to stop (or outsource the reports), since they're hindering your ability to get actual, helpful data.
The site: operator is one I use dozens of times each day. In part two, I explain how to use additional powerful operators to diagnose search engine issues, including variations of the cache: operator, which can show you your pages exactly the way engines view them.
Today's column originally ran on July 23, 2008.
Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College.