Diagnosing Search Issues from the Query Box, Part 2

August 6, 2008

Additional ways to use query operators to gauge your site's presence.

In my last column, I discussed ways to use the site: and inurl: operators to detect indexing issues with your site. In this column, I discuss additional operators (such as cache:) and the ways in which they help you diagnose search engine issues and view your site the way engines do.

Additional Uses for the Site Operator

While sites such as CopyScape do a nice job of detecting duplicate content around the Web, I sometimes like the flexibility of finding duplicate content myself. My last column showed ways of detecting unwitting duplicate content on your own site (due to canonicalization issues). But what about your content being used on other sites?

To detect this, I recommend using the site: operator to filter out your site. Scan your site to find a string of text that should appear only on your site, then plug it into a query like this:

"this is the unique string of text I found on my site" -site:yourdomain.com

The quotation marks are required to search for the exact text string. The minus sign before the site: operator tells Google to exclude your domain from results. Consequently, the only results on the SERP (search engine results page) should be third-party sites using your copy.

Keep in mind that you need to be cautious before shouting "plagiarism" or "copyright infringement." These sites may be quoting yours in a fair-use context, or they might be directory sites that have pulled a description of your site prior to linking back to you. The best text strings to search for are longer, more obscure passages that really should be on your site only.
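The query pattern above can be assembled programmatically if you want to check several passages at once. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the search URL assumes the standard "q" parameter of Google's public search page.

```python
from urllib.parse import urlencode


def duplicate_content_query(phrase: str, own_domain: str) -> str:
    """Build an exact-phrase query that excludes your own domain."""
    # Drop any double quotes inside the phrase so the exact-match quoting stays balanced.
    cleaned = phrase.replace('"', "").strip()
    return f'"{cleaned}" -site:{own_domain}'


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Turn a query string into a URL you can paste into a browser (assumed URL shape)."""
    return f"{base}?{urlencode({'q': query})}"


query = duplicate_content_query(
    "this is the unique string of text I found on my site", "yourdomain.com"
)
print(query)
print(search_url(query))
```

Longer, more obscure passages make better inputs here, for the same reason they make better manual queries.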

Searching for Specific URLs

Several years ago, you could simply enter a URL in a Google search box, and the resulting page would give you a short but helpful list of information about that particular URL, including links to related sites, the cached version of the page, links pointing to that page (although this feature is notoriously shallow in its coverage), pages that mention the specific URL text, and so on.

This sort of query was particularly helpful not so much for the links to additional information as for quickly determining whether a specific engine had indexed a page. In short, a resulting page that said "Sorry, no information is available for the URL [URL]" was a quick way to spot an indexing problem, because that response was reserved for pages that the engine had not yet indexed or that purposely avoided indexing (such as via a robots.txt exclusion or a robots "noindex" meta tag).

Today, searching for a simple URL still works at MSN/Live and Yahoo. A couple of years ago, however, Google changed the way it handles URL queries. At Google, you must now precede a URL with the text info: to get indexing and informative link information. Make sure that you leave no space between the colon and the URL when performing this query.

In my opinion, the last of these features (finding pages that mention the URL as text) is of limited value, although it can represent a link-building opportunity, sometimes turning up less savvy sites that mention your URL as text but not as a link.
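Because a stray space between the colon and the URL changes the query, it can help to build info: queries in a small helper when checking many pages. This is only a sketch of the formatting rule described above; the function name is hypothetical, and the operator's behavior is as this column describes it.

```python
def info_query(url: str) -> str:
    """Format a Google info: query with no space between the colon and the URL."""
    # Remove all whitespace so nothing separates the operator from the URL.
    compact = "".join(url.split())
    return f"info:{compact}"


for page in ["www.example.com", " www.example.com/about.html "]:
    print(info_query(page))
```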

The Difference Between Cache and Text Cache

The cache: operator is a terrific tool that helps you determine whether engines see your page. Ironically, it's not an entirely accurate way to show you exactly what engines see. I can't emphasize this enough, so I'll rephrase: The cached version of your page is not necessarily the exact same version of the page that engines see, monitor, and consider in their algorithms.

To see the version of your page that engines see, you must take a technological step backwards and view the text cache of the page. The text cache strips away deceptive script code, rich media, and graphics, leaving only the skeletal remains of your page, the text and links.

Consider, for example, the cache version of www.usanetwork.com. You can see some rich media and graphics and a few links, but the main body section is empty.

Contrast that view with the text cache of the same page.

While the regular cached version of a page "includes" content such as rich media and JavaScript-spawned Flash files, don't assume Google notices or considers such content. In most cases, it's included only because Google has pulled the script and Flash code into its index -- not because it understands or weighs it.
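To get a rough feel for what a text-only view leaves behind, you can approximate it locally. The sketch below uses Python's standard html.parser to keep only visible text and link targets while skipping script and style blocks. It is an illustration of the idea, not a reproduction of how Google builds its text cache, and the sample markup is invented.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and link targets, ignoring script and style blocks."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style and it is not just whitespace.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def text_view(html: str):
    """Return (text fragments, link targets) for a page's markup."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links


# Invented sample page: the Flash embed exists only inside script code.
sample = """<html><body>
<script>document.write('<embed src="intro.swf">');</script>
<h1>Shows</h1>
<a href="/schedule">Schedule</a>
<img src="banner.jpg" alt="Banner">
</body></html>"""

text, links = text_view(sample)
print(text)
print(links)
```

In this sample, the script-generated embed and the image contribute nothing to the output; only the heading text, the link text, and the link target survive.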

To find the text cache version of a page at Google, you can add &strip=1 to the end of a cached page's URL.
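Appending the parameter by hand is easy, but a small helper avoids mistakes such as using & when the URL has no query string yet. The following is a minimal sketch: the function name is hypothetical, the cached URL shown is a placeholder rather than a real Google cache address, and the code assumes the URL carries no #fragment.

```python
from urllib.parse import urlsplit, parse_qs


def text_cache_url(cached_url: str) -> str:
    """Append strip=1 to a cached-page URL unless it is already present."""
    query = urlsplit(cached_url).query
    if parse_qs(query).get("strip") == ["1"]:
        return cached_url
    # Use & when a query string already exists, ? otherwise.
    separator = "&" if query else "?"
    return f"{cached_url}{separator}strip=1"


# Placeholder cached URL for illustration only.
cached = "http://cache.example.com/search?q=cache:www.example.com"
print(text_cache_url(cached))
```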

You can also find a link to the text cache at the top of any cached page in Google. Look for the copy "Text-only version" at the top-right of a cached page, such as this cached version of the ClickZ home page.


Cached pages are available at all major engines, although only Google supports the actual cache: operator. At Yahoo and MSN/Live, you can search for a URL, then find a link to the cached version on the resulting page. Also, Google is the only one of the big three that distinguishes and shows an actual text cache.



Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College. Follow Erik and Intrapromote on Twitter.
