Diagnosing Search Issues from the Query Box, Part 2

August 6, 2008

Additional ways to use query operators to gauge your site's presence.

In my last column, I discussed ways to use the site: and inurl: operators to detect indexing issues with your site. In this column, I discuss additional operators (such as cache:) and the ways in which they help you diagnose search engine issues and view your site the way engines do.

Additional Uses for the Site Operator

While sites such as CopyScape do a nice job of detecting duplicate content around the Web, I sometimes like the flexibility of finding duplicate content myself. My last column showed ways of detecting unwitting duplicate content on your own site (due to canonicalization issues). But what about your content being used on other sites?

To detect this, I recommend using the site: operator to filter out your site. Scan your site to find a string of text that should appear only on your site, then plug it into a query like this:

"this is the unique string of text I found on my site" -site:yourdomain.com

The quotation marks are required to search for the exact text string. The minus sign before the site: operator tells Google to exclude your domain from results. Consequently, the only results on the SERP (search engine results page) should be third-party sites using your copy.
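
If you check many passages, assembling the query by hand gets tedious. The following is a minimal sketch, not an official tool: the function name `build_duplicate_query` is hypothetical, and it only builds the query text described above (quoted phrase, then the excluded domain) along with a URL-encoded form you could paste into a search URL.

```python
from urllib.parse import quote_plus


def build_duplicate_query(phrase: str, own_domain: str) -> str:
    """Return an exact-phrase query that excludes own_domain from results.

    Hypothetical helper for illustration only.
    """
    # Drop any double quotes inside the phrase so the exact-match
    # quotation marks around it stay balanced.
    cleaned = phrase.replace('"', "").strip()
    # The minus sign before site: excludes the domain from the results.
    return f'"{cleaned}" -site:{own_domain}'


query = build_duplicate_query(
    "this is the unique string of text I found on my site",
    "yourdomain.com",
)
print(query)              # the query as you would type it into the search box
print(quote_plus(query))  # the same query, encoded for use in a URL
```

Longer, more obscure phrases make better input here, for the same reason they make better manual searches.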

Keep in mind that you need to be cautious before shouting "plagiarism" or "copyright infringement." These sites may be quoting yours in a fair-use context, or they might be directory sites that have pulled a description of your site prior to linking back to you. The best text strings to search for are longer, more obscure passages that really should be on your site only.

Searching for Specific URLs

Several years ago, you could simply enter a URL in a Google search box, and the resulting page would give you a short but helpful list of information about that particular URL, including links to related sites, the cached version of the page, links pointing to that page (although this feature is notoriously shallow in its coverage), pages that mention the specific URL text, and so on.

This sort of query was particularly helpful, not so much for the links to additional information as for quickly determining whether a specific engine had indexed a page. In short, a resulting page that said "Sorry, no information is available for the URL [URL]" was a quick way to spot an indexing problem, because that response was reserved for pages that the engine had not yet indexed or that purposely avoided indexing (such as via a robots.txt exclusion or a robots "noindex" meta tag).

Today, searching for a simple URL still works at MSN/Live and Yahoo. A couple of years ago, however, Google changed its usage for URL queries. At Google, you must now precede a URL with the text info: to get indexing and informative link information. Make sure that you leave no space between the colon and the URL when performing this query.
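
The no-space rule is easy to break when you paste a URL from elsewhere. As a small sketch (the helper name `build_info_query` is hypothetical), this trims stray whitespace and joins the operator directly to the URL:

```python
def build_info_query(url: str) -> str:
    """Return an info: query with no space between the colon and the URL.

    Hypothetical helper for illustration only.
    """
    # Strip whitespace picked up when copying the URL, then attach it
    # directly to the operator so the engine reads it as one token.
    return "info:" + url.strip()


print(build_info_query("  www.yourdomain.com/page.html "))
```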

In my opinion, the list of pages that mention the URL as text is of limited value, although it can represent a link-building opportunity, sometimes turning up less savvy sites that mention your URL as text but not as a link.

The Difference Between Cache and Text Cache

The cache: operator is a terrific tool that helps you determine whether engines see your page. Ironically, it's not an entirely accurate way to show you exactly what engines see. I can't emphasize this enough, so I'll rephrase: The cached version of your page is not necessarily the exact same version of the page that engines see, monitor, and consider in their algorithms.

To see the version of your page that engines see, you must take a technological step backwards and view the text cache of the page. The text cache strips away deceptive script code, rich media, and graphics, leaving only the skeletal remains of your page, the text and links.

Consider, for example, the cached version of www.usanetwork.com. You can see some rich media and graphics and a few links, but the main body section is empty.

Contrast that view with the text cache of the same page.

While the regular cached version of a page "includes" content such as rich media and JavaScript-spawned Flash files, don't assume Google notices or considers such content. In most cases, it's included only because Google has pulled the script and Flash code into its index -- not because it understands or weighs it.

To find the text cache version of a page at Google, you can add &strip=1 to the end of a cached URL, such as in the following:

http://64.233.167.104/search?q=cache:www.usanetwork.com&pws=0
http://64.233.167.104/search?q=cache:www.usanetwork.com&pws=0&strip=1
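
If you compare the two views often, you can script the conversion. The sketch below assumes only what the two URLs above show, namely that appending a strip=1 query parameter to a cached URL requests the text cache. The function name `to_text_cache` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_text_cache(cached_url: str) -> str:
    """Return cached_url with strip=1 set in its query string.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(cached_url)
    # Keep every existing parameter except a previous strip value,
    # so running the function twice does not add the parameter twice.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "strip"
    ]
    params.append(("strip", "1"))
    # Leave ":" and "/" unescaped so the cache: operator stays readable.
    query = urlencode(params, safe=":/")
    return urlunsplit(parts._replace(query=query))


print(to_text_cache("http://64.233.167.104/search?q=cache:www.usanetwork.com&pws=0"))
```

Parsing the query string, rather than pasting "&strip=1" onto the end, keeps the result valid even if the URL already carries the parameter.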

You can also find a link to the text cache at the top of any cached page in Google. Look for the copy "Text-only version" at the top-right of a cached page, such as this cached version of the ClickZ home page.

Conclusion

Cached pages are available at all major engines, although only Google allows use of the actual cache: operator. For Yahoo and MSN Live, you can search for a URL, then find a link to the cached version on the resulting page. Also, Google is the only one of the big three that differentiates and shows an actual text cache.


ABOUT THE AUTHOR

Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College.
