Diagnosing Search Issues from the Query Box, Part 2

August 6, 2008

Additional ways to use query operators to gauge your site's presence.

In my last column, I discussed ways to use the site: and inurl: operators to detect indexing issues with your site. In this column, I discuss additional operators (such as cache:) and the ways in which they help you diagnose search engine issues and view your site the way engines do.

Additional Uses for the Site Operator

While sites such as CopyScape do a nice job of detecting duplicate content around the Web, I sometimes like the flexibility of finding duplicate content myself. My last column showed ways of detecting unwitting duplicate content on your own site (due to canonicalization issues). But what about your content being used on other sites?

To detect this, I recommend using the site: operator to filter out your site. Scan your site to find a string of text that should appear only on your site, then plug it into a query like this:

"this is the unique string of text I found on my site" -site:yourdomain.com

The quotation marks are required to search for the exact text string. The minus sign before the site: operator tells Google to exclude your domain from results. Consequently, the only results on the SERP (search engine results page) should be third-party sites using your copy.

Keep in mind that you need to be cautious before shouting "plagiarism" or "copyright infringement." These sites may be quoting yours in a fair-use context, or they might be directory sites that have pulled a description of your site prior to linking back to you. The best text strings to search for are longer, more obscure passages that really should be on your site only.
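The exclusion query described above can be assembled programmatically when you want to check many passages. The following is a minimal sketch, not a definitive tool: the function name build_duplicate_query is hypothetical, and the code only builds the query string (plus a URL-encoded form of it) without contacting any search engine.

```python
from urllib.parse import quote_plus


def build_duplicate_query(phrase: str, own_domain: str) -> str:
    """Return an exact-phrase query that excludes results from own_domain."""
    # Drop quotation marks inside the phrase so they cannot break
    # the exact-match quoting that wraps the whole passage.
    cleaned = phrase.replace('"', "").strip()
    # The leading minus sign tells the engine to exclude the domain.
    return f'"{cleaned}" -site:{own_domain}'


query = build_duplicate_query(
    "this is the unique string of text I found on my site",
    "yourdomain.com",
)
print(query)              # the text to paste into the search box
print(quote_plus(query))  # the same query, encoded for use in a URL
```

Longer, more obscure passages make better inputs, for the reasons given above.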

Searching for Specific URLs

Several years ago, you could simply enter a URL in a Google search box, and the resulting page would give you a short but helpful list of information about that particular URL, including links to related sites, the cached version of the page, links pointing to that page (although this feature is notoriously shallow in its coverage), pages that mention the specific URL text, and so on.

This sort of query was particularly helpful, not so much for the links to additional information as for quickly determining whether a specific engine had indexed a page. In short, a resulting page that said "Sorry, no information is available for the URL [URL]" was a quick way to spot an indexing problem, because that response was reserved for pages that the engine had not yet indexed or that purposely avoided indexing (such as via a robots.txt exclusion or a robots "noindex" meta tag).

Today, searching for a simple URL still works at MSN/Live and Yahoo. A couple of years ago, however, Google changed its handling of URL queries. At Google, you must now precede a URL with the text info: to get indexing and informative link information. Make sure that you leave no space between the colon and the URL when performing this query.

In my opinion, the list of pages that mention your URL as text is of limited value, although it can represent a link-building opportunity, sometimes turning up less savvy sites that mention your URL as text but not as a link.
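The info: query described in this section is easy to get wrong by leaving a stray space after the colon. As a small sketch (the helper name build_info_query is hypothetical, and nothing here sends a request to Google), the query text can be built like this:

```python
def build_info_query(url: str) -> str:
    """Return an info: query with no space between the colon and the URL."""
    # Strip surrounding whitespace so nothing separates "info:" from the URL.
    return "info:" + url.strip()


print(build_info_query("  www.yourdomain.com/some-page.html "))
```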

The Difference Between Cache and Text Cache

The cache: operator is a terrific tool that helps you determine whether engines see your page. Ironically, it's not an entirely accurate way to show you exactly what engines see. I can't emphasize this enough, so I'll rephrase: The cached version of your page is not necessarily the exact same version of the page that engines see, monitor, and consider in their algorithms.

To see the version of your page that engines see, you must take a technological step backwards and view the text cache of the page. The text cache strips away deceptive script code, rich media, and graphics, leaving only the skeletal remains of your page: the text and links.

Consider, for example, the cache version of www.usanetwork.com. You can see some rich media and graphics and a few links, but the main body section is empty.

Contrast that view with the text cache of the same page.

While the regular cached version of a page "includes" content such as rich media and JavaScript-spawned Flash files, don't assume Google notices or considers such content. In most cases, it's included only because Google has pulled the script and Flash code into its index -- not because it understands or weighs it.

To find the text cache version of a page at Google, you can add &strip=1 to the end of a cached URL, such as in the following:

http://64.233.167.104/search?q=cache:www.usanetwork.com&pws=0
http://64.233.167.104/search?q=cache:www.usanetwork.com&pws=0&strip=1

You can also find a link to the text cache at the top of any cached page in Google. Look for the copy "Text-only version" at the top-right of a cached page, such as this cached version of the ClickZ home page.
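Appending &strip=1 by hand works well for one page; for a list of cached URLs, the same step can be scripted. This is a sketch under stated assumptions: the function name text_cache_url is hypothetical, the example URL is the one shown above, and the code only rewrites the URL string without fetching anything.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def text_cache_url(cached_url: str) -> str:
    """Return cached_url with strip=1 set in its query string."""
    parts = urlsplit(cached_url)
    # Keep every existing parameter except a previous strip value,
    # so calling the function twice does not duplicate the parameter.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "strip"
    ]
    params.append(("strip", "1"))
    # Leave ":" and "/" unescaped so "cache:www.example.com" stays readable.
    query = urlencode(params, safe=":/")
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


print(text_cache_url(
    "http://64.233.167.104/search?q=cache:www.usanetwork.com&pws=0"
))
```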

Conclusion

Cached pages are available at all major engines, although only Google allows use of the actual cache: operator. For Yahoo and MSN/Live, you can search for a URL, then find a link to the cached version on the resulting page. Also, Google is the only one of the big three that differentiates and shows an actual text cache.

ABOUT THE AUTHOR

Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College. Follow Erik and Intrapromote on Twitter.
