Demystifying Google Webmaster Tools Reports, Part 3

August 19, 2009

Examining the Diagnostics reports of Google Webmaster Tools.

This is the third part of a series that takes a close look at Google's Webmaster Tools (GWT). The first examined the Site configuration section. Last time, I looked at the "Your site on the web" reports. Today I'll begin to discuss the third and final section of GWT reports: the Diagnostics section.

While each of the three sections contains helpful information, the Diagnostics section provides the information from which you can most easily produce a to-do list for improving your site's visibility. Today I'll be discussing the Crawl errors section of the Diagnostics reports.

Crawl Errors

GWT's Crawl errors section is one of the most important reports that Google offers. It shows pages that Google either can't access regularly or can't find at all. Remember one important aspect of SEO: a page will never show up in the search engine results pages (SERPs) if the search engine can't find it. This report area is one of the best ways to ensure that your pages are found and to diagnose crawling obstacles. Following are the types of crawling errors shown in this area:

  • HTTP. This section shows all pages that Google tried to access but couldn't. In most cases, an HTTP error code kept Googlebot from reading the page, such as 404 (page not found), 403 (forbidden), or 500 (server error). Scan this list of pages regularly. If you find pages in this list that should be available to engines and users, find the error's cause. It could be simply that your site was down the last time Google tried to access the page. If so, don't be particularly worried, as long as you can access the URL now.

  • In Sitemaps. This report lists errors very similar to those in the HTTP section, but limited to URLs that exist within the XML site maps you've submitted. Remember that XML site maps are an important signal in determining canonical authority, so if the URLs in your site maps produce errors when requested, you're leaving engines very few options in determining which of your URLs to show in results pages.

  • Not followed. The URLs in this list are an interesting contrast to URLs that Google couldn't crawl. In many cases, these URLs represent pages that Google probably could have partially crawled but chose not to, such as URLs it thinks have session IDs or URLs with multiple chain-style redirects. In addition, this list can show URLs that require cookies (which Googlebot can't accept), unreasonably long URLs, or pages that redirect to a page that doesn't exist. As is typical in many of these reports, this list is often equally good at diagnosing site-wide architecture issues and individual problematic URLs.

  • Not found. This report is the traditional warehouse for 404 page-not-found errors. In my opinion, this report is one of the most helpful in all of GWT because in addition to showing the specific URLs that return the 404 error code, it also lists the internal and external URLs that point to your missing pages. This makes it unbelievably easy to reclaim inbound links that already point to your site, either by redirecting the old URLs to their new location or by fixing the URL so that it shows content. In the Linked from column, you'll see a hyperlinked count of the links that point to each not-found URL. Click the count to see which pages are linking to your missing page. Prioritize the changes by looking at the quantity and quality of links pointing to your URLs. Fixing a URL with 15 incoming links, for example, will help your site more than fixing a dead URL with only 2 incoming links.

  • Restricted by robots.txt. Site owners sometimes inadvertently exclude files by using robots.txt directives incorrectly. This report shows all URLs that Google tried to crawl but couldn't due to your robots.txt file's directives. Keep in mind that this conflict comes into play only when your site (or another site) has links pointing to specific URLs that are excluded by robots.txt.

  • Timed out. This report is a great way to diagnose server issues. By definition, URLs appear in this report because Google received a timeout when trying to access your domain, a specific URL, or your robots.txt file. Failure to reach any of these implies that your server might be taking too long to serve requested URLs, and you should investigate that issue.
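If a URL shows up under Restricted by robots.txt and you aren't sure which directive is responsible, you can test it locally. The following is a minimal sketch, not an official Google tool: it uses Python's standard urllib.robotparser module, and the robots.txt content, the example.com URLs, and the blocked_urls helper are all hypothetical stand-ins for your own data. Python's parser may also interpret some directives (wildcards, for instance) differently than Googlebot does, so treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste in your site's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]

# Hypothetical URLs to check.
urls = [
    "http://www.example.com/products/widget.html",
    "http://www.example.com/private/report.html",
    "http://www.example.com/tmp/cache.html",
]

blocked = blocked_urls(ROBOTS_TXT, urls)
for url in blocked:
    print("Blocked by robots.txt:", url)
```

With this sample file, the two URLs under /private/ and /tmp/ are reported as blocked and the product page is not. If a page you want indexed appears in the output, the matching Disallow line is the place to start.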

As you go through these reports, don't let the mere presence of multiple errors worry you. An error is a problem only when it concerns an important URL and Google's inability to crawl it. That's something you should address immediately. On the other hand, if older, obsolete URLs show 404 errors or your robots.txt disallows 5,000 pages and you want it that way, don't worry that these "errors" are really hurting your site.
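One way to turn that advice into a to-do list is to filter out the errors you don't care about and sort the rest by inbound links, as suggested for the Not found report. The sketch below assumes you've copied the report data into a list by hand; the CrawlError record, its field names, the triage function, and the sample URLs are all illustrative and not part of GWT.

```python
from dataclasses import dataclass

@dataclass
class CrawlError:
    url: str
    error_type: str     # e.g. "not_found", "restricted_by_robots", "timed_out"
    inbound_links: int  # the count shown in the Linked from column
    important: bool     # your own judgment: should this URL be crawlable?

def triage(errors):
    """Keep only errors on URLs you care about, most-linked first."""
    actionable = [e for e in errors if e.important]
    return sorted(actionable, key=lambda e: e.inbound_links, reverse=True)

# Hypothetical report data.
errors = [
    CrawlError("/old-catalog.html", "not_found", 2, True),
    CrawlError("/products/widget.html", "not_found", 15, True),
    CrawlError("/admin/login", "restricted_by_robots", 0, False),
]

todo = triage(errors)
for e in todo:
    print(e.inbound_links, e.error_type, e.url)
```

Here the intentionally blocked /admin/login URL drops out, and the page with 15 incoming links is listed ahead of the one with 2. Link count is only a rough proxy; link quality still needs a human look.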

GWT also includes these same reports for two different types of mobile content: compact HTML (CHTML) and mobile WML/XHTML, so depending on how many different types of content you offer and whether you've submitted mobile-specific site maps, you may have multiple versions of these reports to examine.

Conclusion

In my final column in this series, I'll discuss the other half of the meaty Diagnostics section of Google Webmaster Tools, which includes crawl stats and HTML suggestions.


ABOUT THE AUTHOR

Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College. Follow Erik and Intrapromote on Twitter.
