Demystifying Google Webmaster Tools Reports, Part 3

August 19, 2009

Examining the Diagnostics reports of Google Webmaster Tools.

This is the third part of a series that takes a close look at Google's Webmaster Tools (GWT). The first examined the Site configuration section. Last time, I looked at the "Your site on the web" reports. Today I'll begin to discuss the third and final section of GWT reports: the Diagnostics section.

While each of the three sections contains helpful information, the Diagnostics section provides the information from which you can most easily produce a to-do list for improving your site's visibility. Today I'll be discussing the Crawl errors section of the Diagnostics reports.

Crawl Errors

GWT's Crawl errors section is one of the most important reports that Google offers. It shows pages that Google either can't access regularly or can't find at all. Remember one important aspect of SEO: a page will never show up in the search engine results pages (SERPs) if the search engine can't find it. This report area is one of the best ways to ensure that your pages are found and to diagnose crawling obstacles. Following are the types of crawling errors shown in this area:

  • HTTP. This section shows all pages that Google tried to access but couldn't. In general, it was an HTTP error code that kept Googlebot from reading the page, such as 404 (page not found), 403 (forbidden), and 500 (server error). Scan this list of pages regularly. If you find pages in this list that should be available to engines and users, find the error's cause. It could be simply that the last time Google tried to access it, your site was down. If so, don't be particularly worried, as long as you can access the URL now.

  • In Sitemaps. This report lists errors very similar to those in the HTTP section, but limited to URLs that exist within the XML site maps you've submitted. Remember that XML site maps are an important signal in determining canonical authority, so if the URLs in your site maps produce errors when requested, you're leaving engines very few options in determining which of your URLs to show in results pages.

  • Not followed. The URLs in this list are an interesting contrast to URLs that Google couldn't crawl. In many cases, these URLs represent pages that Google probably could have partially crawled but chose not to, such as URLs it thinks have session IDs or URLs with multiple chain-style redirects. In addition, this list can show URLs that require cookies (which Googlebot can't accept), unreasonably long URLs, or pages that redirect to a page that doesn't exist. As is typical in many of these reports, this list is often equally good at diagnosing site-wide architecture issues and individual problematic URLs.

  • Not found. This report is the traditional warehouse for 404 page-not-found errors. In my opinion, this report is one of the most helpful in all of GWT because in addition to showing the specific URLs that return the 404 error code, it also lists the internal and external URLs that point to your missing pages. This makes it unbelievably easy to reclaim inbound links that already point to your site, either by redirecting the old URLs to their new location or by fixing the URL so that it shows content. In the Linked from column, you'll see a hyperlinked count of the links that point to each not-found URL. Click that count to see which pages are linking to your missing page. Prioritize the changes by looking at the quantity and quality of links pointing to your URLs. Fixing a URL with 15 incoming links, for example, will help your site more than fixing a dead URL with only 2 incoming links.

  • Restricted by robots.txt. Site owners sometimes inadvertently exclude files by using robots.txt directives incorrectly. This report shows all URLs that Google tried to crawl but couldn't due to your robots.txt file's directives. Keep in mind that this conflict comes into play only when your site (or another site) has links pointing to specific URLs that are excluded by robots.txt.

  • Timed out. This report is a great way to diagnose server issues. By definition, URLs appear in this report because Google received a timeout when trying to access your domain, a specific URL, or your robots.txt file. Failure to reach any of these implies that your server might be taking too long to serve requested URLs, and you should investigate that issue.
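One way to double-check the "Restricted by robots.txt" entries outside of GWT is to test URLs against your robots.txt rules yourself. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the blocked_urls helper are all hypothetical, and Python's parser may not interpret every rule exactly as Googlebot does, so treat its answers as a first pass rather than the final word.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the text of your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical URLs, for example ones copied from the GWT report.
urls = [
    "http://www.example.com/products/widget.html",
    "http://www.example.com/private/report.html",
]

for url in blocked_urls(ROBOTS_TXT, urls):
    print("Blocked by robots.txt:", url)
```

If a URL you want indexed shows up in the output, the Disallow rule that matches it is the place to start looking.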

As you go through these reports, don't let the mere presence of multiple errors worry you. An error is a problem only when it concerns an important URL and Google's inability to crawl it. That's something you should address immediately. On the other hand, if older, obsolete URLs show 404 errors or your robots.txt disallows 5,000 pages and you want it that way, don't worry that these "errors" are really hurting your site.
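The triage described above can be sketched in a few lines of Python. This example assumes you have copied rows from the Not found report by hand. The NotFoundRow type, the prioritize helper, and the sample URLs and counts are hypothetical and not part of GWT. It ranks only by the quantity of inbound links, so judging link quality still takes a manual look.

```python
from typing import NamedTuple

class NotFoundRow(NamedTuple):
    url: str          # the URL returning a 404
    linked_from: int  # the count shown in the "Linked from" column

# Hypothetical rows; replace with data from your own report.
rows = [
    NotFoundRow("http://www.example.com/old-catalog.html", 15),
    NotFoundRow("http://www.example.com/2006-press-release.html", 0),
    NotFoundRow("http://www.example.com/widgets.htm", 2),
]

def prioritize(rows, min_links=1):
    """Drop URLs with fewer than min_links inbound links, then sort the rest
    so the most-linked dead URLs come first."""
    worth_fixing = [row for row in rows if row.linked_from >= min_links]
    return sorted(worth_fixing, key=lambda row: row.linked_from, reverse=True)

for row in prioritize(rows):
    print(f"{row.linked_from:>4} links  {row.url}")
```

Raising min_links shortens the to-do list to the URLs whose fixes are most likely to matter.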

GWT also includes these same reports for two types of mobile content: compact HTML (CHTML) and mobile WML/XHTML. Depending on how many types of content you offer and whether you've submitted mobile-specific site maps, you may have multiple versions of these reports to examine.


In my final column in this series, I'll discuss the other half of the meaty Diagnostics section of Google Webmaster Tools, which includes crawl stats and HTML suggestions.



Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College. Follow Erik and Intrapromote on Twitter.
