Demystifying Google Webmaster Tools Reports, Part 3

August 19, 2009

Examining the Diagnostics reports of Google Webmaster Tools.

This is the third part of a series that takes a close look at Google's Webmaster Tools (GWT). The first examined the Site configuration section. Last time, I looked at the "Your site on the web" reports. Today I'll begin to discuss the third and final section of GWT reports: the Diagnostics section.

While each of the three sections contains helpful information, the Diagnostics section provides the information from which you can most easily produce a to-do list for improving your site's visibility. Today I'll be discussing the Crawl errors section of the Diagnostics reports.

Crawl Errors

GWT's Crawl errors section is one of the most important reports that Google offers. It shows pages that Google either can't access regularly or can't find at all. Remember one important aspect of search engine optimization (SEO): a page will never show up in the search engine results pages (SERPs) if the search engine can't find it. This report area is one of the best ways to ensure that your pages are found and to diagnose crawling obstacles. Following are the types of crawling errors shown in this area:

  • HTTP. This section shows all pages that Google tried to access but couldn't. In general, an HTTP error code such as 404 (page not found), 403 (forbidden), or 500 (server error) kept Googlebot from reading the page. Scan this list of pages regularly. If you find pages in this list that should be available to engines and users, find the error's cause. It could simply be that your site was down the last time Google tried to access the page. If so, don't be particularly worried, as long as you can access the URL now.

  • In Sitemaps. This report lists errors very similar to those in the HTTP section, but the errors are limited to URLs that exist within the XML site maps you've submitted. Remember that XML site maps are an important signal in determining canonical authority, so if the URLs in your site maps produce errors when requested, you're leaving engines very few options in determining which of your URLs to show in results pages.

  • Not followed. The URLs in this list are an interesting contrast to URLs that Google couldn't crawl. In many cases, these URLs represent pages that Google probably could have partially crawled but chose not to, such as URLs it thinks have session IDs or URLs with multiple chain-style redirects. In addition, this list can show URLs that require cookies (which Googlebot can't accept), unreasonably long URLs, or pages that redirect to a page that doesn't exist. As is typical in many of these reports, this list is often equally good at diagnosing site-wide architecture issues and individual problematic URLs.

  • Not found. This report is the traditional warehouse for 404 page-not-found errors. In my opinion, this report is one of the most helpful in all of GWT because in addition to showing the specific URLs that return the 404 error code, it also lists the internal and external URLs that point to your missing pages. This makes it unbelievably easy to reclaim inbound links that already point to your site, either by redirecting the old URLs to their new location or by fixing the URL so that it shows content. In the Linked from column, you'll see a hyperlinked count of the links that point to each not-found URL. Click these counts to see which pages are linking to your missing page. Prioritize the changes by looking at the quantity and quality of links pointing to your URLs. Fixing a URL with 15 incoming links, for example, will help your site more than fixing a dead URL with only 2 incoming links.

  • Restricted by robots.txt. Site owners sometimes inadvertently exclude files by using robots.txt directives incorrectly. This report shows all URLs that Google tried to crawl but couldn't due to your robots.txt file's directives. Keep in mind that this conflict comes into play only when your site (or another site) has links pointing to specific URLs that are excluded by robots.txt.

  • Timed out. This report is a great way to diagnose server issues. By definition, URLs appear in this report because Google received a timeout when trying to access your domain, a specific URL, or your robots.txt file. Failure to reach any of these implies that your server might be taking too long to serve requested URLs, and you should investigate that issue.
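
To see ahead of time whether a given URL would fall under the Restricted by robots.txt report, you can test it against your robots.txt rules locally. The following is a minimal sketch in Python using the standard library's urllib.robotparser module. The sample rules, the example.com URLs, and the helper name blocked_by_robots are assumptions made for illustration, and Python's parser may not match Googlebot's interpretation of your file in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the text of your own file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def blocked_by_robots(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for candidate in (
        "http://www.example.com/private/report.html",
        "http://www.example.com/products/",
    ):
        status = "blocked" if blocked_by_robots(SAMPLE_ROBOTS_TXT, candidate) else "allowed"
        print(candidate, status)
```

Running a list of your important URLs through a check like this is one way to catch an inadvertent exclusion before it shows up in the report.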

As you go through these reports, don't let the mere presence of multiple errors worry you. An error is a problem only when it concerns an important URL and Google's inability to crawl it. That's something you should address immediately. On the other hand, if older, obsolete URLs show 404 errors or your robots.txt disallows 5,000 pages and you want it that way, don't worry that these "errors" are really hurting your site.
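
One way to apply this triage to the Not found report is to rank its entries by the number of links pointing to each one, so that the URLs with the most inbound links are fixed first. Below is a rough Python sketch. The NotFoundEntry type, the prioritize helper, and the sample data are all hypothetical, and the sketch assumes you have copied the URL and Linked from values out of the report yourself. It ranks by link quantity only; judging link quality still takes a manual look.

```python
from typing import List, NamedTuple


class NotFoundEntry(NamedTuple):
    """One row copied from the Not found report (hypothetical structure)."""
    url: str
    linked_from: int  # value shown in the Linked from column


def prioritize(entries: List[NotFoundEntry], min_links: int = 1) -> List[NotFoundEntry]:
    """Drop entries with fewer than `min_links` links, then sort by most links first."""
    worth_fixing = [e for e in entries if e.linked_from >= min_links]
    return sorted(worth_fixing, key=lambda e: e.linked_from, reverse=True)


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        NotFoundEntry("/old-catalog.html", 2),
        NotFoundEntry("/2008/summer-sale.html", 15),
        NotFoundEntry("/tmp/test-page.html", 0),
    ]
    for entry in prioritize(sample):
        print(entry.linked_from, entry.url)
```

Entries with no inbound links drop out of the list, which matches the advice above not to worry about obsolete URLs that nothing points to.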

GWT also includes these same reports for two different types of mobile content: compact HTML (CHTML) and mobile WML/XHTML, so depending on how many different types of content you offer and whether you've submitted mobile-specific site maps, you may have multiple versions of these reports to examine.


In my final column in this series, I'll discuss the other half of the meaty Diagnostics section of Google Webmaster Tools, which includes crawl stats and HTML suggestions.



Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College. Follow Erik and Intrapromote on Twitter.
