This is the third part of a series that takes a close look at Google’s Webmaster Tools (GWT). The first examined the Site configuration section. Last time, I looked at the “Your site on the web” reports. Today I’ll begin to discuss the third and final section of GWT reports: the Diagnostics section.
While each of the three sections contains helpful information, the Diagnostics section provides the information from which you can most easily produce a to-do list for improving your site’s visibility. Today I’ll be discussing the Crawl errors section of the Diagnostics reports.
GWT’s Crawl errors section is one of the most important reports that Google offers. It shows pages that Google either can’t access regularly or can’t find at all. Remember one important aspect of SEO: a page will never show up in the SERPs if the search engine can’t find it. This report area is one of the best ways to ensure that your pages are found and to diagnose crawling obstacles. The following types of crawling errors are shown in this area:
- HTTP. This section shows all pages that Google tried to access but couldn’t. In general, it was an HTTP error code that kept Googlebot from reading the page, such as 404 (page not found), 403 (forbidden), and 500 (server error). Scan this list of pages regularly. If you find pages in this list that should be available to engines and users, find the error’s cause. It could be simply that the last time Google tried to access it, your site was down. If so, don’t be particularly worried, as long as you can access the URL now.
- In Sitemaps. This report lists errors very similar to those in the HTTP section, but limited to URLs that exist within the XML site maps you’ve submitted. Remember that XML site maps are an important signal in determining canonical authority, so if the URLs in your site maps produce errors when requested, you’re leaving engines very few options in determining which of your URLs to show in results pages.
- Not followed. The URLs in this list are an interesting contrast to URLs that Google couldn’t crawl. In many cases, these URLs represent pages that Google probably could have partially crawled but chose not to, such as URLs it thinks have session IDs or URLs with multiple chain-style redirects. In addition, this list can show URLs that require cookies (which Googlebot can’t accept), unreasonably long URLs, or pages that redirect to a page that doesn’t exist. As is typical in many of these reports, this list is often equally good at diagnosing site-wide architecture issues and individual problematic URLs.
- Not found. This report is the traditional warehouse for 404 page-not-found errors. In my opinion, this report is one of the most helpful in all of GWT because in addition to showing specific URLs that are showing the 404 error code, it also lists the internal and external URLs that are pointing to your missing pages. This makes it unbelievably easy to reclaim inbound links that are already pointing to your site by redirecting these old URLs to their new location or fixing the URL so that it shows content. In the Linked from column, you’ll see a hyperlinked quantity of links that points to each not-found URL. Click these links to see which sites are linking to your page. Prioritize the changes by looking at the quantity and quality of links pointing to your URLs. Fixing a URL with 15 incoming links, for example, will help your site more than fixing a dead URL with only 2 incoming links.
- Restricted by robots.txt. Site owners sometimes inadvertently exclude files by using robots.txt directives incorrectly. This report shows all URLs that Google tried to crawl but couldn’t due to your robots.txt file’s directives. Keep in mind that this conflict comes into play only when your site (or another site) has links pointing to specific URLs that are excluded by robots.txt.
- Timed out. This report is a great way to diagnose server issues. By definition, URLs appear in this report because Google received a timeout when trying to access your domain, a specific URL, or your robots.txt file. Failure to reach any of these implies that your server might be taking too long to serve requested URLs, and you should investigate that issue.
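If the Restricted by robots.txt report surprises you, it can help to test a handful of important URLs against your rules before Google’s next crawl. The following is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content, the example URLs, and the `blocked_urls` helper name are all assumptions made for illustration, and Python’s parser does not necessarily interpret every directive the way Googlebot does, so treat the result as a rough check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste in your own file's rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs that you expect to be crawlable.
important_urls = [
    "http://www.example.com/products/widget.html",
    "http://www.example.com/private/report.html",
]

for url in blocked_urls(ROBOTS_TXT, important_urls):
    print("Blocked by robots.txt:", url)
```

Any URL printed here that you actually want indexed points to a directive worth revisiting.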
As you go through these reports, don’t let the mere presence of multiple errors worry you. An error is a problem only when it concerns an important URL and Google’s inability to crawl it. That’s something you should address immediately. On the other hand, if older, obsolete URLs show 404 errors or your robots.txt disallows 5,000 pages and you want it that way, don’t worry that these “errors” are really hurting your site.
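One way to turn the Not found report into a to-do list along these lines is to drop the URLs you already know are obsolete and rank the rest by inbound links. The sketch below assumes you have copied the report into a list of dictionaries by hand. The field names `url` and `linked_from`, the sample rows, and the `prioritize_not_found` helper are hypothetical and are not an export format that GWT defines. It also ranks by link quantity only; judging link quality is still up to you.

```python
def prioritize_not_found(rows, obsolete=()):
    """Rank not-found URLs by inbound link count, skipping obsolete ones."""
    obsolete = set(obsolete)
    keep = [row for row in rows if row["url"] not in obsolete]
    # Most-linked URLs first: fixing them reclaims the most inbound links.
    return sorted(keep, key=lambda row: row["linked_from"], reverse=True)


# Hypothetical rows copied from the Not found report.
report = [
    {"url": "/old-catalog.html", "linked_from": 2},
    {"url": "/guides/setup.html", "linked_from": 15},
    {"url": "/2006-promo.html", "linked_from": 7},
]

# URLs you have decided are intentionally gone.
known_obsolete = ["/2006-promo.html"]

for row in prioritize_not_found(report, known_obsolete):
    print(row["linked_from"], row["url"])
```

With these sample rows, the URL with 15 incoming links lands at the top of the list, ahead of the one with only 2.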
GWT also includes these same reports for two different types of mobile content: compact HTML (CHTML) and mobile WML/XHTML, so depending on how many different types of content you offer and whether you’ve submitted mobile-specific site maps, you may have multiple versions of these reports to examine.
In my final column in this series, I’ll discuss the other half of the meaty Diagnostics section of Google Webmaster Tools, which includes crawl stats and HTML suggestions.