The first half of this series discussed some helpful Google Webmaster Tools (GWT) reports, including an external link report and ways to find rankings your site may be narrowly missing. This installment covers additional reports that benefit GWT users.
Robots.txt Verification and Error Checking
A site doesn't need a robots.txt file to perform well organically, but engines are finding more ways the file can benefit site owners. To access this report from the main GWT area, click the "Tools" link in the left navigation, then click "Analyze robots.txt" from the submenu.
If you have a robots.txt file, this report helps you determine whether specific URLs are excluded as they should be. In the box labeled "Test URLs against this robots.txt file," enter an actual URL from your site and click the "Check" button. In the "URL Results" field that follows, Google will tell you whether the URL is "blocked" or "allowed." Running tests helps you determine how and when to use characters such as wildcards and trailing slashes to most effectively block the URLs you don't want crawled.
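To see why wildcards and trailing slashes matter, here is a minimal Python sketch of the matching rules Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. It's a simplification (one user-agent group, no percent-encoding normalization), the rules and paths are hypothetical, and GWT's own tester remains the authority on how Googlebot reads your file.

```python
from __future__ import annotations

import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is literal. re.match() anchors at the start, so
    an unanchored pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether `path` (path plus any query string) may be crawled.

    `rules` holds (directive, pattern) pairs for a single user-agent group.
    The longest matching pattern wins; on a tie, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" line blocks nothing
        if rule_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical rules for illustration only.
rules = [
    ("Disallow", "/private/"),            # trailing slash: only URLs inside the folder
    ("Disallow", "/*.pdf$"),              # any path ending in .pdf
    ("Allow", "/private/press-kit.pdf"),  # longer, more specific exception
]

for url in ["/private/report.html", "/private-notes.html",
            "/docs/guide.pdf", "/docs/guide.pdf?v=2", "/private/press-kit.pdf"]:
    print(url, "allowed" if is_allowed(url, rules) else "blocked")
```

In this sketch, dropping the trailing slash (`Disallow: /private`) would also block /private-notes.html, and the `$` anchor is why /docs/guide.pdf?v=2 is not blocked. Those are exactly the kinds of surprises worth confirming in the GWT tester.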
This report is also helpful if you use your robots.txt file to provide engines with the location of your XML sitemap. Remember, however, this page doesn't validate the XML sitemap itself. It validates only the way you refer to the file. In other words, you can point to the sitemap in a valid way, but the sitemap itself may not validate. Compare this with asking someone for directions to a specific restaurant. The directions may be accurate, but the restaurant could be out of business. Similarly, the sitemap reference in the robots.txt file can be valid, but the sitemap itself might not be.
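The following is a minimal sketch of checking only the reference, using Python's standard `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or later). The robots.txt content and example.com URLs are hypothetical. It reports whether each `Sitemap:` line holds an absolute URL, as the sitemaps.org protocol expects, and deliberately says nothing about the sitemap file itself, which is the same distinction drawn above.

```python
from __future__ import annotations

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: /sitemap-news.xml
"""


def check_sitemap_references(robots_txt: str) -> dict[str, bool]:
    """Map each Sitemap: URL in robots.txt to whether it is an absolute http(s) URL.

    This checks only how the sitemap is referenced, not whether the sitemap
    exists or is valid XML.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    results = {}
    for url in parser.site_maps() or []:  # site_maps() returns None when there are none
        parts = urlparse(url)
        results[url] = parts.scheme in ("http", "https") and bool(parts.netloc)
    return results


for url, ok in check_sitemap_references(ROBOTS_TXT).items():
    print(f"{url}: {'looks valid' if ok else 'not an absolute URL'}")
```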
Fortunately, GWT can also tell you whether your XML sitemap is valid. Find this report in the main "Sitemaps" section. If your sitemap feed is valid, you'll see "OK" in the "Sitemap Status" column of that report.
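For a quick local sanity check before you submit, a sketch like the following (Python standard library only) catches common breakages: malformed XML, the wrong root element or namespace, `<url>` entries without a `<loc>`, and more than the 50,000 URLs the sitemaps.org protocol allows per file. It isn't full schema validation, it doesn't handle sitemap index files, and it isn't what GWT runs; the sample documents are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol


def sitemap_problems(xml_text: str) -> list[str]:
    """Return problems found in a <urlset> sitemap (an empty list if none).

    Checks well-formedness, the root element and namespace, the URL count,
    and that every <url> has a non-empty <loc>. Not a full XSD validation.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element {root.tag!r}"]
    problems = []
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")
    for i, url in enumerate(urls, start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
    return problems


# Hypothetical sample sitemaps.
GOOD = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products/</loc></url>
</urlset>"""

MISSING_LOC = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><lastmod>2009-06-01</lastmod></url>
</urlset>"""

print(sitemap_problems(GOOD))
print(sitemap_problems(MISSING_LOC))
```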
One important thing to remember about excluding pages via robots.txt is that, while it's rare, excluded pages can still show up in results pages if they have significant external link popularity. On our company blog, we have a login link for staff members. A Google search for that page shows a link to it but not a valid title or description: Google found the link but couldn't fully access the page, because it's password protected.
Overcoming Canonical Issues
Canonical issues on a site are architectural glitches that inadvertently create multiple URLs for the same content. One example is a site that resolves with or without the "www" prefix. Another example is a page that resolves at both the folder level, such as "/products/," and the page level, like "/products/index.aspx."
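To make the two examples concrete, here is a minimal Python sketch that collapses such duplicates into one canonical form. The preferred host (www.example.com), the host aliases, and the list of default document names are hypothetical settings. It only illustrates the idea of a single canonical URL; it isn't how engines or GWT actually resolve duplicates.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site settings: both host spellings resolve, and "www" is preferred.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
# Hypothetical default-document names the server also serves at the folder URL.
DEFAULT_DOCUMENTS = {"index.aspx", "index.html", "default.aspx"}


def canonicalize(url: str) -> str:
    """Map duplicate spellings of a URL on this site to one canonical form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    host = netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST  # resolve the "www" vs. no-"www" duplicate
    folder, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = folder + "/"  # "/products/index.aspx" -> "/products/"
    if not path:
        path = "/"  # an empty path becomes "/"
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, host, path, query, ""))


variants = [
    "http://example.com/products/",
    "http://www.example.com/products/",
    "http://example.com/products/index.aspx",
    "http://WWW.EXAMPLE.COM/products/index.aspx",
]
for v in variants:
    print(v, "->", canonicalize(v))
```

Each variant in the list maps to the same canonical URL, which is the outcome the settings discussed below are meant to encourage engines to adopt.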
It's true engines are getting better at detecting and accounting for canonical issues. It's also true that you can never provide engines with too much information about the proper way to crawl, index, and interpret a site. So if you're unsure about whether such a setting is necessary, my advice is to use it.
GWT has an area that lets you account for the "www" prefix issue. From the "Tools" menu, select "Set preferred domain." On this page you'll see three options:
Select the appropriate choice and click the "OK" button. The change isn't immediate, and undoing it can take even longer if you change your mind down the road, so be sure about your needs before you make a choice.
Important points to remember about this feature:
Every second you spend poking around GWT is time well spent. I've watched its evolution closely, and I find the GWT team to be very responsive to user requests and concerns and really focused on providing data that's truly helpful.
Next: I'll spend some time in Yahoo's Site Explorer and discuss ways Yahoo is informing site owners about their sites.
Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College.