Google Webmaster Tools contains a trove of analytical and diagnostic data for your sites. But is there too much information there? For many marketers, there’s so much data that it’s often hard to know which reports and features are “need to have” and which are “nice to have.”
Google Webmaster Tools (GWT) reports are broken into three categories: “Site configuration,” “Your site on the web,” and “Diagnostics.” (This naming is unfortunate, since all three sections have at least some diagnostic data.) In today’s column, I’ll walk through the reports in the “Site configuration” category and explain what the reports are, what they tell you, and how (and when) to act on them.
Before getting into the subcategories, however, let’s look at the top right of your GWT dashboard. There you’ll see an envelope icon with a number next to it. This is your “Message” area and it’s a record of the conversations between you and the GWT team. Check this area regularly, as the GWT team may send a message to you here if there’s a problem with any of your sites. This location also hosts any site reconsideration requests you’ve submitted in the past.
“Sitemaps” is the first subcategory within the site configuration category. From this area, you can submit an XML site map that you’ve already uploaded to your site. (Note: “submit” in this context means “inform Google about,” not “upload.” To submit a site map using this form, the file must already exist on your server.) The interface was recently updated so that you no longer need to tell Google what type of site map it is. Instead, Google now detects the type.
Don’t be alarmed if the number of reported “Indexed URLs” is smaller than the number of “URLs Submitted.” I have access to dozens of sites’ data, and not a single site has a 100 percent indexing rate. Historically, this report has been a bit buggy. Also remember that Google will, at its discretion, decide whether to index URLs that are too similar to others.
Two other important columns are “Downloaded” and “Status.” The most recent “Downloaded” date may range from a week or more ago to only a few hours ago. If you’ve uploaded a new site map since the last download date, be sure to check the box next to the site map and click the “Resubmit” button. In addition, many tools exist to automatically update your site map and ping Google each time you add content to your site.
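To make the idea of automatically updating a site map concrete, here is a minimal sketch of how such a tool might regenerate the file from a list of URLs. The function name `build_sitemap` and the example.com URLs are hypothetical and purely illustrative. The namespace is the one defined by the sitemaps.org protocol. A real tool would also write the file to your server and might add optional fields such as last-modified dates.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for address in urls:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(url_element, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about",
])
```

After regenerating and uploading the file, you would still resubmit it as described above.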
The “Status” column will contain either a green checkmark graphic or a red “x” graphic. This tells you whether your file is valid (green checkmark) or invalid or missing (red “x”). Remember that a green checkmark does not necessarily mean that all your URLs are correct or indexed. It means only that the site map file you submitted contains valid XML.
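To illustrate the difference between "valid XML" and "correct URLs," here is a rough sketch of the kind of check a green checkmark implies. It is an assumption about what such a check looks like, not Google's actual validation code, and the name `summarize_sitemap` is made up for this example. It confirms only that the file parses as XML and counts the `<loc>` entries. It says nothing about whether those URLs resolve or get indexed.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by elements in a sitemaps.org sitemap.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def summarize_sitemap(xml_text):
    """Return (is_well_formed, url_count) for a sitemap document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed XML: roughly the red "x" case.
        return False, 0
    locations = [el.text for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return True, len(locations)


# Hypothetical sitemap content, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

result = summarize_sitemap(sample)
```

A file can pass this check and still list URLs that return errors, which is why the checkmark alone isn't proof that everything is fine.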
The “Crawler access” area is your robots.txt file command center. In the “Test robots.txt” tab, you can see the status of your file (when it was last downloaded and whether it’s valid). You can also test specific URLs against your existing robots.txt file to see whether they’ll be excluded under your existing file’s commands. And nicest of all, you can use this area as a laboratory to tweak your robots.txt commands and test them against any URLs you want until you have your file just right.
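If you'd like to run the same kind of test outside GWT, Python's standard library includes a robots.txt parser. The sketch below is an approximation: `is_blocked` is a hypothetical helper name, the rules and URLs are invented, and Python's parser uses simple path-prefix matching, so it may not agree with Googlebot on every directive.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only.
rules = """User-agent: *
Disallow: /private/
"""

blocked = is_blocked(rules, "Googlebot", "http://www.example.com/private/login.html")
allowed = not is_blocked(rules, "Googlebot", "http://www.example.com/index.html")
```

Treat the result as a quick sanity check and confirm anything important in the “Test robots.txt” tab itself.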
The “Generate robots.txt” tab enables you to create a robots.txt file that includes and excludes any specific files or directories you want, from any robot you want. You can create exclusion rules with robots other than Google’s own herd. But remember that if it’s a non-Google robot, you’ll need to know the name of it, because non-Google crawlers are not included in the dropdown. Here’s a good list of major crawlers and their user-agents.
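The file this tab produces is plain text, so it's easy to sketch what the output looks like. The helper below is a hypothetical illustration, not the tool's actual logic. It takes a mapping of user-agent names to the path prefixes you want to exclude and emits one block per robot. “ExampleBot” is a made-up crawler name.

```python
def build_robots_txt(rules):
    """Build robots.txt text from a dict of user-agent -> disallowed paths."""
    blocks = []
    for agent, paths in rules.items():
        lines = ["User-agent: " + agent]
        if paths:
            lines.extend("Disallow: " + path for path in paths)
        else:
            # An empty Disallow line means nothing is excluded for this robot.
            lines.append("Disallow:")
        blocks.append("\n".join(lines))
    # Blank lines separate the per-robot blocks.
    return "\n\n".join(blocks) + "\n"


# Hypothetical rules: block one crawler from /tmp/, allow everyone else.
robots_txt = build_robots_txt({"ExampleBot": ["/tmp/"], "*": []})
```

Run anything you build this way through the “Test robots.txt” tab before uploading it.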
The “Remove URL” tab lets you request that a specific file be removed entirely from Google’s index. After you submit a request, this area shows its status. Remember that this process is secondary to actually doing your best to delete the file on your own. Google’s requirements for removing a URL state that any URL you ask to remove using this report must first return a 404 and/or be blocked by the robots.txt file before the request will be accepted.
If you have sitelinks, you’re probably pretty happy with them. If you don’t, then this section won’t have any options for you.
But if Google shows a sitelink for your site that you’d prefer it not show (such as an internal login link), you can use this feature to tell Google to stop showing that link in search engine results pages (SERPs). Simply click the “block” button by the appropriate URL, and the link will no longer appear in sitelinks for your queries.
If you choose to block a certain URL from the list of sitelinks, Google may either replace it with another URL of its choosing, or it may simply leave that slot blank and not replace it.
Change of Address
Moving to a new office or residence? Don’t bring that information here, because that’s not what it’s for. Instead, this is a way to tell Google when your site has undergone a full domain migration from an old domain to a new one.
You still need to set up the necessary 301 redirects on your server to switch domains, and this feature is available only if you’re verified in GWT for both the old and new domains. In Google’s own words, it “lets you notify Google when you are moving from one domain to another, enabling us to update our index faster and hopefully creating a smoother transition for your users.” The bottom line: this feature isn’t a replacement for old-school domain migration. Instead, it supplements it and ideally makes it more efficient.
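For readers unsure what the redirect side of a migration involves, here is a minimal sketch of the URL mapping behind a page-for-page 301 redirect. It assumes the new site keeps the same paths as the old one. The function name `redirect_target` and both domain names are hypothetical. In practice you would usually express this rule in your web server's configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(old_url, new_host):
    """Return the new-domain URL that old_url should 301-redirect to."""
    parts = urlsplit(old_url)
    # Swap only the host; keep the scheme, path, query string, and fragment.
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )


# Hypothetical domains, for illustration only.
target = redirect_target(
    "http://old-example.com/products?id=7", "www.new-example.com"
)
```

Mapping each old URL to its direct equivalent, rather than sending everything to the new home page, is what lets deep links keep working after the move.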
The “Settings” subcategory lets you control three separate aspects of Google’s crawling, indexing, and ranking of your pages. The “Geographic target” section allows you to focus your site’s traffic on a single country. If you’re geo-targeting, and want traffic only from a single nation, I recommend using this tool and watching this video.
The “Preferred domain” section enables you to dictate how your URLs appear in Google search results, either with or without the “www” prefix. This is not the same as a canonical redirect. Instead, it’s only a suggestion and affects the search results only cosmetically. If you have www-based canonical issues, I still strongly urge you to fix them.
The “Crawl rate” section lets you tell Google how much restraint Googlebot should use while crawling your site. Leave this option alone unless you’re having significant problems at one extreme or the other: either Googlebot is crashing your server by crawling too many pages too quickly, or you have so many pages (and so much spare processing power) that you want Google to turn up the speed.
In subsequent columns, I’ll get into the grittier aspects of diagnosing specific problems like crawl errors, reclaiming links lost to 404 errors, and other fun tasks. In the meantime, if you have specific questions about the reports covered today, please leave a note in the comments section.
Erik is off today. This column was originally published July 22, 2009 on ClickZ.