Tips for Rapid Crawling and Indexing

November 25, 2009

Ways to help search engines notice your site's new content and URLs more quickly.

You can add content to your site in two different ways: update an existing page (URL), or add content on entirely new URLs. While new content should eventually be found during the engines' crawling cycles, you can do many things to help engines notice the new content or new URLs more quickly than they would on their own.

Getting a New Page Noticed Quickly

Like so many other terms in SEO, getting a page "noticed" is code for getting links to it. And as with most conversations about links, the argument for quality over quantity is never more important than it is here.

But that "quality" isn't confined to the external links you drive to your site. Your own internal pages can vary significantly in their ability to get your new content crawled quickly. Recently, I watched an experiment as a client released new content (about 40 pages each) on two separate sites. On the first site, the "top" page of the new content was linked from a special link on the home page. On the second site, the equivalent "top" page of new content was linked only from the HTML sitemap.

The difference in crawling times for those two new content sections was dramatic. For the first site (with the home page link), half of the new pages had been indexed within about 10 days. For the second site (with the link to new content from the sitemap), it took Google more than a month to find the link to the new content in the HTML sitemap. It will likely be three months or more before the second site accomplishes what the first site did in 10 days.

XML Sitemap Changes

One of the easiest ways to inform engines about new or updated content is by modifying your XML sitemap:

  • Remember, an XML sitemap is no substitute for an efficiently designed navigation structure. Simply including a new URL in the sitemap will probably get it crawled, and may even get it indexed briefly, but a page with little or no internal linkage will probably never perform optimally.

  • If you've added new content to an old URL, make sure that the <lastmod> value in the XML sitemap reflects the date that the URL was updated.

  • If you've created new URLs, make sure that the <lastmod> value reflects the date the URLs went live.

  • Resubmit the XML sitemap through the engines' various Webmaster tools areas, or configure your server to ping engines when the sitemap has been updated. Google, for example, publishes instructions for telling it that your XML file has changed.
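As a rough illustration of the <lastmod> advice above, here is a minimal Python sketch that writes a sitemap in which every URL carries the date it was last updated or went live. The example.com URLs, the dates, and the build_sitemap helper are all made up for this example; only the urlset, url, loc, and lastmod element names and the namespace come from the Sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_date) pairs.

    Hypothetical helper for illustration; not part of any library.
    """
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, last_modified in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # <lastmod> should be the real update or go-live date, in YYYY-MM-DD form.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Made-up URLs and dates: one updated page, one brand-new section.
sitemap_xml = build_sitemap([
    ("http://www.example.com/existing-page.html", date(2009, 11, 20)),
    ("http://www.example.com/new-section/", date(2009, 11, 25)),
])
print(sitemap_xml)
```

In practice you would pull the dates from your content management system or file timestamps rather than hard-coding them, so that <lastmod> only moves when the content does.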

Crying Wolf With <changefreq>

One of the most pervasive myths about site crawling is that to increase the crawling frequency of your pages, all you need to do is change the <changefreq> value to a more frequent setting. According to the Sitemaps.org specs for the <changefreq> value, you can set your preferred crawling frequency to any of seven values, ranging from "always" to "never" and everything in between (yearly, monthly, weekly, daily, and hourly).

I'm not sure why you'd want to burn your bridge and use a value of "never." It's similarly unrealistic to go to the other extreme and pick a value that drastically overestimates how often your content actually changes.

If you look around at random sites and their XML sitemaps, it's easy to find <changefreq> values of "daily" or "weekly" on pages with content that hasn't changed in months or years. This doesn't necessarily hurt anything, but if your <lastmod> date hasn't changed in 24 months, why would engines go out of their way to honor a "weekly" <changefreq> value? They likely won't, unless they notice that your content actually changes that often.
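One way to catch this kind of mismatch in your own sitemap is to compare each URL's declared <changefreq> with how long ago its <lastmod> date was. The sketch below is only an illustration: the interval lengths, the slack factor of 3, and the overstated_changefreq function name are assumptions made here, not anything the engines or the Sitemaps.org specs define.

```python
from datetime import date, timedelta

# Approximate interval implied by each <changefreq> value.
# These lengths are assumptions chosen for illustration.
IMPLIED_INTERVAL = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=31),
    "yearly": timedelta(days=366),
}


def overstated_changefreq(changefreq, lastmod, today, slack=3):
    """Return True if the page has sat unchanged for more than `slack`
    times the interval that its <changefreq> value implies."""
    interval = IMPLIED_INTERVAL.get(changefreq)
    if interval is None:
        # "always" and "never" imply no fixed interval, so they are not flagged.
        return False
    return (today - lastmod) > slack * interval


# A "weekly" page whose <lastmod> date is 24 months old is flagged.
print(overstated_changefreq("weekly", date(2007, 11, 25), date(2009, 11, 25)))
# A "monthly" page updated a few weeks ago is not.
print(overstated_changefreq("monthly", date(2009, 11, 1), date(2009, 11, 25)))
```

Any URL this flags is a candidate for a more honest <changefreq> value, or for an actual content update.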

Crawled, Indexed, or Both?

I recently watched a sample page closely to mark the distance between cache dates. When I started watching the page in early October, it had been cached on September 30. Its <changefreq> value was set to "monthly," so the goal was to see whether Google would adhere to the suggested value. With no additional "external" forces prompting Google to crawl the page (such as a "last mod" page, re-submission of an XML sitemap, and so on), the page was eventually cached again on November 11, for a total time of about 42 days between cache dates.

The slight time variance between caching and indexing is interesting. For example, I mentioned that the second cache date in my prior example was November 11. However, I didn't find that out until several days after that date. As of November 12 and 13, the text cache still had a date of September 30. It wasn't until the 14th that the cache date had changed to the 11th, which means there was a three-day gap between Google caching the page and Google telling me it had cached it.
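The two intervals mentioned above are easy to double-check with a few lines of Python. The year 2009 is an assumption taken from this article's publication date; the variable names are just labels for this example.

```python
from datetime import date

# Dates from the example above; the year is assumed from the publication date.
first_cache = date(2009, 9, 30)
second_cache = date(2009, 11, 11)
cache_date_visible = date(2009, 11, 14)  # first day the new cache date showed up

days_between_caches = (second_cache - first_cache).days
reporting_lag = (cache_date_visible - second_cache).days

print(days_between_caches)  # 42
print(reporting_lag)        # 3
```

So a "monthly" <changefreq> value produced a re-cache after roughly six weeks, and the new cache date took three more days to become visible.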

Conclusion

As with most things in SEO, the "easy" way doesn't usually pay off. These methods won't get your pages crawled any more frequently unless you've actually updated your content. And even if those methods did work, there's no real benefit to continually re-crawling a page with unchanged content.


ABOUT THE AUTHOR

Erik Dafforn

Erik Dafforn is the executive vice president of Intrapromote LLC, an SEO firm headquartered in Cleveland, Ohio. Erik manages SEO campaigns for clients ranging from tiny to enormous and edits Intrapromote's blog, SEO Speedwagon. Prior to joining Intrapromote in 1999, Erik worked as a freelance writer and editor. He also worked in-house as a development editor for Macmillan and IDG Books. Erik has a Bachelor's degree in English from Wabash College.
