Make Way for the Deep Crawl

August 29, 2001

The "deep Web" -- large dynamic sites with thousands, and sometimes millions, of URLs -- has remained "uncrawlable" or invisible to most search engines. Until now. The good news? The "deep crawl" is coming.

I think most of you know my position (no pun intended) regarding paid-inclusion programs. In my humble opinion, paid inclusion will show up as a small blip on the screen in the future of search engine optimization (SEO). As with everything else that transforms over time, paid inclusion is only a stepping-stone to the final product.

As you know, paid placement refers to paid listings on GoTo.com and sponsored links on other engines. Paid inclusion guarantees that submitted pages will be listed within an index but doesn't guarantee positioning. Paid submission speeds up the processing of a listing but doesn't guarantee the site will be listed. The latest entry in paid services is "Trusted Feed," which is good news for dynamic sites -- more on that below.

You all want to be found, but paid inclusion offers no guarantee of being found anywhere. Furthermore, many of you have large dynamic sites with thousands, and some with millions, of URLs to be crawled and found. Up to now, these dynamic URLs remained "uncrawlable" or invisible to most search engines.

The good news is that the deep crawl is coming, and it will provide you with several options for future search engine positioning (SEP) success.

You may recall an article I wrote last August titled "How Deep Is the Web?" I reported on the existence of a hidden "deep Web" with approximately 500 billion individual documents, most of which are available to the public but not accessible through conventional search engines. That's because many of these documents use frames or are in database-driven Web sites such as eBay, Amazon.com, and the Library of Congress, which the spiders can't crawl. As a result, billions of dynamically generated documents, as well as Flash, MP3, and video files, have been out of the reach of conventional search engines.

A year later, we've had some significant breakthroughs that will enable portals and engines to access the vast, hidden parts of the "deep Web." These advances will permit the indexing of more than 100,000 large dynamic sites, giving access to information on these database-driven sites to businesses, researchers, and consumers.

Good News for Frame Sites

Google and Inktomi can now index sites using frames. Google can crawl any URL that a browser can read, but most other search engines can't crawl URLs with the characters "?" and "&," which are used to separate common gateway interface (CGI) variables (e.g., "http://www.towerrecords.com/music.asp?genre=Country").
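To see which of your own URLs fall into this category, a quick check for CGI variables can help. The following is only a minimal sketch in Python: the function names are my own for illustration, and it simply treats any URL with a query string as one that many crawlers may skip.

```python
from urllib.parse import urlsplit, parse_qsl


def cgi_variables(url):
    """Return the (name, value) pairs found after the "?" in a URL."""
    return parse_qsl(urlsplit(url).query)


def looks_dynamic(url):
    """True if the URL carries a query string, i.e. uses "?" (and possibly "&")."""
    return bool(urlsplit(url).query)


if __name__ == "__main__":
    url = "http://www.towerrecords.com/music.asp?genre=Country"
    print(looks_dynamic(url))
    print(cgi_variables(url))
```

Running a list of site URLs through a check like this gives a rough count of how many pages depend on "?" and "&" to be reached.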

Some people get around this by creating static versions of the site's dynamic pages for search engine crawlers. But this is a lot of work, and it takes time and continuous maintenance. A better strategy is to rewrite your dynamic URLs in a syntax that search engines can crawl. For details on how to implement this strategy, visit Spider Food's Dynamic Web Page Optimization.
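As a rough illustration of the rewriting idea, here is a minimal Python sketch that folds CGI variables into the path so the URL no longer contains "?" or "&". The name/value path layout and both function names are assumptions made for this example, not a standard and not necessarily the scheme Spider Food describes. Your Web server would also need a matching rule to translate the path back into the original query.

```python
from urllib.parse import (
    urlsplit, urlunsplit, parse_qsl, urlencode, quote, unquote,
)


def to_crawlable(url):
    """Rewrite /page.asp?name=value&... as /page.asp/name/value/..."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query)  # variables with blank values are skipped
    if not pairs:
        return url  # nothing to rewrite
    segments = []
    for name, value in pairs:
        # safe="" also escapes "/" so each name and value stays one segment
        segments.append(quote(name, safe=""))
        segments.append(quote(value, safe=""))
    path = parts.path.rstrip("/") + "/" + "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def from_crawlable(url, script_path):
    """Reverse the rewrite for a known script path such as "/music.asp"."""
    parts = urlsplit(url)
    prefix = script_path + "/"
    if not parts.path.startswith(prefix):
        return url  # not one of our rewritten URLs
    segments = [unquote(s) for s in parts.path[len(prefix):].split("/") if s]
    if len(segments) % 2:
        raise ValueError("expected name/value pairs in the path")
    pairs = list(zip(segments[0::2], segments[1::2]))
    return urlunsplit(
        (parts.scheme, parts.netloc, script_path, urlencode(pairs), "")
    )


if __name__ == "__main__":
    dynamic = "http://www.towerrecords.com/music.asp?genre=Country"
    static = to_crawlable(dynamic)
    print(static)
    print(from_crawlable(static, "/music.asp"))
```

The appeal of this approach over hand-built static copies is that the rewritten URLs are generated from the same database-driven pages, so there is nothing extra to keep in sync.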

New Services for Dynamic Sites

Trusted Feed is a new service from AltaVista announced at Search Engine Strategies last week. It's ideal for submitting Web pages that are traditionally difficult for crawlers to index, such as framed pages and pages with dynamic content. It allows businesses to submit 500 or more URLs via an eXtensible Markup Language (XML) feed directly into AltaVista's index. In addition, partners receive detailed performance reporting for each URL submitted -- important for determining return on investment (ROI).
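To give a feel for what preparing such a feed might involve, here is a minimal Python sketch that turns a list of pages into an XML document. AltaVista's actual feed format is not described in this article, so the element names ("feed", "page", "url", "title", "description") and the function name are purely hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_feed(pages):
    """Serialize page records into a hypothetical XML feed string."""
    root = ET.Element("feed")  # hypothetical root element
    for page in pages:
        item = ET.SubElement(root, "page")  # hypothetical per-URL element
        for field in ("url", "title", "description"):
            # ElementTree escapes characters such as "&" when serializing
            ET.SubElement(item, field).text = page[field]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    pages = [
        {
            "url": "http://www.towerrecords.com/music.asp?genre=Country",
            "title": "Country Music",
            "description": "Country albums and artists.",
        },
    ]
    print(build_feed(pages))
```

In practice the page records would come straight from the site's database, which is what makes a feed attractive for sites with thousands of dynamic URLs.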

Trusted Feed-like programs, which simultaneously position relevant keywords and phrases within a search engine index, will produce exactly what we've all been waiting for -- targeted, high-quality, and relevant traffic for large dynamic sites.

The first Trusted Feed systems have launched at AltaVista and Inktomi. Of course, they are pay-per-click (PPC) models and will compete directly with the more traditional GoTo PPC accounts. These new paid services will require close monitoring to determine the best ROI. Those who contract with SEO companies maintaining customized monitoring tools, techniques, services, and allies will see significant savings in their SEO and cost-per-click (CPC) campaigns.
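The monitoring itself comes down to simple arithmetic per URL or keyword. Here is a minimal Python sketch, with a function name and sample figures that are invented for illustration only. It treats ROI as revenue minus click cost, divided by click cost.

```python
def campaign_roi(clicks, cost_per_click, revenue):
    """Return ROI as a ratio: (revenue - cost) / cost."""
    cost = clicks * cost_per_click
    if cost == 0:
        raise ValueError("ROI is undefined when nothing was spent")
    return (revenue - cost) / cost


if __name__ == "__main__":
    # Hypothetical figures: 200 clicks at $0.25 each, producing $150 in sales
    print(campaign_roi(clicks=200, cost_per_click=0.25, revenue=150.0))  # 2.0
```

A ratio of 2.0 means every dollar spent on clicks returned two dollars beyond its cost. Comparing that figure across Trusted Feed and GoTo listings shows where the budget works hardest.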

New Technologies for Deep Web Crawl

Quigo recently launched a series of new technologies for forward-thinking portals and engines, allowing those with the ability to manage and access deep Web content to surpass competitors that use traditional crawling techniques.

And those with entrepreneurial blood will see the light go on! With the ability to feed a huge dynamic site's database into their own databases, enabling users to search deeply and go directly to exactly what they're looking for, do you think there are any revenue opportunities here? You'd better believe it.

Quigo's QUIBOT remotely crawls through pages from the deep Web, enabling it to index a large portion of the deep Web and making this content available to users searching on Quigo and partner portals. It does not require any modifications, such as cloaking or doorway pages, on the indexed Web sites.

Quigo's DeepWebGateway enables search engines to index deep Web content that they do not access directly. This technology also solves other problems related to deep Web crawling and indexing, such as spider traps and personalization.

Quigo's DeepWebSonar is for portals and features page indexing, query analysis, and a unique ranking system. By redirecting user queries to DeepWebSonar, portals can immediately offer deep Web results to their users.

The idea of providing large dynamic Web sites with 100 percent visibility is going to rock. The portals, engines, and directories that provide the best services both to users and to advertisers are going to lead the pack.


ABOUT THE AUTHOR

Paul J. Bruemmer

Paul J. Bruemmer is CEO of Web-Ignite Corporation, a search engine optimization (SEO) and positioning provider. Founded in 1995, Web-Ignite has helped promote over 15,000 Web sites and was recognized by ICONOCAST as one of the top 10 most reputable SEO firms. Services include optimization, submission, registration, positioning, monitoring, maintenance, paid-inclusion, and paid-placement management for fixed monthly fees. Recent client testimonials report search engine traffic increased from 150 to 500 percent.
