How Deep Is the Web?

Are we back to discussing whether or not the earth is flat? There appears to be significant physical evidence that the web is much larger than search engines would like us to know. What would you say to a web of 500 billion searchable documents versus the reported 1 billion? Paul tells you about the "deep" web versus the "surface" web and a new search technology that searches both by handling multiple direct queries simultaneously.

Author

Paul J. Bruemmer

Date published: August 16, 2000

BrightPlanet released a white paper estimating that we’ve got more than 100,000 content-rich searchable databases available on the web. The study suggests the existence of a hidden “deep web” with approximately 500 billion individual documents, most of which are available to the public.

This study also indicates that the deep web is a vast pool of Internet content that is 500 times larger than the known “surface” of the World Wide Web. The significance of this is that quality content exists in documents within searchable databases on the web, but conventional search engines can’t access it. Just think what this could mean to businesses, researchers, and consumers – to gain access to valuable, difficult-to-find information on the web with accuracy and ease.

BrightPlanet has developed LexiBot technology, claimed to be the first and only search technology capable of identifying, retrieving, categorizing, and organizing both “deep” and “surface” content from the World Wide Web. LexiBot has the ability to query multiple search sites directly and simultaneously, which allows deep web content to be retrieved.

The deep web differs qualitatively from the surface web in that its sources store content in searchable databases that produce results dynamically in response to a direct request. But direct queries are an arduous way to search because they are handled one at a time. LexiBot automates the process of handling multiple direct queries simultaneously by means of its multiple-thread technology.

Traditional search engines create their databases by spidering or crawling “surface” web pages. To be indexed, a page must be static and linked to other pages. Traditional search engines cannot see or retrieve content in the deep web because their technology can’t probe beneath the surface. So while the deep web has always been present, it’s been inaccessible up to now.
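To make the distinction concrete, here is a minimal sketch in Python. It is an illustration only, not how any real search engine or LexiBot is built: the page names, the `crawl` and `direct_query` functions, and the sample records are all hypothetical. A crawler that only follows links between static pages reaches the search form but never the records behind it, while a direct query retrieves them on request.

```python
from collections import deque

# Hypothetical toy site: static pages and the links between them.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a search form; it links to no result pages
}

# Hypothetical database behind the /search form.
# Its records exist only as responses to a query.
DATABASE = {
    "patent 101": "Record for patent 101",
    "patent 202": "Record for patent 202",
}

def crawl(start):
    """Follow links breadth-first, the way a surface crawler discovers pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def direct_query(term):
    """Ask the database directly; the result is produced on request."""
    return DATABASE.get(term)

indexed = crawl("/")                  # only the three static pages
record = direct_query("patent 101")   # content no link ever pointed to
```

In this toy model the crawler's index contains the search page itself but none of the database records, which is the gap the article describes.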

Since we are well into the Information Age, when usable, relevant data is highly prized, the value of deep web content is incalculable. That’s why researchers of web infrastructure have proposed new search engine models and believe we need a fundamental restructuring of the way search engines work.

The BrightPlanet study found that public information on the deep web is 400 to 550 times larger than the commonly defined World Wide Web. The deep web contains 7,500 terabytes of information versus the 19 terabytes of information in the surface web.

Not only that, it is estimated that more than 100,000 nonindexed deep web sites currently exist. The sixty largest of these collectively contain about 750 terabytes of information, exceeding the size of the surface web 40 times over.

On average, sites in the deep web receive about 50 percent more monthly traffic than surface sites and are more highly linked to; however, the typical deep web site is not well known to the Internet search public. The deep web is believed to be the fastest-growing source of new information on the Internet.

These sites in the deep web contain quality content – content that is highly relevant to every information need. More than half of the deep web content resides in topic-specific databases.

Ninety-five percent of the deep web contains publicly accessible information that is not subject to fees or subscriptions.

So what does all this mean? It would appear that simultaneous searching of both the surface and deep web is necessary when comprehensive information retrieval is required. And the structure of our currently popular search engines might be in for evolutionary change.

BrightPlanet has automated the identification of deep web sites and the retrieval process for simultaneous searches. It has also developed a direct-access query engine consisting of approximately 22,000 sites, projected to increase to 100,000 sites. A list of these sites can be found in the CompletePlanet search portal.
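The idea of issuing many direct queries at once can be sketched with Python's standard thread pool. This is a minimal illustration under stated assumptions, not BrightPlanet's implementation: the `SOURCES` dictionary stands in for remote searchable databases, and `query_source` and `search_all` are hypothetical names. A real tool would send each query over the network to a site's own search interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for searchable databases, keyed by source name.
SOURCES = {
    "patents": {"solar": ["Solar cell patent"]},
    "journals": {"solar": ["Solar flare study", "Solar panel review"]},
    "statistics": {"wind": ["Wind capacity table"]},
}

def query_source(name, term):
    """Send one direct query to one source and return its results."""
    return name, SOURCES[name].get(term, [])

def search_all(term, max_workers=8):
    """Submit one query per source to a thread pool, then collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_source, name, term) for name in SOURCES]
        return dict(future.result() for future in futures)

results = search_all("solar")
```

Because every query is submitted before any result is awaited, slow sources do not hold up the others, which is what makes searching many databases in one pass practical.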
