New Study Sizes Up the Web

June 29, 2005

Just how big is the Web, anyway? And how much of it do the search engines crawl?

It's been ages since I've seen anyone try to estimate the size of the Web. Now, a new paper puts it at 11.5 billion pages or more for January 2005.

The paper, from Antonio Gulli of Università di Pisa (who is also director of advanced products for Ask Jeeves) and Alessio Signorini of the University of Iowa, estimates what percentage of the Web is covered by each search engine:

Search Engine   Self-Reported Size   Estimated Size   Coverage of       Coverage of
                (Billions)           (Billions)       Indexed Web (%)   Total Web (%)
Google          8.1                  8.0              76.2              69.6
Yahoo           4.2 (est.)           6.6              69.3              57.4
Ask             2.5                  5.3              57.6              46.1
MSN (beta)      5.0                  5.1              61.9              44.3
Indexed Web     N/A                  9.4              N/A               N/A
Total Web       N/A                  11.5             N/A               N/A
Note: "Indexed Web" refers to the part of the Web considered to have been indexed by search engines.
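
The "Coverage of Total Web" column looks like each engine's estimated size divided by the 11.5 billion total. The short check below assumes that is how the column was derived (the paper itself may state it differently) and uses only figures from the table. The "Coverage of Indexed Web" column does not reduce to the same simple ratio against 9.4 billion, so it is not reproduced here.

```python
# Figures copied from the table above (billions of pages).
TOTAL_WEB = 11.5
estimated_size = {
    "Google": 8.0,
    "Yahoo": 6.6,
    "Ask": 5.3,
    "MSN (beta)": 5.1,
}

# Assumption: coverage of the total Web = estimated size / total Web size.
coverage = {
    engine: round(size / TOTAL_WEB * 100, 1)
    for engine, size in estimated_size.items()
}

for engine, pct in coverage.items():
    print(f"{engine}: {pct}% of the total Web")
```

Under that assumption the results match the table's last column: 69.6, 57.4, 46.1 and 44.3 percent.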

The first thing you wonder is whether any of the search engines are lying about their size. Google claims to have the biggest search index, 8.1 billion pages. The estimate shows Google's claim is pretty much on target, and the same holds true for MSN. Ask Jeeves comes out well above its self-reported figure, at an estimated 5.3 billion pages against 2.5 billion claimed.

Yahoo doesn't provide an estimate of its index. The figure in the table dates back to 2004, when it said it was comparable to Google. The paper's estimate is useful, because we finally have an updated sense of where Yahoo might be.

There are a ton of caveats. Estimates are for the "visible" Web, URLs search engines can easily reach. The "invisible," or "deep," Web refers to content locked behind databases or other systems that search engines haven't extracted. I've seen estimates in the past that the deep Web might be 500 billion pages.

Though the study does some URL normalization, it still seems mirror or duplicate pages may have been counted. So the number of unique pages may be lower than the raw page counts suggest.
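
To see why normalization alone doesn't catch mirrors, here is a minimal sketch of one common kind of URL normalization. It is not the study's method; the `normalize_url` function, the rules it applies, and the example.com/example.org addresses are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing what a URL points to.
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Hypothetical normalizer: lowercase the scheme and host, drop default
    ports, fragments and any user info, and use "/" for an empty path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))

# Made-up URLs: two spellings of one address, plus a mirror on another host.
urls = [
    "HTTP://Example.com:80/page.html#top",
    "http://example.com/page.html",
    "http://mirror.example.org/page.html",
]
unique = {normalize_url(u) for u in urls}
print(len(unique))
```

The first two URLs collapse into one, but the mirror stays a separate entry even if it serves identical content. Catching that takes a comparison of the pages themselves, not just their addresses.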

Finally, size shouldn't be a surrogate for relevancy. Having a ton of pages doesn't mean anything if you can't return the best pages in the top results. It's helpful to know a search engine has good Web coverage, but it's only one of many factors to consider.

It's still great to have some updated estimates of the Web's size, as well as search coverage. For background on size issues, see some historic articles on Search Engine Watch's Search Engine Sizes page. I'm planning to update figures there, but the reference material is all still valid.


ABOUT THE AUTHOR

Danny Sullivan

Danny Sullivan left Search Engine Watch as of Dec. 1, 2006.
