Last week we talked about the viability of the search engine business model. As the Internet becomes increasingly significant in our lives, the power and importance of search engines are enormous. Yet search technology has not kept pace with the Internet’s endless expansion, nor do search indexes provide an impartial, universal database. Researchers of web infrastructure have proposed new models, but do these models stand a chance against the current indexing system, which Internet culture so widely accepts?
The Power of Search Engines
We know the majority of users go to search engines and directories when they want to find web pages; this is well documented. And we know the engines and directories have their own arbitrary methods of indexing or listing sites. This puts a tremendous amount of power in the hands of a few major Internet portals. The search engines are like gatekeepers to a web site’s success because their decisions on which pages to index (or not to index) greatly influence a web site’s ability to be found.
Nobody knows the exact size of the web, but current estimates say it’s composed of between 1 and 1.5 billion documents. Inktomi and the NEC Research Institute estimated 1 billion indexable pages on the web in February 2000. This figure excludes private databases and intranets.
Google, the largest index online, claims to index 1 billion pages. In reality, its direct search capacity is about 500 million pages, because half of those pages are referenced but not actually indexed. That is at most half of what’s out there.
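Set against the size estimates above, Google’s directly searchable share of the web works out to somewhere between one-third and one-half:

```latex
\text{coverage} = \frac{\text{pages directly searchable}}{\text{estimated web size}}
\qquad
\frac{0.5 \times 10^{9}}{1.0 \times 10^{9}} = 50\%
\qquad
\frac{0.5 \times 10^{9}}{1.5 \times 10^{9}} \approx 33\%
```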
Although many portals and sites offer search functionality, the actual indexing of the web is performed by a few major players. Google’s technology powers the search engines for Netscape and Yahoo. Inktomi powers HotBot, iWon, GoTo, AOL, and LookSmart. Most other search sites use either the Excite or Fast Search databases.
The Structure of Search Engines
Search engines have three major elements: a spider, an index, and the search engine software. The spider crawls the web to visit web pages, reading them and following links to other pages within the site. Spiders crawl the web on a regular basis, looking for changes.
All the data picked up by the spider goes into the index, a giant database of all the web pages the spider has found. When pages change, the spider picks up the changes on a later visit and the index is updated. Even after a new page has been spidered, it can take a while to appear in the index, because the spidered data still has to be processed and recorded there.
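To make the spider-and-index structure concrete, here is a minimal Python sketch. It does not reflect how any particular engine works: it assumes a tiny hypothetical in-memory “web” (`TOY_WEB`) instead of real HTTP fetching, crawls it breadth first, and builds an inverted index that maps each word to the pages containing it. The names `crawl`, `build_index`, and `tokenize` are illustrative only.

```python
import re
from collections import defaultdict, deque

# Hypothetical miniature "web": each URL maps to (page text, outgoing links).
# A real spider would fetch pages over HTTP; this stays offline so it runs anywhere.
TOY_WEB = {
    "/home": ("Welcome to our garden shop", ["/roses", "/tools"]),
    "/roses": ("Roses and rose care tips", ["/home"]),
    "/tools": ("Garden tools for rose pruning", ["/roses", "/home"]),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(start, web):
    """Breadth-first spider: visit each reachable page once, following links."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

pages = crawl("/home", TOY_WEB)
index = build_index(pages)
print(sorted(index["rose"]))  # ['/roses', '/tools']
```

Answering a query then means looking words up in the index rather than rereading every page, which is why a page that has been spidered but not yet recorded cannot be found.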
Search engine software provides results to users. The program sifts through the hundreds of millions of pages recorded in the index to find matches to a search, then ranks those matches in order of relevancy. Each search engine has its own rules of relevancy, but all of them follow more or less the same basic principles.
When you perform a search, the results are ranked by the number of times your search term occurs on a page; whether the term appears in the page title or URL; how close to the top of the page the term appears in the text; whether the page has meta tags; your keyword frequency; and your link popularity. Not all engines use every one of these criteria, but that’s basically what they look at when ranking your site.
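The ranking criteria listed above can be combined into a single relevancy score. The Python sketch below is hypothetical throughout. The values in `WEIGHTS` are invented for illustration, since no engine published its formula. “Keyword frequency” is interpreted here as the share of words on the page that match the term. An inbound-link count stands in for link popularity.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    title: str
    body: str
    meta_keywords: list = field(default_factory=list)
    inbound_links: int = 0  # hypothetical stand-in for link popularity

# Hypothetical weights for illustration only; real engines weighted these differently.
WEIGHTS = {"count": 1.0, "title_or_url": 3.0, "position": 2.0,
           "meta": 1.0, "frequency": 2.0, "links": 0.5}

def words_of(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page, term):
    """Combine the ranking signals for one search term into one number."""
    term = term.lower()
    words = words_of(page.body)
    count = words.count(term)  # occurrences on the page
    in_title_or_url = term in page.title.lower() or term in page.url.lower()
    # 1.0 when the term is the first word, shrinking further down the page; 0 if absent
    position = 1.0 - words.index(term) / len(words) if count else 0.0
    # exact match only: the keyword "roses" does not count for the term "rose"
    in_meta = term in [k.lower() for k in page.meta_keywords]
    frequency = count / len(words) if words else 0.0  # share of matching words
    return (WEIGHTS["count"] * count
            + WEIGHTS["title_or_url"] * in_title_or_url
            + WEIGHTS["position"] * position
            + WEIGHTS["meta"] * in_meta
            + WEIGHTS["frequency"] * frequency
            + WEIGHTS["links"] * page.inbound_links)

pages = [
    Page("/roses", "Rose care", "rose pruning and rose feeding", ["roses"], 4),
    Page("/tools", "Garden tools", "tools for every garden, including a rose pruner", [], 10),
]
ranked = sorted(pages, key=lambda p: score(p, "rose"), reverse=True)
print([p.url for p in ranked])  # ['/roses', '/tools']
```

Changing the weights can reorder the results, which is one reason different engines return different rankings for the same query.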
So we have search engines that are both powerful and widely accepted, yet they are structured so that listings are not automatic. On top of that, no engine can index every page anyway. This is why new models have been suggested.
New Search Engine Models
Researchers and theoreticians of web infrastructure have raised serious concerns about the operation of current search engines. Some believe the real solution requires a fundamental restructuring of the way search engines work. Rather than allowing a few giant web indexes to shape the Internet, they envision a more personal means of indexing web content.
David Gelernter is a Yale professor and entrepreneur. He created a system called Lifestreams that allows users to create their own search engines. As users browse content online, their computers are at work indexing the pages viewed and the surrounding material, creating a searchable index based on their browsing patterns.
Gelernter’s theory is that the emerging Internet landscape will have three interlocking components: “cyberbodies” (collections of information), “viewers,” and “calling cards.” You tune in to a cyberbody using a viewer and a calling card. Once you’re tuned in, the right application software shows up automatically. Your whole electronic life is stored in a cyberbody, which is constantly growing.
Some viewers will resemble today’s computers, phones, or TV sets. Others will have new shapes. But they will all have the same function: tuning in cyberbodies. Walk up to any viewer (like a computer on your desk or a kiosk at the mall) and enter your calling card and password. The calling card tells the viewer how to find the cyberbody; the cyberbody itself tells the viewer what to do next.
Gelernter envisions large numbers of cyberbodies, the most important of which will be a chronological stream. You, your company, even your car are all vehicles moving forward through time, each leaving a stream-shaped cyberbody of information behind. This vapor trail of “crystallized experience” is a growing stream in cyberspace.
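A chronological stream can be modeled, very roughly, as a list of documents kept in time order, where searching filters the stream but preserves its order. This Python sketch is an assumption-laden illustration, not Lifestreams’ actual design. The `Entry` and `Stream` classes and their `add`, `search`, and `between` methods are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from datetime import datetime

@dataclass(order=True)
class Entry:
    """One document in the stream; entries compare by timestamp first."""
    timestamp: datetime
    title: str
    text: str

class Stream:
    """A hypothetical time-ordered collection of documents."""

    def __init__(self):
        self._entries = []

    def add(self, entry):
        # insort keeps the list in chronological order even if an entry arrives late
        bisect.insort(self._entries, entry)

    def search(self, word):
        """Return entries whose title or text contains `word` (substring match), oldest first."""
        word = word.lower()
        return [e for e in self._entries
                if word in e.title.lower() or word in e.text.lower()]

    def between(self, start, end):
        """Return entries whose timestamps fall within [start, end], oldest first."""
        return [e for e in self._entries if start <= e.timestamp <= end]

stream = Stream()
stream.add(Entry(datetime(2000, 5, 2), "Email from editor", "Deadline for the search column"))
stream.add(Entry(datetime(2000, 5, 1), "Web page", "Inktomi search index estimates"))
stream.add(Entry(datetime(2000, 5, 3), "Meeting notes", "Budget review"))
print([e.title for e in stream.search("search")])  # ['Web page', 'Email from editor']
```

Because every result keeps its place in time, a search returns a slice of the stream rather than a relevancy-ranked list.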
In today’s systems, your computer screen is the interface; in tomorrow’s systems, you will look through the screen to a three-dimensional “information landscape” lying beyond.
So what does a stream look like, how do you browse and search one, and what can you do with it? Gelernter’s answer is his commercial software called Lifestreams. Customers can walk up to any Net-connected computer and tune into the cyberbody streams.
Sounds far out, but here’s what you can do with it if you buy it: Browse, retrieve, and manage documents, email, multimedia, scheduling, and the web in a dynamic, visual environment that works the way you do. Based on its patented Lifestreams technology, this browser-based program organizes all your information, instantly and automatically, in visual streams that reflect the way you work. There’s no need to juggle between programs to open files or between drives and directories to find them.
This sounds more like a replacement operating system, but it also points to a way of restructuring how search engines work.
Another solution for restructuring search engines is found in the technology being developed by nano, a Silicon Alley infrastructure company that has developed a software platform for the dynamic exchange of information, products, and services across all markets and all mediums. The company hopes to change the way you use the Internet.
Nano works intuitively, organizing the web in a way that makes sense and bringing you the information on the Internet that is most relevant to you. It gives you what you need when you want it, rather than burying you in reams of irrelevant information.
When you download and activate nano (the beta version is free), it takes a snapshot of what’s on your screen. Whether you’re reading email, a Word document, or an Internet page, nano understands the “idea” behind whatever it is you’re reading. Nano then uses that idea to suggest material you may want to read next, related items you may want to buy, interesting things you can do, or anything else you may be interested in based on the content you’re looking at.
Whenever you’re reading something you want to know more about, activate nano. You can refine your searches by highlighting a specific word, phrase, or paragraph before activating nano, and it will return items related to the highlighted text only, rather than everything on your screen.
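The article doesn’t say how nano extracts the “idea” behind a page, so the Python sketch below swaps in a deliberately simple stand-in. It takes the most frequent non-stopword terms from the screen text, or from the highlighted selection if there is one, and suggests catalog items that share those terms. `STOPWORDS`, `key_terms`, `suggest`, and the sample catalog are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "it", "with"}

def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def key_terms(text, n=3):
    """The n most frequent non-stopword terms; ties keep first-seen order."""
    words = [w for w in tokens(text) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(n)]

def suggest(screen_text, catalog, highlighted=None, n=3):
    """Rank catalog items by how many key terms they share with the source text."""
    source = highlighted if highlighted else screen_text  # a selection narrows the focus
    terms = set(key_terms(source, n))
    scored = []
    for item, description in catalog.items():
        overlap = len(terms & set(tokens(description)))
        if overlap:
            scored.append((overlap, item))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most overlap first, then by name
    return [item for _, item in scored]

catalog = {
    "Yankees tickets": "baseball yankees tickets",
    "Tech fund": "tech stock fund",
    "Gardening book": "roses garden guide",
}
screen = "Tech stock news: tech stock prices fell while the Yankees won"
print(suggest(screen, catalog))                         # ['Tech fund']
print(suggest(screen, catalog, highlighted="Yankees"))  # ['Yankees tickets']
```

The second call mirrors the highlighting behavior described above: the same screen yields different suggestions once the selection narrows what counts as the “idea.”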
You can use nano for doing research, shopping, checking out stocks, following current events, looking for an interesting chat and/or community site, catching up with the latest on your favorite sports team, looking for new music, or finding what’s new on the web.
Technology companies such as nano and Lifestreams may well provide the solutions we are all looking for. Application programming interfaces (APIs), artificial intelligence (AI), and XML will vastly improve the way we find things on the Internet or, should I say, the way marketers will find you.
But it may take a while. Back in 1997, I remember reading prophecies on ClickZ of yelling refrigerators and nagging toasters, but I still haven’t seen one of the Internet appliances we all heard would be the rage.