Search, That Was Mighty Sociable

There’s an anecdote I tell at conferences when I’m speaking on linkage and connectivity data. It’s the story of how leading computer scientist Jon Kleinberg discovered the flaw in search engines that tried to rank Web pages based purely on the text on the page. Back in 1997, when Alta Vista was the search engine, he did a search for “search engine.” He was surprised to learn that Alta Vista didn’t appear in its own results.

He then tried an informational query for “Japanese automotive manufacturer.” He was even more astonished to observe manufacturers such as Nissan, Toyota, and Honda didn’t appear at the top of the results.

When he went back to the Alta Vista home page, he realized the words “search engine” didn’t appear anywhere on the page. Similarly, when he went to the Nissan home page, there was no sign of the phrase “Japanese automotive manufacturer.” The same was true of the Toyota and Honda Web sites.

In the fascinating book “Six Degrees: The Science of a Connected Age,” written by world-renowned physicist Duncan Watts, there’s much mention of Kleinberg and the work they collaborated on when making discoveries in the new science of a connected age. This led to Kleinberg developing the algorithm known as HITS, which is based on connectivity data and ranks documents on what are known as hub and authority scores. (This occurred around the same time Larry Page and Sergey Brin were developing Google’s PageRank algorithm.)

In a nutshell, Kleinberg helped improve the quality of Web search by applying social network analysis to the ranking mechanism. Instead of page quality being judged by the text the page contains, it was better judged by the overall quality of pages that link to it.
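To make the hub and authority idea a little more concrete, here’s a minimal Python sketch of the kind of iterative scoring HITS is usually described as doing. It is a simplified illustration, not Kleinberg’s own implementation: the function name `hits`, the iteration count, and the tiny link graph (including the page names) are all hypothetical. A page’s authority score is the sum of the hub scores of the pages that link to it, and a page’s hub score is the sum of the authority scores of the pages it links to.

```python
from math import sqrt


def hits(links, iterations=50):
    """Simplified hub/authority scoring.

    links: dict mapping each page to the list of pages it links to.
    Returns two dicts: (hub scores, authority scores).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of pages linking in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        auths = {p: v / norm for p, v in new_auths.items()}

        # Hub update: add up the authority scores of pages linked to.
        new_hubs = {p: sum(auths[d] for d in links.get(p, [])) for p in pages}
        norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        hubs = {p: v / norm for p, v in new_hubs.items()}

    return hubs, auths


# Hypothetical link graph: none of these pages needs to contain the
# words "search engine" for "altavista" to earn a high authority score.
example_links = {
    "directory_a": ["altavista", "othersearch"],
    "directory_b": ["altavista"],
    "blog": ["altavista", "directory_a"],
}

hubs, auths = hits(example_links)
for page, score in sorted(auths.items(), key=lambda kv: -kv[1]):
    print(f"{page}: authority {score:.3f}")
```

In this toy graph the page that attracts the most links from good hubs ends up with the highest authority score, regardless of the text it contains, which is the point of the Alta Vista anecdote.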

That’s why there’s so much emphasis on link building in the SEO community. But here’s an interesting thing I started thinking about a couple of years ago. If a link is a kind of vote from one Web page author to another, as Google refers to it, how do people without Web pages vote (i.e., the haves vs. the have-nots)?

With that in mind, I spent a lot of time looking at the kind of signals that search engines get from end users who aren’t page authors. Of course, one of the first signals you could pick up is click popularity. Pages that rank at the top of the results and are most frequently clicked on are sending a strong signal of quality.

So search engines have improved over time by taking into account such signals as text on a page, the connectivity data surrounding a page, and a searcher’s click pattern (plus other data sources that enable them to take advantage of numerous features of all those types). And generally speaking, result relevancy at general-purpose search engines has improved enormously.

But something else has changed enormously too: the Web itself. And the highly heterogeneous data types that search engines can now examine and fold in include text and HTML documents, query logs, user profiles, vertical content, listings, different ad forms, user interactions, images, video…the list goes on.

Perhaps, though, the search landscape’s biggest change is the shift toward information seeking on social networking sites. People are increasingly using these sites as information-finding tools. The knowledge possessed by your friends and other people you know, which supplements the Web’s huge amount of other, less verifiable information, can provide extremely qualified answers to specific queries. Information seeking in a network of friendships amounts to information seeking via a chain of trust.

Online communities are becoming an increasingly important area of research because of the rich signals they can send to search engines. Web pages are no longer purely static. Real-time chat takes place constantly on the Web. And tagging and folksonomy data arrangement, along with rating and reputation systems, are beginning to take search into a whole new era.

Kleinberg himself has shifted his research focus from a search engine’s centralized index to online communities’ large social structure.

And just last week, Google announced the launch of a new API to graph social networks across the Web. This intensive research by the major search engines into the Web’s social fabric clearly indicates that we’re moving into a new form of information retrieval on the Web.

As we gradually move into this new phase, we must figure out how to ride the wave. Search is undergoing a seismic shift: from the early days of basic text analysis, through the various phases, to today’s tapping into the collective wisdom of social networks.

Let me know where you think search is going. More to the point, let me know your opinions on what kind of role you feel the search marketing industry will play as search continues to evolve.

Mike is off this week. Today’s column ran earlier on ClickZ.
