I’m in the U.K. to spend a week or so with my wife before she ups sticks and moves to Geneva, Switzerland, to start her new job. One of the things I miss most about my own job-related move to New York is the huge reference library I’ve built up in the U.K. office. I just don’t have space for my library in New York.
So while I’m back, I’m in a heightened state of research as I continue scribbling my new tome. Yesterday, I was skimming through an older information retrieval book and was reminded of Google’s description of PageRank from back in the day. It read:
- PageRank relies on the uniquely democratic nature of the Web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B.
I guess, as I’ve been thinking for some considerable time, it’s not really democratic to instantly alienate a few hundred million Web users (back then) who simply didn’t have links to vote with. In a very crude analogy, it’s a bit like saying we should be able to vote for and rank the top TV programs, but only people with interactive (cable/satellite) TV are allowed to vote.
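To make the "link as a vote" idea concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production code, and the function name `pagerank`, the damping value of 0.85, and the toy graph are my own assumptions for illustration. Notice that a page only gets a say if it has outgoing links, which is exactly the point about who is allowed to vote.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    damping=0.85 is the commonly quoted value, assumed here.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links casts no real vote; its score is
                # spread evenly so that the total stays at 1.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and C both "vote" for B, B votes for A.
toy_web = {"A": ["B"], "C": ["B"], "B": ["A"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web, page C receives no votes at all, so it ends up at the bottom no matter how useful a reader might find it.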
Historically, search engines have based their results around two major signals: page content and link structure. I know I’m banging the same old drum again, but billions of end users who are the devourers of this search engine data obviously deserve a say in it.
I’ve written before about how search engines glean some feedback from end users (mainly implicit data). And more and more, I’ve been thinking about new signals for search engines. There must be ways to pick up some fairly strong feedback signals from a number of different sources when it comes to the all-important factor: is the end user happy with the results?
The strongest signals come from the inside out (text and linkage data), which are readily available to search engines. But it’s somewhat more difficult to track signals from the outside in. And it’s not just about picking up the signals; it’s about folding them in.
Google’s ubiquitous toolbar picks up all kinds of additional browsing data. Some in this industry think the toolbar may have some kind of influence over Google’s crawling pattern. I doubt that very much for a number of reasons (too long to go into in this column). However, bookmarking is one sure sign that someone’s happy with a search result.
I’m not saying that Google’s toolbar is spying on your bookmarks, but it must be possible to track repeated direct navigation to a URL discovered initially via a search. And certainly, aggregated data of this nature would suggest the popularity of specific URLs.
Of course, more explicit signals are emerging from the social tagging, bookmarking, and rating phenomenon. Here, a huge amount of useful data can be gleaned about specific Web pages.
Trust networks are a major area of research. Social search within a network is part of a broad trend of information seeking based on the knowledge of people you know and the people they know. There may be tons more information available at search engines, but this is less verifiable than answers to questions derived from a chain of trust.
Much research is taking place into methods of combining data from social networks and document reference networks (such as PageRank) to create a dual layer of trust enhanced (or socially enhanced) search-result ranking. A case of an algorithm mind-meld with the wisdom of crowds.
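As a very rough illustration of what "folding in" a social layer could look like, the sketch below blends a link-based score with a bookmark count using a simple weighted sum. This is purely my own assumption for the sake of example and not any search engine's or researcher's actual method; the function name `blended_rank`, the `social_weight` parameter, and the example URLs and numbers are all hypothetical.

```python
def blended_rank(link_scores, social_scores, social_weight=0.3):
    """Rank URLs by a weighted mix of link-based and social scores.

    Hypothetical sketch: both inputs map URL -> raw score, and
    social_weight (0 to 1) says how much say end users get.
    """
    def normalise(scores):
        # Scale each signal to the 0..1 range so the two are comparable.
        top = max(scores.values(), default=0.0)
        return {url: (s / top if top > 0 else 0.0) for url, s in scores.items()}

    link_n = normalise(link_scores)
    social_n = normalise(social_scores)
    urls = set(link_n) | set(social_n)

    combined = {
        url: (1.0 - social_weight) * link_n.get(url, 0.0)
        + social_weight * social_n.get(url, 0.0)
        for url in urls
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(urls, key=lambda url: (-combined[url], url))


# Made-up numbers: link-based scores and bookmark counts for three pages.
link_scores = {"a.example": 0.5, "b.example": 0.4, "c.example": 0.1}
bookmarks = {"b.example": 120, "c.example": 30}

print(blended_rank(link_scores, bookmarks, social_weight=0.0))
print(blended_rank(link_scores, bookmarks, social_weight=0.3))
```

With the social weight at zero the ordering is purely link-driven; raising it lets a heavily bookmarked page overtake one that merely has slightly better links. Real systems would need to deal with spam and trust between users, which this sketch ignores.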
It’s interesting because about four years ago I was talking to a well-known information retrieval scientist about human evaluation being folded into the algorithm. We were talking about Yahoo’s early recognition that it could never scale its human-powered index with the exponential growth of the Web.
I mentioned that there had to be added value knowing that an editor had actually viewed your Web pages and indexed them. Then he explained that the hubs and authorities algorithm developed by Jon Kleinberg did just that. Effectively, with the hub sites, you had hundreds of thousands of editors (maybe many millions) picking out authority sites and working on your index.
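For the curious, here is a minimal sketch of the hubs and authorities idea as it is usually described in textbooks: a good hub points to good authorities, and a good authority is pointed to by good hubs. The function name `hits`, the iteration count, and the tiny example graph are my own assumptions, and a real implementation would first build a query-specific subgraph, which I skip here.

```python
import math


def hits(links, iterations=50):
    """Textbook hubs-and-authorities iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Returns two dicts: hub scores and authority scores.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalise(scores):
        # Scale so the squared scores sum to 1, keeping values bounded.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / norm for p, v in scores.items()}

    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = normalise(auth)

        # Hub: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        hub = normalise(hub)

    return hub, auth


# Hypothetical graph: three "editor" hub pages picking out two sites.
graph = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB"],
    "hub3": ["siteA"],
}
hub_scores, auth_scores = hits(graph)
print(sorted(auth_scores, key=auth_scores.get, reverse=True)[:2])
```

Each hub page here plays the part of an editor: the more good sites it picks out, the more its own picks count.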
So, perhaps, in a similar manner, as bookmarking, tagging, and rating gain more popularity and scale up, the wisdom of crowds (and the voice of the end user) will have a lot more influence on what appears in the SERPs (search engine results pages).
One search engine dabbling in this is FuzzFind. It’s not quite the sophisticated machine I had been alluding to — it’s more or less just a mash-up of search engine results blended with social bookmarking sites. Still, it’s worth a peek.
It’s funny: as I started writing this column, I paused to think what the results from a search engine powered purely by the end user would be like. I’m not talking about something as untrustworthy as Wiki Search. I mean a real search engine.
And then it occurred to me that there is, in fact, a huge, global search engine with results powered by the end users.
It’s called AdWords!
It’s a slow burner, but I’m continuing with my thread on the future of search in the Search Engine Watch forums. Do come and join me.