Factoring in the Human Touch at Search Engines

Back in the day, when I was researching material for the first e-book I wrote on SEO, hyperlink-based algorithms were the number one topic. Of course, linkage data still sends very loud signals to search engines. However, the area of research I’ve been monitoring the most lately is a very active topic in the industry: methods to improve relevancy and ranking by incorporating end-user behavior information.

Relevance feedback has a history in information retrieval dating back more than 30 years. Three years ago, in a TV interview I gave to the BBC, I talked a little about the personalization factor that search engines were (and still are) so keen to incorporate to get direct feedback. But personalization is still hindered by the fact that people are very reluctant to provide explicit information about themselves online.

So it’s only natural that implicit feedback measurement, which takes advantage of end user behavior to understand user interests and preferences, is such a hot research area.

Ranking results has always been a problem for search engines. From the abundance of pages returned for each query, the engines must rank the most relevant at the top. And in this industry, we’re acutely aware of just how important it is to be in that first handful of results.

This was reinforced for me when I read the results of a study carried out by researchers analyzing user behavior within Google search results. First, researchers carried out an eye-tracking study to gain explicit information. Then, they analyzed click-through data as implicit feedback for comparison.

During the course of the experiment, they monitored such things as whether users scan the results from top to bottom; the number of abstracts (or snippets, as Google calls them) that are read before clicking; and how behavior changes if the results are artificially manipulated without users’ knowledge.

Users do apparently read from top to bottom. The abstracts (snippets) ranked at number one and two receive the most attention. Users click substantially more on the first link than the second, even though they view both snippets with equal frequency.

Around rank six or seven, both viewing behavior and the number of clicks change. Of course, lower-ranked abstracts receive fewer clicks. But unlike ranks two through five, ranks six through 10 receive more equal attention. That’s because typically these days, only the first five or six results are visible without scrolling.

Once a user starts scrolling (if she starts scrolling!), rank becomes less of an influence for attention and clicks.
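To make the idea of clicks as implicit feedback concrete, here is a minimal sketch of one heuristic discussed in the click-through research literature: if users read from top to bottom, a clicked result can be treated as preferred over any unclicked result ranked above it. This is an illustration under that assumption, not the method of the study described above. The function name and the example URLs are hypothetical.

```python
def skip_above_preferences(ranked_urls, clicked):
    """Return (preferred, less_preferred) pairs from one result page.

    Assumption: the user scanned top to bottom, so a clicked result
    was judged better than every unclicked result shown above it.
    """
    clicked = set(clicked)
    prefs = []
    for pos, url in enumerate(ranked_urls):
        if url in clicked:
            # Every unclicked result above this one was seen and skipped.
            for above in ranked_urls[:pos]:
                if above not in clicked:
                    prefs.append((url, above))
    return prefs


# Hypothetical result page: the user clicks only the third result.
results = ["a.example", "b.example", "c.example", "d.example"]
prefs = skip_above_preferences(results, clicked=["c.example"])
print(prefs)
```

Note that the sketch says nothing about the fourth result: the user may never have looked at it, so no preference is inferred in either direction.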

Let’s go back to actual ranking mechanisms and what makes one page rank higher than another.

Generally, most approaches to ranking still focus on certain key factors: similarity of a query and a page; overall quality of the page, title tag, and link anchor text; and a quality score of linkage in-degree.

In SEO circles, the focus is usually on on-page factors and linkage data, as described above. Yet little or no focus is given to the terabytes of end user data that search engines suck up every single day.

Search engine users most often perform a sequence of searches, or a “query chain,” with a similar information need. Those query chains can generate new types of preference judgements from search engine logs. As stated in another paper I read recently, if a search engine repeatedly observes the query “special collections” followed by another for “rare books,” you could deduce that Web pages relevant to the second query are also relevant to the first.
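The “special collections” example can be sketched in a few lines. The code below assumes a simplified log format in which each query chain is an ordered list of (query, clicked URLs) pairs. It treats pages clicked later in a chain as also relevant to the earlier queries in that chain. The data structure, function name, and URLs are all hypothetical; a real search engine log is far messier.

```python
from collections import defaultdict


def chain_relevance(chains):
    """Map each query to the pages clicked at or after it in the same chain."""
    relevant = defaultdict(set)
    for chain in chains:
        # Each chain is a list of (query, clicked_urls) in the order issued.
        for i, (query, _) in enumerate(chain):
            for _, clicks in chain[i:]:
                relevant[query].update(clicks)
    return relevant


# Hypothetical log: two chains from different users.
chains = [
    [("special collections", []),
     ("rare books", ["library.example/rare-books"])],
    [("special collections", ["library.example/collections"])],
]
relevance = chain_relevance(chains)
print(sorted(relevance["special collections"]))
```

In this toy log, the rare books page ends up associated with “special collections” even though nobody clicked it for that query directly. The inference only runs forward in the chain, so pages clicked for the first query are not credited to the second.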

Getting systems to learn behavior rules from experience in some environment is still an expanding area of AI research. In natural language, a system may learn syntactic rules from example sentences; in vision, a system may learn to recognize some object given some example images. And in expert systems, rules may be learned from example cases. So there’s obvious interest at search engines in using machine learning methods for retrieval functions.

Recently, researchers performed a large evaluation of over 3,000 queries and more than 12 million user interactions with a major search engine, incorporating implicit feedback features directly into a trained ranking function.

The experiments show implicit user feedback greatly improves Web search performance when incorporated directly with content and link-based features.
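As a rough illustration of what “incorporated directly” can mean, the sketch below adds a click-based feature to a simple weighted score alongside content and link features. Everything here is an assumption for the sake of the example: the feature names, the weights, and the sample pages are made up, and a real system would learn its weights from training data rather than have them set by hand.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    content_score: float  # query/page similarity, scaled 0..1 (hypothetical)
    link_score: float     # link-based quality, scaled 0..1 (hypothetical)
    click_rate: float     # implicit feedback from users, 0..1 (hypothetical)


# Hand-picked weights for illustration only.
WITH_FEEDBACK = {"content_score": 0.5, "link_score": 0.3, "click_rate": 0.2}
NO_FEEDBACK = {"content_score": 0.5, "link_score": 0.3, "click_rate": 0.0}


def score(page, weights):
    """Weighted sum of the page's features."""
    return sum(w * getattr(page, name) for name, w in weights.items())


def rank(pages, weights):
    """Order pages from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, weights), reverse=True)


pages = [
    Page("old-favorite.example", 0.8, 0.9, 0.1),
    Page("crowd-pleaser.example", 0.7, 0.6, 0.9),
]
before = [p.url for p in rank(pages, NO_FEEDBACK)]
after = [p.url for p in rank(pages, WITH_FEEDBACK)]
print(before[0], "->", after[0])
```

With the click feature switched off, the page with stronger content and link scores comes first. Once user behavior counts toward the score, the page that searchers actually click overtakes it.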

I’ve been talking about end user click-through data and using integrated marketing to help increase query streams for specific keywords and keyword phrases for some time. And I touched on support vector machines and machine learning way back in that first e-book (and even in my very first ClickZ column).

All too often in our industry, people are happy to tell their clients after they start to sink in the SERPs: “The search engine changed its algorithm.” But what if the truth really is that you did the on-page work, you did the link building, and you ranked for a little while, but now end users have voted with their clicks and you’re not as popular as you thought you were?

I’m not just talking about click frequency. In my big interview with Matt Cutts, he pointed out how easy it could be to “insert bad data” in there. But I firmly believe that implicit end user data does have a huge effect on ranking over time.

And that’s why I believe search needs to be completely integrated into an organization’s overall marketing plan: the outcome of all your marketing efforts on your target audience will affect your success in search.
