Beyond Words on a Page and Linkage Data

At ad:tech New York last week, I spent a lot of time with potential consulting clients. I’m doing some independent work while I decide what I really want to do when this industry grows up. Most of the people I talked to already had SEO firms on board and were looking for new vendors.

It’s interesting to listen to clients explain the SEO knowledge they’ve gained from their vendors. They talk mainly about keywords, especially the keyword density analysis (KDA) performed by their SEO firms, and linkage data, predominantly still attached to the term “PageRank.”

It’s more interesting to see their jaws drop when I explain that KDA is nothing more than anecdotal SEO, about as scientifically advanced in its application as boiling an egg. Of course, I then point them in the direction of this paper.
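For anyone who hasn’t seen it spelled out, keyword density is just a ratio: how often a term appears on a page relative to the page’s total word count. Here is a minimal sketch, assuming single-word keywords and a crude regular-expression tokenizer (both simplifications; the function name `keyword_density` is hypothetical, not any SEO tool’s API).

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return occurrences of a single-word keyword as a percentage of all words.

    Assumption: a crude tokenizer that lowercases the text and keeps runs of
    letters, digits and apostrophes as words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)


page = "Cheap flights to Paris. Book cheap flights today and save on flights."
print(f"{keyword_density(page, 'flights'):.1f}%")
```

Note what the calculation leaves out: the query, the competing pages, and anything the user does after seeing the results.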

Their concerns over the little green PR meter on the Google toolbar always bring a smile to my face. But for someone who’s lived for months (perhaps years) under the impression that the toolbar data is an indicator of success, there’s not a lot to smile about. I then point them in the direction of this paper.

Because I personally know the authors of those papers (leading scientific researchers), I’ve had time to discuss their work and their thoughts on SEO efforts. And it’s true: you can achieve a lot by practicing good SEO techniques for getting indexed and even decently ranked. But regardless of your opinions of KDA and linkage data, there’s a lot more to how search engines rank and re-rank documents that must be considered.

Most end users find it difficult to formulate queries that are well designed for retrieval. Some simply reformulate their queries if they don’t see anything that appears to be relevant enough; they perform “query chains.”

This provides search engines with what’s called “relevance feedback.” The user reformulates and refines the query and adds new terms. This means existing terms in the query can be re-weighted based on the feedback. And that has nothing whatsoever to do with KDA.
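How any engine actually re-weights query terms isn’t public. As a purely hypothetical illustration of the idea, the sketch below scores each term in a query chain by how often it recurs, giving later (more refined) queries more weight. The `decay` factor, the function name and the normalization are all assumptions, not a documented search engine method.

```python
from collections import defaultdict


def reweight_query_chain(chain: list[str], decay: float = 0.5) -> dict[str, float]:
    """Hypothetical scheme: weight each term by how often it recurs across a
    query chain, counting later (more refined) queries more heavily."""
    weights: dict[str, float] = defaultdict(float)
    n = len(chain)
    for i, query in enumerate(chain):
        # The latest query gets weight 1.0; each earlier one is scaled by `decay`.
        position_weight = decay ** (n - 1 - i)
        for term in set(query.lower().split()):
            weights[term] += position_weight
    total = sum(weights.values())
    # Normalize so the weights sum to 1.
    return {term: w / total for term, w in weights.items()} if total else {}


chain = ["jaguar", "jaguar speed", "jaguar top speed animal"]
for term, w in sorted(reweight_query_chain(chain).items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {w:.3f}")
```

Terms the user keeps returning to ("jaguar", then "speed") end up weighted above those added only at the last step, and no page text is involved at all.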

I’ve talked about the importance of click-through data at many conferences and seminars, and I’ve mentioned it here a number of times. But I’m not just talking about the number or frequency of clicks.

Search engine click-through data comes in triplets: the query, the ranking presented, and the links the end user clicked on. Users don’t click on links at random. There’s usually a (somewhat) informed choice based on the abstracts. Search engines can therefore treat those clicks, made after the end user has read the abstracts on offer, as relevance judgements.
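One way the research literature turns such a triplet into relevance judgements is a strategy often called “click > skip above”: a clicked result is taken to be preferred over every unclicked result the user saw ranked above it. The sketch below assumes that strategy; the `ClickLog` record, the document IDs and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickLog:
    """One click-through triplet: the query, the ranking shown, and the clicks."""
    query: str
    ranking: tuple[str, ...]   # document IDs in the order presented, best first
    clicked: frozenset[str]    # document IDs the user clicked


def skip_above_preferences(log: ClickLog) -> list[tuple[str, str]]:
    """Return (preferred, less_preferred) pairs: each clicked document is
    preferred over every unclicked document ranked above it."""
    pairs = []
    for pos, doc in enumerate(log.ranking):
        if doc not in log.clicked:
            continue
        for skipped in log.ranking[:pos]:
            if skipped not in log.clicked:
                pairs.append((doc, skipped))
    return pairs


log = ClickLog(
    query="boiled egg timing",
    ranking=("d1", "d2", "d3", "d4", "d5"),
    clicked=frozenset({"d1", "d3", "d5"}),
)
print(skip_above_preferences(log))
```

These pairs only compare a clicked result with results the user passed over, so they are relative judgements rather than absolute ones, which is generally considered more robust to the biases described next than raw click counts.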

However, the data is biased in at least two ways. There’s a trust bias in which higher-ranking links are clicked more often, even if the abstracts are less relevant. Then there’s a quality bias. The user’s clicking decision is influenced not only by the clicked link’s relevance but also by the overall quality of the other abstracts in the ranking.

Although click-through data is typically noisy, the clicks convey a lot of information. By mining search engine log files, a support vector machine (a type of machine-learning algorithm) can improve retrieval substantially.
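In the research literature this is usually done with a “ranking SVM”: click-derived preference pairs become training constraints for a linear scoring function. The sketch below is a minimal, self-contained version under several assumptions: two made-up features per document (`text_match` and `link_score`), hand-written preference pairs, and a Pegasos-style stochastic subgradient solver standing in for a proper SVM package. It is not how any particular search engine implements this.

```python
import random


def train_ranking_svm(pairs, dim, lam=0.01, epochs=200, seed=0):
    """Linear ranking SVM trained with Pegasos-style stochastic subgradient descent.

    `pairs` holds (x_preferred, x_other) feature vectors. The model learns w so
    that w.x_preferred exceeds w.x_other by a margin of at least 1, with L2
    regularization strength `lam`.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(pairs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xa, xb = pairs[i]
            diff = [a - b for a, b in zip(xa, xb)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Shrink w (regularization), then step on the hinge loss if violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * di for wi, di in zip(w, diff)]
    return w


# Hypothetical feature vectors per document: [text_match, link_score]
features = {
    "d1": [0.2, 0.9],
    "d2": [0.3, 0.8],
    "d3": [0.9, 0.4],
    "d4": [0.1, 0.7],
    "d5": [0.8, 0.2],
}
# Hypothetical preferences, e.g. extracted from clicks on one results page
prefs = [("d3", "d2"), ("d5", "d2"), ("d5", "d4")]
pairs = [(features[a], features[b]) for a, b in prefs]

w = train_ranking_svm(pairs, dim=2)
reranked = sorted(features, key=lambda d: -sum(wi * xi for wi, xi in zip(w, features[d])))
print(w, reranked)
```

In this toy data the clicked documents have weaker link scores, so the learned weights favour the text feature and push the link feature down; the re-ranking changes even though no link data has changed. With real logs, the weights would reflect whatever the clicks actually reward.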

There’s a lot of historical and statistical data search engines have access to and continue to make use of. And the more I read and understand about how end user data is folded into the ranking mechanism, the more I have to consider beyond a page’s text and links when it comes to SEO.

Because we can glean so little about end user behavior, it’s even more important to get into the top five or six results. If users aren’t prepared to scroll and are unlikely to click through to a second results page, we must maximize visibility efforts. We must also set realistic goals with clients about what our success ratio is likely to be.

And if we don’t make the top two or three, we must think creatively about how to optimize pages so we get the very best description within the abstract the search engine is likely to show.

A truly integrated marketing approach will also have an effect on search: it will increase the query stream on keywords and keyword phrases around our products and services.

One thing is for sure. When we see strange changes taking place around certain results where linkage data in particular hasn’t changed, it really has to be end user data that’s responsible.

