Lies, Lies, and LSI

October 2, 2006

Should SEOs lose sleep over latent semantic indexing?

It's been five years since I first referenced latent semantic indexing (LSI) and the work of Microsoft super scientist Susan Dumais in the first edition of my best practice guide to search marketing (or search engine positioning, as it was known then).

At the time, there was a whole lot of confusion and some very bad information floating around about the vector space model (developed by Dr. Gerard Salton) and exactly what term vectors are. A research paper entitled "The Term Vector Database: fast access to indexing terms for Web pages" only seemed to add fuel to the fire. People in the rapidly developing SEO industry openly speculated as to how this new technology would challenge and affect optimization efforts.

As I often pointed out in forums and newsletters back then, term vector theory wasn't new at all (it predates the Web by some considerable time). I also referenced many times an interview I did with Brian Pinkerton, developer of WebCrawler, arguably the Web's first full text retrieval search engine. Pinkerton explained to me he had applied the vector space model to WebCrawler from the very beginning. And that was back in 1994.

Latent semantic indexing has also been around for a very long time. One of the first papers I read on the subject dates back to 1990.

Recently, I received a spam message which declared:

SEMANTIC WEB VERSION II

Google is coming up with Semantic web. Are you ranking well with this latest algorithm of search engines and will you continue to rank well?

Is you website LSI compliant?

Search Engines like Google (who are pallbearers for technology) are already reaching out for it by adopting LSI in their ranking algorithms.

We will check your website for its LSI algorithm readiness.

What a complete crock of you-know-what.

I read a newsletter promoting LSI tools and technology for your Web site. It even referred to the term vector database (which I doubt ever worked anyway!). Most of these so-called LSI tools and technology are nothing more than parlor tricks. Anyone can knock together a tool that takes a query and runs a thesaurus look-up on it.
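To show how little such a tool involves, here is a minimal sketch of a query-to-thesaurus look-up. The `THESAURUS` dictionary and the `expand_query` function are hypothetical names made up for illustration, not taken from any real product. Nothing in it touches a document collection, which is why it has nothing to do with LSI.

```python
# Hypothetical hand-made thesaurus; a real tool would simply load a bigger one.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Pair each word of the query with whatever the thesaurus lists for it."""
    return {word: thesaurus.get(word, []) for word in query.lower().split()}


if __name__ == "__main__":
    print(expand_query("cheap car insurance"))
```

A dictionary look-up per word is the whole trick: no index, no matrix, no math.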

Should you lose any sleep over LSI?

I asked my buddy and SEO expert Rand Fishkin of the popular SEOmoz resource for his thoughts. I referenced Dr. Edel Garcia's recent tutorials on LSI and SVD (which he was already aware of) and basically I asked:

Should SEOs care about LSI anyway, should we lose sleep over it?

If we should care about it, how would we go about optimizing for it?

In the first case he said:

"Care about it, absolutely. Lose sleep over it, almost certainly no. LSI is a method for determining semantic relationships and in all honesty, while I do believe it's critical for an SEO to be informed enough to explain the concept to a client, I don't see a lot of practical use. With the advancement in search engine algorithms over the last 2-3 years (particularly at Google & Yahoo!), SEO has shifted away from manipulating language use and placement to building a savvy marketing campaign."

And to the second question, he said:

"I believe that one of Dr. Garcia's primary points when examining the math behind LSI is that without access to accurate data about the search engines' indices and the use of language therein, we're shooting in the dark to a certain degree. He's laid out a process in his articles on the subject that will allow for rough calculations to uncover potentially more valuable combinations of words and phrases for optimizing text for search engines. However, as Dr. Garcia notes:

'These days we know that most current LSI models are not based on mere local weights, but on models that incorporate local, global and document normalization weights. Others incorporate entropy weights and link weights.'

I'm inclined to believe the value we get out of "local" weight calculations for terms in a document provide only the most minimal value to SEOs.

However, this could be very useful to spammers writing programs to auto-generate text designed to pull in long tail searches and serve contextual ads - even a slight improvement in 50 million documents could turn to big $$ for that crowd."
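The distinction between local, global and normalization weights can be sketched in a few lines. This is an illustration only, assuming one common textbook choice for each part: raw term frequency as the local weight, inverse document frequency as the global weight, and cosine (unit length) normalization. It is not the scheme of any particular search engine, and the function name `weighted_vectors` is made up for the example.

```python
import math
from collections import Counter


def weighted_vectors(docs):
    """Weight each term as local * global, then normalize each document vector."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Global weight: inverse document frequency, computed over the whole collection.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # local weight: count within this document only
        raw = {term: term_freq[term] * idf[term] for term in term_freq}
        # Normalization weight: scale the document vector to unit length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors


if __name__ == "__main__":
    for vector in weighted_vectors(["the cat sat", "the dog sat", "the bird flew"]):
        print(vector)
```

The global weight cannot be computed from one page alone. It needs counts from the entire collection, which is exactly the data about the search engines' indices that nobody outside them has.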

I asked Dr. Garcia for his own thoughts.

"Many SEOs are misquoting old papers and the focus of that old research. Many of these SEO "experts" don't even know how to do basic SVD decomposition, nor do they understand the how-to steps involved in computing LSI scores. In the process they have stretched such research findings and added a few of their own myths in order to market better whatever they sell. For instance, today one can see some suggesting that to have documents "LSI friendly" one needs to stuff content with synonyms or related terms. This perception is incorrect."

So if your SEO vendor is throwing terms such as LSI at you, you should really get them to qualify what they actually know about the subject.
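For a sense of what the how-to steps look like, here is a rough sketch of a generic textbook LSI calculation using NumPy: build a term-document matrix, take a truncated SVD, fold a query into the reduced space and score documents by cosine similarity. It is not Dr. Garcia's exact procedure. The matrix, the term list and the function name `lsi_scores` are invented for illustration, and raw counts are used where a real system would apply term weights.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def lsi_scores(matrix, query, k):
    """Cosine similarity of a query to every document in a rank-k LSI space."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T  # keep the k largest dimensions
    q_k = (query @ U_k) / s_k                     # fold the query into that space
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    return (V_k @ q_k) / norms


if __name__ == "__main__":
    query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"
    print(lsi_scores(A, query, k=2))
```

In this toy collection the second document never mentions "car", yet it scores as closely related because of how terms co-occur across the whole set of documents. The score comes from the collection, not from the wording of any single page.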

Take a look at Dr. Garcia's fast-track paper (download PDF) yourself. Even if you don't grasp any of the math and have only half a clue what it's all about, don't worry. You may never fully understand what LSI is or what it does, but Garcia certainly emphasizes what it isn't. And that little bit of knowledge will help you dispel any BS thrown at you by snake oil SEOs.

ABOUT THE AUTHOR

Mike Grehan

Mike Grehan is Publisher of Search Engine Watch and ClickZ and Producer of the SES international conference series. He is the current president of global trade association SEMPO, having been elected to the board of directors in 2010.

Formerly, Mike worked as a search marketing consultant with a number of international agencies, handling such global clients as SAP and Motorola. Recognized as a leading search marketing expert, Mike came online in 1995 and is author of numerous books and white papers on the subject. He is currently in the process of writing his new book "From Search To Social: Marketing To The Connected Consumer" to be published by Wiley in 2013.
