Lies, Lies, and LSI

October 2, 2006

Should SEOs lose sleep over latent semantic indexing?

It's been five years since I first referenced latent semantic indexing (LSI) and the work of Microsoft super scientist Susan Dumais in the first edition of my best-practice guide to search marketing (or search engine positioning, as it was known then).

At the time, there was a great deal of confusion, and some very bad information, floating around about the vector space model (developed by Dr. Gerard Salton) and exactly what term vectors are. A research paper entitled "The Term Vector Database: fast access to indexing terms for Web pages" only added fuel to the fire. People in the rapidly developing SEO industry openly speculated about how this new technology would challenge and affect optimization efforts.

As I often pointed out in forums and newsletters back then, term vector theory wasn't new at all (it predates the Web by a considerable margin). I also referred many times to an interview I did with Brian Pinkerton, developer of WebCrawler, arguably the Web's first full-text retrieval search engine. Pinkerton explained to me that he had applied the vector space model to WebCrawler from the very beginning. And that was back in 1994.
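To make "term vector" concrete: in the vector space model, each document and each query is represented as a vector of term weights, and documents are ranked by the cosine of the angle between their vector and the query's vector. The sketch below is a minimal illustration only, assuming naive lowercase whitespace tokenizing and raw term counts as the weights. The function names are hypothetical, and this is not a description of how WebCrawler or any other engine actually weighted terms.

```python
import math
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector (term -> count), using naive lowercase/whitespace tokenizing."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors (b must be a Counter)."""
    dot = sum(weight * b[term] for term, weight in a.items())  # Counter returns 0 for absent terms
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_index, score) pairs, best match first."""
    q = term_vector(query)
    scored = [(i, cosine(q, term_vector(d))) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = ["search engine marketing", "search engine optimization tips", "gardening tips"]
for index, score in rank("search engine", docs):
    print(index, round(score, 3))
# prints:
# 0 0.816
# 1 0.707
# 2 0.0
```

Note that the third document scores zero simply because it shares no terms with the query; plain term vectors know nothing about related words.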

Latent semantic indexing has also been around for a very long time. One of the first papers I read on the subject dates back to 1990.

Recently, I received a spam message which declared:


Google is coming up with Semantic web. Are you ranking well with this latest algorithm of search engines and will you continue to rank well?

Is you website LSI compliant?

Search Engines like Google (who are pallbearers for technology) are already reaching out for it by adopting LSI in their ranking algorithms.

We will check your website for its LSI algorithm readiness.

What a complete crock of you-know-what.

I read a newsletter promoting LSI tools and technology for your Web site. It even referred to the term vector database (which I doubt ever worked anyway!). Most of these so-called LSI tools and technology are nothing more than parlor tricks. Anyone can knock together a tool that takes a query and runs a thesaurus look-up on it.

Should you lose any sleep over LSI?

I asked my buddy and SEO expert Rand Fishkin of the popular SEOmoz resource for his thoughts. I referenced Dr. Edel Garcia's recent tutorials on LSI and singular value decomposition (SVD), which he was already aware of, and basically asked:

Should SEOs care about LSI anyway, should we lose sleep over it?

If we should care about it, how would we go about optimizing for it?

To the first question, he said:

"Care about it, absolutely. Lose sleep over it, almost certainly no. LSI is a method for determining semantic relationships and in all honesty, while I do believe it's critical for an SEO to be informed enough to explain the concept to a client, I don't see a lot of practical use. With the advancement in search engine algorithms over the last 2-3 years (particularly at Google & Yahoo!), SEO has shifted away from manipulating language use and placement to building a savvy marketing campaign."

And to the second question, he said:

"I believe that one of Dr. Garcia's primary points when examining the math behind LSI is that without access to accurate data about the search engines' indices and the use of language therein, we're shooting in the dark to a certain degree. He's laid out a process in his articles on the subject that will allow for rough calculations to uncover potentially more valuable combinations of words and phrases for optimizing text for search engines. However, as Dr. Garcia notes:

'These days we know that most current LSI models are not based on mere local weights, but on models that incorporate local, global and document normalization weights. Others incorporate entropy weights and link weights.'

I'm inclined to believe the value we get out of 'local' weight calculations for terms in a document provide only the most minimal value to SEOs.

However, this could be very useful to spammers writing programs to auto-generate text designed to pull in long tail searches and serve contextual ads - even a slight improvement in 50 million documents could turn to big $$ for that crowd."
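Dr. Garcia's distinction between local, global and document normalization weights is easier to see in code. The sketch below uses log-entropy weighting, a scheme that often appears in the LSI literature: local weight log(1 + tf_ij), global entropy weight g_i = 1 + sum_j p_ij log(p_ij) / log(n) with p_ij = tf_ij / gf_i, and unit-length normalization of each document vector. It is an illustration under those assumptions, not a claim about what any search engine uses; the function name and sample data are hypothetical.

```python
import math


def log_entropy_weights(docs):
    """docs: list of {term: raw count} dicts, one per document.
    Returns {term: weight} dicts where weight = local * global,
    and each document vector is then normalized to unit length."""
    n_docs = len(docs)

    # Global frequency: total occurrences of each term across the collection.
    gf = {}
    for doc in docs:
        for term, tf in doc.items():
            gf[term] = gf.get(term, 0) + tf

    # Global (entropy) weight: 1 + sum_j p_ij * log(p_ij) / log(n_docs), p_ij = tf_ij / gf_i.
    # A term spread evenly over every document gets 0; a term in one document gets 1.
    global_w = {}
    for term, total in gf.items():
        if n_docs < 2:
            global_w[term] = 1.0
            continue
        entropy_sum = 0.0
        for doc in docs:
            tf = doc.get(term, 0)
            if tf > 0:
                p = tf / total
                entropy_sum += p * math.log(p)
        g = 1.0 + entropy_sum / math.log(n_docs)
        global_w[term] = 0.0 if abs(g) < 1e-12 else g  # treat floating-point noise as zero

    weighted = []
    for doc in docs:
        # Local weight: log(1 + tf) dampens repeated occurrences within one document.
        vec = {term: math.log(1 + tf) * global_w[term] for term, tf in doc.items()}
        # Document normalization: scale to unit length (cosine normalization).
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {term: w / norm for term, w in vec.items()}
        weighted.append(vec)
    return weighted


sample = [{"lsi": 2, "seo": 1}, {"seo": 1, "links": 3}, {"seo": 1}]
for vec in log_entropy_weights(sample):
    print({term: round(w, 3) for term, w in vec.items()})
# prints:
# {'lsi': 1.0, 'seo': 0.0}
# {'seo': 0.0, 'links': 1.0}
# {'seo': 0.0}
```

Because "seo" occurs exactly once in every document, its entropy global weight is zero and it contributes nothing, however many times a page repeats it. The local weight alone, which is all many "LSI tools" look at, cannot tell you that.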

I asked Dr. Garcia for his own thoughts.

"Many SEOs are misquoting old papers and the focus of that old research. Many of these SEO 'experts' don't even know how to do basic SVD decomposition, nor do they understand the how-to steps involved in computing LSI scores. In the process they have stretched such research findings and added a few of their own myths in order to market better whatever they sell. For instance, today one can see some suggesting that to have documents 'LSI friendly' one needs to stuff content with synonyms or related terms. This perception is incorrect."
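For readers wondering what "basic SVD decomposition" and "computing LSI scores" actually involve, here is a minimal sketch on a toy five-term, three-document collection. The assumptions are mine: raw term counts with no weighting, k = 2 latent dimensions, and a hand-rolled power iteration standing in for a proper SVD routine (in practice you would call a linear algebra library such as numpy.linalg.svd). The query and documents are compared by cosine after projecting onto the top-k left singular vectors, one common variant of LSI query matching. All function names and data are hypothetical.

```python
import math
import random


def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_term_vectors(A, k, iters=300, seed=0):
    """Approximate the top-k left singular vectors of term-document matrix A
    (rows = terms, columns = documents) by power iteration on A A^T,
    re-orthogonalizing against vectors already found (deflation)."""
    n_terms = len(A)
    AAt = [[dot(A[i], A[j]) for j in range(n_terms)] for i in range(n_terms)]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _ in range(iters):
            w = mat_vec(AAt, v)
            for u in found:  # strip components along earlier singular vectors
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # k exceeds the rank of A
                return found
            v = [wi / norm for wi in w]
        found.append(v)
    return found


def project(U, vec):
    """Coordinates of a term vector in the k-dimensional latent space (U_k^T x)."""
    return [dot(u, vec) for u in U]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


terms = ["car", "automobile", "engine", "flower", "petal"]
# Columns: doc0 = "car engine", doc1 = "automobile engine", doc2 = "flower petal"
A = [
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # flower
    [0, 0, 1],  # petal
]
docs = [[row[j] for row in A] for j in range(len(A[0]))]
query = [1, 0, 0, 0, 0]  # the query "car"

U = top_term_vectors(A, k=2)
plain = [cosine(query, d) for d in docs]
latent = [cosine(project(U, query), project(U, d)) for d in docs]
print("plain  :", plain)
print("latent :", latent)
```

"Car" and "automobile" never appear together, so the plain cosine between the query "car" and the "automobile engine" document is zero. In the two-dimensional latent space both documents share "engine", and that document should score close to 1.0. What did the work was the co-occurrence pattern across the whole collection, not synonyms packed into one page, which is consistent with Garcia's point.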

So if your SEO vendor is throwing terms such as LSI at you, you should really get them to qualify what they actually know about the subject.

Take a look at Dr. Garcia's fast-track paper (PDF) yourself. Even if you don't grasp any of the math and have only half a clue what it's all about, don't worry. You may never fully understand what LSI is or what it does, but Garcia makes very clear what it isn't. And that little bit of knowledge will certainly help you dispel any BS thrown at you by snake-oil SEOs.



Mike Grehan

Mike Grehan is currently chief marketing officer and managing director at Acronym, where he is responsible for directing thought leadership programs and cross-platform marketing initiatives, as well as developing new, innovative content marketing campaigns.

Prior to joining Acronym, Grehan was group publishing director at Incisive Media, publisher of Search Engine Watch and ClickZ, and producer of the SES international conference series. Previously, he worked as a search marketing consultant with a number of international agencies handling global clients such as SAP and Motorola. Recognized as a leading search marketing expert, Grehan came online in 1995 and is the author of numerous books and white papers on the subject. He is currently writing his new book, From Search to Social: Marketing to the Connected Consumer, to be published by Wiley later in 2014.

In March 2010 he was elected to SEMPO's board of directors; after a year as vice president, he served two years as president and is now chairman.
