It’s five years since I first referenced latent semantic indexing (LSI) and the work of Microsoft super scientist Susan Dumais in the first edition of my best practice guide to search marketing (or search engine positioning, as it was known then).
At the time, there was a whole lot of confusion and some very bad information floating around about the vector space model (developed by Dr. Gerard Salton), and exactly what term vectors are. A research paper entitled “The Term Vector Database: fast access to indexing terms for Web pages,” only seemed to add more fuel to the fire. People in the rapidly developing SEO industry openly speculated as to how this new technology would challenge and affect optimization efforts.
As I often pointed out in forums and newsletters back then, term vector theory wasn’t new at all (it predates the Web by some considerable time). I also frequently referenced an interview I did with Brian Pinkerton, developer of WebCrawler, arguably the Web’s first full-text retrieval search engine. Pinkerton explained to me that he had applied the vector space model to WebCrawler from the very beginning. And that was back in 1994.
Latent semantic indexing has also been around for a very long time. One of the first papers I read on the subject dates back to 1990.
Recently, I received a spam message which declared:
SEMANTIC WEB VERSION II
Google is coming up with Semantic web. Are you ranking well with this latest algorithm of search engines and will you continue to rank well?
Is you website LSI compliant?
Search Engines like Google (who are pallbearers for technology) are already reaching out for it by adopting LSI in their ranking algorithms.
We will check your website for its LSI algorithm readiness.
What a complete crock of you-know-what.
I read a newsletter promoting LSI tools and technology for your Web site. It even referred to the term vector database (which I doubt ever worked anyway!). Most of these so-called LSI tools and technology are nothing more than parlor tricks. Anyone can knock together a tool that takes a query and runs a thesaurus look-up on it.
Should you lose any sleep over LSI?
I asked my buddy and SEO expert Rand Fishkin of the popular SEOmoz resource for his thoughts. I referenced Dr. Edel Garcia’s recent tutorials on LSI and SVD (which he was already aware of) and basically I asked:
Should SEOs care about LSI anyway, should we lose sleep over it?
If we should care about it, how would we go about optimizing for it?
In the first case he said:
“Care about it, absolutely. Lose sleep over it, almost certainly no. LSI is a method for determining semantic relationships and in all honesty, while I do believe it’s critical for an SEO to be informed enough to explain the concept to a client, I don’t see a lot of practical use. With the advancement in search engine algorithms over the last 2-3 years (particularly at Google & Yahoo!), SEO has shifted away from manipulating language use and placement to building a savvy marketing campaign.”
And to the second question, he said:
“I believe that one of Dr. Garcia’s primary points when examining the math behind LSI is that without access to accurate data about the search engines’ indices and the use of language therein, we’re shooting in the dark to a certain degree. He’s laid out a process in his articles on the subject that will allow for rough calculations to uncover potentially more valuable combinations of words and phrases for optimizing text for search engines. However, as Dr. Garcia notes:
‘These days we know that most current LSI models are not based on mere local weights, but on models that incorporate local, global and document normalization weights. Others incorporate entropy weights and link weights.’
I’m inclined to believe the value we get out of “local” weight calculations for terms in a document provide only the most minimal value to SEOs.
However, this could be very useful to spammers writing programs to auto-generate text designed to pull in long tail searches and serve contextual ads – even a slight improvement in 50 million documents could turn to big $$ for that crowd.”
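To make the distinction between local, global, and normalization weights concrete, here is a minimal sketch in Python. It assumes one common textbook scheme (log-dampened term frequency as the local weight, inverse document frequency as the global weight, and cosine length normalization). The function name and the toy documents are made up for illustration, and nothing here is claimed to be what any search engine actually uses.

```python
import math

def weighted_vectors(docs):
    """Return one {term: weight} dict per document.

    weight = local weight * global weight, then length-normalized.
    The specific formulas are an assumption (a common tf-idf variant).
    """
    n = len(docs)

    # Global statistic: in how many documents does each term appear?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1

    vectors = []
    for doc in docs:
        # Local statistic: how often does each term appear in this document?
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1

        # Local weight (1 + ln tf) times global weight ln(n / df).
        raw = {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}

        # Document normalization: scale the vector to unit length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors

# Hypothetical toy collection: "seo" appears in every document.
docs = [
    ["seo", "link", "link"],
    ["seo", "content"],
    ["seo", "link", "rank"],
]
vectors = weighted_vectors(docs)
```

In this toy collection the term "seo" appears in every document, so its global weight is zero no matter how often a single page repeats it. That is the sense in which counting terms on one page alone says very little: the other two factors depend on the whole collection, which outsiders cannot see.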
I asked Dr. Garcia for his own thoughts.
“Many SEOs are misquoting old papers and the focus of that old research. Many of these SEO “experts” don’t even know how to do basic SVD decomposition, nor do they understand the how-to steps involved in computing LSI scores. In the process they have stretched such research findings and added a few of their own myths in order to market better whatever they sell. For instance, today one can see some suggesting that to have documents “LSI friendly” one needs to stuff content with synonyms or related terms. This perception is incorrect.”
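For readers curious what “basic SVD decomposition” and “computing LSI scores” involve, here is a rough sketch using NumPy. It assumes a tiny invented term-document matrix of raw counts, a rank-k truncation, and the standard textbook step of folding the query into the reduced space before taking cosine similarities. The function name `lsi_scores` and the data are hypothetical; this is not Dr. Garcia’s own worked example, and real systems apply term weighting first.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
# The counts are invented purely for illustration.
terms = ["gold", "silver", "truck", "shipment"]
A = np.array([
    [1.0, 0.0, 1.0],  # gold
    [0.0, 2.0, 0.0],  # silver
    [0.0, 1.0, 1.0],  # truck
    [1.0, 0.0, 1.0],  # shipment
])

def lsi_scores(A, query, k=2):
    """Cosine similarity between a query and each document in a rank-k LSI space."""
    # Decompose A = U * diag(s) * Vt.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only the k largest singular values and their vectors.
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    # Fold the query into the reduced space: q_k = q^T U_k S_k^{-1}.
    q_k = (query @ U_k) / s_k

    # Each row of V_k holds one document's coordinates in that space.
    dots = V_k @ q_k
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    return dots / norms

# Query containing "gold" and "truck", in the same term order as the rows of A.
query = np.array([1.0, 0.0, 1.0, 0.0])
scores = lsi_scores(A, query, k=2)
```

The scores depend on the entire matrix, so adding or removing a single document changes the decomposition for every term. That is why stuffing one page with synonyms does not make it “LSI friendly”: the page is only one column in a matrix the page author never sees.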
So if your SEO vendor is throwing terms such as LSI at you, you should really get them to qualify what they actually know about the subject.
Take a look at Dr. Garcia’s fast-track paper (download PDF) yourself. Even if you don’t grasp any of the math and only have half a clue what it’s all about, don’t worry. After reading it, you may still not fully understand what LSI is or what it does, but Garcia certainly emphasizes what it isn’t. And that little bit of knowledge will help you dispel any BS thrown at you by snake oil SEOs.