Should SEOs lose sleep over latent semantic indexing?
It's five years since I first referenced latent semantic indexing (LSI) and the work of Microsoft super scientist Susan Dumais in the first edition of my best practice guide to search marketing (or search engine positioning, as it was known then).
At the time, there was a whole lot of confusion and some very bad information floating around about the vector space model (developed by Dr. Gerard Salton), and exactly what term vectors are. A research paper entitled "The Term Vector Database: fast access to indexing terms for Web pages," only seemed to add more fuel to the fire. People in the rapidly developing SEO industry openly speculated as to how this new technology would challenge and affect optimization efforts.
As I often pointed out in forums and newsletters back then, term vector theory wasn't new at all (it predates the Web by some considerable time). I also frequently referenced an interview I did with Brian Pinkerton, developer of WebCrawler, arguably the Web's first full-text retrieval search engine. Pinkerton explained to me that he had applied the vector space model to WebCrawler from the very beginning. And that was back in 1994.
Latent semantic indexing has also been around for a very long time. One of the first papers I read on the subject dates back to 1990.
Recently, I received a spam message which declared:
SEMANTIC WEB VERSION II
Google is coming up with Semantic web. Are you ranking well with this latest algorithm of search engines and will you continue to rank well?
Is you website LSI compliant?
Search Engines like Google (who are pallbearers for technology) are already reaching out for it by adopting LSI in their ranking algorithms.
We will check your website for its LSI algorithm readiness.
What a complete crock of you-know-what.
I read a newsletter promoting LSI tools and technology for your Web site. It even referred to the term vector database (which I doubt ever worked anyway!). Most of these so-called LSI tools and technology are nothing more than parlor tricks. Anyone can knock together a tool that takes a query and runs a thesaurus look-up on it.
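To show how little is involved in the kind of tool described above, here is a minimal sketch of a query-to-thesaurus look-up. The `THESAURUS` dictionary and the `related_terms` function are hypothetical names made up for this illustration; nothing here has anything to do with how LSI is actually computed.

```python
# Hypothetical, hand-made thesaurus used only for illustration.
# A real tool would simply load a bigger word list.
THESAURUS = {
    "cheap": ["inexpensive", "budget"],
    "hotel": ["inn", "lodging"],
}


def related_terms(query):
    """Return the thesaurus entries for each word of the query, in order."""
    suggestions = []
    for word in query.lower().split():
        # Words with no thesaurus entry contribute nothing.
        suggestions.extend(THESAURUS.get(word, []))
    return suggestions


print(related_terms("Cheap hotel Paris"))
```

That is a dictionary look-up and nothing more: no document collection, no matrix and no decomposition are involved.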
Should you lose any sleep over LSI?
I asked my buddy and SEO expert Rand Fishkin of the popular SEOmoz resource for his thoughts. I referenced Dr. Edel Garcia's recent tutorials on LSI and SVD (which he was already aware of) and basically I asked:
Should SEOs care about LSI anyway, should we lose sleep over it?
If we should care about it, how would we go about optimizing for it?
In the first case he said:
"Care about it, absolutely. Lose sleep over it, almost certainly no. LSI is a method for determining semantic relationships and in all honesty, while I do believe it's critical for an SEO to be informed enough to explain the concept to a client, I don't see a lot of practical use. With the advancement in search engine algorithms over the last 2-3 years (particularly at Google & Yahoo!), SEO has shifted away from manipulating language use and placement to building a savvy marketing campaign."
And to the second question, he said:
"I believe that one of Dr. Garcia's primary points when examining the math behind LSI is that without access to accurate data about the search engines' indices and the use of language therein, we're shooting in the dark to a certain degree. He's laid out a process in his articles on the subject that will allow for rough calculations to uncover potentially more valuable combinations of words and phrases for optimizing text for search engines. However, as Dr. Garcia notes:
'These days we know that most current LSI models are not based on mere local weights, but on models that incorporate local, global and document normalization weights. Others incorporate entropy weights and link weights.'
I'm inclined to believe the value we get out of "local" weight calculations for terms in a document provide only the most minimal value to SEOs.
However, this could be very useful to spammers writing programs to auto-generate text designed to pull in long tail searches and serve contextual ads - even a slight improvement in 50 million documents could turn to big $$ for that crowd."
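The distinction Garcia draws between local, global and normalization weights can be made concrete with a small sketch. It assumes raw term counts as the local weight, inverse document frequency as the global weight and unit-length scaling as the normalization. Those are common textbook choices picked for illustration, not a description of what any search engine uses, and `weighted_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter


def weighted_vectors(docs):
    """Return one {term: weight} dict per document.

    weight = local (term count) * global (IDF), then each document
    vector is scaled to unit length (normalization).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Global weight: depends on the whole collection, not on one page.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # local weight: raw count in this document
        raw = {term: count * idf[term] for term, count in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors


docs = [
    "cheap flights cheap hotels",
    "cheap hotels in paris",
    "paris travel guide",
]
vectors = weighted_vectors(docs)
print(vectors[0])
```

In the first document "cheap" appears twice and "flights" once, yet "flights" ends up with the larger weight because it is rarer across the collection. The global weight cannot be computed from a single page, which is why calculations based on local weights alone say so little.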
I asked Dr. Garcia for his own thoughts.
"Many SEOs are misquoting old papers and the focus of that old research. Many of these SEO "experts" don't even know how to do basic SVD decomposition, nor do they understand the how-to steps involved in computing LSI scores. In the process they have stretched such research findings and added a few of their own myths in order to market better whatever they sell. For instance, today one can see some suggesting that to have documents "LSI friendly" one needs to stuff content with synonyms or related terms. This perception is incorrect."
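For readers who want to see the basic SVD step Garcia refers to, here is a toy sketch using NumPy's `numpy.linalg.svd`. The five-term, three-document matrix, the choice of two retained dimensions and the helper names `lsi_doc_vectors` and `cosine` are all assumptions made for illustration. It is not how any search engine scores pages.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
# Entries are raw counts, so only local weights are used here.
terms = ["car", "automobile", "engine", "recipe", "flour"]
A = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # recipe
    [0, 0, 1],  # flour
], dtype=float)


def lsi_doc_vectors(matrix, k):
    """Return document coordinates in a k-dimensional latent space."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; one row per document.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs_k = lsi_doc_vectors(A, k=2)
print(cosine(A[:, 0], A[:, 1]))      # similarity of documents 0 and 1 on raw counts
print(cosine(docs_k[0], docs_k[1]))  # same pair after rank-2 reduction
print(cosine(docs_k[0], docs_k[2]))  # unrelated pair after reduction
```

Documents 0 and 1 never share "car" or "automobile", but both contain "engine", and the reduced space pulls them together. That relationship comes from co-occurrence patterns across the whole matrix. Adding synonyms to one page only changes one column of a matrix whose other columns the page author never sees.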
So if your SEO vendor is throwing terms such as LSI at you, you should really get them to qualify what they actually know about the subject.
Take a look at Dr. Garcia's fast-track paper (available as a PDF download) yourself. Don't worry if you don't grasp any of the math and only have half a clue what it's all about. Even after reading it, you may never understand what LSI is or what it does, but Garcia certainly emphasizes what it isn't. And that little bit of knowledge will certainly help you to dispel any BS thrown at you by snake oil SEOs.
Mike Grehan is currently CMO & managing director at Acronym where he is responsible for directing thought leadership programs and cross platform marketing initiatives, as well as developing new, innovative content marketing campaigns.
Prior to joining Acronym, Grehan was global VP, Content, at Incisive Media, publisher of Search Engine Watch and ClickZ, and producer of the SES international conference series. Previously, he worked as a search marketing consultant with a number of international agencies handling global clients such as SAP and Motorola. Recognized as a leading search marketing expert, Grehan came online in 1995 and is the author of numerous books and white papers on the subject and is currently in the process of writing his new book “From Search To Social: Marketing To The Connected Consumer” to be published by Wiley later in 2014.
In March 2010 he was elected to SEMPO’s board of directors and after a year as VP he then served two years as president and is now the current chairman.